Skip to content

Cookbook

Not yet fully vetted

These recipes are illustrative and have not all been verified end-to-end against live catalogs — some may need adjusting (column names, catalog names, exact verb arguments) for your data. Treat them as starting points rather than guaranteed-runnable scripts, and check the linked guides for the authoritative API.

Runnable end-to-end recipes for the science workflows ACID was built for. Each entry leads with the science question, then gives a single self-contained snippet you can drop into a script or notebook.

Every recipe assumes import acid and import astropy.units as u. acid is module-level and singleton-by-default: the recipes call acid.open(...) / acid.sql.query(...) directly on the process-wide default connection — the first call lazy-initializes it, reading the catalog source path from your configuration (the ACID_PATH environment variable or ~/.config/acid). Catalog names used in the snippets follow the real ones (gaia_dr3, twomass_psc, ztf, panstarrs1, rubin_object) when realistic; the synthetic fixtures (a, b, c, lc) appear when a snippet is meant to run against the test suite. Substitute the catalogs you actually have registered.

For task-shaped explanations of the underlying verbs, see the user guide; the recipes below stitch those verbs together.

Gaia × Rubin counterparts in one science field

Find the Gaia DR3 source associated with every Rubin DP1 object in a science field, with the separation in arcseconds and the Gaia photometry attached. A textbook position-by-position counterpart match.

gaia_x_rubin_field.py
import acid
import astropy.units as u

gaia  = acid.open("gaia_dr3")
rubin = acid.open("rubin_object")

# 1° science field. The cone scopes both catalogs uniformly;
# the radius-vs-margin rule is enforced at analyze time.
with acid.in_cone((150.5, 2.2), radius=0.5 * u.deg):
    counterparts = (rubin
                    .crossmatch(gaia, radius=1 * u.arcsec, dist_col="d_arcsec")
                    .select("objectId, source_id, d_arcsec, "
                            "g_psfMag, phot_g_mean_mag, parallax")
                    .to_polars())

print(counterparts.shape)
print(counterparts["d_arcsec"].describe())

maxmatch=1 (the default) gives one Gaia source per Rubin object — the closest one within 1″. For "every Gaia source within 1″", pass maxmatch=-1.

If you're at full-sky scale, drop the with acid.in_cone(...): block; the query is the same. Develop with the cone, run without it (see Debug small, run big).

ZTF transient candidates → Pan-STARRS host photometry

For every ZTF alert candidate near a target field, find the Pan-STARRS host (the nearest PS1 source within 1.5″) and pull its multi-band photometry.

ztf_x_ps1_hosts.py
import acid
import astropy.units as u

ztf = acid.open("ztf_alerts")
ps1 = acid.open("panstarrs1")

candidates = (ztf
              .where("classtar > 0.5 AND magpsf < 19.5")
              .crossmatch(ps1, radius=1.5 * u.arcsec, dist_col="d_arcsec",
                          how="left")
              .select("candid, ra, dec, magpsf, "
                      "objID, gMeanPSFMag, rMeanPSFMag, iMeanPSFMag, "
                      "d_arcsec")
              .to_polars())

print(f"{len(candidates):,} candidates")
print(f"  with PS1 host: {candidates['objID'].is_not_null().sum():,}")
print(f"  hostless:      {candidates['objID'].is_null().sum():,}")

how="left" keeps every candidate (matched or not). The columns from PS1 are NULL for hostless candidates — useful for separating extragalactic transients (host present) from genuinely hostless ones.

If ZTF positions are at a different epoch than PS1, propagate before registering (see Crossmatch guide §1).

Sky-binned source counts (a HEALPix density map)

Produce a HEALPix histogram of Gaia source counts at order N, for making a sky-density figure.

gaia_density.py
import acid
from acid import agg

gaia = acid.open("gaia_dr3")

# Bin to Norder=8 (~13 arcmin pixels). _healpix_29 is the
# order-29 pixel ID present in every HATS catalog; right-
# shifting by 2 × (29 - 8) gives the order-8 pixel.
density = (gaia
           .where("phot_g_mean_mag BETWEEN 14 AND 18")
           .group_by("_healpix_29 / 4398046511104 AS hpix_8")  # 4398046511104 = 4^(29 - 8)
           .aggregate(n=agg.count())
           .to_polars())

print(density.head())

The _healpix_29 / 4^(29-8) divisor is integer arithmetic — push the division into the GROUP BY rather than computing a separate column, and Polars folds it cleanly. Result is one row per non-empty pixel.

This pattern generalises to any sky-binned aggregate: replace agg.count() with agg.mean("phot_g_mean_mag") for a mean-magnitude map, or agg.std("parallax") for a parallax-scatter map.

Light curves for a list of targets

Given a CSV of target coordinates, fetch the Rubin DP1 forced-source light curve for each — one row per target, the detections as time-ordered lists.

The target list goes straight in as a virtual catalog (no offline HATS import). The shape is crossmatch by position to anchor each target to a Rubin object, then a nested equi-join on the object ID so each target's detections fold into per-object lists.

lightcurves_for_targets.py
import acid
import astropy.units as u

# Your target list: a CSV with RA/DEC columns (degrees, ICRS).
targets   = acid.open("targets.csv", ra="RA", dec="DEC")
rubin_obj = acid.open("rubin_object")         # Rubin DP1 object table
rubin_fs  = acid.open("rubin_forcedsource")   # Rubin DP1 forced-source light curves

# Position match → Rubin object, then nested equi-join by objectId.
lc = (targets
      .crossmatch(rubin_obj, radius=1 * u.arcsec, dist_col="d_arcsec")
      .join(rubin_fs, on="objectId", how="left",
            nested=True, order_by="midpointMjdTai"))

lc.save("/scratch/target_lightcurves", name="target_lc")   # stays queryable
# one row per matched object; midpointMjdTai / band / psfFlux are lists.
# Or, for one flat file to hand to a notebook / collaborator:
lc.export("target_lightcurves.parquet")                    # leaves the system

save writes a queryable HATS catalog (streaming — any size); export gathers the full result in memory and writes one flat file (right for a modest target-list output, not a full-sky one). For an in-memory frame (pandas / Astropy) instead of a file, pass it to acid.open(frame, ra=..., dec=...) the same way. See bring your own target list and the light-curves task guide for the full treatment (nested joins, column pruning, per-object UDF reductions).

Anti-join: Rubin objects without a Gaia counterpart

Find every Rubin object inside a field that has no Gaia source within 1″ — a candidate list for "Rubin-only" detections (potential variables not in Gaia, faint sources, transients, artifacts).

rubin_without_gaia.py
import acid
import astropy.units as u

rubin = acid.open("rubin_object")
gaia  = acid.open("gaia_dr3")

with acid.in_cone((150.5, 2.2), radius=0.5 * u.deg):
    anti = (rubin
            .crossmatch(gaia, radius=1 * u.arcsec, how="left", dist_col="d")
            .where("d IS NULL")     # the anti-join key
            .select("objectId, ra, dec, g_psfMag, r_psfMag")
            .to_polars())

print(f"{len(anti):,} Rubin objects without a Gaia counterpart within 1\"")

how="left" + WHERE d IS NULL is the canonical anti-join pattern in ACID. The pattern works the same in the SQL surface: LEFT JOIN gaia ON XMATCH(radius_arcsec => 1.0, dist_col => 'd') WHERE d IS NULL.

Restrict to a survey footprint

Count Gaia sources inside DELVE's published footprint. Footprints ship as point_map.fits in most HATS catalogs; ACID picks them up automatically when you name a catalog in IN_MOC(...) / Catalog.in_region(<catalog_name>).

import acid

gaia  = acid.open("gaia_dr3")
delve = acid.open("delve_dr3")     # uses delve_dr3/point_map.fits

n = gaia.in_region(delve).execute()   # or .to_polars(), etc.
print(n)
import acid

r = acid.sql.query("""
    SELECT COUNT(*) AS n
    FROM   gaia_dr3 AS g
    WHERE  IN_MOC(g, 'delve_dr3')
""")
r.show()

The footprint restriction prunes whole partitions before any worker reads data; this is one of the fastest queries you can run over a billion-row catalog. See Sky regions & footprints.

Materialize an intermediate, then iterate

A heavy crossmatch you want to run once, then explore many times. Catalog.save(path, name=...) writes a HATS tree and registers it on the connection. Subsequent queries against the saved name skip the matcher entirely.

materialize_then_iterate.py
import acid
import astropy.units as u

# Heavy: full-sky Gaia × 2MASS, one-time cost.
bright = (acid.open("gaia_dr3")
            .where("phot_g_mean_mag < 16")
            .crossmatch(acid.open("twomass_psc"),
                        radius=1 * u.arcsec, dist_col="d_arcsec")
            .save("/scratch/bright_gaia_2mass",
                  name="bright_x_2mass"))

# Light: iterate against the cached catalog.
bright.where("d_arcsec < 0.5").head(20).show()
bright.where("j_m < 14").head(20).show()
acid.sql.query("""
    SELECT COUNT(*) AS n
    FROM   bright_x_2mass
    WHERE  d_arcsec < 0.3
""").show()

save returns a fresh Catalog pointing at the saved tree; the same name is also reachable through acid.open("bright_x_2mass") and from SQL. The output is a valid HATS tree that LSDB and hats.read_hats(...) read without modification.

Top-K closest matches

The 100 closest Gaia/2MASS pairs in a field. .sort(...) + .limit(K) pushes the LIMIT to each partition, so the work scales with K, not with the total match count.

top_k.py
import acid
import astropy.units as u

gaia    = acid.open("gaia_dr3")
twomass = acid.open("twomass_psc")

closest = (gaia.crossmatch(twomass, radius=5 * u.arcsec, dist_col="d")
                .sort("d")
                .limit(100)
                .to_polars())
print(closest)

Self-crossmatch (pairs within one catalog)

Find close stellar pairs in Gaia — sources within 5″ of each other. Open the same catalog twice with different aliases; the source_id < source_id_b clause keeps each pair exactly once.

self_crossmatch.py
import acid

r = acid.sql.query("""
    SELECT a.source_id AS id1,
           b.source_id AS id2,
           d_arcsec
    FROM   gaia_dr3 AS a
    JOIN   gaia_dr3 AS b
        ON XMATCH(radius_arcsec => 5.0, mode => 'all', dist_col => 'd_arcsec')
    WHERE  a.source_id < b.source_id
    ORDER BY d_arcsec
    LIMIT 100
""")
r.show()

Run a recipe from the CLI

Each Python recipe maps to a one-liner acid query:

# Pretty-print to stdout.
acid query "SELECT COUNT(*) FROM gaia_dr3 WHERE IN_MOC(gaia_dr3, 'delve_dr3')" \
    --db /data/hats

# Run a longer query from a file, write a HATS tree.
acid query -f /scripts/gaia_x_rubin.sql --db /data/hats \
    --output /scratch/gaia_x_rubin/ --workers 32

# Restrict to a debug cone right from the shell.
acid query -f /scripts/gaia_x_rubin.sql --db /data/hats \
    --cone 150.5,2.2,0.5

See the CLI reference for the full flag list.

See also