Light curves for a list of targets

You have a list of targets and a light-curve catalog (Rubin's objectForcedSource / object, ZTF's source catalog, the fixture's lc) and you want every detection of each target, ideally as a per-target list you can plot or fit. There are two ways to associate the targets with the detections:

  1. ID-based. join(..., on=...) when both catalogs already carry the same integer object ID (Rubin's object and objectForcedSource, both keyed on objectId). This is the headline shape, and the one you want whenever the IDs exist.
  2. Position-based. crossmatch(...) by sky position when the targets and the light-curve catalog do not share an object-ID column (e.g. your own target list against a survey's source table).

For either, nested=True (on join or crossmatch) gives you the shape you actually want: one row per target, each detection column a time-ordered list — instead of one row per detection.

Your target list does not need to be a HATS catalog

If your targets are a CSV, parquet, FITS table, or an in-memory pandas / Astropy table, open them directly with acid.open("targets.csv", ra="RA", dec="DEC"); see bring your own target list. No offline HATS import is needed.

ID-based — both catalogs share an object ID

Rubin's object and objectForcedSource catalogs both carry objectId. When that holds you don't need a positional match — an ordinary integer-ID join is correct and cheaper (no kd-tree). With nested=True you get one row per object with the source columns folded into per-object lists:

object_lightcurves.py
import acid

acid.init("catalogs.yaml", workers=8)

light = (acid.open("object")
         .where("g_mag < 22")
         .join(acid.open("objectForcedSource"),
               on="objectId",
               how="left",
               nested=True,
               order_by="midpointMjdTai"))   # sort each list by time

tbl = light.to_astropy()
# one row per object; the source columns are per-object lists.

SQL form (exploded, one row per detection):

import acid

acid.init("catalogs.yaml", workers=8)

# SQL produces the exploded (one-row-per-detection) shape; the
# per-object list fold is a fluent-only feature.
r = acid.sql.query("""
    SELECT obj.objectId, obj.ra, obj.dec, fs.midpointMjdTai, fs.psfFlux
    FROM   object AS obj
    JOIN   objectForcedSource AS fs ON obj.objectId = fs.objectId
    WHERE  obj.g_mag < 22
""")
df = r.to_polars()

What you get back from the fluent form: one row per object, with the objectForcedSource columns as list<T> columns, each sorted by midpointMjdTai so element i of every list is the same detection. Objects with no detections (only on a how="left" join) get empty lists ([]), not a phantom [null].
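To make the output shape concrete, here is a minimal plain-Python sketch of the fold described above. It is an illustration only, not how acid implements the join: `fold_nested` and the toy records are hypothetical, and it assumes every detection record carries the same columns.

```python
from collections import defaultdict


def fold_nested(objects, detections, key, order_by):
    """Fold one-row-per-detection records into one row per object.

    `objects` and `detections` are lists of dicts. Illustrative only.
    """
    by_key = defaultdict(list)
    for det in detections:
        by_key[det[key]].append(det)

    det_cols = sorted({c for det in detections for c in det if c != key})
    rows = []
    for obj in objects:  # left join: every object is kept
        dets = sorted(by_key.get(obj[key], []), key=lambda d: d[order_by])
        row = dict(obj)
        for col in det_cols:
            # One list per detection column, all in the same time order,
            # so element i of every list refers to the same detection.
            row[col] = [d[col] for d in dets]
        rows.append(row)
    return rows


objects = [{"objectId": 1}, {"objectId": 2}, {"objectId": 3}]
detections = [
    {"objectId": 1, "midpointMjdTai": 60002.0, "psfFlux": 11.0},
    {"objectId": 1, "midpointMjdTai": 60001.0, "psfFlux": 10.0},
    {"objectId": 2, "midpointMjdTai": 60001.5, "psfFlux": 5.0},
]
nested = fold_nested(objects, detections,
                     key="objectId", order_by="midpointMjdTai")
```

Object 1 ends up with two time-sorted lists of equal length, and object 3, which has no detections, gets an empty list in every detection column rather than a list holding a null.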

on takes a single column name (used on both sides) or a (left, right) tuple when the columns are named differently. Both keys must be integer-ID columns; for non-integer or compound keys, drop into acid.sql.query(...).

The locality contract for an equi-join

An equi-join asserts the right catalog is localized: rows sharing a key (a source and its parent object) sit at the same HEALPix position to within the right catalog's margin-cache radius (radius 0 = exact pixel — the layout Rubin's object / objectForcedSource tables use). The right catalog must carry _healpix_29 and a declared margin cache, or acid rejects the join with a ValidationError. This is the normal layout for a survey's object/source tables; a position-less lookup table (no coordinates) is a different tool — see attaching a lookup table.

Select which columns get listed

A trailing .select(...) on the right operand narrows which columns are folded into lists — and, via projection pushdown, which are read from parquet at all:

light = (acid.open("object")
         .join(acid.open("objectForcedSource").select("objectId, midpointMjdTai, psfFlux"),
               on="objectId", how="left",
               nested=True, order_by="midpointMjdTai"))
# lists only midpointMjdTai + psfFlux; the rest is never read.

The operand .select(...) must keep the join key (objectId). Dropping it is a clear compile-time error.

Position-based — crossmatch your targets to a source catalog

Use this when your target catalog (Gaia, your own published source list) does not share an object ID with the source catalog. The crossmatch establishes the association by sky position; nested=True folds the detections into per-target lists:

targets_to_lightcurves.py
import acid
import astropy.units as u

acid.init("catalogs.yaml", workers=8)

targets = acid.open("a")           # your target catalog (or a virtual one)
src     = acid.open("ztf_source")  # a source/detection catalog

light = (targets
         .crossmatch(src, radius=0.5 * u.arcsec,
                     maxmatch=-1,           # every detection within the radius
                     how="left",            # keep targets with no detections
                     nested=True, order_by="mjd")
         .to_astropy())

maxmatch=-1 collects every source within the radius (not just the nearest), which is what you want for a light curve; nested=True then groups them per target. If you only want the count per target, the reduction shortcut stays fully in acid:

n_per_target = (targets
                .crossmatch(src, radius=0.5 * u.arcsec, maxmatch=-1)
                .group_by("id")
                .count())                  # one row per target, `count` column
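The difference between nearest-only matching and maxmatch=-1 can be sketched without acid at all. The brute-force snippet below is a toy under stated assumptions (the helper names and coordinates are hypothetical, and a real crossmatch uses a spatial index rather than a full scan): it keeps every source within the radius of one target, sorted nearest first.

```python
import math


def sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0


def matches_within(target, sources, radius_arcsec):
    """All sources within the radius of the target, nearest first."""
    hits = [(sep_arcsec(target["ra"], target["dec"], s["ra"], s["dec"]), s)
            for s in sources]
    return sorted((h for h in hits if h[0] <= radius_arcsec),
                  key=lambda h: h[0])


target = {"id": 1, "ra": 10.0, "dec": 0.0}
sources = [
    {"mjd": 60001.0, "ra": 10.0, "dec": 0.3 / 3600},            # 0.3 arcsec away
    {"mjd": 60002.0, "ra": 10.0 + 0.1 / 3600, "dec": 0.0},      # 0.1 arcsec away
    {"mjd": 60003.0, "ra": 10.0, "dec": 2.0 / 3600},            # 2.0 arcsec away
]

hits = matches_within(target, sources, radius_arcsec=0.5)
light_curve_mjd = sorted(s["mjd"] for _, s in hits)   # every match, time-ordered
nearest_mjd = hits[0][1]["mjd"]                       # what nearest-only keeps
n_matches = len(hits)                                 # the per-target count
```

Nearest-only matching would keep a single epoch per target; collecting every hit is what turns the match into a light curve, and the length of that list is the number the count reduction reports.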

Radius and the margin cache

A light-curve crossmatch is where the radius vs. margin cache rule bites hardest: source catalogs are large, and you want the radius to bracket the per-detection astrometric scatter, not to be limited by how the cache was built. If your match radius exceeds the right catalog's margin-cache radius, acid rejects the query at compile time — rebuild the cache or shrink the radius. See radius vs. margin cache for the exact error and the fix.

The other crossmatch caveat — acid treats all RA/Dec as J2000, with no epoch propagation — matters for high-PM targets. A Gaia J2016 source matched against a J2000 source catalog may have moved a non-trivial fraction of a 0.5″ radius. Propagate to J2000 before matching; see epoch — all RA/Dec are J2000.
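As a rough guide to how much this matters, the sketch below applies a linear proper-motion shift between two epochs. It is an approximation with hypothetical function names and an invented star: it assumes the RA proper motion already includes the cos(dec) factor (the Gaia convention), and it ignores parallax, radial velocity, and behaviour near the poles. For rigorous propagation, Astropy's SkyCoord.apply_space_motion is the usual tool.

```python
import math


def propagate(ra_deg, dec_deg, pmra_masyr, pmdec_masyr, epoch_from, epoch_to):
    """Linear proper-motion shift between two epochs given in Julian years.

    Assumes pmra is mu_alpha* = mu_alpha * cos(dec), in mas/yr.
    """
    dt = epoch_to - epoch_from
    mas_to_deg = 1.0 / 3.6e6
    dec_new = dec_deg + pmdec_masyr * dt * mas_to_deg
    ra_new = ra_deg + (pmra_masyr * dt * mas_to_deg
                       / math.cos(math.radians(dec_deg)))
    return ra_new % 360.0, dec_new


def shift_arcsec(pmra_masyr, pmdec_masyr, epoch_from, epoch_to):
    """Total on-sky displacement in arcseconds over the epoch difference."""
    return (math.hypot(pmra_masyr, pmdec_masyr)
            * abs(epoch_to - epoch_from) / 1000.0)


# A hypothetical star with 50 mas/yr total proper motion, catalogued at 2016.0.
ra0, dec0 = propagate(150.0, 30.0, pmra_masyr=30.0, pmdec_masyr=-40.0,
                      epoch_from=2016.0, epoch_to=2000.0)
moved = shift_arcsec(30.0, -40.0, 2016.0, 2000.0)
exceeds_radius = moved > 0.5   # compare against the 0.5 arcsec match radius
```

At 50 mas/yr the 16-year gap amounts to 0.8 arcsec, so this star would fall outside a 0.5″ radius unless its position is propagated first.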

Combining: targets → crossmatch → nested join

The common production shape is "match my targets to a survey's object table by position, then attach light curves by ID." Crossmatch first, then a nested equi-join folds the detections per matched object:

targets_to_object_to_lc.py
import acid
import astropy.units as u

acid.init("catalogs.yaml", workers=8)

light = (acid.open("a")                                    # your targets
         .crossmatch(acid.open("object"),
                     radius=0.5 * u.arcsec, dist_col="d_arcsec")
         .join(acid.open("objectForcedSource"),
               on="objectId", how="left",
               nested=True, order_by="midpointMjdTai")
         .to_astropy())

The crossmatch RHS may itself be a join — see composable joins — but the crossmatch-then-join chain above is the readable, common form.

Reduce a light curve to a scalar with a Python function

A nested join's list columns are ordinary columns. Feed them to with_columns to compute a per-object statistic (epoch count, mean flux, a period fit) without leaving acid:

n_epochs.py
import acid
import numpy as np

acid.init("catalogs.yaml", workers=8)

def n_epochs(midpointMjdTai):      # numpy mode: an object-array of per-object lists
    return np.array([len(x) for x in midpointMjdTai], dtype="i8")

summary = (acid.open("object")
           .join(acid.open("objectForcedSource"),
                 on="objectId", how="left",
                 nested=True, order_by="midpointMjdTai")
           .with_columns("n_epochs", n_epochs,
                         columns=["midpointMjdTai"], schema="i8")
           .select("objectId, n_epochs")
           .to_astropy())

See Python functions on partitions for the full with_columns / @acid.function surface, including stateful UDFs (e.g. a template library loaded once per worker for SED fitting).

What ACID won't do for you

  • Non-decomposable per-target statistics (median, mode, arbitrary percentiles) — rejected with ValidationError. Compute them on the per-object lists with a with_columns UDF, or in Polars / Astropy after .to_polars() / .to_astropy(). See Why no agg.median?.
  • Period-folding, light-curve fitting, variability metrics — out of scope as built-in verbs, but a with_columns UDF runs your own NumPy / SciPy / Astropy code per partition (above).
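For the median case, a UDF in the same style as the n_epochs example above might look like the sketch below. It assumes, as that example does, that numpy mode hands the function an object-array of per-object lists. The name median_flux and the NaN convention for objects with no detections are illustrative choices, not part of acid.

```python
import numpy as np


def median_flux(psfFlux):
    """Per-object median of a list column; NaN where the list is empty."""
    return np.array(
        [np.median(x) if len(x) else np.nan for x in psfFlux],
        dtype="f8",
    )


# Stand-alone check on a hand-built object-array of per-object lists.
lists = np.empty(3, dtype=object)
lists[0] = [10.0, 12.0, 11.0]
lists[1] = []                     # an object with no detections
lists[2] = [5.0, 7.0]
med = median_flux(lists)
```

Passing it to with_columns would presumably mirror the n_epochs call, with columns=["psfFlux"] and a floating-point schema; check the with_columns reference for the exact schema string.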

See also