Light curves for a list of targets¶
You have a list of targets and a light-curve catalog (Rubin's
objectForcedSource / object, ZTF's source catalog, the fixture's
lc) and you want every detection of each target, ideally as a
per-target list you can plot or fit. There are two ways to associate
the targets with the detections:
- ID-based: .join(..., on=...) when both catalogs already carry the same integer object ID (Rubin's object ↔ objectForcedSource, both keyed on objectId). This is the headline shape, and the one you want whenever the IDs exist.
- Position-based: .crossmatch(...) by sky position when the targets and the light-curve catalog do not share an object-ID column (e.g. your own target list against a survey's source table).
For either, nested=True (on join or crossmatch) gives you the
shape you actually want: one row per target, each detection column a
time-ordered list — instead of one row per detection.
Your target list does not need to be a HATS catalog
If your targets are a CSV, parquet, FITS table, or an in-memory
pandas / Astropy table, open them directly with
acid.open("targets.csv", ra="RA", dec="DEC") — see
bring your own target list.
No offline HATS import is needed.
ID-based — both catalogs share an object ID¶
Rubin's object and objectForcedSource catalogs both carry
objectId. When that holds you don't need a positional match — an
ordinary integer-ID join is correct and cheaper (no kd-tree). With
nested=True you get one row per object with the source columns folded
into per-object lists:
import acid
acid.init("catalogs.yaml", workers=8)
light = (acid.open("object")
         .where("g_mag < 22")
         .join(acid.open("objectForcedSource"),
               on="objectId",
               how="left",
               nested=True,
               order_by="midpointMjdTai"))  # sort each list by time
tbl = light.to_astropy()
# one row per object; the source columns are per-object lists.
import acid
acid.init("catalogs.yaml", workers=8)
# SQL produces the exploded (one-row-per-detection) shape; the
# per-object list fold is a fluent-only feature.
r = acid.sql.query("""
SELECT obj.objectId, obj.ra, obj.dec, fs.midpointMjdTai, fs.psfFlux
FROM object AS obj
JOIN objectForcedSource AS fs ON obj.objectId = fs.objectId
WHERE obj.g_mag < 22
""")
df = r.to_polars()
What you get back from the fluent form: one row per object, with the
objectForcedSource columns as list<T> columns, each sorted by
midpointMjdTai so element i of every list is the same detection.
Objects with no detections (only on a how="left" join) get empty
lists ([]), not a phantom [null].
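To make that output shape concrete, here is a minimal pure-Python sketch of the fold. It is an illustration of the shape only, not acid's implementation: the helper name fold_detections and the toy rows are hypothetical, and the real join runs per partition rather than over in-memory dicts.

```python
from collections import defaultdict


def fold_detections(objects, detections, key="objectId", order_by="midpointMjdTai"):
    """Left-join fold: one output row per object, detection columns as lists."""
    # Group the exploded detection rows by the join key.
    groups = defaultdict(list)
    for det in detections:
        groups[det[key]].append(det)

    # Every detection column except the join key becomes a list column.
    list_cols = sorted({col for det in detections for col in det if col != key})

    folded = []
    for obj in objects:
        # Sort once per object so element i of every list is the same detection.
        rows = sorted(groups.get(obj[key], []), key=lambda d: d[order_by])
        out = dict(obj)
        for col in list_cols:
            out[col] = [d[col] for d in rows]  # [] when the object has no detections
        folded.append(out)
    return folded


objects = [{"objectId": 1}, {"objectId": 2}]
detections = [
    {"objectId": 1, "midpointMjdTai": 60002.0, "psfFlux": 11.0},
    {"objectId": 1, "midpointMjdTai": 60001.0, "psfFlux": 10.0},
]
light = fold_detections(objects, detections)
```

Object 1 comes back with its two detections in time order in both list columns; object 2 keeps its row and gets empty lists.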
on takes a single column name (used on both sides) or a (left,
right) tuple when the columns are named differently. Both keys must be
integer-ID columns; for non-integer or compound keys, drop into
acid.sql.query(...).
The locality contract for an equi-join
An equi-join asserts the right catalog is localized: rows
sharing a key (a source and its parent object) sit at the same
HEALPix position to within the right catalog's margin-cache radius
(radius 0 = exact pixel — the layout Rubin's object /
objectForcedSource tables use). The right catalog must carry
_healpix_29 and a declared margin cache, or acid rejects the
join with a ValidationError. This is the normal layout for a
survey's object/source tables. A position-less lookup table (no
coordinates) needs a different tool; see
attaching a lookup table.
Select which columns get listed¶
A trailing .select(...) on the right operand narrows which columns
are folded into lists — and, via projection pushdown, which are read
from parquet at all:
light = (acid.open("object")
         .join(acid.open("objectForcedSource").select("objectId, midpointMjdTai, psfFlux"),
               on="objectId", how="left",
               nested=True, order_by="midpointMjdTai"))
# lists only midpointMjdTai + psfFlux; the rest is never read.
The operand .select(...) must keep the join key (objectId);
dropping it raises a clear compile-time error.
Position-based — crossmatch your targets to a source catalog¶
Use this when your target catalog (Gaia, your own published source
list) does not share an object ID with the source catalog. The
crossmatch establishes the association by sky position; nested=True
folds the detections into per-target lists:
import acid
import astropy.units as u
acid.init("catalogs.yaml", workers=8)
targets = acid.open("a") # your target catalog (or a virtual one)
src = acid.open("ztf_source") # a source/detection catalog
light = (targets
         .crossmatch(src, radius=0.5 * u.arcsec,
                     maxmatch=-1,     # every detection within the radius
                     how="left",      # keep targets with no detections
                     nested=True, order_by="mjd")
         .to_astropy())
maxmatch=-1 collects every source within the radius (not just the
nearest), which is what you want for a light curve; nested=True then
groups them per target. If you only want the count per target, the
reduction shortcut stays fully in acid:
n_per_target = (targets
                .crossmatch(src, radius=0.5 * u.arcsec, maxmatch=-1)
                .group_by("id")
                .count())  # one row per target, `count` column
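As a rough picture of what "every detection within the radius, grouped per target" means, the sketch below does the same association by brute force in NumPy. It is only a semantic illustration under stated assumptions: the function names and toy coordinates are hypothetical, and acid's own matching works per partition rather than with an all-pairs loop.

```python
import numpy as np


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))) * 3600.0


def match_all_within(t_ra, t_dec, s_ra, s_dec, s_mjd, radius_arcsec):
    """Per target: indices of every source inside the radius, ordered by time."""
    matches = []
    for ra, dec in zip(t_ra, t_dec):
        sep = angular_sep_arcsec(ra, dec, s_ra, s_dec)
        idx = np.flatnonzero(sep <= radius_arcsec)
        # Time-order the matched detections, like order_by="mjd".
        matches.append(idx[np.argsort(s_mjd[idx], kind="stable")])
    return matches


# Two targets; three detections near the first one (one of them 2 arcsec away).
t_ra = np.array([10.0, 20.0])
t_dec = np.array([0.0, 0.0])
s_ra = np.array([10.0 + 0.2 / 3600.0, 10.0 - 0.1 / 3600.0, 10.0 + 2.0 / 3600.0])
s_dec = np.array([0.0, 0.0, 0.0])
s_mjd = np.array([60002.0, 60001.0, 60003.0])

matches = match_all_within(t_ra, t_dec, s_ra, s_dec, s_mjd, radius_arcsec=0.5)
counts = [len(m) for m in matches]
```

The first target collects the two detections inside 0.5 arcsec in time order and ignores the one at 2 arcsec. The second target has no detections and ends up with an empty list, which is the row that how="left" keeps.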
Radius and the margin cache¶
A light-curve crossmatch is where the radius vs. margin cache rule
bites hardest: source catalogs are large, and you want the radius to
bracket the per-detection astrometric scatter, not to be limited by how
the cache was built. If your match radius exceeds the right catalog's
margin-cache radius, acid rejects the query at compile time — rebuild
the cache or shrink the radius. See
radius vs. margin cache
for the exact error and the fix.
The other crossmatch caveat — acid treats all RA/Dec as J2000, with
no epoch propagation — matters for high-PM targets. A Gaia J2016
source matched against a J2000 source catalog may have moved a
non-trivial fraction of a 0.5″ radius. Propagate to J2000 before
matching; see
epoch — all RA/Dec are J2000.
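A minimal NumPy sketch of that propagation step, under stated assumptions: the function names are hypothetical, proper motions follow the Gaia convention (pmra already includes the cos(dec) factor, units of mas/yr), and the motion is treated as linear, which ignores parallax, radial velocity, and breaks down near the poles. For high-precision work a full space-motion routine is the safer choice.

```python
import numpy as np

MAS_PER_DEG = 3.6e6  # milliarcseconds per degree


def propagate_linear(ra_deg, dec_deg, pmra_cosdec_masyr, pmdec_masyr,
                     epoch_from, epoch_to):
    """Shift positions from epoch_from to epoch_to (decimal years), linearly."""
    dt = epoch_to - epoch_from  # negative when going back to J2000
    dec_new = dec_deg + pmdec_masyr * dt / MAS_PER_DEG
    # pmra carries a cos(dec) factor, so divide it out to get a shift in RA.
    ra_new = ra_deg + pmra_cosdec_masyr * dt / MAS_PER_DEG / np.cos(np.radians(dec_deg))
    return np.mod(ra_new, 360.0), dec_new


def displacement_arcsec(pmra_cosdec_masyr, pmdec_masyr, epoch_from, epoch_to):
    """Total on-sky displacement between the two epochs, in arcseconds."""
    return np.hypot(pmra_cosdec_masyr, pmdec_masyr) * abs(epoch_to - epoch_from) / 1000.0


# A star with a total proper motion of 50 mas/yr, epoch 2016.0 -> 2000.0.
shift = displacement_arcsec(30.0, 40.0, 2016.0, 2000.0)
ra2000, dec2000 = propagate_linear(10.0, 0.0, 1000.0, 0.0, 2016.0, 2000.0)
```

With these numbers the 50 mas/yr star moves 0.8 arcsec over 16 years, which is already larger than a 0.5″ match radius. Anything faster than about 31 mas/yr crosses the full radius over the same interval.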
Combining: targets → crossmatch → nested join¶
The common production shape is "match my targets to a survey's object table by position, then attach light curves by ID." Crossmatch first, then a nested equi-join folds the detections per matched object:
import acid
import astropy.units as u
acid.init("catalogs.yaml", workers=8)
light = (acid.open("a")  # your targets
         .crossmatch(acid.open("object"),
                     radius=0.5 * u.arcsec, dist_col="d_arcsec")
         .join(acid.open("objectForcedSource"),
               on="objectId", how="left",
               nested=True, order_by="midpointMjdTai")
         .to_astropy())
The crossmatch RHS may itself be a join — see composable joins — but the crossmatch-then-join chain above is the readable, common form.
Reduce a light curve to a scalar with a Python function¶
A nested join's list columns are ordinary columns. Feed them to
with_columns to compute a per-object statistic (epoch count, mean
flux, a period fit) without leaving acid:
import acid
import numpy as np
acid.init("catalogs.yaml", workers=8)
def n_epochs(midpointMjdTai):  # numpy mode: an object-array of per-object lists
    return np.array([len(x) for x in midpointMjdTai], dtype="i8")
summary = (acid.open("object")
           .join(acid.open("objectForcedSource"),
                 on="objectId", how="left",
                 nested=True, order_by="midpointMjdTai")
           .with_columns("n_epochs", n_epochs,
                         columns=["midpointMjdTai"], schema="i8")
           .select("objectId, n_epochs")
           .to_astropy())
See Python functions on partitions for the full
with_columns / @acid.function surface, including stateful UDFs
(e.g. a template library loaded once per worker for SED fitting).
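A reducer for a float statistic has to decide what to return for objects whose list is empty (the how="left" case). The sketch below shows one way to write a mean-flux reducer. It assumes the same numpy-mode calling convention as n_epochs above (an object array of per-object lists), and the function name mean_flux is hypothetical. It would presumably be wired in through with_columns the same way, with a float schema; the exact schema string is an assumption and is not shown here.

```python
import numpy as np


def mean_flux(psfFlux):
    """Per-object mean of a list column; NaN where the list is empty."""
    out = np.full(len(psfFlux), np.nan, dtype="f8")
    for i, fluxes in enumerate(psfFlux):
        if len(fluxes) > 0:  # objects with no detections keep NaN
            out[i] = np.mean(fluxes)
    return out


# Build a toy object array of per-object lists to exercise the function.
lists = [[1.0, 3.0], [], [5.0]]
column = np.empty(len(lists), dtype=object)
for i, values in enumerate(lists):
    column[i] = values

result = mean_flux(column)
```

Returning NaN keeps the output a plain float array with one value per input row, so objects without detections stay distinguishable from objects with a genuine zero flux.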
What ACID won't do for you¶
- Non-decomposable per-target statistics (median, mode, arbitrary percentiles): rejected with ValidationError. Compute them on the per-object lists with a with_columns UDF, or in Polars / Astropy after .to_polars() / .to_astropy(). See Why no agg.median?.
- Period-folding, light-curve fitting, variability metrics: out of scope as built-in verbs, but a with_columns UDF runs your own NumPy / SciPy / Astropy code per partition (above).
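What "non-decomposable" means in practice can be checked with a few lines of NumPy. The toy partitions below are hypothetical. A mean can be rebuilt exactly from per-partition partial results (sum and count), while combining per-partition medians gives the wrong answer, so a median needs all of an object's values in one place, which is what the per-object lists provide.

```python
import numpy as np

# The same object's fluxes, split across two hypothetical partitions.
partition_a = np.array([1.0, 2.0, 100.0])
partition_b = np.array([3.0, 4.0])
all_values = np.concatenate([partition_a, partition_b])

# Decomposable: merge the mean from per-partition (sum, count) pairs.
merged_mean = (partition_a.sum() + partition_b.sum()) / (partition_a.size + partition_b.size)

# Not decomposable: the median of partition medians is not the true median.
median_of_medians = np.median([np.median(partition_a), np.median(partition_b)])
true_median = np.median(all_values)
```

Here merged_mean agrees with the mean of all five values, but median_of_medians is 2.75 while the true median is 3.0.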
See also¶
- Crossmatching catalogs: the position-based step, with the proper-motion / radius-vs-margin caveats and nested crossmatch.
- Aggregating: collect_lists and agg.list for the single-catalog list fold.
- Python functions on partitions: reducing a light-curve list to a scalar with your own code.
- Working with results & exporting: what comes out and how to write it.