Glossary
Short anchor definitions of the astronomy terms and ACID-specific vocabulary you'll meet across the docs. Each entry links back to the page that uses it in context.
HATS
The Hierarchical Adaptive Tiling Scheme: the format used by LINCC Frameworks (LSDB, hats-import) for sky-partitioned parquet catalogs. Each partition is one HEALPix pixel at a given Norder. See https://hats.readthedocs.io.
HEALPix
A scheme for tiling the sphere into equal-area pixels. Two parameters: Norder (how fine — 0 is coarsest, 29 is the finest order used by HATS) and Npix (the pixel ID within that order).
Norder / Npix
A HATS partition is uniquely identified by (Norder, Npix).
Norder=5 gives 12,288 pixels over the whole sky; each step in
Norder splits every pixel into 4.
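The pixel count follows directly from that rule: 12 base pixels at Norder 0, multiplied by 4 at every step, gives 12 × 4^Norder. A small Python sketch of the arithmetic (the helper names are illustrative and not part of acid or HATS):

```python
import math

def npix(norder: int) -> int:
    """Number of HEALPix pixels covering the whole sky at a given Norder."""
    # 12 base pixels at Norder 0; every step splits each pixel into 4.
    return 12 * 4 ** norder

def pixel_area_deg2(norder: int) -> float:
    """Area of one pixel in square degrees (all pixels have equal area)."""
    full_sky_deg2 = 4 * math.pi * (180 / math.pi) ** 2  # about 41,253 deg^2
    return full_sky_deg2 / npix(norder)

print(npix(5))                       # 12288
print(round(pixel_area_deg2(5), 3))  # 3.357
```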
Partition
A single parquet file (or directory of parquet files) for one
(Norder, Npix) of a HATS catalog.
Margin cache
A companion catalog that holds a thin border of rows around each partition (typically a few arcsec wide), so that a crossmatch never silently misses a match sitting just across a partition boundary from an anchor row.
MOC
A Multi-Order Coverage map: a region of sky encoded as a set of
HEALPix pixels at possibly mixed orders. Used by surveys to
describe footprints (DES, 2MASS, …). acid's IN_MOC(...)
predicate filters rows against a MOC.
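The following minimal sketch shows how a MOC can be tested against the order-29 pixel IDs that HATS stores. It assumes NESTED pixel numbering, the scheme HATS uses. In NESTED numbering, you get a pixel's ancestor at a coarser order by dropping two bits per order step. The dict-of-sets MOC representation and the in_moc helper are hypothetical. This is not a description of how acid's IN_MOC is implemented.

```python
def in_moc(healpix_29: int, moc: dict) -> bool:
    """True if an order-29 NESTED pixel ID lies inside a multi-order MOC.

    `moc` maps each Norder to the set of pixel IDs at that order
    (a simplified, hypothetical in-memory representation).
    """
    for norder, pixels in moc.items():
        # NESTED numbering: dropping 2 bits per order step yields the ancestor.
        if (healpix_29 >> (2 * (29 - norder))) in pixels:
            return True
    return False

# Toy footprint: all of base pixel 0, plus pixel 1500 at Norder 5 (inside base pixel 1).
moc = {0: {0}, 5: {1500}}
print(in_moc(12345, moc))       # True: its Norder-0 ancestor is pixel 0
print(in_moc(1500 << 48, moc))  # True: first order-29 pixel of (5, 1500)
print(in_moc(1501 << 48, moc))  # False: the neighboring Norder-5 pixel
```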
Anchor catalog
The first table in a query's FROM clause. acid partitions its
work along the anchor's partitions; coordinates from the anchor
drive every XMATCH in the query.
Right catalog
A catalog joined to the anchor via XMATCH (or an ordinary equi-join).
XMATCH
acid's spherical-distance join operator. Used in ON clauses:
JOIN b ON XMATCH(radius_arcsec => 1.0, mode => 'nearest' | 'all').
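Below is a brute-force sketch of what a radius crossmatch computes. It assumes that mode='nearest' keeps only the closest right-side row within the radius, and that mode='all' keeps every row within it. That is a reading of the mode names, not documented behavior. The xmatch helper is hypothetical. acid partitions the work rather than comparing every pair like this.

```python
import math

def sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600

def xmatch(anchor, right, radius_arcsec, mode="nearest"):
    """Brute-force radius crossmatch, O(len(anchor) * len(right)).

    anchor, right: lists of (id, ra_deg, dec_deg).
    Returns (anchor_id, right_id, dist_arcsec) tuples.
    """
    if mode not in ("nearest", "all"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = []
    for aid, ara, adec in anchor:
        hits = [(rid, sep_arcsec(ara, adec, rra, rdec)) for rid, rra, rdec in right]
        hits = [(rid, d) for rid, d in hits if d <= radius_arcsec]
        if mode == "nearest" and hits:
            hits = [min(hits, key=lambda hit: hit[1])]  # keep only the closest
        out.extend((aid, rid, d) for rid, d in hits)
    return out

anchor = [(1, 10.0, 20.0)]
right = [("a", 10.0, 20.0 + 0.5 / 3600),   # 0.5 arcsec away
         ("b", 10.0, 20.0 + 0.8 / 3600),   # 0.8 arcsec away
         ("c", 10.0, 20.0 + 5.0 / 3600)]   # 5 arcsec away, outside the radius
print([rid for _, rid, _ in xmatch(anchor, right, 1.0, "nearest")])  # ['a']
print([rid for _, rid, _ in xmatch(anchor, right, 1.0, "all")])      # ['a', 'b']
```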
dist_col
An option to XMATCH(...): passing dist_col => '<name>' surfaces
the great-circle distance (in arcseconds) between the anchor and the
matched right-side row as a named column, which you can then reference
like any ordinary column in SELECT, WHERE, or ORDER BY.
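The value dist_col surfaces is an angular separation. One numerically stable way to compute it is the Vincenty formula below, the same form astropy uses for angular separations. Whether acid uses this exact formula is an assumption, and great_circle_arcsec is a hypothetical name.

```python
import math

def great_circle_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec between two (RA, Dec) points given in degrees.

    Vincenty formula: well-conditioned for tiny and near-antipodal separations.
    """
    l1, b1, l2, b2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dl = l2 - l1
    num1 = math.cos(b2) * math.sin(dl)
    num2 = math.cos(b1) * math.sin(b2) - math.sin(b1) * math.cos(b2) * math.cos(dl)
    denom = math.sin(b1) * math.sin(b2) + math.cos(b1) * math.cos(b2) * math.cos(dl)
    return math.degrees(math.atan2(math.hypot(num1, num2), denom)) * 3600

print(round(great_circle_arcsec(0.0, 0.0, 1.0, 0.0), 6))     # 3600.0
print(round(great_circle_arcsec(0.0, 89.0, 180.0, 89.0), 6))  # 7200.0 (across the pole)
```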
IN_MOC
A predicate that returns true when a row's sky position lies inside a named MOC footprint.
Refinement tree
The mechanism acid uses to align adaptive-Norder catalogs (where
different sky regions are partitioned at different Norders). You
won't see it in the API; it just makes adaptive queries correct.
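acid's refinement tree is internal, but the property it relies on is easy to show. In NESTED numbering, two pixels at different orders either nest or are disjoint. So you can tell whether they overlap by lifting the finer pixel to the coarser order and comparing IDs. Here is a sketch under that assumption (the overlaps helper is hypothetical):

```python
def overlaps(a, b):
    """True if two NESTED HEALPix pixels, given as (norder, npix), share sky area.

    HEALPix pixels at different orders either nest or are disjoint, so it is
    enough to lift the finer pixel to the coarser order and compare IDs.
    """
    (oa, pa), (ob, pb) = a, b
    if oa >= ob:
        return (pa >> (2 * (oa - ob))) == pb
    return (pb >> (2 * (ob - oa))) == pa

anchor = (3, 10)
right_partitions = [(2, 2), (2, 3), (4, 40), (4, 44), (3, 10)]
print([p for p in right_partitions if overlaps(anchor, p)])
# [(2, 2), (4, 40), (3, 10)]
```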
_healpix_29
A column present in HATS catalogs holding the order-29 pixel ID of
each row. acid uses range filters on this column for fast
spatial pruning.
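Here is why a range filter works. In NESTED numbering, every pixel (Norder, Npix) covers one contiguous, half-open block of order-29 IDs: [Npix · 4^(29 − Norder), (Npix + 1) · 4^(29 − Norder)). A minimal sketch (healpix_29_range is a hypothetical helper):

```python
def healpix_29_range(norder: int, npix: int):
    """Half-open [lo, hi) block of order-29 NESTED IDs covered by (norder, npix)."""
    shift = 2 * (29 - norder)   # each order step multiplies the ID space by 4
    return npix << shift, (npix + 1) << shift

lo, hi = healpix_29_range(5, 1500)
ids = [(1500 << 48) + 7, 1501 << 48, (1499 << 48) + 123]
print([i for i in ids if lo <= i < hi] == ids[:1])  # True
# With pyarrow, the same bounds could become a parquet filter, e.g.
# pyarrow.parquet.read_table(path, filters=[("_healpix_29", ">=", lo),
#                                           ("_healpix_29", "<", hi)])
```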
point_map.fits
A standard HATS file at the catalog root holding per-cell row
counts (a HEALPix image). acid requires one for every HATS
catalog in a query. The planner uses it to size work tuples to your
RAM budget, and it is also loaded as the footprint MOC for
IN_MOC(<alias>, '<catalog_name>'). Every acid output writes one. A
missing map, or one that is only a 0/1 coverage mask, raises a
ValidationError. See RAM budget.
Virtual catalog
A raw data file (.csv/.parquet/.fits/…) or an in-memory frame
(pandas / polars / NumPy / pyarrow / Astropy) opened directly with
acid.open(src, ra=…, dec=…) — spilled once to a memory-mapped Arrow
file and treated like a HATS catalog for the session. The
bring-your-own-target-list on-ramp; no offline HATS import. See
bring your own target list.
Nested catalog / nested join
The one-row-per-object shape produced by crossmatch(..., nested=True)
or join(..., nested=True): each match/partner is folded into a
per-row list<T> column instead of exploding into one row per pair —
the canonical light-curve layout. order_by= sets the within-list
order; LEFT-unmatched rows get empty lists. See
Light curves.
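Here is a plain-Python sketch of the nested shape. Partners are grouped per anchor row and sorted within each list, and unmatched anchors keep an empty list. The nest helper is hypothetical and uses dicts and lists where acid produces Arrow list<T> columns. Its order_by argument only mirrors the option described above.

```python
from collections import defaultdict

def nest(anchor_ids, pairs, order_by):
    """Fold (anchor_id, partner) pairs into one sorted list per anchor row.

    anchor_ids: every anchor id, in output order (LEFT-join semantics).
    pairs: iterable of (anchor_id, partner_dict) matches.
    order_by: key inside each partner dict that orders the list.
    """
    groups = defaultdict(list)
    for aid, partner in pairs:
        groups[aid].append(partner)
    # Unmatched anchors get an empty list instead of being dropped.
    return {aid: sorted(groups.get(aid, []), key=lambda p: p[order_by])
            for aid in anchor_ids}

pairs = [(1, {"mjd": 59001.5, "mag": 18.2}),
         (1, {"mjd": 59000.1, "mag": 18.4})]
lc = nest([1, 2], pairs, order_by="mjd")
print([p["mjd"] for p in lc[1]], lc[2])  # [59000.1, 59001.5] []
```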
Broadcast join
Catalog.join(<frame>, on=…) against a small, position-less in-memory
lookup table: the frame is read whole into every worker and hash-joined
locally on an integer key — no coordinates, no reshuffle. The
id→label-attachment path. See
attaching a lookup table.
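Here is the broadcast pattern in miniature: build a hash table from the small lookup frame once, then probe it with each row of a partition. This sketch assumes inner-join semantics. It also assumes that when a key is duplicated in the lookup, the last row wins. The broadcast_join helper is hypothetical and is not acid's Catalog.join.

```python
def broadcast_join(rows, lookup, key):
    """Inner hash join of one partition's rows against a small lookup table.

    Every worker builds the same table from the whole lookup frame, so no
    catalog rows move between workers. A duplicated key keeps its last row.
    """
    table = {rec[key]: rec for rec in lookup}      # build side: the small frame
    out = []
    for row in rows:                               # probe side: this partition
        match = table.get(row[key])
        if match is not None:
            out.append({**row, **match})
    return out

partition = [{"objid": 1, "ra": 10.0}, {"objid": 2, "ra": 11.0}]
labels = [{"objid": 1, "label": "QSO"}]
print(broadcast_join(partition, labels, "objid"))
# [{'objid': 1, 'ra': 10.0, 'label': 'QSO'}]
```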
ram_budget
The total RAM the planner budgets for a query; work tuples are sized so
each fits ram_budget / workers. Default 0.25 × available RAM
(cgroup-aware). Set via acid.init(ram_budget=…), --ram-budget, or
ACID_RAM_BUDGET (bytes or 64GB/512MiB). The primary phase-1 OOM
lever. See RAM budget.
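Here is a sketch of the budget arithmetic. The size parser is hypothetical, and its unit table is an assumption. It treats GB as 10^9 bytes and GiB/MiB as powers of 1024, which may not match how acid parses ACID_RAM_BUDGET.

```python
import re

# Assumed units: SI suffixes are powers of 1000, IEC suffixes powers of 1024.
_UNITS = {"": 1, "B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
          "KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40}

def parse_size(text: str) -> int:
    """Parse '17179869184', '64GB' or '512MiB' into a byte count (hypothetical)."""
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if m is None or m.group(2).upper() not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2).upper()]

def default_ram_budget(available_bytes: int) -> int:
    """The default described above: 0.25 x available RAM."""
    return available_bytes // 4

def per_worker_budget(ram_budget: int, workers: int) -> int:
    """Each work tuple is sized to fit ram_budget / workers."""
    return ram_budget // workers

print(per_worker_budget(parse_size("64GB"), 8))  # 8000000000
print(parse_size("512MiB"))                       # 536870912
```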
Localized (equi-join / group_by)
A spatial assertion that rows sharing a key (a source and its parent
object, or two list elements grouped together) sit within one HEALPix
neighborhood — to within the catalog's margin-cache radius. An
equi-join requires its right side to be localized; group_by(...,
localized=True) opts a list fold into the partition-local (no-reduce)
fast path. Correct only when the assertion holds.
Row-group pushdown
A parquet feature: when a query filters on a column, the parquet
reader can skip whole row-groups whose min/max statistics rule
them out. acid leans on this heavily for _healpix_29 range filters,
alongside column projection (reading only the columns a query uses).
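The min/max test a parquet reader applies is simple. This sketch assumes a half-open filter [lo, hi) on an integer column such as _healpix_29. RowGroupStats and surviving_row_groups are hypothetical stand-ins for the statistics stored in a parquet footer. Pruning works best when rows are clustered by the filtered column, because each row-group's [min, max] is then narrow.

```python
from typing import NamedTuple

class RowGroupStats(NamedTuple):
    index: int
    min_value: int   # column minimum recorded in the parquet footer
    max_value: int   # column maximum recorded in the parquet footer

def surviving_row_groups(stats, lo, hi):
    """Indices of row groups that may hold a value in [lo, hi).

    A group whose max is below lo, or whose min is at or above hi,
    cannot match and is skipped without being read.
    """
    return [s.index for s in stats if s.max_value >= lo and s.min_value < hi]

stats = [RowGroupStats(0, 0, 99), RowGroupStats(1, 100, 199), RowGroupStats(2, 200, 299)]
print(surviving_row_groups(stats, 150, 210))  # [1, 2]
```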
ICRS
The International Celestial Reference System — the modern astronomy
reference frame. ACID treats every catalog's stored RA/Dec as ICRS
and assumes no epoch propagation (see J2000 below). All
--cone / in_cone coordinates are ICRS degrees. See
Crossmatching catalogs §1.
J2000
The astronomical epoch ACID assumes for every input catalog's stored RA/Dec: no propagation, no proper motion, no parallax. Catalogs at other epochs need to be propagated before registering; Gaia DR3, for example, is published at epoch J2016.0. See Crossmatching catalogs §1.
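Here is a minimal sketch of linear proper-motion propagation, to show what "propagated" means here. It ignores parallax, radial velocity and spherical curvature. It also assumes pmra already includes the cos(Dec) factor, as in Gaia's convention. For real catalogs, a rigorous tool such as astropy's SkyCoord.apply_space_motion is the safer choice. propagate_linear is a hypothetical helper.

```python
import math

MAS_PER_DEG = 3_600_000.0

def propagate_linear(ra_deg, dec_deg, pmra_masyr, pmdec_masyr, epoch_from, epoch_to):
    """Shift (RA, Dec) by proper motion between two Julian-year epochs.

    Linear approximation only: ignores parallax, radial velocity and the
    curvature of the sphere, so it degrades near the poles and over long
    baselines. pmra is taken to include the cos(Dec) factor, as in Gaia.
    """
    dt = epoch_to - epoch_from                      # years
    dec_new = dec_deg + pmdec_masyr * dt / MAS_PER_DEG
    ra_new = ra_deg + pmra_masyr * dt / MAS_PER_DEG / math.cos(math.radians(dec_deg))
    return ra_new % 360.0, dec_new

# A star at epoch J2016.0 moved back 16 years to J2000.0.
ra, dec = propagate_linear(180.0, 0.0, 100.0, -50.0, 2016.0, 2000.0)
print(round((ra - 180.0) * 3600, 3), round(dec * 3600, 3))  # -1.6 0.8
```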
Decomposable aggregate
An aggregate that can be computed as a combine over per-partition
partials: COUNT, SUM, AVG, MIN, MAX, STDDEV, VARIANCE,
BOOL_AND, BOOL_OR. ACID supports all of these;
non-decomposable ones (MEDIAN, MODE, COUNT(DISTINCT), …) are
rejected at analyze time. See Aggregating.
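STDDEV and VARIANCE are the least obvious members of that list. One standard way to make them decomposable is to carry (count, mean, sum of squared deviations) for each partition. The partials are then merged with the Chan et al. parallel update. This is a sketch of that technique, not ACID's documented partial state, and it assumes sample (n − 1) variance.

```python
from functools import reduce
from typing import NamedTuple

class Moments(NamedTuple):
    n: int       # number of rows
    mean: float  # mean of those rows
    m2: float    # sum of squared deviations from the mean

def partial(values):
    """Phase-1 partial for one partition."""
    n = len(values)
    if n == 0:
        return Moments(0, 0.0, 0.0)
    mean = sum(values) / n
    return Moments(n, mean, sum((v - mean) ** 2 for v in values))

def combine(a, b):
    """Merge two partials with the Chan et al. parallel update."""
    n = a.n + b.n
    if n == 0:
        return Moments(0, 0.0, 0.0)
    delta = b.mean - a.mean
    return Moments(n, a.mean + delta * b.n / n, a.m2 + b.m2 + delta ** 2 * a.n * b.n / n)

def sample_variance(m):
    return m.m2 / (m.n - 1) if m.n > 1 else None

parts = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
total = reduce(combine, map(partial, parts))
print(total.n, total.mean, sample_variance(total))  # 6 3.5 3.5
```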
Partial aggregation
ACID's two-phase aggregate strategy: phase 1 computes per-partition
partials (e.g. AVG as (sum, count)), phase 2 combines them.
Avoids writing every row to disk for SELECT COUNT(*). See
Aggregating.
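Here are the two phases for AVG, sketched in plain Python (phase1_avg and phase2_avg are hypothetical names). Carrying (sum, count) matters. Averaging the per-partition averages of [1, 2, 3] and [10] would give 6.0 instead of the correct 4.0.

```python
def phase1_avg(partition):
    """Per-partition partial for AVG: a (sum, count) pair instead of the rows."""
    return (sum(partition), len(partition))

def phase2_avg(partials):
    """Combine the partials; None for no rows at all, like SQL AVG."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

parts = [[1, 2, 3], [10], []]
print(phase2_avg([phase1_avg(p) for p in parts]))  # 4.0
```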
Cursor pixel
The HEALPix pixel ACID is currently processing as one unit of work,
typically the anchor catalog's partition pixel. The right catalog
is "refined to the cursor" when its partitions are at a different
order. Used in ARCHITECTURE.md; you rarely see it at the user
boundary.
Work-tuple
The unit ACID's process pool schedules: a (anchor partition, right
partition(s)) pairing the engine resolves and executes as one
partition of work. Visible in the ACID_PROFILE_OUT JSON.
cpu_cap()
ACID's cgroup-aware core ceiling — min(sched_getaffinity, cgroup
CPU quota) — the single source of truth for "how many cores can
this process actually use". Used by workers="auto" and the
per-worker thread budget instead of the host's raw os.cpu_count().
See Performance & parallelism.
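Here is a sketch of the min(affinity, quota) idea for Linux with cgroup v2. cpu_cap_sketch and parse_cgroup_v2_cpu_max are hypothetical. Rounding a fractional quota down to whole cores is an assumption, and cgroup v1 is not handled.

```python
import math
import os

def parse_cgroup_v2_cpu_max(text: str):
    """Whole cores allowed by a cgroup v2 cpu.max line such as '200000 100000'.

    Returns None when the quota is 'max' (no limit).
    """
    quota, period = text.split()[:2]
    if quota == "max":
        return None
    return max(1, math.floor(int(quota) / int(period)))  # assumption: round down

def cpu_cap_sketch(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """min(affinity-mask size, cgroup quota), falling back to os.cpu_count()."""
    if hasattr(os, "sched_getaffinity"):        # Linux only
        cores = len(os.sched_getaffinity(0))
    else:
        cores = os.cpu_count() or 1
    try:
        with open(cpu_max_path) as fh:
            quota = parse_cgroup_v2_cpu_max(fh.read())
    except OSError:                              # no cgroup v2 file: no quota
        quota = None
    return min(cores, quota) if quota is not None else cores

print(parse_cgroup_v2_cpu_max("200000 100000"))  # 2
print(cpu_cap_sketch())
```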
inmem_row_limit
The row threshold past which ACID spills phase-1 partials to disk
instead of holding them in RAM. Default 50 M rows; settable via
acid.init(...). See
Performance — Memory & spill.
Composition verb / materialization verb
ACID's Catalog methods split in two: composition verbs
(where, select, crossmatch, …) build a query lazily, returning
a new Catalog; materialization verbs (head, execute,
to_pandas, save, …) run it. See
Debug small, run big.
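Here is a toy model of the split, showing why composition verbs are cheap. Each one returns a new object carrying one more recorded step. Only a materialization verb evaluates the plan. LazyQuery is hypothetical and far simpler than acid's Catalog; only the verb names mirror the ones above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class LazyQuery:
    """Toy model of composition vs materialization verbs (hypothetical)."""
    rows: tuple
    steps: tuple = ()

    # Composition verbs: record a step and return a new query; nothing runs.
    def where(self, pred):
        return replace(self, steps=self.steps + (("where", pred),))

    def select(self, *cols):
        return replace(self, steps=self.steps + (("select", cols),))

    # Materialization verbs: evaluate the recorded steps in order.
    def execute(self):
        out = [dict(r) for r in self.rows]
        for kind, arg in self.steps:
            if kind == "where":
                out = [r for r in out if arg(r)]
            else:
                out = [{c: r[c] for c in arg} for r in out]
        return out

    def head(self, n=5):
        return self.execute()[:n]

q = LazyQuery(rows=({"id": 1, "mag": 17.0}, {"id": 2, "mag": 21.0}))
bright = q.where(lambda r: r["mag"] < 20).select("id")  # builds a plan only
print(bright.execute())  # [{'id': 1}]
print(q.steps)           # (): the original query is untouched
```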