Glossary

Short anchor definitions of the astronomy terms and ACID-specific vocabulary you'll meet across the docs. Each entry links back to the page that uses it in context.

HATS

The Hierarchical Adaptive Tiling Scheme: the format used by LINCC Frameworks (LSDB, hats-import) for sky-partitioned parquet catalogs. Each partition is one HEALPix pixel at a given Norder. See https://hats.readthedocs.io.

HEALPix

A scheme for tiling the sphere into equal-area pixels. Two parameters: Norder (how fine — 0 is coarsest, 29 is the finest order used by HATS) and Npix (the pixel ID within that order).

Norder / Npix

A HATS partition is uniquely identified by (Norder, Npix). Norder=5 gives 12,288 pixels over the whole sky; each step in Norder splits every pixel into 4.
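
The pixel count quoted above follows from the 12 base pixels at order 0, each split into 4 at every further order. A minimal sketch of that arithmetic (the function name is illustrative, not part of acid or HATS):

```python
def healpix_pixel_count(norder: int) -> int:
    """Number of HEALPix pixels covering the whole sky at a given order."""
    if not 0 <= norder <= 29:
        raise ValueError("norder must be between 0 and 29")
    # 12 base pixels at order 0; every step up splits each pixel into 4.
    return 12 * 4 ** norder


print(healpix_pixel_count(0))  # 12
print(healpix_pixel_count(5))  # 12288
```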

Partition

A single parquet file (or directory of parquet files) for one (Norder, Npix) of a HATS catalog.

Margin cache

A companion catalog that holds a thin border of rows around each partition (typically a few arcsec wide) so a crossmatch on the anchor side never silently misses a match that sits just across a partition boundary.

MOC

A Multi-Order Coverage map: a region of sky encoded as a set of HEALPix pixels at possibly mixed orders. Used by surveys to describe footprints (DES, 2MASS, …). acid's IN_MOC(...) predicate filters rows against a MOC.
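
To make the "mixed orders" idea concrete, here is a rough sketch of a MOC membership test. It assumes the nested HEALPix numbering and represents the MOC as a plain set of (order, pixel) pairs; the function name and the toy footprint are hypothetical, and this is not how acid's IN_MOC is necessarily implemented.

```python
def in_moc(healpix_29: int, moc_cells: set[tuple[int, int]]) -> bool:
    """True if an order-29 nested pixel ID falls inside any MOC cell."""
    for order in {o for o, _ in moc_cells}:
        # Dropping two bits per order gives the ancestor pixel at `order`.
        ancestor = healpix_29 >> (2 * (29 - order))
        if (order, ancestor) in moc_cells:
            return True
    return False


# A toy footprint: one coarse cell at order 1 and one finer cell at order 3.
moc = {(1, 7), (3, 100)}
inside = 7 << (2 * 28)   # first order-29 pixel inside cell (1, 7)
outside = 8 << (2 * 28)  # first order-29 pixel inside the uncovered cell (1, 8)
print(in_moc(inside, moc), in_moc(outside, moc))  # True False
```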

Anchor catalog

The first table in a query's FROM clause. acid partitions its work along the anchor's partitions; coordinates from the anchor drive every XMATCH in the query.

Right catalog

A catalog joined to the anchor via XMATCH (or an ordinary equi-join).

XMATCH

acid's spherical-distance join operator. Used in ON clauses: JOIN b ON XMATCH(radius_arcsec => 1.0, mode => 'nearest' | 'all').

dist_col

An option to XMATCH(...): passing dist_col => '<name>' surfaces the great-circle distance (in arcseconds) between the anchor and the matched right-side row as a named column, which you can then reference like any ordinary column in SELECT, WHERE, or ORDER BY.
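
For reference, the quantity such a column holds can be sketched with the haversine formula. This is an illustration of great-circle separation in arcseconds, assuming inputs in degrees; it is not a claim about the formula acid uses internally, and the function name is made up for the example.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two (RA, Dec) points given in degrees, in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form: numerically stable for the small angles typical of crossmatches.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0


# Two points one arcsecond apart in declination.
print(separation_arcsec(10.0, 0.0, 10.0, 1.0 / 3600.0))
```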

IN_MOC

A predicate that returns true when a row's sky position lies inside a named MOC footprint.

Refinement tree

The mechanism acid uses to align adaptive-Norder catalogs (where different sky regions are partitioned at different Norders). You won't see it in the API; it just makes adaptive queries correct.

_healpix_29

A column present in HATS catalogs holding the order-29 pixel ID of each row. acid uses range filters on this column for fast spatial pruning.
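
A sketch of why range filters work here: in the nested HEALPix scheme, all order-29 pixels inside a coarser pixel form one contiguous ID range. The helper below is illustrative (its name is an assumption, not an acid function) and assumes nested numbering.

```python
def healpix_29_range(norder: int, npix: int) -> tuple[int, int]:
    """Half-open [lo, hi) range of order-29 pixel IDs inside (norder, npix)."""
    shift = 2 * (29 - norder)  # two bits per order in the nested scheme
    return npix << shift, (npix + 1) << shift


lo, hi = healpix_29_range(5, 1234)
# A row lies in partition (5, 1234) exactly when lo <= _healpix_29 < hi.
print(lo, hi)
```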

point_map.fits

A standard HATS file at the catalog root holding per-cell row counts (a HEALPix image). acid requires one for every HATS catalog in a query — it's how the planner sizes work tuples to your RAM budget — and also loads it as the footprint MOC for IN_MOC(<alias>, '<catalog_name>'). Every acid output writes one; a missing or 0/1-mask map is a ValidationError. See RAM budget.

Virtual catalog

A raw data file (.csv/.parquet/.fits/…) or an in-memory frame (pandas / polars / NumPy / pyarrow / Astropy) opened directly with acid.open(src, ra=…, dec=…) — spilled once to a memory-mapped Arrow file and treated like a HATS catalog for the session. The bring-your-own-target-list on-ramp; no offline HATS import. See bring your own target list.

Nested catalog / nested join

The one-row-per-object shape produced by crossmatch(..., nested=True) or join(..., nested=True): each match/partner is folded into a per-row list<T> column instead of exploding into one row per pair — the canonical light-curve layout. order_by= sets the within-list order; LEFT-unmatched rows get empty lists. See Light curves.
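
The folding itself can be pictured with plain Python. The sketch below is a toy model of the shape only, using hypothetical names and dictionaries in place of real catalog rows; it does not call acid.

```python
from collections import defaultdict


def fold_matches(objects, matches, order_key):
    """Fold (object_id, match) pairs into one row per object with a list column."""
    grouped = defaultdict(list)
    for object_id, match in matches:
        grouped[object_id].append(match)
    # Every object yields exactly one row; unmatched objects get an empty list.
    return [
        {"object_id": oid, "matches": sorted(grouped.get(oid, []), key=order_key)}
        for oid in objects
    ]


rows = fold_matches(
    objects=[1, 2, 3],
    matches=[
        (1, {"mjd": 59002.0, "mag": 18.1}),
        (1, {"mjd": 59001.0, "mag": 18.3}),
        (3, {"mjd": 59005.0, "mag": 20.0}),
    ],
    order_key=lambda m: m["mjd"],  # plays the role of order_by=
)
for row in rows:
    print(row)
```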

Broadcast join

Catalog.join(<frame>, on=…) against a small, position-less in-memory lookup table: the frame is read whole into every worker and hash-joined locally on an integer key — no coordinates, no reshuffle. The id→label-attachment path. See attaching a lookup table.

ram_budget

The total RAM the planner budgets for a query; work tuples are sized so each fits ram_budget / workers. Default 0.25 × available RAM (cgroup-aware). Set via acid.init(ram_budget=…), --ram-budget, or ACID_RAM_BUDGET (bytes or 64GB/512MiB). The primary phase-1 OOM lever. See RAM budget.
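
As a worked example of the sizing rule, the sketch below computes the per-worker share from the default fraction. The function is hypothetical and only mirrors the arithmetic described above (0.25 × available RAM, divided by the worker count).

```python
def per_worker_budget(available_bytes: int, workers: int, fraction: float = 0.25) -> int:
    """Bytes each work tuple may use: (fraction * available RAM) / workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    ram_budget = int(available_bytes * fraction)
    return ram_budget // workers


# 64 GiB available, 8 workers: a 16 GiB budget, so 2 GiB per worker.
print(per_worker_budget(64 * 2**30, 8) / 2**30)  # 2.0
```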

Localized (equi-join / group_by)

A spatial assertion that rows sharing a key (a source and its parent object, or two list elements grouped together) sit within one HEALPix neighborhood — to within the catalog's margin-cache radius. An equi-join requires its right side to be localized; group_by(..., localized=True) opts a list fold into the partition-local (no-reduce) fast path. Correct only when the assertion holds.

Row-group pushdown

A parquet feature: when a query filters on a column, the parquet reader can skip whole row-groups whose min/max statistics rule them out. acid leans on this heavily for _healpix_29 and column-projection pruning.
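
The skipping logic can be sketched without a parquet library. The example below models each row-group by hypothetical (min, max) statistics for one column and keeps only the groups that could contain rows matching a half-open range filter; real readers apply the same test to the statistics stored in the file footer.

```python
def row_groups_to_read(stats, lo, hi):
    """Indices of row-groups whose [min, max] could overlap the filter lo <= x < hi."""
    return [
        i for i, (col_min, col_max) in enumerate(stats)
        if col_max >= lo and col_min < hi
    ]


# Hypothetical per-row-group (min, max) statistics for a sorted column.
stats = [(0, 99), (100, 199), (200, 299), (300, 399)]
print(row_groups_to_read(stats, 150, 250))  # [1, 2]
```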

ICRS

The International Celestial Reference System, the modern astronomy reference frame. ACID treats every catalog's stored RA/Dec as ICRS and performs no epoch propagation (see J2000 below). All --cone / in_cone coordinates are ICRS degrees. See Crossmatching catalogs §1.

J2000

The astronomical epoch ACID assumes for every input catalog's stored RA/Dec: no propagation, no proper motion, no parallax. Catalogs at other epochs (Gaia DR3, for example, is published at J2016.0) must be propagated to J2000 before registering. See Crossmatching catalogs §1.

Decomposable aggregate

An aggregate that can be computed as a combine over per-partition partials: COUNT, SUM, AVG, MIN, MAX, STDDEV, VARIANCE, BOOL_AND, BOOL_OR. ACID supports all decomposable aggregates; non-decomposable ones (MEDIAN, MODE, COUNT(DISTINCT), …) are rejected at analyze time. See Aggregating.
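
A small illustration of the distinction, using toy data: MAX can be rebuilt from per-partition results, while MEDIAN cannot. The partitions below are made up for the example.

```python
from statistics import median

partitions = [[1, 2, 3], [4, 5, 6, 7, 100]]
all_rows = [x for part in partitions for x in part]

# MAX decomposes: the max of per-partition maxima equals the global max.
assert max(max(p) for p in partitions) == max(all_rows)

# MEDIAN does not: combining per-partition medians gives a different answer.
median_of_medians = median(median(p) for p in partitions)
print(median_of_medians, median(all_rows))  # 4.0 4.5
```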

Partial aggregation

ACID's two-phase aggregate strategy: phase 1 computes per-partition partials (e.g. AVG as (sum, count)), phase 2 combines them. Avoids writing every row to disk for SELECT COUNT(*). See Aggregating.
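
The two phases can be sketched for AVG as follows. The function names are illustrative and the lists stand in for partitions; this shows the general technique, not ACID's internal code.

```python
def avg_partial(values):
    """Phase 1: reduce one partition to a (sum, count) partial."""
    return (sum(values), len(values))


def avg_combine(partials):
    """Phase 2: merge the partials and finish the average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


partitions = [[1.0, 2.0, 3.0], [10.0], []]
partials = [avg_partial(p) for p in partitions]
print(avg_combine(partials))  # 4.0
```

Only the small (sum, count) pairs travel between phases, which is why no per-row output needs to be written.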

Cursor pixel

The HEALPix pixel ACID is currently processing as one unit of work, typically the anchor catalog's partition pixel. The right catalog is "refined to the cursor" when its partitions are at a different order. Used in ARCHITECTURE.md; you rarely see it at the user boundary.

Work-tuple

The unit ACID's process pool schedules: an (anchor partition, right partition(s)) pairing that the engine resolves and executes as one partition of work. Visible in the ACID_PROFILE_OUT JSON.

cpu_cap()

ACID's cgroup-aware core ceiling — min(sched_getaffinity, cgroup CPU quota) — the single source of truth for "how many cores can this process actually use". Used by workers="auto" and the per-worker thread budget instead of the host's raw os.cpu_count(). See Performance & parallelism.
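
A rough sketch of the same idea, under stated assumptions: it reads the cgroup v2 cpu.max file at its conventional path, rounds a fractional quota up to a whole core, and falls back to os.cpu_count() where sched_getaffinity is unavailable. The function names, the rounding choice, and the fallbacks are all assumptions for illustration, not ACID's actual implementation.

```python
import math
import os


def parse_cpu_max(text: str):
    """Parse a cgroup v2 cpu.max line ("<quota> <period>") into a core count, or None."""
    quota, period = text.split()
    if quota == "max":  # no quota configured
        return None
    return max(1, math.ceil(int(quota) / int(period)))


def cpu_cap_sketch(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """min(affinity, cgroup quota), falling back to the affinity count alone."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        cores = len(os.sched_getaffinity(0))
    else:
        cores = os.cpu_count() or 1
    try:
        with open(cpu_max_path) as fh:
            quota = parse_cpu_max(fh.read())
    except (OSError, ValueError):
        quota = None  # no cgroup v2 file, or an unexpected format
    return cores if quota is None else min(cores, quota)


print(cpu_cap_sketch())
```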

inmem_row_limit

The row threshold past which ACID spills phase-1 partials to disk instead of holding them in RAM. Default 50 M rows; settable via acid.init(...). See Performance — Memory & spill.

Composition verb / materialization verb

ACID's Catalog methods split in two: composition verbs (where, select, crossmatch, …) build a query lazily, returning a new Catalog; materialization verbs (head, execute, to_pandas, save, …) run it. See Debug small, run big.