Sky regions & footprints¶
For row-by-row cuts (magnitude, color, flag — where("g_mag < 18")),
see filtering rows. This page covers the spatial
restriction surface: scoping a query to a sky region (a cone, a MOC,
a survey footprint, another catalog's footprint). Two routes share the
same underlying machinery, and a cone scope applies to both:
- Catalog.in_region(...): the fluent verb. Accepts a registered MOC name, a peer Catalog's footprint, a FITS path, a mocpy.MOC object, or a HATS directory with a point_map.fits.
- IN_MOC(<alias>, '<name>'): the SQL predicate. Per-row boolean, restricted to a top-level conjunctive WHERE position.
- acid.in_cone(center, radius=...): a context manager that scopes a circular cone to every query executed inside a with block; both surfaces respect it.
ACID uses the HEALPix _healpix_29 column to push the region down to
parquet row-group statistics (the same fast path as
_healpix_29 row-group pushdown),
so a region restriction prunes whole row groups before any rows are
decoded.
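To illustrate why this prunes so effectively, the sketch below maps a coarse HEALPix cell to the range of order-29 nested indices it covers, then checks each row group's min/max statistics against that range. It assumes _healpix_29 holds order-29 nested indices. The function names and row-group tuples are hypothetical, and this is only the range-overlap idea, not ACID's implementation.

```python
MAX_ORDER = 29  # assumption: _healpix_29 holds order-29 nested HEALPix indices


def cell_to_range(order, ipix):
    """Half-open [lo, hi) range of order-29 indices covered by one cell."""
    shift = 2 * (MAX_ORDER - order)  # each finer order splits a cell in 4
    return ipix << shift, (ipix + 1) << shift


def surviving_row_groups(row_groups, cells):
    """Keep row groups whose [min, max] statistics overlap any cell range.

    row_groups: list of (name, min_healpix_29, max_healpix_29)  (hypothetical)
    cells:      list of (order, ipix) pairs describing the region
    """
    ranges = [cell_to_range(order, ipix) for order, ipix in cells]
    kept = []
    for name, lo_stat, hi_stat in row_groups:
        if any(lo_stat < hi and hi_stat >= lo for lo, hi in ranges):
            kept.append(name)
    return kept


# One order-0 cell (base pixel 3) as the region.
region = [(0, 3)]
lo, hi = cell_to_range(0, 3)
groups = [
    ("rg0", 0, lo - 1),         # entirely before the region
    ("rg1", lo - 10, lo + 10),  # straddles the lower edge
    ("rg2", lo + 100, hi - 1),  # entirely inside
    ("rg3", hi, hi + 50),       # entirely after
]
kept = surviving_row_groups(groups, region)
```

Only the straddling and interior row groups survive; the other two are skipped without decoding a single row.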
What a MOC is¶
A MOC (Multi-Order Coverage map) is a region of sky encoded as a
set of HEALPix pixels at multiple Norder resolutions — coarse where
the region is solid, fine along boundaries. The IVOA specifies the
FITS encoding; the major surveys publish their footprints as MOC FITS
files (DES DR2, 2MASS, DELVE, …). See the
glossary for the formal definition.
For ACID's purposes, you do not need to know the encoding — just that a
MOC is a sky region you point acid at and acid restricts the query
to rows inside it.
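For readers who do want a feel for the encoding, here is a minimal sketch of a multi-order cell set and a containment test. It assumes the nested HEALPix numbering, where the parent of a cell is its index shifted right by two bits. The dict layout and the function name are illustrative only, not mocpy's or ACID's data structures.

```python
def moc_contains(moc, order, ipix):
    """True if cell (order, ipix) lies inside any cell of the MOC.

    moc: dict mapping order -> set of nested pixel indices at that order.
    """
    for o in range(order, -1, -1):
        if ipix in moc.get(o, set()):
            return True
        ipix >>= 2  # step up to the parent cell at the next coarser order
    return False


# Coarse cell where the region is solid, fine cells along a boundary.
moc = {
    1: {5},         # one order-1 cell
    3: {400, 401},  # two order-3 cells
}

inside_coarse = moc_contains(moc, 3, 5 * 16 + 7)  # descendant of order-1 cell 5
inside_fine = moc_contains(moc, 3, 401)           # listed directly at order 3
outside = moc_contains(moc, 3, 402)               # neighbour that is not listed
```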
Registering a MOC¶
Three ways to register one:
import acid
acid.init("catalogs.yaml")
# 1. From a FITS file on disk.
acid.register_moc("des_dr2", "/data/mocs/des_dr2.fits")
# 2. From a mocpy.MOC object you already have in memory.
from mocpy import MOC
m = MOC.from_string("4/0-100", "ascii")
acid.register_moc("custom", m)
# 3. From a YAML config (set up at init time, not at runtime).
The YAML form lives in your registry file:
catalogs:
  gaia_dr3: { path: /data/gaia_dr3 }
  twomass: { path: /data/twomass_psc }
mocs:
  des_dr2: /data/mocs/des_dr2.fits
  known_artifacts: /data/mocs/artifacts.fits
Catalogs that ship with a point_map.fits (the HATS standard
footprint file) also expose that footprint by their catalog name.
If you don't register "twomass" as a MOC but a registered catalog
named twomass has a point_map.fits, IN_MOC(a, 'twomass') and
a.in_region("twomass") both lazy-load it — useful for cross-survey
footprint intersection without a separate MOC file.
When both an explicit MOC and an auto-loaded catalog footprint share a name, the explicit registration wins.
Filtering by a registered MOC¶
Fluent — in_region(...)¶
Catalog.in_region(region) is the fluent verb. The region argument
accepts five things:
gaia.in_region("des_dr2") # registered MOC name
gaia.in_region(twomass) # peer Catalog -> its footprint
gaia.in_region("/data/mocs/des_dr2.fits") # FITS path
gaia.in_region(some_mocpy_moc) # mocpy.MOC object
gaia.in_region("/data/twomass_hats") # HATS dir -> point_map.fits
A registered name (the first form, including a catalog name that
auto-resolves to its footprint) takes precedence over a path on disk,
so gaia.in_region("twomass") resolves through the registry the same way
the SQL form does, and the compiled SQL uses the same MOC name.
For the catalog-handle form (in_region(twomass)), the catalog must
come from the same Connection; cross-connection handles are rejected
with ValueError. The catalog must also have a point_map.fits at its
root; otherwise a RegistryError is raised (with a hint to pass a FITS
path or a mocpy.MOC instead).
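The precedence rules for string arguments can be summarised in a small sketch. The function name, the two dicts, and the final fall-through to a path are assumptions made for illustration. Only the ordering (explicit MOC, then catalog footprint, then path on disk) comes from the text above.

```python
def resolve_region_name(name, explicit_mocs, catalog_footprints):
    """Return (source, value) for a string region argument.

    explicit_mocs:      name -> MOC path registered explicitly
    catalog_footprints: catalog name -> point_map.fits path, or None if absent
    """
    if name in explicit_mocs:             # explicit registration wins
        return "moc", explicit_mocs[name]
    footprint = catalog_footprints.get(name)
    if footprint is not None:             # auto-loaded catalog footprint
        return "catalog", footprint
    return "path", name                   # assumed: treat the string as a path


explicit_mocs = {"des_dr2": "/data/mocs/des_dr2.fits"}
catalog_footprints = {
    "twomass": "/data/twomass_psc/point_map.fits",
    "gaia_dr3": None,  # hypothetical: this catalog has no point_map.fits
}

by_moc = resolve_region_name("des_dr2", explicit_mocs, catalog_footprints)
by_catalog = resolve_region_name("twomass", explicit_mocs, catalog_footprints)
by_path = resolve_region_name("/data/mocs/other.fits", explicit_mocs, catalog_footprints)
```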
SQL — IN_MOC(<alias>, '<name>')¶
The SQL predicate is per-row boolean. The first argument is the table
alias whose (ra, dec) is being tested; the second is the registered
MOC name (or auto-resolvable catalog name):
SELECT a.source_id, a.ra, a.dec
FROM gaia_dr3 AS a
WHERE IN_MOC(a, 'des_dr2')
AND NOT IN_MOC(a, 'known_artifacts')
Multiple IN_MOC predicates AND-combined in a WHERE clause are
folded into one effective MOC by the analyzer — the predicate fires
once per row, with the intersection and exclusions baked in. That is
why the AND-chain above is cheap even with two MOCs.
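The folding step can be pictured as plain set algebra: intersect the positive terms, then subtract the negated ones. The sketch below uses flat sets of pixel ids at a single order as a simplification. The registry contents and the function name are made up for illustration and do not reflect the analyzer's real data structures.

```python
def fold_mocs(terms, registry):
    """Fold AND-ed (negated, name) terms into one effective pixel set."""
    effective = None
    excluded = set()
    for negated, name in terms:
        pixels = registry[name]
        if negated:
            excluded |= pixels           # NOT IN_MOC(...) terms accumulate
        elif effective is None:
            effective = set(pixels)      # first positive term seeds the region
        else:
            effective &= pixels          # further positive terms intersect
    if effective is None:
        # Simplification: this sketch has no full-sky set to start from.
        raise ValueError("need at least one positive IN_MOC term")
    return effective - excluded


registry = {
    "des_dr2": {1, 2, 3, 4, 5},
    "delve": {2, 3, 4, 6},
    "known_artifacts": {4, 9},
}

# WHERE IN_MOC(a, 'des_dr2') AND NOT IN_MOC(a, 'known_artifacts')
two_terms = fold_mocs([(False, "des_dr2"), (True, "known_artifacts")], registry)

# Adding a second positive MOC narrows the region further.
three_terms = fold_mocs(
    [(False, "des_dr2"), (False, "delve"), (True, "known_artifacts")], registry
)
```

Once folded, each row needs a single membership test against the effective set, however many IN_MOC terms the WHERE clause contained.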
IN_MOC is also accepted in SELECT as a per-row boolean, and in CASE
and ORDER BY, as long as the same MOC is also AND-ed into the
top-level WHERE. The analyzer rejects naked positions outside the
WHERE (see Where IN_MOC is rejected below):
SELECT source_id,
IN_MOC(a, 'des_dr2') AS in_des
FROM gaia_dr3 AS a
WHERE IN_MOC(a, 'des_dr2') -- required for the SELECT use to fold
Cone restriction — in_cone(...)¶
A cone (one center + a radius) is the most common region in
interactive analysis: "let me iterate on a 1° patch around (180°, 0°)
before I run the whole sky". ACID exposes it as a context manager
that applies to every query inside the with block — fluent or SQL —
and lifts when the block ends:
import acid
import astropy.units as u
acid.init("catalogs.yaml", workers=8)
gaia = acid.open("gaia_dr3")
bright = gaia.where("phot_g_mean_mag < 16")
with acid.in_cone((180.0, 0.0), radius=1 * u.deg):
    small = bright.to_polars()
    r = acid.sql.query("SELECT COUNT(*) FROM gaia_dr3")
    # both are restricted to the cone
# back outside the block — full sky, same query object:
big = bright.to_polars()
The cone is execution-time: it applies to whichever queries run
inside the with block, not to the Catalog you built. The bright
handle above is reused both scoped and full-sky.
center is an (ra_deg, dec_deg) tuple (ICRS) or an
astropy.coordinates.SkyCoord; radius is either a plain number
(interpreted as degrees) or an astropy.units.Quantity.
The cone is exact — distances are computed with a haversine
expression — and is pushed down to parquet row groups via the cone's
MOC representation (the same path as IN_MOC).
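For reference, a haversine cone test looks roughly like the sketch below. It is a standalone illustration of the formula, using only the standard library. The function names are hypothetical, and ACID's own expression may differ in form.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    h = min(1.0, h)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.asin(math.sqrt(h)))


def row_in_cone(ra, dec, center, radius_deg):
    """True if (ra, dec) is within radius_deg of center = (ra_deg, dec_deg)."""
    return angular_separation_deg(ra, dec, center[0], center[1]) <= radius_deg


center = (180.0, 0.0)
near = row_in_cone(180.5, 0.5, center, 1.0)   # about 0.71 deg away
far = row_in_cone(181.0, 1.0, center, 1.0)    # about 1.41 deg away
wrap = angular_separation_deg(359.5, 0.0, 0.5, 0.0)  # crosses RA = 0
```

The haversine form stays numerically stable at small separations and handles the RA wrap at 0°/360° without special casing.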
Cones do not nest
Only one cone is active at a time. Entering a second in_cone
block while one is on the stack is rejected with ValidationError.
The "true intersection of two non-concentric cones" is a lens, not
a cone, and ACID refuses to silently approximate. Compose the two
regions into a single cone before entering, or use in_region(...)
with a MOC that represents the intersection.
Where IN_MOC is rejected¶
IN_MOC is a footprint restriction. Inside an AND-ed top-level
WHERE (optionally NOT-negated), the analyzer can fold it into the
effective MOC, push it down to row-group statistics, and prune work
tuples at enumeration time. Outside that position, none of those fast
paths apply — and a silent perf cliff is worse than a hard rejection.
The analyzer rejects IN_MOC in the following positions with
ValidationError:
- Inside a disjunction (IN_MOC(...) OR ...).
- In a JOIN ON clause.
- In a SELECT projection without a matching WHERE term.
- In ORDER BY, HAVING, or a CASE branch on its own.
The fix in every case is the same: move the region restriction to a
top-level conjunctive WHERE. If your region truly is an OR-of-MOCs,
union them in mocpy first and register the result as one MOC.
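The position rule for the WHERE clause can be sketched as a small tree walk. It collects IN_MOC terms from the top-level AND chain, optionally NOT-negated, and raises as soon as one appears under any other node. The tuple-based expression format and all names here are invented for illustration. The sketch covers only the WHERE tree, not JOIN ON or SELECT positions.

```python
class ValidationError(Exception):
    """Stand-in for the error type named in the text."""


def collect_moc_terms(expr, top_level=True, out=None):
    """Return [(negated, moc_name)] for a WHERE tree, or raise on a bad position.

    Nodes are tuples: ("and", ...), ("or", ...), ("not", x),
    ("in_moc", alias, name), ("pred", "sql text").
    """
    if out is None:
        out = []
    kind = expr[0]
    is_negated_moc = kind == "not" and expr[1][0] == "in_moc"
    if kind == "in_moc" or is_negated_moc:
        if not top_level:
            raise ValidationError("IN_MOC must be a top-level AND term in WHERE")
        name = expr[1][2] if is_negated_moc else expr[2]
        out.append((is_negated_moc, name))
    elif kind == "and" and top_level:
        for child in expr[1:]:
            collect_moc_terms(child, True, out)
    else:
        # OR, NOT over a compound, nested AND, or a plain predicate:
        # nothing underneath counts as top-level any more.
        for child in expr[1:]:
            if isinstance(child, tuple):
                collect_moc_terms(child, False, out)
    return out


ok = ("and",
      ("in_moc", "a", "des_dr2"),
      ("not", ("in_moc", "a", "known_artifacts")),
      ("pred", "g_mag < 18"))
terms = collect_moc_terms(ok)

bad = ("or", ("in_moc", "a", "des_dr2"), ("pred", "dec > 0"))
try:
    collect_moc_terms(bad)
    disjunction_rejected = False
except ValidationError:
    disjunction_rejected = True
```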
Composing region filters¶
You can combine in_region / IN_MOC with row filters and with each
other. The pieces stack — ACID combines them into one effective region
before pushing down:
When to use which surface¶
- A cone for interactive iteration → acid.in_cone(...). It applies to every query executed inside the block, including SQL.
- A fixed survey footprint → register the MOC once via register_moc(...) (or the YAML mocs: block) and use in_region("name") or IN_MOC(a, 'name').
- A catalog's published footprint → either register it explicitly or rely on the auto-load: in_region("twomass") resolves to twomass's point_map.fits if no explicit twomass MOC is registered.
See also¶
- Filtering rows — non-spatial cuts (magnitude, color, flags). Region filters compose with row filters.
- Crossmatching catalogs — restricting one or both sides of a crossmatch to a region.
- SQL escape hatch — the inline-subquery / CTE pre-filter form, useful for narrowing a side that already has a region restriction.
- Concepts: MOC footprints — what a MOC is and why it's the right encoding for sky regions.