Astronomical Catalog Inference Driver
Cross-match and query sky-survey catalogs in Python or SQL — laptop to cluster, billions of rows.
ACID is a tool for astronomers who work with large sky-survey catalogs — Vera C. Rubin Observatory, Gaia, DES, 2MASS, DELVE, SkyMapper, ZTF, and the next generation. It lets you:
- Analyze catalogs of any size — from a small cone search on your laptop to the full Rubin night on a cluster, with the same query.
- Cross-match positions across surveys and join light curves to source tables in a single line of code.
- Run in parallel out of the box, with no infrastructure setup and no partition handling on your part.
You write plain Python — or SQL, if you prefer. acid handles the rest.
(acid reads and writes the HATS
catalog format used by Rubin/LSDB, Gaia DR3, and most other modern
surveys — see Concepts when you want the
details.)
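import acid
import astropy.units as u

# Open two catalogs by name.
gaia = acid.open("gaia_dr3")
twomass = acid.open("twomass")

# Restrict Gaia to the DES DR2 footprint, match to 2MASS within 1 arcsec,
# and keep the identifiers plus the match distance.
matched = (
    gaia
    .in_region("des_dr2_footprint")
    .crossmatch(twomass, radius=1 * u.arcsec, dist_col="d_arcsec")
    .select("source_id, designation, d_arcsec")
)

# Materialize the result as an astropy table.
tbl = matched.to_astropy()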
You can also do this in SQL: the same query runs through the SQL interface, acid.sql.query(...).
That's the whole idea. The longer version:
- Point acid at your catalogs: it works out of the box with the major surveys (Gaia DR3, Rubin DP1, DES, 2MASS, DELVE, SkyMapper, ZTF, …), auto-discovering every catalog in a directory. A YAML config is there for custom setups, but most users never write one.
- Write your queries in Python: open a notebook, call acid.open(...), and chain familiar verbs (where, crossmatch, group_by, …). Prefer SQL? acid.sql.query(...) takes plain SQL with exactly two extensions: XMATCH(radius_arcsec => ...) for spherical cross-matching and IN_MOC(<alias>, ...) for restricting to a survey footprint.
- Get results back in familiar Python: as a pandas DataFrame or an astropy.table.Table, ready for matplotlib and the rest of your toolkit. For multi-million-row results, switch to high-performance formats like Polars or Arrow with a single method call. Or write straight to parquet to hand off to other tools.
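To make the idea of a spherical cross-match concrete, here is a minimal brute-force sketch in plain Python. It is not acid's implementation: the function names are hypothetical, and the choice to keep only the nearest neighbour inside the radius is an assumption made for illustration. It only shows what a match radius in arcseconds and a distance column such as d_arcsec mean geometrically.

```python
import math

ARCSEC_PER_RADIAN = math.degrees(1.0) * 3600.0


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds; inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # The haversine form stays accurate for the tiny angles used in cross-matching.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h))) * ARCSEC_PER_RADIAN


def brute_force_crossmatch(left, right, radius_arcsec):
    """Nearest right-hand row within radius_arcsec for each left-hand row.

    left and right are lists of (id, ra_deg, dec_deg) tuples (a hypothetical
    layout). Returns a list of (left_id, right_id, d_arcsec) tuples.
    """
    matches = []
    for left_id, left_ra, left_dec in left:
        best = None
        for right_id, right_ra, right_dec in right:
            d = angular_sep_arcsec(left_ra, left_dec, right_ra, right_dec)
            if d <= radius_arcsec and (best is None or d < best[1]):
                best = (right_id, d)
        if best is not None:
            matches.append((left_id, best[0], best[1]))
    return matches


# Toy data: the first pair is offset by 0.5 arcsec in declination only;
# the second left-hand source has no counterpart nearby.
left = [("g1", 10.0, -30.0), ("g2", 11.0, -30.0)]
right = [("t1", 10.0, -30.0 + 0.5 / 3600.0), ("t2", 50.0, 20.0)]
matches = brute_force_crossmatch(left, right, radius_arcsec=1.0)
print(matches)
```

This loop compares every pair, so its cost grows with the product of the two catalog sizes. That is workable for a toy example and hopeless at survey scale, which is why spatially partitioned formats matter for real catalogs.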
Where to next?
- Set up acid with pip or uv in two minutes.
- Run your first crossmatch in five minutes against a public HATS catalog.
- Two end-to-end notebooks: a first crossmatch, and an interactive EDA session.
- The concepts you need (HATS, MOCs, margin caches, the query dialect), explained for astronomers.
- A dozen self-contained recipes: count, crossmatch, filter, top-K, self-match, anti-join, materialize, write.
- Every SQL feature, every CLI flag, every public Python entry point, every error message.
Who is this for?
If you are an astronomer who:
- Is comfortable in Python;
- Wants to crossmatch, filter, count, or aggregate across millions to billions of rows without writing big-data-handling code yourself;
- Wants the same code to run everywhere from a laptop to a cluster;
then acid is built for you.
You do not need to know Polars, Apache Arrow, multiprocessing, or HEALPix arithmetic to use it. You will see those words in this documentation, but only when they help you make a decision.
Status
acid is under rapid development. The public API
(acid.open, acid.sql, Catalog, Result) is still evolving
and may break between releases; the on-disk output format and
accepted SQL subset may shift too. Pin a specific version if you're
embedding acid in a pipeline. See the
reference for what's supported today.
To hear when a new version ships, subscribe to release announcements — low traffic, one email per release.
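If you pin a version in a pipeline, a startup check can catch an environment that has drifted from the pin. The sketch below uses only the standard library's importlib.metadata. The helper names, the distribution name "acid", and the version string "0.0.0" are placeholders for illustration, not confirmed values.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def require_exact_version(dist_name, expected, lookup=installed_version):
    """Raise RuntimeError unless the installed version equals the pinned one.

    lookup is injectable so the check can be exercised without the package.
    """
    found = lookup(dist_name)
    if found != expected:
        raise RuntimeError(
            f"{dist_name}: pipeline is pinned to {expected!r}, found {found!r}"
        )
    return found


# Hypothetical usage at pipeline start-up (both values are placeholders):
#     require_exact_version("acid", "0.0.0")
```

An exact-match check is deliberately strict. While the API may break between releases, failing fast on any version difference is usually safer than accepting a range.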