Astronomical Catalog Inference Driver
Analyze sky-survey catalogs with Python and SQL — at any scale.
ACID is a tool for astronomers who work with large sky-survey catalogs — Vera C. Rubin Observatory, Gaia, DES, 2MASS, DELVE, SkyMapper, ZTF, and the next generation. It lets you:
- Analyze catalogs of any size — from a small cone search on your laptop to the full Rubin night on a cluster, with the same query.
- Cross-match positions across surveys and join light curves to source tables in a single line of code.
- Run in parallel out of the box, with no infrastructure setup and no partition handling on your part.
You write plain SQL and Python. acid handles the rest.
(acid reads and writes the HATS catalog format used by Rubin/LSDB, Gaia DR3, and most other modern surveys; see Concepts when you want the details.)
SELECT g.source_id, t.designation, XMATCH_DISTANCE(t) AS d_arcsec
FROM gaia_dr3 AS g
JOIN twomass AS t ON XMATCH(radius_arcsec => 1.0)
WHERE IN_MOC(g, 'des_dr2_footprint')
That's the whole idea. The longer version:
- Point acid at your catalogs: it works out of the box with the major surveys (Gaia DR3, Rubin DP1, DES, 2MASS, DELVE, SkyMapper, ZTF, …), auto-discovering every catalog in a directory. A YAML config is there for custom setups, but most users never write one.
- Write your queries in Python: open a notebook, call acid.sql(...), and ask for what you want. The only new syntax is XMATCH(radius_arcsec => ...) for spherical cross-matching and IN_MOC(<alias>, ...) for restricting to a survey footprint.
- Get results back in familiar Python: as a pandas DataFrame or an astropy.table.Table, ready for matplotlib and the rest of your toolkit. For multi-million-row results, switch to high-performance formats like Polars or Arrow with a single method call. Or write straight to parquet to hand off to other tools.
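To make the cross-matching idea concrete, here is a minimal pure-Python sketch of what "match within a radius" means on the sky: two rows pair up when their angular separation is at most the given radius. This is an illustration only, not acid's implementation; the function names `separation_arcsec` and `brute_force_xmatch` and the toy coordinates are assumptions made up for this example.

```python
import math

ARCSEC_PER_DEG = 3600.0


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation between two sky positions (degrees in, arcsec out)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form: numerically stable for the tiny angles typical of cross-matching.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * ARCSEC_PER_DEG


def brute_force_xmatch(left, right, radius_arcsec):
    """Return (left_id, right_id, distance_arcsec) for every pair within the radius."""
    matches = []
    for lid, lra, ldec in left:
        for rid, rra, rdec in right:
            d = separation_arcsec(lra, ldec, rra, rdec)
            if d <= radius_arcsec:
                matches.append((lid, rid, d))
    return matches


# Toy positions (hypothetical, for illustration only): (id, ra_deg, dec_deg)
gaia_like = [("g1", 10.0, -30.0), ("g2", 10.5, -30.0)]
twomass_like = [
    ("t1", 10.0, -30.0 + 0.5 / 3600.0),  # 0.5 arcsec north of g1
    ("t2", 11.0, -29.0),                 # far from both left-hand rows
]

pairs = brute_force_xmatch(gaia_like, twomass_like, radius_arcsec=1.0)
for left_id, right_id, d in pairs:
    print(left_id, right_id, round(d, 3))
```

The nested loop compares every row with every other row, so its cost grows as N × M. That is workable for a toy table and hopeless for billions of rows, which is the motivation for spatially partitioned catalog formats, where only rows in nearby sky regions need to be compared.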
Where to next?
- :material-rocket-launch: Install
  Set up acid with pip or uv in two minutes.
- :material-flash: Quickstart
  Run your first crossmatch in five minutes against a public HATS catalog.
- :material-school: Tutorials
  Two end-to-end notebooks: a first crossmatch, and an interactive EDA session.
- :material-book-open-page-variant: User guide
  The concepts you need (HATS, MOCs, margin caches, the query dialect), explained for astronomers.
- :material-silverware-fork-knife: Cookbook
  A dozen self-contained recipes: count, crossmatch, filter, top-K, self-match, anti-join, materialize, write.
- :material-bookshelf: Reference
  Every SQL feature, every CLI flag, every public Python entry point, every error message.
Who is this for?
If you are an astronomer who:
- Works with HATS catalogs (Rubin DP1, Gaia DR3, DES, 2MASS, DELVE, SkyMapper, …) or other HEALPix-partitioned parquet datasets;
- Is comfortable in Python and knows the gist of SQL (you've written SELECT ... FROM ... WHERE ... before);
- Wants to crossmatch, filter, count, or aggregate across millions to billions of rows without writing partition-handling code yourself,

then acid is built for you.
You do not need to know DuckDB, Apache Arrow, multiprocessing, or HEALPix arithmetic to use it. You will see those words in this documentation, but only when they help you make a decision.
Status
acid is under rapid development. The public API
(acid.connect, acid.sql, Session, Result) is still evolving
and may break between releases; the on-disk layout, rewriter, and
per-partition SQL will too. Pin a specific version if you're
embedding acid in a pipeline. See the
reference for what's supported today.