Astronomical Catalog Inference Driver

Cross-match and query sky-survey catalogs in Python or SQL — laptop to cluster, billions of rows.

acid is a tool for astronomers who work with large sky-survey catalogs: Vera C. Rubin Observatory, Gaia, DES, 2MASS, DELVE, SkyMapper, ZTF, and upcoming surveys. It lets you:

Analyze catalogs of any size, from a small cone search on your laptop to a full night of Rubin data on a cluster, with the same query.
Cross-match positions across surveys and join light curves to source tables in a single line of code.
Run in parallel out of the box, with no infrastructure setup and no partition handling on your part.
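Partition-parallel execution generally follows a split-apply-combine pattern: the sky is divided into partitions, the same work runs on each partition independently, and the partial results are merged. The stdlib-only sketch below shows that pattern for a simple count. It illustrates the general idea only and is not acid's implementation: the in-memory `PARTITIONS` dict, the `count_bright` and `parallel_count` helpers, and the magnitude column are all hypothetical, and threads stand in for whatever worker processes or cluster nodes a real engine would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: partition id -> rows of (ra_deg, dec_deg, g_mag).
PARTITIONS = {
    0: [(10.0, -5.0, 17.2), (10.1, -5.2, 19.8)],
    1: [(45.0, 2.0, 15.1)],
    2: [(200.3, -40.0, 20.5), (201.0, -41.0, 18.0), (199.9, -39.5, 16.4)],
}


def count_bright(rows, mag_limit):
    """Apply step: count rows brighter than mag_limit (smaller magnitude = brighter)."""
    return sum(1 for _ra, _dec, mag in rows if mag < mag_limit)


def parallel_count(partitions, mag_limit, max_workers=4):
    """Run count_bright on every partition concurrently, then combine by summing."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(lambda rows: count_bright(rows, mag_limit), partitions.values())
        return sum(partials)  # combine step: partial counts simply add up


if __name__ == "__main__":
    print(parallel_count(PARTITIONS, mag_limit=19.0))
```

Counting works well with this pattern because partial counts combine by addition; operations such as crossmatching need extra care at partition boundaries, which is what margin caches address.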

You write plain Python — or SQL, if you prefer. acid handles the rest.

(acid reads and writes HATS, the catalog format used by Rubin and LSDB and available for Gaia DR3 and many other modern surveys. See Concepts when you want the details.)

```python
import acid
import astropy.units as u

gaia    = acid.open("gaia_dr3")
twomass = acid.open("twomass")

matched = (gaia
           .in_region("des_dr2_footprint")
           .crossmatch(twomass, radius=1 * u.arcsec, dist_col="d_arcsec")
           .select("source_id, designation, d_arcsec"))

tbl = matched.to_astropy()
```

You can also do this in SQL. Here is the same query through the SQL interface, acid.sql.query(...):

```sql
SELECT g.source_id, t.designation, d_arcsec
FROM   gaia_dr3 AS g
JOIN   twomass  AS t ON XMATCH(radius_arcsec => 1.0, dist_col => 'd_arcsec')
WHERE  IN_MOC(g, 'des_dr2_footprint')
```
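Conceptually, a positional crossmatch pairs each source in one catalog with sources in the other that lie within an angular radius, and reports their separation (the `d_arcsec` column in the query). The stdlib-only sketch below does this by brute force with the haversine formula and keeps only the nearest counterpart per source. It is not how acid works internally (a scalable engine would avoid the O(N×M) loop, for example by matching partition by partition), and keeping only the nearest match rather than every pair within the radius is an assumption of this sketch; the function names and the `(id, ra_deg, dec_deg)` tuple layout are hypothetical.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds via the haversine formula (inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against h creeping above 1.0 through rounding.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def crossmatch(left, right, radius_arcsec):
    """For each left source, keep the nearest right source within radius_arcsec.

    left, right: lists of (id, ra_deg, dec_deg) tuples (hypothetical layout).
    Returns a list of (left_id, right_id, d_arcsec). Brute force: tiny inputs only.
    """
    matches = []
    for lid, lra, ldec in left:
        best = None  # (right_id, separation) of the closest candidate so far
        for rid, rra, rdec in right:
            d = angular_sep_arcsec(lra, ldec, rra, rdec)
            if d <= radius_arcsec and (best is None or d < best[1]):
                best = (rid, d)
        if best is not None:
            matches.append((lid, best[0], best[1]))
    return matches


if __name__ == "__main__":
    # Hypothetical rows: source 1 has a counterpart 0.5" away in RA; source 2 has none within 1".
    gaia = [(1, 150.0, 2.0), (2, 150.01, 2.0)]
    twomass = [("A", 150.0 + 0.5 / 3600, 2.0), ("B", 151.0, 2.0)]
    for left_id, right_id, d in crossmatch(gaia, twomass, radius_arcsec=1.0):
        print(left_id, right_id, round(d, 3))
```

The haversine form is used rather than the plain spherical law of cosines because it stays numerically accurate at arcsecond-scale separations.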

That's the whole idea. The longer version:

Point acid at your catalogs — it works out of the box with the major surveys (Gaia DR3, Rubin DP1, DES, 2MASS, DELVE, SkyMapper, ZTF, …), auto-discovering every catalog in a directory. A YAML config is there for custom setups, but most users never write one.
Write your queries in Python — open a notebook, call acid.open(...), and chain familiar verbs (where, crossmatch, group_by, …). Prefer SQL? acid.sql.query(...) takes plain SQL with exactly two extensions: XMATCH(radius_arcsec => ...) for spherical cross-matching and IN_MOC(<alias>, ...) for restricting to a survey footprint.
Get results back in familiar Python: as a pandas DataFrame or an astropy.table.Table, ready for matplotlib and the rest of your toolkit. For multi-million-row results, switch to a Polars DataFrame or an Apache Arrow table with a single method call, or write straight to Parquet to hand off to other tools.
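IN_MOC restricts a query to a footprint described as a MOC (Multi-Order Coverage map): a set of HEALPix cells, possibly at several resolutions (orders). In the NESTED numbering scheme the four children of pixel `p` at the next order are `4p` through `4p + 3`, so testing whether a fine pixel lies inside a coarser cell needs only a bit shift. The sketch below shows that test with a MOC held as a plain dict from order to pixel indices; this representation and the `moc_contains` name are hypothetical, not acid's API. Converting RA/Dec to a NESTED pixel index requires a HEALPix library (for example `healpy.ang2pix(nside, theta, phi, nest=True)`), which is left out to keep the sketch dependency-free.

```python
def moc_contains(moc, order, ipix):
    """Return True if NESTED pixel `ipix` at HEALPix `order` lies inside `moc`.

    moc: hypothetical representation, a dict {order: set of NESTED pixel indices}.
    The query pixel must be at least as fine as every cell in the MOC.
    """
    if any(cell_order > order for cell_order in moc):
        raise ValueError("query order must be >= every MOC cell order")
    for cell_order, cells in moc.items():
        # Each order step splits a pixel into 4 children (2 bits), so the
        # ancestor of `ipix` at `cell_order` is ipix >> (2 * (order - cell_order)).
        if (ipix >> (2 * (order - cell_order))) in cells:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical footprint: order-1 cell 5 (covers order-3 pixels 80..95) plus order-3 cell 200.
    footprint = {1: {5}, 3: {200}}
    for p in (79, 85, 96, 200):
        print(p, moc_contains(footprint, 3, p))  # 79 False, 85 True, 96 False, 200 True
```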

Where to next?

Install: Set up acid with pip or uv in two minutes.
Quickstart: Run your first crossmatch in five minutes against a public HATS catalog.
Tutorials: Two end-to-end notebooks, a first crossmatch and an interactive EDA session.
User guide: The concepts you need (HATS, MOCs, margin caches, the query dialect), explained for astronomers.
Cookbook: A dozen self-contained recipes covering count, crossmatch, filter, top-K, self-match, anti-join, materialize, and write.
Reference: Every SQL feature, every CLI flag, every public Python entry point, and every error message.

Who is this for?

If you are an astronomer who:

Is comfortable in Python;
Wants to crossmatch, filter, count, or aggregate across millions to billions of rows without writing big-data-handling code yourself;
Wants the same code to run unchanged, from a laptop to a cluster,

then acid is built for you.

You do not need to know Polars, Apache Arrow, multiprocessing, or HEALPix arithmetic to use it. You will see those words in this documentation, but only when they help you make a decision.

Status

acid is under rapid development. The public API (acid.open, acid.sql, Catalog, Result) is still evolving and may break between releases; the on-disk output format and accepted SQL subset may shift too. Pin a specific version if you're embedding acid in a pipeline. See the reference for what's supported today.

To hear when a new version ships, subscribe to release announcements — low traffic, one email per release.