Astronomical Catalog Inference Driver

Analyze sky-survey catalogs with Python and SQL — at any scale.

acid is a tool for astronomers who work with large sky-survey catalogs: Vera C. Rubin Observatory, Gaia, DES, 2MASS, DELVE, SkyMapper, ZTF, and the next generation of surveys. It lets you:

  • Analyze catalogs of any size, from a small cone search on your laptop to a full night of Rubin data on a cluster, using the same query.
  • Crossmatch positions across surveys and join light curves to source tables in a single line of code.
  • Run in parallel out of the box, with no infrastructure setup and no partition handling on your part.

You write plain SQL and Python. acid handles the rest.

(acid reads and writes the HATS catalog format used by Rubin/LSDB, Gaia DR3, and most other modern surveys — see Concepts when you want the details.)

SELECT g.source_id, t.designation, XMATCH_DISTANCE(t) AS d_arcsec
FROM   gaia_dr3 AS g
JOIN   twomass  AS t ON XMATCH(radius_arcsec => 1.0)
WHERE  IN_MOC(g, 'des_dr2_footprint')

That's the whole idea. The longer version:

  • Point acid at your catalogs — it works out of the box with the major surveys (Gaia DR3, Rubin DP1, DES, 2MASS, DELVE, SkyMapper, ZTF, …), auto-discovering every catalog in a directory. A YAML config is there for custom setups, but most users never write one.
  • Write your queries in Python: open a notebook, call acid.sql(...), and ask for what you want. The only new syntax is XMATCH(radius_arcsec => ...) for spherical crossmatching (with XMATCH_DISTANCE(<alias>) to read back the match separation) and IN_MOC(<alias>, ...) for restricting to a survey footprint.
  • Get results back in familiar Python: as a pandas DataFrame or an astropy.table.Table, ready for matplotlib and the rest of your toolkit. For multi-million-row results, switch to a higher-performance representation such as Polars or Arrow with a single method call, or write straight to Parquet to hand off to other tools.
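
If you are wondering what a predicate like XMATCH(radius_arcsec => 1.0) means on the sky, the following is a minimal conceptual sketch in plain Python. It is not acid's implementation: the function names, the toy coordinates, and the choice to return every pair inside the radius (rather than, say, only the nearest neighbour) are all assumptions made for illustration. It computes the great-circle separation between two positions and keeps the pairs that fall within the radius, which is roughly the quantity the d_arcsec column in the query above reports.

```python
import math

ARCSEC_PER_DEG = 3600.0


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two sky positions (degrees in, arcsec out)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: well behaved for the tiny angles typical of crossmatching.
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * ARCSEC_PER_DEG


def brute_force_xmatch(left, right, radius_arcsec):
    """Return (left_id, right_id, d_arcsec) for every pair within the radius.

    Hypothetical helper: an O(N*M) loop, fine for a handful of rows only.
    """
    matches = []
    for left_id, left_ra, left_dec in left:
        for right_id, right_ra, right_dec in right:
            d = angular_separation_arcsec(left_ra, left_dec, right_ra, right_dec)
            if d <= radius_arcsec:
                matches.append((left_id, right_id, d))
    return matches


# Toy rows of (id, ra_deg, dec_deg); the values are made up for this example.
gaia_like = [(1, 10.0, -30.0), (2, 10.5, -30.0)]
twomass_like = [("A", 10.0, -30.0 + 0.5 / 3600.0), ("B", 200.0, 45.0)]

matches = brute_force_xmatch(gaia_like, twomass_like, radius_arcsec=1.0)
for left_id, right_id, d in matches:
    print(left_id, right_id, round(d, 3))
```

The double loop compares every row with every other row, which stops being practical long before survey scale. Avoiding that comparison, and the partition handling that replaces it, is the work acid is meant to take off your hands.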

Where to next?

  • :material-rocket-launch: Install

    Set up acid with pip or uv in two minutes.

  • :material-flash: Quickstart

    Run your first crossmatch in five minutes against a public HATS catalog.

  • :material-school: Tutorials

    Two end-to-end notebooks: a first crossmatch, and an interactive exploratory data analysis (EDA) session.

  • :material-book-open-page-variant: User guide

    The concepts you need (HATS, MOCs, margin caches, the query dialect), explained for astronomers.

  • :material-silverware-fork-knife: Cookbook

    A dozen self-contained recipes — count, crossmatch, filter, top-K, self-match, anti-join, materialize, write.

  • :material-bookshelf: Reference

    Every SQL feature, every CLI flag, every public Python entry point, every error message.

Who is this for?

If you are an astronomer who:

  • Works with HATS catalogs (Rubin DP1, Gaia DR3, DES, 2MASS, DELVE, SkyMapper, …) or other HEALPix-partitioned Parquet datasets;
  • Is comfortable in Python and knows the gist of SQL (you've written SELECT ... FROM ... WHERE ... before);
  • Wants to crossmatch, filter, count, or aggregate across millions to billions of rows without writing partition-handling code yourself —

acid is built for you.

You do not need to know DuckDB, Apache Arrow, multiprocessing, or HEALPix arithmetic to use it. You will see those words in this documentation, but only when they help you make a decision.

Status

acid is under rapid development. The public API (acid.connect, acid.sql, Session, Result) is still evolving and may break between releases, and the on-disk layout, the rewriter, and the per-partition SQL may change as well. Pin a specific version if you're embedding acid in a pipeline. See the reference for what's supported today.