Quickstart

In five minutes, you'll download a real HATS catalog, open it, and run a crossmatch — no theory, no config files.

You'll need acid installed; see Installation.

1. Download a small catalog

Grab a cone-shaped subset of the 2MASS point-source catalog from the LSDB mirror — about 100,000 sources around (RA, Dec) = (50°, -50°):

acid download two_mass --cone 50,-50,2

A bare catalog name is resolved against the LSDB mirror and, since you didn't say where to put it, downloaded into ~/datasets/two_mass, acid's default catalog directory (created for you on first use). That default is why acid.open(...) below needs no setup: with nothing else configured, both acid download and acid.open look in ~/datasets.

~/datasets/two_mass/ now contains a valid HATS catalog. You can poke at it the same way — by name:

acid inspect two_mass

2. Open a catalog

Just import acid and call acid.open(...). The worker pool spins up on the first query and is shared across everything you run; you don't manage it:

import acid

twomass = acid.open("two_mass")
print(twomass.columns)        # cheap; reads cached metadata only
print(twomass.describe())     # row count, partitions, footprint, schema

With no configuration, acid searches your catalog path (ACID_PATH, defaulting to ~/datasets) — the same place the download just landed — so acid.open("two_mass") finds it by name.
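The lookup described above can be pictured with a small sketch. It is not acid's implementation: resolve_catalog is a hypothetical helper, and it assumes ACID_PATH names a single root directory and that anything containing a path separator (or starting with "." or "~") counts as an explicit path.

```python
import os
from pathlib import Path


def resolve_catalog(name, env=None):
    """Hypothetical helper: map a bare name or an explicit path to a directory."""
    env = os.environ if env is None else env
    # Assumption: a separator or a leading "." / "~" marks an explicit path.
    if os.sep in name or "/" in name or name.startswith((".", "~")):
        return Path(name).expanduser()
    # A bare name is looked up under the catalog root: ACID_PATH, else ~/datasets.
    root = Path(env.get("ACID_PATH", "~/datasets")).expanduser()
    return root / name


print(resolve_catalog("two_mass", env={}))                         # under ~/datasets
print(resolve_catalog("two_mass", env={"ACID_PATH": "/data/hats"}))  # under /data/hats
print(resolve_catalog("./out/gaia_x_2mass", env={}))               # explicit path, used as given
```

Passing env explicitly keeps the sketch easy to try without touching your real environment.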

acid.open("two_mass") returns a lazy Catalog handle. No data is read yet — composition is metadata-only.

3. Filter, project, materialize

Compose with verbs, then trigger execution with a terminal method (.head, .to_astropy, .to_polars, ...):

bright = (acid.open("two_mass")
            .where("j_m < 14.0")
            .select("designation, ra, decl, j_m")
            .limit(10))

bright.head(10).show()        # pretty-print to stdout
tbl = bright.to_astropy()     # also: .to_polars(), .to_arrow(), .to_pandas()

The strings passed to .where(...) and .select(...) are plain SQL. acid reads only the columns you asked for from disk, even though the catalog has 60 more.

Result.show(n) uses the same fixed-width renderer the CLI does, so the output matches acid query "...". print(result) instead renders the result as a Polars DataFrame — with its shape: header and Polars's own row truncation.

4. Crossmatch two catalogs

Download a second catalog over the same region:

acid download gaia_dr3 gaia --cone 50,-50,2

On the mirror this catalog's collection is named gaia_dr3; the second argument stores it locally as plain gaia (a bare name → ~/datasets/gaia), so we can refer to it as gaia from here on. (two_mass above needed only one argument — its collection name was already the name we wanted.)

Not sure what's out there? acid search lists catalogs you can download from the online mirrors, and acid list shows the ones already on your machine.

Now ask: "for every Gaia source, find any 2MASS source within 1 arcsec":

import acid
import astropy.units as u

gaia = acid.open("gaia")
twomass = acid.open("two_mass")

matches = (gaia.crossmatch(twomass, radius=1*u.arcsec)
               .select("source_id, designation"))

matches.head(20).show()

Three things to note:

  • radius=1*u.arcsec is the only non-standard bit. Quantities are required — bare floats are rejected so units never get guessed wrong.
  • By default you get the single closest match per anchor row (maxmatch=1); pass maxmatch=-1 for every match within the radius, and how="left" to keep anchors that have no match.
  • You don't normally set the worker count; acid sizes the pool to your machine automatically. Reach for workers=N only to fix a problem: for example, lower it if a query runs out of memory (fewer workers leave more headroom for each).
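To make the maxmatch and how semantics concrete, here is a brute-force sketch in plain Python. It is only an illustration of the behaviour described in the notes above, not how acid performs a crossmatch; the function names, the tuple layout, and the radius_arcsec parameter (a bare float, unlike acid's required Quantity) are assumptions made for this example.

```python
import math


def sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def crossmatch(anchors, others, radius_arcsec, maxmatch=1, how="inner"):
    """Brute-force sketch; rows are (id, ra_deg, dec_deg) tuples.

    Returns (anchor_id, other_id, dist_arcsec) tuples. With how="left",
    unmatched anchors appear once with other_id and dist set to None.
    """
    out = []
    for a_id, a_ra, a_dec in anchors:
        hits = sorted(
            (sep_arcsec(a_ra, a_dec, o_ra, o_dec), o_id)
            for o_id, o_ra, o_dec in others
        )
        hits = [(d, o_id) for d, o_id in hits if d <= radius_arcsec]
        if maxmatch >= 0:
            hits = hits[:maxmatch]  # sorted by distance, so the nearest are kept
        if hits:
            out.extend((a_id, o_id, d) for d, o_id in hits)
        elif how == "left":
            out.append((a_id, None, None))
    return out


anchors = [("g1", 50.0, -50.0), ("g2", 50.1, -50.0)]
others = [
    ("t1", 50.0, -50.0 + 0.3 / 3600),  # 0.3 arcsec from g1
    ("t2", 50.0, -50.0 - 0.8 / 3600),  # 0.8 arcsec from g1
    ("t3", 51.0, -49.0),               # far from both anchors
]

closest = crossmatch(anchors, others, 1.0)               # one row: g1 with t1
every = crossmatch(anchors, others, 1.0, maxmatch=-1)    # g1 with t1 and with t2
left = crossmatch(anchors, others, 1.0, how="left")      # also keeps unmatched g2
print([row[:2] for row in left])
```

The default keeps one row per matched anchor, maxmatch=-1 can multiply rows, and how="left" guarantees every anchor shows up at least once.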

5. Restrict to a region of sky

If you only care about a small part of the sky, for example to debug a query before running it full-sky, run it inside an acid.in_cone(...) block. The cone applies at execution time to every query (fluent or SQL) materialized inside the block, so the same Catalog object runs scoped inside the block and full-sky outside it:

gaia = acid.open("gaia")
twomass = acid.open("two_mass")

matches = (gaia.crossmatch(twomass, radius=1*u.arcsec)
               .select("source_id, designation"))

with acid.in_cone((50.0, -50.0), radius=0.5*u.deg):
    # Iterate cheaply: only partitions overlapping the cone are read.
    small_tbl = matches.to_astropy()

# Outside the block, the *same* query runs full-sky:
big_tbl = matches.to_astropy()

acid enumerates only partitions overlapping the cone (skipping the rest), then enforces dist ≤ radius exactly via a great-circle predicate — no boundary artefacts. Cones do not nest; one block at a time (a nested in_cone raises ValidationError).

6. Save or export the result

A pipeline ends in one of two terminal verbs — save for a result that stays queryable, export for one that leaves as a single file.

save — a HATS catalog you can reuse and hand off to LSDB. A bare name joins your catalog library (it lands under your ACID_PATH root), so later sessions re-open it by name:

gaia = acid.open("gaia")
twomass = acid.open("two_mass")

saved = (gaia
         .crossmatch(twomass, radius=1*u.arcsec)
         .select("source_id, designation")
         .save("gxt"))               # → <ACID_PATH>/gxt, registered as "gxt"

# `saved` is a normal Catalog handle bound to the freshly written tree.
# In this *and* future sessions, both acid.sql and acid.open resolve the name:
print(acid.sql.query("SELECT COUNT(*) AS n FROM gxt"))

The written tree is a standards-compliant HATS catalog — any HATS reader (including LSDB) opens it directly. save streams partition by partition, so it scales to full-sky outputs. (Pass an explicit path like ./out/gaia_x_2mass to write somewhere specific instead of the library.)

export — one flat file for another tool. For a target list or paper table, export writes a single CSV / parquet / FITS file (format by extension or format=) and returns its path:

path = (gaia
        .crossmatch(twomass, radius=1*u.arcsec)
        .select("source_id, designation")
        .export("crossmatch.csv"))

export gathers the whole result in memory before writing — perfect for selective queries, but use save (streaming) for anything full-sky.
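The memory trade-off between the two verbs can be sketched with the standard library alone. This is an analogy, not acid's code: partitions, export_csv, and save_streaming are hypothetical names, and the streaming writer emits small CSV files where the real save writes a HATS tree.

```python
import csv
import tempfile
from pathlib import Path


def partitions():
    """Stand-in for a query result that arrives one partition at a time."""
    yield [("g1", "t1"), ("g2", "t2")]
    yield [("g3", "t3")]


def export_csv(parts, path):
    """Gather every row first, then write one file (memory grows with the result)."""
    rows = [row for part in parts for row in part]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source_id", "designation"])
        writer.writerows(rows)
    return Path(path)


def save_streaming(parts, outdir):
    """Write each partition as it arrives (memory stays at one partition)."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, part in enumerate(parts):
        target = outdir / f"part_{i}.csv"
        with open(target, "w", newline="") as f:
            csv.writer(f).writerows(part)
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    flat = export_csv(partitions(), Path(tmp) / "crossmatch.csv")
    parts = save_streaming(partitions(), Path(tmp) / "gxt")
    n_flat_lines = len(flat.read_text().splitlines())  # header plus three rows
    n_part_files = len(parts)                          # one file per partition

print(n_flat_lines, n_part_files)
```

The gather step in export_csv is what makes a single flat file convenient for small results and risky for full-sky ones.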

See Working with results & exporting for the full output-format menu.

7. Drop into SQL when you need to

The fluent verbs cover crossmatches, joins, filters, projections, and aggregations (group_by / aggregate — see the aggregation guide). But some queries just read better as SQL, and a few shapes are SQL-only — when that's the case, hand the same connection a plain SQL string with acid.sql.query(...):

r = acid.sql.query("""
    SELECT g.source_id,
           COUNT(*)   AS n,
           AVG(d)     AS avg_d
    FROM   gaia     AS g
    JOIN   two_mass AS t ON XMATCH(radius_arcsec => 1.0, mode => 'all', dist_col => 'd')
    GROUP BY g.source_id
    HAVING COUNT(*) >= 2
    ORDER BY avg_d ASC
    LIMIT  100
""")
print(r)
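If the GROUP BY / HAVING part is unfamiliar, the aggregation can be tried on its own with Python's built-in sqlite3 module. The sketch below assumes the crossmatch has already produced a table of (source_id, designation, d) pairs; the pairs table and its values are made up for illustration, and SQLite knows nothing about XMATCH, so only the standard SQL clauses are reproduced.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pairs (source_id INTEGER, designation TEXT, d REAL)")
con.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [
        (1, "t1", 0.2), (1, "t2", 0.6),  # two matches for source 1
        (2, "t3", 0.5),                  # a single match: removed by HAVING
        (3, "t4", 0.1), (3, "t5", 0.3),  # two matches for source 3, closer on average
    ],
)

rows = con.execute("""
    SELECT source_id,
           COUNT(*) AS n,
           AVG(d)   AS avg_d
    FROM   pairs
    GROUP BY source_id
    HAVING COUNT(*) >= 2
    ORDER BY avg_d ASC
    LIMIT  100
""").fetchall()

for source_id, n, avg_d in rows:
    print(source_id, n, round(avg_d, 3))
```

Source 3 sorts ahead of source 1 because its mean separation is smaller, and source 2 never appears because it has only one match.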

Where to next?

You've already seen the core API: acid.open(...), the verbs, acid.in_cone(...), acid.sql.query(...). Three good directions:

  • Your first crossmatch: a notebook that turns the example above into a small science story with plots.
  • Cookbook: short self-contained recipes for the patterns you'll hit on real data (footprint filtering, self-crossmatch, top-K, materialize-and-reuse, ...).
  • Connections and Writing queries: the lifecycle of a Connection, when to use acid.sql.query(...) vs the fluent verbs, and the full SQL dialect acid supports.

Picking an output type

.to_astropy() returns an astropy Table — the natural fit for catalog work (units, coordinates, FITS round-trips). For anything heavy — group-by, filtering, joins on multi-million-row results — .to_polars() is typically 5–50× faster than pandas and just as easy to read. .to_pandas() exists if a downstream library needs it, but reach for .to_astropy() or .to_polars() first.

And to hear when a new version lands, subscribe to release announcements — low traffic, one email per release.