Quickstart

In five minutes, you'll download a real HATS catalog, run a SQL query against it, and look at the result in pandas. No theory, no config files — just commands you can paste.

You'll need acid installed; see Installation.

1. Download a small catalog

Grab a cone-shaped subset of the 2MASS point-source catalog from the LSDB mirror — about 100,000 sources around (RA, Dec) = (50°, -50°):

acid download \
    https://data.lsdb.io/hats/two_mass/two_mass \
    ./data/two_mass \
    --cone 50,-50,2

./data/two_mass/ now contains a valid HATS catalog. You can poke at it with acid inspect:

acid inspect ./data/two_mass
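
For intuition about what the --cone 50,-50,2 selection keeps, here is a minimal sketch of a cone test in plain Python. It assumes the three values are RA, Dec, and radius, all in degrees; the radius unit is an assumption and is not stated above. The helpers angular_separation_deg and in_cone are hypothetical names for illustration, not part of acid, and the positions are made up.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations.
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(ra, dec, center_ra=50.0, center_dec=-50.0, radius_deg=2.0):
    """True if (ra, dec) lies within radius_deg of the cone center.

    The default radius of 2 degrees is an assumed reading of --cone 50,-50,2.
    """
    return angular_separation_deg(ra, dec, center_ra, center_dec) <= radius_deg


# Made-up positions, for illustration only.
print(in_cone(50.0, -50.0))  # the cone center itself
print(in_cone(51.0, -50.5))  # roughly 0.8 degrees from the center
print(in_cone(50.0, -53.0))  # 3 degrees south of the center
```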

2. Run your first query

Open Python, point acid.sql at the directory, and ask a question:

import acid

r = acid.sql(
    "SELECT COUNT(*) AS n FROM two_mass",
    catalogs="./data/",         # directory of HATS catalogs (auto-discovered)
)

print(r.df())
#       n
# 0  101324

That's it. acid.sql(...) returns a Result; .df() gives you a pandas DataFrame.

3. Filter and project

r = acid.sql(
    """
    SELECT designation, ra, dec, j_m
    FROM   two_mass
    WHERE  j_m < 14.0
    ORDER BY j_m
    LIMIT 10
    """,
    catalogs="./data/",
)

r.df()

The query is plain SQL. acid reads only the columns you asked for (designation, ra, dec, j_m) from disk, even though the catalog has 60 more columns.
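
Because the query is plain SQL, its shape can be tried without acid or any downloaded data. The sketch below runs the same SELECT / WHERE / ORDER BY / LIMIT against a toy in-memory SQLite table. The rows are invented for illustration, and SQLite stands in only to show what the query returns; it says nothing about how acid executes it.

```python
import sqlite3

# Toy rows (designation, ra, dec, j_m); values are invented for illustration.
rows = [
    ("A", 50.10, -50.20, 15.2),
    ("B", 49.80, -49.90, 12.7),
    ("C", 50.40, -50.60, 13.9),
    ("D", 50.00, -50.00, 9.3),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE two_mass (designation TEXT, ra REAL, dec REAL, j_m REAL)")
con.executemany("INSERT INTO two_mass VALUES (?, ?, ?, ?)", rows)

# Same query text as in the step above, run against the toy table.
result = con.execute(
    """
    SELECT designation, ra, dec, j_m
    FROM   two_mass
    WHERE  j_m < 14.0
    ORDER BY j_m
    LIMIT 10
    """
).fetchall()

# Rows fainter than j_m = 14.0 are dropped; the rest come back brightest first.
for row in result:
    print(row)
```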

4. Crossmatch two catalogs

Download a second catalog over the same region:

acid download \
    https://data.lsdb.io/hats/gaia_dr3/gaia \
    ./data/gaia \
    --cone 50,-50,2

Now ask: "for every Gaia source, find any 2MASS source within 1 arcsec":

r = acid.sql(
    """
    SELECT g.source_id,
           t.designation,
           XMATCH_DISTANCE(t) AS d_arcsec
    FROM   gaia     AS g
    JOIN   two_mass AS t ON XMATCH(radius_arcsec => 1.0)
    ORDER BY d_arcsec
    LIMIT  20
    """,
    catalogs="./data/",
    workers=4,
)

r.df()

Three things to note:

  • XMATCH(radius_arcsec => 1.0) is the only non-standard join syntax. It matches sources by angular separation on the sky, within 1 arcsec.
  • XMATCH_DISTANCE(t) exposes the match distance in arcsec; use it in SELECT, WHERE, ORDER BY — wherever.
  • workers=4 runs four partitions in parallel. Bump it on bigger boxes.
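
To make "spherical matching at 1 arcsec" concrete, here is a brute-force sketch in plain Python. It is not how acid implements XMATCH; the function names and the toy coordinates are hypothetical. It returns every pair within the radius, which is one reading of "any 2MASS source within 1 arcsec"; whether acid returns all matches or only the nearest is not stated here.

```python
import math

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0


def unit_vector(ra_deg, dec_deg):
    """Convert a sky position in degrees to a 3D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def separation_arcsec(a, b):
    """Angular separation in arcsec between two unit vectors."""
    chord = math.dist(a, b)
    # For unit vectors, chord = 2 * sin(theta / 2).
    return 2.0 * math.asin(min(1.0, chord / 2.0)) * ARCSEC_PER_RAD


def crossmatch(left, right, radius_arcsec=1.0):
    """Return (left_id, right_id, distance_arcsec) for all pairs within the radius.

    left, right: lists of (id, ra_deg, dec_deg). Output is sorted by distance.
    """
    right_vecs = [(rid, unit_vector(ra, dec)) for rid, ra, dec in right]
    pairs = []
    for lid, ra, dec in left:
        v = unit_vector(ra, dec)
        for rid, rv in right_vecs:
            d = separation_arcsec(v, rv)
            if d <= radius_arcsec:
                pairs.append((lid, rid, d))
    return sorted(pairs, key=lambda p: p[2])


# Toy catalogs with invented positions: t1 sits 0.5 arcsec from g1,
# t2 sits 5 arcsec from g2.
gaia = [("g1", 50.0, -50.0), ("g2", 50.5, -49.5)]
two_mass = [("t1", 50.0, -50.0 + 0.5 / 3600), ("t2", 50.5, -49.5 + 5.0 / 3600)]

for row in crossmatch(gaia, two_mass, radius_arcsec=1.0):
    print(row)
```

A double loop like this scales with the product of the two catalog sizes, so it only suits toy inputs.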

5. Save the result

To get a single Parquet file rather than a partitioned catalog:

r.write_parquet("crossmatch.parquet", layout="single")

Or write a full HATS catalog you can hand to LSDB:

acid.run(
    "SELECT g.source_id, t.designation FROM gaia AS g "
    "JOIN two_mass AS t ON XMATCH(radius_arcsec => 1.0)",
    catalogs="./data/",
    output="./out/gaia_x_2mass/",
    workers=4,
)

Then open it back up:

import hats
cat = hats.read_hats("./out/gaia_x_2mass/")

Where to next?

You've already seen the core API: acid.sql(query, catalogs=..., workers=...). Three good directions:

  • Your first crossmatch — a notebook that turns the example above into a small science story with plots.
  • Writing queries — the full SQL dialect: what joins, aggregates, and predicates acid supports.
  • Concepts — what HATS, HEALPix, margin caches, and MOCs are, and why they matter.

Want it faster?

r.df() returns a pandas DataFrame because that's what most astronomers reach for. For anything heavier than .head() on multi-million-row results (group-by, filtering, joins), r.to_polars() gives you a Polars DataFrame, where those operations typically run 5–50× faster than in pandas and read just as easily. You can always convert back with polars_df.to_pandas().