Quickstart
In five minutes, you'll download a real HATS catalog, run a SQL query against it, and look at the result in pandas. No theory, no config files — just commands you can paste.
You'll need acid installed; see Installation.
1. Download a small catalog
Grab a cone-shaped subset of the 2MASS point-source catalog from the
LSDB mirror — about 100,000 sources around (RA, Dec) = (50°, -50°):
./data/two_mass/ now contains a valid HATS catalog. You can poke at it
with acid inspect:
2. Run your first query
Open Python, point acid.sql at the directory, and ask a question:
import acid
r = acid.sql(
    "SELECT COUNT(*) AS n FROM two_mass",
    catalogs="./data/",  # directory of HATS catalogs (auto-discovered)
)
print(r.df())
#         n
# 0  101324
That's it. acid.sql(...) returns a Result; .df() gives you a
pandas DataFrame.
3. Filter and project
r = acid.sql(
    """
    SELECT designation, ra, dec, j_m
    FROM two_mass
    WHERE j_m < 14.0
    ORDER BY j_m
    LIMIT 10
    """,
    catalogs="./data/",
)
r.df()
The query is plain SQL. acid reads only the columns you asked for
(designation, ra, dec, j_m) from disk, even though the catalog
has 60 more columns.
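Because the query is plain SQL, you can try the same filter-and-project statement without acid at all. The sketch below runs it with Python's built-in sqlite3 module against a handful of made-up rows. The designations, coordinates, magnitudes, and the extra h_m column are invented for illustration and are not real 2MASS data, and SQLite stands in for acid here, so nothing in it shows how acid reads columns from disk.

```python
import sqlite3

# Made-up rows standing in for a few catalog sources (NOT real 2MASS data).
# h_m is an extra column that the query below never asks for.
rows = [
    ("src-a", 50.01, -50.02, 13.2, 12.9),
    ("src-b", 50.10, -49.95, 15.1, 14.8),
    ("src-c", 49.90, -50.20, 11.7, 11.3),
    ("src-d", 50.30, -50.10, 13.9, 13.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE two_mass "
    "(designation TEXT, ra REAL, dec REAL, j_m REAL, h_m REAL)"
)
conn.executemany("INSERT INTO two_mass VALUES (?, ?, ?, ?, ?)", rows)

# Same statement as in the quickstart: filter on j_m, keep four columns,
# sort brightest first, cap the row count.
query = """
    SELECT designation, ra, dec, j_m
    FROM two_mass
    WHERE j_m < 14.0
    ORDER BY j_m
    LIMIT 10
"""
cursor = conn.execute(query)
columns = [col[0] for col in cursor.description]
result = cursor.fetchall()

print(columns)
for row in result:
    print(row)
```

Only the four selected columns come back, and the row with j_m above 14.0 is dropped.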
4. Crossmatch two catalogs
Download a second catalog over the same region:
Now ask: "for every Gaia source, find any 2MASS source within 1 arcsec":
r = acid.sql(
    """
    SELECT g.source_id,
           t.designation,
           XMATCH_DISTANCE(t) AS d_arcsec
    FROM gaia AS g
    JOIN two_mass AS t ON XMATCH(radius_arcsec => 1.0)
    ORDER BY d_arcsec
    LIMIT 20
    """,
    catalogs="./data/",
    workers=4,
)
r.df()
Three things to note:
- XMATCH(radius_arcsec => 1.0) is the only non-standard syntax. It asks for spherical matching at 1 arcsec.
- XMATCH_DISTANCE(t) exposes the match distance in arcsec; use it in SELECT, WHERE, ORDER BY, wherever.
- workers=4 runs four partitions in parallel. Bump it on bigger boxes.
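To make "spherical matching at 1 arcsec" concrete, here is a minimal pure-Python sketch of the idea: compute the great-circle separation between every pair of sources and keep the pairs within the radius. It is a brute-force illustration only and is not how acid implements XMATCH. The function names separation_arcsec and brute_force_xmatch and the coordinates are hypothetical, chosen for this example.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two (RA, Dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for the tiny angles
    # that are typical of crossmatching.
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0


def brute_force_xmatch(left, right, radius_arcsec):
    """Return (left_id, right_id, distance_arcsec) for every pair within the radius."""
    matches = []
    for left_id, left_ra, left_dec in left:
        for right_id, right_ra, right_dec in right:
            d = separation_arcsec(left_ra, left_dec, right_ra, right_dec)
            if d <= radius_arcsec:
                matches.append((left_id, right_id, d))
    # Closest pairs first, like ORDER BY d_arcsec in the query above.
    return sorted(matches, key=lambda m: m[2])


# Invented coordinates: t1 sits 0.5 arcsec from g1, t2 sits 5 arcsec from g2.
gaia = [("g1", 50.0, -50.0), ("g2", 50.1, -50.0)]
two_mass = [
    ("t1", 50.0, -50.0 + 0.5 / 3600),
    ("t2", 50.1, -50.0 + 5.0 / 3600),
]

matches = brute_force_xmatch(gaia, two_mass, radius_arcsec=1.0)
for left_id, right_id, d in matches:
    print(left_id, right_id, round(d, 3))
```

Only the g1/t1 pair survives the 1 arcsec cut. The nested loop compares every source against every other, which is fine for a toy example but scales quadratically with catalog size.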
5. Save the result
To get a single Parquet file (not a partitioned catalog):
Or write a full HATS catalog you can hand to LSDB:
acid.run(
    "SELECT g.source_id, t.designation FROM gaia AS g "
    "JOIN two_mass AS t ON XMATCH(radius_arcsec => 1.0)",
    catalogs="./data/",
    output="./out/gaia_x_2mass/",
    workers=4,
)
Then open it back up:
Where to next?
You've already seen the core API: acid.sql(query, catalogs=...,
workers=...). Three good directions:
- Your first crossmatch: a notebook that turns the example above into a small science story with plots.
- Writing queries: the full SQL dialect, covering which joins, aggregates, and predicates acid supports.
- Concepts: what HATS, HEALPix, margin caches, and MOCs are, and why they matter.
Want it faster?
r.df() returns a pandas DataFrame because that's what most
astronomers reach for. For anything heavier than .head() on
multi-million-row results — group-by, filtering, joins —
r.to_polars() is typically 5–50× faster than pandas and just
as easy to read. You can always come back with
polars_df.to_pandas().