Verbs or SQL — which should I use?¶

You're writing a query. ACID gives you two surfaces — the fluent Catalog verbs and acid.sql.query(...) — and they produce the same query under the hood. So the choice is ergonomic, except for a handful of shapes where it is genuinely capability: a few things exist only in SQL, and a few things neither surface can express. This page is the decision aid.

TL;DR¶

If in doubt, fluent. It composes, autocompletes, and chains naturally into .save(...) / .to_pandas().
SQL-only: inline subquery / CTE pre-filter on a JOIN's right side. Everything else fluent can express.
Neither: window functions, non-decomposable aggregates (MEDIAN, MODE, COUNT(DISTINCT)), SELECT DISTINCT.

The full picture is in the two tables below.

Decision table¶

When both surfaces support a shape, the choice is ergonomic — pick the one your downstream code already uses.

Want to do …	Fluent (`Catalog`)	`acid.sql.query(...)`	Use this one
Filter on row columns (`mag < 18`, `flags = 0`)	`.where("...")`	`WHERE ...`	Fluent
Pick / rename columns	`.select(...)`	`SELECT ...`	Fluent
Crossmatch by position	`.crossmatch(other, radius=...)`	`JOIN b ON XMATCH(r => ...)`	Either; fluent preferred for chains
Ordinary equi-join on an integer ID	`.join(other, on=...)`	`JOIN ... ON a.id = b.id`	Either
Decomposable aggregates (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `STDDEV`, `VARIANCE`, `BOOL_AND`, `BOOL_OR`)	`.group_by(...).aggregate(...)` + `acid.agg.*`	`GROUP BY ...` + `COUNT(...)` etc.	Either; fluent gives flat output names
Top-K (`ORDER BY ... LIMIT K`)	`.sort(...).limit(K)`	`ORDER BY ... LIMIT K`	Either
MOC footprint restriction	`.in_region(name_or_path_or_moc)`	`WHERE IN_MOC(<alias>, '<name>')`	Either
Inline subquery / CTE pre-filter on a JOIN's right side	not supported	`JOIN (SELECT * FROM b WHERE ...) AS x ON XMATCH(...)`	SQL only
Catch typos with autocomplete	yes (Python objects)	no (string)	Fluent
Paste a SQL string from a notebook / colleague	translate it	run it	SQL
`Catalog.save(path)` to write a HATS tree and re-register it	yes	not directly (use `r.save(path)`)	Fluent

What neither surface helps with¶

A few SQL shapes are rejected by the analyzer on both surfaces. These are not a choice between fluent and SQL — neither runs them. Each links to the canonical error message in the errors reference.

Shape	What happens
Window functions (`ROW_NUMBER() OVER (PARTITION BY ...)`)	Rejected — partition-local architecture; no window can see across partitions
Non-decomposable aggregates (`MEDIAN`, `MODE`, percentiles, `COUNT(DISTINCT)`)	Rejected — aggregate down with decomposable verbs and finish in Polars / pandas
`SELECT DISTINCT`, bare `GROUP BY`, unbounded `ORDER BY`	Rejected — rewrite as a real aggregate or top-K, or unique on the materialized result
Compound `ON XMATCH(...) AND ...`	Neither — extra predicates go in `WHERE`

The literal error strings, and the rewrites, are in reference/errors.md — Aggregates and reference/errors.md — Crossmatch / XMATCH.

Why the choice is mostly ergonomic¶

Both surfaces compile to the same engine-neutral query plan, run on the same partition-walking executor, and apply the same per-partition predicate / column pushdown. A .crossmatch(b, radius=1 * u.arcsec) and an ON XMATCH(radius_arcsec => 1.0) are the same query.

The places this isn't purely ergonomic are the cases the analyzer explicitly enforces:

Inline subqueries / CTE pre-filters (the (SELECT * FROM <cat> WHERE ...) AS alias shape in FROM / JOIN-RHS positions) only work through acid.sql.query(...). The fluent surface has no equivalent verb. See the SQL escape hatch for the supported shape and predicate grammar.
Compound ON predicates (XMATCH(...) AND a.mag < 20) are rejected with a ValidationError regardless of surface — move the extra predicate to a WHERE clause.
Window functions are rejected with a ValidationError regardless of surface. There is no port for them; the architecture is partition-local, and a window function needs to see neighbors across partitions.
Non-decomposable aggregates (MEDIAN, MODE, percentiles, COUNT(DISTINCT), SELECT DISTINCT, bare GROUP BY, unbounded ORDER BY) are rejected with a ValidationError regardless of surface. See the aggregation guide's "Why no agg.median?" section for the why.

And one place fluent is uniquely nice:

The acid.agg constructors (agg.count(), agg.mean("mag")) give you aggregate names with Python autocompletion and let you pass the output column name as a keyword — aggregate(n=agg.count(), mean_mag=agg.mean("mag")) produces output columns named n and mean_mag without writing an AS clause for each.
Fluent output names are flat — id, id_b, ra, ra_b — even after a crossmatch. The SQL surface still understands alias.col syntax in WHERE / SELECT, which is more familiar if you're translating an existing query, but the column names you see in the result are the same flat ones either way.

Worked example — the same crossmatch in both surfaces¶

The "shortest crossmatch" from the crossmatch guide, in both surfaces, side by side. Fixtures a and b are the test catalogs (b is a shifted by 0.3″ in RA); substitute gaia_dr3 and twomass_psc for real data.

FluentSQL

crossmatch_fluent.py

import acid
import astropy.units as u

acid.init("catalogs.yaml", workers=8)
a = acid.open("a")
b = acid.open("b")
matched = a.crossmatch(b, radius=1 * u.arcsec, dist_col="d_arcsec")

df = matched.to_astropy()

crossmatch_sql.py

import acid

acid.init("catalogs.yaml", workers=8)
r = acid.sql.query("""
    SELECT a.*, b.*, d_arcsec
    FROM   a
    JOIN   b ON XMATCH(radius_arcsec => 1.0, dist_col => 'd_arcsec')
""")
df = r.to_polars()

Both produce the same rows and the same column names — a's columns, b's columns with _b collision suffix where they clash, and the d_arcsec separation column in arcseconds. Use whichever surface fits the rest of your code.

Mixing surfaces in one flow¶

You can — and often will — mix the two. The fluent surface returns a Catalog; the SQL surface returns a Result. Two common shapes:

Build with fluent, hand off to SQL. Save a derived catalog with .save(...), then write SQL against the saved name.

hot = (acid.open("rubin_object")
         .where("g_mean - r_mean < -0.2")
         .save("/scratch/hot_objects", name="hot_objects"))

r = acid.sql.query("""
    SELECT hot_objects.id, lc.mjd, lc.flux
    FROM   hot_objects
    JOIN   forced_source AS lc ON hot_objects.id = lc.object_id
""")

Build with SQL, finish with pandas / Polars. Reach for SQL when you need one of its unique shapes (an inline subquery on the RHS, a window function paired with a manual aggregation), materialize the Result, then continue in Polars or pandas:

r = acid.sql.query("""
    SELECT a.id, b.id AS b_id, d_arcsec
    FROM   (SELECT * FROM gaia_dr3 WHERE phot_g_mean_mag < 16) AS a
    JOIN   (SELECT * FROM twomass_psc WHERE j_m < 14) AS b
        ON XMATCH(radius_arcsec => 1.0, dist_col => 'd_arcsec')
""")

# The inline-subquery pre-filter shape is SQL-only; finish in Polars.
df = r.to_polars()

A single ACID query is the unit of partition-parallel work. Use the surface that makes that single query easiest to read; switch between them at materialization boundaries (.save(...), r.to_polars()).