Verbs or SQL — which should I use?¶
You're writing a query. ACID gives you two surfaces — the fluent
Catalog verbs and acid.sql.query(...) — and they produce the same query
under the hood. So the choice is ergonomic, except for a handful
of shapes where it is genuinely capability: a few things exist only
in SQL, and a few things neither surface can express. This page is
the decision aid.
TL;DR¶
- If in doubt, fluent. It composes, autocompletes, and chains
naturally into
.save(...)/.to_pandas(). - SQL-only: inline subquery / CTE pre-filter on a JOIN's right side. Everything else fluent can express.
- Neither: window functions, non-decomposable aggregates
(
MEDIAN,MODE,COUNT(DISTINCT)),SELECT DISTINCT.
The full picture is in the two tables below.
Decision table¶
When both surfaces support a shape, the choice is ergonomic — pick the one your downstream code already uses.
| Want to do … | Fluent (Catalog) |
acid.sql.query(...) |
Use this one |
|---|---|---|---|
Filter on row columns (mag < 18, flags = 0) |
.where("...") |
WHERE ... |
Fluent |
| Pick / rename columns | .select(...) |
SELECT ... |
Fluent |
| Crossmatch by position | .crossmatch(other, radius=...) |
JOIN b ON XMATCH(r => ...) |
Either; fluent preferred for chains |
| Ordinary equi-join on an integer ID | .join(other, on=...) |
JOIN ... ON a.id = b.id |
Either |
Decomposable aggregates (COUNT, SUM, AVG, MIN, MAX, STDDEV, VARIANCE, BOOL_AND, BOOL_OR) |
.group_by(...).aggregate(...) + acid.agg.* |
GROUP BY ... + COUNT(...) etc. |
Either; fluent gives flat output names |
Top-K (ORDER BY ... LIMIT K) |
.sort(...).limit(K) |
ORDER BY ... LIMIT K |
Either |
| MOC footprint restriction | .in_region(name_or_path_or_moc) |
WHERE IN_MOC(<alias>, '<name>') |
Either |
| Inline subquery / CTE pre-filter on a JOIN's right side | not supported | JOIN (SELECT * FROM b WHERE ...) AS x ON XMATCH(...) |
SQL only |
| Catch typos with autocomplete | yes (Python objects) | no (string) | Fluent |
| Paste a SQL string from a notebook / colleague | translate it | run it | SQL |
Catalog.save(path) to write a HATS tree and re-register it |
yes | not directly (use r.save(path)) |
Fluent |
What neither surface helps with¶
A few SQL shapes are rejected by the analyzer on both surfaces. These are not a choice between fluent and SQL — neither runs them. Each links to the canonical error message in the errors reference.
| Shape | What happens |
|---|---|
Window functions (ROW_NUMBER() OVER (PARTITION BY ...)) |
Rejected — partition-local architecture; no window can see across partitions |
Non-decomposable aggregates (MEDIAN, MODE, percentiles, COUNT(DISTINCT)) |
Rejected — aggregate down with decomposable verbs and finish in Polars / pandas |
SELECT DISTINCT, bare GROUP BY, unbounded ORDER BY |
Rejected — rewrite as a real aggregate or top-K, or unique on the materialized result |
Compound ON XMATCH(...) AND ... |
Neither — extra predicates go in WHERE |
The literal error strings, and the rewrites, are in
reference/errors.md — Aggregates
and reference/errors.md — Crossmatch / XMATCH.
Why the choice is mostly ergonomic¶
Both surfaces compile to the same engine-neutral query plan, run on
the same partition-walking executor, and apply the same per-partition
predicate / column pushdown. A .crossmatch(b, radius=1 * u.arcsec)
and an ON XMATCH(radius_arcsec => 1.0) are the same query.
The places this isn't purely ergonomic are the cases the analyzer explicitly enforces:
- Inline subqueries / CTE pre-filters (the
(SELECT * FROM <cat> WHERE ...) AS aliasshape in FROM / JOIN-RHS positions) only work throughacid.sql.query(...). The fluent surface has no equivalent verb. See the SQL escape hatch for the supported shape and predicate grammar. - Compound
ONpredicates (XMATCH(...) AND a.mag < 20) are rejected with aValidationErrorregardless of surface — move the extra predicate to aWHEREclause. - Window functions are rejected with a
ValidationErrorregardless of surface. There is no port for them; the architecture is partition-local, and a window function needs to see neighbors across partitions. - Non-decomposable aggregates (
MEDIAN,MODE, percentiles,COUNT(DISTINCT),SELECT DISTINCT, bareGROUP BY, unboundedORDER BY) are rejected with aValidationErrorregardless of surface. See the aggregation guide's "Why noagg.median?" section for the why.
And one place fluent is uniquely nice:
- The
acid.aggconstructors (agg.count(),agg.mean("mag")) give you aggregate names with Python autocompletion and let you pass the output column name as a keyword —aggregate(n=agg.count(), mean_mag=agg.mean("mag"))produces output columns namednandmean_magwithout writing anASclause for each. - Fluent output names are flat —
id,id_b,ra,ra_b— even after a crossmatch. The SQL surface still understandsalias.colsyntax inWHERE/SELECT, which is more familiar if you're translating an existing query, but the column names you see in the result are the same flat ones either way.
Worked example — the same crossmatch in both surfaces¶
The "shortest crossmatch" from the crossmatch guide,
in both surfaces, side by side. Fixtures a and b are the test
catalogs (b is a shifted by 0.3″ in RA); substitute gaia_dr3 and
twomass_psc for real data.
Both produce the same rows and the same column names — a's columns,
b's columns with _b collision suffix where they clash, and the
d_arcsec separation column in arcseconds. Use whichever surface fits
the rest of your code.
Mixing surfaces in one flow¶
You can — and often will — mix the two. The fluent surface returns a
Catalog; the SQL surface returns a Result. Two common shapes:
Build with fluent, hand off to SQL. Save a derived catalog with
.save(...), then write SQL against the saved name.
hot = (acid.open("rubin_object")
.where("g_mean - r_mean < -0.2")
.save("/scratch/hot_objects", name="hot_objects"))
r = acid.sql.query("""
SELECT hot_objects.id, lc.mjd, lc.flux
FROM hot_objects
JOIN forced_source AS lc ON hot_objects.id = lc.object_id
""")
Build with SQL, finish with pandas / Polars. Reach for SQL when
you need one of its unique shapes (an inline subquery on the RHS, a
window function paired with a manual aggregation), materialize the
Result, then continue in Polars or pandas:
r = acid.sql.query("""
SELECT a.id, b.id AS b_id, d_arcsec
FROM (SELECT * FROM gaia_dr3 WHERE phot_g_mean_mag < 16) AS a
JOIN (SELECT * FROM twomass_psc WHERE j_m < 14) AS b
ON XMATCH(radius_arcsec => 1.0, dist_col => 'd_arcsec')
""")
# The inline-subquery pre-filter shape is SQL-only; finish in Polars.
df = r.to_polars()
A single ACID query is the unit of partition-parallel work. Use the
surface that makes that single query easiest to read; switch
between them at materialization boundaries (.save(...),
r.to_polars()).
See also¶
- Crossmatching catalogs — the canonical place to see both surfaces side by side in a longer worked example.
- Writing SQL (the escape hatch) — the full
acid.sqlpage: inline subqueries / CTEs, the rejection rules, the worked cases where SQL is the right reach. - SQL features — the canonical "what does the analyzer accept and what does it reject" table.
- Debug small, run big — the other concept page; orthogonal to the surface choice (works the same on either).