Skip to content

Verbs or SQL — which should I use?

You're writing a query. ACID gives you two surfaces — the fluent Catalog verbs and acid.sql.query(...) — and they produce the same query under the hood. So the choice is ergonomic, except for a handful of shapes where it is genuinely capability: a few things exist only in SQL, and a few things neither surface can express. This page is the decision aid.

TL;DR

  • If in doubt, fluent. It composes, autocompletes, and chains naturally into .save(...) / .to_pandas().
  • SQL-only: inline subquery / CTE pre-filter on a JOIN's right side. Everything else fluent can express.
  • Neither: window functions, non-decomposable aggregates (MEDIAN, MODE, COUNT(DISTINCT)), SELECT DISTINCT.

The full picture is in the two tables below.

Decision table

When both surfaces support a shape, the choice is ergonomic — pick the one your downstream code already uses.

Want to do … Fluent (Catalog) acid.sql.query(...) Use this one
Filter on row columns (mag < 18, flags = 0) .where("...") WHERE ... Fluent
Pick / rename columns .select(...) SELECT ... Fluent
Crossmatch by position .crossmatch(other, radius=...) JOIN b ON XMATCH(r => ...) Either; fluent preferred for chains
Ordinary equi-join on an integer ID .join(other, on=...) JOIN ... ON a.id = b.id Either
Decomposable aggregates (COUNT, SUM, AVG, MIN, MAX, STDDEV, VARIANCE, BOOL_AND, BOOL_OR) .group_by(...).aggregate(...) + acid.agg.* GROUP BY ... + COUNT(...) etc. Either; fluent gives flat output names
Top-K (ORDER BY ... LIMIT K) .sort(...).limit(K) ORDER BY ... LIMIT K Either
MOC footprint restriction .in_region(name_or_path_or_moc) WHERE IN_MOC(<alias>, '<name>') Either
Inline subquery / CTE pre-filter on a JOIN's right side not supported JOIN (SELECT * FROM b WHERE ...) AS x ON XMATCH(...) SQL only
Catch typos with autocomplete yes (Python objects) no (string) Fluent
Paste a SQL string from a notebook / colleague translate it run it SQL
Catalog.save(path) to write a HATS tree and re-register it yes not directly (use r.save(path)) Fluent

What neither surface helps with

A few SQL shapes are rejected by the analyzer on both surfaces. These are not a choice between fluent and SQL — neither runs them. Each links to the canonical error message in the errors reference.

Shape What happens
Window functions (ROW_NUMBER() OVER (PARTITION BY ...)) Rejected — partition-local architecture; no window can see across partitions
Non-decomposable aggregates (MEDIAN, MODE, percentiles, COUNT(DISTINCT)) Rejected — aggregate down with decomposable verbs and finish in Polars / pandas
SELECT DISTINCT, bare GROUP BY, unbounded ORDER BY Rejected — rewrite as a real aggregate or top-K, or unique on the materialized result
Compound ON XMATCH(...) AND ... Neither — extra predicates go in WHERE

The literal error strings, and the rewrites, are in reference/errors.md — Aggregates and reference/errors.md — Crossmatch / XMATCH.

Why the choice is mostly ergonomic

Both surfaces compile to the same engine-neutral query plan, run on the same partition-walking executor, and apply the same per-partition predicate / column pushdown. A .crossmatch(b, radius=1 * u.arcsec) and an ON XMATCH(radius_arcsec => 1.0) are the same query.

The places this isn't purely ergonomic are the cases the analyzer explicitly enforces:

  • Inline subqueries / CTE pre-filters (the (SELECT * FROM <cat> WHERE ...) AS alias shape in FROM / JOIN-RHS positions) only work through acid.sql.query(...). The fluent surface has no equivalent verb. See the SQL escape hatch for the supported shape and predicate grammar.
  • Compound ON predicates (XMATCH(...) AND a.mag < 20) are rejected with a ValidationError regardless of surface — move the extra predicate to a WHERE clause.
  • Window functions are rejected with a ValidationError regardless of surface. There is no port for them; the architecture is partition-local, and a window function needs to see neighbors across partitions.
  • Non-decomposable aggregates (MEDIAN, MODE, percentiles, COUNT(DISTINCT), SELECT DISTINCT, bare GROUP BY, unbounded ORDER BY) are rejected with a ValidationError regardless of surface. See the aggregation guide's "Why no agg.median?" section for the why.

And one place fluent is uniquely nice:

  • The acid.agg constructors (agg.count(), agg.mean("mag")) give you aggregate names with Python autocompletion and let you pass the output column name as a keyword — aggregate(n=agg.count(), mean_mag=agg.mean("mag")) produces output columns named n and mean_mag without writing an AS clause for each.
  • Fluent output names are flatid, id_b, ra, ra_b — even after a crossmatch. The SQL surface still understands alias.col syntax in WHERE / SELECT, which is more familiar if you're translating an existing query, but the column names you see in the result are the same flat ones either way.

Worked example — the same crossmatch in both surfaces

The "shortest crossmatch" from the crossmatch guide, in both surfaces, side by side. Fixtures a and b are the test catalogs (b is a shifted by 0.3″ in RA); substitute gaia_dr3 and twomass_psc for real data.

crossmatch_fluent.py
import acid
import astropy.units as u

acid.init("catalogs.yaml", workers=8)
a = acid.open("a")
b = acid.open("b")
matched = a.crossmatch(b, radius=1 * u.arcsec, dist_col="d_arcsec")

df = matched.to_astropy()
crossmatch_sql.py
import acid

acid.init("catalogs.yaml", workers=8)
r = acid.sql.query("""
    SELECT a.*, b.*, d_arcsec
    FROM   a
    JOIN   b ON XMATCH(radius_arcsec => 1.0, dist_col => 'd_arcsec')
""")
df = r.to_polars()

Both produce the same rows and the same column names — a's columns, b's columns with _b collision suffix where they clash, and the d_arcsec separation column in arcseconds. Use whichever surface fits the rest of your code.

Mixing surfaces in one flow

You can — and often will — mix the two. The fluent surface returns a Catalog; the SQL surface returns a Result. Two common shapes:

Build with fluent, hand off to SQL. Save a derived catalog with .save(...), then write SQL against the saved name.

hot = (acid.open("rubin_object")
         .where("g_mean - r_mean < -0.2")
         .save("/scratch/hot_objects", name="hot_objects"))

r = acid.sql.query("""
    SELECT hot_objects.id, lc.mjd, lc.flux
    FROM   hot_objects
    JOIN   forced_source AS lc ON hot_objects.id = lc.object_id
""")

Build with SQL, finish with pandas / Polars. Reach for SQL when you need one of its unique shapes (an inline subquery on the RHS, a window function paired with a manual aggregation), materialize the Result, then continue in Polars or pandas:

r = acid.sql.query("""
    SELECT a.id, b.id AS b_id, d_arcsec
    FROM   (SELECT * FROM gaia_dr3 WHERE phot_g_mean_mag < 16) AS a
    JOIN   (SELECT * FROM twomass_psc WHERE j_m < 14) AS b
        ON XMATCH(radius_arcsec => 1.0, dist_col => 'd_arcsec')
""")

# The inline-subquery pre-filter shape is SQL-only; finish in Polars.
df = r.to_polars()

A single ACID query is the unit of partition-parallel work. Use the surface that makes that single query easiest to read; switch between them at materialization boundaries (.save(...), r.to_polars()).

See also

  • Crossmatching catalogs — the canonical place to see both surfaces side by side in a longer worked example.
  • Writing SQL (the escape hatch) — the full acid.sql page: inline subqueries / CTEs, the rejection rules, the worked cases where SQL is the right reach.
  • SQL features — the canonical "what does the analyzer accept and what does it reject" table.
  • Debug small, run big — the other concept page; orthogonal to the surface choice (works the same on either).