SQL features¶
This page is the canonical "what does ACID's SQL surface accept, and
what does it reject" reference. It is what acid.sql.query(...) will run
without complaint and the analyzer-enforced rules it applies. Use it
when something raises and you want to know why; use the
SQL escape hatch guide for worked examples.
ACID's SQL is parsed by SQLGlot and the accepted subset is what ACID's analyzer accepts. Two extensions are unique to ACID:
XMATCH(radius_arcsec => ...)in aJOIN ... ONclause for spherical crossmatching.IN_MOC(<alias>, '<name>')as a per-row predicate restricting a query to a MOC footprint.
Both have rules about where they may appear; both are documented below.
At a glance¶
| Feature | Supported? | Notes |
|---|---|---|
SELECT columns / expressions |
yes | Including aliased expressions (floor(mag) AS bin). |
WHERE predicates (conjunctive, disjunctive) |
yes | Per-partition pushdown for column / cone / MOC predicates. |
GROUP BY (with a supported aggregate) |
yes | See Aggregates below. |
Bare GROUP BY (no aggregate) |
no | ValidationError. Use an aggregate or de-duplicate the result. |
HAVING |
yes | Over aggregate output / group keys. |
ORDER BY ... LIMIT K (top-K) |
yes | LIMIT is pushed to each partition. |
ORDER BY ... (no LIMIT) |
no | ValidationError. Add a LIMIT, or sort the materialized result. |
LIMIT K, OFFSET |
yes | |
SELECT DISTINCT |
no | ValidationError. .to_polars().unique() / GROUP BY + aggregate. |
COUNT(DISTINCT col) (and other DISTINCT aggregates) |
no | ValidationError. .to_polars().n_unique(). |
INNER JOIN ... ON a.x = b.x (integer-ID equi-join) |
yes | Both keys integer ID columns. The right catalog must be localized (a _healpix_29 column + a margin cache); see the SQL escape hatch. |
LEFT JOIN ... ON a.x = b.x |
yes | |
LEFT JOIN ... ON XMATCH(...) |
yes | |
RIGHT JOIN, FULL JOIN, CROSS JOIN |
no | ValidationError. Swap roles and use LEFT JOIN. |
XMATCH(...) mixed with other predicates in ON |
no | ValidationError (XMATCH must be the entire ON predicate). |
IN_MOC(<alias>, '<name>') in conjunctive WHERE |
yes | Optionally NOT IN_MOC(...). Fast partition / row-group pushdown. |
IN_MOC(...) in SELECT / ORDER BY / HAVING / CASE |
no | ValidationError. MOC membership is a restriction, not a per-row scalar. |
IN_MOC(...) inside OR (any disjunction) |
no | ValidationError. Restructure into conjunctive WHERE. |
IN_MOC(...) in JOIN ... ON |
no | ValidationError. Move to a top-level WHERE. |
Inline subquery in FROM / XMATCH JOIN RHS / ordinary JOIN RHS |
yes | One shape only: (SELECT * FROM <catalog> [WHERE <row-level predicate>]) AS alias. |
CTE (WITH ... AS (...)) in any of those positions |
yes | Same shape restriction as the inline subquery form. |
CTE-to-CTE references / WITH RECURSIVE |
no | ParseError. The CTE body's FROM must name a registered catalog. |
Decomposable aggregates (COUNT, SUM, AVG, MIN, MAX, STDDEV, VARIANCE, BOOL_AND, BOOL_OR) |
yes | Phase-1 partials, partial aggregation. |
Non-decomposable aggregates (MEDIAN, MODE, percentiles, STRING_AGG) |
no | ValidationError. Aggregate down decomposably, finish in Polars / pandas. |
Window functions (ROW_NUMBER() OVER (...)) |
no | ValidationError (Window functions are not supported across partitioned catalogs.). |
CASE WHEN ... THEN ... ELSE ... END |
yes | In SELECT, WHERE, HAVING. Not for IN_MOC (see above). |
| Arithmetic / comparison / boolean operators | yes | |
IS NULL / IS NOT NULL, BETWEEN, IN (...) |
yes |
The rest of this page expands each row that has rules attached.
The XMATCH extension¶
JOIN <right> ON XMATCH(radius_arcsec => 1.0)
JOIN <right> ON XMATCH(r => 1.0)
JOIN <right> ON XMATCH(r => 1.0, mode => 'all')
JOIN <right> ON XMATCH(r => 1.0, dist_col => 'sep')
LEFT JOIN <right> ON XMATCH(r => 1.0)
Allowed keyword arguments¶
| Keyword | Required | Type | Notes |
|---|---|---|---|
radius_arcsec |
yes | float | The match radius, in arcseconds. Alias: r. Must be positive. |
mode |
no | string | 'nearest' (default) or 'all'. Lower-case; case-insensitive. |
dist_col |
no | string | Output column for the great-circle separation, in arcseconds. |
Anything else is rejected:
ValidationError: unknown XMATCH keyword argument(s): ['k']
hint: supported keywords are radius_arcsec (alias r), mode, and dist_col.
Positional arguments and non-literal values are rejected at parse
time (only literal values are accepted in XMATCH() kwargs).
Allowed positions¶
XMATCH must be the entire ON predicate of a JOIN (or
LEFT JOIN). The analyzer rejects every other position.
| Position | Verdict |
|---|---|
JOIN b ON XMATCH(...) |
yes |
LEFT JOIN b ON XMATCH(...) |
yes |
JOIN b ON XMATCH(...) AND a.mag < 20 |
no — XMATCH must be the entire ON predicate; compound predicates like 'XMATCH(...) AND ...' are not supported. Move the extra term to WHERE. |
RIGHT JOIN, FULL JOIN, CROSS JOIN ... ON XMATCH(...) |
no — unsupported XMATCH join type 'RIGHT'; acid supports INNER (JOIN) and LEFT JOIN only. Swap sides. |
XMATCH(...) in SELECT / WHERE / etc. |
no — XMATCH is a join predicate, not a scalar function. |
Right-side requirements¶
For every XMATCH, the analyzer checks the right catalog's margin cache at analyze time, before any data is read:
ValidationError: XMATCH right table 'twomass_psc' has no neighbor_path (margin cache) configured
hint: build one with `acid hats build-margin <catalog>`.
ValidationError: XMATCH radius_arcsec=5.0 exceeds 'twomass_psc'`s neighbor_margin_arcsec=1.0; matches near partition boundaries would be silently missed
hint: rebuild the margin cache at a larger radius, or shrink XMATCH radius.
See the crossmatch guide for the why and the fix.
The right catalog must also expose ra_col / dec_col in the registry
(XMATCH right table 'x' has no ra_col/dec_col in the registry), as
must the anchor (anchor table 'a' has no ra_col/dec_col in the registry).
The IN_MOC extension¶
The first argument is a table alias in the query (the side the
predicate is restricting). The second is a string literal naming
either a registered MOC or a registered catalog (in the latter case
ACID auto-loads the catalog's point_map.fits footprint).
Allowed positions¶
IN_MOC(...) is a footprint restriction, not a per-row scalar. The
analyzer allows it in exactly one place:
- A top-level AND-ed term in a
WHEREclause, optionally negated withNOT(any even number ofNOTs is fine —NOT NOT IN_MOC(...)collapses to the positive predicate).
Every other position is rejected:
| Position | Verdict |
|---|---|
WHERE IN_MOC(a, 'des') |
yes |
WHERE IN_MOC(a, 'des') AND mag < 18 |
yes |
WHERE NOT IN_MOC(a, 'mask') |
yes |
WHERE IN_MOC(a, 'des') OR IN_MOC(a, 'delve') |
no — IN_MOC() inside WHERE must sit in a conjunctive position (top-level AND-chain, optionally negated with NOT). Found inside Or; move the predicate to the top-level WHERE or to a SELECT projection (which evaluates per-row). Move the disjunction inside a single combined MOC, or split into two queries and union the results downstream. |
SELECT IN_MOC(a, 'des') AS in_field |
no — IN_MOC() is only supported as a footprint restriction in WHERE (a top-level AND-ed term, optionally negated with NOT) or via the fluent .in_region(...) verb. It is not allowed in SELECT projections, ORDER BY, HAVING, or CASE. |
ORDER BY IN_MOC(a, 'des'), HAVING IN_MOC(...) |
no — same message as above. |
CASE WHEN IN_MOC(a, 'des') THEN ... END |
no — same message as above. |
JOIN b ON IN_MOC(a, 'des') |
no — IN_MOC() is not supported in JOIN ON predicates; move it to WHERE or a SELECT projection. |
IN_MOC(...) inside an inline subquery's WHERE |
no — IN_MOC(...) is not allowed inside an inline subquery's WHERE. Use a top-level WHERE; that path has the fast partition / row-group pruning for MOC predicates. |
The fluent equivalent is Catalog.in_region(...); see
Sky regions & footprints.
MOC name resolution¶
The second argument is resolved in this order:
- A MOC registered explicitly via
Connection.register_moc(name, source)or amocs:section in the YAML config. - A catalog name registered on the connection. The analyzer
lazily loads the catalog's
point_map.fitsfootprint under that name; subsequent uses are cached.
Unknown names raise IN_MOC() references unknown MOC '<name>' with a
"did you mean…?" suggestion drawn from the registered MOC and catalog
names.
Unknown aliases raise IN_MOC() references unknown table alias
'<alias>' (with a "did you mean…?" suggestion from the known
aliases).
Inline subqueries and CTE pre-filters¶
Three positions accept a restricted subquery / CTE form:
- the anchor
FROM, - an
XMATCH JOIN's right-hand side, - an ordinary
JOIN's right-hand side.
The accepted inner shape is exactly:
The WHERE clause is folded into ACID's per-partition pre-filter
plumbing. For XMATCH with mode => 'nearest' this gives "nearest
surviving match" semantics — the matcher only sees passing rows.
The full design rationale (the three-layer validation, why the
predicate grammar is restricted to a SQL ∩ Polars intersection) is in
the archive at docs/archive/SUBQUERY-RHS.md.
Restrictions on the inner SELECT¶
The inner SELECT must be SELECT *. The analyzer rejects (as
ParseError) every other shape:
| Inner clause / construct | Verdict |
|---|---|
SELECT id, ra, dec FROM ... |
no — inline subquery must be 'SELECT *'. Narrower projections are not supported in v1. |
SELECT * EXCEPT (col) FROM ... |
no — inline subquery must be plain 'SELECT *' — EXCLUDE / EXCEPT / REPLACE / RENAME are not supported. |
JOIN, GROUP BY, HAVING |
no — inline subquery cannot contain JOIN (and same for the others). |
ORDER BY, LIMIT, OFFSET |
no — inline subquery cannot contain ORDER BY (and same for the others). |
DISTINCT, QUALIFY, WINDOW, WITH |
no — same shape of error. |
| Nested subqueries / CTEs in body | no — inline subquery's FROM must be a single physical table. |
WITH RECURSIVE |
no — WITH RECURSIVE is not supported; CTEs in acid must reference a registered catalog directly. |
CTE body's FROM is another CTE |
no — CTE 'a' body's FROM ('b') is itself a CTE; CTE-to-CTE references are not supported. |
Restrictions on the inner WHERE¶
The inner WHERE is narrower than the outer one because it is lowered
through Polars' pl.sql_expr for parquet pushdown, which means it has
to stay in the subset Polars can push down to parquet row-group
filters. The allowlist:
| Construct | Allowed |
|---|---|
Column references (mag, flags, ra), literals (numeric / string / bool / null) |
yes |
Comparisons: =, <>, <, <=, >, >= |
yes |
Logical: AND, OR, NOT, parentheses |
yes |
Null-test: IS NULL, IS NOT NULL |
yes |
BETWEEN ... AND ... |
yes |
IN (<literal>, <literal>, ...) |
yes |
Arithmetic: +, -, *, /, %, unary - |
yes |
Function calls (floor(...), sqrt(...), anything else) |
no — function calls (<name>) are not allowed inside an inline subquery's WHERE. |
CAST(x AS type) / x::type |
no — CAST / ::type is not allowed inside an inline subquery's WHERE. |
LIKE, ILIKE, SIMILAR TO |
no — string-matching dialects diverge between engines. |
IN (<subquery>) |
no — IN (<subquery>) is not allowed inside an inline subquery's WHERE; use IN (<literal>, ...) instead. |
| Subqueries | no |
IN_MOC(...) (it has its own fast path at the outer WHERE) |
no |
| Window functions / aggregates | no |
| Correlated references to outer aliases | no — column qualifier 'x' does not refer to this subquery's inner scope. |
The outer (top-level) WHERE has no such allowlist — it runs after the data is in memory and accepts the full SQLGlot-parsed expression set.
If you need a wider inner shape — a narrower projection, a JOIN, an
aggregate — materialize the intermediate with Catalog.save(...),
register it on the connection, and reference it as a plain table in
the outer query.
Aggregates¶
The full task-shaped story is in the aggregation guide. The reference summary:
Decomposable (supported)¶
COUNT, SUM, MIN, MAX, AVG, STDDEV / STDDEV_POP,
VARIANCE / VARIANCE_POP, BOOL_AND, BOOL_OR, plus
BIT_AND / BIT_OR / BIT_XOR as their bitwise-aggregate aliases.
These run as partial aggregation: phase 1 computes per-partition partials (one row per group per partition), phase 2 combines them. The cost scales with the number of groups, not the number of rows.
Non-decomposable (rejected)¶
ValidationError: MEDIAN() cannot be executed across partitions. Supported aggregates: COUNT, SUM, MIN, MAX, AVG, STDDEV, STDDEV_POP, VARIANCE, VARIANCE_POP, BOOL_AND, BOOL_OR.
The rejected family is: MEDIAN, MODE, PERCENTILE_CONT,
PERCENTILE_DISC, QUANTILE, STRING_AGG, GROUP_CONCAT. The
escape hatch is to aggregate down with the decomposable verbs and
finish in Polars or pandas (r.to_polars().group_by(...).agg(pl.col("mag").median())).
Deprecated global-reduce shapes (rejected)¶
These four shapes need every row of every partition at once and have no decomposable form. They are rejected with a rewrite hint, not a fallback:
| Rejected shape | Hint in the error |
|---|---|
SELECT DISTINCT ... |
Project the columns and de-duplicate the materialized result (e.g. db.sql(...).to_polars().unique()), or GROUP BY with an aggregate. |
COUNT(DISTINCT ...) (and other DISTINCT aggregates) |
Compute the distinct count on the materialized result (e.g. db.sql(...).to_polars().n_unique()). |
ORDER BY ... (no LIMIT) |
Add a LIMIT for top-K, or sort the materialized result. |
GROUP BY ... (no aggregate) |
Add an aggregate (COUNT/SUM/AVG/MIN/MAX/...), or de-duplicate the materialized result. |
Window functions¶
Rejected at analyze time. The architecture is partition-local, so a
window like ROW_NUMBER() OVER (PARTITION BY object_id ORDER BY mjd)
that needs to see rows across partitions has no correct execution.
For "per-object" patterns that fit a single partition's rows
(e.g. one row per object, computed from many forced-source rows), the
canonical replacement is GROUP BY object_id with a decomposable
aggregate.
The GROUP BY SELECT-alias gotcha¶
The analyzer resolves GROUP BY terms to physical column names, and
a SELECT alias of a computed expression is not a column. This breaks:
The error is a generic column-resolution failure — ValidationError:
unknown column 'bin' — the analyzer doesn't recognize the alias as
referring to the SELECT expression. (If your traceback shows this
message and you're grouping by an aliased computed expression, this
section is the cause.) The fix on the SQL side is to group by the
expression itself:
The fluent surface has no such limit — group_by("CAST(mag AS BIGINT)
AS bin") recognizes the alias, groups by the underlying expression,
and names the output column bin for you. If you find yourself
writing the same expression on both sides of a GROUP BY, prefer the
fluent surface (see the aggregation guide).
Ordinary joins¶
The right table must be a registered catalog (an unresolved relation
has no on-disk parquet to scan — there is no attached-DB /
table-function backend, so the analyzer rejects it explicitly):
When the right side is an inline subquery whose inner FROM is unregistered:
ValidationError: inline subquery on JOIN right side requires the inner FROM table ('tmp') to be a registered catalog
hint: register the catalog with acid.Registry, or drop the subquery and use a plain table reference.
Both keys in the ON predicate must be integer-ID columns (see
the SQL escape hatch). Use
acid.sql.query(...) for arbitrary join keys; the fluent Catalog.join only
accepts integer-ID columns.
Frame, units, and conventions¶
The dialect rules above are about shape. The data conventions are about meaning:
- All RA/Dec are J2000 (ICRS). No epoch propagation or proper motion is applied. See the crossmatch guide's epoch section.
XMATCH(radius_arcsec => ...)is in arcseconds. Ther => ...alias is the same unit.dist_col => '<name>'produces a float64 column in arcseconds. Nullable on unmatched LEFT-join rows.acid.in_cone(center, radius=...)cones are in degrees, ICRS. The CLI's--cone RA,DEC,RADIUS_DEGflag is the same convention.
See also¶
- Writing SQL (the escape hatch) — worked examples for the SQL surface.
- Crossmatching catalogs — the full XMATCH story, plus the four astronomy-correctness checks.
- Sky regions & footprints — IN_MOC
worked examples and the fluent
in_regionequivalent. - Aggregating — the decomposable-aggregate story, the "why no MEDIAN" rule, and top-K.
- Errors reference — every typed exception, with the concrete error block and the concrete fix.