Errors¶
Every error ACID raises is a subclass of acid.AcidError. The
hierarchy is small and the messages are designed to be acted on
directly — most come with a hint: line pointing at the fix, and many
come with a suggestion: ("did you mean…?") for typo'd identifiers.
This page is the troubleshooting reference: each class has its own section, with the concrete error block you're likely to see and the concrete fix to type next. For symptom-shaped entries ("my query is slow", "no results from a LEFT JOIN") see the troubleshooting guide instead.
import acid

try:
    with acid.Connection("/data/hats") as db:
        db.sql("SELECT BOGUS FROM doesnt_exist")
except acid.AcidError as e:
    print("acid said:", e)
A single except acid.AcidError catches every library failure. Each
subclass is also exported at the package root (acid.ValidationError,
acid.RegistryError, …).
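As a rough sketch of how the hierarchy is meant to be used, the snippet below handles specific subclasses first and falls back to the base class. The classes are local stand-ins that mirror the names on this page so the example runs without ACID installed, and classify is a hypothetical helper, not part of the library.

```python
# Local stand-ins mirroring the class names on this page (assumption:
# the real classes are exported at the acid package root, as stated above).
class AcidError(Exception):
    pass


class ValidationError(AcidError):
    pass


class ExecutionError(AcidError):
    pass


def classify(exc: Exception) -> str:
    """Return a coarse next action for an error, most specific class first."""
    try:
        raise exc
    except ValidationError:
        return "fix the query"                  # raised before any data is read
    except ExecutionError:
        return "inspect the data or resources"  # a partition failed at run time
    except AcidError:
        return "generic acid failure"           # catch-all for other subclasses


print(classify(ValidationError("unknown catalog")))
print(classify(AcidError("something else")))
```

Python tries except clauses top to bottom, so the base class has to come last; placed first, it would swallow every subclass.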
AcidError¶
The base class. Carries an optional query (the original SQL), a
span (line / column / length pointing into query), a hint, and a
suggestion. When printed, it renders a multi-line clang-style
excerpt with a caret pointing at the offending token:
ValidationError: anchor table 'gaia' is not present in the catalog registry
  at line 2, column 8:
    1 | SELECT a.id, a.ra, a.dec
    2 |   FROM gaia AS a
      |        ^^^^
  hint: register the catalog (via YAML or Registry.from_directory) before querying it.
  did you mean 'gaia_dr3'?
You generally do not catch AcidError directly except as the catch-all
above; the specific subclasses below are what most code wants.
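To make the span fields concrete, here is a minimal sketch of a clang-style renderer. It assumes 1-based line and column numbers and a caret row as long as the offending token. render_excerpt and its parameters are hypothetical and only approximate the layout described above; this is not ACID's actual renderer.

```python
from typing import Optional


def render_excerpt(kind: str, message: str, query: str, line: int, col: int,
                   length: int, hint: Optional[str] = None,
                   suggestion: Optional[str] = None) -> str:
    """Render a clang-style excerpt; line and col are 1-based."""
    src_lines = query.splitlines()
    width = len(str(len(src_lines)))             # gutter width for line numbers
    out = [f"{kind}: {message}", f"  at line {line}, column {col}:"]
    for n in range(max(1, line - 1), line + 1):  # one line of leading context
        out.append(f"    {n:>{width}} | {src_lines[n - 1]}")
    # The caret row shares the gutter and is shifted to the 1-based column.
    out.append(f"    {' ' * width} | {' ' * (col - 1)}{'^' * length}")
    if hint:
        out.append(f"  hint: {hint}")
    if suggestion:
        out.append(f"  did you mean {suggestion!r}?")
    return "\n".join(out)


query = "SELECT a.id, a.ra, a.dec\n  FROM gaia AS a"
print(render_excerpt("ValidationError", "anchor table 'gaia' is not present",
                     query, line=2, col=8, length=4, suggestion="gaia_dr3"))
```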
ParseError¶
Raised when the SQL string can't be parsed, or when an extension
(XMATCH, IN_MOC, inline subquery, CTE) is shape-invalid.
The structural rejections in this class are about the AST shape, not
the semantics — those are ValidationError (see below).
Common shapes you'll see:
→ Pass a number, not a column reference. XMATCH(radius_arcsec => 1.0),
not XMATCH(radius_arcsec => a.threshold).
ParseError: XMATCH() requires keyword arguments (e.g. radius_arcsec => 1.0); got positional 1.0
hint: example: JOIN b ON XMATCH(radius_arcsec => 1.0)
→ Use named arguments: XMATCH(radius_arcsec => 1.0), not
XMATCH(1.0).
ParseError: inline subquery must be 'SELECT *' (got [...])
hint: narrower projections (e.g. 'SELECT id, ra', 'SELECT t.*') are not supported in v1; use 'SELECT * FROM <catalog> [WHERE ...]'.
→ The inline subquery / CTE pre-filter form is restricted. See the SQL reference for the full grammar.
ParseError: WITH RECURSIVE is not supported; CTEs in acid must reference a registered catalog directly
→ Recursive CTEs are not supported. Rewrite as a sequence of saved
catalogs (Catalog.save(...)) or compute the recursion outside ACID.
ParseError also wraps SQLGlot's own ParseError when the SQL doesn't tokenize or parse at the lexical level. The line / column / token highlight comes through to the printed error.
ValidationError¶
The largest class. Raised when the query parses but is not a shape ACID can execute — an unsupported predicate position, a non-decomposable aggregate, a margin-cache violation, an unknown catalog. Every ValidationError fires before any data is read; you never get a half-written result from a ValidationError.
The most common families:
Crossmatch / XMATCH¶
ValidationError: XMATCH radius_arcsec=5.0 exceeds 'twomass_psc's neighbor_margin_arcsec=1.0; matches near partition boundaries would be silently missed
hint: rebuild the margin cache at a larger radius, or shrink XMATCH radius.
→ Either shrink the radius, or rebuild the right catalog's margin cache at the radius you need with acid hats build-margin (see the crossmatch guide for why this is rejected and not just slow).
ValidationError: XMATCH right table 'twomass_psc' has no neighbor_path (margin cache) configured
hint: build one with `acid hats build-margin <catalog>`.
→ Same fix: acid hats build-margin <path-to-catalog>. Some published
HATS catalogs don't ship a margin cache; build one locally once.
ValidationError: XMATCH must be the entire ON predicate; compound predicates like 'XMATCH(...) AND ...' are not supported
hint: move the extra predicate to a WHERE clause.
→ Split the ON:
-- bad
JOIN b ON XMATCH(radius_arcsec => 1.0) AND a.mag < 18
-- good
JOIN b ON XMATCH(radius_arcsec => 1.0)
WHERE a.mag < 18
ValidationError: unsupported XMATCH join type 'RIGHT'; acid supports INNER (JOIN) and LEFT JOIN only
→ Swap the operands and use LEFT JOIN. RIGHT JOIN a ON XMATCH(...)
becomes FROM b LEFT JOIN a ON XMATCH(...).
ValidationError: crossmatch margin shortfall: catalog 'c' can be pulled up to 6" toward the query root (the sum of the match radii above it), but has a margin cache of only 5". Boundary matches would be silently missed — on a LEFT path that fabricates false 'no counterpart' rows. Build a margin cache of radius ≥ 6" (`acid hats build-margin`) for 'c', or reduce the match radius.
→ In a composable (bushy) chain, a leaf reached through nested spatial matches can be pulled toward the root by the sum of the match radii on its path — so it needs a margin cache at least that wide, not just wide enough for the single nearest join. Rebuild the named catalog's margin at the radius the message states, or reduce a match radius on the path.
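The arithmetic behind this check can be sketched as follows: the margin a leaf needs is the sum of the match radii on its path to the root. The function names and the dictionary layout are hypothetical, and the numbers are chosen to reproduce the 6" versus 5" case in the message above.

```python
from typing import Dict, List


def required_margin_arcsec(radii_on_path: List[float]) -> float:
    """Worst-case pull toward the root: the sum of match radii on the path."""
    return sum(radii_on_path)


def margin_shortfalls(paths: Dict[str, List[float]],
                      margins: Dict[str, float]) -> Dict[str, float]:
    """Map catalog name -> missing arcseconds, for catalogs that fall short."""
    short = {}
    for name, radii in paths.items():
        need = required_margin_arcsec(radii)
        have = margins.get(name, 0.0)   # no margin cache counts as zero
        if have < need:
            short[name] = need - have
    return short


# Hypothetical chain: a matched to b at 2", then b matched to c at 4".
paths = {"b": [2.0], "c": [2.0, 4.0]}
margins = {"b": 5.0, "c": 5.0}
print(margin_shortfalls(paths, margins))  # {'c': 1.0}
```

Catalog b passes because its single 2" match fits inside its 5" margin; c fails because 2" + 4" exceeds it.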
Equi-join locality¶
An ordinary .join(<Catalog>, on=…) asserts the right catalog is
localized — rows sharing a key lie within its margin-cache radius —
so it requires a spatial index and a margin cache:
ValidationError: equi join: catalog 'lc' has no spatial index (no hats_col_healpix column). The equi-join contract is locality — rows sharing a key lie within the margin radius of their partner — which needs the pixel column. For a position-less lookup table, pass an in-memory frame (broadcast join) instead.
ValidationError: equi join: catalog 'lc' has no margin cache configured. The equi-join contract is locality within the margin-cache radius (0 declares exact-pixel locality); build one with `acid hats build-margin`, or pass an in-memory frame (broadcast join) for a position-less lookup table.
→ For a survey's object/source tables (which are co-partitioned by
design), build the right catalog's margin cache. For a small,
position-less lookup table (an id→label map), don't make it a
catalog at all — pass an in-memory frame to .join(<frame>, on=…), the
broadcast join.
Missing point_map.fits¶
ValidationError: catalog 'gaia_dr3' has no point_map.fits (/data/hats/gaia_dr3). RAM-budget work-tuple sizing needs the row-count map every hats-import (and acid) output carries. Regenerate the catalog with a current importer, or set ACID_WORK_AUTOSIZE=0 to use the legacy layout-driven enumeration.
→ Every HATS catalog in a query needs a point_map.fits of per-cell
row counts (not a 0/1 footprint mask) so the planner can size work
tuples to your RAM budget. Every acid output and current
hats-import catalog carries one; regenerate an older catalog with a
current importer.
IN_MOC / regions¶
ValidationError: IN_MOC() inside WHERE must sit in a conjunctive position (top-level AND-chain, optionally negated with NOT). Found inside Or; move the predicate to the top-level WHERE or to a SELECT projection (which evaluates per-row).
→ IN_MOC inside an OR is rejected. If you need "this MOC or that
MOC", build the union of the two MOCs ahead of time and register it
under one name:
import mocpy
combined = mocpy.MOC.from_fits("des.fits").union(mocpy.MOC.from_fits("delve.fits"))
db.register_moc("des_or_delve", combined)
then write WHERE IN_MOC(a, 'des_or_delve').
ValidationError: IN_MOC() is only supported as a footprint restriction in WHERE (a top-level AND-ed term, optionally negated with NOT) or via the fluent .in_region(...) verb. It is not allowed in SELECT projections, ORDER BY, HAVING, or CASE.
→ Move the IN_MOC call into a WHERE clause. If you need a per-row
"is this point in the footprint?" boolean, compute it after
materialization (load the MOC in Python and test each row).
ValidationError: IN_MOC() references unknown MOC 'twomass'
hint: register it with Session.register_moc() or in the YAML 'mocs:' section, or pass a registered catalog name to auto-load its point_map.fits footprint.
did you mean 'twomass_psc'?
→ Check the name. If it's a catalog with a point_map.fits, use the
catalog name itself — ACID will auto-load it. Otherwise register a
MOC under that name.
Aggregates¶
ValidationError: MEDIAN() cannot be executed across partitions. Supported aggregates: COUNT, SUM, MIN, MAX, AVG, STDDEV, STDDEV_POP, VARIANCE, VARIANCE_POP, BOOL_AND, BOOL_OR.
→ See the aggregation guide's "Why no agg.median?" section. Aggregate down
with the supported verbs first, then compute the median in Polars or
pandas:
import polars as pl

# cat is an open Catalog, e.g. cat = db.open("<name>")
df = (cat.where("band = 'r'")
         .select("source_id, mag")
         .to_polars())
median_r = df.select(pl.col("mag").median()).item()
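For intuition on why MEDIAN is rejected while AVG is accepted, the sketch below contrasts the two on a pair of made-up partitions. It illustrates the general idea of decomposable aggregates using only the standard library and is not a description of ACID's internals.

```python
from statistics import median

# Two hypothetical partitions of one magnitude column.
part_a = [1.0, 2.0, 9.0]
part_b = [3.0, 4.0]

# AVG decomposes: each partition emits (sum, count); the reducer adds them up.
partials = [(sum(p), len(p)) for p in (part_a, part_b)]
total, count = map(sum, zip(*partials))
avg = total / count

# MEDIAN does not: the median of per-partition medians is not the median.
median_of_medians = median([median(part_a), median(part_b)])
true_median = median(part_a + part_b)

print(avg, median_of_medians, true_median)  # 3.8 2.75 3.0
```

A fixed-size partial per partition is enough to recover the exact average, but no such partial exists for the exact median, which is why the median is computed on the materialized result instead.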
ValidationError: SELECT DISTINCT is no longer supported. Project the columns and de-duplicate the materialized result (e.g. `db.sql(...).to_polars().unique()`), or GROUP BY with an aggregate.
→ Two options: replace with GROUP BY ... + COUNT(*), or
de-duplicate the materialized result downstream.
ValidationError: ORDER BY without LIMIT (a full global sort) is no longer supported. Add a LIMIT for top-K, or sort the materialized result.
→ Add a LIMIT K for top-K (it gets pushed to each partition), or
sort the Result locally with r.to_polars().sort(...).
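The reason a LIMIT makes the sort tractable can be sketched with a two-phase top-K: each partition keeps only its own K best rows, and the final answer is the top K of those survivors. The function names are hypothetical and the data is made up; this shows the general technique, not ACID's executor.

```python
import heapq
from typing import Iterable, List


def top_k_per_partition(rows: Iterable[float], k: int) -> List[float]:
    """Phase 1: each partition keeps only its own k smallest values."""
    return heapq.nsmallest(k, rows)


def merge_top_k(partials: Iterable[List[float]], k: int) -> List[float]:
    """Phase 2: the global top-k is the top-k of the partition winners."""
    return heapq.nsmallest(k, (v for part in partials for v in part))


partitions = [[18.2, 12.1, 15.0], [11.4, 19.9], [13.3, 12.0, 17.5]]
partials = [top_k_per_partition(p, k=2) for p in partitions]
print(merge_top_k(partials, k=2))  # [11.4, 12.0]
```

At most K rows leave each partition, so the reduce step handles K times the number of partitions rather than the full result. A global sort without a LIMIT has no such bound.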
ValidationError: GROUP BY without a supported aggregate is no longer supported. Add an aggregate (COUNT/SUM/AVG/MIN/MAX/...), or de-duplicate the materialized result.
→ Add the aggregate you actually want. A bare GROUP BY is the same
as SELECT DISTINCT; rewrite it explicitly.
→ The architecture is partition-local; no window function can see
across partitions correctly. For "per-object" patterns that fit in
one partition, use GROUP BY <object_id> with a decomposable
aggregate instead.
Registry / unknown names¶
ValidationError: anchor table 'gaia' is not present in the catalog registry
hint: register the catalog (via YAML or Registry.from_directory) before querying it.
did you mean 'gaia_dr3'?
→ Either fix the typo (most common — the "did you mean" suggestion is
usually right), or register the missing catalog: db.register_catalog(
"gaia", path="/data/hats/gaia_dr3"), or list it under catalogs: in
your YAML.
ValidationError: column 'g_mag' not found on catalog 'gaia_dr3'
hint: did you mean 'phot_g_mean_mag'?
→ db.open("<name>").columns lists the available column names, as
does the acid inspect <catalog> --schema CLI.
Output — save vs export¶
ValidationError: save() writes a HATS catalog (a directory tree); 'out.csv' looks like a single-file export. Use .export('out.csv') for a flat CSV / parquet / FITS file, or pass a directory path (trailing '/') to force a HATS tree named 'out.csv'.
→ save writes a queryable HATS catalog directory; a single-file
extension almost always means you wanted export
(one CSV / parquet / FITS file). If you genuinely want a HATS directory
named out.csv, pass a trailing slash: save("out.csv/").
ValidationError: export('matches'): no file extension to infer a format from. Add a recognized extension (.parquet/.pq, .csv, .fits/.fit) or pass format=..., or use .save(...) to write a HATS catalog directory.
→ export writes one flat file and needs to know which: give the path a
recognized extension, pass format="csv" (etc.), or — if you wanted a
queryable HATS catalog — use save instead. export never writes HATS.
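The save / export split comes down to what the path looks like. The following is a minimal sketch of that decision, using the extensions listed in the error message above. Both helpers and the mapping table are hypothetical, written only to illustrate the rule.

```python
from pathlib import PurePath
from typing import Optional

# Extensions taken from the error message above; the mapping itself is a guess.
_EXT_TO_FORMAT = {".parquet": "parquet", ".pq": "parquet", ".csv": "csv",
                  ".fits": "fits", ".fit": "fits"}


def infer_export_format(path: str, fmt: Optional[str] = None) -> str:
    """Return the export format, preferring an explicit fmt over the suffix."""
    if fmt is not None:
        return fmt
    suffix = PurePath(path).suffix.lower()
    if suffix not in _EXT_TO_FORMAT:
        raise ValueError(f"export({path!r}): no recognized file extension")
    return _EXT_TO_FORMAT[suffix]


def looks_like_single_file(path: str) -> bool:
    """save()-side check: a file-like suffix and no trailing slash."""
    return (not path.endswith("/")
            and PurePath(path).suffix.lower() in _EXT_TO_FORMAT)


print(infer_export_format("out.csv"))        # csv
print(looks_like_single_file("out.csv/"))    # False
```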
Cone / region geometry¶
ValidationError: acid.Connection.in_cone: nested in_cone blocks are not supported (only one cone may be active at a time). Outer cone is in_cone(ra=180, dec=0, r_deg=1). Exit the outer block first, or compose the two regions into a single cone (or MOC via Catalog.in_region) before entering.
→ Only one in_cone block is active at a time. Either exit the outer
block first, or pre-compute a single MOC that covers both regions and
use Catalog.in_region(moc) (which composes).
RegistryError¶
Raised for problems with catalog / MOC registration: missing path,
unreadable HATS layout, missing ra_col / dec_col declarations,
duplicate-name registration with overwrite=False.
Common shapes:
RegistryError: catalog 'gaia_dr3' not found; searched roots [/data/hats, /home/user/data] and registered names. Pass an absolute path, register the catalog via register_catalog(...), or use a YAML config.
→ Either pass an absolute path to db.open(...), register the catalog
explicitly with db.register_catalog("gaia_dr3", path="/some/where"), or
add the path under catalogs: in your YAML.
→ Catalog.save(path, name="rubin_subset", overwrite=True) if you
mean to replace it. Without overwrite=True, an existing name is
not silently shadowed.
RegistryError: save('gxt'): a catalog named 'gxt' already exists at /data/shared/gxt (earlier on ACID_PATH than the writable root /home/user/datasets); saving there would be shadowed — a later acid.open('gxt') would keep resolving the existing one. Pick a different name, or pass an explicit path (e.g. ./gxt). overwrite=True does not override this (the shadowing catalog may be shared / read-only).
→ A bare-name save lands under your first writable ACID_PATH root,
but an earlier (e.g. read-only / shared) root already has a catalog of
that name — so a future acid.open("gxt") would keep finding the
existing one, not yours. Choose a different name, or write to an explicit
path. overwrite=True does not override this: the shadowing catalog
isn't yours to replace.
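The shadowing rule is easier to see with the search path written out. The sketch below models ACID_PATH as an ordered list of roots and checks whether a bare-name save would land behind an existing catalog. Root, resolve, and would_be_shadowed are hypothetical names, and the lookup order (first root that has the name wins) is an assumption based on the message above.

```python
from typing import List, NamedTuple, Optional, Set


class Root(NamedTuple):
    path: str
    writable: bool
    names: Set[str]   # catalog names already present under this root


def resolve(name: str, roots: List[Root]) -> Optional[str]:
    """First root on the search path that has the name wins."""
    for root in roots:
        if name in root.names:
            return f"{root.path}/{name}"
    return None


def would_be_shadowed(name: str, roots: List[Root]) -> bool:
    """True if a bare-name save would land behind an earlier existing catalog."""
    for root in roots:
        if root.writable:
            return False      # reached the save target first: nothing shadows it
        if name in root.names:
            return True       # an earlier root already answers to this name
    return False


roots = [Root("/data/shared", False, {"gxt"}),
         Root("/home/user/datasets", True, set())]
print(resolve("gxt", roots))             # /data/shared/gxt
print(would_be_shadowed("gxt", roots))   # True
print(would_be_shadowed("mine", roots))  # False
```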
RegistryError: in_region: catalog 'gaia_dr3' has no point_map.fits at /data/hats/gaia_dr3 — pass a FITS path, a mocpy.MOC, or a catalog with a HATS-standard footprint.
→ Some HATS catalogs ship without a point_map.fits. Use an
explicitly registered MOC instead: db.register_moc("gaia_footprint",
"/path/to/footprint.fits").
ExecutionError¶
Raised when a per-partition execution failed — corrupt parquet, OOM, out-of-disk, a Polars error the engine couldn't avoid. The first failure aborts the whole job (ACID does not collect 5000 identical errors). Engine-side errors are translated to this class before they reach the user, so the message is engine-neutral.
→ Common causes the user can act on:
- A corrupt parquet file: re-download the catalog, or run acid validate <query> to confirm the query is fine and the data isn't.
- OOM on a big result. ACID spills phase-1 partials past inmem_row_limit (default 50 M rows) automatically, so the first move is not to lower workers. Reach for one of these, in order:
  - Stream the result with r.batches() instead of materializing it whole. The spill is already disk-backed; r.to_pandas() / r.to_polars() load it into memory, and that is what blows up. Replace them with a loop over r.batches(...) and process in chunks. See Working with results — Streaming.
  - Lower inmem_row_limit so phase 1 spills earlier and the phase-2 reduce stays disk-backed end-to-end: acid.Connection(..., inmem_row_limit=10_000_000). See Performance — Memory & spill.
  - Then reduce workers if the OOM is in phase 1 (each worker holds its own per-partition working set in RAM, so fewer workers ⇒ less concurrent RSS). --workers 1 is the memory-tightest mode.
- Disk-full: check df -h on the connection's tmpdir (the spill directory) and on the --output target.
- A Polars-level type error: usually a where(...) predicate that doesn't match the schema. Run acid validate (or Connection.validate(...)) to surface analyze-time errors; if validate is clean and execute fails, the schema and the predicate disagree. Use db.open(name).describe() to inspect.
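The "process in chunks" advice above can be sketched as a running aggregate over a batch iterator. fake_batches is a stand-in that yields plain Python lists so the example runs without ACID. The batch type that r.batches() actually yields is not specified on this page, so treat the loop body as an assumption to adapt.

```python
from typing import Iterable, Iterator, List


def fake_batches(n_rows: int, batch_size: int) -> Iterator[List[float]]:
    """Stand-in for a batch iterator: yields lists of made-up magnitudes."""
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        yield [float(i % 7) for i in range(start, stop)]


def streaming_mean(batches: Iterable[List[float]]) -> float:
    """Keep only a running sum and count, never the full result."""
    total, count = 0.0, 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
    return total / count if count else float("nan")


print(streaming_mean(fake_batches(n_rows=10, batch_size=4)))
```

Peak memory here is one batch plus two scalars, regardless of how many rows the query returns.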
This is the one error class where the first reported failure may not be
the underlying cause. If you suspect intermittent failures, run with
workers=1 to serialize execution and isolate the failing partition.
OutputError¶
Raised when the output sink fails — disk full mid-write, write
permission lost, schema mismatch in a streamed Parquet write,
malformed --output path. Distinct from ExecutionError, which is
about the engine / partition side.
→ Check disk space (df -h <output-dir>), write permission, and that
the parent directory exists. For a HATS-format output (--format hats
or Catalog.save(...)), the target must not already exist unless you
pass overwrite=True / --force.
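These checks can also be run before a long job starts. The following is a minimal pre-flight sketch using only the standard library. preflight_output and its parameters are hypothetical and not part of ACID.

```python
import os
import shutil
from typing import List


def preflight_output(path: str, min_free_bytes: int = 0,
                     overwrite: bool = False) -> List[str]:
    """Return a list of problems with an output path (empty means it looks fine)."""
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        return [f"parent directory does not exist: {parent}"]
    problems = []
    if not os.access(parent, os.W_OK):
        problems.append(f"no write permission on {parent}")
    if shutil.disk_usage(parent).free < min_free_bytes:
        problems.append(f"less than {min_free_bytes} bytes free on {parent}")
    if os.path.exists(path) and not overwrite:
        problems.append(f"{path} already exists (pass overwrite=True to replace)")
    return problems


print(preflight_output(os.path.join(os.getcwd(), "matches.parquet")))
```

A clean pre-flight does not rule out an OutputError, since the disk can still fill mid-write, but it catches the cheap mistakes before any partition runs.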
ConnectionClosedError¶
Raised when a Catalog is used after the Connection it was opened on
has been closed or garbage-collected — most often after acid.shutdown()
(or exiting a with acid.Connection(...) as db: block).
ConnectionClosedError: Catalog's Connection has been closed or garbage collected; open a new connection (acid.open(...) or acid.Connection(...)) and re-create the query
ConnectionClosedError: Connection has been closed; open a new one via acid.init(...) or acid.Connection(...)
→ With the module-level API the default Connection lazy-reinits on the next call, so the common case is just: re-open the catalog and rebuild the query.
import acid
cat = acid.open("gaia_dr3").where("phot_g_mean_mag < 18")
acid.shutdown() # tears the default Connection down
cat.to_polars() # raises — its Connection is gone
cat = acid.open("gaia_dr3").where("phot_g_mean_mag < 18") # re-open after shutdown
df = cat.to_polars() # fine — lazy-reinit built a fresh default
Catalog handles do not outlive their Connection. With an explicit
acid.Connection(...), keep your materialization calls inside the with
block; to share a result across processes or sessions, Catalog.save(...)
it to disk and re-open it later.
ConfigError¶
Raised for problems with the acid.conf config layer: a --config /
ACID_CONFIG pointing at a missing file, an acid.conf that fails
to parse, a malformed setting value (e.g. workers that is neither
an int nor auto), or an unknown key passed to acid config get/set/unset.
→ Fix the syntax in the offending file, or use acid config set
workers 8 (or auto) to write a known-good value. Run acid config
list to see the parsed settings layer-by-layer.
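As an illustration of what "neither an int nor auto" means in practice, here is a small validation sketch. parse_workers is a hypothetical helper; the lower bound of 1 and the case-insensitive match on auto are assumptions, not documented behaviour.

```python
from typing import Union


def parse_workers(raw: str) -> Union[int, str]:
    """Accept an integer >= 1 or the literal 'auto'; reject anything else."""
    value = raw.strip().lower()
    if value == "auto":
        return "auto"
    try:
        n = int(value)
    except ValueError:
        raise ValueError(f"workers must be an int or 'auto', got {raw!r}") from None
    if n < 1:   # assumption: zero or negative worker counts are meaningless
        raise ValueError(f"workers must be >= 1, got {n}")
    return n


print(parse_workers("8"), parse_workers("auto"))
```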
Class reference¶
The error classes themselves, for completeness. Each inherits from
AcidError and shares its constructor (message, query=, span=,
hint=, suggestion=).
acid.AcidError ¶
Bases: Exception
Base class for all acid errors.
Subclasses (ParseError, ValidationError etc.) inherit the
same constructor and renderer. Library callers can catch
AcidError to handle every acid-originated failure uniformly.
from_sqlglot classmethod ¶
Wrap a sqlglot.errors.ParseError into an acid ParseError.
Extracts line/col/description/highlight from the
SQLGlot exception's errors list (see sqlglot/errors.py:37-71).
Falls back to str(exc) when no structured info is available.
from_node classmethod ¶
from_node(query: Optional[str], node: 'Optional[exp.Expression]', message: str, *, hint: Optional[str] = None, suggestion: Optional[str] = None) -> 'AcidError'
Construct with position resolved from a SQLGlot AST node.
SQLGlot 30.x does not attach line/col to expressions, so we
re-tokenize query and find the token sequence that matches
node. When unresolvable (or when query is unavailable),
span is left None and the renderer omits the caret.
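The re-tokenize step can be approximated as follows. This is a simplified sketch that locates a single identifier, not a full token sequence, using a small regex tokenizer. find_span and the token pattern are hypothetical and do not reproduce SQLGlot's tokenizer.

```python
import re
from typing import Optional, Tuple

# Identifiers, numbers, or any single non-space character.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?|\S")


def find_span(query: str, target: str) -> Optional[Tuple[int, int, int]]:
    """Return (line, column, length), 1-based, of the first token equal to target."""
    for match in _TOKEN.finditer(query):
        if match.group() == target:
            start = match.start()
            line = query.count("\n", 0, start) + 1
            col = start - (query.rfind("\n", 0, start) + 1) + 1
            return line, col, len(target)
    return None   # unresolvable: a renderer would omit the caret


query = "SELECT a.id, a.ra, a.dec\n  FROM gaia AS a"
print(find_span(query, "gaia"))      # (2, 8, 4)
print(find_span(query, "missing"))   # None
```

Matching whole tokens rather than substrings means that a search for gaia does not land inside gaia_dr3.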
acid.ValidationError ¶
Bases: AcidError
Query violates acid semantics (unsupported predicate, join type, unknown column, etc.).
acid.ExecutionError ¶
Bases: AcidError
A per-partition execution failed (corrupt parquet, OOM, out-of-disk,
or an engine error ACID could not avoid).
acid.OutputError ¶
Bases: AcidError
Output sink failed (disk full, write permission lost, schema
mismatch in a streamed Parquet write, etc.). Distinct from
ExecutionError, which describes engine-side or
partition-execution failures.
acid.ConnectionClosedError ¶
Bases: AcidError
Raised when a Catalog (or other Connection-bound handle)
is used after its owning Connection was closed or GC'd.
acid.ConfigError ¶
Bases: AcidError
Raised for problems with the acid.conf config layer: a
--config / ACID_CONFIG pointing at a missing file, an
acid.conf that fails to parse, a malformed setting value (e.g.
workers that is neither an int nor auto), or an unknown key
passed to acid config get/set/unset. See docs/archive/CONFIG-SYSTEM.md.
See also¶
- Troubleshooting — symptom-shaped entries ("my query is slow", "no results from a LEFT JOIN with XMATCH").
- SQL features — the canonical reference for which query shapes raise which errors at analyze time.
- Crossmatching catalogs — the margin-cache and J2000 rules in their natural setting.
- Aggregating — the decomposable / non-decomposable distinction.