Errors¶

Every error ACID raises is a subclass of acid.AcidError. The hierarchy is small and the messages are designed to be acted on directly — most come with a hint: line pointing at the fix, and many come with a suggestion: ("did you mean…?") for typo'd identifiers.

This page is the troubleshooting reference: each class has its own section, with the concrete error block you're likely to see and the concrete fix to type next. For symptom-shaped entries ("my query is slow", "no results from a LEFT JOIN") see the troubleshooting guide instead.

import acid

try:
    with acid.Connection("/data/hats") as db:
        db.sql("SELECT BOGUS FROM doesnt_exist")
except acid.AcidError as e:
    print("acid said:", e)

A single except acid.AcidError catches every library failure. Each subclass is also exported at the package root (acid.ValidationError, acid.RegistryError, …).

`AcidError`¶

The base class. Carries an optional query (the original SQL), a span (line / column / length pointing into query), a hint, and a suggestion. When printed, it renders a multi-line clang-style excerpt with a caret pointing at the offending token:

ValidationError: anchor table 'gaia' is not present in the catalog registry
  at line 2, column 8:
   1 | SELECT a.id, a.ra, a.dec
   2 | FROM   gaia AS a
     |        ^^^^
  hint: register the catalog (via YAML or Registry.from_directory) before querying it.
  did you mean 'gaia_dr3'?
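To make the span fields concrete, here is a minimal sketch of how a (line, column, length) span can be turned into a caret excerpt. It is an illustration only, not ACID's renderer: `render_excerpt` is a hypothetical helper, and it assumes line and column are 1-based, as the "at line 2, column 8" wording suggests.

```python
def render_excerpt(query: str, line: int, col: int, length: int) -> str:
    """Render the numbered source lines up to `line`, then a caret row under the span.

    `line` and `col` are assumed to be 1-based.
    """
    rows = query.splitlines()[:line]
    width = len(str(line))
    out = [f"{n:>{width + 3}} | {text}" for n, text in enumerate(rows, start=1)]
    # The caret row reuses the gutter width so the '|' stays in one column.
    out.append(" " * (width + 3) + " | " + " " * (col - 1) + "^" * length)
    return "\n".join(out)


sql = "SELECT a.id, a.ra, a.dec\nFROM   gaia AS a"
print(render_excerpt(sql, line=2, col=8, length=4))
```

The caret row lands directly under `gaia`, because the column offset is applied after the same gutter that prefixes the source lines.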

You generally do not catch AcidError directly except as the catch-all above; the specific subclasses below are what most code wants.
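The ordering matters when you mix specific handlers with the catch-all: Python tries `except` clauses top to bottom, so the subclasses must come first. The sketch below uses stand-in classes so it runs without ACID installed; the attribute name `hint` is an assumption based on the constructor keywords listed in the class reference.

```python
class AcidError(Exception):
    """Stand-in for acid.AcidError, carrying only the fields this page names."""

    def __init__(self, message, *, query=None, span=None, hint=None, suggestion=None):
        super().__init__(message)
        self.query, self.span = query, span
        self.hint, self.suggestion = hint, suggestion


class ValidationError(AcidError):
    """Stand-in for acid.ValidationError."""


class RegistryError(AcidError):
    """Stand-in for acid.RegistryError."""


def run(action):
    """Handle the specific classes first, then fall back to the base class."""
    try:
        return action()
    except ValidationError as e:
        return f"fix the query: {e.hint or e}"
    except RegistryError as e:
        return f"fix the registry: {e}"
    except AcidError as e:
        return f"acid failed: {e}"


def bad_query():
    raise ValidationError("anchor table 'gaia' is not present",
                          hint="register the catalog first")


print(run(bad_query))
```

With the real library, the same shape applies with `acid.ValidationError`, `acid.RegistryError`, and `acid.AcidError` in place of the stand-ins.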

`ParseError`¶

Raised when the SQL string can't be parsed, or when an extension (XMATCH, IN_MOC, inline subquery, CTE) is shape-invalid.

The structural rejections in this class concern the shape of the AST, not its semantics; semantic rejections are ValidationError (see below).

Common shapes you'll see:

ParseError: only literal values are accepted in XMATCH() kwargs; got a.mag

→ Pass a literal number, not a column reference: XMATCH(radius_arcsec => 1.0), not XMATCH(radius_arcsec => a.mag).

ParseError: XMATCH() requires keyword arguments (e.g. radius_arcsec => 1.0); got positional 1.0
hint: example: JOIN b ON XMATCH(radius_arcsec => 1.0)

→ Use named arguments: XMATCH(radius_arcsec => 1.0), not XMATCH(1.0).

ParseError: inline subquery must be 'SELECT *' (got [...])
hint: narrower projections (e.g. 'SELECT id, ra', 'SELECT t.*') are not supported in v1; use 'SELECT * FROM <catalog> [WHERE ...]'.

→ The inline subquery / CTE pre-filter form is restricted. See the SQL reference for the full grammar.

ParseError: WITH RECURSIVE is not supported; CTEs in acid must reference a registered catalog directly

→ Recursive CTEs are not supported. Rewrite as a sequence of saved catalogs (Catalog.save(...)) or compute the recursion outside ACID.

ParseError also wraps SQLGlot's own ParseError when the SQL fails to tokenize or parse at all. The line, column, and token highlight carry through to the printed error.

`ValidationError`¶

The largest class. Raised when the query parses but is not a shape ACID can execute — an unsupported predicate position, a non-decomposable aggregate, a margin-cache violation, an unknown catalog. Every ValidationError fires before any data is read; you never get a half-written result from a ValidationError.

The most common families:

Crossmatch / XMATCH¶

ValidationError: XMATCH radius_arcsec=5.0 exceeds 'twomass_psc's neighbor_margin_arcsec=1.0; matches near partition boundaries would be silently missed
hint: rebuild the margin cache at a larger radius, or shrink XMATCH radius.

→ Either shrink the radius, or rebuild the right catalog's margin cache at the radius you need:

acid hats build-margin /path/to/twomass_psc --margin-arcsec 10

(see the crossmatch guide for why this is rejected and not just slow).

ValidationError: XMATCH right table 'twomass_psc' has no neighbor_path (margin cache) configured
hint: build one with `acid hats build-margin <catalog>`.

→ Same fix: acid hats build-margin <path-to-catalog>. Some published HATS catalogs don't ship a margin cache; build one locally once.

ValidationError: XMATCH must be the entire ON predicate; compound predicates like 'XMATCH(...) AND ...' are not supported
hint: move the extra predicate to a WHERE clause.

→ Split the ON:

-- bad
JOIN b ON XMATCH(radius_arcsec => 1.0) AND a.mag < 18
-- good
JOIN b ON XMATCH(radius_arcsec => 1.0)
WHERE a.mag < 18

ValidationError: unsupported XMATCH join type 'RIGHT'; acid supports INNER (JOIN) and LEFT JOIN only

→ Swap the operands and use LEFT JOIN. RIGHT JOIN a ON XMATCH(...) becomes FROM b LEFT JOIN a ON XMATCH(...).

ValidationError: crossmatch margin shortfall: catalog 'c' can be pulled up to 6" toward the query root (the sum of the match radii above it), but has a margin cache of only 5". Boundary matches would be silently missed — on a LEFT path that fabricates false 'no counterpart' rows. Build a margin cache of radius ≥ 6" (`acid hats build-margin`) for 'c', or reduce the match radius.

→ In a composable (bushy) chain, a leaf reached through nested spatial matches can be pulled toward the root by the sum of the match radii on its path — so it needs a margin cache at least that wide, not just wide enough for the single nearest join. Rebuild the named catalog's margin at the radius the message states, or reduce a match radius on the path.
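The arithmetic behind this check can be sketched in a few lines. This is a hedged illustration of the rule as stated above (required margin = sum of the match radii on the leaf's path to the root), not ACID's planner; the function names and the example chain are hypothetical.

```python
def required_margin(path_radii_arcsec):
    """Margin a leaf needs: the sum of the match radii on its path to the root."""
    return sum(path_radii_arcsec)


def margin_shortfalls(leaves):
    """Return {catalog: (required, configured)} for each leaf whose cache is too narrow."""
    short = {}
    for name, (radii, margin) in leaves.items():
        need = required_margin(radii)
        if need > margin:
            short[name] = (need, margin)
    return short


# Hypothetical chain: a matches (b matched with c at 4") at 2".
# b sits one match below the root; c sits two matches below it.
leaves = {
    "b": ([2.0], 5.0),        # needs 2", has 5": fine
    "c": ([2.0, 4.0], 5.0),   # needs 6", has 5": shortfall
}
print(margin_shortfalls(leaves))
```

Here only `c` is reported: a 5" cache covers either single join, but not the 6" total that the nested path allows.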

Equi-join locality¶

An ordinary .join(<Catalog>, on=…) asserts the right catalog is localized — rows sharing a key lie within its margin-cache radius — so it requires a spatial index and a margin cache:

ValidationError: equi join: catalog 'lc' has no spatial index (no hats_col_healpix column). The equi-join contract is locality — rows sharing a key lie within the margin radius of their partner — which needs the pixel column. For a position-less lookup table, pass an in-memory frame (broadcast join) instead.

ValidationError: equi join: catalog 'lc' has no margin cache configured. The equi-join contract is locality within the margin-cache radius (0 declares exact-pixel locality); build one with `acid hats build-margin`, or pass an in-memory frame (broadcast join) for a position-less lookup table.

→ For a survey's object/source tables (which are co-partitioned by design), build the right catalog's margin cache. For a small, position-less lookup table (an id→label map), don't make it a catalog at all — pass an in-memory frame to .join(<frame>, on=…), the broadcast join.

Missing `point_map.fits`¶

ValidationError: catalog 'gaia_dr3' has no point_map.fits (/data/hats/gaia_dr3). RAM-budget work-tuple sizing needs the row-count map every hats-import (and acid) output carries. Regenerate the catalog with a current importer, or set ACID_WORK_AUTOSIZE=0 to use the legacy layout-driven enumeration.

→ Every HATS catalog in a query needs a point_map.fits of per-cell row counts (not a 0/1 footprint mask) so the planner can size work tuples to your RAM budget. Every acid output and current hats-import catalog carries one; regenerate an older catalog with a current importer.

`IN_MOC` / regions¶

ValidationError: IN_MOC() inside WHERE must sit in a conjunctive position (top-level AND-chain, optionally negated with NOT). Found inside Or; move the predicate to the top-level WHERE or to a SELECT projection (which evaluates per-row).

→ IN_MOC inside an OR is rejected. If you need "this MOC or that MOC", build the union of the two MOCs ahead of time and register it under one name:

import mocpy
combined = mocpy.MOC.from_fits("des.fits").union(mocpy.MOC.from_fits("delve.fits"))
db.register_moc("des_or_delve", combined)

then write WHERE IN_MOC(a, 'des_or_delve').

ValidationError: IN_MOC() is only supported as a footprint restriction in WHERE (a top-level AND-ed term, optionally negated with NOT) or via the fluent .in_region(...) verb. It is not allowed in SELECT projections, ORDER BY, HAVING, or CASE.

→ Move the IN_MOC call into a WHERE clause. If you need a per-row "is this point in the footprint?" boolean, compute it after materialization (load the MOC in Python and test each row).

ValidationError: IN_MOC() references unknown MOC 'twomass'
hint: register it with Session.register_moc() or in the YAML 'mocs:' section, or pass a registered catalog name to auto-load its point_map.fits footprint.
did you mean 'twomass_psc'?

→ Check the name. If it's a catalog with a point_map.fits, use the catalog name itself — ACID will auto-load it. Otherwise register a MOC under that name.

Aggregates¶

ValidationError: MEDIAN() cannot be executed across partitions. Supported aggregates: COUNT, SUM, MIN, MAX, AVG, STDDEV, STDDEV_POP, VARIANCE, VARIANCE_POP, BOOL_AND, BOOL_OR.

→ See the aggregation guide's "Why no agg.median?" section. Aggregate down with the supported verbs first, then compute the median in Polars or pandas:

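import polars as pl

# cat is a Catalog handle, e.g. from db.open(...)
df = (cat.where("band = 'r'")
         .select("source_id, mag")
         .to_polars())
# the median is computed locally, on the materialized frame
median_r = df.select(pl.col("mag").median()).item()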
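Why MEDIAN is rejected while AVG is accepted can be shown with plain Python. The sketch below uses two made-up partitions: SUM and COUNT partials combine into the exact global average, whereas combining per-partition medians gives a different number from the true median.

```python
from statistics import median

# Two hypothetical partitions of one magnitude column.
partitions = [[1.0, 2.0, 9.0], [3.0, 4.0, 5.0, 6.0]]
all_rows = [x for part in partitions for x in part]

# SUM and COUNT decompose: combine per-partition partials, get the exact AVG.
partials = [(sum(p), len(p)) for p in partitions]
avg = sum(s for s, _ in partials) / sum(n for _, n in partials)
assert avg == sum(all_rows) / len(all_rows)

# MEDIAN does not: the median of per-partition medians is not the global median.
median_of_medians = median(median(p) for p in partitions)
global_median = median(all_rows)
print(median_of_medians, global_median)  # 3.25 4.0
```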

ValidationError: SELECT DISTINCT is no longer supported. Project the columns and de-duplicate the materialized result (e.g. `db.sql(...).to_polars().unique()`), or GROUP BY with an aggregate.

→ Two options: replace with GROUP BY ... + COUNT(*), or de-duplicate the materialized result downstream.

ValidationError: ORDER BY without LIMIT (a full global sort) is no longer supported. Add a LIMIT for top-K, or sort the materialized result.

→ Add a LIMIT K for top-K (it gets pushed to each partition), or sort the Result locally with r.to_polars().sort(...).
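The reason a LIMIT makes the sort tractable is that top-K decomposes: every row of the global top K is also in the top K of its own partition. A minimal sketch of that idea follows, with made-up partitions and a hypothetical `top_k` helper; it illustrates the technique, not ACID's executor.

```python
import heapq


def top_k(partitions, k, key):
    """Keep only each partition's k best rows, then merge those candidates."""
    candidates = []
    for rows in partitions:
        candidates.extend(heapq.nsmallest(k, rows, key=key))
    return heapq.nsmallest(k, candidates, key=key)


# Hypothetical (source_id, mag) rows spread over three partitions.
partitions = [
    [("s1", 17.2), ("s2", 12.5), ("s3", 19.0)],
    [("s4", 11.9), ("s5", 18.4)],
    [("s6", 12.1)],
]
brightest = top_k(partitions, k=2, key=lambda row: row[1])
print(brightest)  # [('s4', 11.9), ('s6', 12.1)]
```

Each partition contributes at most K rows to the merge, so the final sort touches K × (number of partitions) rows rather than the whole catalog.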

ValidationError: GROUP BY without a supported aggregate is no longer supported. Add an aggregate (COUNT/SUM/AVG/MIN/MAX/...), or de-duplicate the materialized result.

→ Add the aggregate you actually want. A bare GROUP BY is the same as SELECT DISTINCT; rewrite it explicitly.

ValidationError: Window functions are not supported across partitioned catalogs.

→ The architecture is partition-local; no window function can see across partitions correctly. For "per-object" patterns that fit in one partition, use GROUP BY <object_id> with a decomposable aggregate instead.

Registry / unknown names¶

ValidationError: anchor table 'gaia' is not present in the catalog registry
hint: register the catalog (via YAML or Registry.from_directory) before querying it.
did you mean 'gaia_dr3'?

→ Either fix the typo (most common; the "did you mean" suggestion is usually right), or register the missing catalog: db.register_catalog("gaia", path="/data/hats/gaia_dr3"), or list it under catalogs: in your YAML.
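If you want the same kind of suggestion in your own tooling (for example, when validating user-supplied catalog names before building a query), the standard library's `difflib` gives a reasonable approximation. This is an assumption-laden sketch: the page does not say how ACID computes its suggestions, and `suggest` is a hypothetical helper.

```python
import difflib


def suggest(name, known):
    """Return the closest known name, or None when nothing is similar enough."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


registered = ["gaia_dr3", "twomass_psc", "des_or_delve"]
print(suggest("gaia", registered))     # gaia_dr3
print(suggest("twomass", registered))  # twomass_psc
print(suggest("sdss", registered))     # None
```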

ValidationError: column 'g_mag' not found on catalog 'gaia_dr3'
hint: did you mean 'phot_g_mean_mag'?

→ db.open("<name>").columns lists the available column names, as does the acid inspect <catalog> --schema CLI.

Output — `save` vs `export`¶

ValidationError: save() writes a HATS catalog (a directory tree); 'out.csv' looks like a single-file export. Use .export('out.csv') for a flat CSV / parquet / FITS file, or pass a directory path (trailing '/') to force a HATS tree named 'out.csv'.

→ save writes a queryable HATS catalog directory; a single-file extension almost always means you wanted export (one CSV / parquet / FITS file). If you genuinely want a HATS directory named out.csv, pass a trailing slash: save("out.csv/").

ValidationError: export('matches'): no file extension to infer a format from. Add a recognized extension (.parquet/.pq, .csv, .fits/.fit) or pass format=..., or use .save(...) to write a HATS catalog directory.

→ export writes one flat file and needs to know which: give the path a recognized extension, pass format="csv" (etc.), or — if you wanted a queryable HATS catalog — use save instead. export never writes HATS.
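The two messages above imply a simple decision rule on the path string. The sketch below restates that rule as code so the cases are easy to compare; `classify_target` is a hypothetical helper, and the extension table is copied from the export message rather than from ACID's source.

```python
from pathlib import PurePosixPath

# Extensions the export() message lists as recognized.
EXPORT_FORMATS = {".parquet": "parquet", ".pq": "parquet",
                  ".csv": "csv", ".fits": "fits", ".fit": "fits"}


def classify_target(path: str) -> str:
    """Guess what an output path denotes under the rules quoted above."""
    if path.endswith("/"):
        return "hats directory"        # a trailing slash forces a HATS tree
    fmt = EXPORT_FORMATS.get(PurePosixPath(path).suffix)
    if fmt is not None:
        return f"single-file {fmt}"    # a format export() can infer
    return "hats directory"            # no recognized extension: only save() fits


for target in ("out.csv", "out.csv/", "matches", "cut.pq"):
    print(target, "->", classify_target(target))
```

Read against the messages: `save("out.csv")` is rejected because the path classifies as a single file, and `export("matches")` is rejected because it does not.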

Cone / region geometry¶

ValidationError: acid.Connection.in_cone: nested in_cone blocks are not supported (only one cone may be active at a time). Outer cone is in_cone(ra=180, dec=0, r_deg=1). Exit the outer block first, or compose the two regions into a single cone (or MOC via Catalog.in_region) before entering.

→ Only one in_cone block is active at a time. Either exit the outer block first, or pre-compute a single MOC that covers both regions and use Catalog.in_region(moc) (which composes).

`RegistryError`¶

Raised for problems with catalog / MOC registration: missing path, unreadable HATS layout, missing ra_col / dec_col declarations, duplicate-name registration with overwrite=False.

Common shapes:

RegistryError: catalog 'gaia_dr3' not found; searched roots [/data/hats, /home/user/data] and registered names. Pass an absolute path, register the catalog via register_catalog(...), or use a YAML config.

→ Either pass an absolute path to db.open(...), register the catalog explicitly with db.register_catalog("gaia_dr3", path="/some/where"), or add the path under catalogs: in your YAML.

RegistryError: name 'rubin_subset' already registered on this connection

→ Call Catalog.save(path, name="rubin_subset", overwrite=True) if you mean to replace it. Without overwrite=True, an existing name is not silently shadowed.

RegistryError: save('gxt'): a catalog named 'gxt' already exists at /data/shared/gxt (earlier on ACID_PATH than the writable root /home/user/datasets); saving there would be shadowed — a later acid.open('gxt') would keep resolving the existing one. Pick a different name, or pass an explicit path (e.g. ./gxt). overwrite=True does not override this (the shadowing catalog may be shared / read-only).

→ A bare-name save lands under your first writable ACID_PATH root, but an earlier (e.g. read-only / shared) root already has a catalog of that name — so a future acid.open("gxt") would keep finding the existing one, not yours. Choose a different name, or write to an explicit path. overwrite=True does not override this: the shadowing catalog isn't yours to replace.

RegistryError: in_region: catalog 'gaia_dr3' has no point_map.fits at /data/hats/gaia_dr3 — pass a FITS path, a mocpy.MOC, or a catalog with a HATS-standard footprint.

→ Some HATS catalogs ship without a point_map.fits. Use an explicitly registered MOC instead: db.register_moc("gaia_footprint", "/path/to/footprint.fits").

`ExecutionError`¶

Raised when a per-partition execution failed — corrupt parquet, OOM, out-of-disk, a Polars error the engine couldn't avoid. The first failure aborts the whole job (ACID does not collect 5000 identical errors). Engine-side errors are translated to this class before they reach the user, so the message is engine-neutral.

ExecutionError: failed to read partition Norder=5/Dir=0/Npix=42.parquet: <underlying message>

→ Common causes you can act on:

- A corrupt parquet file: re-download the catalog, or run acid validate <query> to confirm that the query is fine and the data is at fault.
- OOM on a big result. ACID spills phase-1 partials past inmem_row_limit (default 50 M rows) automatically, so the first move is not to lower workers. Reach for one of these, in order:
   1. Stream the result with r.batches() instead of materializing it whole. The spill is already disk-backed; r.to_pandas() and r.to_polars() load the whole result into memory, and that load is what blows up. Replace them with a loop over r.batches(...) and process in chunks. See Working with results — Streaming.
   2. Lower inmem_row_limit so phase 1 spills earlier and the phase-2 reduce stays disk-backed end-to-end: acid.Connection(..., inmem_row_limit=10_000_000). See Performance — Memory & spill.
   3. Then reduce workers if the OOM is in phase 1 (each worker holds one per-partition working set in RAM, so fewer workers ⇒ less concurrent RSS). --workers 1 is the memory-tightest mode.
- Disk full: check df -h on the connection's tmpdir (the spill directory) and on the --output target.
- A Polars-level type error: usually a where(...) predicate that doesn't match the schema. Run acid validate (or Connection.validate(...)) to surface analyze-time errors; if validate is clean and execute still fails, the schema and the predicate disagree, so call db.open(name).describe() to inspect the schema.
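The streaming advice amounts to folding each chunk into a small running state instead of holding every row. The sketch below shows that pattern on plain lists; `batches` here is a local stand-in, since this page does not show the signature of r.batches() or the type of the chunks it yields.

```python
def batches(rows, size):
    """Stand-in for a streamed result: yield `size`-row chunks, one at a time."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def streaming_mean(chunks):
    """Fold chunks into a running (count, total) so no full copy is ever held."""
    count, total = 0, 0.0
    for chunk in chunks:
        count += len(chunk)
        total += sum(chunk)
    return total / count if count else None


mags = [float(m) for m in range(10, 20)]   # hypothetical magnitudes
print(streaming_mean(batches(mags, size=3)))  # 14.5
```

Peak memory is bounded by one chunk plus the running state, which is the property that makes the loop safe where a full materialization is not.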

This is the one error class where the first reported failure is not necessarily the underlying cause. If you suspect intermittent failures, run with workers=1 to serialize execution and isolate the failing partition.

`OutputError`¶

Raised when the output sink fails — disk full mid-write, write permission lost, schema mismatch in a streamed Parquet write, malformed --output path. Distinct from ExecutionError, which is about the engine / partition side.

OutputError: write to '/tmp/out.parquet' failed: <underlying message>

→ Check disk space (df -h <output-dir>), write permission, and that the parent directory exists. For a HATS-format output (--format hats or Catalog.save(...)), the target must not already exist unless you pass overwrite=True / --force.
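These checks can be run up front, before a long job starts writing. The following is a minimal standard-library sketch, independent of ACID; `preflight` and its `need_bytes` parameter are hypothetical, and the size estimate is yours to supply.

```python
import os
import shutil


def preflight(output_path: str, need_bytes: int = 0) -> list:
    """Return the problems found with an output target; an empty list means none."""
    problems = []
    parent = os.path.dirname(os.path.abspath(output_path))
    if not os.path.isdir(parent):
        # Nothing else can be checked until the directory exists.
        return [f"parent directory does not exist: {parent}"]
    if not os.access(parent, os.W_OK):
        problems.append(f"no write permission on {parent}")
    free = shutil.disk_usage(parent).free
    if free < need_bytes:
        problems.append(f"only {free} bytes free, need {need_bytes}")
    return problems


print(preflight("/tmp/out.parquet", need_bytes=1024))
```

A clean preflight does not rule out an OutputError, since the disk can still fill up mid-write, but it catches the cheap mistakes early.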

`ConnectionClosedError`¶

Raised when a Catalog is used after the Connection it was opened on has been closed or garbage-collected — most often after acid.shutdown() (or exiting a with acid.Connection(...) as db: block).

ConnectionClosedError: Catalog's Connection has been closed or garbage collected; open a new connection (acid.open(...) or acid.Connection(...)) and re-create the query

ConnectionClosedError: Connection has been closed; open a new one via acid.init(...) or acid.Connection(...)

→ With the module-level API the default Connection lazy-reinits on the next call, so the common case is just: re-open the catalog and rebuild the query.

import acid

cat = acid.open("gaia_dr3").where("phot_g_mean_mag < 18")
acid.shutdown()                 # tears the default Connection down
try:
    cat.to_polars()             # raises: its Connection is gone
except acid.ConnectionClosedError as e:
    print("stale handle:", e)

cat = acid.open("gaia_dr3").where("phot_g_mean_mag < 18")  # re-open after shutdown
df = cat.to_polars()            # fine: lazy-reinit built a fresh default

Catalog handles do not outlive their Connection. With an explicit acid.Connection(...), keep your materialization calls inside the with block; to share a result across processes or sessions, Catalog.save(...) it to disk and re-open it later.
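One common way to get "closed or garbage collected" behavior is for the handle to hold only a weak reference to its owner. The sketch below illustrates that general pattern with stand-in classes; it is not a description of how ACID implements Catalog, and every name in it is hypothetical.

```python
import weakref


class ConnectionClosedError(Exception):
    """Stand-in for acid.ConnectionClosedError."""


class Connection:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Handle:
    """Holds only a weak reference, so it cannot keep its Connection alive."""

    def __init__(self, conn):
        self._conn = weakref.ref(conn)

    def connection(self):
        conn = self._conn()              # None once the owner was collected
        if conn is None or conn.closed:
            raise ConnectionClosedError("Connection has been closed or garbage collected")
        return conn


conn = Connection()
handle = Handle(conn)
handle.connection()                      # fine while the owner is open
conn.close()
try:
    handle.connection()
except ConnectionClosedError as e:
    print("stale handle:", e)
```

Under this design a stale handle fails loudly at the point of use, which matches the behavior described above: the fix is always to re-open, never to revive the old handle.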

`ConfigError`¶

Raised for problems with the acid.conf config layer: a --config / ACID_CONFIG pointing at a missing file, an acid.conf that fails to parse, a malformed setting value (e.g. workers that is neither an int nor auto), or an unknown key passed to acid config get/set/unset.

ConfigError: could not parse config file /home/user/.config/acid/acid.conf: <underlying message>

ConfigError: workers must be an integer or 'auto', got 'eight'

→ Fix the syntax in the offending file, or use acid config set workers 8 (or auto) to write a known-good value. Run acid config list to see the parsed settings layer-by-layer.
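For scripts that generate or template an acid.conf, it can help to apply the same rule before writing the file. The sketch below mirrors the message shown above with a stand-in exception class; `parse_workers` is hypothetical, and the lower bound of 1 is an assumption that this page does not state.

```python
class ConfigError(Exception):
    """Stand-in for acid.ConfigError."""


def parse_workers(raw: str):
    """Accept 'auto' or an integer of at least 1; reject anything else."""
    value = raw.strip()
    if value == "auto":
        return "auto"
    try:
        workers = int(value)
    except ValueError:
        raise ConfigError(f"workers must be an integer or 'auto', got {raw!r}") from None
    if workers < 1:
        # Assumed bound: the page only says "an int or auto".
        raise ConfigError(f"workers must be at least 1, got {workers}")
    return workers


print(parse_workers("8"), parse_workers("auto"))
try:
    parse_workers("eight")
except ConfigError as e:
    print("ConfigError:", e)
```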

Class reference¶

The error classes themselves, for completeness. Each inherits from AcidError and shares its constructor (message, query=, span=, hint=, suggestion=).

acid.AcidError ¶

Bases: Exception

Base class for all acid errors.

Subclasses (ParseError, ValidationError etc.) inherit the same constructor and renderer. Library callers can catch AcidError to handle every acid-originated failure uniformly.

from_sqlglot `classmethod` ¶

from_sqlglot(query: str, exc: BaseException) -> 'ParseError'

Wrap a sqlglot.errors.ParseError into an acid ParseError.

Extracts line/col/description/highlight from the SQLGlot exception's errors list (see sqlglot/errors.py:37-71). Falls back to str(exc) when no structured info is available.

from_node `classmethod` ¶

from_node(query: Optional[str], node: 'Optional[exp.Expression]', message: str, *, hint: Optional[str] = None, suggestion: Optional[str] = None) -> 'AcidError'

Construct with position resolved from a SQLGlot AST node.

SQLGlot 30.x does not attach line/col to expressions, so we re-tokenize query and find the token sequence that matches node. When unresolvable (or when query is unavailable), span is left None and the renderer omits the caret.

acid.ParseError ¶

Bases: AcidError

SQL parse failures (syntactic).

acid.ValidationError ¶

Bases: AcidError

Query violates acid semantics (unsupported predicate, join type, unknown column, etc.).

acid.RegistryError ¶

Bases: AcidError

Registry / catalog metadata problems.

acid.ExecutionError ¶

Bases: AcidError

Per-partition execution failures (engine-translated where possible).

acid.OutputError ¶

Bases: AcidError

Output sink failed (disk full, write permission lost, schema mismatch in a streamed Parquet write, etc.). Distinct from ExecutionError, which describes engine-side or partition-execution failures.

acid.ConnectionClosedError ¶

Bases: AcidError

Raised when a Catalog (or other Connection-bound handle) is used after its owning Connection was closed or GC'd.

acid.ConfigError ¶

Bases: AcidError

Raised for problems with the acid.conf config layer: a --config / ACID_CONFIG pointing at a missing file, an acid.conf that fails to parse, a malformed setting value (e.g. workers that is neither an int nor auto), or an unknown key passed to acid config get/set/unset. See docs/archive/CONFIG-SYSTEM.md.
