Python API¶
Stub
Final content will be auto-generated from docstrings via
mkdocstrings. The placeholder entries below show what the
generated page will look like once the docs build runs against an
installed acid package.
Top-level functions¶
acid.connect ¶
connect(catalogs: Union[str, Path, Registry, dict, None], *, workers: int = 1, duckdb_threads: Optional[int] = None, cache_dir: Optional[Union[str, Path]] = None, engine: str = 'duckdb') -> Session
Open a persistent acid Session.
Use as a context manager:
import acid
with acid.connect("catalogs.yaml", workers=8) as s:
    # Both queries reuse the session's worker pool and DuckDB connections.
    r1 = s.sql("SELECT a.id FROM a JOIN b ON XMATCH(r => 1.0)")
    r2 = s.sql("SELECT COUNT(*) FROM a JOIN b ON XMATCH(r => 0.5)")
Parameters¶
engine
"duckdb" (default) or "polars".
acid.sql ¶
sql(query: str, *, catalogs: Union[str, Path, Registry, dict, None] = None, output: Union[str, Path, None] = None, workers: int = 1, duckdb_threads: Optional[int] = None, inmem_row_limit: int = 50000000, engine: str = 'duckdb') -> Result
One-shot convenience: opens a transient Session for one query.
Returns an acid.Result. For repeated queries against the
same catalogs, prefer acid.connect, which amortizes process-pool
and DuckDB connection setup costs across queries.
acid.run ¶
run(query: str, *, catalogs: Union[str, Path, Registry, dict, None] = None, output: Union[str, Path, None] = None, workers: int = 1, duckdb_threads: Optional[int] = None, engine: str = 'duckdb') -> ExecutionResult
One-shot, lower-level entry point: returns the raw ExecutionResult.
Useful when the caller needs visibility into per-partition failures or wants to defer phase-2 reduction.
Session¶
acid.Session ¶
Persistent acid execution context.
Parameters¶
catalogs
YAML path, dict, or pre-built Registry.
workers
Process-pool size. workers <= 1 runs in-process.
duckdb_threads
Value of DuckDB's SET threads for each worker. Defaults to
cpu_count // workers to avoid thread oversubscription.
cache_dir
Directory where materialize(...) writes derived catalogs. When
omitted, a temporary directory is created and removed on close().
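The documented default for duckdb_threads is integer division of the CPU count by the pool size. A minimal sketch of that arithmetic follows; default_duckdb_threads is a hypothetical helper, and the floor of one thread per worker and the treatment of workers <= 1 as a single in-process worker are assumptions rather than confirmed acid behavior.

```python
import os
from typing import Optional


def default_duckdb_threads(workers: int, cpu_count: Optional[int] = None) -> int:
    """Hypothetical helper: split the machine's cores evenly across workers.

    Mirrors the documented default (cpu_count // workers). The clamp to at
    least 1 thread and treating workers <= 1 as one in-process worker are
    assumptions made for this sketch.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    n_workers = max(1, workers)  # workers <= 1 runs in-process: one "worker"
    return max(1, cpus // n_workers)


# On a 16-core machine:
print(default_duckdb_threads(workers=8, cpu_count=16))   # 2 threads per worker
print(default_duckdb_threads(workers=1, cpu_count=16))   # 16 threads in-process
print(default_duckdb_threads(workers=32, cpu_count=16))  # clamped to 1
```

Dividing rather than giving every worker all cores keeps the total thread count near the core count, which is the oversubscription the parameter description warns about.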
add_catalog ¶
Add (or replace) a catalog at runtime.
register_moc ¶
Register a MOC footprint by name. Subsequent queries can filter
rows by it via WHERE IN_MOC(<alias>, '<name>').
source is a FITS file path, an in-memory mocpy.MOC, or an
(N, 2) numpy array of order-29 [lo, hi) ranges.
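One way to build the (N, 2) range form of source: in the NESTED HEALPix scheme used by MOCs, cell i at order k covers the order-29 indices [i * 4^(29-k), (i+1) * 4^(29-k)). The sketch below applies that rule and merges touching ranges; cells_to_order29_ranges is a hypothetical helper, not part of acid. It returns plain Python tuples; numpy.asarray(ranges, dtype=numpy.uint64) would turn them into an (N, 2) array, though which integer dtype acid expects is not stated here.

```python
from typing import Iterable, List, Tuple

MAX_ORDER = 29  # MOC ranges are expressed at HEALPix order 29 (NESTED scheme)


def cells_to_order29_ranges(order: int, cells: Iterable[int]) -> List[Tuple[int, int]]:
    """Convert NESTED HEALPix cells at `order` into merged order-29 [lo, hi) ranges.

    Each cell i at order k covers the order-29 indices
    [i * 4**(29 - k), (i + 1) * 4**(29 - k)). Touching or overlapping
    ranges are merged, so the result is sorted and non-overlapping.
    """
    if not 0 <= order <= MAX_ORDER:
        raise ValueError(f"order must be in [0, {MAX_ORDER}], got {order}")
    shift = 4 ** (MAX_ORDER - order)
    merged: List[Tuple[int, int]] = []
    for cell in sorted(set(cells)):
        lo, hi = cell * shift, (cell + 1) * shift
        if merged and lo <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# Order-0 base cells 0 and 1 merge into one range; cell 5 stays separate.
ranges = cells_to_order29_ranges(order=0, cells=[0, 1, 5])
print(ranges)
```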
explain ¶
Return the per-partition SQL the rewriter would emit for the anchor table's first partition. Useful for debugging the rewrite.
sql ¶
sql(query: str, *, output: Optional[Union[str, Path]] = None, inmem_row_limit: int = 50000000) -> Result
Execute a query and return an acid.Result.
When output is provided, per-partition Parquet files are
written there and the returned Result is backed by disk.
Queries with global aggregates, ORDER BY, or LIMIT trigger the
phase-2 reducer, which combines the per-partition output.
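Why those clauses need a second pass: a global aggregate or an ORDER BY ... LIMIT cannot be answered from any single partition. The sketch below illustrates the general two-phase pattern on plain Python lists, summing per-partition counts and merging per-partition sorted top-k lists. It is not acid's reducer, which works over the per-partition output files, and the function names are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Sequence


def reduce_count(partition_counts: Iterable[int]) -> int:
    """Phase 2 for a global COUNT(*): per-partition counts are summed."""
    return sum(partition_counts)


def reduce_order_by_limit(partition_rows: Sequence[List[float]], limit: int) -> List[float]:
    """Phase 2 for ORDER BY x LIMIT n.

    Each partition is assumed to have already returned its rows sorted
    ascending and truncated to `limit` (phase 1), so the global top-`limit`
    rows are guaranteed to be among them. heapq.merge walks the sorted
    lists lazily and islice stops after `limit` rows.
    """
    return list(islice(heapq.merge(*partition_rows), limit))


print(reduce_count([120, 45, 300]))  # 465

# Three partitions, each already sorted and cut to LIMIT 3 in phase 1.
parts = [[0.1, 0.7, 0.9], [0.3, 0.4, 2.0], [0.05, 1.5, 1.6]]
print(reduce_order_by_limit(parts, limit=3))  # [0.05, 0.1, 0.3]
```

Pushing the LIMIT into each partition in phase 1 bounds how much data phase 2 has to read, which is the reason for splitting the work this way.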
run ¶
Lower-level: returns the raw ExecutionResult so the
caller can inspect per-partition failures.
materialize ¶
materialize(name: str, query: str, *, ra_col: Optional[str] = None, dec_col: Optional[str] = None, overwrite: bool = False) -> TableSpec
Run query, write the result as a HATS catalog under
cache_dir/<name>, and register it as a catalog usable in
subsequent sql() calls.
Returns the registered TableSpec.
Parameters¶
ra_col, dec_col
Override the RA/Dec column names inherited from the query's
anchor table. By default, the anchor's RA/Dec column names are used.
overwrite
If True, delete any existing directory at cache_dir/<name>
before writing.
Result¶
acid.Result (dataclass) ¶
A materialized query result.
Backing storage is one of:
- an in-memory pa.Table (_table is not None), or
- a directory of per-partition Parquet files (_output_dir is not None), typically a HATS catalog.
Most callers should use .arrow(), .df(), or .to_polars();
legacy code that treated the result as a pa.Table continues to
work via the passthrough surface (num_rows, column_names,
column(name), to_pandas(), to_pylist()).
arrow ¶
Return the result as an in-memory pa.Table.
Loads from disk if the result was spilled or written to a HATS output directory.
batches ¶
Iterate over pa.RecordBatch chunks.
For in-memory results, returns the table's existing batches (or
rebatches if batch_size is set). For disk-backed results,
streams Parquet without materializing the union in memory.
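The rebatching behavior can be pictured with a small generator over plain lists. This illustrates the idea only and is not acid's code; rebatch is a hypothetical name. For an in-memory pyarrow table, pyarrow's own Table.to_batches(max_chunksize=...) caps batch sizes without a custom helper.

```python
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def rebatch(batches: Iterable[Sequence[T]], batch_size: int) -> Iterator[List[T]]:
    """Re-chunk an iterable of batches into batches of exactly `batch_size` rows.

    Only the final batch may be shorter. Rows are streamed: the generator
    buffers at most one partial output batch at a time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    buf: List[T] = []
    for batch in batches:
        for row in batch:
            buf.append(row)
            if len(buf) == batch_size:
                yield buf
                buf = []
    if buf:
        yield buf


# Three uneven input batches (e.g. one per partition file) rebatched to 4 rows.
print(list(rebatch([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]], batch_size=4)))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```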
write_parquet ¶
Write the result to path.
With layout='hats', path becomes a HATS catalog directory
containing the per-partition Parquet files and the HATS metadata files.
With layout='single', path is a single Parquet file
containing the row union.
head ¶
Return a new Result containing the first n rows
(after applying whatever ORDER BY the original query had).
Registry¶
acid.Registry ¶
Resolve catalog names to TableSpec, with HATS auto-detection.
Optionally also holds named MOC footprints, registered via
register_moc or a top-level mocs: section in the YAML.
The analyzer looks them up when it encounters IN_MOC()
predicates in a query.
from_directory (classmethod) ¶
Auto-discover HATS catalogs in subdirectories of path.
Each subdirectory containing a properties or hats.properties file
becomes a table named after the directory. point_map.fits files
are auto-registered as MOCs.
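A minimal sketch of the discovery rule, using only pathlib. discover_hats_catalogs is a hypothetical helper; it assumes that only immediate subdirectories are scanned and that a point_map.fits is registered as a MOC under its directory's name, neither of which the reference above states explicitly.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def discover_hats_catalogs(root: Path) -> Tuple[Dict[str, Path], Dict[str, Path]]:
    """Hypothetical sketch of HATS auto-discovery under `root`.

    Returns (tables, mocs). Every immediate subdirectory holding a
    `properties` or `hats.properties` file becomes a table named after the
    directory. A `point_map.fits` inside such a directory is recorded as a
    MOC under the same name (this naming is an assumption).
    """
    tables: Dict[str, Path] = {}
    mocs: Dict[str, Path] = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if (sub / "properties").is_file() or (sub / "hats.properties").is_file():
            tables[sub.name] = sub
            if (sub / "point_map.fits").is_file():
                mocs[sub.name] = sub / "point_map.fits"
    return tables, mocs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "gaia").mkdir()
    (root / "gaia" / "hats.properties").touch()
    (root / "gaia" / "point_map.fits").touch()
    (root / "notes").mkdir()  # no properties file: ignored
    tables, mocs = discover_hats_catalogs(root)
    print(sorted(tables), sorted(mocs))  # ['gaia'] ['gaia']
```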
register_moc ¶
register_moc(name: str, source: Union[str, Path, 'mocpy.MOC', 'np.ndarray', 'MocSpec']) -> 'MocSpec'
Register a MOC by name. source is a FITS path, an in-memory
mocpy.MOC, an (N, 2) numpy array of order-29 [lo, hi)
ranges, or an already-built MocSpec.
get_moc ¶
Return the MOC registered as name, lazily falling back to a
registered catalog's point_map.fits footprint when no explicit
registration matches.
Resolution order:
- An explicitly registered MOC named name.
- A registered catalog named name whose <path>/point_map.fits exists; the footprint is auto-loaded and cached under name, so subsequent lookups are free.
- Otherwise, raise RegistryError naming both attempts.
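The resolution order can be sketched as a small class. Everything here is hypothetical: MocResolver and RegistryError are stand-ins, and the FITS loader is injected as a callable so the example runs without mocpy. The has_moc method mirrors the cheap stat-only check documented below for Registry.has_moc.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class RegistryError(Exception):
    """Stand-in for acid's RegistryError (sketch only)."""


class MocResolver:
    """Hypothetical sketch of the get_moc resolution order.

    `load` turns a point_map.fits path into a MOC object; a real loader
    would read the FITS file with mocpy.
    """

    def __init__(self, catalog_paths: Dict[str, Path], load: Callable[[Path], Any]):
        self._catalog_paths = catalog_paths  # catalog name -> catalog directory
        self._load = load
        self._mocs: Dict[str, Any] = {}      # explicit registrations + cached loads

    def register_moc(self, name: str, moc: Any) -> None:
        self._mocs[name] = moc

    def _point_map(self, name: str) -> Optional[Path]:
        path = self._catalog_paths.get(name)
        return None if path is None else path / "point_map.fits"

    def has_moc(self, name: str) -> bool:
        # Cheap pre-check: a stat() of point_map.fits, never a read.
        pm = self._point_map(name)
        return name in self._mocs or (pm is not None and pm.is_file())

    def get_moc(self, name: str) -> Any:
        if name in self._mocs:               # 1. explicit registration (or cached load)
            return self._mocs[name]
        pm = self._point_map(name)
        if pm is not None and pm.is_file():  # 2. catalog footprint, cached under name
            self._mocs[name] = self._load(pm)
            return self._mocs[name]
        raise RegistryError(                 # 3. neither worked: name both attempts
            f"no MOC registered as {name!r}, and no catalog {name!r} "
            "with a point_map.fits footprint"
        )


with tempfile.TemporaryDirectory() as tmp:
    gaia = Path(tmp) / "gaia"
    gaia.mkdir()
    (gaia / "point_map.fits").touch()
    loads = []

    def fake_load(path: Path) -> str:  # stands in for reading the FITS file
        loads.append(path)
        return f"MOC<{path.parent.name}>"

    reg = MocResolver({"gaia": gaia}, load=fake_load)
    print(reg.has_moc("gaia"), len(loads))                      # True 0
    print(reg.get_moc("gaia"), reg.get_moc("gaia"), len(loads))  # MOC<gaia> MOC<gaia> 1
```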
catalog_footprint ¶
Return the catalog's footprint MOC, loaded from point_map.fits
and cached on first access. Returns None when the catalog isn't
registered or has no point_map.fits. The analyzer uses it to scope
IN_MOC predicates to cells where the catalog actually has data.
It is kept independent of Registry._mocs, so an explicit
registration does not shadow the catalog's own footprint.
has_moc ¶
Cheap pre-execution check: would get_moc(name) succeed?
For the auto-resolution path it stats the catalog's point_map.fits
but doesn't read it.
Errors¶
acid.AcidError ¶
Bases: Exception
Base class for all acid errors.
Subclasses (ParseError, ValidationError, etc.) inherit the
same constructor and renderer. Library callers can catch
AcidError to handle every acid-originated failure uniformly.
from_sqlglot (classmethod) ¶
Wrap a sqlglot.errors.ParseError into an acid ParseError.
Extracts line/col/description/highlight from the
SQLGlot exception's errors list (see sqlglot/errors.py:37-71).
Falls back to str(exc) when no structured info is available.
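A sketch of the extraction step. It relies on sqlglot's ParseError exposing an errors list of dicts (with keys such as "description", "line", "col", and "highlight"); the stand-in exception class lets it run without sqlglot installed. Using only the first entry of errors is an assumption, and parse_error_details is a hypothetical name.

```python
from typing import Any, Dict, Optional


def parse_error_details(exc: Exception) -> Dict[str, Optional[Any]]:
    """Hypothetical sketch: pull position info out of a sqlglot-style ParseError.

    Reads the first dict of `exc.errors` when present; otherwise falls back
    to str(exc) with no position information.
    """
    errors = getattr(exc, "errors", None) or []
    if errors:
        first = errors[0]
        return {
            "line": first.get("line"),
            "col": first.get("col"),
            "description": first.get("description") or str(exc),
            "highlight": first.get("highlight"),
        }
    return {"line": None, "col": None, "description": str(exc), "highlight": None}


class FakeParseError(Exception):
    """Stands in for sqlglot.errors.ParseError so the sketch runs without sqlglot."""

    def __init__(self, message, errors=None):
        super().__init__(message)
        self.errors = errors or []


structured = FakeParseError(
    "Invalid expression",
    [{"description": "Unexpected token", "line": 1, "col": 14, "highlight": "FROMM"}],
)
print(parse_error_details(structured)["col"])                 # 14
print(parse_error_details(ValueError("boom"))["description"])  # boom
```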
from_node (classmethod) ¶
from_node(query: Optional[str], node: 'Optional[exp.Expression]', message: str, *, hint: Optional[str] = None, suggestion: Optional[str] = None) -> 'AcidError'
Construct with position resolved from a SQLGlot AST node.
SQLGlot 30.x does not attach line/col to expressions, so we
re-tokenize query and find the token sequence that matches
node. When the position can't be resolved (or query is unavailable),
span is left None and the renderer omits the caret.
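The re-tokenize-and-match idea can be sketched with a toy regex tokenizer in place of SQLGlot's. This assumes the node can be rendered back to SQL text (sqlglot expressions provide .sql() for that); tokenize and locate are hypothetical helpers, and the matching is a plain linear scan.

```python
import re
from typing import List, Optional, Tuple

# Deliberately simple tokenizer for this sketch: words/numbers, single-quoted
# strings, and single punctuation characters. SQLGlot's real tokenizer is far
# more complete; this only illustrates the matching idea.
_TOKEN = re.compile(r"'[^']*'|\w+|[^\w\s]")


def tokenize(sql: str) -> List[Tuple[str, int]]:
    """Return (token_text, character_offset) pairs."""
    return [(m.group(0), m.start()) for m in _TOKEN.finditer(sql)]


def locate(query: str, node_sql: str) -> Optional[Tuple[int, int]]:
    """Find the 1-based (line, col) where node_sql's tokens first appear in query.

    Tokens are compared case-insensitively (a simplification that also
    applies to string literals). Returns None when the sequence isn't
    found, mirroring the "span left None" case.
    """
    q = tokenize(query)
    n = [t.upper() for t, _ in tokenize(node_sql)]
    if not n:
        return None
    for i in range(len(q) - len(n) + 1):
        if [t.upper() for t, _ in q[i : i + len(n)]] == n:
            offset = q[i][1]
            line = query.count("\n", 0, offset) + 1
            col = offset - (query.rfind("\n", 0, offset) + 1) + 1
            return line, col
    return None


query = "SELECT a.id\nFROM a JOIN b\nON XMATCH(r => 1.0)"
print(locate(query, "XMATCH(r => 1.0)"))  # (3, 4)
print(locate(query, "NOT_THERE(x)"))      # None
```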
acid.ValidationError ¶
Bases: AcidError
Query violates acid semantics (unsupported predicate, join type, unknown column, etc.).