Python API

Stub

Final content will be auto-generated from docstrings via mkdocstrings. The placeholder directives below show what the generated page will look like once the docs build runs against an installed acid.

Top-level functions

acid.connect

connect(catalogs: Union[str, Path, Registry, dict, None], *, workers: int = 1, duckdb_threads: Optional[int] = None, cache_dir: Optional[Union[str, Path]] = None, engine: str = 'duckdb') -> Session

Open a persistent acid Session.

Use as a context manager:

import acid

with acid.connect("catalogs.yaml", workers=8) as s:
    r1 = s.sql("SELECT a.id FROM a JOIN b ON XMATCH(r => 1.0)")
    r2 = s.sql("SELECT COUNT(*) FROM a JOIN b ON XMATCH(r => 0.5)")

Parameters

  • engine: "duckdb" (default) or "polars".

acid.sql

sql(query: str, *, catalogs: Union[str, Path, Registry, dict, None] = None, output: Union[str, Path, None] = None, workers: int = 1, duckdb_threads: Optional[int] = None, inmem_row_limit: int = 50000000, engine: str = 'duckdb') -> Result

One-shot convenience: opens a transient Session for one query.

Returns an acid.Result. For repeated queries against the same catalogs, prefer acid.connect to amortize pool and DuckDB-connection setup costs.

acid.run

run(query: str, *, catalogs: Union[str, Path, Registry, dict, None] = None, output: Union[str, Path, None] = None, workers: int = 1, duckdb_threads: Optional[int] = None, engine: str = 'duckdb') -> ExecutionResult

One-shot lower-level entry: returns the raw ExecutionResult.

Useful when the caller needs per-partition failure visibility or wants to defer phase-2 reduction.

Session

acid.Session

Persistent acid execution context.

Parameters

  • catalogs: YAML path, dict, or pre-built Registry.
  • workers: Process-pool size. workers <= 1 runs in-process.
  • duckdb_threads: DuckDB SET threads per worker. Defaults to cpu_count // workers to avoid thread oversubscription.
  • cache_dir: Where materialize(...) writes derived catalogs. When omitted, a tempdir is created and cleaned up on close().
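The duckdb_threads default can be illustrated with a few lines of arithmetic. This is a minimal sketch, not acid's code: default_duckdb_threads is a hypothetical helper, and clamping the result to at least 1 is an assumption made here so the example stays well-defined when workers exceeds the core count.

```python
import os
from typing import Optional


def default_duckdb_threads(workers: int, cpu_count: Optional[int] = None) -> int:
    """Hypothetical helper: split the available cores evenly across workers."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # workers <= 1 means in-process execution, so treat it as a single worker.
    effective_workers = max(workers, 1)
    # Assumption: never go below one thread per worker.
    return max(cores // effective_workers, 1)


# 32 cores shared by 8 workers gives 4 DuckDB threads per worker.
threads_per_worker = default_duckdb_threads(workers=8, cpu_count=32)
```

With this split, workers * duckdb_threads never exceeds the core count unless the clamp applies, which is the oversubscription the default is meant to avoid.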

close

close() -> None

Shut down the worker pool and release resources.

list_catalogs

list_catalogs() -> list[str]

Return the names of all registered catalogs.

add_catalog

add_catalog(name: str, **spec_kwargs) -> TableSpec

Add (or replace) a catalog at runtime.

register_moc

register_moc(name: str, source: Union[str, Path, object])

Register a MOC footprint by name. Subsequent queries can filter rows by it via WHERE IN_MOC(<alias>, '<name>').

source is a FITS file path, an in-memory mocpy.MOC, or an (N, 2) numpy array of order-29 [lo, hi) ranges.
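For intuition about the order-29 [lo, hi) ranges: in the HEALPix NESTED numbering used by MOCs, a cell at order k covers 4^(29 - k) consecutive order-29 cells. The sketch below assumes that numbering; cell_to_order29_range is a hypothetical helper and not part of acid.

```python
from typing import List, Tuple


def cell_to_order29_range(order: int, ipix: int) -> Tuple[int, int]:
    """Hypothetical helper: map a NESTED HEALPix cell to its order-29 [lo, hi) range."""
    if not 0 <= order <= 29:
        raise ValueError("order must be in [0, 29]")
    if not 0 <= ipix < 12 * 4 ** order:
        raise ValueError("ipix out of range for this order")
    # Each step down in order splits a cell in four, i.e. two extra bits.
    shift = 2 * (29 - order)
    return ipix << shift, (ipix + 1) << shift


# Two illustrative cells: one at order 28 and one at full depth.
ranges: List[Tuple[int, int]] = [
    cell_to_order29_range(28, 3),
    cell_to_order29_range(29, 5),
]
# Converting `ranges` to an (N, 2) numpy array would give the shape
# that the source argument is documented to accept.
```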

validate

validate(query: str) -> QueryPlan

Parse and analyze the query without executing it.

explain

explain(query: str) -> str

Return the per-partition SQL the rewriter would emit for the anchor's first partition. Useful for debugging the rewrite.

sql

sql(query: str, *, output: Optional[Union[str, Path]] = None, inmem_row_limit: int = 50000000) -> Result

Execute a query and return an acid.Result.

When output is provided, per-partition Parquet files are written there and the returned Result is backed by disk. Queries with global aggregates, ORDER BY, or LIMIT trigger the phase-2 reducer over the per-partition output.
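The two-phase idea can be shown on toy data. The sketch below is a conceptual illustration only: phase1 and phase2 are hypothetical names, partitions are plain Python lists, and nothing here reflects how acid actually implements its reducer. It emulates a global COUNT(*) and an ORDER BY ... LIMIT 3 over three partitions.

```python
import heapq

# Toy stand-in for per-partition data.
partitions = [[5, 1, 9], [7, 3], [8, 2, 6]]


def phase1(rows, limit):
    """Per-partition work: a partial count and a locally sorted, truncated list."""
    return {"count": len(rows), "top": sorted(rows)[:limit]}


def phase2(partials, limit):
    """Global reduction: sum the counts and merge the sorted partial lists."""
    total = sum(p["count"] for p in partials)
    merged = list(heapq.merge(*(p["top"] for p in partials)))[:limit]
    return total, merged


partials = [phase1(rows, limit=3) for rows in partitions]
total_rows, first_three = phase2(partials, limit=3)
```

Each partition only needs to keep its own first three rows, because no row outside a partition's local top three can appear in the global top three.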

run

run(query: str, *, output: Optional[Union[str, Path]] = None) -> ExecutionResult

Lower-level: returns the raw ExecutionResult so the caller can inspect per-partition failures.

materialize

materialize(name: str, query: str, *, ra_col: Optional[str] = None, dec_col: Optional[str] = None, overwrite: bool = False) -> TableSpec

Run query, write the result as a HATS catalog under cache_dir/<name>, and register it as a catalog usable in subsequent sql() calls.

Returns the registered TableSpec.

Parameters

  • ra_col, dec_col: Override the inherited anchor coordinates. Default to the RA/Dec column names of the query's anchor.
  • overwrite: If True, delete the existing directory at cache_dir/<name> before writing.

Result

acid.Result dataclass

A materialized query result.

Backing storage is one of:
  • an in-memory pa.Table (_table is not None), or
  • a directory of per-partition Parquet files (_output_dir is not None), typically a HATS catalog.

Most callers should use .arrow(), .df(), or .to_polars(); legacy code that treated the result as a pa.Table continues to work via the passthrough surface (num_rows, column_names, column(name), to_pandas(), to_pylist()).

column

column(name: str) -> pa.ChunkedArray

Return a single column by name as a pa.ChunkedArray.

to_pandas

to_pandas() -> 'pd.DataFrame'

Convert to a pandas DataFrame.

to_pylist

to_pylist() -> list[dict]

Convert to a list of row dicts.

arrow

arrow() -> pa.Table

Return the result as an in-memory pa.Table.

Loads from disk if the result was spilled or written to a HATS output directory.

df

df() -> 'pd.DataFrame'

Alias for to_pandas().

to_polars

to_polars()

Convert to a Polars DataFrame (requires polars installed).

batches

batches(batch_size: Optional[int] = None) -> Iterator[pa.RecordBatch]

Iterate over pa.RecordBatch chunks.

For in-memory results, returns the table's existing batches (or rebatches if batch_size is set). For disk-backed results, streams Parquet without materializing the union in memory.

write_parquet

write_parquet(path: str | Path, *, layout: str = 'hats') -> Path

Write the result to path.

With layout='hats', path becomes a HATS catalog directory (mirrors the per-partition Parquet output and HATS metadata files). With layout='single', path is a single Parquet file containing the row union.

head

head(n: int = 10) -> 'Result'

Return a new Result containing the first n rows (after applying whatever ORDER BY the original query had).

Registry

acid.Registry

Resolve catalog names to TableSpec, with HATS auto-detection.

Optionally also holds named MOC footprints, registered via register_moc or a top-level mocs: section in the YAML. They're looked up by the analyzer when it encounters IN_MOC() predicates in a query.

from_directory classmethod

from_directory(path: str | PathLike) -> 'Registry'

Auto-discover HATS catalogs in subdirectories of path.

Each subdirectory containing properties or hats.properties becomes a table named after the directory. point_map.fits files are auto-registered as MOCs.

register_moc

register_moc(name: str, source: Union[str, Path, 'mocpy.MOC', 'np.ndarray', 'MocSpec']) -> 'MocSpec'

Register a MOC by name. source is a FITS path, an in-memory mocpy.MOC, an (N, 2) numpy array of order-29 [lo, hi) ranges, or an already-built MocSpec.

get_moc

get_moc(name: str) -> 'MocSpec'

Return the MOC registered as name, lazily falling back to a registered catalog's point_map.fits footprint when no explicit registration matches.

Resolution order
  1. Explicitly registered MOC named name.
  2. Registered catalog named name whose <path>/point_map.fits exists — auto-loaded and cached under name so subsequent lookups are free.
  3. Otherwise raise RegistryError naming both attempts.
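The resolution order above can be sketched with a toy registry. ToyRegistry, its attributes, the catalog name "gaia", and the string stand-ins for MOC objects are all hypothetical; caching the fallback in the same mapping as explicit registrations is an assumption of this sketch, not a statement about acid's internals.

```python
import tempfile
from pathlib import Path


class RegistryError(Exception):
    """Stand-in for acid.RegistryError."""


class ToyRegistry:
    """Hypothetical, simplified registry; not acid's implementation."""

    def __init__(self):
        self.mocs = {}      # name -> MOC object (a plain string here)
        self.catalogs = {}  # name -> catalog directory

    def get_moc(self, name):
        # 1. An explicitly registered MOC wins.
        if name in self.mocs:
            return self.mocs[name]
        # 2. Fall back to the footprint of a catalog with the same name.
        catalog_dir = self.catalogs.get(name)
        if catalog_dir is not None:
            point_map = Path(catalog_dir) / "point_map.fits"
            if point_map.exists():
                # Assumption: cache under `name` so later lookups hit step 1.
                self.mocs[name] = f"footprint loaded from {point_map.name}"
                return self.mocs[name]
        # 3. Nothing matched: report both attempts.
        raise RegistryError(
            f"no MOC registered as {name!r} and no catalog {name!r} with point_map.fits"
        )


with tempfile.TemporaryDirectory() as tmp:
    catalog_dir = Path(tmp) / "gaia"
    catalog_dir.mkdir()
    (catalog_dir / "point_map.fits").touch()

    reg = ToyRegistry()
    reg.mocs["survey_area"] = "explicit MOC"
    reg.catalogs["gaia"] = catalog_dir

    explicit = reg.get_moc("survey_area")   # step 1
    fallback = reg.get_moc("gaia")          # step 2, then cached
    try:
        reg.get_moc("unknown")              # step 3
        missing_error = None
    except RegistryError as exc:
        missing_error = str(exc)
```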

catalog_footprint

catalog_footprint(catalog_name: str) -> 'MocSpec | None'

Return the catalog's footprint MOC loaded from point_map.fits, cached on first access. None when the catalog isn't registered or has no point_map.fits. Used by the analyzer to scope IN_MOC predicates to cells where the catalog actually has data — independent of Registry._mocs so explicit registrations don't shadow the catalog's own footprint.

has_moc

has_moc(name: str) -> bool

Cheap pre-execution check: would get_moc(name) succeed?

Stats the catalog's point_map.fits for the auto-resolution path but doesn't read it.

Errors

acid.AcidError

Bases: Exception

Base class for all acid errors.

Subclasses (ParseError, ValidationError, etc.) inherit the same constructor and renderer. Library callers can catch AcidError to handle every acid-originated failure uniformly.
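A short sketch of the catch-all pattern follows. To keep it runnable without acid installed, it defines stand-in classes that mirror the documented hierarchy; with the real library the except clause would name acid.AcidError. run_safely and failing are hypothetical helpers.

```python
class AcidError(Exception):
    """Stand-in for acid.AcidError."""


class ParseError(AcidError):
    """Stand-in for acid.ParseError."""


class SessionClosedError(AcidError):
    """Stand-in for acid.SessionClosedError."""


def run_safely(action):
    """Run `action`, turning any AcidError subclass into a (kind, message) pair."""
    try:
        return ("ok", action())
    except AcidError as exc:
        # One handler covers every subclass; unrelated exceptions still propagate.
        return (type(exc).__name__, str(exc))


def failing(exc):
    """Build a zero-argument callable that raises `exc`."""
    def action():
        raise exc
    return action


outcomes = [
    run_safely(failing(ParseError("unexpected token"))),
    run_safely(failing(SessionClosedError("session is closed"))),
    run_safely(lambda: 42),
]
```

Catching the base class keeps caller code stable if more subclasses are added later, while a narrower except clause remains available when one failure kind needs special handling.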

from_sqlglot classmethod

from_sqlglot(query: str, exc: BaseException) -> 'ParseError'

Wrap a sqlglot.errors.ParseError into an acid ParseError.

Extracts line/col/description/highlight from the SQLGlot exception's errors list (see sqlglot/errors.py:37-71). Falls back to str(exc) when no structured info is available.

from_node classmethod

from_node(query: Optional[str], node: 'Optional[exp.Expression]', message: str, *, hint: Optional[str] = None, suggestion: Optional[str] = None) -> 'AcidError'

Construct with position resolved from a SQLGlot AST node.

SQLGlot 30.x does not attach line/col to expressions, so we re-tokenize query and find the token sequence that matches node. When unresolvable (or when query is unavailable), span is left None and the renderer omits the caret.
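The re-tokenize-and-match idea can be sketched as follows. This is an illustration under loud assumptions: it uses a naive regex tokenizer instead of SQLGlot's, matches tokens case-insensitively, and takes the node as its SQL text; locate and tokenize are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

# Naive stand-in tokenizer: words, or single punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[Tuple[str, int]]:
    """Return (token, start offset) pairs."""
    return [(m.group(0), m.start()) for m in TOKEN_RE.finditer(text)]


def locate(query: Optional[str], fragment: str) -> Optional[Tuple[int, int]]:
    """Find the 1-based (line, col) where `fragment`'s tokens first occur in `query`."""
    if not query:
        return None  # query unavailable: no span
    query_tokens = tokenize(query)
    haystack = [tok.upper() for tok, _ in query_tokens]
    needle = [tok.upper() for tok, _ in tokenize(fragment)]
    if not needle:
        return None
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            offset = query_tokens[i][1]
            line = query.count("\n", 0, offset) + 1
            col = offset - (query.rfind("\n", 0, offset) + 1) + 1
            return line, col
    return None  # unresolvable: caller leaves the span unset


query = "SELECT a.id\nFROM a JOIN b\nON XMATCH(r => 1.0)"
position = locate(query, "xmatch(r => 1.0)")
```

Returning None in both failure cases mirrors the documented behavior of leaving span as None so the renderer can omit the caret.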

acid.RegistryError

Bases: AcidError

Registry / catalog metadata problems.

acid.ParseError

Bases: AcidError

SQL parse failures (syntactic).

acid.ValidationError

Bases: AcidError

Query violates acid semantics (unsupported predicate, join type, unknown column, etc.).

acid.ExecutionError

Bases: AcidError

Per-partition execution failures (engine-translated where possible).

acid.SessionClosedError

Bases: AcidError

Raised when a method is called on a closed Session.