CLI reference

The acid command-line tool runs the same engine as the Python API. Use it for one-off queries from a shell prompt, for catalog downloads and inspection, and for building margin caches.

This page lists every subcommand and every flag, with defaults read straight from src/acid_cli/. For the design rationale behind configuration discovery and --db resolution, see docs/archive/CONFIG-SYSTEM.md.

Synopsis

```
acid [--config FILE] {query|validate|config|search|list|download|inspect|hats} ...
acid --version
```

Global options

| Flag | Description |
|---|---|
| `--version` | Print version, copyright, and license; exit. |
| `--config FILE` | Path to an `acid.conf` (overrides `ACID_CONFIG` and the default search). |

`acid query` — run a SQL query

```
acid query [QUERY|-] [-f FILE]
           [--db ENTRIES] [--open PATH,RA,DEC]
           [--output PATH] [--format {hats,parquet,csv,fits}]
           [--workers N] [--threads N] [--ram-budget SIZE] [--tmpdir DIR]
           [--cone RA,DEC,RADIUS_DEG]
           [--progress {auto,on,off,plain}]
```

Reads the SQL query from a positional argument, from stdin (when the positional argument is -), or from a file (-f FILE). Exactly one of these sources is required.

Where to write results

| `--output` | `--format` | Effect |
|---|---|---|
| omitted | (rejected if given) | Pretty-print to `stdout` (fixed-width on a TTY, TSV when piped). |
| directory or path with no extension | inferred / `hats` | Write a partitioned HATS tree at `--output`. |
| `.parquet` / `.pq` | inferred / `parquet` | Single Parquet file. |
| `.csv` | inferred / `csv` | Single CSV file. |
| `.fits` / `.fit` | inferred / `fits` | Single FITS binary table. |

If you set --format explicitly, it wins over the extension. An unrecognized extension with no --format is an error rather than a silent fallback to HATS. A global-reduce query (an aggregate or top-K) cannot be written as a HATS tree; use a single-file format, or omit --output to pretty-print.
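The shell sketch below makes the inference rules concrete by following the table above. The helper name `infer_format` is hypothetical, and this is not acid's implementation. It only mirrors the documented rules: an explicit --format wins, no extension means HATS, an unknown extension is an error, and --format without --output is rejected. It does not model the global-reduce restriction.

```shell
# Hypothetical helper mirroring the --output/--format table; NOT acid's own code.
# Usage: infer_format OUTPUT [FORMAT]   (pass "" as OUTPUT for "omitted")
infer_format() {
  local output="${1:-}" format="${2:-}" base
  if [ -z "$output" ]; then
    # No --output: results are pretty-printed, and an explicit --format is rejected.
    if [ -n "$format" ]; then
      echo "error: --format requires --output" >&2
      return 1
    fi
    echo stdout
    return 0
  fi
  if [ -n "$format" ]; then
    # An explicit --format wins over the extension, but must be a known format.
    case "$format" in
      hats|parquet|csv|fits) echo "$format"; return 0 ;;
      *) echo "error: unknown --format '$format'" >&2; return 1 ;;
    esac
  fi
  base="${output%/}"    # drop a trailing slash, e.g. results/ -> results
  base="${base##*/}"    # keep only the last path component
  case "$base" in
    *.parquet|*.pq) echo parquet ;;
    *.csv)          echo csv ;;
    *.fits|*.fit)   echo fits ;;
    *.*)            echo "error: unrecognized extension in '$output'" >&2; return 1 ;;
    *)              echo hats ;;   # no extension: partitioned HATS tree
  esac
}
```

For example, `infer_format results/` prints hats and `infer_format sample.csv` prints csv, which matches the query examples below.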

Flags

| Flag | Default | Description |
|---|---|---|
| `-f`, `--file FILE` | — | Read the SQL query from a file. |
| `--db ENTRIES` | `$ACID_PATH` → config `path` → `~/datasets` | `':'`-separated list of HATS directories, registry YAMLs, or both. Leftmost entry wins on name collisions. |
| `--open PATH,RA,DEC` | — | Use a raw data file (`.parquet`/`.csv`/`.tsv`/`.fits`/`.arrow`/VOTable) as a named table in the query, alongside the `--db` catalogs (a virtual catalog: spilled once, queried like HATS). Two forms: positional `PATH,RA,DEC` (table name = file basename), or named `NAME=PATH,ra=RA,dec=DEC`. The `RA`/`DEC` coordinate column names are required (never guessed). Repeatable. |
| `--output PATH` | — | Output path; omit to pretty-print to `stdout`. |
| `--format FMT` | inferred from `--output` (no ext → `hats`) | One of `hats`, `parquet`, `csv`, `fits`. Rejected without `--output`. |
| `--workers N` | `$ACID_WORKERS` → config `workers` → `auto` (cgroup-aware) | Process-pool size. |
| `--threads N` | `cpu_cap // workers` (cgroup-aware) | Per-worker Polars thread budget. |
| `--ram-budget SIZE` | 25 % of available RAM (cgroup-aware) | Total RAM the planner budgets for; work tuples are sized to fit `SIZE / workers` each. Bytes or a human size (`64GB`, `512MiB`, `32g`). See Performance — RAM budget. |
| `--tmpdir DIR` | `$ACID_TMPDIR` → config `tmpdir` → `$TMPDIR` | Base scratch directory; a per-run subdirectory is created and removed on exit. |
| `--cone RA,DEC,RADIUS_DEG` | — | Restrict the query to a sky cone (degrees, ICRS). |
| `--progress MODE` | `auto` | `auto` (animated bar on a TTY, silent when piped), `on` (force bar), `off` (silent), `plain` (one committed line per stage; good for logs). |
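The leftmost-wins rule for --db works much like a shell $PATH lookup. The sketch below is a hypothetical illustration, not acid's resolver, and the function name `resolve_catalog` is invented here. It assumes every entry is a directory whose subdirectories are catalogs. It ignores registry YAML entries, which acid also accepts.

```shell
# Hypothetical illustration of leftmost-wins lookup over a ':'-separated --db
# value. Assumes each entry is a directory of catalog subdirectories; acid's
# real resolver also understands registry YAMLs, which this sketch ignores.
resolve_catalog() {  # resolve_catalog NAME DB_ENTRIES
  local name="$1" db="$2" entry
  local IFS=':'
  for entry in $db; do
    [ -n "$entry" ] || continue          # skip empty fields such as "a::b"
    if [ -d "$entry/$name" ]; then
      printf '%s\n' "$entry/$name"       # first (leftmost) match wins
      return 0
    fi
  done
  echo "error: catalog '$name' not found" >&2
  return 1
}
```

With `--db /a:/b`, a catalog present under both entries resolves to the copy under /a.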

Examples

```shell
# Pretty-print to stdout — no --output.
acid query "SELECT COUNT(*) FROM gaia_dr3" --db /data/hats

# Write a HATS tree (default --format when --output has no extension).
acid query -f xmatch.sql --db catalogs.yaml --output results/ --workers 32

# Write a single CSV — extension picks the format.
acid query "SELECT id, ra, dec FROM object LIMIT 1000" \
    --db /data/hats --output sample.csv

# Read SQL from stdin.
echo "SELECT * FROM object LIMIT 10" | acid query - --db /data/hats

# Restrict to a 2 deg cone around (50, -50).
acid query "SELECT g.id, t.designation FROM gaia g JOIN two_mass t \
            ON XMATCH(radius_arcsec => 1.0)" \
    --db /data/hats --cone 50,-50,2

# Crossmatch a raw CSV target list (named --open form) against a catalog.
acid query "SELECT t.id, g.source_id FROM t \
            JOIN gaia_dr3 g ON XMATCH(radius_arcsec => 1.0)" \
    --db /data/hats --open t=candidates.csv,ra=RA,dec=DEC

# Cap the RAM the planner budgets for (work tuples shrink to fit).
acid query -f big.sql --db /data/hats --workers 32 --ram-budget 64GB
```
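The resource defaults are plain integer divisions. The worked example below assumes a hypothetical machine with 64 usable cores and 256 GB of available RAM, and it reuses --workers 32 from the last example. It does not model acid's cgroup-aware detection, and all sizes stay in GB rather than bytes.

```shell
# Worked arithmetic for the documented defaults; the machine numbers are assumed.
cpu_cap=64           # hypothetical: usable cores after cgroup limits
avail_ram_gb=256     # hypothetical: available RAM, in GB
workers=32           # --workers 32

default_budget_gb=$(( avail_ram_gb / 4 ))         # --ram-budget default: 25 % of available RAM
threads=$(( cpu_cap / workers ))                  # --threads default: cpu_cap // workers
per_worker_gb=$(( default_budget_gb / workers ))  # each work tuple must fit SIZE / workers

echo "budget=${default_budget_gb}GB threads/worker=${threads} RAM/worker=${per_worker_gb}GB"
# prints: budget=64GB threads/worker=2 RAM/worker=2GB
```

On this hypothetical machine, passing --ram-budget 64GB explicitly gives the same 2 GB per worker. Halving --workers to 16 would double each worker's share to 4 GB.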

`acid validate` — parse + analyze, don't run

```
acid validate [QUERY|-] [-f FILE] [--db ENTRIES]
```

Same --db / query-source rules as acid query. Prints the analyzed plan summary (anchor, joins, projection, aggregation, ordering, footprint filters) and exits. Use it to catch ParseError / ValidationError before kicking off a long run.

`acid config` — show or edit `acid.conf`

```
acid config show [--effective]
acid config get  KEY [--effective]
acid config set  KEY VALUE
acid config unset KEY
```

show lists the settings in the config file. With --effective, it lists the resolved values instead, each with its provenance (env > config > built-in), plus the order in which acid searched for the config file.

get KEY prints the value from the file, or exits 1 silently if the key is unset. Add --effective to print the resolved value instead.

set and unset rewrite the target file in place; comments and formatting are not preserved.

Valid keys: path, download_path, workers, mem_per_worker_gb, tmpdir, inmem_row_limit, workers_jemalloc_conf, ram_budget.
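The env > config > built-in precedence that --effective reports can be sketched in a few lines of shell. This hypothetical illustration covers a single key (workers), and the function names are invented here. It assumes acid.conf holds simple key = value lines, as the download_path = example further down suggests. It does not reproduce acid's config-file search or its parser.

```shell
# Hypothetical sketch of env > config > built-in resolution for one key.
# Assumes acid.conf uses simple "key = value" lines; acid's real parser may differ.
conf_get() {  # conf_get FILE KEY: print the value, or exit 1 silently if unset
  [ -f "$1" ] || return 1
  awk -F'=' -v k="$2" '
    { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
    name == k {
      val = $0
      sub(/^[^=]*=[ \t]*/, "", val)   # drop "key =" and leading blanks
      sub(/[ \t]+$/, "", val)         # drop trailing blanks
      print val; found = 1; exit
    }
    END { exit !found }' "$1"
}

resolve_workers() {  # resolve_workers CONF_FILE: print "value (source)"
  local v
  if [ -n "${ACID_WORKERS:-}" ]; then
    echo "$ACID_WORKERS (env)"
  elif v=$(conf_get "$1" workers); then
    echo "$v (config)"
  else
    echo "auto (built-in)"
  fi
}
```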

`acid search` — list downloadable catalogs

```
acid search [PATTERN]
            [--cache {use,refresh,off}]
            [--timeout SEC] [--insecure] [--no-color]
```

Crawls every root on the download path ($ACID_DOWNLOAD_PATH → config download_path → built-in default) over its native transport — local directories, ssh:// hosts, and http(s):// mirrors alike — and lists the HATS catalogs available to acid download. One line per catalog; the name is the token to hand to acid download.

| Flag | Default | Description |
|---|---|---|
| `PATTERN` | — | Show only catalogs whose name contains this text (case-insensitive substring). |
| `--cache {use,refresh,off}` | `use` | Remote-listing cache mode: `use` serves a fresh-enough cached listing (remote roots are cached ~1h under `$XDG_CACHE_HOME/acid/downloads`), `refresh` forces a re-crawl and rewrites the cache, `off` bypasses it entirely (neither read nor write). |
| `--timeout SEC` | `300` | Per-request timeout in seconds (HTTP / SSH). |
| `--insecure` | off | Skip TLS cert verification (HTTPS only — self-signed mirrors). |
| `--no-color` | off | Disable ANSI color. |

Output

Catalog rows go to stdout; the searched roots, the count, and any shadowing footnote go to stderr as quiet context (so acid search | … pipes only the data).

- On a terminal: an aligned, colored table, with a live braille spinner and a running catalog count while the crawl is in flight. Each row reads NAME margins: <widths> arcsec (or margins: none), where the widths are the margin-cache radii (arcsec) available for the catalog. With more than one download root, the row also shows a compact root label, and a catalog shadowed by an earlier root carries a trailing *.
- When piped: clean tab-separated lines, name⇥margins⇥root⇥marker. The marker is shadowed for a shadowed copy and empty for the copy a name resolves to. This format is stable for scripting.

Catalogs grouped under a namespace directory surface as namespace/child (e.g. wise/allwise) — the exact token acid download accepts. HATS collections are listed as plain catalog names; the collection structure is read internally only to find their margins.

Shadowing. Every occurrence of a name across the roots is shown, but acid download <name> resolves first-wins, so a same-named catalog at a later root is flagged shadowed (trailing * on a TTY; the shadowed 4th TSV column when piped). The summary reports the count, e.g. ✓ 12 catalogs (2 shadowed). Download a shadowed copy by its explicit URL/path, not by name.
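The first-wins rule behind the shadowed marker can be reproduced with a one-line awk filter. In this hypothetical sketch (the function name `mark_shadowed` is invented here), the input is one name<TAB>root line per catalog occurrence, listed in download-path order. The output appends a marker column that follows the rule for the piped TSV's 4th column. The roots in the example are placeholders.

```shell
# Hypothetical sketch of first-wins shadow marking (not acid's implementation).
# Input: "name<TAB>root" lines in download-path order.
# Output: "name<TAB>root<TAB>marker", where marker is "shadowed" for every
# occurrence of a name after its first, and empty for the copy that wins.
mark_shadowed() {
  awk -F'\t' -v OFS='\t' '{ print $1, $2, (seen[$1]++ ? "shadowed" : "") }'
}

# Example with placeholder roots; prints three tab-separated lines:
#   gaia_dr3  root1
#   two_mass  root1
#   gaia_dr3  root2  shadowed
printf 'gaia_dr3\troot1\ntwo_mass\troot1\ngaia_dr3\troot2\n' | mark_shadowed
```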

Examples

```shell
# Everything available on the download path.
acid search

# Only catalogs whose name contains "gaia".
acid search gaia

# Force a fresh crawl past the ~1h cache.
acid search --cache refresh

# Names of only the catalogs `acid download` would actually fetch
# (drop the shadowed ones, whose 4th column is non-empty).
acid search | awk -F'\t' '$4 == "" { print $1 }'
```

`acid list` — list catalogs you can open

```
acid list [PATTERN]
          [--cache {use,refresh,off}]
          [--timeout SEC] [--insecure] [--no-color]
```

The local twin of acid search. It uses the same discovery engine, transports, flags, and output contract, but crawls the catalog path ($ACID_PATH → config path → built-in default ~/datasets) instead of the download path. So acid search answers "what can I download?", while acid list answers "what is already here that I can open by name?". Those are the catalogs that acid open / acid query resolve a bare name against, first-wins across the roots.

The flags, the TTY-table-vs-piped-TSV output, the namespace/child names, the margin-radii column, and the shadowing rules are all exactly as documented under acid search, with one wording change: the shadowing footnote names acid open (which resolves the first match) rather than acid download.

Examples

```shell
# Every catalog you can open by name, with margin radii.
acid list

# Only the ones whose name contains "gaia".
acid list gaia

# Just the names (piped → TSV).
acid list | cut -f1
```
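acid search and acid list share the piped TSV contract, so the two listings can be compared directly. The hypothetical helper below (not an acid subcommand) prints names that the download path offers but the catalog path does not yet hold. It skips shadowed rows (non-empty 4th column) and compares leaf names. This rests on the assumption that a nested name like wise/allwise downloads to <ACID_PATH>/allwise by default and is then listed as allwise. As a consequence, two namespaces sharing a leaf would collide.

```shell
# Hypothetical helper built on the documented piped TSV columns
# (name, margins, root, marker). Not an acid subcommand.
# Usage: acid search > search.tsv; acid list > list.tsv
#        not_yet_local search.tsv list.tsv
not_yet_local() {  # not_yet_local SEARCH_TSV LIST_TSV
  local prog a b
  # Keep resolved (non-shadowed) rows and reduce namespace/child to its leaf.
  prog='$4 == "" { n = $1; sub(/.*\//, "", n); print n }'
  a=$(mktemp) && b=$(mktemp) || return 1
  awk -F'\t' "$prog" "$1" | sort -u > "$a"
  awk -F'\t' "$prog" "$2" | sort -u > "$b"
  comm -23 "$a" "$b"      # names present only in the search listing
  rm -f "$a" "$b"
}
```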

`acid download` — fetch a HATS catalog

```
acid download SOURCE [DEST]
              [--cone RA,DEC,RADIUS_DEG]
              [--columns COL,COL,...]
              [--workers N] [--timeout SEC]
              [--estimate] [--prefetch-metadata]
              [--skip-margin] [--insecure]
              [--tmpdir DIR]
```

| Flag | Default | Description |
|---|---|---|
| `SOURCE` | required | A catalog name to resolve against the download path, or an explicit source. A name is a bare token (`gaia_dr3`) or a nested `namespace/child` (`wise/allwise`, as listed by `acid search`). An explicit source (a URL, an SSH `user@host:path`, or a local path with a leading `./`, `/`, or `~`) is used verbatim. To copy from a local relative directory, prefix it with `./`; a bare token is always a name lookup. |
| `DEST` | optional for a name | Destination directory. A path (containing a `/`) or URL is used as-is; a bare token resolves to `<ACID_PATH-root>/<token>`. If you omit it for a name source, it defaults to `<ACID_PATH-root>/<leaf>`, where the leaf is the last segment of a nested name (so `wise/allwise` → `<ACID_PATH-root>/allwise`). An explicit source requires an explicit dest. |
| `--cone RA,DEC,RADIUS_DEG` | — | Download only partitions overlapping this cone (degrees, ICRS). |
| `--columns COL,COL,...` | all | Download only the named columns (HATS-required columns like RA/Dec/`_healpix_29` are always included). |
| `--workers N` | `8` | Parallel download workers. |
| `--estimate` | off | Print estimated bytes/files and exit without downloading. |
| `--prefetch-metadata` | off | Force fetching `_metadata` for exact sizes / bulk byte ranges. |
| `--skip-margin` | off | Don't download the catalog's margin cache. |
| `--timeout SEC` | `300` | Per-file timeout (HTTP only). |
| `--insecure` | off | Skip TLS cert verification (HTTPS only — self-signed certs, testing). |
| `--tmpdir DIR` | `$TMPDIR` | Scratch directory for the `point_map.fits` mmap build. Keep it on fast local storage when `DEST` is networked. |
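The SOURCE and DEST rules in the table above reduce to a few pattern matches. The sketch below is hypothetical, and the function names are invented here. It takes the <ACID_PATH-root> directory as an argument, because this page does not spell out how acid picks it from a multi-entry ACID_PATH.

```shell
# Hypothetical sketch of the SOURCE/DEST rules; not acid's implementation.
classify_source() {  # prints: path, url, ssh, or name
  case "$1" in
    ./*|/*|"~"*) echo path ;;   # explicit local path: leading ./, / or ~
    *://*)       echo url ;;
    *@*:*)       echo ssh ;;    # user@host:path
    *)           echo name ;;   # bare token or namespace/child
  esac
}

resolve_dest() {  # resolve_dest ROOT SOURCE [DEST]
  local root="$1" src="$2" dest="${3:-}"
  if [ -n "$dest" ]; then
    case "$dest" in
      */*) echo "$dest" ;;            # a path or URL is used as-is
      *)   echo "$root/$dest" ;;      # a bare token lands under the root
    esac
  elif [ "$(classify_source "$src")" = name ]; then
    echo "$root/${src##*/}"           # default: the name's last segment
  else
    echo "error: an explicit source needs an explicit DEST" >&2
    return 1
  fi
}
```

For instance, `resolve_dest /data wise/allwise` prints /data/allwise, the same leaf-based placement as the nested-name example below.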

Downloaded subsets are valid HATS catalogs with a rebuilt _metadata and a regenerated point_map.fits footprint. If any file still fails after retries, the download exits non-zero. There is deliberately no "partial success" mode, because a half-downloaded catalog would look structurally valid.

The built-in download path consists of two roots, searched first-wins: https://data.lsdb.io/hats/, then ssh://slacd/sdf/home/m/mjuric/datasets. Override it with $ACID_DOWNLOAD_PATH or a download_path = line in acid.conf. Use acid search to see what each root offers and which copy a name resolves to.

Examples

```shell
# By name — resolved against the download path; lands under <ACID_PATH>/gaia_dr3.
acid download gaia_dr3

# Nested name — lands under its leaf, <ACID_PATH>/allwise.
acid download wise/allwise

# Full catalog over an explicit HTTP URL (explicit source needs an explicit dest).
acid download https://data.lsdb.io/hats/two_mass/two_mass /data/two_mass

# 5-deg cone, column subset, over SSH.
acid download user@server:/hats/gaia /data/gaia \
    --cone 180,0,5 --columns ra,dec,phot_g_mean_mag

# Copy from a LOCAL directory: prefix ./ or / so it isn't a name lookup.
acid download ./mirror/two_mass /data/two_mass

# Estimate first.
acid download https://data.lsdb.io/hats/two_mass/two_mass /data/two_mass \
    --cone 50,-50,2 --estimate
```

`acid inspect` — show catalog info

```
acid inspect [MODE] SOURCE
```

MODE is one of summary (default), schema, or properties. SOURCE is a local path, HTTP URL, or user@host:path SSH path.

```shell
acid inspect /data/two_mass                       # summary
acid inspect schema /data/two_mass                # column schema
acid inspect properties /data/two_mass            # HATS properties
acid inspect https://data.lsdb.io/hats/two_mass/two_mass
```

`acid hats` — HATS catalog operations

```
acid hats build-margin CATALOG
                       [--margin-arcsec ARCSEC]
                       [--workers N] [--output DIR]
                       [--overwrite] [--mem-limit GB]
                       [--tmpdir DIR]
```

Builds a HATS margin cache for CATALOG. The output loads cleanly with hats.read_hats(...) and matches what the upstream hats-import tool produces.

| Flag | Default | Description |
|---|---|---|
| `CATALOG` | required | Path to a local HATS catalog. |
| `--margin-arcsec ARCSEC` | `10.0` | Margin threshold in arcseconds. Must be at least as large as the largest XMATCH radius you'll ever run against this catalog (see margin caches). |
| `--workers N` | cgroup-aware `cpu_cap` | Parallel workers. |
| `--output DIR` | `<catalog>_<margin>arcsec` | Output directory. |
| `--overwrite` | off | Overwrite an existing margin cache at `--output`. |
| `--mem-limit GB` | 10 % of RAM | Memory limit before the accumulator spills to disk. |
| `--tmpdir DIR` | spill under the output directory | Base directory for the spill scratch (must already exist). Point at fast local storage when `--output` is on a slow networked filesystem. |
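--margin-arcsec must cover the largest XMATCH radius you run, so it can help to check a query file before building. The helper below is a hypothetical pre-flight check, not an acid feature, and its name is invented here. It only finds literal radius_arcsec => N arguments written as in this page's examples. Radii passed any other way are not seen.

```shell
# Hypothetical pre-flight check (not part of acid): fail if any literal
# "radius_arcsec => N" in a SQL file exceeds the margin width in arcsec.
check_margin() {  # check_margin MARGIN_ARCSEC SQL_FILE
  grep -o 'radius_arcsec *=> *[0-9.]*' "$2" |
    awk -F'=>' -v m="$1" '
      { r = $2 + 0 }
      r > m + 0 { print "XMATCH radius " r " arcsec exceeds margin " m " arcsec"; bad = 1 }
      END { exit bad }'
}

# Usage (hypothetical flow):
#   check_margin 10.0 xmatch.sql && acid hats build-margin /data/two_mass --margin-arcsec 10.0
```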

Example

```shell
acid hats build-margin /data/two_mass --margin-arcsec 10.0 --workers 16
```

Environment variables

A few env vars influence CLI behavior; the full list (including the allocator and worker-startup knobs) is in MEMORY-TUNING.md. The ones a user is most likely to set:

| Variable | Effect |
|---|---|
| `ACID_PATH` | Default value for `--db` (overrides config `path`). |
| `ACID_WORKERS` | Default for `--workers`. |
| `ACID_MEM_PER_WORKER_GB` | Per-worker RAM bound, in GB, used when `--workers` is `auto`. |
| `ACID_RAM_BUDGET` | Default for `--ram-budget` (bytes or `64GB`/`512MiB`). |
| `ACID_TMPDIR` | Default for `--tmpdir`. |
| `ACID_INMEM_ROW_LIMIT` | Spill threshold (rows). |
| `ACID_CONFIG` | Path to a specific `acid.conf`. |
| `ACID_PROGRESS` | `0` to silence, `1` to force the bar, `plain` for one committed line per stage. |
| `ACID_PROFILE` | `1` to print a per-step profile to `stderr`. |
| `ACID_PROFILE_OUT` | Path for the full per-worker profile JSON. |
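For a non-interactive batch job, you can set these defaults once in the environment instead of repeating them on every command line. The values below are placeholders to adapt, not recommendations.

```shell
# Example environment for a batch job; every value here is a placeholder.
export ACID_PATH=/data/hats      # default --db
export ACID_WORKERS=16           # default --workers
export ACID_RAM_BUDGET=64GB      # default --ram-budget
export ACID_PROGRESS=plain       # one committed line per stage, readable in log files
export ACID_PROFILE=1            # per-step profile on stderr

# acid query -f big.sql --output results/   # picks up the defaults above
```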
