
CLI reference

The acid command-line tool runs the same engine as the Python API. Use it for one-off queries from a shell prompt, for catalog downloads and inspection, and for building margin caches.

This page lists every subcommand and every flag, with defaults read straight from src/acid_cli/. For the design rationale behind configuration discovery and --db resolution, see docs/archive/CONFIG-SYSTEM.md.

Synopsis

acid [--config FILE] {query|validate|config|search|list|download|inspect|hats} ...
acid --version

Global options

Flag | Description
--version | Print version, copyright, and license, then exit.
--config FILE | Path to an acid.conf (overrides ACID_CONFIG and the default search).

acid query — run a SQL query

acid query [QUERY|-] [-f FILE]
           [--db ENTRIES] [--open PATH,RA,DEC]
           [--output PATH] [--format {hats,parquet,csv,fits}]
           [--workers N] [--threads N] [--ram-budget SIZE] [--tmpdir DIR]
           [--cone RA,DEC,RADIUS_DEG]
           [--progress {auto,on,off,plain}]

Reads the SQL query from a positional argument, from stdin (when the positional is -), or from a file (-f FILE). Exactly one of these is required.

Where to write results

--output | --format | Effect
omitted | (rejected if given) | Pretty-print to stdout (fixed-width on a TTY, TSV when piped).
directory, or path with no extension | inferred, or hats | Write a partitioned HATS tree at --output.
*.parquet / *.pq | inferred, or parquet | Single Parquet file.
*.csv | inferred, or csv | Single CSV file.
*.fits / *.fit | inferred, or fits | Single FITS binary table.

If you set --format explicitly, it wins over the extension; an unrecognized extension with no --format is an error rather than a silent fallback to HATS. A global-reduce query (aggregate or top-K) cannot be written as a HATS tree; use a single-file format, or omit --output to pretty-print.
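The selection rules above can be sketched as a small POSIX shell function. This is a hypothetical illustration, not acid's code: the function name infer_format is invented here, extension matching is assumed to be case-sensitive, and the global-reduce check is left out because it needs the analyzed query.

```shell
# Hypothetical sketch of output-format selection for `acid query`.
# Usage: infer_format OUTPUT [FORMAT]; prints the chosen format, or
# "stdout" when OUTPUT is empty. Errors go to stderr with status 2.
infer_format() {
    _out=$1
    _fmt=${2:-}
    if [ -z "$_out" ]; then
        # No --output: pretty-print; an explicit --format is rejected.
        if [ -n "$_fmt" ]; then
            echo "error: --format requires --output" >&2
            return 2
        fi
        echo stdout
        return 0
    fi
    # An explicit --format always wins over the extension.
    if [ -n "$_fmt" ]; then
        case $_fmt in
            hats|parquet|csv|fits) echo "$_fmt"; return 0 ;;
            *) echo "error: unknown --format '$_fmt'" >&2; return 2 ;;
        esac
    fi
    # A trailing slash or an existing directory means a HATS tree.
    case $_out in
        */) echo hats; return 0 ;;
    esac
    if [ -d "$_out" ]; then
        echo hats
        return 0
    fi
    # Otherwise look only at the last path component's extension.
    _base=${_out##*/}
    case $_base in
        *.parquet|*.pq) echo parquet ;;
        *.csv) echo csv ;;
        *.fits|*.fit) echo fits ;;
        *.*)
            # Unknown extension: error instead of falling back to HATS.
            echo "error: unrecognized extension in '$_out'; pass --format" >&2
            return 2
            ;;
        *) echo hats ;;   # no extension at all
    esac
}
```

For example, `infer_format sample.csv` prints csv, while `infer_format out.txt` fails until you add an explicit format such as `infer_format out.txt parquet`.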

Flags

Flag | Default | Description
-f, --file FILE | (none) | Read the SQL query from a file.
--db ENTRIES | $ACID_PATH → config path → ~/datasets | ':'-separated list of HATS directories, registry YAMLs, or both. Leftmost entry wins on name collisions.
--open PATH,RA,DEC | (none) | Use a raw data file (.parquet/.csv/.tsv/.fits/.arrow/VOTable) as a named table in the query, alongside the --db catalogs (a virtual catalog: spilled once, queried like HATS). Two forms: positional PATH,RA,DEC (table name = file basename), or named NAME=PATH,ra=RA,dec=DEC. The RA/Dec coordinate column names are required (never guessed). Repeatable.
--output PATH | (none) | Output path; omit to pretty-print to stdout.
--format FMT | inferred from --output (no extension → hats) | One of hats, parquet, csv, fits. Rejected without --output.
--workers N | $ACID_WORKERS → config workers → "auto" (cgroup-aware) | Process-pool size.
--threads N | cpu_cap // workers (cgroup-aware) | Per-worker Polars thread budget.
--ram-budget SIZE | 25% of available RAM (cgroup-aware) | Total RAM the planner budgets for; work tuples are sized to fit SIZE / workers each. Bytes or a human size (64GB, 512MiB, 32g). See Performance — RAM budget.
--tmpdir DIR | $ACID_TMPDIR → config tmpdir → $TMPDIR | Base scratch directory; a per-run subdirectory is created and removed on exit.
--cone RA,DEC,RADIUS_DEG | (none) | Restrict the query to a sky cone (degrees, ICRS).
--progress MODE | auto | auto (animated bar on a TTY, silent when piped), on (force the bar), off (silent), or plain (one committed line per stage; good for logs).
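The --db default chain and the leftmost-wins rule can be sketched in POSIX shell. The function names are hypothetical, the config path value is passed in rather than read from acid.conf, and registry YAML entries are skipped because their contents are not described on this page. Each directory entry is treated as a root holding catalogs by name, as in `--db /data/hats` with `FROM gaia_dr3`.

```shell
# Hypothetical sketch of --db resolution (not acid's code).

# Pick the --db value: the flag, else $ACID_PATH, else the config
# file's `path` value, else ~/datasets.
effective_db() {
    # $1 = --db flag value, $2 = config `path` value (either may be empty)
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -n "${ACID_PATH:-}" ]; then
        echo "$ACID_PATH"
    elif [ -n "$2" ]; then
        echo "$2"
    else
        echo "$HOME/datasets"
    fi
}

# Resolve a catalog name against ':'-separated entries, leftmost first.
# Empty fields and registry YAMLs are skipped (YAML lookup not sketched).
resolve_catalog() {
    # $1 = entries, $2 = catalog name (may be namespace/child)
    _old_ifs=$IFS
    IFS=:
    for _root in $1; do
        case $_root in
            ''|*.yaml|*.yml) continue ;;
        esac
        if [ -d "$_root/$2" ]; then
            IFS=$_old_ifs
            echo "$_root/$2"      # first (leftmost) match wins
            return 0
        fi
    done
    IFS=$_old_ifs
    return 1
}
```

With `ACID_PATH=/scratch/hats:/data/hats` and a gaia_dr3 directory under both roots, `resolve_catalog "$ACID_PATH" gaia_dr3` prints /scratch/hats/gaia_dr3.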

Examples

# Pretty-print to stdout — no --output.
acid query "SELECT COUNT(*) FROM gaia_dr3" --db /data/hats

# Write a HATS tree (default --format when --output has no extension).
acid query -f xmatch.sql --db catalogs.yaml --output results/ --workers 32

# Write a single CSV — extension picks the format.
acid query "SELECT id, ra, dec FROM object LIMIT 1000" \
    --db /data/hats --output sample.csv

# Read SQL from stdin.
echo "SELECT * FROM object LIMIT 10" | acid query - --db /data/hats

# Restrict to a 2 deg cone around (50, -50).
acid query "SELECT g.id, t.designation FROM gaia g JOIN two_mass t \
            ON XMATCH(radius_arcsec => 1.0)" \
    --db /data/hats --cone 50,-50,2

# Crossmatch a raw CSV target list (named --open form) against a catalog.
acid query "SELECT t.id, g.source_id FROM t \
            JOIN gaia_dr3 g ON XMATCH(radius_arcsec => 1.0)" \
    --db /data/hats --open t=candidates.csv,ra=RA,dec=DEC

# Cap the RAM the planner budgets for (work tuples shrink to fit).
acid query -f big.sql --db /data/hats --workers 32 --ram-budget 64GB
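The per-worker arithmetic behind --workers, --threads, and --ram-budget can be sketched with integer shell arithmetic. It is illustrative only: cpu_cap and available RAM are plain arguments here, whereas acid detects both itself (cgroup-aware), and the function names are hypothetical.

```shell
# Hypothetical sketch of the per-worker budget arithmetic (not acid's code).

# Default --ram-budget: 25% of available RAM (bytes in, bytes out).
default_ram_budget() {
    echo $(( $1 / 4 ))
}

# Print "<bytes per worker> <threads per worker>".
per_worker_plan() {
    # $1 = RAM budget in bytes, $2 = workers, $3 = cpu_cap
    _budget=$1
    _workers=$2
    _cpus=$3
    # Work tuples are sized to fit SIZE / workers each.
    _share=$(( _budget / _workers ))
    # Default --threads is cpu_cap // workers (integer division).
    _threads=$(( _cpus / _workers ))
    echo "$_share $_threads"
}
```

For the last example on a machine with a 64-CPU cap, each worker's tuples are sized for 64GB / 32 = 2GB, and the default --threads would be 64 // 32 = 2.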

acid validate — parse + analyze, don't run

acid validate [QUERY|-] [-f FILE] [--db ENTRIES]

Same --db / query-source rules as acid query. Prints the analyzed plan summary (anchor, joins, projection, aggregation, ordering, footprint filters) and exits. Use it to catch ParseError / ValidationError before kicking off a long run.

acid config — show or edit acid.conf

acid config show [--effective]
acid config get  KEY [--effective]
acid config set  KEY VALUE
acid config unset KEY

show lists the file's settings. --effective shows the resolved values (with provenance: env > config > built-in) plus the search order ACID walked.

get KEY prints the file value (exits 1 silently if unset) — use --effective to print the resolved value instead.

set and unset rewrite the target file in place; comments and formatting are not preserved.

Valid keys: path, download_path, workers, mem_per_worker_gb, tmpdir, inmem_row_limit, workers_jemalloc_conf, ram_budget.
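The env > config > built-in precedence that --effective reports can be sketched for the workers key in POSIX shell. This assumes acid.conf holds `key = value` lines (as the download_path line mentioned under acid download suggests) and `#` comment lines. Keeping the last value when a key repeats is also an assumption, and the (env)/(config)/(built-in) labels are this sketch's own output, not acid's.

```shell
# Hypothetical acid.conf reader: assumes `key = value` lines and `#`
# comments; when a key repeats, the last value is kept (an assumption).
conf_value() {
    # $1 = path to acid.conf, $2 = key
    [ -f "$1" ] || return 1
    awk -v key="$2" '
        BEGIN { found = 0 }
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                # skip blank or malformed lines
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { val = v; found = 1 }
        }
        END { if (found) print val; else exit 1 }
    ' "$1"
}

# Resolve `workers` as env > config > built-in and print its source.
effective_workers() {
    # $1 = path to acid.conf
    if [ -n "${ACID_WORKERS:-}" ]; then
        echo "$ACID_WORKERS (env)"
    elif _v=$(conf_value "$1" workers); then
        echo "$_v (config)"
    else
        echo "auto (built-in)"
    fi
}
```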

acid search — list downloadable catalogs

acid search [PATTERN]
            [--cache {use,refresh,off}]
            [--timeout SEC] [--insecure] [--no-color]

Crawls every root on the download path ($ACID_DOWNLOAD_PATH → config download_path → built-in default) over its native transport — local directories, ssh:// hosts, and http(s):// mirrors alike — and lists the HATS catalogs available to acid download. One line per catalog; the name is the token to hand to acid download.

Flag | Default | Description
PATTERN | (none) | Show only catalogs whose name contains this text (case-insensitive substring).
--cache {use,refresh,off} | use | Remote-listing cache mode: use serves a fresh-enough cached listing (remote roots are cached for about an hour under $XDG_CACHE_HOME/acid/downloads), refresh forces a re-crawl and rewrites the cache, and off bypasses it entirely (neither read nor write).
--timeout SEC | 300 | Per-request timeout in seconds (HTTP / SSH).
--insecure | off | Skip TLS certificate verification (HTTPS only; for self-signed mirrors).
--no-color | off | Disable ANSI color.
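The three --cache modes can be read as a small decision table. The sketch below is hypothetical: the cached-listing path is a parameter because the file layout under $XDG_CACHE_HOME/acid/downloads is not documented here, and the 60-minute cutoff stands in for the page's "about an hour".

```shell
# Hypothetical sketch of the --cache modes (not acid's code).
listing_action() {
    # $1 = --cache mode, $2 = path to a cached listing (may not exist)
    case $1 in
        use)
            # find -mmin (GNU and BSD find) prints the file only if it
            # was modified less than 60 minutes ago.
            if [ -f "$2" ] && [ -n "$(find "$2" -mmin -60 2>/dev/null)" ]; then
                echo "serve cache"
            else
                echo "crawl, then write cache"
            fi
            ;;
        refresh) echo "crawl, then write cache" ;;
        off) echo "crawl only" ;;   # neither read nor write
        *) echo "error: --cache must be use, refresh, or off" >&2; return 2 ;;
    esac
}
```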

Output

Catalog rows go to stdout; the searched roots, the count, and any shadowing footnote go to stderr as quiet context (so acid search | … pipes only the data).

  • On a terminal — an aligned, colored table, with a live braille spinner and a running catalog count while the crawl is in flight. Each row reads NAME margins: <widths> arcsec (or margins: none); the margin widths are the margin-cache radii (arcsec) available for the catalog. With more than one download root, the row also shows a compact root label; a catalog shadowed by an earlier root carries a trailing *.
  • When piped — clean tab-separated lines, name⇥margins⇥root⇥marker, where marker is the word shadowed for a shadowed copy and empty for the copy the name resolves to. This format is stable for scripting.

Catalogs grouped under a namespace directory surface as namespace/child (e.g. wise/allwise) — the exact token acid download accepts. HATS collections are listed as plain catalog names; the collection structure is read internally only to find their margins.

Shadowing. Every occurrence of a name across the roots is shown, but acid download <name> resolves first-wins, so a same-named catalog at a later root is flagged shadowed (trailing * on a TTY; the shadowed 4th TSV column when piped). The summary reports the count, e.g. ✓ 12 catalogs (2 shadowed). Download a shadowed copy by its explicit URL/path, not by name.
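First-wins shadowing reduces to remembering which names have already been seen. The awk sketch below is a hypothetical illustration: it takes simplified NAME⇥ROOT lines (not acid's full four-column output) in download-path order and appends the marker column.

```shell
# Hypothetical sketch: the first occurrence of a name is the copy that
# resolves (empty marker); every later occurrence is marked "shadowed".
mark_shadowed() {
    # stdin: NAME<TAB>ROOT lines, listed in download-path order
    awk -F'\t' -v OFS='\t' '{
        marker = ($1 in seen) ? "shadowed" : ""
        seen[$1] = 1
        print $1, $2, marker
    }'
}
```

For example, `printf 'gaia\tA\ntwo_mass\tA\ngaia\tB\n' | mark_shadowed` marks only the third line, the gaia copy under root B.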

Examples

# Everything available on the download path.
acid search

# Only catalogs whose name contains "gaia".
acid search gaia

# Force a fresh crawl past the ~1h cache.
acid search --cache refresh

# Names of only the catalogs `acid download` would actually fetch
# (drop the shadowed ones, whose 4th column is non-empty).
acid search | awk -F'\t' '$4 == "" { print $1 }'

acid list — list catalogs you can open

acid list [PATTERN]
          [--cache {use,refresh,off}]
          [--timeout SEC] [--insecure] [--no-color]

The local twin of acid search. It uses the same discovery engine, transports, flags, and output contract — but crawls the catalog path ($ACID_PATH → config path → built-in default ~/datasets) instead of the download path. So acid search answers what can I download, while acid list answers what's already here that I can open by name — the catalogs acid open / acid query resolve a bare name against, first-wins across the roots.

The flags, the TTY-table-vs-piped-TSV output, the namespace/child names, the margin-radii column, and the shadowing rules are all exactly as documented under acid search, with one wording change: the shadowing footnote names acid open (which resolves the first match) rather than acid download.

Examples

# Every catalog you can open by name, with margin radii.
acid list

# Only the ones whose name contains "gaia".
acid list gaia

# Just the names (piped → TSV).
acid list | cut -f1

acid download — fetch a HATS catalog

acid download SOURCE [DEST]
              [--cone RA,DEC,RADIUS_DEG]
              [--columns COL,COL,...]
              [--workers N] [--timeout SEC]
              [--estimate] [--prefetch-metadata]
              [--skip-margin] [--insecure]
              [--tmpdir DIR]

Flag | Default | Description
SOURCE | required | A catalog name to resolve against the download path, or an explicit source. A name is a bare token (gaia_dr3) or a nested namespace/child (wise/allwise, as acid search shows it). An explicit source (a URL, an SSH user@host:path, or a local path starting with ./, /, or ~) is used verbatim. To copy from a local relative directory, prefix it with ./; a bare token is always a name lookup.
DEST | optional for a name source | Destination directory. A path (anything containing a /) or URL is used as-is; a bare token resolves to <ACID_PATH-root>/<token>. For a name source, DEST defaults to <ACID_PATH-root>/<leaf>, where the leaf is a nested name's last segment (so wise/allwise → <ACID_PATH-root>/allwise). An explicit source requires an explicit DEST.
--cone RA,DEC,RADIUS_DEG | (none) | Download only partitions overlapping this cone (degrees, ICRS).
--columns COL,COL,... | all | Download only the named columns (HATS-required columns like RA/Dec/_healpix_29 are always included).
--workers N | 8 | Parallel download workers.
--estimate | off | Print estimated bytes/files and exit without downloading.
--prefetch-metadata | off | Force fetching _metadata for exact sizes / bulk byte ranges.
--skip-margin | off | Don't download the catalog's margin cache.
--timeout SEC | 300 | Per-file timeout (HTTP only).
--insecure | off | Skip TLS certificate verification (HTTPS only; for self-signed certs or testing).
--tmpdir DIR | $TMPDIR | Scratch directory for the point_map.fits mmap build. Keep it on fast local storage when DEST is on a networked filesystem.
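The SOURCE and DEST rules above can be sketched as two POSIX shell functions. This is a hypothetical illustration, and acid's real checks may be stricter. The function names are invented, and ROOT is passed in as a stand-in for <ACID_PATH-root> rather than guessing which catalog-path entry acid uses.

```shell
# Hypothetical sketch of SOURCE classification for `acid download`.
source_kind() {
    case $1 in
        *://*) echo url ;;            # https://..., ssh://...
        ./*|/*|'~'*) echo local ;;    # explicit local path
        *@*:*) echo ssh ;;            # user@host:path
        *) echo name ;;               # bare token or namespace/child
    esac
}

# Hypothetical sketch of DEST resolution; ROOT stands in for <ACID_PATH-root>.
resolve_dest() {
    # $1 = SOURCE, $2 = DEST (may be empty), $3 = ROOT
    if [ -z "$2" ]; then
        if [ "$(source_kind "$1")" = name ]; then
            echo "$3/${1##*/}"        # default: ROOT/<leaf>
            return 0
        fi
        echo "error: an explicit source needs an explicit DEST" >&2
        return 2
    fi
    case $2 in
        */*) echo "$2" ;;             # a path or URL is used as-is
        *) echo "$3/$2" ;;            # a bare token lands under ROOT
    esac
}
```

For example, `resolve_dest wise/allwise '' /data` prints /data/allwise, while `resolve_dest ./mirror/two_mass ''` fails because a local source needs an explicit DEST.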

Downloaded subsets are valid HATS catalogs with a rebuilt _metadata and a regenerated point_map.fits footprint. If any file fails after retries, the download exits non-zero — there is no "partial success" mode, on purpose (a half-downloaded catalog looks structurally valid).

The built-in download path is two roots, searched first-wins: https://data.lsdb.io/hats/ then ssh://slacd/sdf/home/m/mjuric/datasets. Override it with $ACID_DOWNLOAD_PATH or a download_path = line in acid.conf. Use acid search to see what each root offers and which copy a name resolves to.

Examples

# By name: resolved against the download path; lands under <ACID_PATH-root>/gaia_dr3.
acid download gaia_dr3

# Nested name: lands under its leaf, <ACID_PATH-root>/allwise.
acid download wise/allwise

# Full catalog over an explicit HTTP URL (explicit source needs an explicit dest).
acid download https://data.lsdb.io/hats/two_mass/two_mass /data/two_mass

# 5-deg cone, column subset, over SSH.
acid download user@server:/hats/gaia /data/gaia \
    --cone 180,0,5 --columns ra,dec,phot_g_mean_mag

# Copy from a LOCAL directory: prefix ./ or / so it isn't a name lookup.
acid download ./mirror/two_mass /data/two_mass

# Estimate first.
acid download https://data.lsdb.io/hats/two_mass/two_mass /data/two_mass \
    --cone 50,-50,2 --estimate
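--cone selects by angular distance from a center (RA, Dec). The awk sketch below applies the haversine great-circle separation to individual rows of RA⇥Dec in degrees. It only illustrates the cone geometry: acid selects whole partitions that overlap the cone, which involves HEALPix pixel geometry not sketched here, and the function name in_cone is hypothetical.

```shell
# Hypothetical sketch: keep RA<TAB>DEC rows (degrees) inside a cone
# given as "RA,DEC,RADIUS_DEG", the same shape as the --cone argument.
in_cone() {
    awk -F'\t' -v cone="$1" '
        BEGIN {
            split(cone, c, ",")
            d2r = atan2(0, -1) / 180          # pi / 180
            ra0 = c[1] * d2r; dec0 = c[2] * d2r; rmax = c[3] + 0
        }
        {
            ra = $1 * d2r; dec = $2 * d2r
            # Haversine formula for the great-circle separation.
            h = sin((dec - dec0) / 2) ^ 2 + cos(dec) * cos(dec0) * sin((ra - ra0) / 2) ^ 2
            if (h > 1) h = 1
            sep = 2 * atan2(sqrt(h), sqrt(1 - h)) / d2r
            if (sep <= rmax) print
        }
    '
}
```

Note how RA offsets shrink with cos(Dec): for `--cone 50,-50,2`, a point at RA 53, Dec -50 lies about 1.93° from the center and falls inside, while a point 2.5° away in Dec does not.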

acid inspect — show catalog info

acid inspect [MODE] SOURCE

MODE is one of summary (default), schema, or properties. SOURCE is a local path, HTTP URL, or user@host:path SSH path.

acid inspect /data/two_mass                       # summary
acid inspect schema /data/two_mass                # column schema
acid inspect properties /data/two_mass            # HATS properties
acid inspect https://data.lsdb.io/hats/two_mass/two_mass

acid hats — HATS catalog operations

acid hats build-margin CATALOG
                       [--margin-arcsec ARCSEC]
                       [--workers N] [--output DIR]
                       [--overwrite] [--mem-limit GB]
                       [--tmpdir DIR]

Builds a HATS margin cache for CATALOG. The output loads with hats.read_hats(...) and matches what the upstream hats-import tool produces.

Flag | Default | Description
CATALOG | required | Path to a local HATS catalog.
--margin-arcsec ARCSEC | 10.0 | Margin threshold in arcseconds. Must be at least as large as the largest XMATCH radius you will ever run against this catalog (see margin caches).
--workers N | cgroup-aware cpu_cap | Parallel workers.
--output DIR | <catalog>_<margin>arcsec | Output directory.
--overwrite | off | Overwrite an existing margin cache at --output.
--mem-limit GB | 10% of RAM | Memory limit before the accumulator spills to disk.
--tmpdir DIR | under the output directory | Base directory for the spill scratch (must already exist). Point it at fast local storage when --output is on a slow networked filesystem.
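Because --margin-arcsec must cover the largest XMATCH radius you plan to run, it can help to check a planned margin before a long build. The sketch below is a hypothetical helper, not part of acid; it exits non-zero with a message when any radius exceeds the margin.

```shell
# Hypothetical helper: does a margin (arcsec) cover every planned radius?
margin_covers() {
    # $1 = margin in arcsec; remaining arguments = XMATCH radii in arcsec
    _margin=$1
    shift
    printf '%s\n' "$@" | awk -v m="$_margin" '
        BEGIN { max = 0 }
        { if ($1 + 0 > max) max = $1 + 0 }    # track the largest radius
        END {
            if (max > m + 0) {
                printf "largest radius %s arcsec exceeds margin %s arcsec\n", max, m > "/dev/stderr"
                exit 1
            }
        }
    '
}
```

For example, `margin_covers 10.0 1.0 2.5` succeeds, while `margin_covers 10.0 12` fails, so a 12-arcsec crossmatch would need a rebuilt margin.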

Example

acid hats build-margin /data/two_mass --margin-arcsec 10.0 --workers 16

Environment variables

A few env vars influence CLI behavior; the full list (including the allocator and worker-startup knobs) is in MEMORY-TUNING.md. The ones a user is most likely to set:

Variable | Effect
ACID_PATH | Default value for --db (overrides config path).
ACID_WORKERS | Default for --workers.
ACID_MEM_PER_WORKER_GB | RAM-per-worker bound for --workers auto.
ACID_RAM_BUDGET | Default for --ram-budget (bytes or 64GB/512MiB).
ACID_TMPDIR | Default for --tmpdir.
ACID_INMEM_ROW_LIMIT | Spill threshold (rows).
ACID_CONFIG | Path to a specific acid.conf.
ACID_PROGRESS | 0 to silence, 1 to force the bar, plain for one committed line per stage.
ACID_PROFILE | 1 to print a per-step profile to stderr.
ACID_PROFILE_OUT | Path for the full per-worker profile JSON.