CLI reference¶
The acid command-line tool runs the same engine as the Python API.
Use it for one-off queries from a shell prompt, for catalog downloads
and inspection, and for building margin caches.
This page lists every subcommand and every flag, with defaults read
straight from src/acid_cli/. For the design rationale behind
configuration discovery and --db resolution, see
docs/archive/CONFIG-SYSTEM.md.
Synopsis¶
Global options¶
| Flag | Description |
|---|---|
| --version | Print version, copyright, and license; exit. |
| --config FILE | Path to an acid.conf (overrides ACID_CONFIG and the default search). |
acid query — run a SQL query¶
acid query [QUERY|-] [-f FILE]
[--db ENTRIES] [--open PATH,RA,DEC]
[--output PATH] [--format {hats,parquet,csv,fits}]
[--workers N] [--threads N] [--ram-budget SIZE] [--tmpdir DIR]
[--cone RA,DEC,RADIUS_DEG]
[--progress {auto,on,off,plain}]
Reads the SQL query from a positional argument, from stdin (when
the positional is -), or from a file (-f FILE). Exactly one of
these is required.
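The "exactly one query source" rule can be checked before the tool is invoked from a script. The sketch below is a hypothetical helper (`build_query_argv` is not part of ACID) that assembles an argument list suitable for `subprocess.run`, using only flags shown in the synopsis above.

```python
from typing import List, Optional


def build_query_argv(
    sql: Optional[str] = None,
    sql_file: Optional[str] = None,
    use_stdin: bool = False,
    db: Optional[List[str]] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an `acid query` argument list (hypothetical helper)."""
    # Exactly one query source: positional SQL, stdin ("-"), or -f FILE.
    sources = [sql is not None, sql_file is not None, use_stdin]
    if sum(sources) != 1:
        raise ValueError("give exactly one of sql, sql_file, use_stdin")

    argv = ["acid", "query"]
    if sql is not None:
        argv.append(sql)
    elif use_stdin:
        argv.append("-")
    else:
        argv += ["-f", sql_file]

    if db:
        # --db takes a ':'-separated list of entries.
        argv += ["--db", ":".join(db)]
    if output is not None:
        argv += ["--output", output]
    return argv
```

When `use_stdin=True`, the caller still has to feed the SQL text to the process, for example through the `input=` argument of `subprocess.run`.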
Where to write results¶
| --output | --format | Effect |
|---|---|---|
| omitted | (rejected if given) | Pretty-print to stdout (fixed-width on a TTY, TSV when piped). |
| directory or path with no extension | inferred / hats | Write a partitioned HATS tree at --output. |
| *.parquet / *.pq | inferred / parquet | Single Parquet file. |
| *.csv | inferred / csv | Single CSV file. |
| *.fits / *.fit | inferred / fits | Single FITS binary table. |
If you set --format explicitly, it wins over the extension; an
unrecognized extension with no --format is an error rather than a
silent fallback to HATS. A global-reduce query (aggregate / top-K)
cannot be written as a HATS tree; use a single-file format, or omit
--output to pretty-print.
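The resolution rules above can be written down as a small decision function. This is a minimal sketch, not the code in src/acid_cli/: the name `resolve_format`, the `global_reduce` parameter, and the case-insensitive extension match are assumptions made for illustration.

```python
import os
from typing import Optional

# Extension -> format, as listed in the table above.
_EXT_TO_FORMAT = {
    ".parquet": "parquet",
    ".pq": "parquet",
    ".csv": "csv",
    ".fits": "fits",
    ".fit": "fits",
}
_FORMATS = {"hats", "parquet", "csv", "fits"}


def resolve_format(
    output: Optional[str],
    fmt: Optional[str] = None,
    global_reduce: bool = False,
) -> str:
    """Return "stdout" or one of the four formats (illustrative sketch)."""
    if output is None:
        if fmt is not None:
            raise ValueError("--format is rejected without --output")
        return "stdout"

    if fmt is not None:
        if fmt not in _FORMATS:
            raise ValueError(f"unknown format: {fmt}")
        result = fmt  # an explicit --format wins over the extension
    else:
        # Assumption: extensions are matched case-insensitively.
        ext = os.path.splitext(output.rstrip("/"))[1].lower()
        if ext == "":
            result = "hats"  # directory or extension-less path
        elif ext in _EXT_TO_FORMAT:
            result = _EXT_TO_FORMAT[ext]
        else:
            raise ValueError(f"unrecognized extension: {ext}")

    if result == "hats" and global_reduce:
        raise ValueError("a global-reduce query cannot be written as HATS")
    return result
```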
Flags¶
| Flag | Default | Description |
|---|---|---|
| -f, --file FILE | — | Read the SQL query from a file. |
| --db ENTRIES | $ACID_PATH → config path → ~/datasets | ':'-separated list of HATS directories, registry YAMLs, or both. Leftmost entry wins on name collisions. |
| --open PATH,RA,DEC | — | Use a raw data file (.parquet/.csv/.tsv/.fits/.arrow/VOTable) as a named table in the query, alongside the --db catalogs (a virtual catalog: spilled once, queried like HATS). Two forms: positional PATH,RA,DEC (table name = file basename), or named NAME=PATH,ra=RA,dec=DEC. The RA/DEC coordinate column names are required (never guessed). Repeatable. |
| --output PATH | — | Output path; omit to pretty-print to stdout. |
| --format FMT | inferred from --output (no ext → hats) | One of hats, parquet, csv, fits. Rejected without --output. |
| --workers N | resolved from config / ACID_WORKERS / "auto" (cgroup-aware) | Process-pool size. |
| --threads N | cpu_cap // workers (cgroup-aware) | Per-worker Polars thread budget. |
| --ram-budget SIZE | 25 % of available RAM (cgroup-aware) | Total RAM the planner budgets for; work tuples are sized to fit SIZE / workers each. Bytes or a human size (64GB, 512MiB, 32g). See Performance — RAM budget. |
| --tmpdir DIR | config / $TMPDIR | Base scratch directory; a per-run subdir is created and removed on exit. |
| --cone RA,DEC,RADIUS_DEG | — | Restrict the query to a sky cone (degrees, ICRS). |
| --progress MODE | auto | auto (animated bar on a TTY, silent when piped), on (force bar), off (silent), plain (one committed line per stage; good for logs). |
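The two derived defaults in the table, SIZE / workers for the per-worker RAM share and cpu_cap // workers for the thread budget, are plain integer arithmetic. The sketch below only illustrates that arithmetic; the function names are hypothetical, and the floor of one thread per worker is an assumption rather than documented behavior.

```python
def per_worker_budget(ram_budget_bytes: int, workers: int) -> int:
    """RAM share each worker's work tuples are sized to fit (SIZE / workers)."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return ram_budget_bytes // workers


def default_threads(cpu_cap: int, workers: int) -> int:
    """Per-worker thread budget: cpu_cap // workers.

    Assumption: never drop below one thread, even when workers > cpu_cap.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return max(1, cpu_cap // workers)


# Illustration only: 64 GB is taken here as 64 * 10**9 bytes; how acid
# interprets the "GB" suffix is not specified on this page.
share = per_worker_budget(64 * 10**9, workers=32)
threads = default_threads(cpu_cap=64, workers=32)
```

Under these assumptions, raising --workers at a fixed --ram-budget shrinks each worker's share proportionally.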
Examples¶
# Pretty-print to stdout — no --output.
acid query "SELECT COUNT(*) FROM gaia_dr3" --db /data/hats
# Write a HATS tree (default --format when --output has no extension).
acid query -f xmatch.sql --db catalogs.yaml --output results/ --workers 32
# Write a single CSV — extension picks the format.
acid query "SELECT id, ra, dec FROM object LIMIT 1000" \
--db /data/hats --output sample.csv
# Read SQL from stdin.
echo "SELECT * FROM object LIMIT 10" | acid query - --db /data/hats
# Restrict to a 2 deg cone around (50, -50).
acid query "SELECT g.id, t.designation FROM gaia g JOIN two_mass t \
ON XMATCH(radius_arcsec => 1.0)" \
--db /data/hats --cone 50,-50,2
# Crossmatch a raw CSV target list (named --open form) against a catalog.
acid query "SELECT t.id, g.source_id FROM t \
JOIN gaia_dr3 ON XMATCH(radius_arcsec => 1.0)" \
--db /data/hats --open t=candidates.csv,ra=RA,dec=DEC
# Cap the RAM the planner budgets for (work tuples shrink to fit).
acid query -f big.sql --db /data/hats --workers 32 --ram-budget 64GB
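The two --open forms described in the Flags table (positional PATH,RA,DEC and named NAME=PATH,ra=RA,dec=DEC) can be told apart mechanically. The parser below is a sketch under stated assumptions, not ACID's own: it assumes paths contain no commas, that the named form lists ra= before dec=, and that "file basename" means the file name without its extension.

```python
import os
from typing import NamedTuple


class OpenSpec(NamedTuple):
    name: str  # table name used in the SQL query
    path: str  # raw data file
    ra: str    # RA column name
    dec: str   # Dec column name


def parse_open(spec: str) -> OpenSpec:
    """Parse one --open value (illustrative sketch)."""
    parts = spec.split(",")
    if len(parts) != 3:
        raise ValueError("expected PATH,RA,DEC or NAME=PATH,ra=RA,dec=DEC")
    first, second, third = parts

    if second.startswith("ra=") and third.startswith("dec="):
        # Named form: NAME=PATH,ra=RA,dec=DEC
        name, sep, path = first.partition("=")
        if not sep or not name or not path:
            raise ValueError("named form needs NAME=PATH")
        ra, dec = second[len("ra="):], third[len("dec="):]
    else:
        # Positional form: the table name comes from the file name.
        path, ra, dec = first, second, third
        # Assumption: the extension is dropped from the table name.
        name = os.path.splitext(os.path.basename(path))[0]

    if not ra or not dec:
        raise ValueError("RA and DEC column names are required")
    return OpenSpec(name, path, ra, dec)
```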
acid validate — parse + analyze, don't run¶
Same --db / query-source rules as acid query. Prints the analyzed
plan summary (anchor, joins, projection, aggregation, ordering,
footprint filters) and exits. Use it to catch ParseError /
ValidationError before kicking off a long run.
acid config — show or edit acid.conf¶
acid config show [--effective]
acid config get KEY [--effective]
acid config set KEY VALUE
acid config unset KEY
show lists the file's settings. --effective shows the resolved
values (with provenance: env > config > built-in) plus the search
order ACID walked.
get KEY prints the file value (exits 1 silently if unset) — use
--effective to print the resolved value instead.
set and unset rewrite the target file in place; comments and
formatting are not preserved.
Valid keys: path, download_path, workers, mem_per_worker_gb,
tmpdir, inmem_row_limit, workers_jemalloc_conf, ram_budget.
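The provenance order reported by --effective (env > config > built-in) amounts to a three-level lookup. A minimal sketch follows; the function name is hypothetical, and the `ACID_<KEY>` naming is inferred from the environment-variable table on this page, so it may not hold for every key.

```python
from typing import Mapping, Optional, Tuple


def resolve_setting(
    key: str,
    env: Mapping[str, str],
    config: Mapping[str, str],
    builtin: Mapping[str, str],
) -> Tuple[Optional[str], str]:
    """Return (value, provenance) following env > config > built-in."""
    # Assumption: the env var for a key is ACID_<KEY in upper case>.
    env_name = "ACID_" + key.upper()
    if env_name in env:
        return env[env_name], "env"
    if key in config:
        return config[key], "config"
    if key in builtin:
        return builtin[key], "built-in"
    return None, "unset"


# Hypothetical values, for illustration only.
env = {"ACID_WORKERS": "16"}
config = {"workers": "8", "tmpdir": "/scratch"}
builtin = {"workers": "auto", "path": "~/datasets"}

workers = resolve_setting("workers", env, config, builtin)
tmpdir = resolve_setting("tmpdir", env, config, builtin)
path = resolve_setting("path", env, config, builtin)
```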
acid search — list downloadable catalogs¶
Crawls every root on the download path ($ACID_DOWNLOAD_PATH →
config download_path → built-in default) over its native transport —
local directories, ssh:// hosts, and http(s):// mirrors alike — and
lists the HATS catalogs available to acid download. One line per
catalog; the name is the token to hand to acid download.
| Flag | Default | Description |
|---|---|---|
| PATTERN | — | Show only catalogs whose name contains this text (case-insensitive substring). |
| --cache {use,refresh,off} | use | Remote-listing cache mode: use serves a fresh-enough cached listing (remote roots are cached ~1h under $XDG_CACHE_HOME/acid/downloads), refresh forces a re-crawl and rewrites the cache, off bypasses it entirely (neither read nor write). |
| --timeout SEC | 300 | Per-request timeout in seconds (HTTP / SSH). |
| --insecure | off | Skip TLS cert verification (HTTPS only; for self-signed mirrors). |
| --no-color | off | Disable ANSI color. |
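The three --cache modes differ only in whether the cached listing is read and whether it is written. The sketch below encodes that decision; it is not the tool's implementation. The exact 3600-second lifetime stands in for the "~1h" above, and writing the cache after a crawl in use mode is an assumption.

```python
from typing import Optional, Tuple


def plan_listing_fetch(
    mode: str,
    cache_age_s: Optional[float],
    ttl_s: float = 3600.0,  # assumption: "~1h" taken as exactly one hour
) -> Tuple[bool, bool]:
    """Return (serve_from_cache, write_cache_after_crawl).

    cache_age_s is None when no cached listing exists.
    """
    if mode == "off":
        return (False, False)  # neither read nor write
    if mode == "refresh":
        return (False, True)   # always re-crawl, then rewrite the cache
    if mode == "use":
        fresh = cache_age_s is not None and cache_age_s < ttl_s
        # Assumption: a stale or missing cache triggers a crawl whose
        # result is then cached.
        return (fresh, not fresh)
    raise ValueError(f"unknown cache mode: {mode}")
```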
Output¶
Catalog rows go to stdout; the searched roots, the count, and any
shadowing footnote go to stderr as quiet context (so acid search | …
pipes only the data).
- On a terminal: an aligned, colored table, with a live braille
spinner and a running catalog count while the crawl is in flight. Each
row reads NAME margins: <widths> arcsec (or margins: none); the margin
widths are the margin-cache radii (arcsec) available for the catalog.
With more than one download root, the row also shows a compact root
label; a catalog shadowed by an earlier root carries a trailing *.
- When piped: clean tab-separated lines, name⇥margins⇥root⇥marker,
where marker is shadowed (the resolved copy leaves it empty). This is
stable for scripting.
Catalogs grouped under a namespace directory surface as namespace/child
(e.g. wise/allwise) — the exact token acid download accepts. HATS
collections are listed as plain catalog names; the collection structure
is read internally only to find their margins.
Shadowing. Every occurrence of a name across the roots is shown, but
acid download <name> resolves first-wins, so a same-named catalog at
a later root is flagged shadowed (trailing * on a TTY; the
shadowed 4th TSV column when piped). The summary reports the count, e.g.
✓ 12 catalogs (2 shadowed). Download a shadowed copy by its explicit
URL/path, not by name.
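First-wins shadowing can be illustrated with a single pass over the discovered catalogs in root order. This is a sketch of the rule as described above, with hypothetical root and catalog names; it is not taken from the tool's source.

```python
from typing import Iterable, List, Tuple


def mark_shadowed(
    entries: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, bool]]:
    """Flag later occurrences of a catalog name as shadowed.

    entries: (root, name) pairs, listed in download-path order so that
    every catalog of an earlier root precedes those of a later root.
    """
    seen = set()
    result = []
    for root, name in entries:
        shadowed = name in seen  # an earlier root already offers it
        seen.add(name)
        result.append((root, name, shadowed))
    return result


# Hypothetical listing across two roots.
listing = mark_shadowed([
    ("https://mirror-a.example/hats/", "gaia_dr3"),
    ("https://mirror-a.example/hats/", "wise/allwise"),
    ("ssh://host-b/datasets", "gaia_dr3"),
])
shadowed_count = sum(1 for _, _, s in listing if s)
```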
Examples¶
# Everything available on the download path.
acid search
# Only catalogs whose name contains "gaia".
acid search gaia
# Force a fresh crawl past the ~1h cache.
acid search --cache refresh
# Names of only the catalogs `acid download` would actually fetch
# (drop the shadowed ones, whose 4th column is non-empty).
acid search | awk -F'\t' '$4 == "" { print $1 }'
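The same filtering as the awk one-liner can be done in Python when the piped output is consumed by a script. The sketch below relies only on the four tab-separated columns described under Output; the sample text is made up, and the content of the margins column is treated as an opaque string because its exact formatting is not specified here.

```python
from typing import List, NamedTuple


class CatalogRow(NamedTuple):
    name: str
    margins: str  # kept opaque: exact formatting is not assumed
    root: str
    shadowed: bool


def parse_search_tsv(text: str) -> List[CatalogRow]:
    """Parse piped `acid search` output: name, margins, root, marker."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Pad short lines so a missing trailing marker reads as empty.
        fields += [""] * (4 - len(fields))
        name, margins, root, marker = fields[:4]
        rows.append(CatalogRow(name, margins, root, marker == "shadowed"))
    return rows


# Hypothetical sample, for illustration only.
sample = (
    "gaia_dr3\t10\thttps://mirror-a.example/hats/\t\n"
    "gaia_dr3\t10\tssh://host-b/datasets\tshadowed\n"
    "wise/allwise\t\thttps://mirror-a.example/hats/\t\n"
)
resolved = [r.name for r in parse_search_tsv(sample) if not r.shadowed]
```

In a real script the text would come from something like `subprocess.run(["acid", "search"], capture_output=True, text=True).stdout`.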
acid list — list catalogs you can open¶
The local twin of acid search.
It uses the same discovery engine, transports, flags, and output contract
— but crawls the catalog path ($ACID_PATH → config path →
built-in default ~/datasets) instead of the download path. So acid
search answers what can I download, while acid list answers what's
already here that I can open by name — the catalogs acid open /
acid query resolve a bare name against, first-wins across the roots.
The flags, the TTY-table-vs-piped-TSV output, the namespace/child names,
the margin-radii column, and the shadowing rules are all exactly as
documented under acid search,
with one wording change: the shadowing footnote names acid open (which
resolves the first match) rather than acid download.
Examples¶
# Every catalog you can open by name, with margin radii.
acid list
# Only the ones whose name contains "gaia".
acid list gaia
# Just the names (piped → TSV).
acid list | cut -f1
acid download — fetch a HATS catalog¶
acid download SOURCE [DEST]
[--cone RA,DEC,RADIUS_DEG]
[--columns COL,COL,...]
[--workers N] [--timeout SEC]
[--estimate] [--prefetch-metadata]
[--skip-margin] [--insecure]
[--tmpdir DIR]
| Flag | Default | Description |
|---|---|---|
| SOURCE | required | A catalog name to resolve against the download path, or an explicit source. A name is a bare token (gaia_dr3) or a nested namespace/child (wise/allwise, as acid search surfaces). An explicit source (a URL, an SSH user@host:path, or a local path with a leading ./, /, or ~) is used verbatim. (To copy from a local relative directory, prefix it with ./; a bare token is always a name lookup.) |
| DEST | optional for a name | Destination directory. A path (containing a /) or URL is used as-is; a bare token resolves to <ACID_PATH-root>/<token>. Omit it for a name source: it defaults to <ACID_PATH-root>/<leaf> (a nested name's last segment, so wise/allwise → <ACID_PATH>/allwise). An explicit source requires an explicit dest. |
| --cone RA,DEC,RADIUS_DEG | — | Download only partitions overlapping this cone (degrees, ICRS). |
| --columns COL,COL,... | all | Download only the named columns (HATS-required columns like RA/Dec/_healpix_29 are always included). |
| --workers N | 8 | Parallel download workers. |
| --estimate | off | Print estimated bytes/files and exit without downloading. |
| --prefetch-metadata | off | Force fetching _metadata for exact sizes / bulk byte ranges. |
| --skip-margin | off | Don't download the catalog's margin cache. |
| --timeout SEC | 300 | Per-file timeout (HTTP only). |
| --insecure | off | Skip TLS cert verification (HTTPS only; for self-signed certs, testing). |
| --tmpdir DIR | $TMPDIR | Scratch directory for the point_map.fits mmap build. Keep on fast local storage when DEST is networked. |
Downloaded subsets are valid HATS catalogs with a rebuilt _metadata
and a regenerated point_map.fits footprint. If any file fails after
retries, the download exits non-zero — there is no "partial success"
mode, on purpose (a half-downloaded catalog looks structurally valid).
The built-in download path is two roots, searched first-wins:
https://data.lsdb.io/hats/ then ssh://slacd/sdf/home/m/mjuric/datasets.
Override it with $ACID_DOWNLOAD_PATH or an acid.conf download_path =
line. Use acid search to see
what each root offers and which copy a name resolves to.
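The SOURCE and DEST rules from the flag table can be summarized in code. This is a sketch of the documented rules only: the function names and regular expressions are illustrative, and `acid_root` stands for whichever ACID_PATH root the tool picks (how that root is chosen is not covered here).

```python
import re
from typing import Optional

_URL = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")
_SSH = re.compile(r"^[^/@\s]+@[^/:\s]+:")  # user@host:path


def is_explicit_source(source: str) -> bool:
    """True for a URL, user@host:path, or a path starting with ./, /, or ~."""
    if _URL.match(source) or _SSH.match(source):
        return True
    return source.startswith(("./", "/", "~"))


def resolve_dest(source: str, dest: Optional[str], acid_root: str) -> str:
    """Resolve the destination directory (illustrative sketch)."""
    root = acid_root.rstrip("/")
    if dest is None:
        if is_explicit_source(source):
            raise ValueError("an explicit source requires an explicit dest")
        # Name source: default to the last segment of the name.
        leaf = source.rstrip("/").split("/")[-1]
        return f"{root}/{leaf}"
    if "/" in dest or _URL.match(dest):
        return dest  # a path or URL is used as-is
    return f"{root}/{dest}"  # a bare token lands under the root
```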
Examples¶
# By name — resolved against the download path; lands under <ACID_PATH>/gaia_dr3.
acid download gaia_dr3
# Nested name — lands under its leaf, <ACID_PATH>/allwise.
acid download wise/allwise
# Full catalog over an explicit HTTP URL (explicit source needs an explicit dest).
acid download https://data.lsdb.io/hats/two_mass/two_mass /data/two_mass
# 5-deg cone, column subset, over SSH.
acid download user@server:/hats/gaia /data/gaia \
--cone 180,0,5 --columns ra,dec,phot_g_mean_mag
# Copy from a LOCAL directory: prefix ./ or / so it isn't a name lookup.
acid download ./mirror/two_mass /data/two_mass
# Estimate first.
acid download https://data.lsdb.io/hats/two_mass/two_mass /data/two_mass \
--cone 50,-50,2 --estimate
acid inspect — show catalog info¶
MODE is one of summary (default), schema, or properties.
SOURCE is a local path, HTTP URL, or user@host:path SSH path.
acid inspect /data/two_mass # summary
acid inspect schema /data/two_mass # column schema
acid inspect properties /data/two_mass # HATS properties
acid inspect https://data.lsdb.io/hats/two_mass/two_mass
acid hats — HATS catalog operations¶
acid hats build-margin CATALOG
[--margin-arcsec ARCSEC]
[--workers N] [--output DIR]
[--overwrite] [--mem-limit GB]
[--tmpdir DIR]
Builds a HATS margin cache for CATALOG. The output loads with
hats.read_hats(...) and matches what the upstream hats-import tool
produces.
| Flag | Default | Description |
|---|---|---|
| CATALOG | required | Path to a local HATS catalog. |
| --margin-arcsec ARCSEC | 10.0 | Margin threshold in arcseconds. Must be at least as large as the largest XMATCH radius you'll ever run against this catalog (see margin caches). |
| --workers N | cgroup-aware cpu_cap | Parallel workers. |
| --output DIR | <catalog>_<margin>arcsec | Output directory. |
| --overwrite | off | Overwrite an existing margin cache at --output. |
| --mem-limit GB | 10 % of RAM | Memory limit before the accumulator spills to disk. |
| --tmpdir DIR | spill under the output directory | Base directory for the spill scratch (must already exist). Point at fast local storage when --output is on a slow networked filesystem. |
Example¶
Environment variables¶
A few env vars influence CLI behavior; the full list (including the
allocator and worker-startup knobs) is in
MEMORY-TUNING.md.
The ones a user is most likely to set:
| Variable | Effect |
|---|---|
| ACID_PATH | Default value for --db (overrides config path). |
| ACID_WORKERS | Default for --workers. |
| ACID_MEM_PER_WORKER_GB | RAM/worker bound for --workers auto. |
| ACID_RAM_BUDGET | Default for --ram-budget (bytes or 64GB/512MiB). |
| ACID_TMPDIR | Default for --tmpdir. |
| ACID_INMEM_ROW_LIMIT | Spill threshold (rows). |
| ACID_CONFIG | Path to a specific acid.conf. |
| ACID_PROGRESS | 0 to silence, 1 to force the bar, plain for committed lines per stage. |
| ACID_PROFILE | 1 to print a per-step profile to stderr. |
| ACID_PROFILE_OUT | Path for the full per-worker profile JSON. |
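As one example of how an environment variable maps onto a flag, the ACID_PROGRESS values listed above line up with the --progress modes. The sketch below is an assumption-laden illustration, not the tool's code: it assumes an unset variable means auto (the documented --progress default) and treats any other value as an error.

```python
from typing import Mapping


def progress_mode_from_env(env: Mapping[str, str]) -> str:
    """Map ACID_PROGRESS onto a --progress mode (illustrative sketch)."""
    value = env.get("ACID_PROGRESS")
    if value is None:
        return "auto"  # assumption: unset falls back to the flag default
    mapping = {"0": "off", "1": "on", "plain": "plain"}
    if value in mapping:
        return mapping[value]
    # Assumption: unrecognized values are rejected rather than ignored.
    raise ValueError(f"unsupported ACID_PROGRESS value: {value!r}")
```

In practice the mapping would be read from `os.environ`; passing a plain dict keeps the sketch easy to test.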