Performance
Stub
Final content will cover the practical knobs: workers, predicate pushdown, and when to spill, with one benchmark table per knob so you know what to expect.
Topics planned:
- `workers=N`: start with the number of cores you have, and halve it if your disk is slow.
- Predicate pushdown: write `WHERE ra > 100 AND IN_MOC(a, ...)` and `acid` will skip whole partitions whose pixels don't overlap your filter. The mechanism is automatic; the rule for you is "filter early".
- "Avoid `SELECT *`": column pushdown is real. A query that selects 3 columns out of 150 only reads 3 columns from disk.
- `materialize("name", ...)` for expensive intermediates you'll reuse, vs `write_parquet(...)` for final outputs.
- Use Polars for post-processing: pandas and `astropy.Table` are fine for inspecting results, but for filtering, group-by, and joins on results larger than a few hundred thousand rows, `r.to_polars()` is typically 5–50× faster. Round-trip back with `polars_df.to_pandas()` whenever you need a pandas frame.
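
To make the "filter early" rule concrete, here is a toy sketch of partition skipping in plain Python. It is not how `acid` implements pushdown. The `Partition` class, the `partitions_to_read` helper, and the per-partition `ra_max` statistic are all hypothetical names invented for illustration. The only idea taken from the list above is that a partition whose pixels don't overlap the filter is never read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for one on-disk partition (not an acid type)."""
    name: str
    pixels: frozenset  # sky pixels covered by this partition
    ra_max: float      # assumed per-partition statistic: largest ra stored


def partitions_to_read(partitions, filter_pixels, ra_min):
    """Return only the partitions that could contain matching rows."""
    kept = []
    for part in partitions:
        if part.pixels.isdisjoint(filter_pixels):
            continue  # no pixel overlap with the filter: skip the partition
        if part.ra_max <= ra_min:
            continue  # every row here would fail `ra > ra_min`: skip it too
        kept.append(part)
    return kept


catalog = [
    Partition("part-0", frozenset({0, 1}), ra_max=90.0),
    Partition("part-1", frozenset({2, 3}), ra_max=95.0),
    Partition("part-2", frozenset({4, 5}), ra_max=270.0),
]

# Filter region covers pixels 3 and 4, combined with `ra > 100`.
selected = partitions_to_read(catalog, filter_pixels={3, 4}, ra_min=100.0)
print([p.name for p in selected])  # ['part-2']
```

Here `part-0` is dropped on pixel overlap alone and `part-1` on the assumed `ra` statistic, so only one of three partitions would be opened. A filter applied later, after the rows are already loaded, gives the same result but cannot save that I/O.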