Performance

Stub

The final version will cover the practical knobs (workers, predicate pushdown, and when to spill), with one benchmark table per knob so you know what to expect.

Topics planned:

  • workers=N: start with the number of cores you have, and halve it if your disk is slow.
  • Predicate pushdown: write WHERE ra > 100 AND IN_MOC(a, ...) and acid skips whole partitions whose pixels don't overlap your filter. The mechanism is automatic; the rule for you is "filter early".
  • Avoid SELECT *: column pushdown means a query that selects 3 of 150 columns reads only those 3 columns from disk.
  • Use materialize("name", ...) for expensive intermediates you'll reuse, and write_parquet(...) for final outputs.
  • Use Polars for post-processing: pandas and astropy.Table are fine for inspecting results, but for filtering, group-by, and joins on results larger than a few hundred thousand rows, working on r.to_polars() is typically 5–50× faster. Convert back with polars_df.to_pandas() whenever you need a pandas frame.
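
The workers=N rule of thumb can be written down as a small helper. This is only a sketch: suggest_workers and its slow_disk flag are hypothetical names used for illustration and are not part of acid. Deciding whether the disk counts as slow is left to you.

```python
import os


def suggest_workers(slow_disk: bool = False) -> int:
    """Starting value for workers=N: one per core, halved on a slow disk.

    Hypothetical helper for illustration; not part of acid.
    """
    # os.cpu_count() can return None when the count cannot be determined.
    cores = os.cpu_count() or 1
    workers = cores // 2 if slow_disk else cores
    # Never suggest fewer than one worker.
    return max(1, workers)


if __name__ == "__main__":
    print("fast disk:", suggest_workers())
    print("slow disk:", suggest_workers(slow_disk=True))
```

Treat the result as a starting point and adjust it against your own measurements.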
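
To make partition skipping concrete, here is a toy model of the idea. It is not acid's implementation: the Partition class, the prune function, and the integer pixel IDs are all assumptions made for illustration. Each partition records which sky pixels it covers, and a partition is read only if it shares at least one pixel with the filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Toy stand-in for an on-disk partition (illustrative, not acid's type)."""
    name: str
    pixels: frozenset  # sky pixel IDs covered by this partition


def prune(partitions, filter_pixels):
    """Keep only partitions that share at least one pixel with the filter."""
    wanted = frozenset(filter_pixels)
    return [p for p in partitions if not p.pixels.isdisjoint(wanted)]


partitions = [
    Partition("part-0", frozenset({0, 1, 2})),
    Partition("part-1", frozenset({3, 4})),
    Partition("part-2", frozenset({5, 6, 7})),
]

# A filter touching pixels 4 and 5 needs only part-1 and part-2;
# part-0 is skipped without being read.
survivors = prune(partitions, {4, 5})
print([p.name for p in survivors])  # ['part-1', 'part-2']
```

The narrower the filter, the fewer partitions survive, which is why filtering early pays off.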