`KszPipe`

class kszx.KszPipe(input_dir, output_dir)

This is the main kSZ analysis pipeline, which computes data/surrogate power spectra from catalogs.

There are two ways to run a KszPipe pipeline. The first is to create a KszPipe instance and call its run() method, either in a script or in a jupyter notebook. (If you’re using jupyter, keep in mind that KszPipe may take hours to run, and you’ll need to babysit the connection to the jupyterhub.) The second is to run from the command line with:

python -m kszx kszpipe_run [-p NUM_PROCESSES] <input_dir> <output_dir>
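For batch jobs it can be convenient to launch the command above from a Python driver script. A minimal sketch, using only the command-line form documented here; the helper name `kszpipe_command` and the example directory names are hypothetical:

```python
import sys


def kszpipe_command(input_dir, output_dir, processes=None):
    """Build the argv list for `python -m kszx kszpipe_run ...`."""
    cmd = [sys.executable, "-m", "kszx", "kszpipe_run"]
    if processes is not None:
        # -p is optional, as in the usage line above
        cmd += ["-p", str(processes)]
    cmd += [str(input_dir), str(output_dir)]
    return cmd


if __name__ == "__main__":
    cmd = kszpipe_command("my_input_dir", "my_output_dir", processes=4)
    print(" ".join(cmd))
    # To actually launch the pipeline (requires kszx and a populated input_dir):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the argument list separately from launching it makes it easy to log or inspect the exact command before committing to a run that may take hours.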

The input_dir contains a parameter file params.yml and galaxy/random catalogs. The output_dir will be populated with power spectra. For details of what KszPipe computes, and documentation of file formats, see the sphinx docs:

https://kszx.readthedocs.io/en/latest/kszpipe.html#kszpipe-details

After running the pipeline, you may want to load pipeline outputs using the helper class KszPipeOutdir, or do parameter estimation using PgvLikelihood.
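The Python route can be sketched as follows. This is an illustrative outline that has not been run against a real dataset: it uses only the constructor signatures and methods documented on this page, the wrapper name `run_and_load` is hypothetical, and the `kszx` import is deferred so that the function can be defined without the package installed.

```python
def run_and_load(input_dir, output_dir, processes):
    """Run the pipeline, then load the data P_gg(k) from the output directory."""
    import kszx  # deferred: only needed when the function is actually called

    pipe = kszx.KszPipe(input_dir, output_dir)
    pipe.run(processes)  # may take hours; skips results already on disk

    outdir = kszx.KszPipeOutdir(output_dir)
    return outdir.pgg_data()  # shape (nkbins,)
```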

High-level features:

Runs “surrogate” sims (see overleaf) to characterize the survey window function, determine dependence of power spectra on $(f_{NL}, b_v)$, and assign error bars to power spectra.

Velocity reconstruction noise is included in surrogate sims via a bootstrap procedure, using the observed CMB realization. This automatically incorporates noise inhomogeneity and “striping”, and captures correlations e.g. between 90 and 150 GHz.

The galaxy catalog can be spectroscopic or photometric (via the ztrue_col and zobs_col constructor args). Surrogate sims will capture the effect of photo-z errors.

The windowed power spectra $P_{gg}$, $P_{gv}$, $P_{vv}$ use a normalization which should be approximately correct. The normalization is an ansatz which is imperfect, especially on large scales, so surrogate sims should still be used to compare power spectra to models. Eventually, we’ll implement a precise calculation of the window function.

Currently assumes one galaxy field, and two velocity reconstructions labelled “90” and “150” (with ACT in mind).

Currently, there is not much implemented for CMB foregrounds. Later, I’d like to include foreground clustering terms in the surrogate model (i.e. terms of the form $b_\delta \delta(x)$, in addition to the kSZ term $b_v v_r(x)$), and estimate the $b_\delta$ biases by estimating the spin-zero $P_{gv}$ power spectrum.

property window_function

3-by-3 matrix $W_{ij}$ containing the window function for power spectra on three spatial footprints:

footprint 0: random catalog weighted by weight_gal column

footprint 1: random catalog weighted by product of columns weight_vr * bv_90

footprint 2: random catalog weighted by product of columns weight_vr * bv_150

These spatial weightings are appropriate for the $\delta_g$, $v_r^{90}$, and $v_r^{150}$ fields.

Window functions are computed with wfunc_utils.compute_wcrude() and are crude approximations (for more info see compute_wcrude() docstring), but this is okay since surrogate fields are treated consistently.

property surrogate_factory: Returns an instance of class SurrogateFactory, a helper class for simulating the density and radial velocity fields at locations of randoms.

get_pk_data(run=False, force=False)

Returns a shape (3,3,nkbins) array, and saves it in pipeline_outdir/pk_data.npy.

The returned array contains auto and cross power spectra of the following fields:

0: galaxy overdensity

1: kSZ velocity reconstruction $v_r^{90}$

2: kSZ velocity reconstruction $v_r^{150}$
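As an illustration of this index convention, the sketch below pulls individual spectra out of a (3,3,nkbins) array. The array is filled with placeholder values rather than real pipeline output, and the variable names are hypothetical.

```python
import numpy as np

nkbins = 5
# Placeholder standing in for the array returned by get_pk_data()
pk_data = np.arange(3 * 3 * nkbins, dtype=float).reshape(3, 3, nkbins)

GAL, VR90, VR150 = 0, 1, 2   # field indices listed above

pgg = pk_data[GAL, GAL]            # galaxy auto spectrum
pgv_90 = pk_data[GAL, VR90]        # galaxy x v_r^90 cross spectrum
pgv_150 = pk_data[GAL, VR150]      # galaxy x v_r^150 cross spectrum
pvv_90_150 = pk_data[VR90, VR150]  # v_r^90 x v_r^150 cross spectrum
```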

Flags:

If run=False, then this function expects the $P(k)$ file to be on disk from a previous pipeline run.

If run=True, then the $P(k)$ file will be computed if it is not on disk.

If force=True, then this function recomputes $P(k)$, even if it is on disk from a previous pipeline run.
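The three flag combinations amount to a small caching policy. The sketch below is a generic illustration of that policy, not the kszx implementation: `load_or_compute`, its JSON storage, and the choice to raise `RuntimeError` when run=False and the file is missing are all assumptions made for the example.

```python
import json
import os
import tempfile


def load_or_compute(filename, compute, run=False, force=False):
    """Return the cached result in `filename`, computing it if the flags allow."""
    if os.path.exists(filename) and not force:
        with open(filename) as f:
            return json.load(f)   # reuse the result from a previous run
    if not (run or force):
        # Assumed behavior: fail loudly rather than compute silently
        raise RuntimeError(f"{filename} not found; call with run=True")
    result = compute()
    with open(filename, "w") as f:
        json.dump(result, f)
    return result


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pk.json")
    first = load_or_compute(path, lambda: [1.0, 2.0], run=True)     # computes and saves
    cached = load_or_compute(path, lambda: [9.0, 9.0])              # reads the file on disk
    forced = load_or_compute(path, lambda: [3.0, 4.0], force=True)  # recomputes
```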

get_pk_surrogate(isurr, run=False, force=False)

Returns a shape (6,6,nkbins) array, and saves it in pipeline_outdir/tmp/pk_surr_{isurr}.npy.

The returned array contains auto and cross power spectra of the following fields, for a single surrogate:

0: surrogate galaxy field $S_g$ with $f_{NL}=0$.

1: derivative $dS_g/df_{NL}$.

2: surrogate kSZ velocity reconstruction $S_v^{90}$, with $b_v=0$ (i.e. noise only).

3: derivative $dS_v^{90}/db_v$.

4: surrogate kSZ velocity reconstruction $S_v^{150}$, with $b_v=0$ (i.e. noise only).

5: derivative $dS_v^{150}/db_v$.

Flags:

If run=False, then this function expects the $P(k)$ file to be on disk from a previous pipeline run.

If run=True, then the $P(k)$ file will be computed if it is not on disk.

If force=True, then this function recomputes $P(k)$, even if it is on disk from a previous pipeline run.
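One way to use the derivative fields: if the surrogate fields are taken to be linear in the parameters, $S_g(f_{NL}) = S_g + f_{NL}\, dS_g/df_{NL}$ and $S_v(b_v) = S_v + b_v\, dS_v/db_v$, then the power spectra at any $(f_{NL}, b_v)$ follow from the 6-by-6 array by bilinearity. The sketch below assumes this linear model (an inference from the field list above, not a statement of how kszx evaluates it) and uses random placeholder numbers in place of pipeline output. The function name `surrogate_pk` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
nkbins = 4
# Placeholder for the (6,6,nkbins) array returned by get_pk_surrogate()
pk = rng.normal(size=(6, 6, nkbins))


def surrogate_pk(pk, fnl, bv):
    """Contract a (6,6,nkbins) array down to (3,3,nkbins) at the given parameters."""
    # Row i holds the coefficients of field i (g, v90, v150) on the 6 basis fields
    c = np.array([
        [1.0, fnl, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, bv, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0, bv],
    ])
    return np.einsum("ia,jb,abk->ijk", c, c, pk)


fnl, bv = 2.0, 0.5
pk3 = surrogate_pk(pk, fnl, bv)

# The same P_gv^90, written out term by term
pgv_90 = pk[0, 2] + fnl * pk[1, 2] + bv * pk[0, 3] + fnl * bv * pk[1, 3]
```

Under the same assumption, setting fnl=0 and bv=0 recovers the spectra of fields 0, 2 and 4 alone.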

get_pk_surrogates()

Returns a shape (nsurr,6,6,nkbins) array, and saves it in pipeline_outdir/pk_surrogates.npy.

The returned array contains auto and cross power spectra of the following fields, for all surrogates:

0: surrogate galaxy field $S_g$ with $f_{NL}=0$.

1: derivative $dS_g/df_{NL}$.

2: surrogate kSZ velocity reconstruction $S_v^{90}$, with $b_v=0$ (i.e. noise only).

3: derivative $dS_v^{90}/db_v$.

4: surrogate kSZ velocity reconstruction $S_v^{150}$, with $b_v=0$ (i.e. noise only).

5: derivative $dS_v^{150}/db_v$.

This function only reads files from disk – it does not run the pipeline. To run the pipeline, use run().

run(processes)

Runs pipeline and saves results to disk, skipping results already on disk from previous runs.

Implementation: creates a multiprocessing Pool, and calls get_pk_data() and get_pk_surrogate() in worker processes.

Can be run from the command line with:

python -m kszx kszpipe_run [-p NUM_PROCESSES] <input_dir> <output_dir>

The processes argument is the number of worker processes. Currently I don’t have a good way of setting this automatically – the caller must adjust the number of processes based on the size of the datasets and the amount of memory available.
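A rough way to pick the processes argument is to divide the memory you are willing to use by an estimate of the peak memory per worker, capped at the CPU count. This is a hypothetical heuristic, not part of kszx; the helper name and the per-worker memory estimate are assumptions, and the estimate would need to be measured for your own datasets.

```python
import os


def choose_num_processes(mem_available_gb, mem_per_worker_gb, max_processes=None):
    """Largest worker count that fits in memory, capped at max_processes."""
    if mem_per_worker_gb <= 0:
        raise ValueError("mem_per_worker_gb must be positive")
    if max_processes is None:
        max_processes = os.cpu_count() or 1
    fits_in_memory = int(mem_available_gb // mem_per_worker_gb)
    # Always return at least one worker
    return max(1, min(max_processes, fits_in_memory))


# Example with made-up numbers: 256 GB available, ~40 GB per worker, at most 16 workers
nproc = choose_num_processes(256, 40, max_processes=16)
```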

class kszx.KszPipeOutdir(dirname, nsurr=None)

A helper class for loading and processing output files from class KszPipe.

Note: for MCMCs and parameter fits, there is a separate class PgvLikelihood. The KszPipeOutdir class is more minimal (the main use case is plot scripts!)

The constructor reads the files {dirname}/params.yml, {dirname}/pk_data.npy, {dirname}/pk_surrogates.npy which are generated by run(). For more info on these files, and documentation of file formats, see the sphinx docs:

https://kszx.readthedocs.io/en/latest/kszpipe.html#kszpipe-details

Constructor arguments:

dirname (string): name of pipeline output directory.

nsurr (integer or None): this is a hack for running on an incomplete pipeline. If specified, then {dirname}/pk_surrogates.npy is not read. Instead we read files of the form {dirname}/tmp/pk_surr_{i}.npy.
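The nsurr fallback amounts to stacking per-surrogate files into one array. A minimal sketch of that idea, using the file naming described above with placeholder arrays written to a temporary directory; the helper `load_partial_surrogates` is hypothetical and is not the kszx implementation (in particular, surrogate indices starting at 0 is an assumption).

```python
import os
import tempfile

import numpy as np


def load_partial_surrogates(dirname, nsurr):
    """Stack {dirname}/tmp/pk_surr_{i}.npy for i < nsurr into one array."""
    # Assumption: surrogate indices run from 0 to nsurr-1
    files = [os.path.join(dirname, "tmp", f"pk_surr_{i}.npy") for i in range(nsurr)]
    return np.stack([np.load(f) for f in files])   # shape (nsurr,6,6,nkbins)


with tempfile.TemporaryDirectory() as dirname:
    os.makedirs(os.path.join(dirname, "tmp"))
    for i in range(3):
        # Placeholder (6,6,nkbins) arrays with nkbins=5, filled with the value i
        placeholder = np.full((6, 6, 5), float(i))
        np.save(os.path.join(dirname, "tmp", f"pk_surr_{i}.npy"), placeholder)
    pk_surr = load_partial_surrogates(dirname, nsurr=3)
```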

pgg_data(): Returns shape (nkbins,) array containing $P_{gg}^{data}(k)$.

pgg_mean(fnl=0): Returns shape (nkbins,) array, containing $\langle P_{gg}^{surr}(k) \rangle$.

pgg_rms(fnl=0): Returns shape (nkbins,) array, containing sqrt(Var($P_{gg}^{surr}(k)$)).
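Conceptually, these two methods reduce the (nsurr,6,6,nkbins) surrogate array over its first axis. The sketch below shows one way to do this for a given fnl, assuming the surrogate galaxy field is linear in $f_{NL}$ (so that $P_{gg}$ is quadratic in it). It is an illustration with random placeholder data, not the kszx source, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
nsurr, nkbins = 100, 4
pk_surr = rng.normal(size=(nsurr, 6, 6, nkbins))   # placeholder surrogate spectra


def pgg_per_surrogate(pk_surr, fnl=0):
    """P_gg for each surrogate at the given fnl; shape (nsurr, nkbins)."""
    # Fields 0 and 1 are S_g and dS_g/dfnl; both cross terms are kept explicitly
    return (pk_surr[:, 0, 0]
            + fnl * (pk_surr[:, 0, 1] + pk_surr[:, 1, 0])
            + fnl**2 * pk_surr[:, 1, 1])


def pgg_mean_sketch(pk_surr, fnl=0):
    """Mean over surrogates; shape (nkbins,)."""
    return np.mean(pgg_per_surrogate(pk_surr, fnl), axis=0)


def pgg_rms_sketch(pk_surr, fnl=0):
    """Square root of the variance over surrogates; shape (nkbins,)."""
    # np.std defaults to ddof=0; whether kszx uses ddof=0 or 1 is not stated here
    return np.std(pgg_per_surrogate(pk_surr, fnl), axis=0)
```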

pgv_data(field)

Returns shape (nkbins,) array containing $P_{gv}^{data}(k)$.