Catalog

class kszx.Catalog(cols=None, name=None, filename=None, size=0)

Represents a galaxy catalog, with one “row” per galaxy, and user-defined “columns” for RA, DEC, etc.

A design decision: do we actually need a Catalog class? Or should we phase it out, in favor of numpy.recarray or astropy.Table?

Constructor args:
  • cols: dictionary (string col_name) -> (1-d numpy array). Optional, since you can add columns later with add_column().

  • name: catalog name (optional)

  • filename: filename, if catalog is stored on disk (optional)

  • size: number of galaxies in catalog (optional, can be inferred from array dimensions instead).

Members:

  • self.size (integer): Number of galaxies in catalog.

  • self.col_names (list of strings): List of user-defined columns.

  • self.name (string or None): Catalog name (optional).

  • self.filename (string or None): Filename, if Catalog is stored on disk (optional).

Additionally, for each column name (in self.col_names), the Catalog contains a member with the corresponding name, whose value is a 1-d array of length self.size.
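
The column-as-attribute layout described above can be illustrated with a small stand-in class. This is only a sketch: MiniCatalog is a hypothetical name, and nothing here is taken from the kszx source. It shows one way to keep size, col_names, and per-column attributes consistent.

```python
import numpy as np

class MiniCatalog:
    """Hypothetical stand-in for illustration; not the kszx implementation."""

    def __init__(self, cols=None):
        self.size = 0
        self.col_names = []
        for col_name, col_data in (cols or {}).items():
            self.add_column(col_name, col_data)

    def add_column(self, col_name, col_data):
        col_data = np.asarray(col_data)
        if col_data.ndim != 1:
            raise ValueError(f"column {col_name!r} must be a 1-d array")
        # Every column after the first must match the catalog size.
        if self.col_names and len(col_data) != self.size:
            raise ValueError(f"column {col_name!r} has length {len(col_data)}, expected {self.size}")
        self.size = len(col_data)
        self.col_names.append(col_name)
        # Expose the column as an attribute, e.g. cat.ra_deg
        setattr(self, col_name, col_data)

cat = MiniCatalog({"ra_deg": [10.0, 20.0, 30.0], "dec_deg": [-5.0, 0.0, 5.0]})
cat.add_column("z", [0.45, 0.52, 0.61])
print(cat.size, cat.col_names, cat.z)
```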

Column names are user-defined, but here are some frequently occurring column names (note that I always use lower case):

  • self.ra_deg: right ascension (degrees)

  • self.dec_deg: declination (degrees)

  • self.z: redshift

  • self.zerr: redshift error (if photometric)

  • self.rmag: r-band magnitude (and similarly for g-band, z-band, etc.)

Here are some functions which return Catalogs:

kszx.sdss.read_galaxies()
kszx.sdss.read_randoms()
kszx.desils_lrg.read_galaxies()
kszx.desils_lrg.read_randoms()
kszx.Catalog.from_h5()    # static method

Here are some functions which make maps from catalogs:

kszx.healpix_utils.map_from_catalog()   # catalog -> 2d healpix map
kszx.pixell_utils.map_from_catalog()    # catalog -> 2d pixell maps
kszx.grid_points()                      # catalog -> 3d map

add_column(col_name, col_data)

Adds a new column to the catalog.

  • col_name (string)

  • col_data (1-d numpy array)

remove_column(col_name)

Removes an existing column from the catalog.

apply_boolean_mask(mask, name=None, in_place=True)

Reduces the size of the Catalog, by applying a caller-specified boolean mask.

This can be used to implement color cuts, redshift cuts, etc.

  • mask (1-d boolean array). Elements should be True for galaxies which are kept, False for galaxies which are discarded.

  • name (string or None). If specified, the mask fraction will be printed.

  • in_place (boolean). If False, the original Catalog is unmodified, and a new Catalog is returned.
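
As a sketch of how such a mask might be built, the example below constructs a color cut with plain numpy. The magnitudes and cut thresholds are made up for illustration, and the arrays stand in for Catalog columns such as gmag and rmag.

```python
import numpy as np

# Made-up g-band and r-band magnitudes for five galaxies.
gmag = np.array([21.0, 22.5, 20.1, 23.0, 21.7])
rmag = np.array([20.2, 20.9, 19.8, 21.1, 21.5])

# Keep galaxies that are bright (rmag < 21) and red (g - r > 0.5).
# True means "keep", False means "discard".
mask = np.logical_and(rmag < 21.0, (gmag - rmag) > 0.5)

# Applying the mask to one column; a Catalog would apply it to every column.
kept_rmag = rmag[mask]
print(f"kept {mask.sum()} of {mask.size} galaxies")
```

With a Catalog whose columns hold these arrays, the same mask would be passed to apply_boolean_mask(mask).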

apply_redshift_cut(zmin, zmax, in_place=True)

Reduces the size of the catalog, by keeping objects in redshift range \(z_{min} \le z \le z_{max}\).

This is a special case of apply_boolean_mask().

  • zmin (float or None). If None, then no lower redshift limit is imposed.

  • zmax (float or None). If None, then no upper redshift limit is imposed.

  • in_place (boolean). If False, the original Catalog is unmodified, and a new Catalog is returned.
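
The relationship to apply_boolean_mask() can be sketched as follows. The helper name redshift_mask is hypothetical (it is not part of kszx); it only shows how zmin=None and zmax=None translate into "no limit" when building the mask.

```python
import numpy as np

def redshift_mask(z, zmin=None, zmax=None):
    """Hypothetical helper: True where zmin <= z <= zmax; None means no limit."""
    z = np.asarray(z)
    mask = np.ones(z.shape, dtype=bool)
    if zmin is not None:
        mask &= (z >= zmin)
    if zmax is not None:
        mask &= (z <= zmax)
    return mask

z = np.array([0.10, 0.43, 0.55, 0.70, 0.90])
print(redshift_mask(z, zmin=0.43, zmax=0.70))   # both limits are inclusive
print(redshift_mask(z, zmax=0.5))               # no lower limit
```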

get_xyz(cosmo, zcol_name='z')

Returns shape (N,3) array containing galaxy locations in Cartesian coords (observer at origin).

The Catalog must define columns named ra_deg and dec_deg, plus a redshift column (named z by default, see zcol_name). The main argument is:

  • cosmo (Cosmology). Used to convert redshifts to distances.

This function is super useful – here are some use cases for the points array that it returns:

  • To “grid” the galaxy field, pass the points array to kszx.grid_points().

  • To interpolate a real-space field to the galaxy locations, pass the points array to kszx.interpolate_points().

  • To create a bounding box for the galaxy field, pass the points array to the BoundingBox constructor.

This function is roughly equivalent to:

# Compute radial distance from 'z' column of catalog, using the 'cosmo' argument
chi = cosmo.chi(z=self.z)

# Compute Cartesian coords from 'ra_deg' and 'dec_deg' cols of catalog.
points = kszx.utils.ra_dec_to_xyz(self.ra_deg, self.dec_deg, r=chi)
return points
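
For readers who want to see the geometry, here is a self-contained sketch of the spherical-to-Cartesian step. The function name ra_dec_to_xyz_sketch is hypothetical, and the axis convention (x axis toward ra=0, dec=0; z axis toward dec=+90 degrees) is an assumption that may differ from what kszx.utils.ra_dec_to_xyz() does. The distances are made-up numbers standing in for the output of the cosmology.

```python
import numpy as np

def ra_dec_to_xyz_sketch(ra_deg, dec_deg, r=1.0):
    """Assumed convention: x axis at (ra, dec) = (0, 0), z axis at dec = +90 deg."""
    ra = np.radians(ra_deg)
    dec = np.radians(dec_deg)
    x = r * np.cos(dec) * np.cos(ra)
    y = r * np.cos(dec) * np.sin(ra)
    z = r * np.sin(dec)
    # Stack into a shape (N, 3) array, one row per galaxy.
    return np.stack([x, y, z], axis=-1)

ra_deg = np.array([0.0, 90.0, 45.0])
dec_deg = np.array([0.0, 0.0, 90.0])
chi = np.array([100.0, 200.0, 300.0])   # made-up radial distances

points = ra_dec_to_xyz_sketch(ra_deg, dec_deg, r=chi)
print(points.shape)
```

Whatever the axis convention, each row of the result has length equal to the corresponding radial distance, which is a cheap sanity check.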

generate_batches(batchsize, verbose=True)

Splits catalog into subcatalogs no larger than ‘batchsize’.

If batchsize is None, the catalog will be “split” into a single subcatalog.

If verbose is True, and the catalog is split into more than one batch, then a progress indicator will be shown.
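
The batching logic can be sketched with a small generator over index ranges. The helper batch_slices is hypothetical and only illustrates the splitting rule; it yields slices rather than subcatalogs, and it omits the progress indicator.

```python
def batch_slices(size, batchsize=None):
    """Hypothetical helper: yield slices covering range(size), each at most batchsize long."""
    if batchsize is None:
        # A single "batch" containing everything.
        batchsize = max(size, 1)
    if batchsize <= 0:
        raise ValueError("batchsize must be positive")
    for start in range(0, size, batchsize):
        yield slice(start, min(start + batchsize, size))

slices = list(batch_slices(10, 4))
for s in slices:
    print(s.start, s.stop)
```

Each slice could then be applied to every column to form one subcatalog, so that only one batch needs to be processed at a time.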

static concatenate(catalog_list, name=None, destructive=False)

Returns a new Catalog, obtained by concatenating (merging) multiple Catalogs.

  • catalog_list (sequence of Catalogs).

  • name (string or None). Name of new Catalog (optional).

  • destructive (boolean). If True, then the input Catalogs will be destroyed to save memory.
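
Conceptually, concatenation joins each column row-wise across the inputs. The sketch below uses plain dictionaries of numpy arrays in place of Catalogs; concatenate_columns is a hypothetical helper, and the requirement that all inputs define the same columns is an assumption, not documented kszx behavior.

```python
import numpy as np

def concatenate_columns(col_dicts):
    """Hypothetical helper: merge a list of {col_name: 1-d array} dicts row-wise."""
    col_names = list(col_dicts[0])
    for cols in col_dicts[1:]:
        if set(cols) != set(col_names):
            raise ValueError("all inputs must define the same columns")
    return {name: np.concatenate([cols[name] for cols in col_dicts])
            for name in col_names}

north = {"ra_deg": np.array([150.0, 151.0]), "z": np.array([0.50, 0.60])}
south = {"ra_deg": np.array([10.0]), "z": np.array([0.55])}

merged = concatenate_columns([north, south])
print(merged["ra_deg"], merged["z"])
```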

absorb_columns(src_catalog, destructive=False)

‘Absorbs’ all columns of ‘src_catalog’ into an existing Catalog (equivalent to calling add_column() multiple times).

If destructive=True, then all columns will be removed from src_catalog (saving some memory).

make_random_subcatalog(new_size, name=None)

Returns a new Catalog, obtained by choosing a random subset (with size new_size) of the existing Catalog.

This method can be used to reduce the size of a random catalog. For example:

# Random catalog with 100x the galaxy density
rcat_large = kszx.sdss.read_randoms('CMASS_North')

# Random catalog with 10x the galaxy density (integer division, since new_size is a count)
rcat_small = rcat_large.make_random_subcatalog(rcat_large.size // 10)
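
The subsampling itself can be sketched with numpy alone. This is an assumption about the approach (drawing row indices without replacement), not a description of the kszx internals; the uniform ra_deg values are made up. Note that new_size must be an integer, so integer division is used.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

size = 1000
new_size = size // 10   # integer count, not a float

# Made-up column standing in for a random catalog.
ra_deg = rng.uniform(0.0, 360.0, size=size)

# Choose new_size distinct row indices, then apply them to each column.
idx = rng.choice(size, size=new_size, replace=False)
ra_small = ra_deg[idx]
print(ra_small.shape)
```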

static from_h5(filename)

Reads a file in HDF5 format (written by catalog.write_h5()), and returns a Catalog object.

write_h5(filename)

Writes a Catalog to disk, in an HDF5 file format (readable with Catalog.from_h5()).

static from_fits(filename, col_name_pairs, name=None)

Reads FITS file in SDSS/DESILS format, and returns a Catalog object.

The col_name_pairs arg should be a list of pairs (col_name, fits_col_name). See sdss.py and desils_lrg.py for examples.

static from_text_file(filename, col_names, name=None)

Reads a text file (one row per catalog object) and returns a Catalog object. The col_names arg should be a list of column names. See sdss.py for examples.

show()

Prints a summary of the Catalog to stdout, with one line per column showing min/mean/max.
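
A per-column min/mean/max summary of this kind can be sketched as below. The helper summarize and its output format are hypothetical; the actual layout printed by show() may differ.

```python
import numpy as np

def summarize(cols):
    """Hypothetical helper: one summary line per column, showing min/mean/max."""
    lines = []
    for name, arr in cols.items():
        lines.append(f"{name}: min={arr.min():.4g} mean={arr.mean():.4g} max={arr.max():.4g}")
    return lines

cols = {
    "z": np.array([0.45, 0.52, 0.61]),
    "rmag": np.array([19.8, 20.2, 20.9]),
}

lines = summarize(cols)
for line in lines:
    print(line)
```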