Read a catalogue split over multiple files (SplitFile class)

The particle and subfind catalogues of the Hydrangea/C-EAGLE simulations are distributed over several HDF5 files. Although each of these can, in principle, be read directly with h5py or the HDF5 I/O routines of this library, this is not very convenient. The SplitFile class allows easy reading of entries from these split catalogues, without having to explicitly consider the way in which the data are distributed over files.

Note

It is possible to select individual elements, or a list/range thereof, from the catalogues and read their entries much faster than the full data list. For reading particles within a particular sub-volume of a simulation, use the ReadRegion class instead.

General usage

The first step is to set up an instance of the class, specifying the catalogue (e.g. particle or structure type) and possibly other parameters (see hydrangea.SplitFile below for all options). Once this is done, any catalogue property can be accessed directly as an attribute of the class instance (e.g. subhalo.Mass for the total mass of subhaloes, assuming the instance is named subhalo and has been set up to read a subhalo catalogue). By default, data are returned in “astronomically sensible” units; other systems (e.g. cgs) can be specified through the units attribute.

For convenience, the class also provides methods and properties to read particle properties directly, connect particles to structures, and obtain various time-stamps and meta-data of the catalogue.

Examples

Below are a few examples of how to use the SplitFile class to read from subfind or particle catalogues; all of them assume that 'subfind_file' and 'snapshot_file' have been set up to point to one of the files of the subfind or snapshot catalogue, respectively, to read from. More complete use cases are provided by the demonstration scripts listed in the “Basic examples” section.

Read the mass of all subhaloes:

import hydrangea

# subfind_file: path to any one file of the subfind catalogue (see above)
subhaloes = hydrangea.SplitFile(subfind_file, 'Subhalo')
print(subhaloes.Mass)

Read the stellar mass (part_type == 4) of subhalo index 1000 (note that we do not specify part_type as an argument to the class constructor, because this would instruct it to read a star particle catalogue instead):

subhalo = hydrangea.SplitFile(subfind_file, 'Subhalo', read_index=1000)
print(subhalo.MassType[4])

Properties stored in HDF5 groups can be accessed by separating group(s) and data sets by '__'. Here is how to read the stellar mass within an aperture of 30 pkpc from the subhalo centre:

print(subhalo.ApertureMeasurements__Mass__030kpc[4])
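The naming rule can be illustrated with a short sketch. The two helper functions are hypothetical and not part of hydrangea. They only mirror the convention described here, assuming that '__' in an attribute name stands for the '/' separator of the HDF5 path below the base group.

```python
def attribute_to_dataset(attr_name):
    """Translate an attribute-style name into an HDF5-style path."""
    return attr_name.replace('__', '/')


def dataset_to_attribute(dataset_name):
    """Translate an HDF5-style path into an attribute-style name."""
    return dataset_name.replace('/', '__')


print(attribute_to_dataset('ApertureMeasurements__Mass__030kpc'))
# ApertureMeasurements/Mass/030kpc
```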

If desired, entries can also be read explicitly. This also allows specifying e.g. an alternative unit system for this particular entry. Here, we read the DM mass of the subhalo in kg (why you would want to do this is another question…):

import numpy as np

# Request 64-bit floats, because masses in kg can exceed the 32-bit float range
msub_kg = subhalo.read_data('MassType[1]', units='SI', data_type=np.float64)

Particle catalogues can be read analogously. Here, we read the formation “times” (really: expansion factors) of two specific star particles at indices 56 and 89:

stars = hydrangea.SplitFile(snapshot_file, part_type=4, read_index=[56, 89])
print(stars.StellarFormationTime)
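Because the formation “times” are expansion factors, they usually have to be converted before use. The sketch below applies the standard relation z = 1/a - 1. The function name and the input values are made up for illustration and are not read from a catalogue.

```python
def aexp_to_redshift(aexp):
    """Convert an expansion factor a into a redshift z = 1/a - 1."""
    return 1.0 / aexp - 1.0


# Hypothetical expansion factors standing in for stars.StellarFormationTime
formation_aexp = [0.25, 0.5, 1.0]
formation_z = [aexp_to_redshift(a) for a in formation_aexp]
print(formation_z)  # [3.0, 1.0, 0.0]
```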

Reference

Reading catalogue entries

Linking particles to structures

Catalogue time-stamps

Catalogue meta-data

class hydrangea.SplitFile(file_name, group_name=None, part_type=None, sim_type='Eagle', verbose=1, units='astro', read_range=None, read_index=None)

Class to read catalogues that are split over multiple files.

Parameters:
  • file_name (str) – Any one of the files in the collection to read.
  • group_name (str, optional) – Name of base group to read from. If not provided, it is assumed to be ‘PartType[x]’, where x is the particle type (which must then be supplied).
  • part_type (int, optional) – The (numerical) particle type; only relevant for sn[a/i]pshots. If not provided, it is inferred from group_name, which must then be supplied.
  • sim_type (str, optional) – Type of simulation to which this data belongs: ‘Eagle’ (default) or ‘Illustris’. The latter is an experimental feature to read data from the Illustris[TNG] simulation family with this library.
  • verbose (int, optional) – Specify level of log output, from 0 (minimal) to 2 (lots). Default: 1.
  • units (str or None, optional) – Convert values to other units (default: ‘astro’; capitalization is ignored). With ‘data’, no conversion is done.
  • read_range ((int, int) or None, optional) – Read only elements from the first up to but excluding the second entry in the tuple. If None (default), load entire catalogue. Ignored if read_index is provided.
  • read_index (int or np.array(int) or None, optional) – Read only the elements in read_index. If int, a single element is read, and the first dimension truncated. If an array is provided, the elements between the lowest and highest index are read and the output then masked to the exact elements. If None (default), everything is read.
num_elem

Number of elements in the selected group category (None if it could not be determined or is not applicable).

Type: int or None
num_files

Number of files in the collection (None if not determined).

Type: int or None

Note

For proper functionality, read_range requires that the file offsets be determined. If this is not possible, the entire data set will be read and then truncated (slower).
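The behaviour described for an array-valued read_index (read the contiguous block between the lowest and highest index, then mask to the exact elements) can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation. The function name is hypothetical, and returning the elements in the order given in read_index is an assumption.

```python
def read_by_index(catalogue, read_index):
    """Read a contiguous block, then pick out the requested elements."""
    lo = min(read_index)
    hi = max(read_index) + 1
    block = catalogue[lo:hi]  # one contiguous read covering all indices
    # Shift the indices so that they refer to positions within the block
    return [block[i - lo] for i in read_index]


# A mock catalogue with 100 entries standing in for data on disk
catalogue = [10 * i for i in range(100)]
print(read_by_index(catalogue, [56, 89]))  # [560, 890]
```

The contiguous read touches 34 entries here (indices 56 to 89), which is why widely separated indices cost more than their count alone suggests.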

GroupIndex

Emulate a non-existing group index for all particles.

SubhaloIndex

Emulate a non-existing subhalo index for all particles.

aexp

Expansion factor of the data set.

file_offsets

Offset (index of first entry) of each file in the catalogue.

Note

This is currently None if the whole catalogue is to be read. The motivation for this is not entirely clear…
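To show why file offsets matter for read_range, the following sketch maps a global range onto per-file slices. It is not taken from the library: the function name, its arguments and the example offsets are assumptions chosen for illustration.

```python
def split_range_over_files(file_offsets, num_total, read_range):
    """Return (file number, local start, local stop) for each file
    that overlaps the half-open global range read_range."""
    start, stop = read_range
    # Append the total length so that every file has an end boundary
    bounds = list(file_offsets) + [num_total]
    pieces = []
    for ifile in range(len(file_offsets)):
        f_start, f_stop = bounds[ifile], bounds[ifile + 1]
        lo, hi = max(start, f_start), min(stop, f_stop)
        if lo < hi:  # this file contributes at least one element
            pieces.append((ifile, lo - f_start, hi - f_start))
    return pieces


# Three hypothetical files holding 100, 150 and 50 entries
offsets = [0, 100, 250]
print(split_range_over_files(offsets, 300, (90, 260)))
# [(0, 90, 100), (1, 0, 150), (2, 0, 10)]
```

Without known offsets there is no way to tell which files overlap the range, so every file would have to be read in full.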

get_unit_conversion(dataset_name, units_name)

Get appropriate factor to convert data units to other system.

Parameters:
  • dataset_name (str) – The name of the data set for which to obtain the conversion factor (including possible containing groups, but not the base group).
  • units_name (str) –

    The unit system to calculate the conversion factor for. Options are (case-insensitive):

    • 'data' –> Exactly as stored in file (i.e. no conversion)
    • 'clean' –> As in file, but without a and h factors
    • 'astro' –> Astronomically useful units (e.g. M_Sun, pMpc)
    • 'si' –> SI units
    • 'cgs' –> CGS units
Returns:

data_to_other (float) – The conversion factor for the specified unit system. The ‘raw’ data as read from the file(s) must be multiplied with this value to obtain the magnitude in the target system.

Note

In SI and CGS units in particular, overflow may occur for 32-bit floats, because these have a maximum value of ~3.4e38.
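The overflow risk can be checked with the standard library alone. The sketch below uses an approximate solar mass in kg and a made-up subhalo mass. Neither value comes from a catalogue, and the helper name is hypothetical. The largest finite 32-bit float is roughly 3.4e38.

```python
import struct

# Largest finite IEEE 754 single-precision value (about 3.4e38)
FLOAT32_MAX = (2 - 2**-23) * 2.0**127


def fits_in_float32(value):
    """Return True if value can be stored as a finite 32-bit float."""
    try:
        struct.pack('<f', value)
    except OverflowError:
        return False
    return True


M_SUN_KG = 1.989e30              # approximate solar mass in kg
mass_msun = 1e12                 # hypothetical subhalo mass in solar masses
mass_kg = mass_msun * M_SUN_KG   # about 2e42 kg

print(fits_in_float32(mass_msun))  # True
print(fits_in_float32(mass_kg))    # False
```

Requesting a 64-bit data type through the data_type argument of read_data is one way to avoid this for such quantities.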

Examples

>>> import hydrangea as hy
>>> snap_file = hy.objects.Simulation(index=0).get_snap_file(29)
>>> stars = hy.SplitFile(snap_file, part_type=4)  # or hy.ReadRegion
>>> stars.get_unit_conversion('Mass', 'astro')
14755791648.22193
>>> stars.get_unit_conversion('CentreOfPotential', 'SI')
4.553162166150214e+22
in_subhalo(subhalo_index, subhalo_file=None)

Identify members of a subhalo within the catalogue selection.

Parameters:
  • subhalo_index (int) – The index of the subhalo against which to test particles for membership.
  • subhalo_file (str or None, optional) – A file from the subfind catalogue containing the subhalo. If None (default), the file specified by the subfind_file attribute is used.

Note

This method internally reads and matches data between this catalogue and the associated subhalo catalogue; this may take some time if there are many particles in either.

lookback_time

Lookback time to the data set from z = 0 [Gyr].

m_baryon

Initial baryon mass (only for snap-/snipshots).

m_dm

DM particle mass (only for snap-/snipshots).

num_elem

Number of catalogue entries to read.

With read_range set up (including implicitly through read_index), this gives the total number of elements in this range, not in the total catalogue. None if it could not be determined.

num_entries

Number of entries in output.

If read_index is specified, the number of elements in it. If read_range is specified, the length of this range. Otherwise, the total number of entries in the catalogue.
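The difference between num_entries and num_elem can be summarised in a small sketch. It is one reading of the descriptions above, not the library's code. The function names are hypothetical, and read_index is assumed to be a list of indices.

```python
def expected_num_entries(num_total, read_range=None, read_index=None):
    """Number of entries in the output (read_index takes precedence)."""
    if read_index is not None:
        return len(read_index)
    if read_range is not None:
        return read_range[1] - read_range[0]
    return num_total


def expected_num_elem(num_total, read_range=None, read_index=None):
    """Number of entries read from disk, including the contiguous
    block that is implied by a list-valued read_index."""
    if read_index is not None:
        return max(read_index) - min(read_index) + 1
    if read_range is not None:
        return read_range[1] - read_range[0]
    return num_total


print(expected_num_entries(1000, read_index=[56, 89]))  # 2
print(expected_num_elem(1000, read_index=[56, 89]))     # 34
print(expected_num_entries(1000, read_range=(10, 50)))  # 40
```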

num_files

Number of files in the catalogue.

read_data(dataset_name, verbose=None, units=None, store=False, trial=False, data_type=None)

Read a specified data set from the file collection.

Parameters:
  • dataset_name (str) – The name of the data set to read, including possibly containing groups, but not the main group specified in the instantiation.
  • units (str or None, optional) – Convert to other unit system (default: class init value). Capitalization is ignored for these names.
  • verbose (int, optional) – Provide more or less useful messages during reading. Default: 1 (minimal)
  • store (str or None or False, optional) – Store the retrieved array as an attribute with this name. If None, the (full) name of the data set is used, with ‘/’ replaced by ‘__’. Default: False.
  • trial (bool, optional) – If True, attempt to read the data set and return None if it does not yield the expected number of elements for any one file or in total (or returns any elements in HAC mode). If False (default), enter debug mode in this case.
  • data_type (dataType or None, optional) – Read the data into an array of this data type. If None (default), determine this from the HDF5 data set.
Returns:

data (np.array) – Array containing the specified data.

redshift

Redshift of the data set.

subfind_file

Subfind catalogue file associated with a snapshot.

This must be set explicitly by the user. It can point to the output’s own subfind file (if it exists), but does not need to: in the latter case, it allows matching particles to structures at another point in time.

time

Age of the Universe at the time of the data set [Gyr].