HDF5 I/O

For convenience, the hydrangea.hdf5 module provides a number of tools to simplify reading and writing HDF5 files. All of them are relatively simple wrappers around the h5py library. They can be used to read and write datasets and attributes, or to inspect the contents of an HDF5 file.

Note

The functions in this module are provided for coding convenience. In situations involving heavy HDF5 I/O, better performance may be obtained by using the h5py library directly.

Examples

Write a numpy array to an HDF5 dataset named 'Random' in a file data.hdf5 (including a comment describing what the dataset contains):

import numpy as np
import hydrangea.hdf5

random_data = np.random.random(100)  # Generate some data
hydrangea.hdf5.write_data('data.hdf5', 'Random', random_data,
                          comment="100 randomly generated data points")

Read the data and comment back in:

data_in = hydrangea.hdf5.read_data('data.hdf5', 'Random')
# The comment is stored as the attribute 'Comment' on the data set 'Random'
data_comment = hydrangea.hdf5.read_attribute('data.hdf5', 'Random', 'Comment')

Alternatively, only read data point 76:

data_76 = hydrangea.hdf5.read_data('data.hdf5', 'Random', read_index=76)
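
For comparison, and in line with the note above on heavy I/O, the same round trip can be sketched with h5py directly. This is an illustration rather than the module's implementation; the temporary directory is an assumption made only to keep the example self-contained.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "data.hdf5")

random_data = np.random.random(100)

# Write the array and attach a 'Comment' attribute to the dataset.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("Random", data=random_data)
    dset.attrs["Comment"] = "100 randomly generated data points"

# Read everything back, then only element 76.
with h5py.File(path, "r") as f:
    data_in = f["Random"][...]
    data_comment = f["Random"].attrs["Comment"]
    data_76 = f["Random"][76]
```

Keeping the file open across several reads, as in the second block, is the main saving over calling a convenience wrapper once per dataset.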

Use cases more directly related to the Hydrangea/C-EAGLE simulations are provided by the demonstration scripts in the examples directory (see the “Basic examples” section).

Reference

HDF5 data handling

HDF5 attribute handling

Inspect HDF5 structure

Convenience routines for reading and writing data in HDF5 format.

hydrangea.hdf5.attrs_to_dict(file_name, container)

Read all attributes from a specified file and container into a dict.

Parameters:
  • file_name (string) – The HDF5 file to read the attributes from.
  • container (string) – The name of the container (data set or group) whose attributes are read, including any containing groups (e.g. ‘nested/group’).
Returns:

dict – Dictionary of ‘Attribute name’ : ‘Attribute value’

Note

An exception is raised if the container does not exist. If there are no attributes attached to the container, an empty dict is returned.
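
A minimal sketch of how such a lookup could be done with h5py alone is shown here. The helper name attrs_as_dict, the attribute names, and the in-memory file are assumptions for illustration and not part of hydrangea; unlike the library function, the helper takes an already open file.

```python
import h5py

def attrs_as_dict(h5file, container):
    """Hypothetical helper: all attributes of a group or dataset as a dict."""
    # Indexing a missing container raises KeyError, so errors are not hidden.
    return dict(h5file[container].attrs.items())

# In-memory file (core driver, no backing store): nothing is written to disk.
with h5py.File("sketch_attrs.hdf5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("nested/group")
    grp.attrs["Redshift"] = 0.1      # made-up attribute names and values
    grp.attrs["NumFiles"] = 16
    f.create_group("empty")

    attrs = attrs_as_dict(f, "nested/group")
    no_attrs = attrs_as_dict(f, "empty")
```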

hydrangea.hdf5.list_datasets(file_name, group=None)

Return a list of all datasets in an HDF5 group.

Parameters:
  • file_name (string) – The path to the HDF5 file whose data sets should be listed.
  • group (string, optional) – The group in which to look for data sets (default: None, i.e. search in the root group of the HDF5 file).
Returns:

list of strings – The names of all data sets in the target group.

Note

An exception is raised if the file could not be opened.
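
The listing itself can be sketched in h5py as below. It assumes that only direct members of the group are listed (no recursion into subgroups), which the description above does not state explicitly. The helper dataset_names and the dataset names are hypothetical.

```python
import h5py
import numpy as np

def dataset_names(h5file, group=None):
    """Hypothetical helper: names of datasets directly inside `group` (root if None)."""
    target = h5file if group is None else h5file[group]
    return [name for name, item in target.items()
            if isinstance(item, h5py.Dataset)]

# In-memory file (core driver, no backing store): nothing is written to disk.
with h5py.File("sketch_list.hdf5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("Masses", data=np.arange(3))
    f.create_dataset("Subhalo/Positions", data=np.zeros((3, 3)))

    root_sets = dataset_names(f)            # the group 'Subhalo' is skipped
    sub_sets = dataset_names(f, "Subhalo")
```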

hydrangea.hdf5.read_attribute(file_name, container, att_name, default=None, require=False, convert_string=True)

Read an HDF5 attribute from a specified file and container.

Parameters:
  • file_name (string) – The HDF5 file to read the attribute from.
  • container (string) – The name of the container (data set or group) containing the attribute, including any containing groups (e.g. ‘nested/group’).
  • att_name (string) – The name of the attribute to read from the specified container.
  • default (scalar, np.array, or string) – A default value to return if the attribute (or its container, or the file) does not exist. Default: None
  • require (bool) – Raise an exception if the attribute does not exist. If False (default), the default value is returned instead.
  • convert_string (bool) – Convert an attribute of type np.string_ to a standard string. Default: True
Returns:

attribute (scalar, np.array, or string) – The attribute value read from the HDF5 file.

hydrangea.hdf5.read_data(file_name, container, read_range=None, read_index=None, index_dim=0, require=False)

Read one dataset from an HDF5 file.

Optionally, only a section of the dataset may be read, as specified by the read_range, read_index, and index_dim parameters.

Parameters:
  • file_name (string) – The HDF5 file to read the dataset from.
  • container (string) – The name of the dataset, including any containing groups (e.g. ‘well/nested/group/data’).
  • read_range ((int, int) or None, optional) – Read only elements from the first up to but excluding the second entry in the tuple (in dimension index_dim). If None (default), read the entire dataset. Ignored if read_index is provided.
  • read_index (int or np.array(int) or None, optional) – Read only the specified element(s), in dimension index_dim. If int, a single element is read, and the first dimension is truncated. If an array is provided, the elements between the lowest and highest index are read and the output is then masked to the exact elements. If None (default), everything is read.
  • index_dim (int, optional) – Dimension over which to apply the read cuts specified by read_range or read_index (default: 0).
  • require (bool, optional) – Require the presence of the data set, and raise an exception if it does not exist. If False (default), return None in this case.
Returns:

np.array (type and dimensions as dataset).
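
The array form of read_index described above (read the contiguous block between the lowest and highest index, then select the exact elements in memory) can be sketched with h5py as follows. The helper read_elements is hypothetical, covers only dimension 0, and is not the library's code.

```python
import h5py
import numpy as np

def read_elements(dset, index):
    """Hypothetical helper: read the entries of `dset` listed in `index` (dimension 0)."""
    index = np.asarray(index)
    lo, hi = int(index.min()), int(index.max())
    block = dset[lo:hi + 1]       # one contiguous read from the file
    return block[index - lo]      # select the requested elements in memory

# In-memory file (core driver, no backing store): nothing is written to disk.
with h5py.File("sketch_read.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("Random", data=np.arange(100) * 0.5)

    picked = read_elements(dset, [76, 3, 40])   # order of `index` is preserved
    ranged = dset[10:13]                        # analogue of read_range=(10, 13)
```

A single contiguous read is usually cheap when the requested indices are clustered, but it loads every element between the lowest and highest index, so widely scattered indices can pull in far more data than is kept.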

hydrangea.hdf5.test_attribute(file_name, container, attribute)

Test whether an HDF5 attribute exists.

Parameters:
  • file_name (string) – The path to the HDF5 file to test.
  • container (string) – The container (group or dataset) in which the attribute’s presence is tested (including possible containing groups).
  • attribute (string) – The name of the attribute to test.
Returns:

bool – True if the attribute exists, False if it does not. If the container or file does not exist, False is also returned.

hydrangea.hdf5.test_dataset(file_name, dataset)

Test whether a specified dataset exists in an HDF5 file.

Parameters:
  • file_name (string) – The path to the HDF5 file to test.
  • dataset (string) – The data set to test, including possible containing groups (e.g. ‘well/nested/group/data’)
Returns:

bool – True if file_name is an HDF5 file and contains the specified data set. False otherwise.

hydrangea.hdf5.test_group(file_name, group)

Test whether a specified group exists in an HDF5 file.

Parameters:
  • file_name (string) – The path to the HDF5 file to test.
  • group (string) – The group to test, including possible containing groups (e.g. ‘well/nested/group/’)
Returns:

bool – True if file_name is an HDF5 file and contains the specified group. False otherwise.
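
The three existence tests above map onto simple membership checks in h5py. The sketch below works on an already open file, whereas the library functions take a file name and also return False for a missing or non-HDF5 file. All helper names are hypothetical.

```python
import h5py
import numpy as np

def has_dataset(h5file, name):
    """True if `name` exists in the file and is a dataset."""
    return name in h5file and isinstance(h5file[name], h5py.Dataset)

def has_group(h5file, name):
    """True if `name` exists in the file and is a group."""
    return name in h5file and isinstance(h5file[name], h5py.Group)

def has_attribute(h5file, container, attribute):
    """True if `container` exists and carries an attribute called `attribute`."""
    return container in h5file and attribute in h5file[container].attrs

# In-memory file (core driver, no backing store): nothing is written to disk.
with h5py.File("sketch_test.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("well/nested/group/data", data=np.zeros(4))
    dset.attrs["Comment"] = "four zeros"

    found_dataset = has_dataset(f, "well/nested/group/data")
    group_as_dataset = has_dataset(f, "well/nested/group")
    found_group = has_group(f, "well/nested/group")
    found_attr = has_attribute(f, "well/nested/group/data", "Comment")
    missing_attr = has_attribute(f, "no/such/container", "Comment")
```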

hydrangea.hdf5.write_attribute(file_name, container, att_name, value, new=False, group=True, update=True)

Write a variable to an HDF5 attribute.

Parameters:
  • file_name (string) – The HDF5 file to write the attribute to. If it does not already exist, a new file is created.
  • container (string) – The name of the container (data set or group) to attach the attribute to, including any containing groups (e.g. ‘/well/nested/group/’).
  • att_name (string) – The name of the attribute to write to.
  • value (scalar, np.array, or string) – The variable to write as HDF5 attribute.
  • new (bool, optional) – If a file with the specified file_name already exists, rename it to ‘file_name.old’ and start a new file containing only an (empty) container with this attribute. Default: False.
  • group (bool, optional) – If the specified container does not exist, create it as a group (if True, default) or (empty) data set (False).
  • update (bool, optional) – First check whether the attribute already exists, and update it if so (default). If False, a pre-existing attribute with the same name is not altered, and the input value is not added to the file.

Note

If the specified variable is a string, it is first converted to type np.string_, in order to be acceptable as HDF5 attribute.
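
The string handling can be illustrated with h5py as below. The sketch uses np.bytes_, the type that np.string_ was an alias for before the alias was removed in NumPy 2.0. It shows the general technique of storing a byte string and decoding it on read (compare convert_string in read_attribute), not the module's exact code.

```python
import h5py
import numpy as np

comment = "Written by a sketch"

# In-memory file (core driver, no backing store): nothing is written to disk.
with h5py.File("sketch_string.hdf5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("well/nested/group")

    # Store the text as a fixed-length byte string.
    grp.attrs["Comment"] = np.bytes_(comment.encode("ascii"))

    raw = grp.attrs["Comment"]     # comes back as a bytes-like NumPy scalar
    text = raw.decode("ascii")     # convert to a standard Python string
```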

hydrangea.hdf5.write_data(file_name, container, array, new=False, comment=None, update=True, replace=False, compression=None)

Write a numpy array to an HDF5 dataset.

If the specified file or data set already exists, the data is by default added to it, modifying pre-existing content where required.

Parameters:
  • file_name (string) – The HDF5 file to write the data set to. If it does not already exist, a new file is created.
  • container (string) – The name of the data set to write, including any containing groups (e.g. ‘/well/nested/group/data’).
  • array (np.array) – The numpy array to write to an HDF5 file (can be of any data type).
  • new (bool, optional) – If a file with the specified file_name already exists, rename it to ‘file_name.old’ and start a new file. Default: False.
  • comment (string, optional) – Comment text to describe the content of the data set; written as an HDF5 attribute ‘Comment’ to the data set. If None (default), no comment is written.
  • update (bool, optional) – Check whether the data set already exists, and update it if so (default: True). If False, a pre-existing data set is left intact, and array is not written to the HDF5 file.
  • replace (bool, optional) – If a data set is updated, always delete the old one first. If False (default), in-place update is attempted (as long as the old and new data sets are of the same shape). This parameter is only meaningful if update is True.
  • compression (string, optional) – Compression to be applied to a newly created data set. Options are ‘gzip’ or ‘lzf’ (default: None, no compression applied).
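
The interplay of update, replace, and compression can be sketched with h5py as follows. The helper write_array is hypothetical and only mirrors the behaviour described in the parameter list above; the actual implementation may differ in detail.

```python
import h5py
import numpy as np

def write_array(h5file, name, array, update=True, replace=False, compression=None):
    """Hypothetical helper mirroring the update/replace options described above."""
    if name in h5file:
        if not update:
            return                        # leave the existing dataset untouched
        if not replace and h5file[name].shape == array.shape:
            h5file[name][...] = array     # in-place update, same shape
            return
        del h5file[name]                  # shape differs, or replace requested
    h5file.create_dataset(name, data=array, compression=compression)

# In-memory file (core driver, no backing store): nothing is written to disk.
with h5py.File("sketch_write.hdf5", "w", driver="core", backing_store=False) as f:
    write_array(f, "group/data", np.zeros(5))
    write_array(f, "group/data", np.ones(5))                    # updated in place
    after_update = f["group/data"][...]

    write_array(f, "group/data", np.full(5, 2.0), update=False)  # skipped
    after_skip = f["group/data"][...]

    write_array(f, "group/data", np.arange(8.0), compression="gzip")  # new shape
    after_replace = f["group/data"][...]
    compression_used = f["group/data"].compression
```

In this sketch, compression only takes effect when a dataset is created, which matches the description of the compression parameter as applying to newly created data sets.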