Converting Between HDF4 and HDF5

Convert PSI-style HDF files between HDF4 (.hdf) and HDF5 (.h5) formats.

This example demonstrates two conversion routines:

  • convert() – a general-purpose bidirectional converter that copies all datasets and their attributes (as far as the target format allows) while keeping the original dataset names intact.

  • convert_psih4_to_psih5() – a PSI-convention-aware converter that additionally remaps the standard HDF4 primary dataset name ('Data-Set-2') to its HDF5 equivalent ('Data').

import tempfile
from pathlib import Path
from psi_io import convert, convert_psih4_to_psih5, read_hdf_meta, data

Fetch a real PSI HDF4 data file (the radial magnetic field cube) to use as the conversion source:

br_hdf4_filepath = data.get_3d_data(hdf=".hdf")
print(f"Source file : {Path(br_hdf4_filepath).name}")
Source file : br.hdf

Inspect the HDF4 metadata. Note the PSI-standard dataset name 'Data-Set-2' and scale names 'fakeDim0', 'fakeDim1', 'fakeDim2':

source_meta = read_hdf_meta(br_hdf4_filepath)
[HDF4 source]  dataset='Data-Set-2'  shape=(181, 100, 151)  dtype=float32
    scale='fakeDim0'  shape=(181,)  range=[0.0000, 6.2832]
    scale='fakeDim1'  shape=(100,)  range=[0.0000, 3.1416]
    scale='fakeDim2'  shape=(151,)  range=[0.9996, 30.5116]
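The summary lines above come from a printing helper that this example does not show. As a hedged illustration only, the sketch below reproduces their format from plain Python values. `format_dataset_line` and `format_scale_line` are hypothetical names, not part of psi-io, and the sketch deliberately assumes nothing about the structure that read_hdf_meta() returns.

```python
import math

# Hypothetical helpers (not part of psi-io): they only mimic the format of the
# summary lines printed in this example, starting from plain Python values.


def format_dataset_line(label, name, shape, dtype):
    """Format the header line describing the primary dataset."""
    return f"[{label}]  dataset={name!r}  shape={tuple(shape)}  dtype={dtype}"


def format_scale_line(name, values):
    """Format one scale line: its name, 1-D shape and [min, max] range."""
    lo, hi = min(values), max(values)
    return f"    scale={name!r}  shape=({len(values)},)  range=[{lo:.4f}, {hi:.4f}]"


# Synthetic longitude-like scale: 181 evenly spaced points over [0, 2*pi].
n = 181
phi = [2 * math.pi * i / (n - 1) for i in range(n)]

print(format_dataset_line("HDF4 source", "Data-Set-2", (181, 100, 151), "float32"))
print(format_scale_line("fakeDim0", phi))
```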

Generic conversion with convert()

The generic converter reads every non-scale dataset in the source file and writes it to the output file under the same name. For a PSI HDF4 file, the primary dataset therefore keeps the name 'Data-Set-2' in the resulting HDF5 file. The associated scale datasets and attributes are carried over as well, although the scales appear under HDF5-style names ('dim1' to 'dim3'):

[convert() → HDF5]  dataset='Data-Set-2'  shape=(181, 100, 151)  dtype=float32
    scale='dim1'  shape=(151,)  range=[0.9996, 30.5116]
    scale='dim2'  shape=(100,)  range=[0.0000, 3.1416]
    scale='dim3'  shape=(181,)  range=[0.0000, 6.2832]

Note

Because convert() preserves dataset names verbatim, the resulting HDF5 file has a 'Data-Set-2' dataset rather than the 'Data' dataset expected by read_hdf_data() and other psi-io reading routines by default. Use convert_psih4_to_psih5() when PSI-convention HDF5 naming is required.

PSI-convention conversion with convert_psih4_to_psih5()

This converter is designed specifically for PSI-style HDF4 files. It reads the 'Data-Set-2' dataset and writes it as 'Data' in the output HDF5 file, matching the name that psi-io reading routines expect by default. As with convert(), the 'fakeDimN' scales are written as 'dimN' scales numbered in reverse order (for example, the 181-point 'fakeDim0' scale becomes 'dim3'):

with tempfile.TemporaryDirectory() as tmpdir:
    out_psi = Path(tmpdir) / "br_psi.h5"
    convert_psih4_to_psih5(br_hdf4_filepath, out_psi)

    psi_meta = read_hdf_meta(out_psi)
[convert_psih4_to_psih5() → HDF5]  dataset='Data'  shape=(181, 100, 151)  dtype=float32
    scale='dim1'  shape=(151,)  range=[0.9996, 30.5116]
    scale='dim2'  shape=(100,)  range=[0.0000, 3.1416]
    scale='dim3'  shape=(181,)  range=[0.0000, 6.2832]
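Taken together, the two HDF5 outputs suggest a simple renaming rule. The minimal sketch below models that rule, inferred solely from the metadata printed in this example. convert_psih4_to_psih5() renames 'Data-Set-2' to 'Data' and convert() keeps it, while in both outputs 'fakeDim{i}' appears as 'dim{ndim - i}'. The function `psi_h4_to_h5_names` is hypothetical and is not psi-io's implementation. It only manipulates name strings.

```python
import re

# Hypothetical helper inferred from the metadata printed in this example;
# it is NOT psi-io's implementation and only models the observed names.
_FAKE_DIM = re.compile(r"fakeDim(\d+)$")


def psi_h4_to_h5_names(dataset, scales, psi_convention=True):
    """Return (new_dataset_name, {old_scale_name: new_scale_name}).

    psi_convention=True mimics convert_psih4_to_psih5(), which renames
    'Data-Set-2' to 'Data'; False mimics convert(), which keeps the name.
    In both printed outputs the HDF4 scale 'fakeDim{i}' appears as
    'dim{ndim - i}', i.e. renumbered from 1 in reverse order.
    """
    ndim = len(scales)
    if psi_convention and dataset == "Data-Set-2":
        new_dataset = "Data"
    else:
        new_dataset = dataset
    mapping = {}
    for name in scales:
        match = _FAKE_DIM.match(name)
        if match is None:
            mapping[name] = name  # leave non-PSI scale names untouched
        else:
            mapping[name] = f"dim{ndim - int(match.group(1))}"
    return new_dataset, mapping


scales = ["fakeDim0", "fakeDim1", "fakeDim2"]
print(psi_h4_to_h5_names("Data-Set-2", scales))                        # PSI-convention naming
print(psi_h4_to_h5_names("Data-Set-2", scales, psi_convention=False))  # generic naming
```

Keeping the dataset rename behind a flag mirrors the only difference visible between the two outputs above. The scale renaming is applied unconditionally because both outputs show it.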

HDF5 → HDF4 conversion

convert() is bidirectional: passing an HDF5 file as input and an HDF4 path as output performs the reverse conversion. When ofile is omitted, the output file is placed alongside the input file with its extension swapped. Because convert() preserves dataset names, the primary dataset keeps the name 'Data' in the HDF4 output:

Note

Here the strict parameter is set to False. HDF4 has a more restrictive attribute type system than HDF5. HDF5 datasets are typically written with DIMENSION_LIST (object references to the attached dimension-scale datasets) and DIMENSION_LABELS (per-dimension labels) attributes, which the HDF4 output cannot represent; both are skipped in the example below.

It is therefore generally advisable to set strict=False when converting from HDF5 to HDF4. With strict=False, unsupported attributes are skipped with a warning instead of causing the entire conversion to fail.

with tempfile.TemporaryDirectory() as tmpdir:
    br_h5_filepath = data.get_3d_data(hdf=".h5")
    out_hdf = Path(tmpdir) / "br_back.hdf"
    convert(br_h5_filepath, out_hdf, strict=False)

    back_meta = read_hdf_meta(out_hdf)
Warning: Failed to set attribute 'DIMENSION_LABELS' on dataset 'Data'; skipping.
Warning: Failed to set attribute 'DIMENSION_LIST' on dataset 'Data'; skipping.
[convert() → HDF4]  dataset='Data'  shape=(181, 100, 151)  dtype=float32
    scale='fakeDim0'  shape=(181,)  range=[0.0000, 6.2832]
    scale='fakeDim1'  shape=(100,)  range=[0.0000, 3.1416]
    scale='fakeDim2'  shape=(151,)  range=[0.9996, 30.5116]
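The reverse conversion relies on two behaviors described in this section: the default output path when ofile is omitted, and the skip-and-warn handling of attributes under strict=False. The sketch below models both in plain Python under stated assumptions. The helper names are hypothetical, only the .h5/.hdf suffix pair used in this example is handled, and a toy setter that rejects non-scalar values stands in for a real HDF4 writer. It is not how psi-io implements convert().

```python
import warnings
from pathlib import Path

# Hypothetical helpers sketching behaviors described in this section; they are
# not psi-io's implementation. The suffix pair is assumed from this example.
_SWAP = {".h5": ".hdf", ".hdf": ".h5"}


def default_output_path(ifile):
    """Place the output next to the input, with the HDF4/HDF5 extension swapped."""
    path = Path(ifile)
    try:
        return path.with_suffix(_SWAP[path.suffix.lower()])
    except KeyError:
        raise ValueError(f"Unrecognized HDF extension: {path.suffix!r}") from None


def copy_attributes(attrs, set_attr, dataset, strict=True):
    """Copy attrs via set_attr(name, value) and return the names that were skipped.

    With strict=False, an attribute the target format rejects is skipped with
    a warning instead of aborting the whole conversion.
    """
    skipped = []
    for name, value in attrs.items():
        try:
            set_attr(name, value)
        except (TypeError, ValueError):
            if strict:
                raise
            warnings.warn(f"Failed to set attribute {name!r} on dataset {dataset!r}; skipping.")
            skipped.append(name)
    return skipped


def toy_hdf4_setter(store):
    """Toy stand-in for an HDF4 writer: accepts only str, int and float values."""
    def set_attr(name, value):
        if not isinstance(value, (str, int, float)):
            raise TypeError(f"unsupported attribute type: {type(value).__name__}")
        store[name] = value
    return set_attr


# Usage: object() instances stand in for HDF5 object references.
written = {}
attrs = {"note": "toy attribute", "DIMENSION_LIST": [object(), object(), object()]}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    skipped = copy_attributes(attrs, toy_hdf4_setter(written), "Data", strict=False)

print(default_output_path("br.h5"))  # -> br.hdf
print(skipped, written)
```

In this sketch strict=True is the default, so attributes are dropped only when the caller opts in. Whether psi-io's convert() uses the same default is not stated in this example.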
