Modern visualization formats in TNL-LBM

Jakub Klinkovský

Czech Technical University in Prague
Faculty of Nuclear Sciences and Physical Engineering
Department of Software Engineering

LBM in Krakow 2025
February 7 2025

Outline

  1. Motivation: from VTK to ADIOS2
  2. Current approaches in TNL-LBM
  3. Challenges and results
  4. Conclusion

Tools for scientific visualization

Note: I will talk about volumetric data visualization, not about plotting graphs etc.

  • State-of-the-art library VTK (Visualization Toolkit), developed by Kitware

  • Derived applications: ParaView (Kitware), VisIt, etc.

  • Most tools follow the simple post-processing scheme:

    • computation → snapshots on disk → visualization
  • An alternative scheme is co-processing (in-situ visualization)

    • rendering during computation, reduced snapshots on disk
    • available with e.g. ParaView Catalyst
  • Recent push towards web applications (VTK.js, ParaViewWeb, Trame, etc.)

Overview of file formats

ParaView can read at least 73 file formats:

  • some are general data formats (CSV, NetCDF, XDMF, VTK, etc.)

  • some are specific to certain software tools (OpenFOAM, Fluent, MFIX, etc.)

ParaView can export data in several formats:

  • 7 general data formats (so ParaView is only a limited file conversion tool)

  • common multimedia files (PNG, JPEG, AVI, OGV)

Legacy VTK file format

  • First attempt at a unified file format for visualization
  • Supports multiple datasets: STRUCTURED_POINTS, STRUCTURED_GRID, RECTILINEAR_GRID, UNSTRUCTURED_GRID, POLYDATA
  • Supports files with ASCII and binary data encoding (not portable)
  • Parsing the data is hard and not robust
# vtk DataFile Version 2.0
Presentation Demo
ASCII
DATASET POLYDATA
POINTS 8 float
0.0 0.0 0.0 ... ... ...
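For illustration, here is a minimal sketch of how such a file can be produced for a scalar field on a regular grid (dataset STRUCTURED_POINTS, ASCII encoding). The helper `legacy_vtk_structured_points` is hypothetical and is not part of VTK or TNL-LBM. Every value is printed as text, which is one reason why large legacy files are slow to write and to parse.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of VTK or TNL-LBM): serializes a scalar point
// field on an nx x ny x nz grid as an ASCII legacy VTK file. The values must be
// ordered with x varying fastest, then y, then z.
std::string legacy_vtk_structured_points(std::size_t nx, std::size_t ny, std::size_t nz,
                                         const std::vector<float>& density)
{
    assert(density.size() == nx * ny * nz);
    std::ostringstream out;
    out << "# vtk DataFile Version 2.0\n"
        << "Presentation Demo\n"  // free-form title line
        << "ASCII\n"
        << "DATASET STRUCTURED_POINTS\n"
        << "DIMENSIONS " << nx << ' ' << ny << ' ' << nz << '\n'
        << "ORIGIN 0 0 0\n"
        << "SPACING 1 1 1\n"
        << "POINT_DATA " << density.size() << '\n'
        << "SCALARS density float 1\n"
        << "LOOKUP_TABLE default\n";
    // every value is converted to text; a reader has to parse it back
    for (float v : density)
        out << v << '\n';
    return out.str();
}
```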

XML VTK file format

  • More complex, but more flexible than the legacy file format
  • Supports multiple datasets: ImageData, StructuredGrid, RectilinearGrid,
    UnstructuredGrid, PolyData
  • Data arrays are encoded in one of three formats: ASCII, binary (base64 encoding), or appended (one global XML element for all data arrays)
  • Supports random access, parallel I/O, and portable data compression
<?xml version="1.0"?>
<VTKFile type="ImageData" version="1.0" byte_order="LittleEndian" header_type="UInt32">
  <ImageData WholeExtent="0 2 0 3 0 4" Origin="0 0 0" Spacing="1 1 1 ">
    <Piece Extent="0 2 0 3 0 4 ">
      <PointData>...</PointData>
      <CellData>...</CellData>
    </Piece>
  </ImageData>
</VTKFile>
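The extents in this format are inclusive ranges of point indices, "x0 x1 y0 y1 z0 z1". The short sketch below (hypothetical helper `extent_size`) shows how an extent translates into the number of PointData and CellData values; for the extent "0 2 0 3 0 4" above, it gives 3 × 4 × 5 = 60 points and 2 × 3 × 4 = 24 cells.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

struct ExtentSize {
    std::size_t points;  // number of PointData values
    std::size_t cells;   // number of CellData values
};

// Hypothetical helper: parses a 3D VTK extent "x0 x1 y0 y1 z0 z1" (inclusive
// point indices). Each axis has x1 - x0 + 1 points and x1 - x0 cells; this
// sketch assumes x1 > x0 on every axis (no flat dimensions).
ExtentSize extent_size(const std::string& extent)
{
    std::istringstream in(extent);
    std::array<long, 6> e{};
    for (long& v : e)
        in >> v;
    assert(!in.fail());
    ExtentSize s{1, 1};
    for (int axis = 0; axis < 3; ++axis) {
        const long lo = e[2 * axis];
        const long hi = e[2 * axis + 1];
        assert(hi > lo);
        s.points *= static_cast<std::size_t>(hi - lo + 1);
        s.cells *= static_cast<std::size_t>(hi - lo);
    }
    return s;
}
```

By the same rule, a CellData array with N cells along an axis needs the extent "0 N" along that axis.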

Problems of VTK file formats

  • There is no complete formal specification, only a reference implementation
  • The reference implementation is too large – VTK is a rendering library, not an I/O library
  • Low performance (everything must be converted/parsed)
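The XML "binary" encoding is base64, which maps every 3 bytes to 4 ASCII characters. The minimal encoder below uses the standard RFC 4648 alphabet and is written only for illustration; it shows that an array of n bytes grows to 4⌈n/3⌉ characters (about 33 % more), and that every byte has to pass through an encoder on writing and a decoder on reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal base64 encoder (RFC 4648 alphabet with '=' padding), used here only
// to illustrate the size overhead of base64-encoded data arrays.
std::string base64_encode(const std::vector<std::uint8_t>& bytes)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // every full group of 3 input bytes becomes 4 output characters
    for (; i + 2 < bytes.size(); i += 3) {
        const std::uint32_t n = (std::uint32_t(bytes[i]) << 16) |
                                (std::uint32_t(bytes[i + 1]) << 8) | bytes[i + 2];
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }
    // the remaining 1 or 2 bytes are padded with '='
    const std::size_t rest = bytes.size() - i;
    if (rest > 0) {
        std::uint32_t n = std::uint32_t(bytes[i]) << 16;
        if (rest == 2)
            n |= std::uint32_t(bytes[i + 1]) << 8;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += rest == 2 ? table[(n >> 6) & 63] : '=';
        out += '=';
    }
    assert(out.size() == (bytes.size() + 2) / 3 * 4);
    return out;
}
```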

ADIOS2 library

ADIOS2 (Adaptable Input Output System 2) is:

  • A unified high-performance I/O framework
    (used in supercomputer applications that write and read petabytes of data)
  • MPI-based – provides parallel I/O, but can be used in serial applications too
  • Streaming-oriented (asynchronous data transfers) and step-based

ADIOS2 is not:

  • A file-only I/O library – it can be used for general data transfers (TCP, RDMA, MPI)
  • A hierarchical model – ADIOS2 does not enforce data models; they can be built on top of it

Feature comparison: VTK vs ADIOS2

Feature                      Legacy VTK   XML VTK                ADIOS2
Binary output format         yes          yes (base64)           yes
Compression                  no           yes (lossless)         yes (lossy or plugin)
Parallel I/O                 no           yes (separate files)   yes
High-performance I/O         no           no                     yes
Integration with ParaView    yes          yes                    yes (explained later)

Hence, we wanted to replace Legacy VTK with ADIOS2 in the TNL-LBM project.

ADIOS2 engines

  • There are virtual engines and concrete engines
  • Virtual engines select an engine and its parameters for a specific purpose:
    • File, FileStream, InSituAnalysis, InSituVisualization,
      and CodeCoupling
  • Engines provide interaction with system I/O resources to handle data transfers:
    • BP3, BP4, BP5 – storage in native binary pack file formats (.bp)
    • HDF5 – storage in the Hierarchical Data Format (.h5)
    • SST (Sustainable Staging Transport), SSC (Strong Staging Coupler), DataMan, etc. – high-performance engines for network data transfer

How to write .bp files with ADIOS2

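#include <mpi.h>
#include <adios2.h>

#include <string>

int main(int argc, char *argv[]) {
  ...

  // every rank creates the ADIOS object with the same MPI communicator
  adios2::ADIOS adios(MPI_COMM_WORLD);
  adios2::IO io = adios.DeclareIO("bp");
  io.SetEngine("BP4");

  // global array size (slowest index first), offset and size of the local subdomain
  adios2::Dims shape({global_Nz, global_Ny, global_Nx});
  adios2::Dims start({offset_z, offset_y, offset_x});
  adios2::Dims count({local_Nz, local_Ny, local_Nx});
  auto density_var = io.DefineVariable<double>("density", shape, start, count);

  const std::string filename = "test-output.bp";
  adios2::Engine engine = io.Open(filename, adios2::Mode::Write);

  // one step groups all data written for one snapshot
  engine.BeginStep();
    // Put is deferred by default: the data is copied at EndStep,
    // so the buffer must stay valid until then
    engine.Put(density_var, density_data_ptr);
    // more stuff can be written here...
  engine.EndStep();
  engine.Close();
}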
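The snippet assumes that shape, start and count are already known on each rank. A minimal sketch of how they could be computed for a simple one-dimensional decomposition along z follows; the helper `decompose_z` is hypothetical, and `Dims` mirrors `adios2::Dims` (a `std::vector<std::size_t>`) so that the sketch compiles without ADIOS2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for adios2::Dims, which is a std::vector<std::size_t>.
using Dims = std::vector<std::size_t>;

struct Block {
    Dims shape;  // global array size, slowest index (z) first
    Dims start;  // offset of this rank's subdomain in the global array
    Dims count;  // size of this rank's subdomain
};

// Hypothetical 1D decomposition along z: the first (Nz % nranks) ranks get one
// extra layer, so every layer is assigned to exactly one rank.
Block decompose_z(std::size_t Nz, std::size_t Ny, std::size_t Nx,
                  std::size_t nranks, std::size_t rank)
{
    assert(nranks > 0 && rank < nranks);
    const std::size_t base = Nz / nranks;
    const std::size_t extra = Nz % nranks;
    const std::size_t local_Nz = base + (rank < extra ? 1 : 0);
    const std::size_t offset_z = rank * base + (rank < extra ? rank : extra);
    return Block{{Nz, Ny, Nx}, {offset_z, 0, 0}, {local_Nz, Ny, Nx}};
}
```

Each rank would then pass `b.shape`, `b.start` and `b.count` to `io.DefineVariable<double>("density", ...)`; the decomposition actually used in TNL-LBM may differ.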

What does it look like?

The .bp format creates a directory rather than a single file:

$ tree test-output.bp/
test-output.bp/
├── data.0
├── md.0
├── md.idx
└── profiling.json

1 directory, 4 files

We can examine the binary file with bpls:

$ bpls --dump --long -av test-output.bp/
File info:
  of variables:  1
  of attributes: 1
  statistics:    Min / Max 

  double   density  2*{10, 20, 30} = -0.99963 / 0.998827
    (0,0, 0, 0)    0.978451 -0.283784 -0.672374 0.75637 -0.725867 -0.945123
    (0,0, 0, 6)    0.316181 -0.592851 0.497916 0.425028 0.715004 0.189573
    (0,0, 0,12)    0.454404 0.987235 0.619286 0.476911 0.789413 -0.0561796
    (0,0, 0,18)    0.231284 -0.890819 -0.25781 -0.286346 0.53493 -0.976957
    (0,0, 0,24)    -0.650337 -0.530909 -0.637601 0.257753 0.776025 -0.441885
    ...

The dimensions are reported as "number of steps" × N_z × N_y × N_x; here, 2 steps of a 10 × 20 × 30 array.
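Within one step, the array is stored in row-major (C) order with x varying fastest, which is also the order of the rows printed by bpls: the label (0,0, 0,12) is the step index followed by (z, y, x) of the first value in that row. The buffer passed to Put has to follow the same layout. A small sketch of the offset computation (hypothetical helper `linear_index`):

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (z, y, x) within one step of an Nz x Ny x Nx array stored
// in row-major (C) order: x varies fastest, z slowest.
inline std::size_t linear_index(std::size_t z, std::size_t y, std::size_t x,
                                std::size_t Nz, std::size_t Ny, std::size_t Nx)
{
    assert(z < Nz && y < Ny && x < Nx);
    return (z * Ny + y) * Nx + x;
}
```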

ADIOS2 integration in ParaView

There is a tutorial on how to visualize data from ADIOS2 .bp files in ParaView; following it, we attach a VTK XML schema to the output as a string attribute:

  // extents of the whole domain and of the local piece; for CellData,
  // an extent "0 N" along an axis corresponds to N cells
  const std::string extentG = fmt::format("0 {} 0 {} 0 {}", global_Nz, global_Ny, global_Nx);
  const std::string extentL = fmt::format("0 {} 0 {} 0 {}", local_Nz, local_Ny, local_Nx);
  const std::string origin = "0 0 0";
  const std::string spacing = fmt::format("{} {} {}", dz, dy, dx);
  // VTK XML schema that tells the ParaView reader how to map the ADIOS2 variables
  const std::string vtk_scheme = fmt::format(R"(
    <?xml version="1.0"?>
    <VTKFile type="ImageData" version="0.1" byte_order="LittleEndian">
      <ImageData WholeExtent="{0}" Origin="{1}" Spacing="{2}">
        <Piece Extent="{3}">
          <CellData>
            <DataArray Name="density"/>
          </CellData>
        </Piece>
      </ImageData>
    </VTKFile>)", extentG, origin, spacing, extentL);

  engine.BeginStep();
    engine.Put(density_var, density_data_ptr);
    // the schema is stored as a string attribute named "vtk.xml";
    // in a time loop, it needs to be defined only once
    io.DefineAttribute<std::string>("vtk.xml", vtk_scheme);
  engine.EndStep();

💣 ParaView plugins

The previous approach relies on the ADIOS2VTXReader plugin in ParaView:

  • to map the data arrays to visualization objects in ParaView, we need to define a textual attribute in the .bp file
  • we used ImageData, but a similar mapping can be done for UnstructuredGrid
  • the implementation for ImageData is very limited – e.g. vector fields do not work

There are some alternatives:

  • the ADIOS2CoreImageReader plugin should be able to introspect the dataset and automatically detect the correct mapping of data arrays for visualization... but we have not tried it yet 🚧
  • the FidesReader uses an alternative data-mapping schema, which is more complete and actively developed
    • either an external .json file or separate attributes in the ADIOS2 file
      (e.g. Fides_Data_Model, Fides_Origin, Fides_Spacing,
      Fides_Variable_List, etc.)
    • but there are still some limitations...

💣 FidesReader problem

When the simulation is run in a distributed manner, e.g. with mpirun -np 8, and the output is based on PointData, the Fides dataset contains "gaps" between the subdomains.
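A possible explanation, assuming that each rank writes only its own lattice points and that neighboring blocks do not share a layer of points: a block with n points along an axis spans only n − 1 cells, so the cell layer between two adjacent blocks belongs to no block. The hypothetical helper below counts these missing layers for a decomposition along one axis; with P blocks it always yields P − 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: the global grid has N points along the split axis
// and therefore N - 1 cell layers, but a block that holds n of those points
// (with no points shared with its neighbors) spans only n - 1 cell layers.
std::size_t missing_cell_layers(const std::vector<std::size_t>& points_per_block)
{
    assert(!points_per_block.empty());
    std::size_t total_points = 0;
    std::size_t covered_layers = 0;
    for (std::size_t n : points_per_block) {
        assert(n >= 1);
        total_points += n;
        covered_layers += n - 1;  // cells lying entirely inside this block
    }
    return (total_points - 1) - covered_layers;  // layers between blocks
}
```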

Other use-case for ADIOS2

  • ADIOS2 is not only for visualization – it can save and load arbitrary data
  • we have implemented a CheckpointManager class in TNL-LBM:
    • periodic snapshots of the whole simulation state based on walltime
    • simulations can start either from zero or the latest snapshot
    • hence, we can run "infinite" simulations on any cluster 😊
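A walltime-based trigger could look like the following sketch. The class `CheckpointTimer` and its parameters (snapshot period, job walltime limit, safety margin) are hypothetical and do not describe the interface of the TNL-LBM CheckpointManager; the margin reflects the assumption that a final snapshot should be written before the batch system terminates the job.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical walltime-based checkpoint trigger (not the TNL-LBM API).
class CheckpointTimer {
public:
    using Clock = std::chrono::steady_clock;

    // period: wall-clock time between snapshots
    // walltime_limit: total time granted to the job, measured from `start`
    // margin: time reserved for writing the final snapshot before the limit
    CheckpointTimer(Clock::duration period, Clock::duration walltime_limit,
                    Clock::duration margin, Clock::time_point start = Clock::now())
        : period_(period), deadline_(start + walltime_limit - margin), last_(start)
    {
        assert(period > Clock::duration::zero() && margin < walltime_limit);
    }

    // true when at least one period has elapsed since the last snapshot;
    // `now` is a parameter so that the logic can be tested without waiting
    bool due(Clock::time_point now = Clock::now()) const { return now - last_ >= period_; }

    // true when the job should write its final snapshot and stop
    bool near_limit(Clock::time_point now = Clock::now()) const { return now >= deadline_; }

    // call after a snapshot has been written
    void mark_saved(Clock::time_point now = Clock::now()) { last_ = now; }

private:
    Clock::duration period_;
    Clock::time_point deadline_;
    Clock::time_point last_;
};
```

The simulation loop would check `due()` after each time step, write a snapshot and call `mark_saved()` when it returns true, and stop after a final snapshot once `near_limit()` returns true; the next job then starts from that snapshot.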

Conclusion

  • TNL-LBM can use ADIOS2 for simulation snapshots and visualization output
  • using a "modern library" is not automatically great 🙃

Current work

  • in-situ visualization with ParaView Catalyst

Future work

  • performance optimization
  • asynchronous operations (e.g. writing multiple arrays at the same time, or overlapping computations with data output – up to the next snapshot period)

Thank you for your attention!

Dziękuję za uwagę!

Abstract: Scientific visualization in our research group is mostly based on the traditional VTK file formats. However, these formats are not flexible and cause bottlenecks in high-performance applications operating on large datasets. In this talk, we summarize our progress towards using the ADIOS2 library, a unified, MPI-based and streaming-oriented high-performance I/O framework, for managing data output in the TNL-LBM project.