Advancements in elastic wave solvers using DevitoPRO and mixed-precision

New challenges need new algorithms

Featured image

At Devito Codes, we are making significant strides in developing highly optimized elastic wave solvers using DevitoPRO. Elastic wave inversion provides a more comprehensive understanding of subsurface properties than acoustic inversion. By capturing both P-wave and S-wave data, elastic inversion enhances the resolution and accuracy of subsurface models, which is crucial for exploration geophysics and other applications. However, elastic RTM/FWI is significantly more complex and computationally expensive than acoustic RTM/FWI, making it vital to develop fast, innovative solutions.

Utilizing mixed-precision methods, we have found that we can nearly double the performance of wave propagators. Comparisons with standard 32-bit floating point implementations have shown negligible numerical errors. While mixed-precision support in DevitoPRO is still under development, we are working with energy and computer industry collaborators to accelerate our roadmap to bring mixed-precision support into production quickly.

The rapid growth in machine learning and AI is driving all processor manufacturers to better support these workloads by focusing more silicon on 16-bit floating point (FP16) and other reduced precision floating point datatypes, with Nvidia announcing FP4 and AMD announcing FP4 and FP6 and this year for their next generation of processor. At the same time, the rate of increase in memory bandwidth is lower than the rate of computational power increasing on these processors.

These hardware trends challenge the vast majority of HPC code developers, who have traditionally relied upon FP64 arithmetic to maintain accuracy, and seismic imaging workloads, which predominantly use FP32.

For these reasons, mathematical software programmers across all HPC communities are increasingly focused on developing algorithms and software support for accuracy-preserving mixed precision. However, this will not be straightforward, necessitating an interdisciplinary approach and changes throughout the full software stack to achieve reliable results.

DevitoPRO for agility and performance portability

DevitoPRO enhances Devito’s core functionalities by incorporating advanced compiler techniques for automatic optimization of stencil computations. Users can write complex finite-difference schemes in high-level Python code, which is then translated into optimized, parallelized C/CUDA/HIP/SYCL code suitable for various hardware architectures. DevitoPRO includes many advanced algorithmic optimizations like the expanding-box technique, which focuses computation on active domains to reduce overhead. Beyond performance enhancements, DevitoPRO offers essential features for production-level seismic imaging. It supports compression-based asynchronous serialization and intelligent data-streaming techniques for efficient disk-host-GPU transfers, significantly improving data management and performance during reverse time migration (RTM) and full-waveform inversion (FWI). These capabilities make DevitoPRO an indispensable tool for high-performance and scalable seismic imaging solutions.

Reduced memory pressure and mixed-precision

In DevitoPRO, we successfully doubled the speed of elastic propagators by using reduced-precision storage for model parameters and wavefields while using mixed-precision for floating-point arithmetic. This approach benefits from reduced memory pressure and makes use of available hardware support for mixed-precision. Our comparisons with pure FP32 implementations have shown negligible numerical errors, validating the effectiveness of this method.

Example - Elastic VTI Running on Intel Sapphire Rapids As an example, we benchmarked a discretized form of elastic VTI running on the SEAM 3D model (Fehler and Larner, 2008) using DevitoPRO on a single socket Intel Sapphire Rapids. The velocity-stress equations are solved using a space order of 8 and a first-order leap-frog time discretization on an 800x400x400 grid. Using FP32, the best result was 2.45 GP/s (giga-points-per-second), while using mixed-precision, we were able to achieve 4.1 GP/s, which is a 1.7 speedup. For comparison, we plot a shot record and wavefield snapshot of the mixed-precision wavefield on the left panels below and the absolute error between the mixed-precision and standard FP32 solution, scaled up by a factor of 500 so it would be visible on the right-hand side panels below.

Shot record comparison Wavefield comparison

Conclusion

A major barrier to the adoption of mixed-precision arithmetic in production codes is the dual complexity of designing robust algorithms and managing the resulting software. DevitoPRO overcomes this by insulating application developers from this complexity through the Devito domain-specific language (DSL) and compiler pipeline.

Another significant barrier to developing mixed-precision software is the lack of standardized reduced-precision floating-point formats in programming languages and standard libraries such as MPI. Although initiatives such as Microscaling Formats (MX) Alliance is actively working on developing standards to simplify the integration and adoption of these new data formats across the industry; it may take years for these standards to be widely available throughout the entire software ecosystem. In the meantime, DevitoPRO has implemented portability layers to enable users to leverage reduced precision on currently supported platforms.

Even in cases where the mathematical formulation of the problem needs adjustment to make it more amenable to reduced-precision floating point arithmetic, these changes are at a high level where it is relatively straightforward to test and evaluate different variations. This agility is one of the reasons why DevitoPRO will be instrumental in the industry transition to leveraging mixed-precision hardware platforms.

Early results are promising, showing a 70% improvement in performance. Ongoing efforts aim to develop more sophisticated compiler passes to fully utilize mixed precision on modern processors, potentially achieving even greater speedups in the future. By continuing to innovate and optimize, DevitoPRO is poised to make elastic modeling and inversion more accessible and efficient, benefiting a wide range of scientific and engineering applications.