November 9, 2022

Devito DevOps Cluster v2

Now with Instinct™

TLDR

Thanks to support from AMD, we are happy to announce:

DevitoPRO now has HIP support for AMD Instinct™ GPUs.
An AMAX AceleMax DGS-428 4U Server has been added to the Devito DevOps cluster.

Nitty Gritty

One of Devito’s core missions is performance portability. Our current roadmap targets the matrix of architectures and parallel programming models below.

Architecture	MPI	OpenMP
AMD/CPU
AMD/GPU
ARM
Intel/CPU
Intel/KNC,KNL
Intel/GPU	TBA	TBA
NVidia

To support the delivery of this vision, since January 2020 Devito Codes has maintained a distributed cluster as part of our DevOps infrastructure for both open-source Devito and DevitoPRO. This provides services such as:

Testing (i.e. CI) ¹.
Performance optimization on ‘bare metal’ target platforms.
Performance benchmarking ².
Deployment (i.e. CD via Docker and release mirrors).

While Devito Codes invested directly in hardware at the outset, long-term loans and donations now make up the majority of the cluster ³. At the time of writing, the cluster comprises:

Model	Units	CPU	GPU	Sponsor
AMAX AceleMax DGS-428 4U Server	1	AMD EPYC 7643 48-Core Processor	4 x AMD Radeon Instinct MI210	AMAX
Dell	1	AMD	4 x A100 (80 GB)	NVidia & Dell
HP	1	Intel Xeon		Devito Codes
Fujitsu A64FX	1	ARM64		Fujitsu
self (custom build)	1	AMD	2 x MI50	Devito Codes (server) & AMD (GPUs)
self (custom build)	4	Intel/AMD PC CPUs	3 x RTX3090, RTX3080	Devito Codes
Supermicro	4	Intel Xeon Gold	4 x V100	NVidia & Supermicro

While we were optimizing for the Intel KNC and KNL, [DUG](https://dug.com/) provided us with access to nodes for DevOps. Today they run their own DevOps on KNC and KNL for DUG Wave, which depends on Devito. So far we have seen no problems on these platforms, provided we monitor for performance regressions on Intel Xeons running with OpenMP.
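To make "monitoring for performance regression" concrete, here is a minimal sketch of the kind of check a CI job could run after a benchmark pass. It is not the actual Devito benchmarking harness: the function name `find_regressions`, the benchmark names, the timings, and the 5% tolerance are all assumptions chosen for illustration.

```python
from statistics import median


def find_regressions(baseline, current, tolerance=0.05):
    """Return {benchmark: relative slowdown} for benchmarks whose median
    runtime grew by more than `tolerance` relative to the baseline."""
    regressions = {}
    for name, base_times in baseline.items():
        if name not in current:
            continue  # benchmark not run this time; nothing to compare
        base = median(base_times)
        now = median(current[name])
        slowdown = (now - base) / base
        if slowdown > tolerance:
            regressions[name] = slowdown
    return regressions


# Hypothetical runtimes in seconds (three repeats each); not measured data.
baseline = {"acoustic": [10.0, 10.2, 9.9], "tti": [42.0, 41.5, 42.3]}
current = {"acoustic": [10.1, 10.0, 10.3], "tti": [47.0, 46.8, 47.5]}

flagged = find_regressions(baseline, current)
for name, slowdown in flagged.items():
    print(f"{name}: {slowdown:.1%} slower than baseline")
```

Comparing medians rather than single runs keeps one noisy repeat from triggering a false alarm. A real job would also need to pin the compiler, thread count, and problem size so that runs stay comparable.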

We do all our deployment on the cluster using Docker containers, which ensures that the same environment is used for testing and deployment. This has also been useful for performance debugging on Cloud platforms, such as AWS and Azure, because we can differentiate between Docker-related issues and issues with the underlying platform.

Early on we relied on Cloud computing nodes rather than our own hardware. However, we ran into a number of issues:

Sponsored Cloud credit severely limited which architectures we were able to test on. In particular, we could not use any modern GPU.
It was difficult to control costs. We make heavy use of GitHub Actions for automation, and self-hosted runners require a client to be running on the machine that executes the jobs. This always-on model meant we would rack up an eye-watering bill at the end of each month. In fact, this is what provided the initial motivation to build our own cluster.
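The cost effect of the always-on model is easy to see with some back-of-the-envelope arithmetic. The sketch below uses an assumed hourly rate and an assumed number of CI hours per month. Both numbers are hypothetical and are not our actual bills or any provider's pricing.

```python
HOURS_PER_MONTH = 24 * 30  # a 30-day month, for simplicity


def monthly_cost(hourly_rate, hours_billed):
    """Cost of one node billed at `hourly_rate` for `hours_billed` hours."""
    return hourly_rate * hours_billed


hourly_rate = 3.00  # hypothetical price of a GPU node, per hour
ci_hours = 60       # hypothetical hours per month spent actually running CI jobs

always_on = monthly_cost(hourly_rate, HOURS_PER_MONTH)  # runner client never stops
on_demand = monthly_cost(hourly_rate, ci_hours)         # billed only while jobs run

print(f"always on: {always_on:.2f}")
print(f"on demand: {on_demand:.2f}")
print(f"ratio:     {always_on / on_demand:.0f}x")
```

Under these assumptions the always-on node costs twelve times what the CI work itself would, which is the gap a smarter Cloud strategy would need to close.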

There is no doubt that we will revisit the Cloud in the future with a smarter strategy to control costs. However, we are committed to continuing with our strategy of maintaining our own hardware so we can always drill down to bare metal when we are looking for maximum performance.

¹ If it is not tested, then it is broken.
² Performance benchmark specification includes code verification for correctness.
³ Many thanks to all the vendors that helped make the Devito Cluster happen: AMD, Dell, DUG, NVidia, Supermicro.
