Hands‑On Guide: Containerizing Horovod + TensorFlow for AMD GPU Clusters

Reviving Horovod with TensorFlow support for AMD GPUs

As machine learning frameworks evolve, older tools sometimes fall behind — even those that were once essential. When libraries stop being actively maintained, keeping them running with newer software stacks becomes a challenge of its own.

One example of this is Horovod — a library that used to be the go-to solution for distributed deep learning.

Originally developed at Uber, Horovod made it simple to scale training across multiple GPUs or nodes by wrapping MPI and NCCL backends for TensorFlow, PyTorch, and MXNet.

However, Horovod isn’t seeing much active development these days.
The last major release (v0.28.1) appeared in 2023. As a result, running Horovod with TensorFlow on modern systems has become increasingly tricky.

The Challenge

The goal was to build a working setup for TensorFlow + Horovod on AMD GPUs, preferably inside a container, using the newest possible versions.

But in practice, it quickly got complicated.

After some digging, it turned out that the ROCm Docker registry for TensorFlow still only provides a combined TensorFlow + Horovod image based on TensorFlow 2.9.

Unofficial combinations, on the other hand, quickly run into breaking changes. For example, starting with TensorFlow 2.16, Horovod crashes with:

AttributeError: 'Variable' object has no attribute 'ref'

This is due to internal TensorFlow API changes (see Horovod issue #4290).
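Because of this hard cutoff, it can help to gate on the TensorFlow version before attempting a Horovod install. A minimal sketch (the `is_horovod_compatible` helper is hypothetical, not part of either library):

```python
# Hypothetical helper: Horovod 0.28.1 breaks on TensorFlow >= 2.16
# due to the internal API change mentioned above.
def is_horovod_compatible(tf_version: str) -> bool:
    """Return True if this TensorFlow version is known to work with Horovod 0.28.1."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) <= (2, 15)

print(is_horovod_compatible("2.15.0"))  # True: last known-good release
print(is_horovod_compatible("2.16.1"))  # False: hits the AttributeError
```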

The Working Setup

After some experimentation, the following combination turned out to work reliably:

  • TensorFlow 2.15
  • Horovod 0.28.1
  • ROCm 6.3.3

Here’s the working Apptainer recipe:

BootStrap: docker
From: rocm/tensorflow:rocm6.3.3-py3.10-tf2.15-dev

%post
    python -m pip install --upgrade pip

    apt-get update && apt-get install -y \
        cmake \
        openmpi-bin \
        openmpi-common \
        libopenmpi-dev


    # Build Horovod from source against ROCm
    git clone --recursive https://github.com/horovod/horovod.git /opt/horovod
    cd /opt/horovod

    # Expose ROCm's FindHIP CMake modules to Horovod's build system
    ln -s ${ROCM_PATH}/lib/cmake/hip/FindHIP* cmake/Modules
    # Point the RCCL include at its actual location in the ROCm 6.x layout
    sed -i 's#rccl.h#rccl/rccl.h#g' horovod/common/ops/nccl_operations.h

    CC=mpicc \
    CXX=mpicxx \
    MAKEFLAGS=-j16 \
    HOROVOD_GPU_BROADCAST=NCCL \
    HOROVOD_GPU_ALLREDUCE=NCCL \
    HOROVOD_WITHOUT_MXNET=1 \
    HOROVOD_WITH_TENSORFLOW=1 \
    HOROVOD_WITHOUT_GLOO=1 \
    HOROVOD_WITH_MPI=1 \
    HOROVOD_ROCM_PATH=${ROCM_PATH} \
    HOROVOD_ROCM_HOME=${ROCM_PATH} \
    HOROVOD_GPU=ROCM \
    HOROVOD_WITHOUT_PYTORCH=1 \
    python setup.py bdist_wheel 2>&1

    pip install ./dist/horovod-0.28.1-cp310-cp310-linux_x86_64.whl

%labels
  author Nastassya Horlava
  version 1.0

%help
  TensorFlow 2.15 + Horovod 0.28.1 image for AMD GPUs (ROCm 6.3.3), converted from the rocm/tensorflow Docker image.
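With the definition saved as, say, `horovod-tf.def`, building the image and smoke-testing the install could look roughly like this (the filenames and the four-process `mpirun` invocation are illustrative; adapt paths and process counts to your cluster):

```shell
# Build the image from the Apptainer definition above
apptainer build horovod-tf.sif horovod-tf.def

# Quick smoke test: initialize Horovod and report rank/size per process
mpirun -np 4 apptainer exec --rocm horovod-tf.sif \
    python -c "import horovod.tensorflow as hvd; hvd.init(); print(hvd.rank(), hvd.size())"
```

The `--rocm` flag binds the host's ROCm devices and libraries into the container.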

Takeaways

Even though Horovod hasn’t been actively maintained in recent years, it can still be revived with careful version matching and a few small patches.

Keep these points in mind:

  • :warning: Performance may lag behind modern frameworks.
  • :warning: TensorFlow ≥ 2.16 is no longer compatible.
  • :warning: Long-term support is uncertain.

In short, Horovod’s legacy still lives on — but it’s gradually becoming fragile as the ecosystem moves forward.

If you rely on distributed training with AMD GPUs, this setup will get you running, but for future projects it’s worth considering newer, actively maintained frameworks optimized for ROCm.
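For reference, a minimal Horovod + Keras training sketch to run inside such a container might look like this (standard Horovod usage pattern; the toy model and random data are placeholders):

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each process to one GPU based on its local rank
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Placeholder model and data
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
x = tf.random.normal((256, 32))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)

# Scale the learning rate by world size and wrap the optimizer
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

# Broadcast initial state from rank 0 so all workers start in sync
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
model.fit(x, y, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```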

Nice work! :clap:

Do you know whether there’s a ready-to-use container for NVIDIA GPUs that works out of the box with Horovod and TensorFlow?

Thanks! :blush:

I haven’t tested it myself, but according to the NVIDIA Frameworks Support Matrix, the official NVIDIA TensorFlow containers include Horovod support. For example, you can see this in the TensorFlow 25.02 release notes, among other releases.
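Again untested on my side, but pulling and checking one of those NGC images should look something like this (the tag follows NVIDIA's `YY.MM` scheme; check the support matrix for current releases):

```shell
# Pull the NGC TensorFlow image and check the bundled Horovod version
docker pull nvcr.io/nvidia/tensorflow:25.02-tf2-py3
docker run --gpus all --rm nvcr.io/nvidia/tensorflow:25.02-tf2-py3 \
    python -c "import horovod.tensorflow as hvd; print(hvd.__version__)"
```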