Introduction

This tutorial presents NixOS Compose, a tool that generates and deploys fully reproducible system environments. It relies on the Nix package manager and the associated Linux distribution, NixOS.

You can find the associated publication here, and some partial documentation here.

First, we will set up the connection to the Grid'5000 machines. Then we will present NixOS Compose on a simple example to get you acquainted with the notions and commands. Finally, we will create, step by step, a reproducible environment for a distributed experiment.

Note: Aside from some SSH configuration, all of the commands will be executed on the Grid'5000 platform.

Grid'5000

Grid'5000 (a.k.a. g5k) is a French testbed for distributed experiments.

This tutorial relies on this platform to perform deployments. We will thus create accounts for you.

Hopefully, by the time you read these lines, your account should have already been created.

Email

You should receive an email like:

Subject: [Grid5000-account] Your Grid5000 account was created by ...

Dear Firstname Lastname (username),

You receive this email because your manager (...) requested a Grid5000
account for you in the context of a tutorial. To get more information about
Grid5000, see the website: http://www.grid5000.fr.

Your login on the Grid5000 platform is: username.

The next two steps for you are now to:

1/ Finish setting up your access to the platform by creating a password and an SSH key.

   To do so, open the following URL:
   https://public-api.grid5000.fr/stable/users/setup_password?password_token=XXXXXXXXXXXXXXXXXXXXXX#special.

2/ Read carefully the two following pages:

   The Grid5000 getting started documentation (https://www.grid5000.fr/w/Getting_Started),
   which gives important information on how to use the platform.

   The Grid5000 usage policy (https://www.grid5000.fr/w/Grid5000:UsagePolicy),
   which gives the rules that MUST be followed when using the platform. Note that any
   abuse will automatically be detected and reported to your manager.

Follow the steps in the email before continuing.

Connect to Grid'5000

Add these lines to your ~/.ssh/config file and replace G5K_USERNAME by your username:

Host g5k
  User G5K_USERNAME
  Hostname access.grid5000.fr
  ForwardAgent no
Host *.g5k
  User G5K_USERNAME
  ProxyCommand ssh g5k -W "$(basename %h .g5k):%p"
  ForwardAgent no

You should be able to access the different sites of Grid'5000:

ssh grenoble.g5k

We can, for example, use the Grenoble site for this tutorial.

Tips

SSH connections can break, which is quite annoying when running experiments. We thus recommend using tmux to deal with this.

First job reservation

You can try to make a job reservation. Grid'5000 uses OAR as its resource and job manager.

oarsub -I --project lab-2025-compas-nxc 
  • oarsub is the command to make a submission

  • -I means that this is an interactive job

  • --project lab-2025-compas-nxc is the accounting project used for this tutorial

This command should give you access to a node that is not the frontend (fgrenoble) but a node from the dahu cluster (dahu-X). You can execute some commands there, and once you are done, just exit the shell to return to the frontend. As this is an interactive job, it will be killed as soon as you exit the shell.
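
For example, once on the node:

hostname   # should print something like dahu-X.grenoble.grid5000.fr
exit       # ends the interactive job and brings you back to the frontend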

tmux

We recommend using tmux on Grid'5000 as the connection between your laptop and Grid'5000 could break, and you could lose access to your work.

A cheat-sheet of tmux is available here: https://tmuxcheatsheet.com/.
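
For example, you can create a named session on the frontend, detach from it, and reattach later (the session name nxc below is arbitrary):

tmux new -s nxc      # create a session named "nxc"
# ... detach with Ctrl + B, then D
tmux attach -t nxc   # reattach later, e.g. after a connection loss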

Installing NixOS Compose

The python API

We can install NixOS Compose via pip:

pip install nixos-compose

You might need to modify your $PATH:

echo "export PATH=$PATH:~/.local/bin" >> ~/.bash_profile
source ~/.bash_profile
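
You can check that the nxc command is now available, for instance:

nxc --help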

The nix package manager

NixOS-Compose unsurprisingly needs Nix. Let us install it locally:

nxc helper install-nix

This will put the Nix store in ~/.local/share.
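
You can check that Nix is usable, for instance:

nix --version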

Set up a preloaded store

To speed up the tutorial a bit, we will preload the store to avoid downloading a lot of packages:

curl -sL https://gitlab.inria.fr/nixos-compose/tuto-nxc/-/raw/main/misc/import-base-store.sh | bash

Create your first composition

Before jumping into the creation of an environment for the IOR benchmark, let us go through a simpler example.

In the next sections, we will present the notions and commands of NixOS Compose.

Start from a template

NXC proposes several templates, which are good starting points.

Let us use the basic one.

mkdir tuto
cd tuto
nxc init -t basic

The previous command created 3 files in the tuto folder:

  • nxc.json: a JSON file required by NXC. You never have to modify it.

  • flake.nix: the Nix file responsible for locking all the inputs.

  • composition.nix: the Nix file representing the distributed environment.

Inspect the composition.nix file

If you open the composition.nix file, you will find the following:

{ pkgs, ... }: {
  roles = {
    foo = { pkgs, ... }:
      {
        # add needed package
        # environment.systemPackages = with pkgs; [ socat ];
      };
  };
  testScript = ''
    foo.succeed("true")
  '';
}

The composition is a function that takes a set as input ({ pkgs, ... }) and returns a set containing:

  • a testScript string

  • a roles set of NixOS configurations

What interests us for the moment is the roles set. In the example above, we define a single role named foo with an empty configuration. We can add packages to the environment by uncommenting the environment.systemPackages line:

{ pkgs, ... }: {
  roles = {
    foo = { pkgs, ... }:
      {
        # add needed package
        environment.systemPackages = with pkgs; [ socat ];
      };
  };
  testScript = ''
    foo.succeed("true")
  '';
}

Local deployment

In the NixOS Compose workflow, the idea is to first iterate with lightweight flavours such as docker or vm.

Before deploying at full scale to Grid'5000, let's try to deploy the environment locally on a single machine.

If you have a Linux machine, you can try to run NixOS Compose locally; otherwise, you can also use Grid'5000.

Note: all the commands are to be executed from the root directory of the project (the tuto folder).

On your local Linux machine

Installing NixOS Compose

Please refer to the previous section to install NixOS Compose and Nix on your machine.

Enter the Nix environment

nix develop

Build the composition

nxc build -f vm

Start the composition

nxc start

Connect to the Virtual machines

In another terminal:

nxc connect

It should open a tmux window with access to the virtual machine.
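
As in the Grid'5000 case below, you can check that socat is available inside the VM (assuming you uncommented the environment.systemPackages line):

socat -V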

On Grid'5000

In this section, we present how to deploy virtual machines on a Grid'5000 node. Note that this is not the common usage of NixOS Compose, and the workflow is thus a bit cumbersome. For instance, make sure that you are not inside a tmux instance started on the frontend of the Grid'5000 site.

Connect to a compute node

oarsub --project lab-2025-compas-nxc -I

Start tmux

tmux

Build the composition

nxc build -f vm

Install vde2

In order to make the different virtual machines communicate, we need a virtual switch. We use vde2, which we will install:

sudo-g5k apt install -y vde2

Start the composition

nxc start

Connect

Create a new tmux pane (Ctrl + B, then %):

nxc connect

It should open a tmux window with access to the virtual machine.

Check the presence of socat

[root@foo:~]# socat
2025/05/29 14:46:51 socat[82883] E exactly 2 addresses required (there are 0); use option "-h" for help

Build your first composition

You can build the composition with the nxc build command.

It takes as argument a target platform, which we call a flavour.

nxc build -f <FLAVOUR>

There are different flavours that NixOS Compose can build:

  • docker: Generates a docker compose configuration

  • nspawn: Experimental. Generates lightweight containers runnable with systemd-nspawn.

  • vm: Generates QEMU virtual machines

  • g5k-nfs-store: Generates a kernel image and initrd without a packed /nix/store; the frontend's store is mounted instead. Also deployed with kexec

  • g5k-image: Generates a full system image

In this tutorial, we will focus on g5k-nfs-store, and on g5k-image if you have time.

For example, let us build the composition with the g5k-nfs-store flavour:

nxc build -f g5k-nfs-store

Deploying the g5k-nfs-store flavour

Reserve the nodes

Let us reserve 1 machine for an hour on Grid'5000:

export $(oarsub --project lab-2025-compas-nxc -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

You can use oarstat -u to check the status of the reservation.

List the reserved machines

NixOS-Compose needs a list of the target machines on which to deploy the software environments. This list will be written to the file OAR.$OAR_JOB_ID.stdout in the current directory once the machines are available.

cat OAR.$OAR_JOB_ID.stdout

This should output something like:

dahu-2.grenoble.grid5000.fr

If the file does not exist yet, your reservation might not be ready yet. You can check with oarstat -u, looking at the "S" (state) column: "W" means waiting and "R" means running. In any case, the next command will wait for the creation of this file.

Deploy

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Release the node

oardel $OAR_JOB_ID

Deploying the g5k-image flavour

Warning

This page is outside of the scope of this tutorial: you can skip it.

Reserve the nodes

Let us deploy this composition on 1 physical machine:

export $(oarsub --project lab-2025-compas-nxc -t deploy -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

You can use oarstat -u to check the status of the reservation.

Deploy

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Release the node

oardel $OAR_JOB_ID

Connect to the nodes

Once the deployment is over, you can connect to the nodes via the nxc connect command.

nxc connect

It will open a tmux session with a pane per node, making it easy to navigate between the nodes.

You can provide a hostname to the command to connect to a specific host.

nxc connect foo

Check the presence of socat

[root@foo:~]# socat
2025/05/29 14:46:51 socat[82883] E exactly 2 addresses required (there are 0); use option "-h" for help

IOR

IOR is a parallel IO benchmark that can be used to test the performance of parallel storage systems using various interfaces and access patterns. It uses a common parallel I/O abstraction backend and relies on MPI for synchronization.

You can find the documentation here.

Adding IOR to the composition

Start from a template

mkdir ior_bench
cd ior_bench
nxc init -t basic

Add IOR to the composition

The IOR benchmark is available in nixpkgs and thus accessible in pkgs. We also need openmpi to run the benchmark.

# composition.nix
{ pkgs, ... }: {
  roles = {
    foo = { pkgs, ... }:
      {
        # add needed package
        environment.systemPackages = with pkgs; [ openmpi ior ];
      };
  };
  testScript = ''
    foo.succeed("true")
  '';
}

Run the benchmark in the environment

As done previously, we have to go through the nxc build and nxc start phases.

Building

nxc build -f g5k-nfs-store

Getting the node

export $(oarsub --project lab-2025-compas-nxc -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Deploying

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Running the benchmark

Once the environment is deployed, we connect with nxc connect.

We can now run the benchmark.

ior

The output should look something like this:

[root@foo:~]# ior
IOR-3.3.0: MPI Coordinated Test of Parallel I/O
Began               : Tue Sep 13 14:08:28 2022
Command line        : ior
Machine             : Linux foo
TestID              : 0
StartTime           : Tue Sep 13 14:08:28 2022
Path                : /root
FS                  : 1.9 GiB   Used FS: 1.3%   Inodes: 0.5 Mi   Used Inodes: 0.1%

Options:
api                 : POSIX
apiVersion          :
test filename       : testFile
access              : single-shared-file
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : no tasks offsets
nodes               : 1
tasks               : 1
clients per node    : 1
repetitions         : 1
xfersize            : 262144 bytes
blocksize           : 1 MiB
aggregate filesize  : 1 MiB

Results:

access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
write     3117.60    13205      0.000076    1024.00    256.00     0.000014   0.000303   0.000004   0.000321   0
read      3459.03    15107      0.000066    1024.00    256.00     0.000022   0.000265   0.000002   0.000289   0
remove    -          -          -           -          -          -          -          -          0.000121   0
Max Write: 3117.60 MiB/sec (3269.04 MB/sec)
Max Read:  3459.03 MiB/sec (3627.06 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)     StdDev    Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt   blksiz    xsize aggs(MiB)   API RefNum
write        3117.60    3117.60    3117.60       0.00   12470.38   12470.38   12470.38       0.00    0.00032         NA            NA     0      1   1    1   0     0        1         0    0      1  1048576   262144       1.0 POSIX      0
read         3459.03    3459.03    3459.03       0.00   13836.14   13836.14   13836.14       0.00    0.00029         NA            NA     0      1   1    1   0     0        1         0    0      1  1048576   262144       1.0 POSIX      0
Finished            : Tue Sep 13 14:08:28 2022

If the previous command fails, it might be because of the network interface of the Grid'5000 node you deployed on.

You can try to run the ior command in a more explicit way:

[root@foo:~]# mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root -np 4 ior

Release the booking

Now that we are done, exit the connection to the node and release the reservation.

oardel $OAR_JOB_ID

Add a PFS to the composition

For the moment the tests are only performed on the local file system of one computer.

Let us set up a parallel file system to evaluate its performance. For this tutorial, it will be GlusterFS.

The approach will be the following:

  1. add a new role server to the composition

  2. set up the PFS server on the server role

  3. mount the PFS export on the compute nodes

Add a role to the composition

To add another role to the composition, we only need to add a new element to the roles set. Let us also rename the compute role to node and empty the testScript:

{ pkgs, ... }: {
  roles = {
    node = { pkgs, ... }:
      {
        # add needed package
        environment.systemPackages = with pkgs; [ openmpi ior ];
      };
    server = { pkgs, ... }:
      {
        # ...
      };
  };
  testScript = ''
  '';
}

Setting up the GlusterFS server

To set up the GlusterFS server, we need to configure several things:

  1. open ports for the clients to connect: we will actually disable the entire firewall for simplicity's sake, but we could be more precise about which ports we open

  2. Enable the systemd service for the GlusterFS server

  3. Define the export point

server = { pkgs, ... }: {
  # Disable the firewall
  networking.firewall.enable = false;

  # Enable the glusterfs server services
  services.glusterfs.enable = true;

  # Define a partition at /srv that will host the blocks
  fileSystems = {
    "/srv" = {
      device = "/dev/disk/by-partlabel/KDPL_TMP_disk0";
      fsType = "ext4";
    };
  };

  # we also add the htop package for light monitoring
  environment.systemPackages = with pkgs; [ htop ];
};

Note that the KDPL_TMP_disk0 label is only valid on Grid'5000.

Mount the GlusterFS server on the compute nodes

We now need to make the compute nodes mount the PFS server.

To do this, we will also disable the firewall and create a new mount point (/data in our case):

node = { pkgs, ... }:
{
  # add needed package
  environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];

  # Disable the firewall
  networking.firewall.enable = false;

  # Mount the PFS
  fileSystems."/data" = {
    device = "server:/gv0";
    fsType = "glusterfs";
  };
};

Here, gv0 is the name of the GlusterFS volume (not created yet!).

Test the GlusterFS server

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

This time, we reserve two nodes.

export $(oarsub --project lab-2025-compas-nxc -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Starting the nodes

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Connect

nxc connect

You should now have a tmux session with 2 panes: one for the server and one for the node.

Setting up the GlusterFS volume

We need to create the GlusterFS volume and start it.

From the server node, run the following commands:

mkdir -p /srv/gv0
gluster volume create gv0 server:/srv/gv0
gluster volume start gv0

Mounting the volume from the compute node

As we already defined the mount in the composition, we can simply restart the corresponding systemd service:

systemctl restart data.mount

Testing the GlusterFS server

Go to the node and change directory to the mounted volume:

cd /data

A quick ls shows that this is empty.

Create a file in it:

touch glusterfs_works

Now go to the server node and check the directory backing the GlusterFS volume:

ls /srv/gv0

If the glusterfs_works file exists, everything worked fine!
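
You can also inspect the volume from the server with standard GlusterFS commands, for example:

gluster volume info gv0
gluster volume status gv0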

Release the nodes

oardel $OAR_JOB_ID

Creating a service

Currently, the creation of the GlusterFS volume is manual, but we would like to do it automatically at boot time.

To do this, we will create a systemd service and use it in the composition.

In NixOS, a service (or module) is composed of two parts: the interface and the implementation.

In the previous sections, you already interacted with services! For example:

  ...
  services.glusterfs.enable = true;
  ...

Creating the module

Let's create a new file to store the content of the service:

# my-module.nix
{ config, lib, pkgs, ... }:

with lib;
let
  cfg = config.services.my-glusterfs;
in
{
  ################################################
  #
  # Interface
  #
  options = {
    services.my-glusterfs = {
      enable = mkEnableOption "My glusterfs";

      package = mkOption {
        type = types.package;
        default = pkgs.glusterfs;
      };

      volumePath = mkOption {
        type = types.str;
        default = "/srv";
      };

      volumeName = mkOption {
        type = types.str;
        default = "gv0";
      };
    };
  };

  ################################################
  #
  # Implementation
  #
  config = mkIf (cfg.enable) {
    systemd.services.my-glusterfs = {
      description = "My GlusterFS module";
      wantedBy = [ "multi-user.target" ];
      after = [ "glusterd.service" "glustereventsd.service" ];
      serviceConfig.Type = "oneshot";
      script =
        ''
        if [ ! $(${cfg.package}/bin/gluster volume list | grep ${cfg.volumeName}) ]
        then
          mkdir -p ${cfg.volumePath}/${cfg.volumeName}
          ${cfg.package}/bin/gluster volume create ${cfg.volumeName} server:${cfg.volumePath}/${cfg.volumeName}
          ${cfg.package}/bin/gluster volume start ${cfg.volumeName}
        fi
        '';
    };
  };
}

OK, let's decipher what all of this means.

  • we created a service called my-glusterfs

  • the service has 4 options:

    • enable: whether to enable the service

    • package: the Nix package containing the GlusterFS binaries

    • volumePath: the path on the server where the volume will be created

    • volumeName: the name of the volume

Then, in the implementation part, we explicit:

  • that this service is wanted by multi-user.target

  • that this service needs to be executed after the GlusterFS daemons have started (glusterd.service and glustereventsd.service)

  • that this service must only be run once (oneshot)

  • and finally, the commands to run. Here the commands are the same as the ones seen in the previous section, but we use the configuration of the service: cfg.volumePath, cfg.volumeName.

Call this service

Let's now use this service in the composition.

server = { pkgs, ... }: {
  # We import the definition of the service
  imports = [ ./my-module.nix ];

  services.my-glusterfs = {
    enable = true;        # We activate our service
    volumePath = "/srv"; # We define where the volume will be
    volumeName = "gv0";   # and the name of the volume
  };

  networking.firewall.enable = false;
  services.glusterfs.enable = true;
  fileSystems = {
    "/srv" = {
      device = "/dev/disk/by-partlabel/KDPL_TMP_disk0";
      fsType = "ext4";
    };
  };
  environment.systemPackages = with pkgs; [ htop ];
};

Now, every time the server boots, it will create and start the volume, so that it is available for the nodes to mount.
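
To check that the service did its job after boot, you can run the following from the server (a quick sanity check, not a required step of the tutorial):

systemctl status my-glusterfs
gluster volume list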

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

This time, we reserve two nodes.

export $(oarsub --project lab-2025-compas-nxc -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Starting the nodes

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Connect

nxc connect

Re-mount the volumes on the node

On the node:

systemctl restart data.mount

You can now use the volume from the node!

Add another compute node

For the moment we have a single compute node. In this section, we will add another one and run IOR on several nodes.

Add another role in the composition

We will rename the node role to node1 and create a new role node2 with the exact same configuration:


roles = {

  node1 = { pkgs, ... }:
  {
    # add needed package
    environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];

    # Disable the firewall
    networking.firewall.enable = false;

    # Mount the PFS
    fileSystems."/data" = {
      device = "server:/gv0";
      fsType = "glusterfs";
    };
  };

  node2 = { pkgs, ... }:
  {
    # add needed package
    environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];

    # Disable the firewall
    networking.firewall.enable = false;

    # Mount the PFS
    fileSystems."/data" = {
      device = "server:/gv0";
      fsType = "glusterfs";
    };
  };

  server = { pkgs, ... }:
  {
    # ...
  };
}

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub --project lab-2025-compas-nxc -l nodes=3,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Deploying

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Connect to the nodes

nxc connect

Remount the volume from the nodes (run this command once from any of the nodes):

cat /etc/hosts | grep node | cut -f2 -d" " | xargs -t -I{} systemctl --host root@{} restart data.mount

After building and starting the environment, we now have 3 nodes: node1, node2 and the server.

We can now try to run IOR with MPI from the nodes, writing to the PFS (/data).

All the deployed machines already know each other (you can look at /etc/hosts to verify). So we will create the MPI hostfile myhosts:

cd /data
printf "node1 slots=8\nnode2 slots=8" > myhosts

The /data/myhosts file should look like:

node1 slots=8
node2 slots=8

Now, from any node (node1 or node2), we can start the benchmark (without the high-performance network of Grid'5000) with:

cd /data
mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root --hostfile myhosts -np 16 ior

Release the nodes

oardel $OAR_JOB_ID

Generalization of the composition

As you can see from the previous section, scaling the number of compute nodes is a bit cumbersome.

Fortunately, NixOS Compose provides the notion of role to tackle this issue.

A role is a configuration. In our case, we actually have only two roles: the GlusterFS server and the compute nodes. The configuration of the compute nodes is the same no matter how many there are. Thus having to define the configuration for node1 and node2 is redundant.

roles = {

  node = { pkgs, ... }:
  {
    # add needed package
    environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];

    # Disable the firewall
    networking.firewall.enable = false;

    # Mount the PFS
    fileSystems."/data" = {
      device = "server:/gv0";
      fsType = "glusterfs";
    };
  };

  server = { pkgs, ... }:
  {
    # ...
  };
}

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub --project lab-2025-compas-nxc -l nodes=3,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Starting the nodes

The nxc start command can take an additional yaml file as input describing the number of machines per role, as well as their hostnames.

The following yaml file will create 3 machines: one server instance (one per role by default) and two node instances (node1 and node2).

# nodes.yaml
node: 2

You can deploy the composition by passing this yaml file to the nxc start command:

nxc start -m OAR.$OAR_JOB_ID.stdout -W nodes.yaml

Connect to the nodes

nxc connect

Remount the volume (to run once on any of the nodes):

cat /etc/hosts | grep node | cut -f2 -d" " | xargs -t -I{} systemctl --host root@{} restart data.mount

Release the nodes

oardel $OAR_JOB_ID

Adding a configuration file

IOR can take a configuration file as input. In this section, we will integrate this configuration file into the composition.

An IOR configuration file looks something like this:

IOR START
    api=POSIX
    testFile=testFile
    hintsFileName=hintsFile
    multiFile=0
    interTestDelay=5
    readFile=1
    writeFile=1
    filePerProc=0
    checkWrite=0
    checkRead=0
    keepFile=1
    quitOnError=0
    outlierThreshold=0
    setAlignment=1
    singleXferAttempt=0
    individualDataSets=0
    verbose=0
    collective=0
    preallocate=0
    useFileView=0
    keepFileWithError=0
    setTimeStampSignature=0
    useSharedFilePointer=0
    useStridedDatatype=0
    uniqueDir=0
    fsync=0
    storeFileOffset=0
    maxTimeDuration=60
    deadlineForStonewalling=0
    useExistingTestFile=0
    useO_DIRECT=0
    showHints=0

    repetitions=3
    numTasks=16
    segmentCount=16
    blockSize=4k
    transferSize=1k

    summaryFile=/tmp/results_ior.json
    summaryFormat=JSON
RUN
IOR STOP

It gathers the parameters of the experiment.

It can then be run as:

ior -f <IOR_CONFIG_FILE>

We will put this configuration file in the composition in order to make the experiment reproducible. Let us store this file locally in the file script.ior.

We can then create a file under /etc that will contain the content of the file from the Nix store. In the following configuration, we write this file at /etc/ior_script:

# ...
  node = { pkgs, ... }: {
    networking.firewall.enable = false;

    environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];

    environment.etc = {
      ior_script = {
        text = builtins.readFile ./script.ior;
      };
    };

    # Mount the PFS
    fileSystems."/data" = {
      device = "server:/gv0";
      fsType = "glusterfs";
    };
  };
# ...

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub --project lab-2025-compas-nxc -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Starting the nodes

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Connecting

nxc connect

Running the benchmark with the script

Remount the volume (to run once on any of the nodes):

cat /etc/hosts | grep node | cut -f2 -d" " | xargs -t -I{} systemctl --host root@{} restart data.mount

From the node:

cd /data
ior -f /etc/ior_script

Release the nodes

oardel $OAR_JOB_ID

Adding scripts to the environment

The command to start the benchmark is quite obscure and cumbersome to type. We would like to create a script to wrap it. However, creating a full Nix package for such a small script is not worth it. Fortunately, Nix provides ways to create reproducible bash (or other) scripts easily.

Let us create a Nix file called myscripts.nix. It will be a function that takes pkgs as input and returns a set containing our scripts.

{ pkgs, ... }:
let
  # We define some constants here
  nfsMountPoint = "/data";
  nbProcs = 16;
  iorConfig = "/etc/ior_script";
in {
  # This script derives the number of compute nodes from /etc/hosts,
  # creates a hostfile for MPI and runs the benchmark
  start_ior =
    pkgs.writeScriptBin "start_ior" ''
      cd ${nfsMountPoint}

      NB_NODES=$(cat /etc/hosts | grep node | wc -l)
      NB_SLOTS_PER_NODE=$((${builtins.toString nbProcs} / $NB_NODES))

      cat /etc/hosts | grep node | awk -v nb_slots="$NB_SLOTS_PER_NODE" '{ print $2 " slots=" nb_slots;}' > my_hosts

      mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root -np ${builtins.toString nbProcs} --hostfile my_hosts ior -f ${iorConfig}
    '';
   remount_glusterfs =
    pkgs.writeScriptBin "remount_glusterfs" ''
        cat /etc/hosts | grep node | cut -f2 -d" " | xargs -t -I{} systemctl --host root@{} restart data.mount
    '';

}

We can now import these scripts in the composition:

# ...
  node = { pkgs, ... }:
  let
    scripts = import ./myscripts.nix { inherit pkgs; };
  in
  {
    networking.firewall.enable = false;

    environment.systemPackages = with pkgs; [ openmpi ior glusterfs scripts.start_ior scripts.remount_glusterfs ];

    environment.etc = {
      ior_script = {
        text = builtins.readFile ./script.ior;
      };
    };

    # Mount the PFS
    fileSystems."/data" = {
      device = "server:/gv0";
      fsType = "glusterfs";
    };
  };
# ...

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub --project lab-2025-compas-nxc -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Starting the nodes

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Connecting

nxc connect

Running the script

Remount the volumes on the nodes (to run once on a single node):

remount_glusterfs

We can now simply run start_ior from the node to run the benchmark.

start_ior

Release the nodes

oardel $OAR_JOB_ID

Integration with an Experiment Engine

For now we can deploy a reproducible distributed environment, connect to the nodes, and run commands. What we would like to do now is to automate the execution of the commands in an experiment script that we can easily rerun.

Fortunately, NixOS-Compose provides an integration with Execo, an experiment engine for Grid'5000. Execo is a Python library that abstracts the usual operations on Grid'5000 (submitting jobs, deploying, executing commands, etc.).

Let's see how to use Execo and NixOS-Compose to run reproducible experiments.

Starting Point

The snippet below represents a good starting point for an Execo script with NixOS-Compose.

# script.py
from nixos_compose.nxc_execo import get_oar_job_nodes_nxc
from nixos_compose.g5k import key_sleep_script
import os

from execo import Remote
from execo_engine import Engine, logger, ParamSweeper, sweep
from execo_g5k import oardel, oarsub, OarSubmission, wait_oar_job_start

class NXCEngine(Engine):
    def __init__(self):
        super(NXCEngine, self).__init__()
        parser = self.args_parser
        parser.add_argument('--nxc_build_file', help='Path to the NXC build file')
        parser.add_argument('--flavour', help='Flavour to deploy')
        self.nodes = {}
        self.oar_job_id = -1
        # --- Where and how many nodes ----
        self.nb_nodes = 2
        self.site = "grenoble"
        self.cluster = "dahu"

    def init(self):
        # --- Reservation ----
        duration = 15 * 60 #seconds
        if self.args.flavour == "g5k-image":
            self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", duration, job_type=["deploy"], project="lab-2025-compas-nxc"), self.site)])[0]
        else:
            self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", duration, job_type=[], project="lab-2025-compas-nxc", command=f"{key_sleep_script} {duration}"), self.site)])[0]
        wait_oar_job_start(self.oar_job_id, site) # wait for the job to start, otherwise we might get a timeout in the `get_oar_job_nodes_nxc`

        # --- How many nodes per role ---
        roles_quantities = {"server": ["server"], "node": ["node"]}

        # --- Deploy and populate the dict `self.nodes` accordingly ---
        self.nodes, self.roles = get_oar_job_nodes_nxc(
            self.oar_job_id,
            site,
            flavour_name=self.args.flavour,
            compose_info_file=os.environ['HOME'] + "/.local/share/nix/root" + os.readlink(self.args.nxc_build_file),
            roles_quantities=roles_quantities)

    def run(self):
        my_command = "echo \"Hello from $(whoami) at $(hostname) ($(ip -4 addr | grep \"/20\" | awk '{print $2;}'))\" > /tmp/hello"
        hello_remote = Remote(my_command, self.roles["server"], connection_params={'user': 'root'})
        hello_remote.run()

        my_command2 = "cat /tmp/hello"
        cat_remote = Remote(my_command2, self.roles["server"], connection_params={'user': 'root'})
        cat_remote.run()
        for process in cat_remote.processes:
            print(process.stdout)

        # --- Giving back the resources ---
        oardel([(self.oar_job_id, self.site)])


if __name__ == "__main__":
    NXCEngine().start()

Make sure you are in an environment with NixOS-Compose available.

You can then run python3 script.py --help.

The script takes two arguments:

  • nxc_build_file which is the path to the result of nxc build. Most probably it will be under build/composition::FLAVOUR.json

  • and the flavour. On Grid'5000 it can be g5k-nfs-store, g5k-ramdisk, or g5k-image

Let's try to run the script for the g5k-nfs-store flavour:

python3 script.py --nxc_build_file $(pwd)/build/composition::g5k-nfs-store --flavour g5k-nfs-store

You should see the logs from Execo telling you that it is doing the reservation to OAR, and starting deploying. When the deployment is finished, you can see that the commands that we ran in the run function of script.py are being executed.

Run a real experiment

The code above is just to show the basics of Execo. In this section, we will run a more realistic experiment calling the start_ior command that we packaged in a previous section.

# script.py
from nixos_compose.nxc_execo import get_oar_job_nodes_nxc
from nixos_compose.g5k import key_sleep_script
import os
import time

from execo import Remote
from execo_engine import Engine, logger, ParamSweeper, sweep
from execo_g5k import oardel, oarsub, OarSubmission, wait_oar_job_start

class NXCEngine(Engine):
    def __init__(self):
        super(NXCEngine, self).__init__()
        parser = self.args_parser
        parser.add_argument('--nxc_build_file', help='Path to the NXC build file')
        parser.add_argument('--flavour', help='Flavour to deploy')
        parser.add_argument('--nb_nodes', help='Number of nodes')
        parser.add_argument('--result_file', help='path to store the results')
        self.nodes = {}
        self.oar_job_id = -1
        # --- Where and how many nodes ----
        self.site = "grenoble"
        self.cluster = "dahu"

    def init(self):
        self.nb_nodes = int(self.args.nb_nodes)
        assert self.nb_nodes >= 2, "I need at least two nodes"

        # --- Reservation ----
        duration = 15 * 60 #seconds
        if self.args.flavour == "g5k-image":
            self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", duration, job_type=["deploy"], project="lab-2025-compas-nxc"), self.site)])[0]
        else:
            self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", duration, job_type=[], project="lab-2025-compas-nxc", command=f"{key_sleep_script} {duration}"), self.site)])[0]
        wait_oar_job_start(self.oar_job_id, site) # wait for the job to start, otherwise we might get a timeout in the `get_oar_job_nodes_nxc`

        # --- How many nodes per role ---
        # We want one server and all the other nodes are `node`
        roles_quantities = {"server": ["server"], "node": [f"node{i}" for i in range(1, self.nb_nodes)]}

        # --- Deploy and populate the dict `self.nodes` accordingly ---
        self.nodes, self.roles = get_oar_job_nodes_nxc(
            self.oar_job_id,
            site,
            flavour_name=self.args.flavour,
            compose_info_file=os.environ['HOME'] + "/.local/share/nix/root" + os.readlink(self.args.nxc_build_file),
            roles_quantities=roles_quantities)

    def run(self):
        result_file = self.args.result_file
        time.sleep(10)
        remount_volume_remote = Remote("remount_glusterfs", self.roles["node"][0], connection_params={'user': 'root'})
        remount_volume_remote.run()

        run_ior_remote = Remote("start_ior", self.roles["node"][0], connection_params={'user': 'root'})
        run_ior_remote.run()
        get_file_remote = Remote(f"cp /tmp/results_ior.json {result_file}", self.roles["node"][0], connection_params={'user': 'root'})
        get_file_remote.run()

        oardel([(self.oar_job_id, self.site)])

if __name__ == "__main__":
    NXCEngine().start()

The previous script can be run with:

python3 script.py --nxc_build_file $(pwd)/build/composition::g5k-nfs-store --flavour g5k-nfs-store --nb_nodes 2 --result_file $(pwd)/ior_results.json

Deploying OAR

NixOS Compose can deploy complex distributed systems, such as a cluster managed by the OAR batch scheduler.

Clone the composition

git clone --depth=1 git@github.com:oar-team/oar-nixos-compose.git

Variants

This repository contains several variants of the composition deploying OAR.

Let's use the master one:

cd master/

Inspect the composition

Take a look at the composition.nix file. You can see that there are 3 roles:

  • frontend: this role is responsible for receiving the job requests from the users (the oarsub commands). This is the same role as the frontend nodes on Grid'5000 (e.g., fgrenoble, fnancy).

  • server: this role hosts the "brain" of OAR. It is responsible for scheduling the jobs.

  • node: this role represents the compute nodes of the cluster. Notice the roleDistribution attribute: it can generate several instances of a role (this is similar to the nodes.yaml file seen in 'Generalization of the composition', but more static).

Build the composition

nxc build -f g5k-nfs-store

Start the composition

Reserve the nodes

We thus need to reserve 4 nodes (1 frontend, 1 server and 2 nodes):

export $(oarsub --project lab-2025-compas-nxc -l nodes=4,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Deploy the composition

nxc start -m OAR.$OAR_JOB_ID.stdout -W

Connect to the frontend

nxc connect frontend

Submitting an OAR job in our deployed OAR

[root@frontend:~]# su user1
[user1@frontend:/root]# cd ~
[user1@frontend:/root]# oarsub -I

You should then be connected to one of the compute nodes (node1 or node2), with OAR_JOB_ID=1.

Release the nodes

When you are done, do not forget to release the nodes from the Grid'5000 frontend:

oardel $OAR_JOB_ID

Parametrized Builds

We often want to have variation in the environment. In our example, it could be the configuration of the NFS server.

One solution could be to have one composition.nix file per variant, but that would introduce a lot of redundancy.

Instead, we will use the notion of setup of NixOS Compose to parametrize the composition.

Add the setup to the flake.nix

In the flake.nix file, add the following:

# ...
  outputs = { self, nixpkgs, nxc }:
    let
      system = "x86_64-linux";
    in {
      packages.${system} = nxc.lib.compose {
        inherit nixpkgs system;
        composition = ./composition.nix;

        # Defines the setup
        setup = ./setup.toml;

      };

      defaultPackage.${system} =
        self.packages.${system}."composition::nixos-test";

      devShell.${system} = nxc.devShells.${system}.nxcShellFull;
    };
}

Create the setup.toml file

Create a setup.toml file and add the following content:

# setup.toml

[project]

[params]
nfsNbProcs=8

Use the parameters

The above setup.toml defines a parameter called nfsNbProcs. The value of this parameter can be used in the composition: it is available at setup.params.<VARIABLE_NAME>.

# composition.nix

# Add `setup` in the arguments of the composition
{ pkgs, setup, ... }: {
  # ....
  server = { pkgs, ... }: {
    networking.firewall.enable = false;
    services.nfs.server.enable = true;
    services.nfs.server.exports = ''
      /srv/shared *(rw,no_subtree_check,fsid=0,no_root_squash)
    '';
    services.nfs.server.createMountPoints = true;

    # Use the value of the parameter
    services.nfs.server.nproc = setup.params.nfsNbProcs;

    environment.systemPackages = with pkgs; [ htop ];
  };
}

Introduce variations

For the moment, we only have a single value for the parameter. But we want to have variation. In the current situation, we have an NFS server with 8 processes managing the requests. We want to see how this number of processes influences the performance of the IOR benchmark. We will create two variants:

  • fast: with more processes for the NFS server (let us say 32)
  • slow: with fewer processes (let us say 2)

We can add the variants in the setup.toml file as follows:

# setup.toml

[project]

[params]
nfsNbProcs=8

[slow.params]
# to select this variant use:
# nxc build --setup slow
nfsNbProcs=2

[fast.params]
# to select this variant use:
# nxc build --setup fast
nfsNbProcs=32

When we build the composition, we can pick a variant as follows:

nxc build --setup fast
# or
nxc build -s slow

If there is no --setup or -s flag, NixOS Compose will take the default values of the parameters.
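
For this tutorial, you would presumably combine the setup flag with the flavour used so far (this combination is an assumption; the commands above only show --setup on its own):

nxc build -f g5k-nfs-store -s fast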

Release the nodes

Once we are done with the experiment, we have to give back the resources.

List all your running jobs

oarstat -u
Job id     Name           User           Submission Date     S Queue
---------- -------------- -------------- ------------------- - ----------
2155685                   qguilloteau    2022-09-23 10:50:36 R default

Delete the job

oardel 2155685
Deleting the job = 2155685 ...REGISTERED.
The job(s) [ 2155685 ] will be deleted in the near future.

The one-liner

You can delete all of your jobs with this one command:

oarstat -u -J | jq --raw-output 'keys | .[]' | xargs -I {} oardel {}

Deploy the environment

To deploy the built flavour, use the start subcommand of nxc. By default, it uses the last built environment (from any flavour), but you can specify the flavour with the -f flag.

nxc start -f <FLAVOUR>

For the flavours on Grid'5000, we first need to reserve some resources before deploying. The next sections give more details.

Packaging the MADbench2 benchmark

Warning

This page is outside of the scope of this tutorial: you can skip it.

TODO: create a repo with a make / make install setup for MB2

Nix Expression

{ stdenv, openmpi }:

stdenv.mkDerivation {
  name = "MADbench2";
  src = ./.;
  buildInputs = [ openmpi ];
  installPhase = ''
    mkdir -p $out/bin
    mpicc -D SYSTEM -D COLUMBIA -D IO -o MADbench2.x MADbench2.c -lm
    mv MADbench2.x $out/bin
  '';
}

Questions:

  • do we make them add this to a flake and import it later?

Try to run the benchmark locally

First, enter a nix shell with the packaged MADbench2 and openmpi.

nix shell .#MADbench2 nixpkgs#openmpi

Once in the shell, you can run the benchmark with:

mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1

It should output the following:

MADbench 2.0 IO-mode
no_pe = 4  no_pix = 640  no_bin = 8  no_gang = 1  sblocksize = 8  fblocksize = 8  r_mod = 1  w_mod = 1
IOMETHOD = POSIX  IOMODE = SYNC  FILETYPE = UNIQUE  REMAP = CUSTOM

S_cc         0.00   [      0.00:      0.00]
S_bw         0.00   [      0.00:      0.00]
S_w          0.01   [      0.01:      0.01]
          -------
S_total      0.01   [      0.01:      0.01]

W_cc         0.00   [      0.00:      0.00]
W_bw         0.12   [      0.12:      0.12]
W_r          0.00   [      0.00:      0.00]
W_w          0.00   [      0.00:      0.00]
          -------
W_total      0.13   [      0.13:      0.13]

C_cc         0.00   [      0.00:      0.00]
C_bw         0.00   [      0.00:      0.00]
C_r          0.00   [      0.00:      0.00]
          -------
C_total      0.00   [      0.00:      0.00]


dC[0] = 0.00000e+00

What interests us in this tutorial is the total time spent writing: W_total.

Making MADbench2 available

Warning

This page is outside of the scope of this tutorial: you can skip it.

Option 1: You have a flake repository with MADbench2 packaged

In the flake.nix file, add your repository as an input:

{
  description = "nixos-compose - basic setup";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/21.05";
    nxc.url = "git+https://gitlab.inria.fr/nixos-compose/nixos-compose.git";
    mypkgs.url = "URL TO YOUR PKGS";
  };
  outputs = { self, nixpkgs, nxc, mypkgs }:
  {
    # ...
  };
}

Then create an overlay adding the benchmark:

{
  # ...

  outputs = { self, nixpkgs, nxc, mypkgs }:
    let
      system = "x86_64-linux";
      myOverlay = final: prev:
        {
          mb2 = mypkgs.packages.${system}.MADbench2;
        };
    in {
      packages.${system} = nxc.lib.compose {
        inherit nixpkgs system;
        overlays = [ myOverlay ];
        composition = ./composition.nix;
        # setup = ./setup.toml;
      };

      # ...
    };
}

Option 2: You have a local file packaging MADbench2

In this case, you do not have to add another input, and you can simply call the Nix expression packaging MADbench2 (./MADbench2 in the example below).

Similarly, we create an overlay with the package to add.

{
  description = "nixos-compose - basic setup";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/22.05";
    nxc.url = "git+https://gitlab.inria.fr/nixos-compose/nixos-compose.git";
  };

  outputs = { self, nixpkgs, nxc }:
    let
      system = "x86_64-linux";
      myOverlay = final: prev:
        {
          mb2 = prev.callPackage ./MADbench2 { };
        };
    in {
      packages.${system} = nxc.lib.compose {
        inherit nixpkgs system;
        overlays = [ myOverlay ];
        composition = ./composition.nix;
        setup = ./setup.toml;
      };

      defaultPackage.${system} =
        self.packages.${system}."composition::nixos-test";

      devShell.${system} = nxc.devShells.${system}.nxcShellFull;
    };
}

Adding MADbench2 to the composition

Warning

This page is outside of the scope of this tutorial: you can skip it.

Once MADbench2 has been added to the flake.nix file, we can access the package inside the composition (under the mb2 attribute defined in the overlay). In our case, we want an environment with both MADbench2 and openmpi to run it.

{ pkgs, ... }: {
  roles = {
    foo = { pkgs, ... }:
      {
        # add needed package
        environment.systemPackages = with pkgs; [ openmpi mb2 ];
      };
  };
  testScript = ''
    foo.succeed("true")
  '';
}

Run the benchmark in the environment

Warning

This page is outside of the scope of this tutorial: you can skip it.

As done previously, we have to go through the nxc build and nxc start phases.

Once the environment is deployed, we connect with nxc connect.

We can now try to run the benchmark.

mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1

Unfortunately, this will fail for several reasons:

  • we are running as root

  • there are not enough slots to start 4 processes

As we are just setting up the environment for now, we can work around these issues by adding some flags:

mpirun --allow-run-as-root --oversubscribe -np 4 MADbench2.x 640 8 1 8 8 1 1

Add a builder on Grid'5000

Warning

This page is unused for this tutorial: you can skip it.

If you are doing this tutorial with a member of the NixOS Compose team, there is probably a builder machine on Grid'5000 to accelerate the builds. Using it requires a bit of configuration.

Ask for the <BUILDER> address. It should look like dahu-30.

Add your ssh key

You need to get access to the builder machine through SSH. To do this, we will copy your SSH key to the builder.

ssh-copy-id -i ~/.ssh/id_rsa.pub root@<BUILDER>

The password is nixos.

You should be able to log in to <BUILDER> via ssh root@<BUILDER>.

Copy the Nix Config

We need to tell Nix where to find this builder and how to use it. The configuration file should be under ~/.config/nix/nix.conf.

# ~/.config/nix/nix.conf
experimental-features = nix-command flakes
builders = ssh://root@<BUILDER>
cores = 0
extra-sandbox-paths =
max-jobs = 0
require-sigs = false
sandbox = false
sandbox-fallback = false
substituters = http://<BUILDER>:8080 ssh-ng://root@<BUILDER> https://cache.nixos.org/
builders-use-substitutes = true
trusted-public-keys = <BUILDER>:snBDi/dGJICacgRUw4nauQ8KkSksAAAhCvPVr9OGTwk=
system-features = nixos-test benchmark big-parallel kvm
allowed-users = *
trusted-users = root

Copy and paste the above configuration into the Nix configuration file. Don't forget to replace <BUILDER> with the actual address of the builder!

You can do it with one command. Here we take the example where the builder is on node dahu-30.

sed -i 's/<BUILDER>/dahu-30/g' ~/.config/nix/nix.conf

Note that this configuration will not work the next time you log in to Grid'5000, as the builder node will have been released. You can then revert back to:

# ~/.config/nix/nix.conf
experimental-features = nix-command flakes
max-jobs = 64