
Introduction
This tutorial presents NixOS Compose, a tool that generates and deploys fully reproducible system environments. It relies on the Nix package manager and the associated Linux distribution, NixOS.
You can find the associated publication here, and some partial documentation here.
First, we will set up the connection to the Grid'5000 machines. Then we will present NixOS Compose on a simple example to get you acquainted with the notions and commands. Finally, we will create step by step a reproducible environment for a distributed experiment.
Note: Aside from some SSH configuration, all of the commands will be executed on the Grid'5000 platform.
Grid'5000
Grid'5000 (a.k.a. g5k) is a French testbed for distributed experiments.
This tutorial relies on this platform to perform deployments. We will thus create accounts for you.
Hopefully, by the time you read these lines, your account should have already been created.
You should receive an email like:
Subject: [Grid5000-account] Your Grid5000 account was created by ...
Dear Firstname Lastname (username),
You receive this email because your manager (...) requested a Grid5000
account for you in the context of a tutorial. To get more information about
Grid5000, see the website: http://www.grid5000.fr.
Your login on the Grid5000 platform is: username.
The next two steps for you are now to:
1/ Finish setting up your access to the platform by creating a password and an SSH key.
To do so, open the following URL:
https://public-api.grid5000.fr/stable/users/setup_password?password_token=XXXXXXXXXXXXXXXXXXXXXX#special.
2/ Read carefully the two following pages:
The Grid5000 getting started documentation (https://www.grid5000.fr/w/Getting_Started),
which gives important information on how to use the platform.
The Grid5000 usage policy (https://www.grid5000.fr/w/Grid5000:UsagePolicy),
which gives the rules that MUST be followed when using the platform. Note that any
abuse will automatically be detected and reported to your manager.
Follow the steps in the email before continuing.
Connect to Grid'5000
Add these lines to your ~/.ssh/config file, replacing G5K_USERNAME with your username:
Host g5k
User G5K_USERNAME
Hostname access.grid5000.fr
ForwardAgent no
Host *.g5k
User G5K_USERNAME
ProxyCommand ssh g5k -W "$(basename %h .g5k):%p"
ForwardAgent no
You should be able to access the different sites of Grid'5000:
ssh grenoble.g5k
We can, for example, use the Grenoble site for this tutorial.
Tips
SSH connections can break, which is quite annoying when running experiments.
We thus recommend using tmux to deal with this.
First job reservation
You can now try to make a job reservation. Grid'5000 uses OAR as its resource and job manager.
oarsub -I --project lab-2025-compas-nxc
- oarsub is the command to make a submission
- -I means that this is an interactive job
- --project lab-2025-compas-nxc is the accounting project for this tutorial
This command should give you access to a node that is not the frontend (fgrenoble), but a node from the dahu cluster (dahu-X).
You can execute some commands there.
And once you are done, just exit the shell, and you will be back on the frontend.
As this is an interactive job, the job will be killed as soon as you exit the shell.
tmux
We recommend using tmux on Grid'5000, as the connection between your laptop and Grid'5000 could break and you could lose access to your work.
A cheat sheet of tmux is available here: https://tmuxcheatsheet.com/.
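If you have not used tmux before, these few commands cover the common cases (the session name tuto is just an example):
tmux new -s tuto       # start a new session named "tuto"
tmux ls                # list existing sessions
tmux attach -t tuto    # re-attach after a disconnection (detach with Ctrl-b then d)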
Installing NixOS Compose
The Python API
We can install NixOS Compose via pip:
pip install nixos-compose
You might need to modify your $PATH:
echo "export PATH=$PATH:~/.local/bin" >> ~/.bash_profile
source ~/.bash_profile
The Nix package manager
NixOS-Compose unsurprisingly needs Nix. Let us install it locally:
nxc helper install-nix
This will put the Nix store in ~/.local/share.
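To quickly check that the installation went well, you can ask Nix for its version (assuming the helper put nix on your PATH; you may need to re-open your shell first):
nix --version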
Set up a preloaded store
To speed up the tutorial a bit, we will preload the store to avoid downloading a lot of packages:
curl -sL https://gitlab.inria.fr/nixos-compose/tuto-nxc/-/raw/main/misc/import-base-store.sh | bash
Create your first composition
Before jumping into the creation of an environment for the IOR benchmark, let us go through a simpler example.
In the next sections, we will present the notions and commands of NixOS Compose.
Start from a template
NXC provides several templates, which are good starting points. Let us use the basic one.
mkdir tuto
cd tuto
nxc init -t basic
The previous command created 3 files in the tuto folder (you can verify with the quick listing shown below):
- nxc.json: a JSON file required by NXC. You never have to modify it.
- flake.nix: the Nix file responsible for locking all the inputs.
- composition.nix: the Nix file representing the distributed environment.
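A simple listing of the tuto folder should show exactly these files (output shown for illustration):
ls
# composition.nix  flake.nix  nxc.json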
Inspect the composition.nix file
If you open the composition.nix file, you will find the following:
{ pkgs, ... }: {
roles = {
foo = { pkgs, ... }:
{
# add needed package
# environment.systemPackages = with pkgs; [ socat ];
};
};
testScript = ''
foo.succeed("true")
'';
}
The composition is a function that takes a set as input ({ pkgs, ... }) and returns a set containing:
- a testScript string
- a roles set of NixOS configurations
What interests us for the moment is the roles set.
In the example above, we define a single role named foo with an empty configuration.
We can add packages to the environment by uncommenting the environment.systemPackages line:
{ pkgs, ... }: {
roles = {
foo = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ socat ];
};
};
testScript = ''
foo.succeed("true")
'';
}
Local deployment
In the NixOS Compose workflow, the idea is to first iterate with lightweight flavours such as docker or vm.
Before deploying at full scale to Grid'5000, let's try to deploy the environment locally on a single machine.
If you have a Linux machine, you can run NixOS Compose locally; otherwise, you can also use Grid'5000.
Note: all the commands are to be executed from the project's root directory (the tuto folder).
On your local Linux machine
Installing NixOS Compose
Please refer to the previous section to install NixOS Compose and Nix on your machine.
Enter the Nix environment
nix develop
Build the composition
nxc build -f vm
Start the composition
nxc start
Connect to the virtual machines
In another terminal:
nxc connect
It should open a tmux window with access to the virtual machine.
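Inside the virtual machine, a quick sanity check is to print the hostname, which matches the role name defined in the composition:
[root@foo:~]# hostname
foo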
On Grid'5000
In this section, we present how to deploy virtual machines on a Grid'5000 node.
Note that this is not the common usage of NixOS Compose, and thus the workflow is a bit cumbersome.
For instance, make sure that you are not inside a tmux instance started on the frontend of the Grid'5000 site.
Connect to a compute node
oarsub --project lab-2025-compas-nxc -I
Start tmux
tmux
Build the composition
nxc build -f vm
Install vde2
In order to make the different virtual machines communicate, we need a virtual switch.
We use vde2, which we will install:
sudo-g5k apt install -y vde2
Start the composition
nxc start
Connect
Create a new tmux pane (⌃ Control + B, then %):
nxc connect
It should open a tmux window with access to the virtual machine.
Check the presence of socat
[root@foo:~]# socat
2025/05/29 14:46:51 socat[82883] E exactly 2 addresses required (there are 0); use option "-h" for help
Build your first composition
You can build the composition with the nxc build command.
It takes as argument a target platform, which we call a flavour.
nxc build -f <FLAVOUR>
There are different flavours that NixOS Compose can build:
- docker: generates a Docker Compose configuration
- nspawn: (experimental) generates lightweight containers runnable with systemd-nspawn
- vm: generates QEMU virtual machines
- g5k-nfs-store: generates kernel images and an initrd without a packed /nix/store, mounting the store of the frontend instead; also deployed with kexec
- g5k-image: generates a full system image
In this tutorial we will focus only on g5k-nfs-store, and on g5k-image if you have time.
For example, let us build the composition with the g5k-nfs-store flavour:
nxc build -f g5k-nfs-store
Deploying the g5k-nfs-store flavour
Reserve the nodes
Let us reserve 1 machine for an hour on Grid'5000:
export $(oarsub --project lab-2025-compas-nxc -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
You can use oarstat -u to check the status of the reservation.
List the reserved machines
NixOS-Compose needs a list of target machines on which to deploy the software environments. This list will be written to the file OAR.$OAR_JOB_ID.stdout in the current directory once the machines are available.
cat OAR.$OAR_JOB_ID.stdout
This should output something like:
dahu-2.grenoble.grid5000.fr
If the file does not exist yet, your reservation might not be ready. You can check with oarstat -u, looking at the "S" column: "W" means waiting and "R" means running. In any case, the next command will wait for the creation of this file.
Deploy
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Release the node
oardel $OAR_JOB_ID
Deploying the g5k-image flavour
Reserve the nodes
Let us deploy this composition on 1 physical machine:
export $(oarsub --project lab-2025-compas-nxc -t deploy -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
You can use oarstat -u to check the status of the reservation.
Deploy
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Release the node
oardel $OAR_JOB_ID
Connect to the nodes
Once the deployment is over, you can connect to the nodes via the nxc connect command.
nxc connect
It will open a tmux session with a pane per node, making it easy to navigate between the nodes.
You can provide a hostname to the command to connect to a specific host.
nxc connect foo
Check the presence of socat
[root@foo:~]# socat
2025/05/29 14:46:51 socat[82883] E exactly 2 addresses required (there are 0); use option "-h" for help
IOR
IOR is a parallel IO benchmark that can be used to test the performance of parallel storage systems using various interfaces and access patterns. It uses a common parallel I/O abstraction backend and relies on MPI for synchronization.
You can find the documentation here
Adding IOR to the composition
Start from a template
mkdir ior_bench
cd ior_bench
nxc init -t basic
Add IOR to the composition
The IOR benchmark is available in nixpkgs and is thus accessible in pkgs.
We also need openmpi to run the benchmark.
# composition.nix
{ pkgs, ... }: {
roles = {
foo = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior ];
};
};
testScript = ''
foo.succeed("true")
'';
}
Run the benchmark in the environment
As done previously, we have to go through the nxc build and nxc start phases.
Building
nxc build -f g5k-nfs-store
Getting the node
export $(oarsub --project lab-2025-compas-nxc -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Deploying
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Running the benchmark
Once the environment is deployed, we connect with nxc connect.
We can now run the benchmark.
ior
The output should look something like this:
[root@foo:~]# ior
IOR-3.3.0: MPI Coordinated Test of Parallel I/O
Began : Tue Sep 13 14:08:28 2022
Command line : ior
Machine : Linux foo
TestID : 0
StartTime : Tue Sep 13 14:08:28 2022
Path : /root
FS : 1.9 GiB Used FS: 1.3% Inodes: 0.5 Mi Used Inodes: 0.1%
Options:
api : POSIX
apiVersion :
test filename : testFile
access : single-shared-file
type : independent
segments : 1
ordering in a file : sequential
ordering inter file : no tasks offsets
nodes : 1
tasks : 1
clients per node : 1
repetitions : 1
xfersize : 262144 bytes
blocksize : 1 MiB
aggregate filesize : 1 MiB
Results:
access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
write 3117.60 13205 0.000076 1024.00 256.00 0.000014 0.000303 0.000004 0.000321 0
read 3459.03 15107 0.000066 1024.00 256.00 0.000022 0.000265 0.000002 0.000289 0
remove - - - - - - - - 0.000121 0
Max Write: 3117.60 MiB/sec (3269.04 MB/sec)
Max Read: 3459.03 MiB/sec (3627.06 MB/sec)
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
write 3117.60 3117.60 3117.60 0.00 12470.38 12470.38 12470.38 0.00 0.00032 NA NA 0 1 1 1 0 0 1 0 0 1 1048576 262144 1.0 POSIX 0
read 3459.03 3459.03 3459.03 0.00 13836.14 13836.14 13836.14 0.00 0.00029 NA NA 0 1 1 1 0 0 1 0 0 1 1048576 262144 1.0 POSIX 0
Finished : Tue Sep 13 14:08:28 2022
If the previous command fails, it might be because of the network interface of the Grid'5000 node you deployed on.
You can try to run the ior command in a more explicit way:
[root@foo:~]# mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root -np 4 ior
Release the booking
Now that we are done, we exit the connection to the node and release the reservation.
oardel $OAR_JOB_ID
Add a PFS to the composition
For the moment the tests are only performed on the local file system of one computer.
Let us set up a parallel file system to evaluate its performance. For this tutorial, it will be GlusterFS.
The approach will be the following:
- add a new role server to the composition
- set up the PFS server on the server role
- mount the PFS export on the compute nodes
Add a role to the composition
To add another role to the composition, we only need to add a new element to the roles set.
Let us also rename the compute node to node and empty the testScript:
{ pkgs, ... }: {
roles = {
node = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior ];
};
server = { pkgs, ... }:
{
# ...
};
};
testScript = ''
'';
}
Setting up the GlusterFS server
To set up the GlusterFS server, we need to configure several things:
- open ports for the clients to connect: we will actually disable the entire firewall for simplicity's sake, but we could be more precise about which ports we open
- enable the systemd service for the GlusterFS server
- define the export point
server = { pkgs, ... }: {
# Disable the firewall
networking.firewall.enable = false;
# Enable the glusterfs server services
services.glusterfs.enable = true;
# Define a partition at /srv that will host the blocks
fileSystems = {
"/srv" = {
device = "/dev/disk/by-partlabel/KDPL_TMP_disk0";
fsType = "ext4";
};
};
# we also add the htop package for light monitoring
environment.systemPackages = with pkgs; [ htop ];
};
Note that the KDPL_TMP_disk0 label is only valid on Grid'5000.
Mount the GlusterFS server on the compute nodes
We now need to make the compute nodes mount the PFS server.
To do this, we will also disable the firewall and create a new mount point (/data in our case):
node = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];
# Disable the firewall
networking.firewall.enable = false;
# Mount the PFS
fileSystems."/data" = {
device = "server:/gv0";
fsType = "glusterfs";
};
};
The gv0 part refers to the GlusterFS volume (not yet created!).
Test the GlusterFS server
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
This time, we reserve two nodes.
export $(oarsub --project lab-2025-compas-nxc -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Starting the nodes
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Connect
nxc connect
You should now have a tmux session with 2 panes: one for the server and one for the node.
Setting up the GlusterFS volume
We need to create the GlusterFS volume and start it.
From the server node, run the following commands:
mkdir -p /srv/gv0
gluster volume create gv0 server:/srv/gv0
gluster volume start gv0
Mounting the volume from the compute node
As we already defined the mount in the composition, we can simply restart the corresponding systemd mount unit:
systemctl restart data.mount
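To confirm that the volume is mounted, you can, for example, check from the node (an optional sanity check, not part of the original workflow):
df -h /data
systemctl status data.mount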
Testing the GlusterFS server
Go to the node and change directory to the mounted volume:
cd /data
A quick ls shows that it is empty.
Create a file in it:
touch glusterfs_works
Now go to the server node and check the directory of the GlusterFS server:
ls /srv/gv0
If the glusterfs_works file exists, everything worked fine!
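You can also inspect the volume from the server with GlusterFS's standard commands (optional check):
gluster volume info gv0
gluster volume status gv0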
Release the nodes
oardel $OAR_JOB_ID
Creating a service
Currently, the creation of the GlusterFS volume is manual, but we would like to do it automatically at boot time.
To do this, we will create a systemd service and use it in the composition.
In NixOS, a service (or module) is composed of two parts: the interface and the implementation.
In the previous sections, you have already interacted with services! For example:
...
services.glusterfs.enable = true;
...
Creating the module
Let's create a new file to store the content of the service:
# my-module.nix
{ config, lib, pkgs, ... }:
with lib;
let
cfg = config.services.my-glusterfs;
in
{
################################################
#
# Interface
#
options = {
services.my-glusterfs = {
enable = mkEnableOption "My glusterfs";
package = mkOption {
type = types.package;
default = pkgs.glusterfs;
};
volumePath = mkOption {
type = types.str;
default = "/srv";
};
volumeName = mkOption {
type = types.str;
default = "gv0";
};
};
};
################################################
#
# Implementation
#
config = mkIf (cfg.enable) {
systemd.services.my-glusterfs = {
description = "My GlusterFS module";
wantedBy = [ "multi-user.target" ];
after = [ "glusterd.service" "glustereventsd.service" ];
serviceConfig.Type = "oneshot";
script =
''
if [ ! $(${cfg.package}/bin/gluster volume list | grep ${cfg.volumeName}) ]
then
mkdir -p ${cfg.volumePath}/${cfg.volumeName}
${cfg.package}/bin/gluster volume create ${cfg.volumeName} server:${cfg.volumePath}/${cfg.volumeName}
${cfg.package}/bin/gluster volume start ${cfg.volumeName}
fi
'';
};
};
}
OK, let's decipher what all of this means.
- we created a service called my-glusterfs
- the service has 4 options:
  - enable: whether to enable the service
  - package: the Nix package containing the glusterfs binaries
  - volumePath: the path on the server where the volume will be created
  - volumeName: the name of the volume
- then, in the implementation part, we specify:
  - that this service is wanted by the multi-user.target
  - that this service must be executed after the GlusterFS daemons have started (glusterd.service and glustereventsd.service)
  - that this service must only be run once (oneshot)
  - and finally, the commands to run. They are the same as the ones seen in the previous section, but we use the configuration of the service: cfg.volumePath and cfg.volumeName.
Call this service
Let's now use this service in the composition.
server = { pkgs, ... }: {
# We import the definition of the service
imports = [ ./my-module.nix ];
services.my-glusterfs = {
enable = true; # We activate our service
volumePath = "/srv"; # We define where the volume will be
volumeName = "gv0"; # and the name of the volume
};
networking.firewall.enable = false;
services.glusterfs.enable = true;
fileSystems = {
"/srv" = {
device = "/dev/disk/by-partlabel/KDPL_TMP_disk0";
fsType = "ext4";
};
};
environment.systemPackages = with pkgs; [ htop ];
};
Now, every time the server boots, it will create the volume and start it, so that it is available for the nodes to mount.
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
This time, we reserve two nodes.
export $(oarsub --project lab-2025-compas-nxc -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Starting the nodes
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Connect
nxc connect
Re-mount the volumes on the node
On the node:
systemctl restart data.mount
You can now use the volume from the node!
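To check that the my-glusterfs service did its job at boot, you can inspect it from the server pane (a quick sanity check; the unit name comes from the module above):
systemctl status my-glusterfs
journalctl -u my-glusterfs
gluster volume list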
Add another compute node
For the moment we have a single compute node. In this section, we will add another one and run IOR on several nodes.
Add another role in the composition
We will rename the role node to node1 and create a new role node2 with the exact same configuration:
roles = {
node1 = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];
# Disable the firewall
networking.firewall.enable = false;
# Mount the PFS
fileSystems."/data" = {
device = "server:/gv0";
fsType = "glusterfs";
};
};
node2 = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];
# Disable the firewall
networking.firewall.enable = false;
# Mount the PFS
fileSystems."/data" = {
device = "server:/gv0";
fsType = "glusterfs";
};
};
server = { pkgs, ... }:
{
      # ...
    };
}
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub --project lab-2025-compas-nxc -l nodes=3,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Deploying
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Connect to the nodes
nxc connect
Remount the volume from the nodes (run this command once from any of the nodes):
cat /etc/hosts | grep node | cut -f2 -d" " | xargs -t -I{} systemctl --host root@{} restart data.mount
After building and starting the environment, we now have 3 nodes: node1, node2, and the server.
We can now try to run IOR with MPI from the nodes, writing to the PFS (/data).
All the deployed machines already know each other (you can look at /etc/hosts to verify).
So we will create the MPI hostfile myhosts in /data:
cd /data
printf "node1 slots=8\nnode2 slots=8" > myhosts
The /data/myhosts file should look like:
node1 slots=8
node2 slots=8
Now, from any node (node1 or node2), we can start the benchmark (without the high-performance network of Grid'5000) with:
cd /data
mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root --hostfile myhosts -np 16 ior
Release the nodes
oardel $OAR_JOB_ID
Generalization of the composition
As you can see from the previous section, scaling the number of compute nodes is a bit cumbersome.
Fortunately, NixOS Compose provides the notion of role to tackle this issue.
A role is a configuration.
In our case, we actually have only two roles: the GlusterFS server and the compute nodes.
The configuration of the compute nodes is the same no matter how many of them there are.
Thus, having to define the configuration for node1 and node2 is redundant.
roles = {
node = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];
# Disable the firewall
networking.firewall.enable = false;
# Mount the PFS
fileSystems."/data" = {
device = "server:/gv0";
fsType = "glusterfs";
};
};
server = { pkgs, ... }:
{
      # ...
    };
}
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub --project lab-2025-compas-nxc -l nodes=3,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Starting the nodes
The nxc start command can take an additional YAML file as input, describing the number of machines per role as well as their hostnames.
The following YAML file will create 3 machines: one server (one instance per role by default) and two node instances (node1 and node2).
# nodes.yaml
node: 2
You can deploy the composition by passing this YAML file to the nxc start command:
nxc start -m OAR.$OAR_JOB_ID.stdout -W nodes.yaml
Connect to the nodes
nxc connect
Remount the volume (to run once on any of the nodes):
cat /etc/hosts | grep node | cut -f2 -d" " | xargs -t -I{} systemctl --host root@{} restart data.mount
Release the nodes
oardel $OAR_JOB_ID
Adding a configuration file
IOR can take a configuration file as input. In this section we will integrate this configuration file into the composition.
An IOR configuration file looks something like this:
IOR START
api=POSIX
testFile=testFile
hintsFileName=hintsFile
multiFile=0
interTestDelay=5
readFile=1
writeFile=1
filePerProc=0
checkWrite=0
checkRead=0
keepFile=1
quitOnError=0
outlierThreshold=0
setAlignment=1
singleXferAttempt=0
individualDataSets=0
verbose=0
collective=0
preallocate=0
useFileView=0
keepFileWithError=0
setTimeStampSignature=0
useSharedFilePointer=0
useStridedDatatype=0
uniqueDir=0
fsync=0
storeFileOffset=0
maxTimeDuration=60
deadlineForStonewalling=0
useExistingTestFile=0
useO_DIRECT=0
showHints=0
repetitions=3
numTasks=16
segmentCount=16
blockSize=4k
transferSize=1k
summaryFile=/tmp/results_ior.json
summaryFormat=JSON
RUN
IOR STOP
It gathers the information about the experiment.
It can then be run as:
ior -f <IOR_CONFIG_FILE>
We will put this configuration file in the composition in order to make the experiment reproducible.
Let us store this file locally as script.ior.
We can then create a file in /etc/ that points to the content of the file in the Nix store.
In the following configuration we write the file at /etc/ior_script:
# ...
node = { pkgs, ... }: {
networking.firewall.enable = false;
environment.systemPackages = with pkgs; [ openmpi ior glusterfs ];
environment.etc = {
ior_script = {
text = builtins.readFile ./script.ior;
};
};
# Mount the PFS
fileSystems."/data" = {
device = "server:/gv0";
fsType = "glusterfs";
};
};
# ...
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub --project lab-2025-compas-nxc -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Starting the nodes
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Connecting
nxc connect
Running the benchmark with the script
Remount the volume (to run once on any of the nodes):
cat /etc/hosts | grep node | cut -f2 -d" " | xargs -t -I{} systemctl --host root@{} restart data.mount
From the node:
cd /data
ior -f /etc/ior_script
Release the nodes
oardel $OAR_JOB_ID
Adding scripts to the environment
The command to start the benchmark is quite obscure and cumbersome to type. We would like to create a script to wrap it. However, creating a full Nix package for such a small script is not worth it. Fortunately, Nix provides ways to easily create reproducible bash (or other) scripts.
Let us create a Nix file called myscripts.nix. It will be a function that takes pkgs as input and returns a set containing our scripts.
{ pkgs, ... }:
let
# We define some constants here
nfsMountPoint = "/data";
nbProcs = 16;
iorConfig = "/etc/ior_script";
in {
# This function takes the number of compute nodes,
# creates a hostfile for MPI and runs the benchmark
start_ior =
pkgs.writeScriptBin "start_ior" ''
cd ${nfsMountPoint}
NB_NODES=$(cat /etc/hosts | grep node | wc -l)
NB_SLOTS_PER_NODE=$((${builtins.toString nbProcs} / $NB_NODES))
cat /etc/hosts | grep node | awk -v nb_slots="$NB_SLOTS_PER_NODE" '{ print $2 " slots=" nb_slots;}' > my_hosts
mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root -np ${builtins.toString nbProcs} --hostfile my_hosts ior -f ${iorConfig}
'';
remount_glusterfs =
pkgs.writeScriptBin "remount_glusterfs" ''
cat /etc/hosts | grep node | cut -f2 -d" " | xargs -t -I{} systemctl --host root@{} restart data.mount
'';
}
We can now import these scripts in the composition:
# ...
node = { pkgs, ... }:
let
scripts = import ./myscripts.nix { inherit pkgs; };
in
{
networking.firewall.enable = false;
environment.systemPackages = with pkgs; [ openmpi ior glusterfs scripts.start_ior scripts.remount_glusterfs ];
environment.etc = {
ior_script = {
text = builtins.readFile ./script.ior;
};
};
# Mount the PFS
fileSystems."/data" = {
device = "server:/gv0";
fsType = "glusterfs";
};
};
# ...
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub --project lab-2025-compas-nxc -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Starting the nodes
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Connecting
nxc connect
Running the script
Remount the volumes on the nodes (to run once on a single node):
remount_glusterfs
We can now simply run start_ior from the node to run the benchmark.
start_ior
Release the nodes
oardel $OAR_JOB_ID
Integration with an Experiment Engine
For now we can deploy a reproducible distributed environment, connect to the nodes, and run commands. What we would like to do now is to automate the execution of the commands in an experiment script that we can easily rerun.
Fortunately, NixOS-Compose provides an integration with Execo, an experiment engine for Grid'5000. Execo is a Python library that abstracts the usual operations on Grid'5000 (submitting jobs, deploying, executing commands, etc.).
Let's see how to use Execo and NixOS-Compose to run reproducible experiments.
Starting Point
The snippet below represents a good starting point for an Execo script with NixOS-Compose.
# script.py
from nixos_compose.nxc_execo import get_oar_job_nodes_nxc
from nixos_compose.g5k import key_sleep_script
import os
from execo import Remote
from execo_engine import Engine, logger, ParamSweeper, sweep
from execo_g5k import oardel, oarsub, OarSubmission, wait_oar_job_start
class NXCEngine(Engine):
    def __init__(self):
        super(NXCEngine, self).__init__()
        parser = self.args_parser
        parser.add_argument('--nxc_build_file', help='Path to the NXC build file')
        parser.add_argument('--flavour', help='Flavour to deploy')
        self.nodes = {}
        self.oar_job_id = -1
        # --- Where and how many nodes ----
        self.nb_nodes = 2
        self.site = "grenoble"
        self.cluster = "dahu"

    def init(self):
        # --- Reservation ----
        duration = 15 * 60  # seconds
        if self.args.flavour == "g5k-image":
            self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", duration, job_type=["deploy"], project="lab-2025-compas-nxc"), self.site)])[0]
        else:
            self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", duration, job_type=[], project="lab-2025-compas-nxc", command=f"{key_sleep_script} {duration}"), self.site)])[0]
        wait_oar_job_start(self.oar_job_id, site)  # wait for the job to start, otherwise we might get a timeout in `get_oar_job_nodes_nxc`
        # --- How many nodes per role ---
        roles_quantities = {"server": ["server"], "node": ["node"]}
        # --- Deploy and populate the dict `self.nodes` accordingly ---
        self.nodes, self.roles = get_oar_job_nodes_nxc(
            self.oar_job_id,
            site,
            flavour_name=self.args.flavour,
            compose_info_file=os.environ['HOME'] + "/.local/share/nix/root" + os.readlink(self.args.nxc_build_file),
            roles_quantities=roles_quantities)

    def run(self):
        my_command = "echo \"Hello from $(whoami) at $(hostname) ($(ip -4 addr | grep \"/20\" | awk '{print $2;}'))\" > /tmp/hello"
        hello_remote = Remote(my_command, self.roles["server"], connection_params={'user': 'root'})
        hello_remote.run()
        my_command2 = "cat /tmp/hello"
        cat_remote = Remote(my_command2, self.roles["server"], connection_params={'user': 'root'})
        cat_remote.run()
        for process in cat_remote.processes:
            print(process.stdout)
        # --- Giving back the resources ---
        oardel([(self.oar_job_id, self.site)])

if __name__ == "__main__":
    NXCEngine().start()
Make sure you are in an environment with NixOS-Compose available.
You can then run python3 script.py --help.
The script takes two arguments:
- nxc_build_file: the path to the result of nxc build. Most probably it will be under build/composition::FLAVOUR.json
- flavour: on Grid'5000 it can be g5k-nfs-store, g5k-ramdisk, or g5k-image
Let's try to run the script for the g5k-nfs-store flavour:
python3 script.py --nxc_build_file $(pwd)/build/composition::g5k-nfs-store --flavour g5k-nfs-store
You should see the logs from Execo telling you that it is doing the reservation to OAR, and starting deploying.
When the deployment is finished, you can see that the commands that we ran in the run function of script.py are being executed.
Run a real experiment
The code above just shows the basics of Execo.
In this section, we will run a more realistic experiment calling the start_ior command that we packaged in a previous section.
# script.py
from nixos_compose.nxc_execo import get_oar_job_nodes_nxc
from nixos_compose.g5k import key_sleep_script
import os
import time
from execo import Remote
from execo_engine import Engine, logger, ParamSweeper, sweep
from execo_g5k import oardel, oarsub, OarSubmission, wait_oar_job_start
class NXCEngine(Engine):
    def __init__(self):
        super(NXCEngine, self).__init__()
        parser = self.args_parser
        parser.add_argument('--nxc_build_file', help='Path to the NXC build file')
        parser.add_argument('--flavour', help='Flavour to deploy')
        parser.add_argument('--nb_nodes', help='Number of nodes')
        parser.add_argument('--result_file', help='path to store the results')
        self.nodes = {}
        self.oar_job_id = -1
        # --- Where and how many nodes ----
        self.site = "grenoble"
        self.cluster = "dahu"

    def init(self):
        self.nb_nodes = int(self.args.nb_nodes)
        assert self.nb_nodes >= 2, "I need at least two nodes"
        # --- Reservation ----
        duration = 15 * 60  # seconds
        if self.args.flavour == "g5k-image":
            self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", duration, job_type=["deploy"], project="lab-2025-compas-nxc"), self.site)])[0]
        else:
            self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", duration, job_type=[], project="lab-2025-compas-nxc", command=f"{key_sleep_script} {duration}"), self.site)])[0]
        wait_oar_job_start(self.oar_job_id, site)  # wait for the job to start, otherwise we might get a timeout in `get_oar_job_nodes_nxc`
        # --- How many nodes per role ---
        # We want one server and all the other nodes are `node`
        roles_quantities = {"server": ["server"], "node": [f"node{i}" for i in range(1, self.nb_nodes)]}
        # --- Deploy and populate the dict `self.nodes` accordingly ---
        self.nodes, self.roles = get_oar_job_nodes_nxc(
            self.oar_job_id,
            site,
            flavour_name=self.args.flavour,
            compose_info_file=os.environ['HOME'] + "/.local/share/nix/root" + os.readlink(self.args.nxc_build_file),
            roles_quantities=roles_quantities)

    def run(self):
        result_file = self.args.result_file
        time.sleep(10)
        remount_volume_remote = Remote("remount_glusterfs", self.roles["node"][0], connection_params={'user': 'root'})
        remount_volume_remote.run()
        run_ior_remote = Remote("start_ior", self.roles["node"][0], connection_params={'user': 'root'})
        run_ior_remote.run()
        get_file_remote = Remote(f"cp /tmp/results_ior.json {result_file}", self.roles["node"][0], connection_params={'user': 'root'})
        get_file_remote.run()
        oardel([(self.oar_job_id, self.site)])

if __name__ == "__main__":
    NXCEngine().start()
The previous script can be run with:
python3 script.py --nxc_build_file $(pwd)/build/composition::g5k-nfs-store --flavour g5k-nfs-store --nb_nodes 2 --result_file $(pwd)/ior_results.json
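Once the script completes, the IOR JSON summary is available locally in ior_results.json; you can pretty-print it with jq, for example (jq is also used in the job-deletion one-liner later in this tutorial):
jq . ior_results.json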
Deploying OAR
NixOS Compose can deploy complex distributed systems, such as a cluster managed by the OAR batch scheduler.
Clone the composition
git clone --depth=1 git@github.com:oar-team/oar-nixos-compose.git
Variants
This repository contains several variants of the composition deploying OAR.
Let's use the master one:
cd master/
Inspect the composition
Take a look at the composition.nix file.
You can see that there are 3 roles:
- frontend: this role is responsible for receiving the job requests from the users (the oarsub commands). It plays the same part as the frontend nodes on Grid'5000 (e.g., fgrenoble, fnancy).
- server: this role hosts the "brain" of OAR. It is responsible for scheduling the jobs.
- node: this role represents the compute nodes of the cluster. Notice the roleDistribution attribute: it can generate several instances of a role (this is similar to the nodes.yaml file seen in 'Generalization of the composition', but more static).
Build the composition
nxc build -f g5k-nfs-store
Start the composition
Reserve the nodes
We thus need to reserve 4 nodes (1 frontend, 1 server, and 2 node instances):
export $(oarsub --project lab-2025-compas-nxc -l nodes=4,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Deploy the composition
nxc start -m OAR.$OAR_JOB_ID.stdout -W
Connect to the frontend
nxc connect frontend
Submitting an OAR job in our deployed OAR
[root@frontend:~]# su user1
[user1@frontend:/root]# cd ~
[user1@frontend:/root]# oarsub -I
You should then be connected to one of the compute nodes (node1 or node2) with OAR_JOB_ID=1.
Release the nodes
When you are done, do not forget to release the nodes from the Grid'5000 frontend:
oardel $OAR_JOB_ID
Parametrized Builds
We often want to have variation in the environment. In our example, it could be the configuration of the NFS server.
One solution could be to have a separate composition.nix file for each variant, but that introduces a lot of redundancy.
Instead, we will use the notion of setup in NixOS Compose to parametrize the composition.
Add the setup to the flake.nix
In the flake.nix file, add the following:
# ...
outputs = { self, nixpkgs, nxc }:
let
system = "x86_64-linux";
in {
packages.${system} = nxc.lib.compose {
inherit nixpkgs system;
composition = ./composition.nix;
# Defines the setup
setup = ./setup.toml;
};
defaultPackage.${system} =
self.packages.${system}."composition::nixos-test";
devShell.${system} = nxc.devShells.${system}.nxcShellFull;
};
}
Create the setup.toml file
Create a setup.toml file and add the following content:
# setup.toml
[project]
[params]
nfsNbProcs=8
Use the parameters
The above setup.toml defines a parameter called nfsNbProcs.
The value of this parameter can be used in the composition.
It is available as setup.params.<VARIABLE_NAME>.
# composition.nix
# Add `setup` in the arguments of the composition
{ pkgs, setup, ... }: {
# ....
server = { pkgs, ... }: {
networking.firewall.enable = false;
services.nfs.server.enable = true;
services.nfs.server.exports = ''
/srv/shared *(rw,no_subtree_check,fsid=0,no_root_squash)
'';
services.nfs.server.createMountPoints = true;
# Use the value of the parameter
services.nfs.server.nproc = setup.params.nfsNbProcs;
environment.systemPackages = with pkgs; [ htop ];
};
}
Introduce variations
For the moment, we only have a single value for the parameter, but we want variation. In the current situation, we have an NFS server with 8 processes handling the requests. We want to see how this number of processes influences the performance of the IOR benchmark. We will create two variants:
- fast: with more processes for the NFS server (let us say 32)
- slow: with fewer processes (let us say 2)
We can add the variants in the setup.toml file as follows:
# setup.toml
[project]
[params]
nfsNbProcs=8
[slow.params]
# to select this variant use:
# nxc build --setup slow
nfsNbProcs=2
[fast.params]
# to select this variant use:
# nxc build --setup fast
nfsNbProcs=32
When building the composition, we can pick a variant with:
nxc build --setup fast
# or
nxc build -s slow
If there is no --setup or -s flag, NixOS Compose will use the default values of the parameters.
Release the nodes
Once we are done with the experiment, we have to give back the resources.
List all your running jobs
oarstat -u
Job id Name User Submission Date S Queue
---------- -------------- -------------- ------------------- - ----------
2155685 qguilloteau 2022-09-23 10:50:36 R default
Delete the job
oardel 2155685
Deleting the job = 2155685 ...REGISTERED.
The job(s) [ 2155685 ] will be deleted in the near future.
The one-liner
You can delete all of your jobs with this one command:
oarstat -u -J | jq --raw-output 'keys | .[]' | xargs -I {} oardel {}
Deploy the environment
To deploy the built flavour, use the start subcommand of nxc.
By default, it uses the last built environment from any flavour, but you can specify the flavour with the -f flag.
nxc start -f <FLAVOUR>
For the flavours on Grid'5000, we first need to reserve some resources before deploying. The next sections give more details.
Packaging the MADbench2 benchmark
TODO: create a repo with a make make install setup for MB2
Nix Expression
{ stdenv, openmpi }:
stdenv.mkDerivation {
name = "MADbench2";
src = ./.;
buildInputs = [ openmpi ];
installPhase = ''
mkdir -p $out/bin
mpicc -D SYSTEM -D COLUMBIA -D IO -o MADbench2.x MADbench2.c -lm
mv MADbench2.x $out/bin
'';
}
Questions:
- do we make them add to a flake this and import it later?
Try to run the benchmark locally
First, enter a nix shell with the packaged MADbench2 and openmpi.
nix shell .#MADbench2 nixpkgs#openmpi
Once in the shell, you can run the benchmark with:
mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1
It should output the following:
MADbench 2.0 IO-mode
no_pe = 4 no_pix = 640 no_bin = 8 no_gang = 1 sblocksize = 8 fblocksize = 8 r_mod = 1 w_mod = 1
IOMETHOD = POSIX IOMODE = SYNC FILETYPE = UNIQUE REMAP = CUSTOM
S_cc 0.00 [ 0.00: 0.00]
S_bw 0.00 [ 0.00: 0.00]
S_w 0.01 [ 0.01: 0.01]
-------
S_total 0.01 [ 0.01: 0.01]
W_cc 0.00 [ 0.00: 0.00]
W_bw 0.12 [ 0.12: 0.12]
W_r 0.00 [ 0.00: 0.00]
W_w 0.00 [ 0.00: 0.00]
-------
W_total 0.13 [ 0.13: 0.13]
C_cc 0.00 [ 0.00: 0.00]
C_bw 0.00 [ 0.00: 0.00]
C_r 0.00 [ 0.00: 0.00]
-------
C_total 0.00 [ 0.00: 0.00]
dC[0] = 0.00000e+00
What interests us in this tutorial is the total time spent writing: W_total.
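If you only want that line, you can filter the output, for example (just a convenience, not part of the benchmark):
mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1 | grep W_total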
Making MADbench2 available
Option 1: You have a flake repository with MADbench2 packaged
In the flake.nix file, add your repository as an input:
{
description = "nixos-compose - basic setup";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/21.05";
nxc.url = "git+https://gitlab.inria.fr/nixos-compose/nixos-compose.git";
mypkgs.url = "URL TO YOUR PKGS";
};
outputs = { self, nixpkgs, nxc, mypkgs }:
{
# ...
};
}
Then create an overlay adding the benchmark:
{
# ...
outputs = { self, nixpkgs, nxc, mypkgs }:
let
system = "x86_64-linux";
myOverlay = final: prev:
{
mb2 = mypkgs.packages.${system}.MADbench2;
};
in {
packages.${system} = nxc.lib.compose {
inherit nixpkgs system;
overlays = [ myOverlay ];
composition = ./composition.nix;
# setup = ./setup.toml;
};
# ...
};
Option 2: You have a local file packaging MADbench2
In this case, you do not have to add another input; you can simply call the Nix file packaging MADbench2 (MADbench2.nix in the example below).
Similarly, we create an overlay with the package to add.
{
description = "nixos-compose - basic setup";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/22.05";
nxc.url = "git+https://gitlab.inria.fr/nixos-compose/nixos-compose.git";
};
outputs = { self, nixpkgs, nxc }:
let
system = "x86_64-linux";
myOverlay = final: prev:
{
mb2 = prev.callPackage ./MADbench2.nix { };
};
in {
packages.${system} = nxc.lib.compose {
inherit nixpkgs system;
overlays = [ myOverlay ];
composition = ./composition.nix;
setup = ./setup.toml;
};
defaultPackage.${system} =
self.packages.${system}."composition::nixos-test";
devShell.${system} = nxc.devShells.${system}.nxcShellFull;
};
}
Adding MADbench2 to the composition
Once MADbench2 has been added to the flake.nix file (exposed as mb2 by the overlay), we can access the package inside the composition.
In our case, we want an environment with both MADbench2 and openmpi to run it.
{ pkgs, ... }: {
roles = {
foo = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi mb2 ];
};
};
testScript = ''
foo.succeed("true")
'';
}
Run the benchmark in the environment
As done previously, we have to go through the nxc build and nxc start phases.
Once the environment is deployed, we connect with nxc connect.
We can now try to run the benchmark.
mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1
Unfortunately, this will fail for several reasons:
- we are running as root
- there are not enough slots to start 4 processes
As we are only setting up the environment for now, we can work around these issues by adding some flags:
mpirun --allow-run-as-root --oversubscribe -np 4 MADbench2.x 640 8 1 8 8 1 1
Add a builder on Grid'5000
If you are doing this tutorial with a member of the NixOS Compose team, there is probably a builder machine on Grid'5000 to accelerate the builds. Using it requires a bit of configuration.
Ask for the <BUILDER> address. It should look like dahu-30.
Add your ssh key
You need to get access to the builder machine through ssh. To do this, we will copy your ssh key to the builder.
ssh-copy-id -i ~/.ssh/id_rsa.pub root@<BUILDER>
The password is nixos.
You should be able to log in to the <BUILDER> via ssh root@<BUILDER>.
Copy the Nix Config
We need to tell Nix where to find this builder and how to use it.
The configuration file should be under ~/.config/nix/nix.conf.
# ~/.config/nix/nix.conf
experimental-features = nix-command flakes
builders = ssh://root@<BUILDER>
cores = 0
extra-sandbox-paths =
max-jobs = 0
require-sigs = false
sandbox = false
sandbox-fallback = false
substituters = http://<BUILDER>:8080 ssh-ng://root@<BUILDER> https://cache.nixos.org/
builders-use-substitutes = true
trusted-public-keys = <BUILDER>:snBDi/dGJICacgRUw4nauQ8KkSksAAAhCvPVr9OGTwk=
system-features = nixos-test benchmark big-parallel kvm
allowed-users = *
trusted-users = root
Copy and paste the above configuration into the Nix configuration file.
Don't forget to replace <BUILDER> with the actual address of the builder!
You can do it with one command. Here we take the example where the builder is on node dahu-30.
sed -i 's/<BUILDER>/dahu-30/g' ~/.config/nix/nix.conf
Note that this configuration will not work the next time you log in to Grid'5000, as the builder node will have been released. You can then revert back to:
# ~/.config/nix/nix.conf
experimental-features = nix-command flakes
max-jobs = 64