Introduction
This tutorial presents NixOS Compose, a tool that generates and deploys fully reproducible system environments. It relies on the Nix package manager and its associated Linux distribution, NixOS.
You can find the associated publication here, and some partial documentation here.
First, we will set up the connection to the Grid'5000 machines. Then we will present NixOS Compose on a simple example to get you acquainted with its notions and commands. Finally, we will create, step by step, a reproducible environment for a distributed experiment.
Note: Aside from some SSH configuration, all of the commands will be executed on the Grid'5000 platform.
Grid'5000
Grid'5000 (a.k.a. g5k) is a French testbed for distributed experiments.
This tutorial relies on this platform to perform deployments; we have thus created accounts for you.
By the time you read these lines, your account should already have been created.
You should receive an email like:
Subject: [Grid5000-account] Your Grid5000 account was created by ...
Dear Firstname Lastname (username),
You receive this email because your manager (...) requested a Grid5000
account for you in the context of a tutorial. To get more information about
Grid5000, see the website: http://www.grid5000.fr.
Your login on the Grid5000 platform is: username.
The next two steps for you are now to:
1/ Finish setting up your access to the platform by creating a password and an SSH key.
To do so, open the following URL:
https://public-api.grid5000.fr/stable/users/setup_password?password_token=XXXXXXXXXXXXXXXXXXXXXX#special.
2/ Read carefully the two following pages:
The Grid5000 getting started documentation (https://www.grid5000.fr/w/Getting_Started),
which gives important information on how to use the platform.
The Grid5000 usage policy (https://www.grid5000.fr/w/Grid5000:UsagePolicy),
which gives the rules that MUST be followed when using the platform. Note that any
abuse will automatically be detected and reported to your manager.
Follow the steps in the email before continuing.
Connect to Grid'5000
Add these lines to your ~/.ssh/config file, replacing G5K_USERNAME with your username:
Host g5k
User G5K_USERNAME
Hostname access.grid5000.fr
ForwardAgent no
Host *.g5k
User G5K_USERNAME
ProxyCommand ssh g5k -W "$(basename %h .g5k):%p"
ForwardAgent no
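A quick way to see what the ProxyCommand above does: ssh replaces %h with the host you typed (e.g. grenoble.g5k), and basename strips the .g5k suffix to recover the real host name behind the access machine. This is a sketch of the substitution only, not a connection test:

```shell
# basename strips the ".g5k" suffix that the "Host *.g5k" pattern matched,
# leaving the real hostname that the access machine should forward to
basename grenoble.g5k .g5k   # prints "grenoble"
```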
You should be able to access the different sites of Grid'5000:
ssh grenoble.g5k
We will use the Grenoble site for this tutorial.
Tips
SSH connections can drop, which is quite annoying when doing experiments.
We thus recommend using tmux to deal with this.
First job reservation
You can try to make a job reservation. Grid'5000 uses OAR as its resource and job manager.
oarsub -I -t inner=2155707
- oarsub is the command to make a submission
- -I means that this is an interactive job
- -t inner=2155707 means that this job should be executed inside the job 2155707 (which is the job hosting all the jobs for this tutorial)
This command should give you access to a node that is not the frontend (fgrenoble), but a node from the dahu cluster (dahu-X).
You can execute some commands there.
And once you are done, just exit the shell, and you will be back on the frontend.
As this is an interactive job, the job will be killed as soon as you exit the shell.
Add a builder on Grid'5000
If you are doing this tutorial with a member of the NixOS Compose team, there is probably a builder machine on Grid'5000 to accelerate the builds. Using it requires a bit of configuration.
Ask for the <BUILDER> address. It should look like dahu-30.
Add your ssh key
You need access to the builder machine through ssh.
To do this, we will copy your ssh key to the builder.
ssh-copy-id -i ~/.ssh/id_rsa.pub root@<BUILDER>
The password is nixos.
You should be able to log in to the builder via ssh root@<BUILDER>.
Copy the Nix Config
We need to tell Nix where to find this builder and how to use it.
The configuration file should be under ~/.config/nix/nix.conf.
# ~/.config/nix/nix.conf
experimental-features = nix-command flakes
builders = ssh://root@<BUILDER>
cores = 0
extra-sandbox-paths =
max-jobs = 0
require-sigs = false
sandbox = false
sandbox-fallback = false
substituters = http://<BUILDER>:8080 ssh-ng://root@<BUILDER> https://cache.nixos.org/
builders-use-substitutes = true
trusted-public-keys = <BUILDER>:snBDi/dGJICacgRUw4nauQ8KkSksAAAhCvPVr9OGTwk=
system-features = nixos-test benchmark big-parallel kvm
allowed-users = *
trusted-users = root
Copy the configuration above into the nix configuration file.
Don't forget to replace <BUILDER> with the actual address of the builder!
You can do it with one command; here we take the example where the builder is on node dahu-30.
sed -i 's/<BUILDER>/dahu-30/g' ~/.config/nix/nix.conf
Note that this configuration will not work the next time you log in to Grid'5000, as the builder node will have been released. You can revert back to:
# ~/.config/nix/nix.conf
experimental-features = nix-command flakes
max-jobs = 64
Installing NixOS Compose
We can install NixOS Compose via pip:
pip install nixos-compose
You might need to modify your $PATH:
export PATH=$PATH:~/.local/bin
Create your first composition
Before jumping into the creation of an environment for the IOR benchmark, let us go through a simpler example.
In the next sections, we will present the notions and commands of NixOS Compose.
Start from a template
NXC provides several templates, which are good starting points.
Let us use the basic one.
mkdir tuto
cd tuto
nxc init -t basic
The previous command created 3 files in the tuto folder:
- nxc.json: a JSON file required by NXC. You never have to modify it.
- flake.nix: the Nix file responsible for locking all the inputs.
- composition.nix: the Nix file representing the distributed environment.
To ensure good reproducibility of the composition, Nix flakes use Git to track files.
Let us create a Git repository and add every created file to it:
git init
git add **
Inspect the composition.nix file
If you open the composition.nix file, you will find the following:
{ pkgs, ... }: {
roles = {
foo = { pkgs, ... }:
{
# add needed package
# environment.systemPackages = with pkgs; [ socat ];
};
};
testScript = ''
foo.succeed("true")
'';
}
The composition is a function that takes a set as input ({ pkgs, ... }) and returns a set containing:
- a testScript string
- a roles set of NixOS configurations
What interests us for the moment is the roles set.
In the example above, we define a single role named foo with an empty configuration.
We can add packages to the environment by uncommenting the environment.systemPackages line:
{ pkgs, ... }: {
roles = {
foo = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ socat ];
};
};
testScript = ''
foo.succeed("true")
'';
}
Build your first composition
You can build the composition with the nxc build command.
It takes as argument a target platform, which we call a flavour.
nxc build -f <FLAVOUR>
There are different flavours that NixOS Compose can build:
- docker: generates a Docker Compose configuration
- vm-ramdisk: generates QEMU virtual machines
- g5k-ramdisk: generates kernel images and initrds deployed with kexec
- g5k-nfs-store: generates kernel images and initrds without a packed /nix/store, mounting the store of the frontend instead; also deployed with kexec
- g5k-image: generates a full system image
In this tutorial we will focus on g5k-ramdisk and g5k-nfs-store, and on g5k-image if you have time.
For example, let us build the composition with the g5k-nfs-store flavour:
nxc build -f g5k-nfs-store
Deploy the environment
To deploy the built environment, use the start subcommand of nxc.
By default, it uses the last built environment from any flavour, but you can specify the flavour with the -f flag.
nxc start -f <FLAVOUR>
For the flavours on Grid'5000, we first need to reserve some resources before deploying. The next sections give more details.
g5k-ramdisk and g5k-nfs-store
Reserve the nodes
export $(oarsub -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
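The one-liner above is dense, so here is a sketch of the shell pattern it relies on, simulated with a hypothetical job id instead of a real oarsub call: oarsub prints an OAR_JOB_ID=<id> line among its output, grep keeps only that line, and export turns it into an environment variable for the following commands.

```shell
# Simulated oarsub output (hypothetical job id; real output has more lines)
fake_oarsub_output="[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=2155707"
# Keep only the OAR_JOB_ID=... line and export it as a variable
export $(printf '%s\n' "$fake_oarsub_output" | grep OAR_JOB_ID)
echo "$OAR_JOB_ID"   # prints "2155707"
```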
Deploy
nxc start -m machines
where machines is the file containing the addresses of all the nodes to use.
Here is an example:
dahu-1
dahu-2
dahu-4
You can generate this machines file with the command:
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines
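To see what the jq filter extracts, here is a sketch on a miniature, hypothetical oarstat -u -J output (the real JSON contains many more fields per job): it is an object keyed by job id, and the filter takes the first job and prints one assigned address per line.

```shell
# Miniature, hypothetical `oarstat -u -J` output: one job with two nodes
cat > /tmp/oarstat_sample.json <<'EOF'
{
  "2155685": {
    "assigned_network_address": [
      "dahu-1.grenoble.grid5000.fr",
      "dahu-2.grenoble.grid5000.fr"
    ]
  }
}
EOF
# Same filter as above, pointed at the sample file
jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' \
  /tmp/oarstat_sample.json > /tmp/machines
cat /tmp/machines
```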
g5k-image
Reserve the nodes
Below, the user wants to deploy their composition on 3 physical machines.
oarsub -l nodes=3,walltime=1:0:0 -t deploy
Deploy
nxc start -m machines
where machines is the file containing the addresses of all the nodes to use.
Here is an example:
dahu-1
dahu-2
dahu-4
You can generate this machines file with the command:
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines
Connect to the nodes
Once the deployment is over, you can connect to the nodes via the nxc connect command.
nxc connect
It will open a tmux session with a pane per node, making it easy to navigate between the nodes.
You can provide a hostname to the command to connect to a specific host.
nxc connect server
Release the nodes
Once we are done with the experiment, we have to give back the resources.
List all your running jobs
oarstat -u
Job id Name User Submission Date S Queue
---------- -------------- -------------- ------------------- - ----------
2155685 qguilloteau 2022-09-23 10:50:36 R default
Delete the job
oardel 2155685
Deleting the job = 2155685 ...REGISTERED.
The job(s) [ 2155685 ] will be deleted in the near future.
The one-liner
You can delete all of your jobs with this one command:
oarstat -u -J | jq --raw-output 'keys | .[]' | xargs -I {} oardel {}
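As a sketch, the keys filter lists every job id in the JSON object that oarstat -u -J returns (miniature, hypothetical output below); xargs then calls oardel once per id.

```shell
# Miniature, hypothetical `oarstat -u -J` output with two running jobs
cat > /tmp/jobs_sample.json <<'EOF'
{ "2155685": {}, "2155690": {} }
EOF
# `keys` lists the job ids (sorted); xargs would feed each one to oardel
jq --raw-output 'keys | .[]' /tmp/jobs_sample.json   # prints 2155685 then 2155690
```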
IOR
IOR is a parallel IO benchmark that can be used to test the performance of parallel storage systems using various interfaces and access patterns. It uses a common parallel I/O abstraction backend and relies on MPI for synchronization.
You can find the documentation here.
Adding IOR to the composition
Start from a template
mkdir ior_bench
cd ior_bench
nxc init -t basic
Add IOR to the composition
The IOR benchmark is available in nixpkgs and thus accessible in pkgs.
We also need openmpi to run the benchmark.
# composition.nix
{ pkgs, ... }: {
roles = {
foo = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior ];
};
};
testScript = ''
foo.succeed("true")
'';
}
Run the benchmark in the environment
As done previously, we have to go through the nxc build and nxc start phases.
Building
nxc build -f g5k-nfs-store
Getting the node
export $(oarsub -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines
Deploying
nxc start -m machines
Running the benchmark
Once the environment is deployed, we connect with nxc connect.
We can now try to run the benchmark.
ior
The output should look something like this:
[root@foo:~]# ior
IOR-3.3.0: MPI Coordinated Test of Parallel I/O
Began : Tue Sep 13 14:08:28 2022
Command line : ior
Machine : Linux foo
TestID : 0
StartTime : Tue Sep 13 14:08:28 2022
Path : /root
FS : 1.9 GiB Used FS: 1.3% Inodes: 0.5 Mi Used Inodes: 0.1%
Options:
api : POSIX
apiVersion :
test filename : testFile
access : single-shared-file
type : independent
segments : 1
ordering in a file : sequential
ordering inter file : no tasks offsets
nodes : 1
tasks : 1
clients per node : 1
repetitions : 1
xfersize : 262144 bytes
blocksize : 1 MiB
aggregate filesize : 1 MiB
Results:
access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
write 3117.60 13205 0.000076 1024.00 256.00 0.000014 0.000303 0.000004 0.000321 0
read 3459.03 15107 0.000066 1024.00 256.00 0.000022 0.000265 0.000002 0.000289 0
remove - - - - - - - - 0.000121 0
Max Write: 3117.60 MiB/sec (3269.04 MB/sec)
Max Read: 3459.03 MiB/sec (3627.06 MB/sec)
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
write 3117.60 3117.60 3117.60 0.00 12470.38 12470.38 12470.38 0.00 0.00032 NA NA 0 1 1 1 0 0 1 0 0 1 1048576 262144 1.0 POSIX 0
read 3459.03 3459.03 3459.03 0.00 13836.14 13836.14 13836.14 0.00 0.00029 NA NA 0 1 1 1 0 0 1 0 0 1 1048576 262144 1.0 POSIX 0
Finished : Tue Sep 13 14:08:28 2022
Add a NFS server to the composition
For the moment, we only have one node, which writes the results onto its own filesystem. However, we want to set up a distributed filesystem to evaluate its performance. We will take NFS as the example for this tutorial.
The approach will be the following:
- add a new role server to the composition
- set up the NFS server on the server
- mount the NFS server on the compute nodes
Add a role to the composition
To add another role to the composition, we only need to add a new element to the roles set.
Let us also rename the compute node to node and empty the testScript:
{ pkgs, ... }: {
roles = {
node = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior ];
};
server = { pkgs, ... }:
{
# ...
};
};
testScript = ''
'';
}
Setting up the NFS server
To set up the NFS server, we need to configure several things:
- open ports for the clients to connect: we will actually disable the entire firewall for simplicity's sake, but we could be more precise about which ports to open
- enable the NFS server systemd services
- define the export point: /srv/shared in our case
server = { pkgs, ... }: {
# Disable the firewall
networking.firewall.enable = false;
# Enable the nfs server services
services.nfs.server.enable = true;
# Define a mount point at /srv/shared
services.nfs.server.exports = ''
/srv/shared *(rw,no_subtree_check,fsid=0,no_root_squash)
'';
services.nfs.server.createMountPoints = true;
# we also add the htop package for light monitoring
environment.systemPackages = with pkgs; [ htop ];
};
Mount the NFS server on the compute nodes
We now need to make the compute nodes mount the NFS server.
To do this, we will also disable the firewall and create a new mount point (/data in our case):
node = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior ];
# Disable the firewall
networking.firewall.enable = false;
# Mount the NFS
fileSystems."/data" = {
device = "server:/";
fsType = "nfs";
};
};
Test the NFS server
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Getting the machine file
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines
Starting the nodes
nxc start -m machines
Connect
nxc connect
You should now have a tmux session with 2 panes: one for the server and one for the node.
Testing the NFS server
Go on the node and change directory to the mounted NFS:
cd /data
A quick ls shows that it is empty.
Create a file in it:
touch nfs_works
Now go on the server node and check the directory of the NFS server.
ls /srv/shared
If the nfs_works file exists, everything worked fine!
Add another compute node
For the moment we have a single compute node. In this section, we will add another one and run IOR on several nodes.
Add another role in the composition
We will rename the role node into node1 and create a new role node2 with the exact same configuration:
roles = {
node1 = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior ];
# Disable the firewall
networking.firewall.enable = false;
# Mount the NFS
fileSystems."/data" = {
device = "server:/";
fsType = "nfs";
};
};
node2 = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior ];
# Disable the firewall
networking.firewall.enable = false;
# Mount the NFS
fileSystems."/data" = {
device = "server:/";
fsType = "nfs";
};
};
server = { pkgs, ... }:
{
# ...
};
};
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub -l nodes=3,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Getting the machine file
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines
Deploying
nxc start -m machines
Connect to the nodes
nxc connect
After building and starting the environment, we now have 3 nodes: node1, node2 and server.
From the NFS server (mounted at /data on the nodes), we can try to run IOR by specifying the hosts.
The nodes already know each other (you can look at /etc/hosts to verify).
So we will create the MPI hostfile in myhosts:
cd /data
printf "node1 slots=8\nnode2 slots=8" > myhosts
The /data/myhosts file should look like:
node1 slots=8
node2 slots=8
Now, from any node (node1 or node2), we can start the benchmark (without the high performance network of Grid'5000) with:
cd /data
mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root --hostfile myhosts -np 16 ior
Generalization of the composition
As you can see from the previous section, scaling the number of compute nodes is a bit cumbersome.
Fortunately, NixOS Compose provides the notion of role to tackle this issue.
A role is a configuration.
In our case, we actually have only two roles: the NFS server and the compute nodes.
The configuration of the compute nodes is the same no matter how many there are.
Thus having to define configurations for node1 and node2 is redundant.
roles = {
node = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi ior ];
# Disable the firewall
networking.firewall.enable = false;
# Mount the NFS
fileSystems."/data" = {
device = "server:/";
fsType = "nfs";
};
};
server = { pkgs, ... }:
{
# ...
};
};
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub -l nodes=3,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Getting the machine file
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines
Starting the nodes
The nxc start command can take an additional YAML file as input, describing the number of machines per role as well as their hostnames.
The following yaml file will create 3 machines: server, node1 and node2.
# nodes.yaml
node: 2
server: 1
You can specify the hostnames in the yaml file:
# nodes.yaml
node:
- foo
- bar
server: 1
The above yaml file will create 3 machines: server, foo and bar.
Adding a configuration file
IOR can take a configuration file as input. In this section we will integrate this configuration file into the composition.
An IOR configuration file looks something like this:
IOR START
api=POSIX
testFile=testFile
hintsFileName=hintsFile
multiFile=0
interTestDelay=5
readFile=1
writeFile=1
filePerProc=0
checkWrite=0
checkRead=0
keepFile=1
quitOnError=0
outlierThreshold=0
setAlignment=1
singleXferAttempt=0
individualDataSets=0
verbose=0
collective=0
preallocate=0
useFileView=0
keepFileWithError=0
setTimeStampSignature=0
useSharedFilePointer=0
useStridedDatatype=0
uniqueDir=0
fsync=0
storeFileOffset=0
maxTimeDuration=60
deadlineForStonewalling=0
useExistingTestFile=0
useO_DIRECT=0
showHints=0
repetitions=3
numTasks=16
segmentCount=16
blockSize=4k
transferSize=1k
summaryFile=/tmp/results_ior.json
summaryFormat=JSON
RUN
IOR STOP
It gathers the information about the experiment.
It can then be run as:
ior -f <IOR_CONFIG_FILE>
We will put this configuration file in the composition in order to make the experiment reproducible.
Let us store the file locally as script.ior.
We can then create a file in /etc/ that points to the content of the file in the Nix store.
In the following configuration we write it at /etc/ior_script:
# ...
node = { pkgs, ... }: {
networking.firewall.enable = false;
environment.systemPackages = with pkgs; [ openmpi ior ];
environment.etc = {
ior_script = {
text = builtins.readFile ./script.ior;
};
};
fileSystems."/data" = {
device = "server:/";
fsType = "nfs";
};
};
# ...
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Getting the machine file
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines
Starting the nodes
nxc start -m machines
Connecting
nxc connect
Running the benchmark with the script
From the node:
cd /data
ior -f /etc/ior_script
Adding scripts to the environment
The command to start the benchmark is quite obscure and cumbersome to type. We would like to create a script to wrap it. However, creating a full Nix package for such a small script is not worth it. Fortunately, Nix provides ways to easily create reproducible bash (or other) scripts.
Let us create a Nix file called myscripts.nix.
It will be a function that takes pkgs as input and returns the set of our scripts.
{ pkgs, ... }:
let
# We define some constants here
nfsMountPoint = "/data";
nbProcs = 16;
iorConfig = "/etc/ior_script";
in {
# This script computes the number of compute nodes from /etc/hosts,
# creates an MPI hostfile accordingly and runs the benchmark
start_ior =
pkgs.writeScriptBin "start_ior" ''
cd ${nfsMountPoint}
NB_NODES=$(cat /etc/hosts | grep node | wc -l)
NB_SLOTS_PER_NODE=$((${builtins.toString nbProcs} / $NB_NODES))
cat /etc/hosts | grep node | awk -v nb_slots="$NB_SLOTS_PER_NODE" '{ print $2 " slots=" nb_slots;}' > my_hosts
mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root -np ${builtins.toString nbProcs} --hostfile my_hosts ior -f ${iorConfig}
'';
}
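The hostfile logic inside start_ior can be tried locally on a sample hosts file (the addresses below are made up); it divides the MPI processes evenly across the hosts whose names contain node. The NB_PROCS shell variable stands in for the nbProcs constant from the Nix file:

```shell
# Fake /etc/hosts extract with hypothetical addresses
cat > /tmp/sample_hosts <<'EOF'
127.0.0.1 localhost
192.168.1.10 node1
192.168.1.11 node2
192.168.1.20 server
EOF
NB_PROCS=16
# Same pipeline as in start_ior, pointed at the sample file
NB_NODES=$(cat /tmp/sample_hosts | grep node | wc -l)
NB_SLOTS_PER_NODE=$((NB_PROCS / NB_NODES))
cat /tmp/sample_hosts | grep node | awk -v nb_slots="$NB_SLOTS_PER_NODE" '{ print $2 " slots=" nb_slots;}' > /tmp/my_hosts
cat /tmp/my_hosts   # prints "node1 slots=8" and "node2 slots=8"
```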
We can now import these scripts in the composition:
# ...
node = { pkgs, ... }:
let
scripts = import ./myscripts.nix { inherit pkgs; };
in
{
networking.firewall.enable = false;
environment.systemPackages = with pkgs; [ openmpi ior scripts.start_ior ];
environment.etc = {
ior_script = {
text = builtins.readFile ./script.ior;
};
};
fileSystems."/data" = {
device = "server:/";
fsType = "nfs";
};
};
# ...
Building
nxc build -f g5k-nfs-store
Deploying
Reserving the resources
export $(oarsub -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
Getting the machine file
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines
Starting the nodes
nxc start -m machines
Connecting
nxc connect
Running the script
After building and deploying, we can simply run start_ior from a node to launch the benchmark.
start_ior
Parametrized Builds
We often want to have variation in the environment. In our example, it could be the configuration of the NFS server.
One solution could be to have a separate composition.nix file for each variant, but that introduces a lot of redundancy.
Instead, we will use the notion of setup of NixOS Compose to parametrize the composition.
Add the setup to the flake.nix
In the flake.nix file, add the following:
# ...
outputs = { self, nixpkgs, nxc }:
let
system = "x86_64-linux";
in {
packages.${system} = nxc.lib.compose {
inherit nixpkgs system;
composition = ./composition.nix;
# Defines the setup
setup = ./setup.toml;
};
defaultPackage.${system} =
self.packages.${system}."composition::nixos-test";
devShell.${system} = nxc.devShells.${system}.nxcShellFull;
};
}
Create the setup.toml file
Create a setup.toml file and add the following content:
# setup.toml
[project]
[params]
nfsNbProcs=8
Don't forget to add setup.toml to the git repository:
git add setup.toml
Use the parameters
The above setup.toml defines a parameter called nfsNbProcs.
The value of this parameter can be used in the composition; it is available at setup.params.<VARIABLE_NAME>.
.
# composition.nix
# Add `setup` in the arguments of the composition
{ pkgs, setup, ... }: {
# ....
server = { pkgs, ... }: {
networking.firewall.enable = false;
services.nfs.server.enable = true;
services.nfs.server.exports = ''
/srv/shared *(rw,no_subtree_check,fsid=0,no_root_squash)
'';
services.nfs.server.createMountPoints = true;
# Use the value of the parameter
services.nfs.server.nproc = setup.params.nfsNbProcs;
environment.systemPackages = with pkgs; [ htop ];
};
}
Introduce variations
For the moment, we only have a single value for the parameter, but we want variation. In the current situation, we have an NFS server with 8 processes managing the requests. We want to see how this number of processes influences the performance of the IOR benchmark. We will create two variants:
- fast: with more processes for the NFS server (let us say 32)
- slow: with fewer processes (let us say 2)
We can add the variants in the setup.toml file as follows:
# setup.toml
[project]
[params]
nfsNbProcs=8
[slow.params]
# to select this variant use:
# nxc build --setup slow
nfsNbProcs=2
[fast.params]
# to select this variant use:
# nxc build --setup fast
nfsNbProcs=32
When we build the composition, we can pick the variant:
nxc build --setup fast
# or
nxc build -s slow
If there is no --setup or -s flag, NixOS Compose takes the default values of the parameters.
Integration with an Experiment Engine
For now, we can deploy a reproducible distributed environment, connect to the nodes, and run commands. What we would like now is to automate the execution of these commands in an experiment script that we can easily rerun.
Fortunately, NixOS-Compose provides an integration with Execo, an experiment engine for Grid'5000. Execo is a Python library that abstracts the usual operations on Grid'5000 (submitting jobs, deploying, executing commands, etc.).
Let's see how to use Execo and NixOS-Compose to run reproducible experiments.
Starting Point
The snippet below represents a good starting point for an Execo script with NixOS-Compose.
# script.py
from nixos_compose.nxc_execo import get_oar_job_nodes_nxc

from execo import Remote
from execo_engine import Engine
from execo_g5k import oardel, oarsub, OarSubmission, wait_oar_job_start

class NXCEngine(Engine):
    def __init__(self):
        super(NXCEngine, self).__init__()
        parser = self.args_parser
        parser.add_argument('--nxc_build_file', help='Path to the NXC build file')
        parser.add_argument('--flavour', help='Flavour to deploy')
        self.nodes = {}
        self.oar_job_id = -1
        # --- Where and how many nodes ----
        self.nb_nodes = 2
        self.site = "grenoble"
        self.cluster = "dahu"

    def init(self):
        # --- Reservation ----
        self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", 15*60, job_type=["deploy"] if self.args.flavour == "g5k-image" else ["allow_classic_ssh"]), self.site)])[0]
        wait_oar_job_start(self.oar_job_id, site)  # wait for the job to start, otherwise we might get a timeout in `get_oar_job_nodes_nxc`
        # --- How many nodes per role ---
        roles_quantities = {"server": ["server"], "node": ["node"]}
        # --- Deploy and populate the dict `self.nodes` accordingly ---
        self.nodes = get_oar_job_nodes_nxc(
            self.oar_job_id,
            site,
            flavour_name=self.args.flavour,
            compose_info_file=self.args.nxc_build_file,
            roles_quantities=roles_quantities)

    def run(self):
        my_command = "echo \"Hello from $(whoami) at $(hostname) ($(ip -4 addr | grep \"/20\" | awk '{print $2;}'))\" > /tmp/hello"
        hello_remote = Remote(my_command, self.nodes["server"], connection_params={'user': 'root'})
        hello_remote.run()
        my_command2 = "cat /tmp/hello"
        cat_remote = Remote(my_command2, self.nodes["server"], connection_params={'user': 'root'})
        cat_remote.run()
        for process in cat_remote.processes:
            print(process.stdout)
        # --- Giving back the resources ---
        oardel([(self.oar_job_id, self.site)])

if __name__ == "__main__":
    NXCEngine().start()
Make sure you are in an environment with NixOS-Compose available.
You can then run python3 script.py --help.
The script takes two arguments:
- nxc_build_file: the path to the result of nxc build; most probably it will be under build/composition::FLAVOUR.json
- flavour: on Grid'5000 it can be g5k-nfs-store, g5k-ramdisk, or g5k-image
Let's try to run the script for the g5k-image flavour (make sure to have run nxc build -f g5k-image before):
python3 script.py --nxc_build_file $(pwd)/build/composition::g5k-image --flavour g5k-image
You should see logs from Execo indicating that it is submitting the reservation to OAR and starting the deployment.
When the deployment is finished, you can see the commands from the run function of script.py being executed.
Run a real experiment
The code above just shows the basics of Execo.
In this section, we will run a more realistic experiment calling the start_ior command that we packaged in a previous section.
# script.py
from nixos_compose.nxc_execo import get_oar_job_nodes_nxc

from execo import Remote
from execo_engine import Engine
from execo_g5k import oardel, oarsub, OarSubmission, wait_oar_job_start

class NXCEngine(Engine):
    def __init__(self):
        super(NXCEngine, self).__init__()
        parser = self.args_parser
        parser.add_argument('--nxc_build_file', help='Path to the NXC build file')
        parser.add_argument('--flavour', help='Flavour to deploy')
        parser.add_argument('--nb_nodes', help='Number of nodes')
        parser.add_argument('--result_dir', help='Path to store the results')
        self.nodes = {}
        self.oar_job_id = -1
        # --- Where and how many nodes ----
        self.nb_nodes = 2
        self.site = "grenoble"
        self.cluster = "dahu"

    def init(self):
        # We might have more than two nodes
        self.nb_nodes = int(self.args.nb_nodes)
        assert self.nb_nodes >= 2, "I need at least two nodes"
        # --- Reservation ----
        self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", 15*60, job_type=["deploy"] if self.args.flavour == "g5k-image" else ["allow_classic_ssh"]), self.site)])[0]
        wait_oar_job_start(self.oar_job_id, site)  # wait for the job to start, otherwise we might get a timeout in `get_oar_job_nodes_nxc`
        # --- How many nodes per role ---
        # We want one server and all the other nodes are `node`
        roles_quantities = {"server": ["server"], "node": [f"node{i}" for i in range(1, self.nb_nodes)]}
        # --- Deploy and populate the dict `self.nodes` accordingly ---
        self.nodes = get_oar_job_nodes_nxc(
            self.oar_job_id,
            site,
            flavour_name=self.args.flavour,
            compose_info_file=self.args.nxc_build_file,
            roles_quantities=roles_quantities)

    def run(self):
        result_dir = self.args.result_dir
        result_file = f"{result_dir}/results_ior_{self.nb_nodes}_nodes_{self.args.flavour}_flavour_{self.oar_job_id}"
        run_ior_remote = Remote("start_ior", [self.nodes["node"][0]], connection_params={'user': 'root'})
        run_ior_remote.run()
        get_file_remote = Remote(f"cp /srv/shared/results_ior.json {result_file}", self.nodes["server"], connection_params={'user': 'root'})
        get_file_remote.run()
        oardel([(self.oar_job_id, self.site)])

if __name__ == "__main__":
    NXCEngine().start()
Packaging the MADbench2 benchmark
Nix Expression
{ stdenv, openmpi }:
stdenv.mkDerivation {
name = "MADbench2";
src = ./.;
buildInputs = [ openmpi ];
buildPhase = ''
mpicc -D SYSTEM -D COLUMBIA -D IO -o MADbench2.x MADbench2.c -lm
'';
installPhase = ''
mkdir -p $out/bin
mv MADbench2.x $out/bin
'';
}
Try to run the benchmark locally
First, enter a nix shell with the packaged MADbench2 and openmpi.
nix shell .#MADbench2 nixpkgs#openmpi
Once in the shell, you can run the benchmark with:
mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1
It should output the following:
MADbench 2.0 IO-mode
no_pe = 4 no_pix = 640 no_bin = 8 no_gang = 1 sblocksize = 8 fblocksize = 8 r_mod = 1 w_mod = 1
IOMETHOD = POSIX IOMODE = SYNC FILETYPE = UNIQUE REMAP = CUSTOM
S_cc 0.00 [ 0.00: 0.00]
S_bw 0.00 [ 0.00: 0.00]
S_w 0.01 [ 0.01: 0.01]
-------
S_total 0.01 [ 0.01: 0.01]
W_cc 0.00 [ 0.00: 0.00]
W_bw 0.12 [ 0.12: 0.12]
W_r 0.00 [ 0.00: 0.00]
W_w 0.00 [ 0.00: 0.00]
-------
W_total 0.13 [ 0.13: 0.13]
C_cc 0.00 [ 0.00: 0.00]
C_bw 0.00 [ 0.00: 0.00]
C_r 0.00 [ 0.00: 0.00]
-------
C_total 0.00 [ 0.00: 0.00]
dC[0] = 0.00000e+00
What interests us in this tutorial is the total time spent writing: W_total.
Making MADbench2 available
Option 1: You have a flake repository with MADbench2 packaged
In the flake.nix file, add your repository as an input:
{
description = "nixos-compose - basic setup";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/21.05";
nxc.url = "git+https://gitlab.inria.fr/nixos-compose/nixos-compose.git";
mypkgs.url = "URL TO YOUR PKGS";
};
outputs = { self, nixpkgs, nxc, mypkgs }:
{
# ...
};
}
Then create an overlay adding the benchmark:
{
# ...
outputs = { self, nixpkgs, nxc, mypkgs }:
let
system = "x86_64-linux";
myOverlay = final: prev:
{
mb2 = mypkgs.packages.${system}.MADbench2;
};
in {
packages.${system} = nxc.lib.compose {
inherit nixpkgs system;
overlays = [ myOverlay ];
composition = ./composition.nix;
# setup = ./setup.toml;
};
# ...
};
Option 2: You have a local file packaging MADbench2
In this case, you do not have to add another input; you can simply call the Nix file packaging MADbench2 (MADbench2.nix in the example below).
Similarly, we create an overlay with the package to add.
{
description = "nixos-compose - basic setup";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/22.05";
nxc.url = "git+https://gitlab.inria.fr/nixos-compose/nixos-compose.git";
};
outputs = { self, nixpkgs, nxc }:
let
system = "x86_64-linux";
myOverlay = final: prev:
{
mb2 = prev.callPackage ./MADbench2.nix { };
};
in {
packages.${system} = nxc.lib.compose {
inherit nixpkgs system;
overlays = [ myOverlay ];
composition = ./composition.nix;
setup = ./setup.toml;
};
defaultPackage.${system} =
self.packages.${system}."composition::nixos-test";
devShell.${system} = nxc.devShells.${system}.nxcShellFull;
};
}
Adding MADbench2 to the composition
Once MADbench2 has been added to the flake.nix file, we can access the package inside the composition.
In our case, we want an environment with both MADbench2 and openmpi to run it.
{ pkgs, ... }: {
roles = {
foo = { pkgs, ... }:
{
# add needed package
environment.systemPackages = with pkgs; [ openmpi mb2 ];
};
};
testScript = ''
foo.succeed("true")
'';
}
Run the benchmark in the environment
As done previously, we have to go through the nxc build and nxc start phases.
Once the environment is deployed, we connect with nxc connect.
We can now try to run the benchmark.
mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1
Unfortunately, this will fail for several reasons:
- we are running as root
- there are not enough slots to start 4 processes
As we are only setting up the environment for now, we can bypass these errors by adding some flags:
mpirun --allow-run-as-root --oversubscribe -np 4 MADbench2.x 640 8 1 8 8 1 1