Introduction

This tutorial presents NixOS Compose, a tool that generates and deploys fully reproducible system environments. It relies on the Nix package manager and the associated Linux distribution, NixOS.

You can find the associated publication here, and some partial documentation here.

First, we will set up the connection to the Grid'5000 machines. Then we will present NixOS Compose on a simple example to get you acquainted with its notions and commands. Finally, we will create, step by step, a reproducible environment for a distributed experiment.

Note: Aside from some SSH configuration, all commands will be executed on the Grid'5000 platform.

Grid'5000

Grid'5000 (a.k.a. g5k) is a French testbed for distributed experiments.

This tutorial relies on this platform to perform deployments. We will thus create accounts for you.

Hopefully, by the time you read these lines, your account should have already been created.

Email

You should receive an email like:

Subject: [Grid5000-account] Your Grid5000 account was created by ...

Dear Firstname Lastname (username),

You receive this email because your manager (...) requested a Grid5000
account for you in the context of a tutorial. To get more information about
Grid5000, see the website: http://www.grid5000.fr.

Your login on the Grid5000 platform is: username.

The next two steps for you are now to:

1/ Finish setting up your access to the platform by creating a password and an SSH key.

   To do so, open the following URL:
   https://public-api.grid5000.fr/stable/users/setup_password?password_token=XXXXXXXXXXXXXXXXXXXXXX#special.

2/ Read carefully the two following pages:

   The Grid5000 getting started documentation (https://www.grid5000.fr/w/Getting_Started),
   which gives important information on how to use the platform.

   The Grid5000 usage policy (https://www.grid5000.fr/w/Grid5000:UsagePolicy),
   which gives the rules that MUST be followed when using the platform. Note that any
   abuse will automatically be detected and reported to your manager.

Follow the steps in the email before continuing.

Connect to Grid'5000

Add these lines to your ~/.ssh/config file and replace G5K_USERNAME by your username:

Host g5k
  User G5K_USERNAME
  Hostname access.grid5000.fr
  ForwardAgent no
Host *.g5k
  User G5K_USERNAME
  ProxyCommand ssh g5k -W "$(basename %h .g5k):%p"
  ForwardAgent no

You should be able to access the different sites of Grid'5000:

ssh grenoble.g5k

We will use the Grenoble site for this tutorial.

Tips

SSH connections can break, which is quite annoying when running experiments. We thus recommend using tmux to deal with this.
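
For example, a minimal tmux workflow on the frontend could look like this (standard tmux commands, not specific to Grid'5000):

# create a session named "tuto"
tmux new -s tuto
# if the SSH connection drops, reconnect to the frontend and re-attach
tmux attach -t tuto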

First job reservation

You can try to make a job reservation. Grid'5000 uses OAR as its resource and job manager.

oarsub -I -t inner=2155707
  • oarsub is the command to make a submission

  • -I means that this is an interactive job

  • -t inner=2155707 means that this job should be executed inside job 2155707 (which is the job hosting all the jobs for this tutorial)

This command should give you access to a node that is not the frontend (fgrenoble), but a node from the dahu cluster (dahu-X). You can execute some commands there. Once you are done, simply exit the shell and you will be back on the frontend. As this is an interactive job, the job will be killed as soon as you exit the shell.
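
For example, once on the node:

hostname    # should print something like dahu-X.grenoble.grid5000.fr
exit        # terminates the interactive job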

Add a builder on Grid'5000

If you are doing this tutorial with a member of the NixOS Compose team, there is probably a builder machine on Grid'5000 to accelerate the builds. Using it requires a bit of configuration.

Ask for the <BUILDER> address. It should look like dahu-30.

Add your ssh key

You need access to the builder machine through SSH. To do this, we will copy your SSH key to the builder.

ssh-copy-id -i ~/.ssh/id_rsa.pub root@<BUILDER>

The password is nixos.

You should now be able to log in to the builder via ssh root@<BUILDER>.

Copy the Nix Config

We need to tell Nix where to find this builder and how to use it. The configuration file should be under ~/.config/nix/nix.conf.

# ~/.config/nix/nix.conf
experimental-features = nix-command flakes
builders = ssh://root@<BUILDER>
cores = 0
extra-sandbox-paths =
max-jobs = 0
require-sigs = false
sandbox = false
sandbox-fallback = false
substituters = http://<BUILDER>:8080 ssh-ng://root@<BUILDER> https://cache.nixos.org/
builders-use-substitutes = true
trusted-public-keys = <BUILDER>:snBDi/dGJICacgRUw4nauQ8KkSksAAAhCvPVr9OGTwk=
system-features = nixos-test benchmark big-parallel kvm
allowed-users = *
trusted-users = root

Copy and paste the configuration above into the Nix configuration file. Don't forget to replace <BUILDER> with the actual address of the builder!

You can do the replacement with one command. Here we take the example where the builder is on node dahu-30.

sed -i 's/<BUILDER>/dahu-30/g' ~/.config/nix/nix.conf
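
You can optionally check that Nix can reach the builder over SSH (this assumes a Nix version providing the nix store ping command, which also requires the nix-command experimental feature enabled above):

nix store ping --store ssh://root@dahu-30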

Note that this configuration will not work the next time you log in to Grid'5000, as the builder node will have been released. You can then revert to:

# ~/.config/nix/nix.conf
experimental-features = nix-command flakes
max-jobs = 64

Installing NixOS Compose

We can install NixOS Compose via pip:

pip install nixos-compose

You might need to modify your $PATH:

export PATH=$PATH:~/.local/bin
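
You can check that the installation worked and that the nxc command is available:

nxc --help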

Create your first composition

Before jumping into the creation of an environment for the IOR benchmark, let us go through a simpler example.

In the next sections, we will present the notions and commands of NixOS Compose.

Start from a template

NXC proposes several templates, which are good starting points.

Let us use the basic one.

mkdir tuto
cd tuto
nxc init -t basic

The previous command created 3 files in the tuto folder:

  • nxc.json: a JSON file required by NXC. You never have to modify it.

  • flake.nix: the Nix file responsible for locking all the inputs.

  • composition.nix: the Nix file representing the distributed environment.

To ensure the reproducibility of the composition, Nix flakes use Git to track files.

Let us create a Git repository and add every created file to it:

git init
git add **

Inspect the composition.nix file

If you open the composition.nix file, you will find the following:

{ pkgs, ... }: {
  roles = {
    foo = { pkgs, ... }:
      {
        # add needed package
        # environment.systemPackages = with pkgs; [ socat ];
      };
  };
  testScript = ''
    foo.succeed("true")
  '';
}

The composition is a function that takes a set as input ({ pkgs, ... }) and returns a set containing:

  • a testScript string

  • a roles set of NixOS configurations

What interests us for the moment is the roles set. In the example above, we define a single role named foo with an empty configuration. We can add packages to the environment by uncommenting the environment.systemPackages line:

{ pkgs, ... }: {
  roles = {
    foo = { pkgs, ... }:
      {
        # add needed package
        environment.systemPackages = with pkgs; [ socat ];
      };
  };
  testScript = ''
    foo.succeed("true")
  '';
}

Build your first composition

You can build the composition with the nxc build command.

It takes as an argument a target platform, which we call a flavour.

nxc build -f <FLAVOUR>

There are different flavours that NixOS Compose can build:

  • docker: Generates a Docker Compose configuration

  • vm-ramdisk: Generates QEMU virtual machines

  • g5k-ramdisk: Generates kernel images and initrds deployed with kexec

  • g5k-nfs-store: Generates kernel images and initrds without a packed /nix/store; instead, the store of the frontend is mounted. Also deployed with kexec

  • g5k-image: Generates a full system image

In this tutorial we will focus on g5k-ramdisk and g5k-nfs-store, and on g5k-image if you have time.

For example, let us build the composition with the g5k-nfs-store flavour:

nxc build -f g5k-nfs-store

Deploy the environment

To deploy the built environment use the start subcommand of nxc. By default, it uses the last built environment from any flavour, but you can specify the flavour with the -f flag.

nxc start -f <FLAVOUR>

For the flavours on Grid'5000, we first need to reserve some resources before deploying. The next sections give more details.

g5k-ramdisk and g5k-nfs-store

Reserve the nodes

export $(oarsub -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Deploy

nxc start -m machines

where machines is the file containing the addresses of the nodes to use.

Here is an example:

dahu-1
dahu-2
dahu-4

You can generate this machines file with the command:

oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines

g5k-image

Reserve the nodes

Below, the user wants to deploy their composition on 3 physical machines.

oarsub -l nodes=3,walltime=1:0:0 -t deploy

Deploy

nxc start -m machines

where machines is the file containing the addresses of the nodes to use.

Here is an example:

dahu-1
dahu-2
dahu-4

You can generate this machines file with the command:

oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines

Connect to the nodes

Once the deployment is over, you can connect to the nodes via the nxc connect command.

nxc connect

It will open a tmux session with a pane per node, making it easy to navigate between the nodes.

You can provide a hostname to the command to connect to a specific host.

nxc connect server

Release the nodes

Once we are done with the experiment, we have to give back the resources.

List all your running jobs

oarstat -u
Job id     Name           User           Submission Date     S Queue
---------- -------------- -------------- ------------------- - ----------
2155685                   qguilloteau    2022-09-23 10:50:36 R default

Delete the job

oardel 2155685
Deleting the job = 2155685 ...REGISTERED.
The job(s) [ 2155685 ] will be deleted in the near future.

The one-liner

You can delete all of your jobs with this one command:

oarstat -u -J | jq --raw-output 'keys | .[]' | xargs -I {} oardel {}

IOR

IOR is a parallel IO benchmark that can be used to test the performance of parallel storage systems using various interfaces and access patterns. It uses a common parallel I/O abstraction backend and relies on MPI for synchronization.

You can find the documentation here.

Adding IOR to the composition

Start from a template

mkdir ior_bench
cd ior_bench
nxc init -t basic

Add IOR to the composition

The IOR benchmark is available in nixpkgs and thus accessible in pkgs. We also need openmpi to run the benchmark.

# composition.nix
{ pkgs, ... }: {
  roles = {
    foo = { pkgs, ... }:
      {
        # add needed package
        environment.systemPackages = with pkgs; [ openmpi ior ];
      };
  };
  testScript = ''
    foo.succeed("true")
  '';
}

Run the benchmark in the environment

As done previously, we have to go through the nxc build and nxc start phases.

Building

nxc build -f g5k-nfs-store

Getting the node

export $(oarsub -l nodes=1,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)
oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines

Deploying

nxc start -m machines

Running the benchmark

Once the environment is deployed, we connect with nxc connect.

We can now try to run the benchmark.

ior

The output should look something like this:

[root@foo:~]# ior
IOR-3.3.0: MPI Coordinated Test of Parallel I/O
Began               : Tue Sep 13 14:08:28 2022
Command line        : ior
Machine             : Linux foo
TestID              : 0
StartTime           : Tue Sep 13 14:08:28 2022
Path                : /root
FS                  : 1.9 GiB   Used FS: 1.3%   Inodes: 0.5 Mi   Used Inodes: 0.1%

Options:
api                 : POSIX
apiVersion          :
test filename       : testFile
access              : single-shared-file
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : no tasks offsets
nodes               : 1
tasks               : 1
clients per node    : 1
repetitions         : 1
xfersize            : 262144 bytes
blocksize           : 1 MiB
aggregate filesize  : 1 MiB

Results:

access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
write     3117.60    13205      0.000076    1024.00    256.00     0.000014   0.000303   0.000004   0.000321   0
read      3459.03    15107      0.000066    1024.00    256.00     0.000022   0.000265   0.000002   0.000289   0
remove    -          -          -           -          -          -          -          -          0.000121   0
Max Write: 3117.60 MiB/sec (3269.04 MB/sec)
Max Read:  3459.03 MiB/sec (3627.06 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)     StdDev    Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt   blksiz    xsize aggs(MiB)   API RefNum
write        3117.60    3117.60    3117.60       0.00   12470.38   12470.38   12470.38       0.00    0.00032         NA            NA     0      1   1    1   0     0        1         0    0      1  1048576   262144       1.0 POSIX      0
read         3459.03    3459.03    3459.03       0.00   13836.14   13836.14   13836.14       0.00    0.00029         NA            NA     0      1   1    1   0     0        1         0    0      1  1048576   262144       1.0 POSIX      0
Finished            : Tue Sep 13 14:08:28 2022

Add a NFS server to the composition

For the moment, we only have one node, which writes its results to its own filesystem. However, we want to set up a distributed filesystem and evaluate its performance. We will take the example of NFS for this tutorial.

The approach will be the following:

  1. add a new role server to the composition

  2. set up the NFS server on the server

  3. mount the NFS server on the compute nodes

Add a role to the composition

To add another role to the composition, we only need to add a new element to the roles set. Let us also rename the compute node role to node and empty the testScript:

{ pkgs, ... }: {
  roles = {
    node = { pkgs, ... }:
      {
        # add needed package
        environment.systemPackages = with pkgs; [ openmpi ior ];
      };
    server = { pkgs, ... }:
      {
        # ...
      };
  };
  testScript = ''
  '';
}

Setting up the NFS server

To set up the NFS server, we need to configure several things:

  1. open the ports for the clients to connect: we will actually disable the entire firewall for simplicity's sake, but we could be more precise about which ports we open

  2. Enable the NFS server systemd services

  3. Define the export point: /srv/shared in our case

server = { pkgs, ... }: {
  # Disable the firewall
  networking.firewall.enable = false;

  # Enable the nfs server services
  services.nfs.server.enable = true;

  # Define a mount point at /srv/shared
  services.nfs.server.exports = ''
    /srv/shared *(rw,no_subtree_check,fsid=0,no_root_squash)
  '';
  services.nfs.server.createMountPoints = true;

  # we also add the htop package for light monitoring
  environment.systemPackages = with pkgs; [ htop ];
};

Mount the NFS server on the compute nodes

We now need to make the compute nodes mount the NFS server.

To do this, we will also disable the firewall and create a new mount point (/data in our case):

node = { pkgs, ... }:
{
  # add needed package
  environment.systemPackages = with pkgs; [ openmpi ior ];

  # Disable the firewall
  networking.firewall.enable = false;

  # Mount the NFS
  fileSystems."/data" = {
    device = "server:/";
    fsType = "nfs";
  };
};

Test the NFS server

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Getting the machine file

oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines

Starting the nodes

nxc start -m machines

Connect

nxc connect

You should now have a tmux session with 2 panes: one for the server and one for the node.

Testing the NFS server

Go to the node and change directory to the mounted NFS share:

cd /data

A quick ls shows that this is empty.

Create a file in it:

touch nfs_works

Now go to the server node and check the directory exported by the NFS server.

ls /srv/shared

If the nfs_works file exists, everything worked fine!

Add another compute node

For the moment we have a single compute node. In this section, we will add another one and run IOR on several nodes.

Add another role in the composition

We will rename the role node into node1 and create a new role node2 with the exact same configuration:


roles = {

  node1 = { pkgs, ... }:
  {
    # add needed package
    environment.systemPackages = with pkgs; [ openmpi ior ];
  
    # Disable the firewall
    networking.firewall.enable = false;
  
    # Mount the NFS
    fileSystems."/data" = {
      device = "server:/";
      fsType = "nfs";
    };
  };

  node2 = { pkgs, ... }:
  {
    # add needed package
    environment.systemPackages = with pkgs; [ openmpi ior ];
  
    # Disable the firewall
    networking.firewall.enable = false;
  
    # Mount the NFS
    fileSystems."/data" = {
      device = "server:/";
      fsType = "nfs";
    };
  };

  server = { pkgs, ... }:
  {
    # ...
  };
}

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub -l nodes=3,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Getting the machine file

oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines

Deploying

nxc start -m machines

Connect to the nodes

nxc connect

After building and starting the environment, we now have 3 nodes: node1, node2 and the server.

From the NFS share (/data on the nodes), we can try to run IOR by specifying the hosts.

The nodes already know each other (you can look at /etc/hosts to verify). So we will create the MPI hostfile myhosts:

cd /data
printf "node1 slots=8\nnode2 slots=8" > myhosts

The /data/myhosts file should look like:

node1 slots=8
node2 slots=8

Now, from any compute node (node1 or node2), we can start the benchmark (without the high-performance network of Grid'5000) with:

cd /data
mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root --hostfile myhosts -np 16 ior

Generalization of the composition

As you can see from the previous section, scaling the number of compute nodes is a bit cumbersome.

Fortunately, NixOS Compose provides the notion of role to tackle this issue.

A role is a configuration. In our case, we actually have only two roles: the NFS server and the compute nodes. The configuration of the compute nodes is the same no matter how many there are, so defining separate configurations for node1 and node2 is redundant.

roles = {

  node = { pkgs, ... }:
  {
    # add needed package
    environment.systemPackages = with pkgs; [ openmpi ior ];
  
    # Disable the firewall
    networking.firewall.enable = false;
  
    # Mount the NFS
    fileSystems."/data" = {
      device = "server:/";
      fsType = "nfs";
    };
  };

  server = { pkgs, ... }:
  {
    # ...
  };
}

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub -l nodes=3,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Getting the machine file

oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines

Starting the nodes

The nxc start command can take an additional yaml file as input, describing the number of machines per role as well as their hostnames.

nxc start -m machines nodes.yaml

The following yaml file will create 3 machines: server, node1 and node2.

# nodes.yaml
node: 2
server: 1

You can specify the hostnames in the yaml file:

# nodes.yaml
node:
    - foo
    - bar
server: 1

The above yaml file will create 3 machines: server, foo and bar.

Adding a configuration file

IOR can take a configuration file as input. In this section we will integrate this configuration file into the composition.

An IOR configuration file looks something like this:

IOR START
    api=POSIX
    testFile=testFile
    hintsFileName=hintsFile
    multiFile=0
    interTestDelay=5
    readFile=1
    writeFile=1
    filePerProc=0
    checkWrite=0
    checkRead=0
    keepFile=1
    quitOnError=0
    outlierThreshold=0
    setAlignment=1
    singleXferAttempt=0
    individualDataSets=0
    verbose=0
    collective=0
    preallocate=0
    useFileView=0
    keepFileWithError=0
    setTimeStampSignature=0
    useSharedFilePointer=0
    useStridedDatatype=0
    uniqueDir=0
    fsync=0
    storeFileOffset=0
    maxTimeDuration=60
    deadlineForStonewalling=0
    useExistingTestFile=0
    useO_DIRECT=0
    showHints=0

    repetitions=3
    numTasks=16
    segmentCount=16
    blockSize=4k
    transferSize=1k

    summaryFile=/tmp/results_ior.json
    summaryFormat=JSON
RUN
IOR STOP

It gathers the information about the experiment.

It can then be run as:

ior -f <IOR_CONFIG_FILE>

We will put this configuration file in the composition in order to make the experiment reproducible. Let us store it locally in a file named script.ior.
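
As with the other files of the composition, the flake only sees files tracked by Git, so don't forget to add it:

git add script.ior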

We can then create a file under /etc/ that points to the content of this file in the Nix store. In the following configuration, we write it to /etc/ior_script:

# ...
  node = { pkgs, ... }: {
    networking.firewall.enable = false;

    environment.systemPackages = with pkgs; [ openmpi ior ];

    environment.etc = {
      ior_script = {
        text = builtins.readFile ./script.ior;
      };
    };

    fileSystems."/data" = {
      device = "server:/";
      fsType = "nfs";
    };
  };
# ...

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Getting the machine file

oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines

Starting the nodes

nxc start -m machines

Connecting

nxc connect

Running the benchmark with the script

From the node:

cd /data
ior -f /etc/ior_script

Adding scripts to the environment

The command to start the benchmark is quite obscure and cumbersome to type. We would like to create a script to wrap it. However, creating a full Nix package for such a small script is not worth it. Fortunately, Nix provides ways to easily create reproducible bash (or other) scripts.

Let us create a Nix file called myscripts.nix. It will be a function that takes pkgs as input and returns a set containing our scripts.

{ pkgs, ... }:
let
  # We define some constants here
  nfsMountPoint = "/data";
  nbProcs = 16;
  iorConfig = "/etc/ior_script";
in {
  # This script computes the number of compute nodes from /etc/hosts,
  # creates a hostfile for MPI and runs the benchmark
  start_ior =
    pkgs.writeScriptBin "start_ior" ''
      cd ${nfsMountPoint}

      NB_NODES=$(cat /etc/hosts | grep node | wc -l)
      NB_SLOTS_PER_NODE=$((${builtins.toString nbProcs} / $NB_NODES))

      cat /etc/hosts | grep node | awk -v nb_slots="$NB_SLOTS_PER_NODE" '{ print $2 " slots=" nb_slots;}' > my_hosts

      mpirun --mca pml ^ucx --mca mtl ^psm2,ofi --mca btl ^ofi,openib --allow-run-as-root -np ${builtins.toString nbProcs} --hostfile my_hosts ior -f ${iorConfig}
    '';
}

We can now import these scripts in the composition:

# ...
  node = { pkgs, ... }:
  let
    scripts = import ./myscripts.nix { inherit pkgs; };
  in
  {
    networking.firewall.enable = false;

    environment.systemPackages = with pkgs; [ openmpi ior scripts.start_ior ];

    environment.etc = {
      ior_script = {
        text = builtins.readFile ./script.ior;
      };
    };

    fileSystems."/data" = {
      device = "server:/";
      fsType = "nfs";
    };
  };
# ...

Building

nxc build -f g5k-nfs-store

Deploying

Reserving the resources

export $(oarsub -l nodes=2,walltime=1:0:0 "$(nxc helper g5k_script) 1h" | grep OAR_JOB_ID)

Getting the machine file

oarstat -u -J | jq --raw-output 'to_entries | .[0].value.assigned_network_address | .[]' > machines

Starting the nodes

nxc start -m machines

Connecting

nxc connect

Running the script

After building and deploying, we can simply run start_ior from the node to run the benchmark.

start_ior

Parametrized Builds

We often want to have variation in the environment. In our example, it could be the configuration of the NFS server.

One solution could be to have a separate composition.nix file for each variant, but this introduces a lot of redundancy.

Instead, we will use the setup notion of NixOS Compose to parametrize the composition.

Add the setup to the flake.nix

In the flake.nix file, add the following:

# ...
  outputs = { self, nixpkgs, nxc }:
    let
      system = "x86_64-linux";
    in {
      packages.${system} = nxc.lib.compose {
        inherit nixpkgs system;
        composition = ./composition.nix;

        # Defines the setup
        setup = ./setup.toml;

      };

      defaultPackage.${system} =
        self.packages.${system}."composition::nixos-test";

      devShell.${system} = nxc.devShells.${system}.nxcShellFull;
    };
}

Create the setup.toml file

Create a setup.toml file and add the following content:

# setup.toml

[project]

[params]
nfsNbProcs=8

Don't forget to add setup.toml to the git:

git add setup.toml

Use the parameters

The above setup.toml defines a parameter called nfsNbProcs. The value of this parameter can be used in the composition; it is available at setup.params.<VARIABLE_NAME>.

# composition.nix

# Add `setup` in the arguments of the composition
{ pkgs, setup, ... }: {
  # ....
  server = { pkgs, ... }: {
    networking.firewall.enable = false;
    services.nfs.server.enable = true;
    services.nfs.server.exports = ''
      /srv/shared *(rw,no_subtree_check,fsid=0,no_root_squash)
    '';
    services.nfs.server.createMountPoints = true;

    # Use the value of the parameter
    services.nfs.server.nproc = setup.params.nfsNbProcs;

    environment.systemPackages = with pkgs; [ htop ];
  };
}

Introduce variations

For the moment, we only have a single value for the parameter, but we want variation. In the current situation, we have an NFS server with 8 processes handling the requests. We want to see how this number of processes influences the performance of the IOR benchmark. We will create two variants:

  • fast: with more processes for the NFS server (let us say 32)
  • slow: with fewer processes (let us say 2)

We can add the variants in the setup.toml file as follows:

# setup.toml

[project]

[params]
nfsNbProcs=8

[slow.params]
# to select this variant use:
# nxc build --setup slow
nfsNbProcs=2

[fast.params]
# to select this variant use:
# nxc build --setup fast
nfsNbProcs=32

When we build the composition we can pick the variant as:

nxc build --setup fast
# or
nxc build -s slow

If there is no --setup or -s flag, NixOS Compose will take the default values of the parameters.
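
The setup flag should combine with the flavour flag seen earlier; for example, to build the fast variant with the g5k-nfs-store flavour:

nxc build -f g5k-nfs-store -s fast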

Integration with an Experiment Engine

For now, we can deploy a reproducible distributed environment, connect to the nodes, and run commands. What we would like to do now is automate the execution of these commands in an experiment script that we can easily rerun.

Fortunately, NixOS-Compose provides an integration with Execo, an experiment engine for Grid'5000. Execo is a Python library that abstracts the usual operations on Grid'5000 (submitting jobs, deploying, executing commands, etc.).

Let's see how to use Execo and NixOS-Compose to run reproducible experiments.

Starting Point

The snippet below represents a good starting point for an Execo script with NixOS-Compose.

# script.py
from nixos_compose.nxc_execo import get_oar_job_nodes_nxc

from execo import Remote
from execo_g5k import oardel, oarsub, OarSubmission, wait_oar_job_start
from execo_engine import Engine

class NXCEngine(Engine):
    def __init__(self):
        super(NXCEngine, self).__init__()
        parser = self.args_parser
        parser.add_argument('--nxc_build_file', help='Path to the NXC build file')
        parser.add_argument('--flavour', help='Flavour to deploy')
        self.nodes = {}
        self.oar_job_id = -1
        # --- Where and how many nodes ----
        self.nb_nodes = 2
        self.site = "grenoble"
        self.cluster = "dahu"

    def init(self):

        # --- Reservation ----
        self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", 15*60, job_type=["deploy"] if self.args.flavour == "g5k-image" else ["allow_classic_ssh"]), self.site)])[0]
        wait_oar_job_start(self.oar_job_id, site) # wait for the job to start, otherwise we might get a timeout in the `get_oar_job_nodes_nxc`

        # --- How many nodes per role ---
        roles_quantities = {"server": ["server"], "node": ["node"]}

        # --- Deploy and populate the dict `self.nodes` accordingly ---
        self.nodes = get_oar_job_nodes_nxc(
            self.oar_job_id,
            site,
            flavour_name=self.args.flavour,
            compose_info_file=self.args.nxc_build_file,
            roles_quantities=roles_quantities)

    def run(self):
        my_command = "echo \"Hello from $(whoami) at $(hostname) ($(ip -4 addr | grep \"/20\" | awk '{print $2;}'))\" > /tmp/hello"
        hello_remote = Remote(my_command, self.nodes["server"], connection_params={'user': 'root'})
        hello_remote.run()

        my_command2 = "cat /tmp/hello"
        cat_remote = Remote(my_command2, self.nodes["server"], connection_params={'user': 'root'})
        cat_remote.run()
        for process in cat_remote.processes:
            print(process.stdout)

        # --- Giving back the resources ---
        oardel([(self.oar_job_id, self.site)])


if __name__ == "__main__":
    NXCEngine().start()

Make sure you are in an environment with NixOS-Compose available.
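
One way to get such an environment, assuming the nxcShellFull devShell defined in the flake above provides the required Python dependencies, is to enter it from the project directory:

nix develop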

You can then run python3 script.py --help.

The script takes two arguments:

  • nxc_build_file which is the path to the result of nxc build. Most probably it will be under build/composition::FLAVOUR.json

  • and the flavour. On Grid'5000 it can be g5k-nfs-store, g5k-ramdisk, or g5k-image

Let's try to run the script for the g5k-image flavour (make sure to have run nxc build -f g5k-image before):

python3 script.py --nxc_build_file $(pwd)/build/composition::g5k-image --flavour g5k-image

You should see the logs from Execo telling you that it is doing the reservation to OAR, and starting deploying. When the deployment is finished, you can see that the commands that we ran in the run function of script.py are being executed.

Run a real experiment

The code about is just to show the basics of Execo. In this section, we will run a more realistic experiment calling the start_ior command that we packaged in a previous section.

# script.py
from nixos_compose.nxc_execo import get_oar_job_nodes_nxc

from execo import Remote
from execo_g5k import oardel, oarsub, OarSubmission, wait_oar_job_start
from execo_engine import Engine

class NXCEngine(Engine):
    def __init__(self):
        super(NXCEngine, self).__init__()
        parser = self.args_parser
        parser.add_argument('--nxc_build_file', help='Path to the NXC build file')
        parser.add_argument('--flavour', help='Flavour to deploy')
        parser.add_argument('--nb_nodes', help='Number of nodes')
        parser.add_argument('--result_dir', help='path to store the results')
        self.nodes = {}
        self.oar_job_id = -1
        # --- Where and how many nodes ----
        self.nb_nodes = 2
        self.site = "grenoble"
        self.cluster = "dahu"

    def init(self):
        # We might have more than two nodes
        self.nb_nodes = int(self.args.nb_nodes)
        assert self.nb_nodes > 2, "I need at least two compute nodes (plus the server)"

        # --- Reservation ----
        self.oar_job_id, site = oarsub([(OarSubmission(f"{{cluster='{self.cluster}'}}/nodes={self.nb_nodes}", 15*60, job_type=["deploy"] if self.args.flavour == "g5k-image" else ["allow_classic_ssh"]), self.site)])[0]
        wait_oar_job_start(self.oar_job_id, site) # wait for the job to start, otherwise we might get a timeout in the `get_oar_job_nodes_nxc`

        # --- How many nodes per role ---
        # We want one server and all the other nodes are `node`
        roles_quantities = {"server": ["server"], "node": [f"node{i}" for i in range(1, self.nb_nodes)]}

        # --- Deploy and populate the dict `self.nodes` accordingly ---
        self.nodes = get_oar_job_nodes_nxc(
            self.oar_job_id,
            site,
            flavour_name=self.args.flavour,
            compose_info_file=self.args.nxc_build_file,
            roles_quantities=roles_quantities)

    def run(self):
        result_dir = self.args.result_dir
        result_file = f"{result_dir}/results_ior_{self.nb_nodes}_nodes_{self.flavour}_flavour_{self.oar_job_id}"

        run_ior_remote = Remote(f"start_ior", self.nodes["node"][0], connection_params={'user': 'root'})
        run_ior_remote.run()
        get_file_remote = Remote(f"cp /srv/shared/results_ior.json {result_file}", self.nodes["server"], connection_params={'user': 'root'})
        get_file_remote.run()

        oardel([(self.oar_job_id, self.site)])

if __name__ == "__main__":
    NXCEngine().start()
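
As before, you can run the script by passing the build file and flavour, plus the two new arguments. The result directory must be reachable from the server node (here we use /tmp as a simple example), and the nxc_build_file path follows the pattern shown earlier; adjust both to your setup:

python3 script.py --nxc_build_file $(pwd)/build/composition::g5k-nfs-store --flavour g5k-nfs-store --nb_nodes 4 --result_dir /tmp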

Packaging the MADbench2 benchmark

TODO: create a repo with a make / make install setup for MB2

Nix Expression

{ stdenv, openmpi }:

stdenv.mkDerivation {
  name = "MADbench2";
  src = ./.;
  buildInputs = [ openmpi ];
  installPhase = ''
    mkdir -p $out/bin
    mpicc -D SYSTEM -D COLUMBIA -D IO -o MADbench2.x MADbench2.c -lm
    mv MADbench2.x $out/bin
  '';
}

Questions:

  • do we make them add this to a flake and import it later?

Try to run the benchmark locally

First, enter a nix shell with the packaged MADbench2 and openmpi.

nix shell .#MADbench2 nixpkgs#openmpi

Once in the shell, you can run the benchmark with:

mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1

It should output the following:

MADbench 2.0 IO-mode
no_pe = 4  no_pix = 640  no_bin = 8  no_gang = 1  sblocksize = 8  fblocksize = 8  r_mod = 1  w_mod = 1
IOMETHOD = POSIX  IOMODE = SYNC  FILETYPE = UNIQUE  REMAP = CUSTOM

S_cc         0.00   [      0.00:      0.00]
S_bw         0.00   [      0.00:      0.00]
S_w          0.01   [      0.01:      0.01]
          -------
S_total      0.01   [      0.01:      0.01]

W_cc         0.00   [      0.00:      0.00]
W_bw         0.12   [      0.12:      0.12]
W_r          0.00   [      0.00:      0.00]
W_w          0.00   [      0.00:      0.00]
          -------
W_total      0.13   [      0.13:      0.13]

C_cc         0.00   [      0.00:      0.00]
C_bw         0.00   [      0.00:      0.00]
C_r          0.00   [      0.00:      0.00]
          -------
C_total      0.00   [      0.00:      0.00]


dC[0] = 0.00000e+00

What interests us in this tutorial is the total time spent writing: W_total.

Making MADbench2 available

Option 1: You have a flake repository with MADbench2 packaged

In the flake.nix file, add your repository as an input:

{
  description = "nixos-compose - basic setup";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/21.05";
    nxc.url = "git+https://gitlab.inria.fr/nixos-compose/nixos-compose.git";
    mypkgs.url = "URL TO YOUR PKGS";
  };
  outputs = { self, nixpkgs, nxc, mypkgs }:
  {
    # ...
  };
}

Then create an overlay adding the benchmark:

{
  # ...

  outputs = { self, nixpkgs, nxc, mypkgs }:
    let
      system = "x86_64-linux";
      myOverlay = final: prev:
        {
          mb2 = mypkgs.packages.${system}.MADbench2;
        };
    in {
      packages.${system} = nxc.lib.compose {
        inherit nixpkgs system;
        overlays = [ myOverlay ];
        composition = ./composition.nix;
        # setup = ./setup.toml;
      };

      # ...
    };
}

Option 2: You have a local file packaging MADbench2

In this case, you do not have to add another input; you can simply call the Nix file packaging MADbench2 (./MADbench2 in the example below).

Similarly, we create an overlay with the package to add.

{
  description = "nixos-compose - basic setup";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/22.05";
    nxc.url = "git+https://gitlab.inria.fr/nixos-compose/nixos-compose.git";
  };

  outputs = { self, nixpkgs, nxc }:
    let
      system = "x86_64-linux";
      myOverlay = final: prev:
        {
          mb2 = prev.callPackage ./MADbench2 { };
        };
    in {
      packages.${system} = nxc.lib.compose {
        inherit nixpkgs system;
        overlays = [ myOverlay ];
        composition = ./composition.nix;
        setup = ./setup.toml;
      };

      defaultPackage.${system} =
        self.packages.${system}."composition::nixos-test";

      devShell.${system} = nxc.devShells.${system}.nxcShellFull;
    };
}

Adding MADbench2 to the composition

Once MADbench2 has been added to the flake.nix file (exposed as mb2 through the overlay), we can access the package inside the composition. In our case, we want an environment with both MADbench2 and openmpi to run it.

{ pkgs, ... }: {
  roles = {
    foo = { pkgs, ... }:
      {
        # add needed package
        environment.systemPackages = with pkgs; [ openmpi mb2 ];
      };
  };
  testScript = ''
    foo.succeed("true")
  '';
}

Run the benchmark in the environment

As done previously, we have to go through the nxc build and nxc start phases.

Once the environment is deployed, we connect with nxc connect.

We can now try to run the benchmark.

mpirun -np 4 MADbench2.x 640 8 1 8 8 1 1

Unfortunately, this will fail for several reasons:

  • we are running as root

  • there are not enough slots to start 4 processes

As we are only setting up the environment for now, we can work around these issues by adding some flags:

mpirun --allow-run-as-root --oversubscribe -np 4 MADbench2.x 640 8 1 8 8 1 1