Slurm job submission from EasyBuild a bit reworked.
klust committed May 3, 2023
commit db5cdec7a27204fc3e19ad227793691d907ee73b
109 changes: 69 additions & 40 deletions docs/2022-CSC_and_LO/3_03_slurm_jobs.md
@@ -6,12 +6,30 @@

EasyBuild can submit jobs to different backends including Slurm to install software,
to *distribute* the often time-consuming installation of a set of software applications and
the dependencies they require to a cluster.
the dependencies they require to a cluster. Each individual package is installed in a separate
job and job dependencies are used to manage the dependencies between packages so that no build
is started before the dependencies are in place.

This is done via the ``--job`` command line option.

It is important to be aware of some details before you start using this, which we'll cover here.
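
To illustrate the job dependency mechanism mentioned above, here is a minimal sketch of how such a chain could be expressed by hand with ``sbatch``. It is not taken from EasyBuild itself: the helper ``dependency_option`` is hypothetical, and the job IDs are made up. It only relies on the ``--dependency=afterok:<jobid>[:<jobid>...]`` option of ``sbatch``.

```shell
# Hypothetical helper (not part of EasyBuild): turn a list of Slurm job IDs
# into an sbatch dependency option. Prints nothing if there are no dependencies.
dependency_option() {
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    local ids
    # Join all arguments with ':' as required by --dependency=afterok:...
    ids=$(IFS=:; echo "$*")
    echo "--dependency=afterok:${ids}"
}

# Example: a package whose two dependencies are built by jobs 1001 and 1002
dependency_option 1001 1002
```

A job submitted with this option only starts once the listed jobs have finished successfully, for example ``sbatch --parsable $(dependency_option 1001 1002) build_package.sh``, where ``build_package.sh`` is a placeholder for a job script.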

!!! Warning "This section is not supported on LUMI, use at your own risk"

EasyBuild on LUMI is currently not fully configured to support job submission via Slurm. Several
changes would be needed to the configuration of EasyBuild, including the location of the
temporary files and build directory. Those have to be made by hand.

Due to the setup of the central software stack, this feature is currently useless to install
the central stack. For user installations, there are also limitations as the environment
on the compute nodes is different from the login nodes so, e.g., different locations for
temporary files are being used. These would only be refreshed if the EasyBuild configuration
modules are reloaded on the compute nodes which cannot be done currently in the way Slurm
job submission is set up in EasyBuild.

Use material in this section with care; it has not been completely tested.


## Configuration

The EasyBuild configuration that is active at the time that ``eb --job`` is used
@@ -25,6 +43,8 @@ that are specified via an [EasyBuild configuration file](configuration.md#config
This implies that any EasyBuild configuration files or ``$EASYBUILD_*`` environment variables
that are in place in the job environment are most likely *irrelevant*, since configuration settings
they specify will most likely be overruled by the corresponding command line options.
It also implies, however, that the EasyBuild configuration that is in place when ``eb --job`` is used
must also work on the compute nodes to which the job is submitted.
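
The reason the job environment is mostly irrelevant is the precedence between configuration levels in EasyBuild: command line options overrule ``$EASYBUILD_*`` environment variables, which in turn overrule configuration files. The sketch below only mimics that rule; ``resolve_setting`` is a hypothetical helper and the paths are made-up examples.

```shell
# Hypothetical helper mimicking the precedence rule:
# command line option > environment variable > configuration file.
# Arguments: value from the command line, from the environment, from a config file
# (an empty string means "not set at that level").
resolve_setting() {
    local cli_value="$1" env_value="$2" file_value="$3"
    if [ -n "$cli_value" ]; then
        echo "$cli_value"
    elif [ -n "$env_value" ]; then
        echo "$env_value"
    else
        echo "$file_value"
    fi
}

# In the job, the build path arrives as a command line option,
# so whatever the job environment defines is ignored.
resolve_setting "/dev/shm/example/build" "/tmp/example" "/scratch/example"
```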


## Using ``eb --job``
@@ -39,6 +59,9 @@ to ``Slurm``, for example by setting the corresponding environment variable:
export EASYBUILD_JOB_BACKEND='Slurm'
```

On LUMI this is taken care of in the EasyBuild configuration modules such as ``EasyBuild-user``.


### Job resources

To submit an installation as a job, simply use ``eb --job``:
@@ -73,13 +96,13 @@ For example, to specify a particular account that should be used for the jobs su
(equivalent with using the ``-A`` or ``--account`` command line option for ``sbatch``):

```shell
export SBATCH_ACCOUNT='example_project'
export SBATCH_ACCOUNT='project_XXXXXXXXX'
```

Or to submit to a particular Slurm partition (equivalent with the ``-p`` or ``--partition`` option for ``sbatch``):

```shell
export SBATCH_PARTITION='example_partition'
export SBATCH_PARTITION='small'
```

For more information about supported ``$SBATCH_*`` environment variables,
@@ -113,24 +136,29 @@ as jobs, to avoid that they fail almost instantly due to a lack of disk space.
Keep in mind that the active EasyBuild configuration is passed down into the submitted jobs,
so any configuration that is present on the workernodes may not have any effect.

For example, if you commonly use `/tmp/$USER` for build directories on a login node,
you may need to tweak that when submitting jobs to use a different location:
For example, on LUMI it is possible to use ``$XDG_RUNTIME_DIR`` on the login nodes which has
the advantage that any leftovers of failed builds will be cleaned up when the user ends their last
login session on that node, but it is not possible to do so on the compute nodes.

```shell
# EasyBuild is configured to use /tmp/$USER on the login node
login01 $ eb --show-config | grep buildpath
buildpath (E) = /tmp/example
uan01 $ eb --show-config | grep buildpath
buildpath (E) = /run/user/XXXXXXXX/easybuild/build

# use /localdisk/$USER for build directories when submitting installations as jobs
login01 $ eb --job --buildpath /localdisk/$USER example.eb --robot
# use /dev/shm/$USER for build directories when submitting installations as jobs
login01 $ eb --job --buildpath /dev/shm/$USER/easybuild example.eb --robot
```
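
As a rough illustration of the difference between login and compute nodes, the following sketch picks a build directory depending on where it runs. It is an assumption-laden example, not the LUMI configuration: ``select_buildpath`` is a hypothetical helper, it assumes that Slurm sets ``$SLURM_JOB_ID`` inside a job, and the fallback to ``/tmp`` when ``$XDG_RUNTIME_DIR`` is unset is added only for illustration.

```shell
# Hypothetical helper: choose a build directory depending on where we run.
# Assumption: $SLURM_JOB_ID is set inside a Slurm job and unset on a login node.
select_buildpath() {
    local user="${USER:-$(id -un)}"
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        # Compute node: use the RAM disk
        echo "/dev/shm/${user}/easybuild/build"
    else
        # Login node: use the per-session runtime directory, or /tmp as a fallback
        echo "${XDG_RUNTIME_DIR:-/tmp/${user}}/easybuild/build"
    fi
}
```

The result could then be passed on explicitly, e.g. via ``--buildpath "$(select_buildpath)"``.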


### Temporary log files and build directories

The temporary log file that EasyBuild creates is most likely going to end up on the local disk
of the workernode on which the job was started (by default in `$TMPDIR` or `/tmp`).
If an installation fails, the job will finish and temporary files will likely be cleaned up instantly,
which may leave you wondering about the actual cause of the failing installation...
The problems for the temporary log files are twofold. First, they may end up in a place
that is not available on the compute nodes. E.g., for the same reasons as for the build
path, the LUMI EasyBuild configuration will place the temporary files in a subdirectory of
``$XDG_RUNTIME_DIR`` on the login nodes but a subdirectory of ``/dev/shm/$USER`` on the
compute nodes. The second problem however is that if an installation fails, those log files are
not even accessible anymore which may leave you wondering about the actual cause of the failing
installation...

To remedy this, there are a couple of EasyBuild configuration options you can use:

@@ -139,18 +167,21 @@ To remedy this, there are a couple of EasyBuild configuration options you can us
```shell
$ eb --job example.eb --tmp-logdir $HOME/eb_tmplogs
```
This will move at least the log file to a suitable place.

* If you prefer having the entire log file stored in the Slurm job output files,
you can use ``--logtostdout`` when submitting the jobs. This will result in extensive logging
to your terminal window when submitting the jobs, but it will also make EasyBuild
log to ``stdout`` when the installation is running in the job, and hence the log messages will be
captured in the job output files.
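
When the log ends up in the job output files, a small helper can point to the jobs that need attention. The sketch below is only an illustration: ``failed_job_outputs`` is a hypothetical helper, it assumes the default Slurm output file name ``slurm-<jobid>.out``, and it assumes that failures show up as lines containing ``ERROR``.

```shell
# Hypothetical helper: list the Slurm job output files in a directory that
# contain an error marker.
# Assumptions: default output file names (slurm-<jobid>.out) and failures
# being logged with the string "ERROR".
failed_job_outputs() {
    local dir="${1:-.}"
    # -l prints only the names of matching files; ignore "no match" exit codes
    grep -l "ERROR" "$dir"/slurm-*.out 2>/dev/null || true
}
```

Running ``failed_job_outputs`` in the directory from which the jobs were submitted prints one file name per job whose output contains the marker.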

The same remark applies to build directories: they should be on a local filesystem (to avoid problems
that often occur when building software on a parallel filesystem like GPFS or Lustre),
which will probably be cleaned up automatically when a job fails. Here it is less easy to provide
general advice on how to deal with this, but one thing you can consider is retrying the installation
in an interactive job, so you can inspect the build directory after the installation fails.
The build directory of course also suffers from the problem of being no longer accessible if the
installation fails, but there it is not so easy to find a solution. Building on a shared file system
is not only much slower, but in particular on parallel file systems like GPFS/SpectrumScale, Lustre
or BeeGFS building sometimes fails in strange ways. One thing you can consider if you cannot do the
build on a login node (e.g., because the code is not suitable for cross-compiling or the configure
system does tests that would fail on the login node), is to retry the installation in an
interactive job, so you can inspect the build directory after the installation fails.

### Lock files

@@ -171,37 +202,37 @@ subdirectory of ``installpath``) manually, or re-submit the job with ``eb --job

As an example, we will let EasyBuild submit jobs to install ``AUGUSTUS`` with the ``foss/2020b`` toolchain.

!!! Warning "This example does not work on LUMI"

Note that this is an example using the FOSS common toolchain. For this reason it does not work on
LUMI.

### Configuration

Before using ``--job``, let's make sure that EasyBuild is properly configured:

```shell
# use $HOME/easybuild for software, modules, sources, etc.
export EASYBUILD_PREFIX=$HOME/easybuild
# Load the EasyBuild-user module (central installations will not work at all
# using job submission)
module load LUMI/21.12
module load partition/C
module load EasyBuild-user

# use ramdisk for build directories
export EASYBUILD_BUILDPATH=/dev/shm/$USER
export EASYBUILD_BUILDPATH=/dev/shm/$USER/build
export EASYBUILD_TMPDIR=/dev/shm/$USER/tmp

# use Slurm as job backend
export EASYBUILD_JOB_BACKEND=Slurm
```

In addition, add the path to the centrally installed software to ``$MODULEPATH`` via ``module use``:

```shell
module use /easybuild/modules/all
```

Load the EasyBuild module:

```shell
module load EasyBuild
```

Let's assume that we also need to inform Slurm that jobs should be submitted into a particular account:
We will also need to inform Slurm that jobs should be submitted into a particular account, and
in a particular partition:

```shell
export SBATCH_ACCOUNT=example_project
export SBATCH_ACCOUNT=project_XXXXXXXXX
export SBATCH_PARTITION='small'
```

This will be picked up by the ``sbatch`` commands that EasyBuild will run to submit the software installation jobs.
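
A missing account or partition typically only shows up once jobs are being submitted, so a quick check beforehand can help. The following is a minimal sketch; ``check_sbatch_env`` is a hypothetical helper, and the list of variables it checks is just the two used in this example.

```shell
# Hypothetical helper: verify that the Slurm settings used in this example are
# exported before calling "eb --job". Returns 1 if any of them is missing.
check_sbatch_env() {
    local missing=0 name
    for name in SBATCH_ACCOUNT SBATCH_PARTITION; do
        # printenv only sees exported variables, just like sbatch does
        if [ -z "$(printenv "$name")" ]; then
            echo "warning: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It can be combined with the submission in one line, e.g. ``check_sbatch_env && eb example.eb --job --robot``.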
@@ -234,14 +265,14 @@ $ eb AUGUSTUS-3.4.0-foss-2020b.eb --missing
Several dependencies are not installed yet, so we will need to use ``--robot`` to ensure that
EasyBuild also submits jobs to install these first.

To speed up the installations a bit, we will request 10 cores for each submitted job (via ``--job-cores``).
To speed up the installations a bit, we will request 8 cores for each submitted job (via ``--job-cores``).
That should be sufficient to let each installation finish in (well) under 1 hour,
so we only request 1 hour of walltime per job (via ``--job-max-walltime``).

In order to have some meaningful job output files, we also enable trace mode (via ``--trace``).

```
$ eb AUGUSTUS-3.4.0-foss-2020b.eb --job --job-cores 10 --job-max-walltime 1 --robot --trace
$ eb AUGUSTUS-3.4.0-foss-2020b.eb --job --job-cores 8 --job-max-walltime 1 --robot --trace
...
== resolving dependencies ...
...
@@ -278,7 +309,7 @@ these jobs will be able to start.
After about 20 minutes, AUGUSTUS and all missing dependencies should be installed:

```
$ ls -lrt $HOME/easybuild/modules/all/*/*.lua | tail -11
$ ls -lrt $HOME/EasyBuild/modules/.../*.lua | tail -11
-rw-rw----. 1 example example 1634 Mar 29 10:13 /users/example/easybuild/modules/all/HTSlib/1.11-GCC-10.2.0.lua
-rw-rw----. 1 example example 1792 Mar 29 10:13 /users/example/easybuild/modules/all/SAMtools/1.11-GCC-10.2.0.lua
-rw-rw----. 1 example example 1147 Mar 29 10:13 /users/example/easybuild/modules/all/BamTools/2.5.1-GCC-10.2.0.lua
@@ -291,11 +322,9 @@ $ ls -lrt $HOME/easybuild/modules/all/*/*.lua | tail -11
-rw-rw----. 1 example example 1365 Mar 29 10:28 /users/example/easybuild/modules/all/SuiteSparse/5.8.1-foss-2020b-METIS-5.1.0.lua
-rw-rw----. 1 example example 2233 Mar 29 10:30 /users/example/easybuild/modules/all/AUGUSTUS/3.4.0-foss-2020b.lua

$ module use $HOME/easybuild/modules/all

$ module avail AUGUSTUS

-------- /users/hkenneth/easybuild/modules/all --------
-- EasyBuild managed user software for software stack ... --
AUGUSTUS/3.4.0-foss-2020b
```
