I recently improved my strategy for organizing code and data for simulations run at NERSC, so I am writing it down here for reference.
I mostly use Python (often with C/C++ extensions), so I first rely on the Anaconda
module maintained by NERSC, currently
If I need to add many more packages I can create a conda environment, but for just installing
1 or 2 packages I prefer to just add them to my
I have core libraries that I rely on and often modify to run my simulations,
those should be installed on Global Common Software:
which is specifically designed to access small files like Python packages.
I generally create a subfolder and reference it with an environment variable:
Then I create an
env.sh script in the source folder of the package (in Global Home) that loads the environment:
module load python/3.6-anaconda-4.4
export PREFIX=/global/common/software/projectname/zonca/python_prefix
export PATH=$PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$PREFIX/lib/python3.6/site-packages:$PYTHONPATH
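A slightly more defensive variant of the same env.sh could look like the sketch below. It is only an illustration, not a required form: the `command -v module` guard and the `${VAR:+...}` expansions are additions, chosen so that the file can be sourced on a machine without the module system and so that an initially empty variable does not leave a trailing colon (an empty entry in LD_LIBRARY_PATH is read as the current directory).

```shell
# env.sh variant, meant to be sourced: `source env.sh`

# Load the NERSC Anaconda module only where the `module` command exists.
if command -v module >/dev/null 2>&1; then
    module load python/3.6-anaconda-4.4
fi

export PREFIX=/global/common/software/projectname/zonca/python_prefix
export PATH="$PREFIX/bin:$PATH"

# ${VAR:+:$VAR} appends the previous value only when it was non-empty,
# so no trailing ":" is left behind.
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PYTHONPATH="$PREFIX/lib/python3.6/site-packages${PYTHONPATH:+:$PYTHONPATH}"
```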
This environment is automatically propagated to the computing nodes when I submit a SLURM script, so I do not add any of these environment details to my SLURM scripts.
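Because the job depends on the environment of the shell that submitted it, an optional guard at the top of a SLURM script can fail early when env.sh was not sourced. This is a minimal sketch: `check_env` is a hypothetical helper name, and the site-packages path is assumed to match the one exported in env.sh.

```shell
# Fail early if the environment from env.sh did not reach this shell.
check_env() {
    if [ -z "${PREFIX:-}" ]; then
        echo "PREFIX is not set: source env.sh before submitting" >&2
        return 1
    fi
    # Wrap PYTHONPATH in colons so the entry matches at any position.
    case ":${PYTHONPATH:-}:" in
        *":$PREFIX/lib/python3.6/site-packages:"*) ;;
        *)
            echo "PYTHONPATH does not include the PREFIX site-packages" >&2
            return 1
            ;;
    esac
}

# Usage inside a job script:
#   check_env || exit 1
```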
Then I can install a package there with:
python setup.py install --prefix=$PREFIX
or from pip:
pip install apackage --prefix=$PREFIX
It is also common to install a newer version of a package which is already provided by the base environment:
pip install apackage --ignore-installed --upgrade --no-deps --prefix=$PREFIX
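After an install like this it can be useful to confirm which copy Python actually imports. Entries in PYTHONPATH come before the base site-packages in `sys.path`, so the version under `$PREFIX` should win. The sketch below assumes a hypothetical helper named `module_origin`; the `PYTHON` variable is also an addition, only there to let you pick a different interpreter.

```shell
# Print the file a Python module would be imported from,
# or "not found" if it cannot be located.
module_origin() {
    "${PYTHON:-python}" -c '
import importlib.util
import sys

spec = importlib.util.find_spec(sys.argv[1])
print(spec.origin if spec is not None else "not found")
' "$1"
}

# Usage (apackage is the placeholder name used above):
#   module_origin apackage
```

If the printed path is not under `$PREFIX`, the base environment copy is still the one being picked up.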
Simulations SLURM scripts and configuration files
I first create a repository on GitHub for my simulations and clone it to my home folder at NERSC. I generally create a repository for each experiment, then I create a subfolder for each type of simulation I am working on.
Inside a folder I create parameter files to configure my run and SLURM scripts to launch the simulations, and put everything under version control immediately. I often create a Pull Request on GitHub and ask my collaborators to cross-check the configuration before I submit a run.
Smaller input data files, even binaries, can be added for convenience to the GitHub repository.
Once a run has been validated, inside the simulation type folder I create a subfolder
README.md, this will include all the details about the simulation.
I also tag both the core library I depend on and the simulation repository with the same name e.g.:
git tag -a 201806_details_about_run -m "software version used for 201806_details_about_run"
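Since the same tag goes into two repositories, the command can be wrapped in a small function. This is a sketch only: `tag_run` is a hypothetical name and the repository paths in the usage comment are placeholders.

```shell
# Create the same annotated tag in several repositories.
# Usage: tag_run TAG REPO_DIR [REPO_DIR ...]
tag_run() {
    local tag=$1
    shift
    local repo
    for repo in "$@"; do
        # Stop at the first failure so the repositories do not drift apart silently.
        git -C "$repo" tag -a "$tag" -m "software version used for $tag" || return 1
    done
}

# Example with placeholder paths:
#   tag_run 201806_details_about_run "$HOME/core_library" "$HOME/projectname"
```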
I’ll also add the path at NERSC of the input data and output results.
Then for future simulations I’ll keep modifying the SLURM scripts and parameter files but always have a reference to each previous version.
Larger input data and output data
Larger input data and outputs are not suitable for version control and should live in a SCRATCH filesystem.
I always use the Global Scratch
$CSCRATCH, which is available on both Edison and Cori and also
from the Jupyter Notebook environment at: https://jupyter.nersc.gov.
I create a root folder for the project at:
Then a subfolder for each simulation type:
Then I symlink those inside the simulation repository as the folder
cd $HOME/projectname/simulation_type_1
ln -s $CSCRATCH/projectname/simulation_type_1 out
Therefore I can set up my simulation software to save all results inside
and this is going to be written to
This setup makes it very convenient to regularly back up everything to tape using
cput which just backs up
files that are not already on tape, e.g.:
cd $CSCRATCH
hsi cput -R projectname
This is going to synchronize the backup on tape with the latest results on
I do the same for input files:
mkdir $CSCRATCH/projectname/input_simulation_type_1
cd $HOME/projectname/simulation_type_1
ln -s $CSCRATCH/projectname/input_simulation_type_1 input
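The output and input links follow the same pattern, so they can be created with one small function. This is a sketch under my own assumptions: `link_scratch` is a hypothetical name, and `mkdir -p` and `ln -sfn` are used instead of plain `mkdir` and `ln -s` so that it can be re-run without errors.

```shell
# Create a folder on scratch and symlink it into a repository folder.
# Usage: link_scratch SCRATCH_DIR REPO_DIR LINK_NAME
link_scratch() {
    local scratch_dir=$1
    local repo_dir=$2
    local link_name=$3
    mkdir -p "$scratch_dir" || return 1
    # -f replaces an existing link; -n stops ln from following it
    # and creating a nested link inside the target folder.
    ln -sfn "$scratch_dir" "$repo_dir/$link_name"
}

# Example, following the layout above:
#   link_scratch "$CSCRATCH/projectname/simulation_type_1" \
#       "$HOME/projectname/simulation_type_1" out
#   link_scratch "$CSCRATCH/projectname/input_simulation_type_1" \
#       "$HOME/projectname/simulation_type_1" input
```

If `out` or `input` already exists as a real directory rather than a symlink, `ln` would place the link inside it, so it is worth checking for that case before running the function.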