How to run CESM 1.0.6 on ARCHER
CESM 1.0.6 User Guide
The User Guide can be found at http://www.cesm.ucar.edu/models/cesm1.0/cesm/
NB: The PDF version is useful for searching the entire guide; however, it gives no indication of which text appears as links in the HTML version, so it is recommended to use both.
Installing CESM 1.0.6 on ARCHER
Download CESM 1.0.6 into your /work directory using svn. You'll need to run
module load svn
and then follow the instructions at http://www.cesm.ucar.edu/models/cesm1.0/tags/index.html#CESM1_0_6.
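For reference, a checkout of the following form is typical (the exact repository URL and the registration/password prompts are given on that page, so treat this as a sketch):
svn co https://svn-ccsm-release.cgd.ucar.edu/model_versions/cesm1_0_6 cesm1_0_6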
Once downloaded, add the following 3 files into the 'scripts' directory.
NB: then rename 106_mkbatch.archer to mkbatch.archer (i.e. remove the first 4 characters).
In the same directory, namely scripts/ccsm_utils/Machines, replace the existing config_machines.xml file with the following version (it is identical to the release version, except that an entry for 'archer' has been added).
NB: at present these files produce a debug build for ARCHER, and they have passed the 11 test cases described in step 1 of http://www.cesm.ucar.edu/models/cesm1.0/cesm/cesm_doc_1_0_4/x2333.html
All users must edit the config_machines.xml file to reflect their local installation. At a minimum, every line which includes the string 'gavin2' must be changed.
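For example, one way to make these edits in bulk (the /work/<project>/<project>/<username> path below is a placeholder for your own work directory) is:
cd scripts/ccsm_utils/Machines
sed -i 's|/work/ecse0116/ecse0116/gavin2|/work/<project>/<project>/<username>|g' config_machines.xml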
Before building CESM
Note: before building CESM, the input and output directories must exist. (These directories are sometimes referred to as the "temporary archives" in the CESM User Guides.) An input directory already exists on ARCHER, residing in a space which any ARCHER user can read but not write to; it contains large, commonly used input data files. NB: both input and output directories must be created by hand by each user in their own work directory, e.g.
mkdir /work/ecse0116/ecse0116/gavin2/CESM1.0/output
These directories are then referenced in config_machines.xml, e.g.
DIN_LOC_ROOT_CSMDATA="/work/ecse0116/shared/CESM1.0/inputdata"
DIN_LOC_ROOT_CLMQIAN="/work/ecse0116/shared/CESM1.0/inputdata/atm/datm7/atm_forcing.datm7.Qian.T62.c080727"
DOUT_S_ROOT="/work/ecse0116/ecse0116/gavin2/CESM1.0/output/$CASE"
Building the cprnc tool
Finally, the cprnc tool must be built by hand.
To build cprnc, first upload the following Makefile.archer (20 Aug 2014) to your cprnc directory, whose path will resemble:
/work/ecse0116/ecse0116/gavin2/CESM1.0/models/atm/cam/tools/cprnc
Once uploaded, run the following commands to make the cprnc tool:
module switch PrgEnv-cray PrgEnv-intel
module load netcdf
make clean -f Makefile.archer
make -f Makefile.archer
Once built, the config_machines.xml file must be updated with the full path and name of your new executable, e.g.
CCSM_CPRNC="/work/ecse0116/ecse0116/gavin/CESM1.0/models/atm/cam/tools/cprnc/cprnc"
NB: ensure your current directory is included in your INCLUDE path.
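For example, in a bash shell this can be done with (adjust for your own shell):
export INCLUDE=.:$INCLUDE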
Building CESM
Firstly, change directory to the scripts directory in the 'work' installation of CESM, e.g.
cd /work/ecse0116/ecse0116/gavin/CESM1.0/scripts
Then issue the following three commands:
module load svn
module switch PrgEnv-cray PrgEnv-intel
module load netcdf
The first enables any missing input/restart files to be downloaded during the building process.
The second switches to the Intel programming environment.
The third loads the netcdf library; because the Intel programming environment is already loaded, it is the Intel build of netcdf that is picked up.
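To confirm the environment is as expected before building, you can run:
module list
and check that PrgEnv-intel and a netcdf module appear in the output.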
Building tests
The process of building the CESM tests is slightly different from building simulations.
For the test ERS.f19_g16.X, say, issue the following commands in the 'scripts' directory.
./create_test -testname ERS.f19_g16.X.archer -testid t01
cd ERS.f19_g16.X.archer.t01
./ERS.f19_g16.X.archer.t01.build
At present, the output of these processes contains multiple instances of the following string. NB: this 'error' can safely be ignored.
ModuleCmd_Switch.c(172):ERROR:152: Module 'PrgEnv-cray' is currently not loaded
Running the test
To run the test, run the following command:
qsub ERS.f19_g16.X.archer.t01.run
Building your simulation
The code is built, from scratch, for each simulation the user wants to run.
Consider, say, the following Atmospheric model (2 degree): 1.9x2.5_1.9x2.5 f19_f19 F_AMIP_CAM5.
This is configured, built and submitted for a case called, say, my_first_sim, using the following commands:
./create_newcase -case my_first_sim -res f19_f19 -compset F_AMIP_CAM5 -mach archer
cd my_first_sim
./configure -case
./my_first_sim.build
Consider the create_newcase command: the -case flag assigns a local name for the case (here 'my_first_sim'); the -res flag assigns the mesh resolution; the -compset flag assigns the component set, i.e. the collection of model codes to employ; and the -mach flag assigns the name of the platform, in this case 'archer'.
Consider the build command: if the required input/restart files are not present, the build command downloads them, so this step can take over an hour. Further, if the build fails with an error which references the /tmp directory, simply run the build command again; it is likely that the system was very busy and the build temporarily ran out of memory.
Before running the simulation, users should check both the case's *.run file and the env_run.xml file, as the default values produce only a short run.
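For example, to inspect the current run-length settings (these variable names are the ones used in env_run.xml):
grep -E 'STOP_OPTION|STOP_N' env_run.xml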
Running your simulation
To run the simulation, run the following command:
qsub my_first_sim.run
How to change from the default settings
Before building
Changing the number of cores
For example, to change the number of cores to 128:
cd $CASE
NTASKS=128
./xmlchange -file env_mach_pes.xml -id NTASKS_ATM -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_LND -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_ICE -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_OCN -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_CPL -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_GLC -val $NTASKS
./xmlchange -file env_mach_pes.xml -id TOTALPES -val $NTASKS
./configure -cleanmach
./configure -case
./$CASE.$MACH.build
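As a rough guide to what to expect afterwards: ARCHER nodes have 24 cores, so 128 tasks need at least 6 nodes, and the regenerated *.run batch script should request something like
#PBS -l select=6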
Changing simulation units
cd $CASE
# change given STOP_OPTION value to nyears
./xmlchange -file env_run.xml -id STOP_OPTION -val nyears
# change given STOP_N to 20
./xmlchange -file env_run.xml -id STOP_N -val 20
# don't produce restart files at end
./xmlchange -file env_run.xml -id REST_OPTION -val never
# or *do* produce restart files at end
#./xmlchange -file env_run.xml -id REST_OPTION -val $STOP_OPTION
./configure -cleanmach
./configure -case
./$CASE.$MACH.build
Parallel netcdf library
Both the parallel and serial versions of the netcdf library are available within the default build on ARCHER.
The default setting is to employ the serial netcdf library.
To employ the parallel netcdf library, change directory to $CASE and run
./xmlchange -file env_run.xml -id PIO_TYPENAME -val pnetcdf
which changes the value of PIO_TYPENAME from netcdf to pnetcdf, before building. (This is contrary to the User Guide, which states the value is changed after building.)
The number of IO tasks is set by PIO_NUMTASKS; the default value of -1 instructs the library to select a suitable value.
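If you wish to set an explicit number of IO tasks instead (the value 24 below is only an illustrative choice, and it is assumed here that PIO_NUMTASKS lives in the same env_run.xml file as PIO_TYPENAME), run, from the case directory:
./xmlchange -file env_run.xml -id PIO_NUMTASKS -val 24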
Changing the batch script
Change the budget to your budget account
In the file mkbatch.archer, change the line
set account_name = "ecse0116"
to
set account_name = "<budget>"
where the string <budget> is replaced by the name of your budget on ARCHER.
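For example, this edit can be made with a one-line sed command from the scripts/ccsm_utils/Machines directory (e05-example is a placeholder budget code):
sed -i 's/set account_name = "ecse0116"/set account_name = "e05-example"/' mkbatch.archer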
Requesting high memory nodes
ARCHER has two types of compute node: 2632 nodes with 64 GB of shared memory and 376 nodes with 128 GB. Both types have 24 cores which share this memory.
During the validation process, it was found that the larger-memory nodes were required to run some of the tests. To use them, update the batch script, namely the *.run file, to select the larger-memory nodes. Specifically, to run on 4 large-memory nodes, set
#PBS -l select=4:bigmem=true
or, to run on 4 smaller-memory nodes instead, set
#PBS -l select=4:bigmem=false
or, if you don't mind which node you run on, set
#PBS -l select=4
Requesting longer wall times
Users are limited to requesting a maximum walltime of 24 hours, e.g.
#PBS -l walltime=24:00:00
however, if the job requires more time, users can increase this limit to 48 hours by submitting to the long queue, e.g.
#PBS -q long
#PBS -l walltime=48:00:00