How to run CESM 1.2.2 on ARCHER
CESM 1.2.2 User Guide
The User Guide can be found at http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/usersguide/book1.html
NB: The PDF version is useful for searching the entire guide; however, it gives no indication of which text appears as links in the HTML version, so it is recommended to use both.
Installing CESM 1.2.2 on ARCHER
Firstly, edit your ~/.bashrc file and append the following lines
export CRAYPE_LINK_TYPE=dynamic
module load cmake
module load svn
module swap PrgEnv-cray PrgEnv-intel
module load cray-netcdf/4.3.2
module load cray-parallel-netcdf
module load cray-hdf5/1.8.13
These lines are required for every login session and batch job, so placing them in the ~/.bashrc file ensures they are not forgotten. This code requires the intel14 compiler which, in turn, requires specific versions of craype, cray-parallel-netcdf, etc.
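To confirm that the environment is set correctly in a new login session, a quick sanity check (illustrative only) is:
module list 2>&1 | grep -E 'PrgEnv-intel|cray-netcdf|cray-hdf5'
echo $CRAYPE_LINK_TYPE    # should print "dynamic"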
Download CESM 1.2.2 into your /work directory using svn, following the instructions at http://www.cesm.ucar.edu/models/cesm1.2/tags/index.html#CESM1_2_2
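At the time of writing, the checkout command given on that page was similar to the following (take the exact repository URL from the page above):
cd /work/<project>/<project>/<username>
svn co https://svn-ccsm-models.cgd.ucar.edu/cesm1/release_tags/cesm1_2_2 cesm1_2_2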
At present, this download fails with
svn: warning: W160013: Unable to connect to a repository at URL 'http://parallelio.googlecode.com/svn/genf90/trunk_tags/genf90_140121'
As such, the PIO component is not downloaded and its directory is left empty, which causes the build to fail. To add PIO, please follow the instructions at https://bb.cgd.ucar.edu/googlecode-repositories-are-offline-pio-source-not-found
Once downloaded, add the following 5 files into the 'scripts/ccsm_utils/Machines' directory.
NB rename 122_mkbatch.archer to mkbatch.archer, 122_config_compilers.xml to config_compilers.xml, and 122_config_machines.xml to config_machines.xml (i.e. remove the first 4 characters).
These files may contain references to directories which start with /work/ecse0116/. All such occurrences must be replaced by directories in your own workspace.
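One way to make the substitution in a single pass is sketched below (the target path is a placeholder; re-run the grep afterwards to confirm nothing was missed):
cd scripts/ccsm_utils/Machines
grep -rl '/work/ecse0116/' . | xargs sed -i 's|/work/ecse0116/ecse0116/gavin2/cesm1_2_2|/work/<project>/<project>/<username>/cesm1_2_2|g'
grep -r '/work/ecse0116/' .    # any remaining references must be edited by hand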
NB before building CESM, the input, archive and scratch directories must exist.
When users build their case, the input directory will be probed to check if the associated input files are available. If they are not, then the scripts will automatically pull the necessary associated input files from the CESM svn repository and place them in the input directory. As such, the input directory can become huge.
In an ideal environment, there would exist a directory to which every ARCHER user has both read and write access. Unfortunately, no such directory exists on ARCHER. As such, there is no shared inputdata directory and all users must create and manage their own input directory.
Please note that there is a shared input directory which contains the largest and most popular input data files, but this directory is read-only. This CESM shared input data directory is located at /work/n02/shared/cesm/inputdata/. It may be used by any ARCHER user, not just NCAR (n02) users, but only for reading. Users may copy relevant input files from this shared directory to their own local input data directory. Using this shared directory can save a significant amount of disk space and time.
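For example (the sub-directory below assumes the standard CESM inputdata layout and is illustrative only):
mkdir -p /work/<project>/<project>/<username>/cesm1_2_2/inputdata/atm
cp -r /work/n02/shared/cesm/inputdata/atm/cam /work/<project>/<project>/<username>/cesm1_2_2/inputdata/atm/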
The input, archive and scratch directories must all be created by hand by each user in their own work directory, e.g.
mkdir /work/ecse0116/ecse0116/gavin2/cesm1_2_2/archive
mkdir /work/ecse0116/ecse0116/gavin2/cesm1_2_2/scratch
mkdir /work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata
NB if you will use parallel-netcdf and not simply netcdf then, to gain best performance, you should set the Lustre stripe count to -1 for these three directories using the following three commands:
lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/scratch
lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/archive
lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata
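To confirm that the new default stripe count has been applied to a directory, run, e.g.
lfs getstripe -d /work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata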
These directories are then referenced in config_machines.xml, e.g.
<DIN_LOC_ROOT>/work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata</DIN_LOC_ROOT>
<DIN_LOC_ROOT_CLMFORC>/work/ecse0116/ecse0116/gavin2/cesm1_2_2/ccsm1/inputdata/atm/datm7</DIN_LOC_ROOT_CLMFORC>
<DOUT_S_ROOT>/work/ecse0116/ecse0116/gavin2/cesm1_2_2/archive/$CASE</DOUT_S_ROOT>
<CESMSCRATCHROOT>/work/ecse0116/ecse0116/gavin2/cesm1_2_2/scratch</CESMSCRATCHROOT>
Building the cprnc tool
Finally, one must build, by hand, the cprnc tool.
To make the cprnc tool, first upload the following file Makefile.cprnc122.archer to your cprnc directory, which will resemble:
/work/ecse0116/ecse0116/gavin2/cesm1_2_2/tools/cprnc
Once uploaded, run the following commands to set up the environment (they may be skipped if they are already in your ~/.bashrc file):
export CRAYPE_LINK_TYPE=dynamic
module load cmake
module load svn
module swap PrgEnv-cray PrgEnv-intel
module load cray-netcdf/4.3.2
module load cray-parallel-netcdf
module load cray-hdf5/1.8.13
and then copy over a file that is missing from the download:
cp ../../models/csm_share/shr/dtypes.h .
and then make the executable using the following three commands. (The second throws an error which is fixed by simply running the command again.)
make realclean -f Makefile.cprnc122.archer
make -f Makefile.cprnc122.archer
make -f Makefile.cprnc122.archer
A known error may occur here, where the error message reads something similar to:
cprnc.F90: Error in opening the compiled module file. Check INCLUDE paths.
The work-around is to replace the existing compare_vars_mod.F90, which will be empty, with the following file: compare_vars_mod.F90, and then re-make.
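For example, assuming the replacement file has been uploaded to your home directory (paths are placeholders):
cp ~/compare_vars_mod.F90 /work/<project>/<project>/<username>/cesm1_2_2/tools/cprnc/
cd /work/<project>/<project>/<username>/cesm1_2_2/tools/cprnc
make -f Makefile.cprnc122.archer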
Once the cprnc executable has been built, you must then edit the config_machines.xml file and replace the existing value of CCSM_CPRNC so that it points to the location of your new cprnc executable, e.g. the following line must be changed by hand from
CCSM_CPRNC="/work/ecse0116/ecse0116/gavin2/CESM1.0/models/atm/cam/tools/cprnc/cprnc"
to something similar to
CCSM_CPRNC="/work/ecse0116/ecse0116/gavin/cesm1_2_2/tools/cprnc/cprnc"
There was a temporary bug in the intel compiler which may cause the cprnc tool to throw either of the following errors at runtime:
Fatal Error: This program was not built to run in your system.
Please verify that both the operating system and the processor support Intel(R) AVX, F16C and RDRAND instructions.
or
Please verify that both the operating system and the processor support Intel(R) F16C instructions
This can be fixed by running the following commands
module swap craype-ivybridge craype-sandybridge
make clean -f Makefile.cprnc122.archer
make -f Makefile.cprnc122.archer
module swap craype-sandybridge craype-ivybridge
Completing the configuration process
Tools directory
By default, the taskmaker.pl tool is found in the scripts/ccsm_utils/Machines directory; however, the code expects this tool to reside in the scripts/ccsm_utils/Tools directory. One workaround is to copy the tool to the expected directory, e.g.
cd scripts/ccsm_utils
cp Machines/taskmaker.pl Tools/.
Furthermore, some CESM scripts are not, by default, executable. A simple work-around which ensures the Tools are executable is to run the following command:
chmod 755 scripts/ccsm_utils/Tools/*
Building CESM
First, change directory to the scripts directory in the /work installation of CESM, e.g.
cd /work/ecse0116/ecse0116/gavin/CESM1.0/scripts
Building tests
The process of building the CESM tests is slightly different from building simulations.
For the test ERS_D.f19_g16.X, say, issue the following commands in the 'scripts' directory.
./create_test -testname ERS_D.f19_g16.X.archer_intel -testid t21
cd ERS_D.f19_g16.X.archer_intel.t21
./ERS_D.f19_g16.X.archer_intel.t21.test_build
At present, the output of these processes contains multiple instances of the following string. NB this 'error' can safely be ignored.
ModuleCmd_Switch.c(172):ERROR:152: Module 'PrgEnv-cray' is currently not loaded
Running the test
To run the test, run the following command:
./ERS_D.f19_g16.X.archer_intel.t21.submit
Known Bugs
There is a known bug which affects users wishing to use the RESTART facility. If you wish to set RESUBMIT to .true, then please edit the file scripts/ccsm_utils/Tools/cesm_postrun_setup and add a new line, namely
cd $CASEROOT
at the beginning of the resubmit section. Without this, certain commands, such as xmlchange, and the batch script file will not be found.
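A rough sketch of the intended change (the surrounding lines are illustrative, not a verbatim copy of cesm_postrun_setup):
# --- resubmit section of scripts/ccsm_utils/Tools/cesm_postrun_setup ---
cd $CASEROOT    # new line: return to the case directory so that xmlchange and the batch script can be found
# ... the existing resubmit logic follows unchanged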
Building your simulation
The code is built, from scratch, for each simulation the user wants to run.
Consider, say, the following model: f19_g16 B_1850_CAM5_CN
This is configured, built and submitted for a case called my_first_sim, say, using the following commands:
./create_newcase -case my_first_sim -res f19_g16 -compset B_1850_CAM5_CN -mach archer -compiler intel
cd my_first_sim
./cesm_setup
./my_first_sim.build
Consider the create_newcase command: the -case flag assigns the local name to be given (here I have used 'my_first_sim'); the -res flag assigns the mesh resolution; the -compset flag assigns the component set of codes to employ; the -mach flag assigns the name of the platform, in this case 'archer'; and finally the -compiler flag assigns the name of the compiler, in this case 'intel' (which will employ intel14).
Consider the build command: if the input/restart files are not present, then the build command downloads the necessary files. As such, this command can take over an hour. Further, if the build fails with an error which references the /tmp directory, simply run the build command again: it is likely the system was very busy and the build command temporarily ran out of memory.
Before running the simulation, users should check both the <case>.archer.run file and the env_run.xml file, as the default values produce only a short run.
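For example, from the case directory, a quick way to inspect the current run-length settings is:
grep -E 'STOP_OPTION|STOP_N|REST_OPTION' env_run.xml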
Running your simulation
To run the simulation, run the following command:
./my_first_sim.submit
How to change from the default settings
Before building
Changing the number of cores
To change the number of cores to 128, say:
cd $CASE
NTASKS=128
./xmlchange -file env_mach_pes.xml -id NTASKS_ATM -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_LND -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_ICE -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_OCN -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_CPL -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_GLC -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_ROF -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_WAV -val $NTASKS
./xmlchange -file env_mach_pes.xml -id TOTALPES -val $NTASKS
./cesm_setup -clean
./cesm_setup
./*.clean_build
./*.build
Changing simulation units
cd $CASE
# change given STOP_OPTION value to nyears
./xmlchange -file env_run.xml -id STOP_OPTION -val nyears
# change given STOP_N to 20
./xmlchange -file env_run.xml -id STOP_N -val 20
# don't produce restart files at end
./xmlchange -file env_run.xml -id REST_OPTION -val never
# or *do* produce restart files at end
#./xmlchange -file env_run.xml -id REST_OPTION -val $STOP_N
./cesm_setup -clean
./cesm_setup
./*.clean_build
./*.build
Parallel netcdf library
The parallel and serial versions of the netcdf library are both available within the default build on ARCHER.
The default setting is to employ the serial netcdf libraries.
To employ the parallel netcdf libraries, change directory to $CASE and run
./xmlchange -file env_run.xml -id PIO_TYPENAME -val pnetcdf
which changes the value of PIO_TYPENAME from netcdf to pnetcdf, before building. (This is contrary to the User Guide, which states that the value is changed after building.)
The number of IO tasks is given by PIO_NUMTASKS; the default value of -1 instructs the library to select a suitable value.
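If you wish to set the number of IO tasks explicitly, the same xmlchange mechanism applies (the value 64 below is only an example):
./xmlchange -file env_run.xml -id PIO_NUMTASKS -val 64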
As stated above, if using parallel-netcdf and not simply netcdf, then to gain best performance you should set the Lustre stripe count to -1 for your scratch, archive and inputdata directories:
lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/scratch
lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/archive
lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata
Changing the batch script
Change the budget to your budget account
In the file mkbatch.archer, change the line
set account_name = "ecse0116"
to
set account_name = "<budget>"
where the string <budget> is replaced by the name of your budget on ARCHER.
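For example, if your budget code were n02-mybudget (a placeholder), the change could be made with:
sed -i 's/set account_name = "ecse0116"/set account_name = "n02-mybudget"/' scripts/ccsm_utils/Machines/mkbatch.archer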
Editing the batch script
The batch script is the file which ends with '.run'; thus, to edit the batch script using vi, say, type the following:
vi *.run
Requesting high memory nodes
ARCHER has two types of compute node: 2632 nodes with 64 GB of shared memory and 376 nodes with 128 GB. Both have 24 cores which share this memory.
During the validation process, it was found that the larger memory nodes were required to run some of the tests. To use the larger memory nodes, update the batch script, namely the *.run file, to select them. Specifically, to run on 4 large-memory nodes, set
#PBS -l select=4:bigmem=true
else to run on 4 smaller memory nodes set
#PBS -l select=4:bigmem=false
or, if you don't mind which node you run on, set
#PBS -l select=4
Requesting longer wall times
Users are limited to requesting a maximum walltime of 24 hours, e.g.
#PBS -l walltime=24:00:00
however, if the job requires more time, then users can increase this limit to 48 hours by using the PBS long queue, e.g.
#PBS -q long
#PBS -l walltime=48:00:00