How to run CESM 1.2.2 on ARCHER

CESM 1.2.2 User Guide

The User Guide can be found at http://www.cesm.ucar.edu/models/cesm1.2/cesm/doc/usersguide/book1.html

NB: The PDF version is useful for searching the entire guide, but it gives no indication of which text appears as links in the HTML version, so it is recommended to use both.

Installing CESM 1.2.2 on ARCHER

Firstly, edit your ~/.bashrc file and append the following lines:

export CRAYPE_LINK_TYPE=dynamic
module load cmake
module load svn
module swap PrgEnv-cray PrgEnv-intel
module load cray-netcdf/4.3.2
module load cray-parallel-netcdf
module load cray-hdf5/1.8.13

These lines are required for each login session and batch job, so placing them in the ~/.bashrc file ensures they are not forgotten. This code requires the intel14 compiler, which in turn requires specific versions of craype, cray-parallel-netcdf, etc.
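
One way to make this step repeatable is to append each line only if it is not already present, so that running the step twice does not duplicate entries. The following is a minimal sketch: the helper name append_once and the variable RC_FILE are illustrative and not part of CESM or ARCHER, and by default the sketch writes to a scratch file rather than to your real ~/.bashrc.

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_once() {
  line=$1
  file=$2
  touch "$file"
  # -x: match the whole line, -F: fixed string, -q: quiet
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Assumption: RC_FILE defaults to a scratch file so the sketch is safe to try;
# set RC_FILE="$HOME/.bashrc" to apply it for real.
RC_FILE=${RC_FILE:-$(mktemp)}

while IFS= read -r entry; do
  append_once "$entry" "$RC_FILE"
done <<'EOF'
export CRAYPE_LINK_TYPE=dynamic
module load cmake
module load svn
module swap PrgEnv-cray PrgEnv-intel
module load cray-netcdf/4.3.2
module load cray-parallel-netcdf
module load cray-hdf5/1.8.13
EOF
```

The order of the lines is preserved, which matters here because the module swap must precede the loads that depend on the Intel programming environment.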

Download CESM 1.2.2 into your /work directory using svn by following the instructions at http://www.cesm.ucar.edu/models/cesm1.2/tags/index.html#CESM1_2_2

At present, this download fails with

svn: warning: W160013: Unable to connect to a repository at URL 'http://parallelio.googlecode.com/svn/genf90/trunk_tags/genf90_140121'

As such, the PIO component is not downloaded and its directory is empty, which causes the build to fail. To add PIO, please follow the instructions at https://bb.cgd.ucar.edu/googlecode-repositories-are-offline-pio-source-not-found

Once downloaded, add the following 5 files into the 'scripts/ccsm_utils/Machines' directory.

122_config_machines.xml

env_mach_specific.archer

Depends.intel

122_config_compilers.xml

122_mkbatch.archer

NB: rename 122_mkbatch.archer to mkbatch.archer, 122_config_compilers.xml to config_compilers.xml, and 122_config_machines.xml to config_machines.xml (i.e. remove the first 4 characters).
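
The renaming can also be done in a loop. This is a sketch only: the function name strip_122_prefix is illustrative, it assumes you run it from the top of the CESM source tree, and it will overwrite any existing file with the target name.

```shell
# Hypothetical helper: rename every 122_* file in a directory
# by dropping the leading "122_" from its name.
strip_122_prefix() {
  dir=$1
  for f in "$dir"/122_*; do
    [ -e "$f" ] || continue        # the pattern matched nothing
    name=${f##*/}                  # strip the directory part
    mv -- "$f" "$dir/${name#122_}" # drop the first 4 characters
  done
}

# Assumption: the current directory is the top of the CESM source tree.
strip_122_prefix scripts/ccsm_utils/Machines
```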

These files may contain references to directories which start with /work/ecse0116/. All such occurrences must be replaced by directories in your own workspace.
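
To locate and replace these references, something along the following lines may help. It is a sketch under assumptions: replace_prefix, OLD_PREFIX and NEW_PREFIX are illustrative names, the old prefix is taken from the example paths in this guide and may not match every occurrence in the files, and nothing is modified unless NEW_PREFIX is set.

```shell
# Hypothetical helper: replace one path prefix with another in the given files.
# '|' is the sed delimiter, so neither prefix may contain '|' (or '&' in the new one).
replace_prefix() {
  old=$1; new=$2; shift 2
  for f in "$@"; do
    [ -f "$f" ] || continue
    # Write back through cat so the file keeps its permissions.
    sed "s|$old|$new|g" "$f" > "$f.tmp" && cat "$f.tmp" > "$f" && rm -f "$f.tmp"
  done
}

M=scripts/ccsm_utils/Machines

# List the references first, so you can see what would change.
grep -n '/work/ecse0116/' "$M"/* 2>/dev/null || true

# Assumption: OLD_PREFIX matches the example paths used in this guide;
# NEW_PREFIX must be set to a directory in your own workspace.
OLD_PREFIX=${OLD_PREFIX:-/work/ecse0116/ecse0116/gavin2}
if [ -n "${NEW_PREFIX:-}" ]; then
  replace_prefix "$OLD_PREFIX" "$NEW_PREFIX" \
    "$M"/config_machines.xml "$M"/config_compilers.xml \
    "$M"/mkbatch.archer "$M"/env_mach_specific.archer
fi
```

Re-running the grep afterwards shows any references that used a different prefix and still need editing by hand.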

NB before building CESM, the input, archive and scratch directories must exist.

When users build their case, the input directory will be probed to check if the associated input files are available. If they are not, then the scripts will automatically pull the necessary associated input files from the CESM svn repository and place them in the input directory. As such, the input directory can become huge.

In an ideal environment, there would exist a directory to which every ARCHER user has both read and write access. Unfortunately, such a directory does not exist on ARCHER. As such, there is no shared, writable inputdata directory, and all users must create and manage their own input directory.

Please note that there is a shared input directory which contains the largest and most popular input data files, but this directory is read only. This CESM shared input data directory is located at /work/n02/shared/cesm/inputdata/. It may be used by any ARCHER user, not just NCAR (n02) users. Users may copy relevant input files from this shared directory to their own local input data directory. Using this shared directory will save a significant amount of disk space and time.

The input, archive and scratch directories must all be created by hand by each user in their own work directory, e.g.

mkdir /work/ecse0116/ecse0116/gavin2/cesm1_2_2/archive

mkdir /work/ecse0116/ecse0116/gavin2/cesm1_2_2/scratch

mkdir /work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata

NB: if you intend to use parallel-netcdf rather than plain netcdf, then for best performance you should set the LFS stripe count to -1 for these three directories using the following three commands:

lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/scratch

lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/archive

lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata
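
The directory creation and striping steps can be combined in one loop. This is a sketch, not a required procedure: CESM_ROOT is an illustrative variable that should point at your own CESM directory under /work, and it defaults to a scratch location so the sketch can be tried safely. Striping is only attempted where the lfs tool is available.

```shell
# Assumption: CESM_ROOT is your own CESM directory under /work;
# the default is a throwaway location for trying the sketch out.
CESM_ROOT=${CESM_ROOT:-$(mktemp -d)/cesm1_2_2}

for d in archive scratch inputdata; do
  mkdir -p "$CESM_ROOT/$d"
  # Stripe with count -1 only where the Lustre client tool exists.
  if command -v lfs >/dev/null 2>&1; then
    lfs setstripe -c -1 "$CESM_ROOT/$d" \
      || echo "could not set stripe on $CESM_ROOT/$d" >&2
  fi
done
```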

These directories are then referenced in config_machines.xml, e.g.

<DIN_LOC_ROOT>/work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata</DIN_LOC_ROOT>

<DIN_LOC_ROOT_CLMFORC>/work/ecse0116/ecse0116/gavin2/cesm1_2_2/ccsm1/inputdata/atm/datm7</DIN_LOC_ROOT_CLMFORC>

<DOUT_S_ROOT>/work/ecse0116/ecse0116/gavin2/cesm1_2_2/archive/$CASE</DOUT_S_ROOT>

<CESMSCRATCHROOT>/work/ecse0116/ecse0116/gavin2/cesm1_2_2/scratch</CESMSCRATCHROOT>

Building the cprnc tool

Finally, one must build, by hand, the cprnc tool.

To make the cprnc tool, first upload the following file Makefile.cprnc122.archer to your cprnc directory, which will resemble:

/work/ecse0116/ecse0116/gavin2/cesm1_2_2/tools/cprnc

Once uploaded, run the following commands to set up the build environment (if they are not already present in your ~/.bashrc file):

export CRAYPE_LINK_TYPE=dynamic
module load cmake
module load svn
module swap PrgEnv-cray PrgEnv-intel
module load cray-netcdf/4.3.2
module load cray-parallel-netcdf
module load cray-hdf5/1.8.13

and then copy over a file that is missing from the download:

cp ../../models/csm_share/shr/dtypes.h .

and then make the executable using the following three commands. (The second throws an error, which is fixed by simply running the command again.)

make realclean -f Makefile.cprnc122.archer
make -f Makefile.cprnc122.archer
make -f Makefile.cprnc122.archer

A known error may occur here, where the error message reads something similar to:

cprnc.F90: Error in opening the compiled module file. Check INCLUDE paths.

The workaround is to replace the existing compare_vars_mod.F90, which will be empty, with the following file: compare_vars_mod.F90, and then run make again.

Once the cprnc executable has been built, you must then edit the config_machines.xml file and replace the existing value of CCSM_CPRNC to point to the location of your new cprnc executable, e.g. the following line must be changed by hand from

CCSM_CPRNC="/work/ecse0116/ecse0116/gavin2/CESM1.0/models/atm/cam/tools/cprnc/cprnc"

to something similar to

CCSM_CPRNC="/work/ecse0116/ecse0116/gavin/cesm1_2_2/tools/cprnc/cprnc"

There was a temporary bug in the intel compiler which may cause the cprnc tool to throw either of the following errors at runtime:

Fatal Error: This program was not built to run in your system.
Please verify that both the operating system and the processor support Intel(R) AVX, F16C and RDRAND instructions.

or 

Please verify that both the operating system and the processor support Intel(R) F16C instructions

This can be fixed by running the following commands

module swap craype-ivybridge craype-sandybridge

make clean -f Makefile.cprnc122.archer

make -f Makefile.cprnc122.archer

module swap craype-sandybridge craype-ivybridge

Completing the configuration process

Tools directory

By default, the taskmaker.pl tool is found in the scripts/ccsm_utils/Machines directory; however, the code expects this tool to reside in the scripts/ccsm_utils/Tools directory. One workaround is to copy the tool to the expected directory, e.g.

cd scripts/ccsm_utils
cp Machines/taskmaker.pl Tools/.

Furthermore, some CESM scripts are not, by default, executable. A simple workaround which ensures the Tools are executable is to run the following command:

chmod 755 scripts/ccsm_utils/Tools/*

Building CESM

Firstly, change directory to the scripts directory in the 'work' installation of CESM, e.g.

cd /work/ecse0116/ecse0116/gavin/CESM1.0/scripts

Building tests

The process of building the CESM tests is slightly different from building simulations.

For the test ERS_D.f19_g16.X, say, issue the following commands in the 'scripts' directory.

./create_test -testname ERS_D.f19_g16.X.archer_intel -testid t21
cd ERS_D.f19_g16.X.archer_intel.t21
./ERS_D.f19_g16.X.archer_intel.t21.test_build

At present, the output of these processes contains multiple instances of the following string. NB: this 'error' can safely be ignored.

ModuleCmd_Switch.c(172):ERROR:152: Module 'PrgEnv-cray' is currently not loaded

Running the test

To run the test, run the following command:

./ERS_D.f19_g16.X.archer_intel.t21.submit

Known Bugs

There is a known bug which affects users wishing to use the RESTART facility. If you wish to set RESUBMIT to .true, then please edit the file
scripts/ccsm_utils/Tools/cesm_postrun_setup
and add a new line, namely

cd $CASEROOT

at the beginning of the resubmit section. Without this, certain commands, such as xmlchange, and the batch script file will not be found.

Building your simulation

The code is built, from scratch, for each simulation the user wants to run.

Consider, say, the following model: f19_g16 B_1850_CAM5_CN

This is configured and built, for a case called my_first_sim, say, using the following commands:

./create_newcase -case my_first_sim -res f19_g16 -compset B_1850_CAM5_CN -mach archer -compiler intel
cd my_first_sim
./cesm_setup
./my_first_sim.build

Consider the create_newcase command: the -case flag assigns a local name to the case, here 'my_first_sim'; the -res flag assigns the mesh resolution; the -compset flag assigns the component set to employ; the -mach flag assigns the name of the platform, in this case 'archer'; and finally the -compiler flag assigns the name of the compiler, in this case 'intel' (which will employ intel14).

Consider the build command: if the input/restart files are not present, then the build command downloads the necessary files. As such, this command can take over an hour. Further, if the build fails with an error which references the /tmp directory, simply run the build command again, as it is likely the system was very busy and the build command temporarily ran out of memory.

Before running the simulation, users should check both the your_name_for_this.archer.run file and the env_run.xml file, as the default values produce only a short run.

Running your simulation

To run the simulation, run the following command:

./my_first_sim.submit

How to change from the default settings

Before building

Changing the number of cores

Changing the number of cores to 128, say:

cd $CASE
NTASKS=128
./xmlchange -file env_mach_pes.xml -id NTASKS_ATM -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_LND -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_ICE -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_OCN -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_CPL -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_GLC -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_ROF -val $NTASKS
./xmlchange -file env_mach_pes.xml -id NTASKS_WAV -val $NTASKS
./xmlchange -file env_mach_pes.xml -id TOTALPES -val $NTASKS

./cesm_setup -clean
./cesm_setup
./*.clean_build
./*.build
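
The repeated xmlchange calls above can be written as a loop over the component names. The sketch below only prints the commands unless RUN=1 is set; the names run, set_all_ntasks and RUN are illustrative and not part of CESM. When executing for real it assumes the current directory is the case directory, and the cesm_setup and build commands shown above are still needed afterwards.

```shell
# Hypothetical wrapper: print a command (dry run) unless RUN=1, then execute it.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Set NTASKS for every component, then TOTALPES, to the same value.
set_all_ntasks() {
  n=$1
  for comp in ATM LND ICE OCN CPL GLC ROF WAV; do
    run ./xmlchange -file env_mach_pes.xml -id "NTASKS_$comp" -val "$n"
  done
  run ./xmlchange -file env_mach_pes.xml -id TOTALPES -val "$n"
}

set_all_ntasks 128
```

Printing first makes it easy to compare the generated commands against the list above before anything is changed.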

Changing simulation units

cd $CASE

# change given STOP_OPTION value to nyears

./xmlchange -file env_run.xml -id STOP_OPTION -val nyears

# change given STOP_N to 20

./xmlchange -file env_run.xml -id STOP_N -val 20

# don't produce restart files at end

./xmlchange -file env_run.xml -id REST_OPTION -val never

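# or *do* produce restart files at end
# (REST_OPTION takes a unit such as nyears; the interval is set with REST_N)

#./xmlchange -file env_run.xml -id REST_OPTION -val nyears
#./xmlchange -file env_run.xml -id REST_N -val 20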

./cesm_setup -clean
./cesm_setup
./*.clean_build
./*.build

Parallel netcdf library

The parallel and serial versions of the netcdf library are both available within the default build on ARCHER.

The default setting is to employ the serial netcdf libraries.

To employ the parallel netcdf libraries, change directory to the $CASE and run

./xmlchange -file env_run.xml -id PIO_TYPENAME -val pnetcdf

which changes the value of PIO_TYPENAME from netcdf to pnetcdf, before building. (This is contrary to the User Guide, which states the value is changed after building.)

The number of IO tasks is PIO_NUMTASKS, and the default value is -1 which instructs the library to select a suitable default value.

As stated above, if using parallel-netcdf rather than plain netcdf, then for best performance you should set the LFS stripe count to -1 for your scratch, archive and inputdata directories:

lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/scratch

lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/archive

lfs setstripe -c -1 /work/ecse0116/ecse0116/gavin2/cesm1_2_2/inputdata

Changing the batch script

Change the budget to your budget account

In the file mkbatch.archer, change the line

set account_name = "ecse0116"

to

set account_name = "<budget>"

where the string <budget> is replaced by the name of your budget on ARCHER.

Editing the batch script

The batch script is a file which ends with '.run', thus to edit the batch script using vi, say, type the following

vi *.run

Requesting high memory nodes

ARCHER has two types of compute nodes: 2632 nodes with 64 GB of shared memory and 376 nodes with 128 GB. Both have 24 cores which share this memory.

During the Validation Process, it was found that the larger memory nodes were required to run some of the tests. To use the larger memory nodes, update the batch script, namely the *.run file, to select them. Specifically, to run on 4 large memory nodes, set

#PBS -l select=4:bigmem=true

or, to run on 4 smaller memory nodes, set

#PBS -l select=4:bigmem=false

or, if you don't mind which node you run on, set

#PBS -l select=4
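
The node count for the select line follows from the number of cores requested. As a rough sketch, assuming one MPI task per core with no threading (an assumption, not something the batch script enforces), the core count is rounded up to whole 24-core nodes. The function name nodes_for is illustrative.

```shell
# Hypothetical helper: round a core count up to whole 24-core nodes.
nodes_for() {
  cores_per_node=24
  echo $(( ($1 + cores_per_node - 1) / cores_per_node ))
}

# 128 tasks do not fill a whole number of nodes, so the count is rounded up.
NTASKS=${NTASKS:-128}
echo "#PBS -l select=$(nodes_for "$NTASKS")"
```

For example, 96 tasks fit exactly on the 4 nodes used in the lines above, while 128 tasks need 6 nodes.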

Requesting longer wall times

Users are limited to requesting a maximum walltime of 24 hours, e.g.

#PBS -l walltime=24:00:00

however, if the job requires more time, then users can increase this limit to 48 hours by using the PBS long flag, e.g.

#PBS -q long

#PBS -l walltime=48:00:00
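
The two cases can be summarised in a small helper that prints the PBS lines for a requested number of whole hours. This is a sketch only: the function name pbs_walltime is illustrative, and the 24-hour and 48-hour limits are those stated above.

```shell
# Hypothetical helper: print the PBS lines for a walltime given in whole hours.
pbs_walltime() {
  hours=$1
  if [ "$hours" -le 24 ]; then
    # Within the standard limit: no queue flag needed.
    printf '#PBS -l walltime=%02d:00:00\n' "$hours"
  elif [ "$hours" -le 48 ]; then
    # Between 24 and 48 hours: the long queue flag is required.
    printf '#PBS -q long\n#PBS -l walltime=%02d:00:00\n' "$hours"
  else
    echo "walltime of ${hours}h exceeds the 48-hour limit" >&2
    return 1
  fi
}

pbs_walltime 36
```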