Getting Started with E3SM

This page provides a guide to setting up E3SM on Keeling.

Downloading the model

Create a E3SM2_1 directory in your home directory

mkdir E3SM2_1/

Next, create a E3SM folder in your /a/ directory. This will store the model outputs and data about the cases that are being run.

cd a

mkdir E3SM

Create a SSH key to your GitHub account to download E3SM

Click here for a guide through the setup process.

Downloading the model

First, make sure to navigate to your E3SM folder from your home directory. cd ~/E3SM2_1/

Use the following command to begin installing the required model files into this directory. git clone -b maint-2.1 --recursive https://github.com/E3SM-Project/E3SM.git

It is not unusual for the model to take a long time to download at first, since the model consists of several very large files.

Checking for a successful download

Navigate to your E3SM directory: cd E3SM and list the files using ls.

You should see the following files:

  • AUTHORS

  • CONTRIBUTING.md

  • README.md

  • cime_config

  • components

  • driver-moab

  • run_e3sm.template.sh

  • CITATION.cff

  • LICENSE

  • cime

  • codemeta.json

  • driver-mct

  • externals

  • share

Modules

Navigate to your home directory, then edit the file .modules7.

Add the following lines:

module load gnu/gnu-9.3.0
module load gnu/netcdf4-4.7.4-gnu-9.3.0
module load gnu/openmpi-4.1.2-gnu-9.3.0

After saving the file, you will either need to relogin to Keeling, or use the commands module purge source .modules7 to load the modules correctly.

Note that these modules can fail CESM builds - you will have to come back and delete those lines if you need to run a CESM model.

Edit XML Scripts for Keeling

Machine Configuration

Navigate to the cime_config/machines folder. cd /data/keeling/a/<NetId>/E3SM2_1/E3SM/cime_config/machines/

Add the following to the file config_machines.xml in /data/keeling/a/<NetID>/E3SM2_1/E3SM/cime_config/machines Don’t forget to change the netID from <NetID>. You can paste this code block near the end of the file but make sure you paste it before <default_run_suffix>:

<machine MACH="keeling">
  <DESC>UIUC CentOS 7.9, os is Linux, 16 pes/node, batch system is SLURM</DESC>
  <OS>LINUX</OS>
  <COMPILERS>gnu</COMPILERS>
  <MPILIBS>openmpi</MPILIBS>
  <CIME_OUTPUT_ROOT>/data/keeling/a/netID/a/E3SM/$CASE/run</CIME_OUTPUT_ROOT>
  <DIN_LOC_ROOT>/data/keeling/a/netID/a/E3SM/E3SM_input_data</DIN_LOC_ROOT>
  <DIN_LOC_ROOT_CLMFORC>$DIN_LOC_ROOT/atm/datm7</DIN_LOC_ROOT_CLMFORC>
  <DOUT_S_ROOT>/data/keeling/a/netID/a/E3SM/$CASE/E3SM_output_data</DOUT_S_ROOT>
  <BASELINE_ROOT>/data/keeling/a/netID/a/E3SM/CCSM_BASELINE</BASELINE_ROOT>
  <CCSM_CPRNC>/data/keeling/a/netID/E3SM2_1/E3SM/cime_config/tools/cprnc/cprnc</CCSM_CPRNC>
  <GMAKE_J>8</GMAKE_J>
  <BATCH_SYSTEM>slurm</BATCH_SYSTEM>
  <SUPPORTED_BY>e3sm</SUPPORTED_BY>
  <MAX_TASKS_PER_NODE>48</MAX_TASKS_PER_NODE>
  <MAX_MPITASKS_PER_NODE>48</MAX_MPITASKS_PER_NODE>
  <PROJECT_REQUIRED>FALSE</PROJECT_REQUIRED>
  <mpirun mpilib="openmpi">
    <executable>mpirun</executable>
      <arguments>
      <arg name="num_tasks">-n {{ total_tasks }}  </arg>

    </arguments>

  </mpirun>
  <module_system type="module">
  </module_system>

</machine>

Checking the validity of config_machines.xml

Run the following command from any directory:

xmllint --xinclude --noout --schema /data/keeling/a/$USER/E3SM2_1/E3SM/cime/CIME/data/config/xml_schemas/config_machines.xsd /data/keeling/a/$USER/E3SM2_1/E3SM/cime_config/machines/config_machines.xml

If it successfully ports, you should see the following message:

/data/keeling/a/<NetId>/E3SM2_1/E3SM/cime_config/machines/config_machines.xml validates

Batch Configuration

From the same directory, edit the file config_batch.xml

Add the following lines:

<batch_system type="slurm">
   <batch_query per_job_arg="-j">squeue</batch_query>
   <batch_submit>sbatch</batch_submit>
   <batch_cancel>scancel</batch_cancel>
   <batch_directive>#SBATCH</batch_directive>
   <jobid_pattern>(\d+)$</jobid_pattern>
   <depend_string>--dependency=afterok:jobid</depend_string>
   <depend_allow_string>--dependency=afterany:jobid</depend_allow_string>
   <depend_separator>:</depend_separator>
   <walltime_format>%H:%M:%S</walltime_format>
   <batch_mail_flag>--mail-user netID@illinois.edu</batch_mail_flag>
   <batch_mail_type_flag>--mail-type</batch_mail_type_flag>
   <batch_mail_type>none, all, begin, end, fail</batch_mail_type>
   <submit_args>
     <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
     <arg flag="-q" name="$JOB_QUEUE"/>
   </submit_args>
   <directives>
     <directive> --job-name={{ job_id }}</directive>
     <directive> --partition=sesempi</directive>
     <directive> --nodes={{ num_nodes }}</directive>
     <directive> --ntasks-per-node={{ tasks_per_node }}</directive>
     <directive> --constraint=j48</directive>
     <directive> --output={{ job_id }}.%j </directive>
     <directive> --exclusive </directive>
     <directive> --mail-type=FAIL </directive>
     <directive> --mail-type=END </directive>
     <directive> --mail-user=NetId@illinois.edu </directive>

   </directives>
    <queues>
     <queue walltimemax="168:00:00">sesempi</queue>
    </queues>
 </batch_system>

Important: You will find another batch system <batch_system type=”slurm” > in the middle of the config_batch.xml file. So you need to remove this block:

<batch_system type="slurm" >
  <batch_query per_job_arg="-j">squeue</batch_query>
  <batch_submit>sbatch</batch_submit>
  <batch_cancel>scancel</batch_cancel>
  <batch_directive>#SBATCH</batch_directive>
  <jobid_pattern>(\d+)$</jobid_pattern>
  <depend_string>--dependency=afterok:jobid</depend_string>
  <depend_allow_string>--dependency=afterany:jobid</depend_allow_string>
  <depend_separator>:</depend_separator>
  <walltime_format>%H:%M:%S</walltime_format>
  <batch_mail_flag>--mail-user</batch_mail_flag>
  <batch_mail_type_flag>--mail-type</batch_mail_type_flag>
  <batch_mail_type>none, all, begin, end, fail</batch_mail_type>
  <submit_args>
    <arg flag="--time" name="$JOB_WALLCLOCK_TIME"/>
    <arg flag="-p" name="$JOB_QUEUE"/>
    <arg flag="--account" name="$PROJECT"/>
  </submit_args>
  <directives>
    <directive> --job-name={{ job_id }}</directive>
    <directive> --nodes={{ num_nodes }}</directive>
    <directive> --output={{ job_id }}.%j </directive>
    <directive> --exclusive </directive>
  </directives>
</batch_system>

Checking the validity of config_batch.xml

Run the following command: xmllint --xinclude --noout --schema /data/keeling/a/$USER/E3SM2_1/E3SM/cime/CIME/data/config/xml_schemas/config_batch.xsd /data/keeling/a/$USER/E3SM2_1/E3SM/cime_config/machines/config_batch.xml

If it successfully ports, you should see the following message:

/data/keeling/a/<NetId>/E3SM2_1/E3SM/cime_config/machines/config_machines.xml validates

Editing the userdefined files

Go to the cmake_macros folder: cd /data/keeling/a/<NetId>/E3SM2_1/E3SM/cime_config/machines/cmake_macros/

Here, we will make a copy of userdefined.cmake in case of mistakes or for comparison. cp userdefined.cmake userdefined_copy.cmake

First, rename userdefined.cmake to keeling.cmake, as shown below:

cp userdefined.cmake keeling.cmake

Now, edit the keeling.cmake file. Change the contents to the code provided below.

set(SUPPORTS_CXX "TRUE")
if (COMP_NAME STREQUAL gptl)
  string(APPEND CPPDEFS " -DHAVE_VPRINTF -DHAVE_GETTIMEOFDAY -DHAVE_BACKTRACE")
endif()
set(NETCDF_PATH "/sw/netcdf4-4.7.4-gnu-9.3.0")
if (NOT DEBUG)
  string(APPEND FFLAGS " -fno-unsafe-math-optimizations")
endif()
if (DEBUG)
  string(APPEND FFLAGS " -g -fbacktrace -fbounds-check -ffpe-trap=invalid,zero,overflow")
endif()
string(APPEND SLIBS " -L${NETCDF_PATH}/lib/ -lnetcdff -lnetcdf -lcurl -llapack -lblas")
if (MPILIB STREQUAL mpi-serial)
  set(SCC "gcc")
endif()
if (MPILIB STREQUAL mpi-serial)
  set(SFC "gfortran")
endif()
string(APPEND CXX_LIBS " -lstdc++")

Building a Case

Navigate to the directory cd /data/keeling/a/<NetId>/E3SM2_1/E3SM/cime/scripts/

Create a new case with any name for the case, for example, “case1” with the following command.

./create_newcase --case <case name> --compset A --res f45_g37 --mach keeling

Setup and build your case

cd <caseName>

./case.setup

./case.build

If your case was built successfully, you will see the following message:

MODEL BUILD HAS FINISHED SUCCESSFULLY

Preview case run

Run ./preview_run to see the case info, check the number of total tasks and how many nodes are required to run the case. Make sure to change the nodes in your env_batch.xml file according to the case info displayed here, and change –ntasks if required. Never occupy more nodes than your case info says it needs, Keeling will still occupy those extra nodes for no reason.

Note that –ntasks=nodes*48 For example, if you require 3 nodes then –ntasks=144.

Running a test case

To run a standard test case, edit your env_batch.xml file using the xmlchange command:

From your case directory, ./xmlchange NTHRDS=2

We are only using 2 threads for this test case. For a real case, you will have to change this number to match your requirements.

To run the case after making this change, run the following commands to reset, clean, and rebuild your run.

./case.setup --reset

./case.build --clean-all

./case.build

Submit the case run

To run your test case, run the command: ./case.submit If your batch configuration was correct, you should get updates by Email from slurm when your run begins and ends. You can also type sqq to look at everything that is currently being run.

Output files from E3SM

To find your output files, navigate to /data/keeling/a/<netID>/a/E3SM/<case_name>/run/<case_name>/run

These files do not provide the same conventional gridded output as CESM - and instead provide data as vectors - so they cannot be analyzed in the same way. They need to be converted first.