Debugging for GCM
This builds on the Basic GCM pipeline page, and shows how to:
- output the last model timestep on CPU
- print the location of the crash
This is an interim solution, and the demo branch can be seen here. It uses the Held-Suarez setup with the GCMdriver (a modularised driver which lets you easily mix and match initial conditions, boundary conditions, and sources).
How it works:
- The model is initially run with 128 nodes and crashes after 17 simulation days
- The last diagnostic timestep is saved as 128 restart files
- restart files from all nodes are combined into one
- the script then reruns the model from the restart file on one node, which allows the diagnostics to be applied coherently
- using a try ... catch statement, the last model timestep at crash is saved in the same format as the standard diagnostic output
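The try ... catch step in the list above can be sketched in Julia roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/Driver/Driver.jl: `run_with_crash_dump`, `step!`, and `save_state` are hypothetical names, and the toy step function simply throws on its third call to stand in for a model crash.

```julia
# Hypothetical wrapper: advance the model step by step; if a step throws,
# save the current state before re-raising the original error.
function run_with_crash_dump(step!, save_state, state, nsteps)
    for i in 1:nsteps
        try
            step!(state, i)
        catch
            # Save the state as it is at the moment of the crash
            save_state(state, i)
            # Re-raise so the job still fails and the error reaches the log
            rethrow()
        end
    end
    return state
end

# Toy stand-ins for the model (assumptions, for illustration only)
state = [1.0, 2.0, 3.0]
saved = Dict{Int, Vector{Float64}}()

function toy_step!(s, i)
    s .*= 2
    i == 3 && throw(DomainError(i, "toy crash"))  # simulated crash on step 3
    return s
end

toy_save(s, i) = (saved[i] = copy(s))

crashed = try
    run_with_crash_dump(toy_step!, toy_save, state, 5)
    false
catch err
    err isa DomainError
end
```

Rethrowing after the save keeps the original error and stack trace visible in the log, so the crash dump adds information without hiding the failure.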
Note that the following files, as well as the pipeline script, had to be modified to be able to save the last timestep:
- src/Diagnostics/atmos_gcm_default.jl
- src/Driver/Driver.jl
- assemble_checkpoints.jl, based on this script (helper.sh, exp_parameter can be removed if also removed from the pipeline script)
If you know which function crashed, have the program print the relevant variable values and the gridpoint location of the crash, as in here
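One way to print such information is a small guard called just before the operation that is known to fail. The sketch below is an assumption about how this could look, not the code in the demo branch: `report_if_invalid` is a hypothetical helper, and the validity test (non-finite or negative) should be adapted to the variable being checked.

```julia
# Hypothetical guard: report a suspicious value together with the
# coordinates of the gridpoint where it occurs.
function report_if_invalid(name, value, coords; io = stderr)
    if !isfinite(value) || value < 0
        println(io, "Invalid ", name, " = ", value, " at (x, y, z) = ", coords)
        return true
    end
    return false
end

# Example: a negative pressure at some gridpoint (made-up numbers)
buf = IOBuffer()
flagged = report_if_invalid("pressure", -3.0, (1.0e6, -2.5e5, 6.371e6); io = buf)
msg = String(take!(buf))
```

By default the message goes to stderr, so in a batch job it would end up in the error log alongside the stack trace.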
Run the script on CPU using
sbatch pipeline_logging_gcmd_precrash.sh
- Your specified output folder should contain
  - .../netcdf/ containing the diagnostics output.nc file and the last_crash_HeldSuarez ... .nc file
  - .../restart/ containing all restart.jld2 files from individual nodes
  - .../log/ containing the model_log_err.out logfile
- View Output
  - printed info on the crash point can be found in the model_log_err.out logfile
  - for a quick visualisation of the crash location on the Caltech cluster, it is recommended to use ncview <namefile>