Preconfigured preprocessors

For production runs, there are Preprocessor classes that are preconfigured for different input datasets, e.g., CMIP, ERA5 or REMO output (double nesting). Most of them are set up to work with data at DKRZ: they read input data from the CMIP or ERA5 data pool and mostly process it on the fly, so no duplicated global model data has to be stored.

[4]:
%load_ext autoreload
%autoreload 2

For the preprocessor classes, it is usually a good idea to start a dask client to define your processing resources. It is also usually a good idea to avoid multithreading if you want to write a lot of NetCDF files. The following should usually work at DKRZ:

[5]:
from dask.distributed import Client

client = Client(
    dashboard_address="localhost:8787", n_workers=16, threads_per_worker=1
)  # multithreading does not work well with cdo
2025-03-28 14:20:12,257 - distributed.scheduler - WARNING - Failed to format dashboard link, unknown value: 'JUPYTERHUB_SERVICE_PREFIX'

Now you can create a preprocessor instance. For ERA5, choose the ERA5Preprocessor:

[6]:
from pyremo.preproc import ERA5Preprocessor

preprocessor = ERA5Preprocessor(
    expid="000000",
    surflib="/work/ch0636/remo/surflibs/cordex/lib_EUR-11_frac.nc",
    domain="EUR-11",
    vc="vc_49lev",
    scratch="/scratch/g/g300046",
    outpath="/scratch/g/g300046/000000/xa/{date:%Y}/{date:%m}",
)

The preprocessor for ERA5 creates intermediate ERA5 NetCDF files (using CDO) in a CF-like format (“gfile”) and stores them in your scratch location. These files are removed automatically after processing and are only used on the fly. Ensure you have enough scratch disk space if preprocessing multiple years.
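
Because the intermediate files land in your scratch location, a quick check of the free space before a long run can be useful. The following is a minimal sketch using only the Python standard library. The helper name `has_free_space` and the 100 GiB threshold are illustrative assumptions and not part of pyremo; the space actually needed depends on your domain and period, and the reported value may not reflect per-user quotas.

```python
import shutil


def has_free_space(path, required_gib):
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free.

    Hypothetical helper for illustration, not part of pyremo.
    """
    free_gib = shutil.disk_usage(path).free / 1024**3
    return free_gib >= required_gib


# Example: check the current directory; replace "." with your scratch path.
# The 100 GiB threshold is an arbitrary placeholder.
if not has_free_space(".", required_gib=100):
    print("Warning: less than 100 GiB free, preprocessing may fail.")
```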

The easiest way to run the preprocessor is to use the run method. If you want to write NetCDF files, you should choose write=True. The option compute=True will immediately start the processing instead of returning dask delayed objects.

[ ]:
afiles = preprocessor.run(
    "2000-01-01T00:00:00", "2000-02-01T00:00:00", write=True, compute=True
)

The run method returns the paths of all afiles created by the preprocessor, e.g.:

[9]:
afiles
[9]:
('/scratch/g/g300046/000000/xa/2000/01/a000000a2000010100.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010106.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010112.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010118.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010200.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010206.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010212.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010218.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010300.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010306.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010312.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010318.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010400.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010406.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010412.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010418.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010500.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010506.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010512.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010518.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010600.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010606.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010612.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010618.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010700.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010706.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010712.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010718.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010800.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010806.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010812.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010818.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010900.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010906.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010912.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000010918.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011000.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011006.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011012.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011018.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011100.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011106.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011112.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011118.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011200.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011206.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011212.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011218.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011300.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011306.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011312.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011318.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011400.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011406.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011412.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011418.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011500.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011506.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011512.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011518.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011600.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011606.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011612.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011618.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011700.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011706.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011712.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011718.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011800.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011806.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011812.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011818.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011900.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011906.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011912.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000011918.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012000.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012006.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012012.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012018.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012100.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012106.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012112.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012118.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012200.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012206.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012212.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012218.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012300.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012306.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012312.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012318.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012400.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012406.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012412.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012418.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012500.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012506.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012512.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012518.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012600.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012606.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012612.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012618.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012700.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012706.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012712.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012718.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012800.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012806.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012812.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012818.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012900.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012906.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012912.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000012918.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000013000.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000013006.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000013012.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000013018.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000013100.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000013106.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000013112.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000013118.nc',
 '/scratch/g/g300046/000000/xa/2000/01/a000000a2000020100.nc')
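
Judging from the output above, the file names appear to follow the pattern `a<expid>a<YYYYMMDDHH>.nc` at 6-hourly intervals, with the end date included. The sketch below rebuilds that list of names with the standard library so that a run can be checked for missing files. The naming pattern and the 6-hour step are inferred from this example only, and the helpers `expected_names` and `missing_names` are hypothetical, not part of pyremo. Only file names are compared, not directories, since the last file above sits in the `2000/01` directory although its timestamp is in February.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath


def expected_names(start, end, expid, step_hours=6):
    """List the file names expected between start and end (both inclusive).

    Hypothetical helper; the naming pattern is inferred from the output above.
    """
    date = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    names = []
    while date <= stop:
        names.append(f"a{expid}a{date:%Y%m%d%H}.nc")
        date += timedelta(hours=step_hours)
    return names


def missing_names(afiles, expected):
    """Return the expected file names that do not occur in `afiles`."""
    found = {PurePosixPath(f).name for f in afiles}
    return [name for name in expected if name not in found]


expected = expected_names("2000-01-01T00:00:00", "2000-02-01T00:00:00", expid="000000")

# Two paths copied from the output above stand in for the full `afiles` tuple.
sample_afiles = (
    "/scratch/g/g300046/000000/xa/2000/01/a000000a2000010100.nc",
    "/scratch/g/g300046/000000/xa/2000/01/a000000a2000020100.nc",
)
print(len(expected), len(missing_names(sample_afiles, expected)))
```

In a notebook session you would pass the `afiles` tuple returned by the run method instead of `sample_afiles`; an empty result from `missing_names` then suggests that every expected time step was written.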