Configuration file
Usage
NEDAS configuration is driven by YAML files and runtime argument parsing.
The NEDAS/config/default.yml file defines all the entries and their default values.
At runtime, a customized configuration file can be supplied with -c CONFIG_FILE.
The CONFIG_FILE doesn’t need to define every entry in default.yml,
just the ones relevant to the particular experiment.
Entries of simple types (not compound types such as list, tuple and dict) can also be
overridden at runtime with --key value,
which makes it easy to rerun the same experiment with only one or two parameters changed.
To run a NEDAS experiment on the command line:
python -m NEDAS -c CONFIG_FILE --key value
Alternatively, in an interactive environment such as a Jupyter notebook,
the configuration object config can be initialized directly with
from NEDAS.config import Config
config = Config(config_file='CONFIG_FILE', key=value)
The config object can then be used to initialize and run the analysis scheme:
from NEDAS.schemes import get_scheme
scheme = get_scheme(config)
scheme()
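The layering described above (defaults, then the experiment file, then --key value overrides) can be sketched with plain dictionaries. This is only an illustration of the precedence order, not the NEDAS implementation: the functions parse_overrides and layered_config and the key nens are hypothetical, and override values are left as strings because the type conversion NEDAS performs is not described here.

```python
import argparse


def parse_overrides(argv):
    """Split argv into a config file path and a dict of --key value overrides.

    Hypothetical helper for illustration; NEDAS has its own argument parser.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("-c", "--config_file", default=None)
    known, extra = parser.parse_known_args(argv)
    overrides = {}
    # The remaining arguments are assumed to come in "--key value" pairs.
    for flag, value in zip(extra[0::2], extra[1::2]):
        overrides[flag.lstrip("-")] = value
    return known.config_file, overrides


def layered_config(defaults, from_file, overrides):
    """Later layers win: defaults < experiment file < command-line overrides."""
    merged = dict(defaults)
    merged.update(from_file)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    config_file, overrides = parse_overrides(["-c", "exp.yml", "--nens", "40"])
    defaults = {"nens": 20, "work_dir": "work"}   # stand-in for default.yml
    from_file = {"work_dir": "/tmp/exp"}          # stand-in for CONFIG_FILE
    print(config_file, layered_config(defaults, from_file, overrides))
```

Entries that the experiment file leaves out keep their default values, which is why the file only needs to list what differs from default.yml.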
Description of entries
System paths and runtime environment
Entry |
Description |
Default (from |
|---|---|---|
|
Working directory for running the analysis scheme. |
‘work’ |
|
Runtime directory structure defined by format strings. |
See details in Table 1. |
|
Initialization script to enter the Python environment. If not None, this script is run at runtime before the python command. |
None |
|
I/O mode.
|
‘offline’ |
|
Runtime job submitter settings. These options are forwarded to the job submitter (see NEDAS.job_submitters package). |
See details in Table 2. |
|
Total number of processors used when a step is executed under MPI. |
1 |
|
Number of processors in a “member group” when distributing ensemble members. If not set in YAML, the code sets and computes Must evenly divide |
None (interpreted as |
|
Number of processors to use for utility steps (preprocess, postprocess, diagnose, etc.). If not set in YAML, the code uses |
None |
Key |
Description |
Default |
|---|---|---|
|
Directory for each analysis cycle. |
‘{work_dir}/cycle/{time:%Y%m%d%H%M}’ |
|
Directory for the ensemble forecast step. |
‘{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}’ |
|
Directory for the assimilation step (outer-loop iteration |
‘{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis/{iter}’ |
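The directory entries above are ordinary Python format strings, so their expansion can be previewed with str.format. The sketch below uses the three default strings from the table; the values passed in (work directory, cycle time, model name, iteration number) are illustrative, and how NEDAS supplies them internally is not shown.

```python
from datetime import datetime

# Default format strings as listed in the table above.
cycle_dir = "{work_dir}/cycle/{time:%Y%m%d%H%M}"
forecast_dir = "{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}"
analysis_dir = "{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis/{iter}"

# Illustrative values only.
values = dict(
    work_dir="work",
    time=datetime(2001, 1, 7, 0, 0),
    model_name="qg.fortran",
    iter=0,
)

# str.format ignores keyword arguments a template does not use,
# so the same dictionary can be passed to every template.
paths = {
    "cycle": cycle_dir.format(**values),
    "forecast": forecast_dir.format(**values),
    "analysis": analysis_dir.format(**values),
}

for name, path in paths.items():
    print(name, path)
# cycle work/cycle/200101070000
# forecast work/cycle/200101070000/qg.fortran
# analysis work/cycle/200101070000/analysis/0
```

The part after the colon in {time:...} is a strftime pattern, so changing it changes how cycle directories are named.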
Key |
Description |
Examples |
|---|---|---|
|
Host machine type. Machine-specific behavior can be defined in the corresponding subclass |
‘local’, ‘betzy’, … |
|
Scheduler type. |
None, ‘slurm’, ‘oar’, ‘pbs’, … |
|
Project number for resource allocation. |
None, ‘nn2993k’, … |
|
Name of the scheduler queue to submit jobs to (HPC). |
None, ‘normal’, ‘devel’, … |
|
Parallelization strategy to request from the job submitter. |
‘serial’, ‘mpi’, ‘openmp’ |
Runtime logging
Entry |
Description |
Default |
|---|---|---|
|
If True, show extra debug messages and output intermediate data during runtime. |
False |
|
If True, show elapsed time for major steps in the workflow. |
True |
|
If True, allow ANSI escape codes (colors, cursor movement) in terminal output. If None, auto-detected from the terminal environment. |
True |
|
If True, suppress most runtime status output. |
False |
|
Current call stack context string, set automatically at runtime. |
None |
|
Maximum call stack depth to display in status output. If None, all levels are shown. |
2 |
|
If True, adapt output formatting for Jupyter notebooks. If None, auto-detected. |
None |
|
Terminal width in characters used for formatting status lines. If None, auto-detected from the terminal. |
None |
|
Number of characters reserved for the left (description) part of status lines. |
50 |
|
Number of spaces per call stack level indentation. |
4 |
|
Width of the progress bar in characters. |
10 |
Analysis scheme design parameters
Key |
Description |
Default |
|---|---|---|
|
Ensemble size. |
20 |
|
Whether to run the preprocessing step. |
True |
|
Whether to run the ensemble forecast step. |
True |
|
Whether to run the analysis (assimilation) step. |
True |
|
Whether to run the postprocessing step after assimilation. |
True |
|
Whether to run the diagnostic tools. |
True |
|
If True, save checkpoints of model state and observations between cycles. |
False |
|
Used by If None, will run the entire workflow. Otherwise, will only run the specified step. (Valid step names depend on the scheme; for the filter scheme these include
and |
None |
Time controls
Key |
Description |
Example |
|---|---|---|
|
Start time of the period of interest. |
2001-01-01T00:00:00Z |
|
End time of the period of interest. |
2001-01-30T00:00:00Z |
|
Time of the first analysis cycle. Defaults to |
2001-01-07T00:00:00Z |
|
Time of the last analysis cycle. Defaults to |
2001-01-28T00:00:00Z |
|
Interval in hours between analysis cycles. |
12 |
|
Time of the current analysis cycle, set automatically at runtime. If None, will start at |
None |
|
Time steps in hours relative to the analysis for the observations. |
[0] |
|
Smoothing window in hours for observations. |
0 |
|
Time steps in hours relative to the analysis for the state variables. |
[0] |
|
Smoothing window in hours for state variables. |
0 |
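To see how the first cycle time, last cycle time and cycle interval fit together, the analysis times can be enumerated with the standard library. This is a sketch using the example values from the table; the function name analysis_cycles is hypothetical, and treating the last cycle time as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone


def analysis_cycles(first, last, interval_hours):
    """Return analysis times from first to last (assumed inclusive)."""
    times = []
    t = first
    while t <= last:
        times.append(t)
        t += timedelta(hours=interval_hours)
    return times


first = datetime(2001, 1, 7, tzinfo=timezone.utc)   # time of the first analysis cycle
last = datetime(2001, 1, 28, tzinfo=timezone.utc)   # time of the last analysis cycle
cycles = analysis_cycles(first, last, 12)           # 12-hour cycle interval

print(len(cycles))             # 43
print(cycles[1].isoformat())   # 2001-01-07T12:00:00+00:00
```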
Analysis grid definition
The grid_def entry is a dictionary with the following entries:
Key |
Description |
Example |
|---|---|---|
|
Type of grid to use for the analysis step. If ‘custom’, the other entries will be used as kwargs in initializing a regular grid, see details in Table 3. If a model name is specified, the corresponding model grid will be used instead. |
‘custom’, ‘qg’, etc. |
|
Mask for invalid points in the domain. If not None, the model name specifies which model generates the mask for the analysis grid. |
None, ‘qg’, etc. |
Key |
Description |
Example |
|---|---|---|
|
Map projection defined as PROJ4 strings |
None, ‘+proj=stere +lat_0=90 +lon_0=-45’ |
|
X coordinate start |
0 |
|
X coordinate end |
128 |
|
Y coordinate start |
0 |
|
Y coordinate end |
128 |
|
Grid spacing. Note: the coordinates and grid spacing should be in meters, but if proj is None they can be nondimensional. |
1 |
|
If True, the coordinates are defined at the center of each grid box. |
False |
|
The dimension(s) that are cyclic |
None, ‘x’, ‘y’, or ‘xy’ |
|
Type of distance function |
‘cartesian’ or ‘spherical’ |
State definition
The state_def entry is a list; each item is a dictionary that defines one model state variable:
Key |
Description |
Example |
|---|---|---|
|
Model state variable name. Corresponding to the keys in implemented in the model interface |
‘streamfunc’ |
|
Name of the model this variable comes from. Should be one of the keys in |
‘qg.fortran’ |
|
Variable type. |
‘field’, or ‘scalar’ |
|
Error distribution type. |
‘normal’ |
The model_def entry is a dictionary, with model_name as keys pointing to a dictionary of model-specific configuration parameters.
Key |
Description |
Example |
|---|---|---|
|
YAML configuration file for the model. If not specified, the default file in the corresponding model module directory is used. Additional entries added below will overwrite the settings in the YAML file, making it easier to set up twin experiments. |
None, ‘{nedas_root}/models/qg/fortran/default.yml’ |
|
Initialization script for the model. At runtime this script is run before the model forecast. |
‘setup.src’ |
|
Path to the model code directory. |
‘{nedas_root}/models/qg/fortran’ |
|
Number of processors to use for a model forecast. |
1 |
|
Number of processors to use for utility functions. |
1 |
|
Maximum runtime in seconds for model forecast. |
3600 |
|
Model restart file saving interval in hours. |
24 |
|
Model boundary condition interval in hours. |
24 |
|
Strategy for running tasks involving an ensemble of tasks. ‘scheduler’: run each member as a separate job and distribute the workload using a ‘batch’: run all members in a single job. |
‘scheduler’ or ‘batch’ |
|
Whether to use job array functionality when submitting the jobs via |
False |
|
Directory where the initial ensemble restart files are located. |
‘{work_dir}/init_ens’ |
|
Directory where the truth files are located. This is required when using synthetic observations that are generated from a truth run. |
‘{work_dir}/truth’ |
Observation definition
The obs_def entry is a list; each item is a dictionary that defines one observation variable:
Key |
Description |
Example |
|---|---|---|
|
Observation variable name. Corresponding to the keys in implemented in the dataset interface |
‘velocity’ |
|
Name of the dataset the observation comes from. Should be one of the keys in |
‘synthetic’ |
|
Name of the model from which to compute the observation priors. |
‘qg.fortran’ |
|
Number of observations. When generating a synthetic random observation network, this controls the network density. |
3000 |
|
Error definition dictionary. |
See details in Table 4 |
|
Horizontal localization distance, radius of influence beyond which the observation impact is tapered to zero. In the same units as grid coordinates |
inf, 10, etc. |
|
Vertical localization distance, in the same units as |
inf |
|
Temporal localization distance |
inf |
|
Impact factors of this observation on the state variables. Unlisted variables have a default impact factor of 1.0. |
{ ‘streamfunc’: 0 }, which turns off the impact on streamfunc |
Key |
Description |
Example |
|---|---|---|
|
Type of error distribution. |
‘normal’ |
|
Observation error standard deviation. |
1.0 |
|
Horizontal correlation length in observation error. |
0 |
|
Vertical correlation length in observation error. |
0 |
|
Temporal correlation length in observation error. |
0 |
|
Cross-variable correlation in observation error. A dictionary {variable_name: corr} giving the correlation between this variable and each other variable_name. The auto-correlation is always 1, so this variable itself need not be included. |
{‘streamfunc’: 0} |
The dataset_def entry is a dictionary, with dataset_name as keys pointing to a dictionary of dataset-specific configuration parameters.
Key |
Description |
Example |
|---|---|---|
|
Name of the model used for computing observation priors for this dataset. Should be one of the keys in |
‘qg.fortran’ |
|
YAML configuration file for the dataset. If not specified, the default file in the corresponding dataset module directory is used. Additional entries added below will overwrite the settings in the YAML file. |
None |
|
Path to the dataset files. (For synthetic observations this can be left empty.) |
None |
|
Start of the observation window, hours relative to the analysis time. |
-6 |
|
End of the observation window, hours relative to the analysis time. |
0 |
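The two window entries are offsets in hours from the analysis time. A minimal sketch of the resulting time filter is given below, using the example values -6 and 0; the function name in_obs_window is hypothetical, and treating both window bounds as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def in_obs_window(obs_time, analysis_time, window_start_h, window_end_h):
    """True if obs_time lies within the window around analysis_time."""
    start = analysis_time + timedelta(hours=window_start_h)
    end = analysis_time + timedelta(hours=window_end_h)
    return start <= obs_time <= end  # inclusive bounds are an assumption


analysis_time = datetime(2001, 1, 7, 12)
print(in_obs_window(datetime(2001, 1, 7, 8), analysis_time, -6, 0))   # True
print(in_obs_window(datetime(2001, 1, 7, 13), analysis_time, -6, 0))  # False
```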
Some additional parameters:
Synthetic observations are enabled by using a synthetic dataset in obs_def (e.g. dataset_src: synthetic)
and providing a corresponding entry in dataset_def.
Key |
Description |
Default (from |
|---|---|---|
|
Whether to randomize the order of observations. |
False |
|
Where the reference vertical coordinates come from. |
‘mean’ |
|
Interpolation method used when mapping between grids. |
‘linear’ |
Perturbation
The top-level perturb entry controls the optional perturbation step.
In the default configuration it is left empty/None (no perturbation).
If enabled, perturb should be a list of dictionaries. Each dictionary defines a perturbation to apply
to one ensemble member and one or more variables (see NEDAS.core.perturb).
Key |
Description |
Example |
|---|---|---|
|
Variable name (string) or list of variable names to perturb. |
‘streamfunc’ |
|
Model name the variable(s) come from (a key in |
‘qg.fortran’ |
|
Perturbation type string. The first token selects the main method:
Additional options can be appended with commas (e.g. |
‘gaussian’ |
|
Perturbation amplitude. |
0.1 |
|
Horizontal correlation length (needed by |
15 |
|
Temporal correlation length (hours) used to correlate perturbations between cycles/time steps. |
0 |
|
Power-law exponent (needed by |
4 |
|
Optional value bounds after perturbation. |
[0, inf] |
|
Optional random seed. |
1234 |
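The roles of the amplitude, bounds and seed entries can be illustrated with a deliberately simplified perturbation. The sketch below adds uncorrelated Gaussian noise to a list of values; it ignores the horizontal and temporal correlation lengths entirely, so it is not the scheme in NEDAS.core.perturb. The function name perturb_values and the interpretation of the amplitude as the noise standard deviation are assumptions.

```python
import math
import random


def perturb_values(values, amp, bounds=(-math.inf, math.inf), seed=None):
    """Add Gaussian noise with standard deviation amp, then clip to bounds.

    Simplified illustration: the noise is uncorrelated in space and time.
    """
    rng = random.Random(seed)  # a fixed seed makes the perturbation reproducible
    lo, hi = bounds
    return [min(max(v + amp * rng.gauss(0.0, 1.0), lo), hi) for v in values]


field = [0.0, 0.5, 1.0]
perturbed = perturb_values(field, amp=0.1, bounds=(0.0, math.inf), seed=1234)
print(perturbed)
```

With bounds of [0, inf], any value pushed below zero by the noise is reset to zero, which is the usual reason for bounding positive-definite variables.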
If no perturbation is needed, leave perturb empty/None.
Assimilation method
The following parameters help NEDAS locate the correct analysis scheme and assimilation components.
Key |
Description |
Default / example |
|---|---|---|
|
Type of analysis scheme to use. |
‘filter’ |
|
Assimilator configuration dictionary. The assimilator class is chosen by
|
See below. |
|
Updator configuration dictionary (applies increments to produce posterior state). Alignment-based updators are selected via
through |
See |
|
Covariance configuration dictionary. |
See |
Key |
Description |
Default |
|---|---|---|
|
Assimilator type. |
‘ETKF’ |
|
Optional YAML configuration file for the assimilator. If not specified, the assimilator module default is used. |
None |
Covariance inflation parameters are stored in the inflation_def entry as a dictionary.
Key |
Description |
Example |
|---|---|---|
|
Type of inflation (post/prior, multiplicative/RTPP). |
‘post,multiplicative’ |
|
Whether to run an adaptive inflation scheme. |
False |
|
Static inflation coefficient. |
1.0 |
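As a reminder of what a static multiplicative coefficient does, the sketch below scales each member's deviation from the ensemble mean. It is a generic textbook form written with plain lists, not NEDAS code: the function name inflate is hypothetical, and whether the coefficient multiplies the perturbations (as here) or the variance is an assumption.

```python
def inflate(ensemble, coef):
    """Scale ensemble perturbations about the mean by coef.

    ensemble: list of members, each a list of floats of equal length.
    """
    nens = len(ensemble)
    nvar = len(ensemble[0])
    mean = [sum(member[i] for member in ensemble) / nens for i in range(nvar)]
    return [
        [mean[i] + coef * (member[i] - mean[i]) for i in range(nvar)]
        for member in ensemble
    ]


ensemble = [[1.0, 2.0], [3.0, 6.0]]
print(inflate(ensemble, 2.0))  # [[0.0, 0.0], [4.0, 8.0]]
print(inflate(ensemble, 1.0))  # [[1.0, 2.0], [3.0, 6.0]] (no inflation)
```

A coefficient of 1.0, the example value in the table, leaves the ensemble unchanged; the ensemble mean is preserved for any coefficient.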
Covariance localization settings are separately defined for the spatial and temporal components.
The localization_def entry is a dictionary with keys horizontal, vertical and temporal
each pointing to a dictionary that defines its localization function parameters.
Key |
Description |
Example |
|---|---|---|
|
Type of localization kernel to use. Implemented types include
|
‘gaspari_cohn’ |
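For reference, the Gaspari-Cohn kernel is a fifth-order piecewise rational function that decreases from 1 at zero distance to exactly 0 at a finite cutoff. The sketch below is a standalone version of the standard formula, not the NEDAS implementation. It assumes that the localization distance is the cutoff at which the weight reaches zero, so the kernel half-width is radius / 2; how NEDAS maps its localization distance onto the kernel may differ.

```python
def gaspari_cohn(dist, radius):
    """Gaspari-Cohn taper: 1 at dist=0, exactly 0 for dist >= radius.

    Assumes radius > 0 is the cutoff distance (kernel half-width = radius / 2).
    """
    r = abs(dist) / (radius / 2.0)
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r < 2.0:
        return (
            r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3
            + (5.0 / 3.0) * r**2 - 5.0 * r + 4.0 - 2.0 / (3.0 * r)
        )
    return 0.0


for d in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(d, round(gaspari_cohn(d, 10.0), 4))

# An infinite localization distance gives a weight of 1 everywhere (no tapering).
print(gaspari_cohn(3.0, float("inf")))  # 1.0
```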
State and observation transforms can be configured with the transform_def entry,
which is a list of dictionaries each defining one transform to apply
(see NEDAS.assim_tools.transforms).
Key |
Description |
Example |
|---|---|---|
|
Transform type. Built-in types include and |
‘scale_bandpass’ |
|
If True, apply the same transform decomposition to observations as well as state variables. |
False |
Multiscale approach configuration:
Key |
Description |
Example |
|---|---|---|
|
Number of outer-loop iterations, e.g. number of scale components in a multiscale approach. |
1 |
|
Current iteration number |
0 |
|
Resolution level (n) for the analysis grid. The analysis grid will have a resolution where |
[0] |
|
Characteristic length (in grid coordinate units) for each scale (large to small). |
[16] |
|
Scale factor for localization distances. |
[1] |
|
Scale factor for observation error inflation. |
[1] |
Diagnostic methods
The diag entry is a list. Each element is a dictionary defining a diagnostic method to be run.
Key |
Description |
Example |
|---|---|---|
|
Name of the diagnostic method (Python module path under |
‘misc.convert_output’ |
|
Optional YAML configuration file for the method. If not specified, the method module default is used. |
None |
|
Which model the diagnostic is applied to. |
‘qg.fortran’ |
|
List of variables to process. |
[‘streamfunc’] |
|
Optional output grid definition; if omitted, the model grid is used. |
None |
|
Output filename format string. |
‘{work_dir}/output/mem{member:03}_{time:%Y-%m-%dT%H}.nc’ |