Configuration file
Usage
NEDAS configuration is driven by YAML files and runtime argument parsing.
The NEDAS/config/default.yml file defines all the entries and their default values.
At runtime, a customized configuration file can be supplied with -c CONFIG_FILE.
The CONFIG_FILE does not need to define every entry in default.yml, only the ones relevant to the particular experiment.
In addition, entries of simple types (not compound types such as list, tuple and dict) can be given a new value at runtime with --key value,
which makes it easy to rerun the same experiment while changing only one or two parameters in the configuration.
In a Python script, include the following code:
```python
from NEDAS.config import Config
c = Config(parse_args=True)
```
so that when the script is run on the command line as
```shell
python script.py -c CONFIG_FILE --key value
```
the config object c is created, and its attributes carry the configuration parameters.
Alternatively, in an interactive environment such as a Jupyter notebook,
the configuration object c can be initialized directly with
```python
from NEDAS.config import Config
c = Config(config_file='CONFIG_FILE', key=value)
```
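The layering described in this section (defaults from default.yml, then the experiment's CONFIG_FILE, then --key value for simple entries) can be illustrated with a minimal sketch. It does not use NEDAS.config: plain dictionaries stand in for the parsed YAML files, the helper apply_cli_overrides is hypothetical, and the example keys are illustrative only.

```python
import argparse


def _to_bool(text):
    """Interpret a command-line string as a boolean."""
    return text.strip().lower() in ("true", "1", "yes")


def apply_cli_overrides(config, argv):
    """Return a copy of config in which simple entries may be replaced by --key value.

    Compound entries (list, tuple, dict) get no command-line option, mirroring
    the rule that only simple types can be overridden at runtime.
    """
    parser = argparse.ArgumentParser()
    for key, value in config.items():
        if isinstance(value, (list, tuple, dict)):
            continue  # compound types are not exposed on the command line
        if isinstance(value, bool):
            convert = _to_bool  # bool("False") would be True, so parse explicitly
        elif value is None:
            convert = str  # no type information available; keep the raw string
        else:
            convert = type(value)  # int, float or str, taken from the current value
        parser.add_argument(f"--{key}", type=convert, default=value)
    args = parser.parse_args(argv)
    merged = dict(config)
    merged.update(vars(args))
    return merged


# Hypothetical entries standing in for default.yml and an experiment CONFIG_FILE.
defaults = {"work_dir": "work", "nproc": 10, "debug": False, "obs_time_steps": [0]}
experiment = {"nproc": 64}

# Later layers win: defaults < CONFIG_FILE < command line.
config = apply_cli_overrides({**defaults, **experiment},
                             ["--debug", "True", "--work_dir", "/scratch/run1"])
print(config)
```

In this sketch, passing --obs_time_steps would be rejected by argparse because list entries get no option, which is the same restriction the configuration system places on compound types.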
Description of entries
System paths and runtime environment
| Entry | Description | Default |
|---|---|---|
| | Working directory for running the analysis scheme. | 'work' |
| | Runtime directory structure defined by format strings. | See details in Table 1. |
| | Initialization script to enter the Python environment. If not None, this script is sourced at runtime before the python command is run. | None |
| | Runtime job submitter settings, which are passed to | See details in Table 2. |
| | Number of processors to use for the analysis step. | 10 |
| | Number of processors in a "member group", which splits the MPI communicator into If None, no splitting is done. | None |
| | Number of processors to use for utility steps, such as preproc, postproc, diagnose, etc. If None, will use the same as | None |
Table 1: Runtime directory structure
| Key | Description | Default |
|---|---|---|
| | Directory for each analysis cycle | '{work_dir}/cycle/{time:%Y%m%d%H%M}' |
| | Directory for the assimilation step | '{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis' |
| | Directory for the ensemble forecast step | '{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}' |
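The defaults in Table 1 have the shape of Python str.format templates, with a strftime-style format spec applied to the cycle time. Assuming they are expanded that way, a sketch of how the three directories resolve for one cycle is shown below; the cycle time and model name are example values, not taken from a real run.

```python
from datetime import datetime

# Format strings copied from Table 1.
cycle_dir_tmpl = "{work_dir}/cycle/{time:%Y%m%d%H%M}"
analysis_dir_tmpl = "{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis"
forecast_dir_tmpl = "{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}"

# work_dir uses its default 'work'; time and model_name are illustrative values.
values = {"work_dir": "work", "time": datetime(2001, 1, 7), "model_name": "qg"}

# str.format ignores keyword arguments a template does not use.
cycle_dir = cycle_dir_tmpl.format(**values)
analysis_dir = analysis_dir_tmpl.format(**values)
forecast_dir = forecast_dir_tmpl.format(**values)
print(cycle_dir)     # work/cycle/200101070000
print(analysis_dir)  # work/cycle/200101070000/analysis
print(forecast_dir)  # work/cycle/200101070000/qg
```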
Table 2: Job submitter settings
| Key | Description | Examples |
|---|---|---|
| | Host machine name. Machine-specific job scheduling behavior can be defined in the corresponding subclass in Job submitters. | None, 'laptop', 'betzy', etc. |
| | Project number for resource allocation | None, 'nn2993k', etc. |
| | Name of the scheduler queue to submit jobs to | None, 'normal', 'devel', etc. |
| | Scheduler type. Typically a separate Job submitters subclass is defined for each scheduler type. | None, 'slurm', 'oar', 'pbs', etc. |
| | Number of available processors per compute node | 128 |
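As a rough illustration of how a processor count relates to the number of processors per compute node, a job whose ranks are packed densely onto nodes needs the ceiling of nproc / ppn nodes. The function and its argument names are hypothetical labels chosen here; the actual node request is built by the job submitter and scheduler.

```python
def nodes_needed(nproc, ppn):
    """Smallest number of compute nodes providing nproc processors at ppn per node."""
    if nproc <= 0 or ppn <= 0:
        raise ValueError("nproc and ppn must be positive")
    return -(-nproc // ppn)  # integer ceiling division


print(nodes_needed(10, 128))   # 1
print(nodes_needed(256, 128))  # 2
print(nodes_needed(300, 128))  # 3
```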
Analysis scheme design parameters
| Key | Description | Default |
|---|---|---|
| | Ensemble size | 20 |
| | Whether to run the preprocessing step. | True |
| | Whether to run the ensemble forecast step. | True |
| | Whether to run the analysis step. | True |
| | Whether to run the diagnostic tools. | True |
| | If True, show extra debug messages and output intermediate data during runtime. | False |
| | If True, show the elapsed time for each major step in the workflow. | True |
| | Used by If None, will run the entire workflow. Otherwise, will only run the specified step defined in the workflow at | None, 'preprocess', 'postprocess', 'filter', 'perturb', 'diagnose', or 'ensemble_forecast'. |
Time controls
| Key | Description | Example |
|---|---|---|
| | Start time of the period of interest. | 2001-01-01T00:00:00Z |
| | End time of the period of interest. | 2001-01-30T00:00:00Z |
| | Time of the first analysis cycle. | 2001-01-07T00:00:00Z |
| | Time of the last analysis cycle. | 2001-01-28T00:00:00Z |
| | Interval in hours between analysis cycles. | 24 |
| | Time of the current analysis cycle. If None, will start at | None |
| | Time steps in hours relative to the analysis for the observations. | [0] |
| | Smoothing window in hours for observations. | 0 |
| | Time steps in hours relative to the analysis for the state variables. | [0] |
| | Smoothing window in hours for state variables. | 0 |
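With the example values above, the analysis cycles run daily from 2001-01-07 to 2001-01-28. A minimal sketch of enumerating the cycle times and turning relative observation time steps into absolute times follows; it uses naive datetimes (UTC implied), and the helper name and the offsets [-12, 0, 12] are hypothetical.

```python
from datetime import datetime, timedelta


def analysis_cycles(first, last, period_hours):
    """List analysis cycle times from first to last (inclusive), every period_hours."""
    step = timedelta(hours=period_hours)
    times = []
    t = first
    while t <= last:
        times.append(t)
        t += step
    return times


# Values from the Example column of the time-control table.
cycles = analysis_cycles(datetime(2001, 1, 7), datetime(2001, 1, 28), 24)
print(len(cycles), cycles[0], cycles[-1])

# Observation times for the first cycle: offsets in hours relative to the analysis.
obs_offsets = [-12, 0, 12]  # hypothetical; the default is [0]
obs_times = [cycles[0] + timedelta(hours=h) for h in obs_offsets]
```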
Analysis grid definition
The grid_def entry is a dictionary with the following entries:
| Key | Description | Example |
|---|---|---|
| | Type of grid to use for the analysis step. If 'custom', the other entries are used as kwargs to initialize a regular grid; see details in Table 3. If a model name is specified, the corresponding model grid is used instead. | 'custom', 'qg', etc. |
| | Mask for invalid points in the domain. If not None, the model name specifies which model generates the mask for the analysis grid. | None, 'qg', etc. |
Table 3: Custom grid parameters
| Key | Description | Example |
|---|---|---|
| | Map projection defined as a PROJ4 string | None, '+proj=stere +lat_0=90 +lon_0=-45' |
| | X coordinate start | 0 |
| | X coordinate end | 128 |
| | Y coordinate start | 0 |
| | Y coordinate end | 128 |
| | Grid spacing. Note: the coordinates and grid spacing should be in meters, but if proj is None, they can be nondimensional. | 1 |
| | If True, the coordinates are defined at the center of each grid box. | False |
| | The dimension(s) that are cyclic | None, 'x', 'y', or 'xy' |
| | Type of distance function | 'cartesian' or 'spherical' |
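To make the custom grid parameters concrete, here is a sketch of building 1D coordinates from a start, end and spacing, and of a Cartesian distance that wraps around a cyclic dimension. It assumes the grid has (end - start) / spacing points with the end point excluded, and that "centered" shifts coordinates by half a grid spacing; both are interpretations, not confirmed NEDAS behavior. The 'spherical' option is not covered.

```python
import math


def grid_coords(start, end, dx, centered=False):
    """1D coordinates of a regular grid from start to end with spacing dx.

    Assumption: (end - start) / dx points, end point excluded; with centered=True
    each coordinate moves by dx / 2 to the middle of its grid box.
    """
    n = round((end - start) / dx)
    offset = 0.5 * dx if centered else 0.0
    return [start + offset + i * dx for i in range(n)]


def cartesian_distance(x1, y1, x2, y2, Lx=None, Ly=None):
    """Cartesian distance; give the domain length Lx or Ly for a cyclic dimension."""
    ddx = abs(x2 - x1)
    ddy = abs(y2 - y1)
    if Lx is not None:
        ddx = min(ddx, Lx - ddx)  # the shorter way around the periodic x dimension
    if Ly is not None:
        ddy = min(ddy, Ly - ddy)
    return math.hypot(ddx, ddy)


x = grid_coords(0, 128, 1)
xc = grid_coords(0, 128, 1, centered=True)
# With 'x' cyclic and domain length 128, points at x=1 and x=127 are 2 apart.
d = cartesian_distance(1, 0, 127, 0, Lx=128)
```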
State definition
The state_def entry is a list; each item is a dictionary that defines one model state variable:
| Key | Description | Example |
|---|---|---|
| | Model state variable name. Corresponding to the keys in implemented in the model interface | 'streamfunc' |
| | Name of the model this variable comes from. Should be one of the keys in | 'qg' |
| | Variable type. | 'field' or 'scalar' |
| | Error distribution type. | 'normal' |
The model_def entry is a dictionary whose keys are model names, each pointing to a dictionary of model-specific configuration parameters.
| Key | Description | Example |
|---|---|---|
| | YAML configuration file for the model. If not specified, will use in the corresponding model module directory. Additional entries added below will overwrite the settings in the YAML file, making it easier to set up twin experiments. | None, 'models/qg/default.yml' |
| | Initialization script for the model. At runtime, this script is sourced before running the model forecast. | 'setup.src' |
| | Path to the model code. | '{nedas_root}/models/qg' |
| | Number of processors to use for a model forecast. | 1 |
| | Number of processors to use for utility functions. | 1 |
| | Maximum runtime in seconds for a model forecast. | 3600 |
| | Model restart file saving interval in hours. | 24 |
| | Model boundary condition interval in hours. | 24 |
| | Strategy for running an ensemble of tasks. 'scheduler': run each member as a separate job and distribute the workload using a 'batch': run all members in a single job. | 'scheduler' or 'batch' |
| | Whether to use job array functionality when submitting the jobs via | False |
| | Directory where the initial ensemble restart files are located. | '{work_dir}/init_ens' |
| | Directory where the truth files are located. If | '{work_dir}/truth' |
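The rule that entries under model_def overwrite the settings in the model's YAML file amounts to a dictionary update. The sketch below shows this for a twin-experiment setup where one model parameter is changed; the YAML text stands in for a model default.yml, and the keys dt and forcing_amp are hypothetical. The same pattern applies to dataset_def, the assimilator dictionary and diag entries that point to their own YAML files.

```python
import yaml  # PyYAML

# Stand-in for a model's default.yml; the keys are hypothetical.
model_yaml = """
dt: 0.05
forcing_amp: 1.0
"""

# Hypothetical entries given under model_def for this model in the experiment.
model_def_entry = {"forcing_amp": 0.5}

settings = yaml.safe_load(model_yaml)
settings.update(model_def_entry)  # model_def entries take precedence over the YAML file
print(settings)
```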
Observation definition
The obs_def entry is a list; each item is a dictionary that defines one observation variable:
| Key | Description | Example |
|---|---|---|
| | Observation variable name. Corresponding to the keys in implemented in the dataset interface | 'velocity' |
| | Name of the dataset the observation comes from. Should be one of the keys in | 'qg' |
| | Name of the model from which to compute the observation priors. | 'qg' |
| | Number of observations. If generating a synthetic random observation network, use this to control the density. | 3000 |
| | Error definition dictionary. | See details in Table 4. |
| | Horizontal localization distance: the radius of influence beyond which the observation impact is tapered to zero, in the same units as the grid coordinates. | inf, 10, etc. |
| | Vertical localization distance, in the same units as | inf |
| | Temporal localization distance | inf |
| | Impact factors of this observation on the state variables, given as a dictionary. Unlisted variables have a default impact of 1.0. | {'streamfunc': 0}, which turns off the impact on streamfunc |
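The default-of-1.0 rule for impact factors can be expressed as a dictionary lookup with a fallback. This sketch only shows how a factor is resolved for a given state variable; how NEDAS applies the factor inside the update is not shown. The variable name 'temperature' is hypothetical.

```python
def impact_factor(impact, state_var):
    """Impact of one observation on a state variable; unlisted variables default to 1.0."""
    if impact is None:
        return 1.0
    return impact.get(state_var, 1.0)


# Example from the obs_def table: turn off the impact on streamfunc.
impact = {"streamfunc": 0}
print(impact_factor(impact, "streamfunc"))   # 0
print(impact_factor(impact, "temperature"))  # 1.0
```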
Table 4: Observation error definition
| Key | Description | Example |
|---|---|---|
| | Type of error distribution. | 'normal' |
| | Observation error standard deviation. | 1.0 |
| | Horizontal correlation length in observation error. | 0 |
| | Vertical correlation length in observation error. | 0 |
| | Temporal correlation length in observation error. | 0 |
| | Cross-variable correlation in observation error. A dictionary {variable_name: corr} listing the correlation between self and other variable_name. Auto-correlation is always 1, so there is no need to include self in the dictionary. | {'streamfunc': 0} |
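The cross-variable correlation dictionaries can be assembled into a correlation matrix with ones on the diagonal. The sketch below assumes that a pair listed in either direction applies symmetrically and that unlisted pairs are uncorrelated; the pair of variables and the value 0.3 are hypothetical.

```python
def cross_corr_matrix(variables, cross_corr):
    """Correlation matrix between observation variables.

    variables: list of variable names
    cross_corr: {name: {other_name: corr}}, one dictionary per variable
    The diagonal is always 1; a pair listed in either direction is mirrored.
    """
    n = len(variables)
    index = {name: i for i, name in enumerate(variables)}
    corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for name, others in cross_corr.items():
        for other, value in (others or {}).items():
            if name not in index or other not in index or other == name:
                continue  # unknown names are skipped; self-correlation stays 1
            i, j = index[name], index[other]
            corr[i][j] = corr[j][i] = value
    return corr


# Hypothetical pair of observation variables with correlated errors.
variables = ["velocity", "streamfunc"]
cross_corr = {"velocity": {"streamfunc": 0.3}, "streamfunc": {}}
C = cross_corr_matrix(variables, cross_corr)
print(C)
```

Multiplying entry (i, j) by the two error standard deviations would turn this into an error covariance matrix.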
The dataset_def entry is a dictionary whose keys are dataset names, each pointing to a dictionary of dataset-specific configuration parameters.
| Key | Description | Example |
|---|---|---|
| | YAML configuration file for the dataset. If not specified, will use in the corresponding dataset module directory. Additional entries added below will overwrite the settings in the YAML file. | None, 'dataset/qg/default.yml' |
| | Path to the dataset files | 'data' |
| | Start of the observation window, in hours relative to the analysis time. | -12 |
| | End of the observation window, in hours relative to the analysis time. | 12 |
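With a window from -12 to 12 hours, observations within 12 hours on either side of the analysis time are candidates for assimilation. A minimal filtering sketch follows; it assumes both window ends are inclusive, which the table does not state, and the observation times are made up.

```python
from datetime import datetime, timedelta


def in_obs_window(obs_time, analysis_time, window_start_h, window_end_h):
    """True if obs_time lies in [analysis + start, analysis + end], ends inclusive."""
    lo = analysis_time + timedelta(hours=window_start_h)
    hi = analysis_time + timedelta(hours=window_end_h)
    return lo <= obs_time <= hi


analysis_time = datetime(2001, 1, 7)
obs_times = [datetime(2001, 1, 6, 6), datetime(2001, 1, 6, 12),
             datetime(2001, 1, 7, 9), datetime(2001, 1, 7, 18)]
kept = [t for t in obs_times if in_obs_window(t, analysis_time, -12, 12)]
print(kept)
```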
Some additional parameters:
| Key | Description | Default |
|---|---|---|
| | Whether to use synthetic observations generated from the truth. | True |
| | Whether to randomize the order of observations. | False |
| | Where the reference vertical coordinates come from. | 'mean' |
Perturbation
The perturb entry is a list; each element is a dictionary of kwargs that will be passed to utils.random_perturb()
to perform the perturbation.
| Key | Description | Default |
|---|---|---|
| | Name of the variable to be perturbed. | 'streamfunc' |
| | Name of the model the variable comes from. | 'qg' |
| | Type of random perturbation. Use ',' to join multiple options. | 'gaussian,exp' |
| | Amplitude of the perturbation. | 0.1 |
| | Horizontal correlation length of the perturbation, in coordinate units | 15 |
| | Temporal correlation length of the perturbation, in hours | 0 |
| | If set, the perturbed variable will remain within this value range. | [0, inf] |
If no perturbation is needed, you can also leave perturb as None.
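For intuition about what the amplitude, horizontal correlation length and value range control, here is a generic sketch of a spatially correlated Gaussian perturbation on a periodic 1D grid. It is not utils.random_perturb(): the spectral Gaussian smoothing, the rescaling to the requested standard deviation and the clipping to bounds are choices made here for illustration, and the correlation length is in grid points (equal to coordinate units when the spacing is 1).

```python
import numpy as np


def gaussian_perturb_1d(field, amp, hcorr, bounds=None, seed=None):
    """Add a spatially correlated Gaussian perturbation to a periodic 1D field.

    amp: target standard deviation of the perturbation
    hcorr: correlation length in grid points (0 gives white noise)
    bounds: optional (lower, upper) range; the result is clipped to it
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(field.size)
    # Smooth white noise with a Gaussian kernel of width hcorr, applied in
    # spectral space (the Fourier transform of a Gaussian is a Gaussian).
    k = np.fft.fftfreq(field.size)  # wavenumber in cycles per grid point
    kernel = np.exp(-0.5 * (2.0 * np.pi * k * hcorr) ** 2)
    pert = np.real(np.fft.ifft(np.fft.fft(noise) * kernel))
    pert *= amp / pert.std()  # rescale to the requested amplitude
    out = field + pert
    if bounds is not None:
        out = np.clip(out, bounds[0], bounds[1])
    return out


# Values loosely following the perturb table: amplitude 0.1, length 15, range [0, inf].
field = np.full(256, 0.05)
perturbed = gaussian_perturb_1d(field, amp=0.1, hcorr=15, bounds=(0.0, np.inf), seed=0)
```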
Assimilation method
The following parameters help get_scheme() locate the right subclass of Scheme.
Currently only the 'filter' and 'forecast' schemes are implemented.
For the specific filter_type, the corresponding Assimilator subclass will be chosen to perform the filter step.
| Key | Description | Example |
|---|---|---|
| | Type of analysis scheme to use. | 'filter' |
| | Type of filter | 'ETKF' (for batch mode), 'EAKF' (for serial mode) |
Assimilator-specific parameters are defined in the assimilator dictionary.
| Key | Description | Example |
|---|---|---|
| | YAML configuration file for the assimilator. If not specified, will use in the corresponding assimilator module directory. Additional entries added below will overwrite the settings. | None, 'assimilators/ETKF/default.yml' |
Alignment technique configuration is stored as a dictionary in the alignment entry:
| Key | Description | Example |
|---|---|---|
| | If True, use interpolation to find the variables on the displaced analysis grid. If False, displace the grid coordinates directly. | False |
| | Name of the variable the alignment is based on. | 'streamfunc' |
| | Optical flow method. | 'HS_pyramid' |
| | Number of resolution levels in the pyramid approach | 5 |
| | Weight in the cost function to enforce smoothness of the displacement vector field | 1 |
Covariance inflation parameters are stored in the inflation entry as a dictionary.
| Key | Description | Example |
|---|---|---|
| | Type of inflation (post/prior, multiplicative/RTPP). | 'post,RTPP' |
| | Whether to run an adaptive inflation scheme. | True |
| | Static inflation coefficient. | 1.0 |
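The two inflation types named above have standard textbook forms: multiplicative inflation scales ensemble perturbations about the mean, and relaxation to prior perturbations (RTPP) blends posterior perturbations with prior ones. The sketch below shows only these generic formulas on arrays shaped (members, state); how NEDAS maps the static coefficient onto them and how its adaptive scheme estimates the coefficient are not described in this section.

```python
import numpy as np


def multiplicative_inflation(ens, factor):
    """Scale perturbations about the ensemble mean: x_i' -> factor * x_i'."""
    mean = ens.mean(axis=0)
    return mean + factor * (ens - mean)


def rtpp(ens_prior, ens_post, alpha):
    """Relaxation to prior perturbations: x_a' -> (1 - alpha) * x_a' + alpha * x_b'."""
    mean_b = ens_prior.mean(axis=0)
    mean_a = ens_post.mean(axis=0)
    pert = (1.0 - alpha) * (ens_post - mean_a) + alpha * (ens_prior - mean_b)
    return mean_a + pert  # the posterior mean is unchanged


rng = np.random.default_rng(0)
prior = rng.normal(size=(20, 5))  # 20 members, 5 state variables
# Toy analysis that halves the spread while keeping the mean.
post = prior.mean(axis=0) + 0.5 * (prior - prior.mean(axis=0))
relaxed = rtpp(prior, post, alpha=0.5)
```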
Covariance localization settings are defined separately for the spatial and temporal components.
The localization entry is a dictionary with keys horizontal, vertical and temporal,
each pointing to a dictionary that defines its localization function parameters.
| Key | Description | Example |
|---|---|---|
| | Type of localization function to use: distance-based (GC, step, exp) or correlation-based (NICE) | 'GC' |
| | YAML configuration file for this type of localization. | None, 'default.yml' |
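Assuming 'GC' refers to the Gaspari-Cohn fifth-order taper, and that the localization distance in obs_def is the radius at which the taper reaches zero (so the GC half-width is half that distance), distance-based localization weights could be computed as in the sketch below. A step function is included for comparison; the exp option is left out because its scaling convention is not given here. The radius must be positive.

```python
def gaspari_cohn(dist, roi):
    """Gaspari-Cohn fifth-order taper: 1 at dist=0, 0 for dist >= roi.

    Assumption: roi is the radius of influence, so the half-width is c = roi / 2.
    """
    z = abs(dist) / (roi / 2.0)  # roi = inf gives z = 0, i.e. no localization
    if z < 1.0:
        return (((-0.25 * z + 0.5) * z + 0.625) * z - 5.0 / 3.0) * z**2 + 1.0
    if z < 2.0:
        return ((((z / 12.0 - 0.5) * z + 0.625) * z + 5.0 / 3.0) * z - 5.0) * z \
            + 4.0 - 2.0 / (3.0 * z)
    return 0.0


def step_taper(dist, roi):
    """Full weight inside the radius of influence, zero outside."""
    return 1.0 if abs(dist) <= roi else 0.0


weights = [gaspari_cohn(d, 10.0) for d in (0.0, 2.5, 5.0, 7.5, 10.0)]
print(weights)
```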
Multiscale approach configuration:
| Key | Description | Example |
|---|---|---|
| | Number of outer-loop iterations, e.g., the number of scale components in a multiscale approach. | 1 |
| | Current iteration number | 0 |
| | Resolution level (n) for the analysis grid. The analysis grid will have a resolution where | [0] |
| | Characteristic length (in grid coordinate units) for each scale (large to small) | [1] |
| | Scale factor for localization distances. | [1] |
| | Scale factor for observation error inflation. | [1] |
Diagnostic methods
The diag entry is a list; each element is a dictionary defining a diagnostic method to be run.
| Key | Description | Example |
|---|---|---|
| | Name of the diagnostic method | 'misc.convert_output' |
| | YAML configuration file for the method. If not specified, will use in the corresponding method module directory. Additional entries added below will overwrite the settings. | None, 'diag/misc/convert_output/default.yml' |