Configuration file

Usage 

NEDAS configuration is driven by YAML files and runtime argument parsing. The NEDAS/config/default.yml file defines all entries and their default values. At runtime, a customized configuration file can be supplied with -c CONFIG_FILE. The CONFIG_FILE does not need to define every entry in default.yml, only the ones relevant to the particular experiment. In addition, entries of simple types (not compound types such as list, tuple and dict) can be overridden at runtime with --key value, which makes it easy to rerun the same experiment with only one or two parameters changed.

In a Python script, the following code can be included

from NEDAS.config import Config
c = Config(parse_args=True)

so that when the script is run on command line as

python script.py -c CONFIG_FILE --key value

the config object c is created, whose attributes carry the configuration parameters.

Alternatively, in an interactive environment such as a Jupyter notebook, the configuration object c can be initialized directly with

from NEDAS.config import Config
c = Config(config_file='CONFIG_FILE', key=value)
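
The layering described above (defaults, then the configuration file, then runtime `--key value` overrides) can be pictured with a small standalone sketch. This is not the NEDAS implementation: `DEFAULTS`, `layered_config` and `parse_overrides` are hypothetical names, and the precedence order (runtime overrides win over the file, which wins over the defaults) is an assumption based on the description above.

```python
import argparse

# Hypothetical stand-in for a few simple-typed entries of default.yml.
DEFAULTS = {"work_dir": "work", "nens": 20, "nproc": 10, "debug": False}


def layered_config(defaults, file_entries, overrides):
    """Merge three layers of settings; later layers win (assumed order)."""
    merged = dict(defaults)
    merged.update(file_entries)  # entries read from the -c CONFIG_FILE
    merged.update(overrides)     # --key value pairs given at runtime
    return merged


def parse_overrides(argv, defaults):
    """Create one --key option per simple-typed default, then parse argv."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        if isinstance(value, bool):
            continue  # booleans are skipped to keep the sketch short
        parser.add_argument(f"--{key}", type=type(value), default=None)
    args = parser.parse_args(argv)
    # Keep only the options that were actually given on the command line.
    return {k: v for k, v in vars(args).items() if v is not None}


file_entries = {"nens": 40}  # an experiment file that changes one entry
overrides = parse_overrides(["--nproc", "4"], DEFAULTS)
config = layered_config(DEFAULTS, file_entries, overrides)
print(config)
```

Entries absent from both the file and the command line keep their default values, which is why an experiment file only needs to list what it changes.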

Description of entries 

System paths and runtime environment 

Entry	Description	Default
`work_dir`	Working directory for running the analysis scheme.	‘work’
`directories`	Runtime directory structure defined by format strings.	See details in Table 1.
`python_env`	Initialization script to enter the python environment. If not None, at runtime `". {python_env}"` will source this script before running the python command.	None
`job_submit`	Runtime job submitter settings, which are passed to `NEDAS.utils.shell_utils.run_job()` as kwargs.	See details in Table 2.
`nproc`	Number of processors to use for the analysis step.	10
`nproc_mem`	Number of processors in a “member group”, which splits the MPI communicator `comm` of size `nproc` into `comm_mem` of size `nproc_mem`. If None, no splitting is done.	None
`nproc_util`	Number of processors to use for utility steps, such as preproc, postproc, diagnose, etc. If None, will use the same as `nproc`.	None

Table 1. Breakdown of `directories` dictionary
Key	Description	Default
`cycle_dir`	Directory for each analysis cycle	‘{work_dir}/cycle/{time:%Y%m%d%H%M}’
`analysis_dir`	Directory for the assimilation step	‘{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis’
`forecast_dir`	Directory for the ensemble forecast step	‘{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}’
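
As an illustration of how the format strings in Table 1 expand, the sketch below fills them in with Python's built-in `str.format`. The `expand` helper and the field values are hypothetical; how NEDAS itself performs the substitution is not shown here.

```python
from datetime import datetime

# Default templates copied from Table 1.
directories = {
    "cycle_dir": "{work_dir}/cycle/{time:%Y%m%d%H%M}",
    "analysis_dir": "{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis",
    "forecast_dir": "{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}",
}


def expand(template, **fields):
    """Fill a directory template; fields not named in the template are ignored."""
    return template.format(**fields)


# Example values: the default work_dir, one analysis time, and the 'qg' model.
fields = {
    "work_dir": "work",
    "time": datetime(2001, 1, 7, 0, 0),
    "model_name": "qg",
}
paths = {key: expand(template, **fields) for key, template in directories.items()}

for key, path in paths.items():
    print(f"{key}: {path}")
```

The `{time:%Y%m%d%H%M}` field works because `datetime` objects accept `strftime` codes as their format specification.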

Table 2. Breakdown of `job_submit` dictionary.
Key	Description	Examples
`host`	Host machine name. Machine-specific job scheduling behavior can be defined in the corresponding subclass in Job submitters.	None, ‘laptop’, ‘betzy’, etc.
`project`	Project number for resource allocation	None, ‘nn2993k’, etc.
`queue`	Name of the scheduler queue to submit jobs to	None, ‘normal’, ‘devel’, etc.
`scheduler`	Scheduler type. Typically a separate Job submitters subclass is defined for each scheduler type.	None, ‘slurm’, ‘oar’, ‘pbs’, etc.
`ppn`	Number of available processors per compute node	128

Analysis scheme design parameters 

Key	Description	Default
`nens`	Ensemble size	20
`run_preproc`	Whether to run the preprocessing step.	True
`run_forecast`	Whether to run the ensemble forecast step.	True
`run_analysis`	Whether to run the analysis step.	True
`run_diagnose`	Whether to run the diagnostic tools.	True
`debug`	If True, show extra debug messages and output intermediate data during runtime.	False
`timer`	If True, show the elapsed time for each major step in the workflow.	True
`step`	Used by `NEDAS.schemes.filter`. If None, will run the entire workflow. Otherwise, will only run the specified step defined in the workflow at `time`.	None, ‘preprocess’, ‘postprocess’, ‘filter’, ‘perturb’, ‘diagnose’, or ‘ensemble_forecast’.

Time controls 

Key	Description	Example
`time_start`	Start time of the period of interest.	2001-01-01T00:00:00Z
`time_end`	End time of the period of interest.	2001-01-30T00:00:00Z
`time_analysis_start`	Time of the first analysis cycle.	2001-01-07T00:00:00Z
`time_analysis_end`	Time of the last analysis cycle.	2001-01-28T00:00:00Z
`cycle_period`	Interval in hours between analysis cycles.	24
`time`	Time of the current analysis cycle. If None, will start at `time_start`.	None
`obs_time_steps`	Time steps in hours relative to the analysis for the observations.	[0]
`obs_time_scale`	Smoothing window in hours for observations.	0
`state_time_steps`	Time steps in hours relative to the analysis for the state variables.	[0]
`state_time_scale`	Smoothing window in hours for state variables.	0
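
To make the relation between `time_analysis_start`, `time_analysis_end` and `cycle_period` concrete, the following sketch lists the analysis times implied by the example values in the table. The helper `analysis_cycles` is hypothetical, and it assumes the last analysis time is included.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the examples in the table above.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def analysis_cycles(time_analysis_start, time_analysis_end, cycle_period):
    """Return the analysis times from start to end (inclusive), cycle_period hours apart."""
    start = datetime.strptime(time_analysis_start, TIME_FORMAT)
    end = datetime.strptime(time_analysis_end, TIME_FORMAT)
    step = timedelta(hours=cycle_period)
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += step
    return times


cycles = analysis_cycles("2001-01-07T00:00:00Z", "2001-01-28T00:00:00Z", 24)
print(len(cycles), cycles[0], cycles[-1])
```

With a 24-hour period this gives one analysis per day from 7 to 28 January 2001.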

Analysis grid definition 

The grid_def entry is a dictionary with the following entries:

Key	Description	Example
`type`	Type of grid to use for the analysis step. If ‘custom’, the other entries will be used as kwargs in initializing a regular grid, see details in Table 3. If a model name is specified, the corresponding model grid will be used instead.	‘custom’, ‘qg’, etc.
`mask`	Mask for invalid points in the domain. If not None, the model name specifies which model generates the mask for the analysis grid.	None, ‘qg’, etc.

Table 3. Additional kwargs for custom regular grid generation.
Key	Description	Example
`proj`	Map projection defined as PROJ4 strings	None, ‘+proj=stere +lat_0=90 +lon_0=-45’
`xmin`	X coordinate start	0
`xmax`	X coordinate end	128
`ymin`	Y coordinate start	0
`ymax`	Y coordinate end	128
`dx`	Grid spacing. Note: the coordinates and grid spacing should be in meters, but if `proj` is None they can be nondimensional.	1
`centered`	If True, the coordinates are defined at the center of each grid box.	False
`cyclic_dim`	The dimension(s) that are cyclic	None, ‘x’, ‘y’, or ‘xy’
`distance_type`	Type of distance function	‘cartesian’ or ‘spherical’

State definition 

The state_def entry is a list; each item is a dictionary that defines one model state variable:

Key	Description	Example
`name`	Model state variable name, corresponding to a key in `Model.variables` implemented in the model interface.	‘streamfunc’
`model_src`	Name of the model this variable comes from. Should be one of the keys in `model_def`.	‘qg’
`var_type`	Variable type.	‘field’, or ‘scalar’
`err_type`	Error distribution type.	‘normal’

The model_def entry is a dictionary, with model_name as keys pointing to a dictionary of model-specific configuration parameters.

Key	Description	Example
`config_file`	YAML configuration file for the model. If not specified, will use `default.yml` in the corresponding model module directory. Additional entries added below will overwrite the settings in the YAML file, making it easier to set up twin experiments.	None, ‘models/qg/default.yml’
`model_env`	Initialization script for model. At runtime `". {model_env}"` will source this script before running the model forecast.	‘setup.src’
`model_code_dir`	Path to the model code.	‘{nedas_root}/models/qg’
`nproc_per_run`	Number of processors to use for a model forecast.	1
`nproc_per_util`	Number of processors to use for utility functions.	1
`walltime`	Maximum runtime in seconds for model forecast.	3600
`restart_dt`	Model restart file saving interval in hours.	24
`forcing_dt`	Model boundary condition interval in hours.	24
`ens_run_strategy`	Strategy for running tasks that involve the whole ensemble. ‘scheduler’: run each member as a separate job and distribute the workload using a `Scheduler`. ‘batch’: run all members in a single job.	‘scheduler’ or ‘batch’
`use_job_array`	Whether to use job array functionality when submitting the jobs via `JobSubmitter`.	False
`ens_init_dir`	Directory where the initial ensemble restart files are located.	‘{work_dir}/init_ens’
`truth_dir`	Directory where the truth files are located. Mandatory if `use_synthetic_obs` is True.	‘{work_dir}/truth’
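
The two entries are linked through `model_src`, which should name a key in `model_def`. A minimal sketch of that relationship, written as plain Python data, is given below. The helper `check_model_sources` is hypothetical and not part of NEDAS; the values are the examples from the tables above.

```python
# One state variable, as it would appear in the state_def list.
state_def = [
    {"name": "streamfunc", "model_src": "qg", "var_type": "field", "err_type": "normal"},
]

# Model-specific parameters keyed by model name.
model_def = {
    "qg": {
        "config_file": None,
        "nproc_per_run": 1,
        "walltime": 3600,
        "restart_dt": 24,
        "ens_run_strategy": "batch",
    },
}


def check_model_sources(state_def, model_def):
    """Return names of state variables whose model_src is not a key in model_def."""
    return [rec["name"] for rec in state_def if rec["model_src"] not in model_def]


missing = check_model_sources(state_def, model_def)
print("unresolved variables:", missing)
```

A check of this kind catches a misspelled model name before any forecast is launched.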

Observation definition 

The obs_def entry is a list; each item is a dictionary that defines one observation variable:

Key	Description	Example
`name`	Observation variable name, corresponding to a key in `Dataset.variables` implemented in the dataset interface.	‘velocity’
`dataset_src`	Name of the dataset the observation comes from. Should be one of the keys in `dataset_def`.	‘qg’
`model_src`	Name of the model from which to compute the observation priors.	‘qg’
`nobs`	Number of observations. If generating synthetic random observation network, use this to control the density.	3000
`err`	Error definition dictionary.	See details in Table 4
`hroi`	Horizontal localization distance, radius of influence beyond which the observation impact is tapered to zero. In the same units as grid coordinates	inf, 10, etc.
`vroi`	Vertical localization distance, in the same units as `z_coords`	inf
`troi`	Temporal localization distance	inf
`impact_on_state`	Impact factors of this observation on the state variables, given as a dictionary keyed by state variable name. Unlisted variables have a default impact of 1.0.	{ ‘streamfunc’: 0 }, which turns off the impact on streamfunc

Table 4. Breakdown of the observation error definition dictionary.
Key	Description	Example
`type`	Type of error distribution.	‘normal’
`std`	Observation error standard deviation.	1.0
`hcorr`	Horizontal correlation length in observation error.	0
`vcorr`	Vertical correlation length in observation error.	0
`tcorr`	Temporal correlation length in observation error.	0
`cross_corr`	Cross-variable correlation in observation error. A dictionary {variable_name: corr} listing the correlation between self and other variable_name. Auto-correlation is always 1, so there is no need to include self in the dictionary.	{‘streamfunc’: 0}
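
For orientation, one `obs_def` item with its nested `err` dictionary might look like the record below, together with the default-impact rule for `impact_on_state`. The helper `impact_factor` and the variable name 'temperature' are hypothetical and only serve the illustration; the remaining values are the examples from the tables above.

```python
# One observation variable, as it would appear in the obs_def list.
obs_rec = {
    "name": "velocity",
    "dataset_src": "qg",
    "model_src": "qg",
    "nobs": 3000,
    "err": {"type": "normal", "std": 1.0, "hcorr": 0, "vcorr": 0, "tcorr": 0},
    "hroi": 10,
    "vroi": float("inf"),
    "troi": float("inf"),
    "impact_on_state": {"streamfunc": 0},
}


def impact_factor(obs_rec, state_name):
    """Impact of this observation on a state variable; unlisted variables get 1.0."""
    factors = obs_rec.get("impact_on_state") or {}
    return factors.get(state_name, 1.0)


print(impact_factor(obs_rec, "streamfunc"))   # listed, impact switched off
print(impact_factor(obs_rec, "temperature"))  # hypothetical unlisted variable
```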

The dataset_def entry is a dictionary, with dataset_name as keys pointing to a dictionary of dataset-specific configuration parameters.

Key	Description	Example
`config_file`	YAML configuration file for the dataset. If not specified, will use `default.yml` in the corresponding dataset module directory. Additional entries added below will overwrite the settings in the YAML file.	None, ‘dataset/qg/default.yml’
`dataset_dir`	Path to the dataset files	‘data’
`obs_window_min`	Start of the observation window, hours relative to the analysis time.	-12
`obs_window_max`	End of the observation window, hours relative to the analysis time.	12

Some additional parameters:

Key	Description	Default
`use_synthetic_obs`	Whether to use synthetic observations generated from the truth.	True
`shuffle_obs`	Whether to randomize the order of observations.	False
`z_coords_from`	Where the reference vertical coordinates come from.	‘mean’

Perturbation 

The perturb entry is a list; each element is a dictionary of kwargs that will be passed to utils.random_perturb() to perform the perturbation.

Key	Description	Default
`variable`	Name of variable to be perturbed.	‘streamfunc’
`model_src`	Name of the model the variable comes from.	‘qg’
`type`	Type of random perturbation. Use ‘,’ to join multiple options	‘gaussian,exp’
`amp`	Amplitude of the perturbation.	0.1
`hcorr`	Horizontal correlation length of the perturbation, in coordinate units	15
`tcorr`	Temporal correlation length of the perturbation, in hours	0
`bounds`	If set, the perturbed variable will remain in the value range.	[0, inf]

If no perturbation is needed, you can also leave perturb as None.
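
Two of these entries have a small amount of structure: `type` joins several options with ‘,’, and `bounds` restricts the perturbed values to a range. The sketch below shows one way to read them. It does not reproduce utils.random_perturb(): the helpers are hypothetical, and clipping is only an assumed way of keeping values within `bounds`.

```python
import math

# One element of the perturb list, using the default values from the table.
perturb_rec = {
    "variable": "streamfunc",
    "model_src": "qg",
    "type": "gaussian,exp",
    "amp": 0.1,
    "hcorr": 15,
    "tcorr": 0,
    "bounds": [0, math.inf],
}


def perturb_options(rec):
    """Split the comma-joined type string into individual options."""
    return [option.strip() for option in rec["type"].split(",")]


def apply_bounds(values, bounds):
    """Clip values to [lower, upper]; with bounds=None the values pass through."""
    if bounds is None:
        return list(values)
    lower, upper = bounds
    return [min(max(value, lower), upper) for value in values]


options = perturb_options(perturb_rec)
clipped = apply_bounds([-0.2, 0.5], perturb_rec["bounds"])
print(options, clipped)
```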

Assimilation method 

The following parameters help get_scheme() locate the right subclass of Scheme. Currently only the ‘filter’ and ‘forecast’ schemes are implemented. For the specified filter_type, the corresponding Assimilator subclass will be chosen to perform the filter step.

Key	Description	Example
`scheme`	Type of analysis scheme to use.	‘filter’
`assimilator`	Type of filter	‘ETKF’ (for batch mode), ‘EAKF’ (for serial mode)

Assimilator-specific parameters are defined in the assimilator dictionary.

Key	Description	Example
`config_file`	YAML configuration file for the assimilator. If not specified, will use `default.yml` in the corresponding assimilator module directory. Additional entries added below will overwrite the settings.	None, ‘assimilators/ETKF/default.yml’

Alignment technique configuration is stored in alignment as a dictionary:

Key	Description	Example
`interp_displace`	If True, use interpolation to find the variables on displaced analysis grid. If False, displace the grid coordinates directly.	False
`variable`	Name of the variable the alignment is based on.	‘streamfunc’
`method`	Optical flow method.	‘HS_pyramid’
`nlevel`	Number of resolution levels in the pyramid approach	5
`smoothness_weight`	Weight in the cost function that enforces smoothness of the displacement vector field	1

Covariance inflation parameters are stored in the inflation entry as a dictionary.

Key	Description	Example
`type`	Type of inflation (post/prior, multiplicative/RTPP).	‘post,RTPP’
`adaptive`	Whether to run an adaptive inflation scheme.	True
`coef`	Static inflation coefficient.	1.0
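
As background for the `type` options, the sketch below writes out the textbook forms of multiplicative inflation and relaxation to prior perturbations (RTPP) for a scalar ensemble. It is not the NEDAS code: the function names are hypothetical, and how `coef` maps onto the factor or onto the relaxation weight `alpha` is an assumption left open here.

```python
def parse_inflation_type(type_string):
    """Split a type string such as 'post,RTPP' into its options."""
    return [option.strip() for option in type_string.split(",")]


def multiplicative_inflation(ensemble, factor):
    """Scale each member's deviation from the ensemble mean by factor."""
    mean = sum(ensemble) / len(ensemble)
    return [mean + factor * (member - mean) for member in ensemble]


def rtpp(prior, posterior, alpha):
    """Blend posterior perturbations with prior ones; alpha=0 leaves the posterior unchanged."""
    prior_mean = sum(prior) / len(prior)
    post_mean = sum(posterior) / len(posterior)
    return [
        post_mean + (1.0 - alpha) * (xa - post_mean) + alpha * (xb - prior_mean)
        for xb, xa in zip(prior, posterior)
    ]


options = parse_inflation_type("post,RTPP")
inflated = multiplicative_inflation([1.0, 3.0], 2.0)
relaxed = rtpp([0.0, 4.0], [1.0, 3.0], 0.5)
print(options, inflated, relaxed)
```

Both operations change only the ensemble spread; the ensemble mean is preserved.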

Covariance localization settings are separately defined for the spatial and temporal components. The localization entry is a dictionary with keys horizontal, vertical and temporal each pointing to a dictionary that defines its localization function parameters.

Key	Description	Example
`type`	Type of localization function to use. Distance-based (GC, step, exp) or correlation-based (NICE)	‘GC’
`config_file`	YAML configuration file for this type of localization.	None, ‘default.yml’
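
For reference, ‘GC’ conventionally denotes the Gaspari-Cohn fifth-order piecewise rational function. A minimal sketch of that function is given below. It assumes that the radius of influence (for example `hroi`) is the distance at which the weight reaches zero, so the function's half-width is half of that radius; the NEDAS implementation may scale the distance differently.

```python
import math


def gaspari_cohn(dist, roi):
    """Gaspari-Cohn localization weight: 1 at zero distance, 0 at and beyond roi."""
    if math.isinf(roi):
        return 1.0  # an infinite radius of influence means no localization
    if roi <= 0:
        return 1.0 if dist == 0 else 0.0
    r = abs(dist) / (roi / 2.0)  # distance normalized by the half-width
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r < 2.0:
        return (
            r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3
            + (5.0 / 3.0) * r**2 - 5.0 * r + 4.0 - 2.0 / (3.0 * r)
        )
    return 0.0


for distance in (0, 2, 5, 8, 10):
    print(distance, gaspari_cohn(distance, 10))
```

The weight decreases smoothly with distance, which tapers the observation impact rather than cutting it off abruptly as a ‘step’ function would.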

Multiscale approach configuration:

Key	Description	Example
`niter`	Number of outer-loop iterations, e.g. number of scale components in a multiscale approach.	1
`iter`	Current iteration number	0
`resolution_level`	Resolution level (n) for the analysis grid. The analysis grid will have a resolution `dx * 2**n` where `dx` is the grid spacing defined in `grid_def`.	[0]
`character_length`	Characteristic length (in grid coordinate units) for each scale (large to small)	[1]
`localize_scale_fac`	Scale factor for localization distances.	[1]
`obs_err_scale_fac`	Scale factor for observation error inflation.	[1]
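
The `resolution_level` rule `dx * 2**n` can be checked with a few lines of Python. The three-level setup below is a hypothetical example, and the idea that each list holds one entry per iteration is an assumption drawn from the defaults above.

```python
def analysis_dx(dx, resolution_level):
    """Grid spacing of the analysis grid at resolution level n: dx * 2**n."""
    return dx * 2 ** resolution_level


dx = 1                        # grid spacing from grid_def
resolution_level = [2, 1, 0]  # hypothetical levels, one per iteration
spacings = [analysis_dx(dx, n) for n in resolution_level]

for iteration, spacing in enumerate(spacings):
    print(f"iter {iteration}: analysis grid spacing = {spacing}")
```

Level 0 reproduces the spacing defined in `grid_def`, and each higher level doubles it.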

Diagnostic methods 

The diag entry is a list; each element is a dictionary defining a diagnostic method to be run.

Key	Description	Example
`method`	Name of the diagnostic method	‘misc.convert_output’
`config_file`	YAML configuration file for the method. If not specified, will use `default.yml` in the corresponding method module directory. Additional entries added below will overwrite the settings.	None, ‘diag/misc/convert_output/default.yml’