Configuration file

Usage

NEDAS configuration is driven by YAML files and runtime argument parsing. The NEDAS/config/default.yml file defines all entries and their default values. At runtime, a customized configuration file can be supplied with -c CONFIG_FILE. The CONFIG_FILE does not need to define every entry in default.yml, only those relevant to the particular experiment. In addition, entries of simple types (not compound types such as list, tuple and dict) can be given a new value at runtime with --key value, which makes it easy to rerun the same experiment while changing only one or two parameters.
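The layering described above (defaults, then CONFIG_FILE entries, then --key value overrides restricted to simple types) can be sketched with the standard argparse module. This is a minimal illustration, not the NEDAS Config implementation: the DEFAULTS dict stands in for default.yml, the custom dict stands in for a parsed CONFIG_FILE, and build_config() and str2bool() are hypothetical helpers.

```python
import argparse

# Hypothetical stand-in for NEDAS/config/default.yml (a few entries from the tables below)
DEFAULTS = {
    "work_dir": "work",
    "nproc": 10,
    "nens": 20,
    "debug": False,
    "python_env": None,
    "obs_time_steps": [0],
}


def str2bool(text):
    """Interpret common command-line spellings of a boolean."""
    return text.strip().lower() in ("1", "true", "yes", "on")


def build_config(defaults, custom, argv):
    """Layer defaults <- CONFIG_FILE entries <- --key value overrides."""
    cfg = dict(defaults)
    cfg.update(custom)  # entries from the custom file win over defaults

    parser = argparse.ArgumentParser(allow_abbrev=False)
    for key, value in cfg.items():
        if isinstance(value, (list, tuple, dict)):
            continue  # compound types cannot be set with --key value
        if isinstance(value, bool):  # check bool before int (bool is an int subclass)
            parser.add_argument(f"--{key}", type=str2bool)
        elif value is None:
            parser.add_argument(f"--{key}", type=str)
        else:
            parser.add_argument(f"--{key}", type=type(value))

    # parse_known_args leaves unrelated options such as -c CONFIG_FILE alone
    args, _unknown = parser.parse_known_args(argv)
    for key, value in vars(args).items():
        if value is not None:  # None means the option was not given
            cfg[key] = value
    return cfg


cfg = build_config(DEFAULTS, {"nens": 40}, ["-c", "my.yml", "--nproc", "64", "--debug", "true"])
print(cfg["nproc"], cfg["nens"], cfg["debug"])  # 64 40 True
```

Because compound entries (obs_time_steps here) are skipped when the parser is built, they can only be changed through a configuration file.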

In a Python script, include the following code:

from NEDAS.config import Config
c = Config(parse_args=True)

so that when the script is run on the command line as

python script.py -c CONFIG_FILE --key value

the config object c is created, whose attributes carry the configuration parameters.

Alternatively, in an interactive environment such as a Jupyter notebook, the configuration object c can be initialized directly with

from NEDAS.config import Config
c = Config(config_file='CONFIG_FILE', key=value)

Description of entries

System paths and runtime environment

Entry

Description

Default

work_dir

Working directory for running the analysis scheme.

‘work’

directories

Runtime directory structure defined by format strings.

See details in Table 1.

python_env

Initialization script to enter the python environment.

If not None, ". {python_env}" is run at runtime to source this script before running the python command.

None

job_submit

Runtime job submitter settings, passed to NEDAS.utils.shell_utils.run_job() as kwargs.

See details in Table 2.

nproc

Number of processors to use for the analysis step.

10

nproc_mem

Number of processors in a “member group”, which splits the MPI communicator comm of size nproc into comm_mem of size nproc_mem.

If None, no splitting is done.

None

nproc_util

Number of processors to use for utility steps, such as preproc, postproc, diagnose, etc.

If None, the same value as nproc is used.

None
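To make the nproc/nproc_mem relationship concrete, the sketch below computes which member group each MPI rank would fall into and its rank inside that group. It assumes ranks are grouped contiguously (rank // nproc_mem) and that nproc is a multiple of nproc_mem; both are illustrative assumptions, not a description of how NEDAS performs the split. With mpi4py, the same group index and in-group rank could be passed as color and key to comm.Split().

```python
def member_group_layout(nproc, nproc_mem=None):
    """Map each rank to (group index, rank within group).

    Assumes contiguous grouping; with nproc_mem=None there is a single group.
    """
    if nproc_mem is None:
        return {rank: (0, rank) for rank in range(nproc)}
    if nproc % nproc_mem != 0:
        raise ValueError("this sketch assumes nproc is a multiple of nproc_mem")
    return {rank: (rank // nproc_mem, rank % nproc_mem) for rank in range(nproc)}


layout = member_group_layout(10, 5)
print(layout[7])  # (1, 2): rank 7 is the third rank of the second member group
ngroups = len({group for group, _ in layout.values()})
print(ngroups)  # 2
```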

Table 1. Breakdown of directories dictionary

Key

Description

Default

cycle_dir

Directory for each analysis cycle

‘{work_dir}/cycle/{time:%Y%m%d%H%M}’

analysis_dir

Directory for the assimilation step

‘{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis’

forecast_dir

Directory for the ensemble forecast step

‘{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}’
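The directory templates read as ordinary Python format strings, so the time placeholder takes a strftime-style format spec. A minimal sketch of how they expand (resolve_dir() is a hypothetical helper; NEDAS may supply the values differently):

```python
from datetime import datetime

directories = {
    "cycle_dir": "{work_dir}/cycle/{time:%Y%m%d%H%M}",
    "analysis_dir": "{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis",
    "forecast_dir": "{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}",
}


def resolve_dir(template, **values):
    """Fill a directory template; unused keyword values are simply ignored."""
    return template.format(**values)


t = datetime(2001, 1, 7, 0, 0)
print(resolve_dir(directories["analysis_dir"], work_dir="work", time=t))
# work/cycle/200101070000/analysis
print(resolve_dir(directories["forecast_dir"], work_dir="work", time=t, model_name="qg"))
# work/cycle/200101070000/qg
```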

Table 2. Breakdown of job_submit dictionary.

Key

Description

Examples

host

Host machine name. Machine-specific behavior in job scheduling can be defined in the corresponding subclass in Job submitters.

None, ‘laptop’, ‘betzy’, etc.

project

Project number for resource allocation

None, ‘nn2993k’, etc.

queue

Name of the scheduler queue to submit jobs to

None, ‘normal’, ‘devel’, etc.

scheduler

Scheduler type.

Typically, a separate subclass in Job submitters is defined for each scheduler type.

None, ‘slurm’, ‘oar’, ‘pbs’, etc.

ppn

Number of available processors per compute node

128
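For intuition about how these keys relate to a batch script, the sketch below maps them onto standard SLURM #SBATCH options (--account, --partition, --nodes, --ntasks, --time). It is a hypothetical illustration, not the output of NEDAS's Job submitters: the function name, the ceil(nproc / ppn) node count, and passing in the model walltime (from model_def) are assumptions.

```python
import math


def slurm_header(nproc, walltime, project=None, queue=None, ppn=128):
    """Build a SLURM batch header; walltime is in seconds."""
    nodes = math.ceil(nproc / ppn)  # fill nodes with up to ppn tasks each
    hours, rest = divmod(int(walltime), 3600)
    minutes, seconds = divmod(rest, 60)
    lines = ["#!/bin/bash"]
    if project is not None:
        lines.append(f"#SBATCH --account={project}")
    if queue is not None:
        lines.append(f"#SBATCH --partition={queue}")
    lines.append(f"#SBATCH --nodes={nodes}")
    lines.append(f"#SBATCH --ntasks={nproc}")
    lines.append(f"#SBATCH --time={hours:02d}:{minutes:02d}:{seconds:02d}")
    return "\n".join(lines)


print(slurm_header(200, 3600, project="nn2993k", queue="normal", ppn=128))
```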

Analysis scheme design parameters

Key

Description

Default

nens

Ensemble size

20

run_preproc

Whether to run the preprocessing step.

True

run_forecast

Whether to run the ensemble forecast step.

True

run_analysis

Whether to run the analysis step.

True

run_diagnose

Whether to run the diagnostic tools.

True

debug

If True, show extra debug messages and output intermediate data during runtime.

False

timer

If True, show the elapsed time for each major step in the workflow.

True

step

Used by NEDAS.schemes.filter.

If None, the entire workflow is run.

Otherwise, only the specified step of the workflow is run at the current time.

None, ‘preprocess’, ‘postprocess’, ‘filter’, ‘perturb’, ‘diagnose’, or ‘ensemble_forecast’.
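The interplay between step and the run_* switches can be sketched as follows. The mapping of each switch to a step name and the order of the steps are assumptions made for illustration; ‘perturb’ and ‘postprocess’ are left out of the switch list because this table does not say which switch controls them.

```python
# Assumed mapping from run_* switches to workflow step names (illustrative only)
SWITCHES = [
    ("run_preproc", "preprocess"),
    ("run_analysis", "filter"),
    ("run_diagnose", "diagnose"),
    ("run_forecast", "ensemble_forecast"),
]
VALID_STEPS = {"preprocess", "postprocess", "filter", "perturb", "diagnose", "ensemble_forecast"}


def steps_to_run(c):
    """Return the step names to execute for one cycle, given a config dict c."""
    step = c.get("step")
    if step is not None:
        if step not in VALID_STEPS:
            raise ValueError(f"unknown step: {step!r}")
        return [step]  # run only the requested step
    return [name for flag, name in SWITCHES if c.get(flag, True)]


print(steps_to_run({"run_forecast": False}))  # ['preprocess', 'filter', 'diagnose']
print(steps_to_run({"step": "filter"}))       # ['filter']
```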

Time controls

Key

Description

Example

time_start

Start time of the period of interest.

2001-01-01T00:00:00Z

time_end

End time of the period of interest.

2001-01-30T00:00:00Z

time_analysis_start

Time of the first analysis cycle.

2001-01-07T00:00:00Z

time_analysis_end

Time of the last analysis cycle.

2001-01-28T00:00:00Z

cycle_period

Interval in hours between analysis cycles.

24

time

Time of the current analysis cycle.

If None, time_start is used.

None

obs_time_steps

Time steps in hours, relative to the analysis time, for the observations.

[0]

obs_time_scale

Smoothing window in hours for observations.

0

state_time_steps

Time steps in hours, relative to the analysis time, for the state variables.

[0]

state_time_scale

Smoothing window in hours for state variables.

0
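A sketch of how the time controls translate into concrete cycle times, assuming time_analysis_end is inclusive and that obs_time_steps are hour offsets added to the cycle time. parse_time() and analysis_cycles() are hypothetical helpers, and the obs_time_steps values are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_time(text):
    """Parse the 'YYYY-MM-DDTHH:MM:SSZ' form used in the examples as a UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def analysis_cycles(start, end, cycle_period):
    """List cycle times from start to end (inclusive), cycle_period hours apart."""
    times, t = [], start
    while t <= end:
        times.append(t)
        t += timedelta(hours=cycle_period)
    return times


cycles = analysis_cycles(parse_time("2001-01-07T00:00:00Z"),
                         parse_time("2001-01-28T00:00:00Z"), 24)
print(len(cycles))  # 22

# Times at which observations are sought for the first cycle (illustrative offsets)
obs_time_steps = [-12, 0, 12]
obs_times = [cycles[0] + timedelta(hours=h) for h in obs_time_steps]
print([t.strftime("%Y-%m-%d %H:%M") for t in obs_times])
# ['2001-01-06 12:00', '2001-01-07 00:00', '2001-01-07 12:00']
```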

Analysis grid definition

The grid_def entry is a dictionary with the following entries:

Key

Description

Example

type

Type of grid to use for the analysis step.

If ‘custom’, the other entries are used as kwargs to initialize a regular grid; see details in Table 3.

If a model name is specified, the corresponding model grid is used instead.

‘custom’, ‘qg’, etc.

mask

Mask for invalid points in the domain.

If not None, it names the model that generates the mask for the analysis grid.

None, ‘qg’, etc.

Table 3. Additional kwargs for custom regular grid generation.

Key

Description

Example

proj

Map projection defined as a PROJ4 string

None, ‘+proj=stere +lat_0=90 +lon_0=-45’

xmin

X coordinate start

0

xmax

X coordinate end

128

ymin

Y coordinate start

0

ymax

Y coordinate end

128

dx

Grid spacing

Note: the coordinates and grid spacing should be in meters, but if proj is None, they can be nondimensional.

1

centered

If True, the coordinates are defined at the center of each grid box.

False

cyclic_dim

The dimension(s) that are cyclic

None, ‘x’, ‘y’, or ‘xy’

distance_type

Type of distance function

‘cartesian’ or ‘spherical’
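The custom grid parameters can be illustrated with a small pure-Python sketch that builds 1-D coordinate arrays and a Cartesian distance that wraps around cyclic dimensions. It assumes xmax/ymax are exclusive (as with numpy.arange) and that centered shifts each point by dx/2; the helper names are hypothetical and this is not the NEDAS Grid class.

```python
import math


def axis_coords(vmin, vmax, dx, centered=False):
    """1-D coordinates from vmin (inclusive) to vmax (exclusive), spacing dx."""
    n = int(round((vmax - vmin) / dx))
    offset = dx / 2 if centered else 0.0
    return [vmin + offset + i * dx for i in range(n)]


def cartesian_distance(p, q, x_period=None, y_period=None):
    """Distance between points p=(x, y) and q; a period wraps that dimension."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    if x_period is not None:
        dx = min(dx, x_period - dx)  # shortest way around a cyclic x dimension
    if y_period is not None:
        dy = min(dy, y_period - dy)
    return math.hypot(dx, dy)


x = axis_coords(0, 128, 1, centered=True)
print(len(x), x[0], x[-1])  # 128 0.5 127.5
# With cyclic_dim='xy', the domain length (xmax - xmin = 128) acts as the period
print(cartesian_distance((1, 0), (127, 0), x_period=128, y_period=128))  # 2.0
```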

State definition

The state_def entry is a list in which each item is a dictionary defining one model state variable:

Key

Description

Example

name

Model state variable name, corresponding to the keys in Model.variables implemented in the model interface.

‘streamfunc’

model_src

Name of the model this variable comes from.

Should be one of the keys in model_def.

‘qg’

var_type

Variable type.

‘field’ or ‘scalar’

err_type

Error distribution type.

‘normal’

The model_def entry is a dictionary, with model_name as keys pointing to a dictionary of model-specific configuration parameters.

Key

Description

Example

config_file

YAML configuration file for the model.

If not specified, default.yml in the corresponding model module directory is used.

Additional entries listed below overwrite the settings in the YAML file, making it easier to set up twin experiments.

None, ‘models/qg/default.yml’

model_env

Initialization script for the model.

At runtime, ". {model_env}" sources this script before running the model forecast.

‘setup.src’

model_code_dir

Path to the model code.

‘{nedas_root}/models/qg’

nproc_per_run

Number of processors to use for a model forecast.

1

nproc_per_util

Number of processors to use for utility functions.

1

walltime

Maximum runtime in seconds for model forecast.

3600

restart_dt

Model restart file saving interval in hours.

24

forcing_dt

Model boundary condition interval in hours.

24

ens_run_strategy

Strategy for running a task over an ensemble of members.

‘scheduler’: run each member as a separate job and distribute the workload using a Scheduler.

‘batch’: run all members in a single job.

‘scheduler’ or ‘batch’

use_job_array

Whether to use the job array functionality when submitting jobs via JobSubmitter.

False

ens_init_dir

Directory where the initial ensemble restart files are located.

‘{work_dir}/init_ens’

truth_dir

Directory where the truth files are located.

Mandatory if use_synthetic_obs is True.

‘{work_dir}/truth’
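The config_file mechanism (model defaults read from a YAML file, then overwritten by the extra entries given in model_def) amounts to a dictionary update. The sketch below uses plain dicts in place of parsed YAML and a recursive merge; whether NEDAS merges nested dictionaries key by key or replaces them wholesale is an assumption here, and merge_config() is a hypothetical helper.

```python
import copy


def merge_config(base, overrides):
    """Return a copy of base with overrides applied; nested dicts are merged key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the contents of models/qg/default.yml (values are illustrative)
qg_defaults = {"nproc_per_run": 1, "walltime": 3600, "restart_dt": 24, "forcing_dt": 24}

# model_def['qg'] in an experiment config: every entry except config_file is an override
model_def = {"qg": {"config_file": "models/qg/default.yml", "walltime": 7200}}

entries = {k: v for k, v in model_def["qg"].items() if k != "config_file"}
qg_config = merge_config(qg_defaults, entries)
print(qg_config["walltime"], qg_config["restart_dt"])  # 7200 24
```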

Observation definition

The obs_def entry is a list in which each item is a dictionary defining one observation variable:

Key

Description

Example

name

Observation variable name, corresponding to the keys in Dataset.variables implemented in the dataset interface.

‘velocity’

dataset_src

Name of the dataset the observation comes from.

Should be one of the keys in dataset_def.

‘qg’

model_src

Name of the model from which the observation priors are computed.

‘qg’

nobs

Number of observations.

When generating a synthetic random observation network, use this to control the network density.

3000

err

Error definition dictionary.

See details in Table 4.

hroi

Horizontal localization distance: the radius of influence beyond which the observation impact is tapered to zero, in the same units as the grid coordinates.

inf, 10, etc.

vroi

Vertical localization distance, in the same units as z_coords

inf

troi

Temporal localization distance

inf

impact_on_state

Dictionary of impact factors of this observation on the state variables.

Unlisted variables have a default impact factor of 1.0.

{ ‘streamfunc’: 0 }, which turns off the impact on streamfunc
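Since unlisted variables default to an impact factor of 1.0, the effective factors for one obs_def item can be resolved as below. impact_factors() is a hypothetical helper that also accepts impact_on_state left as None, and ‘temperature’ is a made-up second state variable; a factor of 0 removes the observation's influence on that variable.

```python
def impact_factors(obs_rec, state_names):
    """Impact factor of one observation on each state variable (default 1.0)."""
    listed = obs_rec.get("impact_on_state") or {}
    return {name: float(listed.get(name, 1.0)) for name in state_names}


obs_rec = {"name": "velocity", "dataset_src": "qg", "model_src": "qg",
           "hroi": 10, "impact_on_state": {"streamfunc": 0}}
# 'temperature' is a hypothetical state variable used only for illustration
print(impact_factors(obs_rec, ["streamfunc", "temperature"]))
# {'streamfunc': 0.0, 'temperature': 1.0}
```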

Table 4. Breakdown of the observation error definition dictionary.

Key

Description

Example

type

Type of error distribution.

‘normal’

std

Observation error standard deviation.

1.0

hcorr

Horizontal correlation length in observation error.

0

vcorr

Vertical correlation length in observation error.

0

tcorr

Temporal correlation length in observation error.

0

cross_corr

Cross-variable correlation in observation error: a dictionary {variable_name: corr} giving the correlation between this variable and variable_name. The auto-correlation is always 1, so the variable itself need not be included.

{‘streamfunc’: 0}
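To show what std and hcorr describe, the sketch below builds an observation error covariance matrix for a few observation locations. The exponential correlation exp(-d / hcorr) is an assumed functional form chosen for illustration, and hcorr = 0 is taken to mean uncorrelated errors (a diagonal matrix); error_covariance() is a hypothetical helper.

```python
import math


def error_covariance(points, std, hcorr=0.0):
    """R[i][j] = std**2 * corr(distance between points i and j)."""
    n = len(points)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                R[i][j] = std ** 2
            elif hcorr > 0:
                d = math.dist(points[i], points[j])
                R[i][j] = std ** 2 * math.exp(-d / hcorr)  # assumed exponential shape
    return R


pts = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]
print(error_covariance(pts, std=1.0))                   # diagonal: uncorrelated errors
print(error_covariance(pts, std=1.0, hcorr=5.0)[0][1])  # exp(-1), about 0.368
```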

The dataset_def entry is a dictionary, with dataset_name as keys pointing to a dictionary of dataset-specific configuration parameters.

Key

Description

Example

config_file

YAML configuration file for the dataset.

If not specified, default.yml in the corresponding dataset module directory is used.

Additional entries listed below overwrite the settings in the YAML file.

None, ‘dataset/qg/default.yml’

dataset_dir

Path to the dataset files

‘data’

obs_window_min

Start of the observation window, in hours relative to the analysis time.

-12

obs_window_max

End of the observation window, in hours relative to the analysis time.

12
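A sketch of how obs_window_min and obs_window_max select observations around an analysis time. Treating both window ends as inclusive is an assumption, and in_obs_window() is a hypothetical helper.

```python
from datetime import datetime, timedelta


def in_obs_window(obs_time, analysis_time, obs_window_min=-12, obs_window_max=12):
    """True if obs_time lies within [analysis_time + min, analysis_time + max] hours."""
    start = analysis_time + timedelta(hours=obs_window_min)
    end = analysis_time + timedelta(hours=obs_window_max)
    return start <= obs_time <= end


t = datetime(2001, 1, 7, 0)
obs_times = [t + timedelta(hours=h) for h in (-18, -12, 0, 6, 12, 13)]
kept = [o for o in obs_times if in_obs_window(o, t)]
print([int((o - t).total_seconds() // 3600) for o in kept])  # [-12, 0, 6, 12]
```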

Some additional parameters:

Key

Description

Default

use_synthetic_obs

Whether to use synthetic observations generated from the truth.

True

shuffle_obs

Whether to randomize the order of observations.

False

z_coords_from

Where the reference vertical coordinates come from.

‘mean’

Perturbation

The perturb entry is a list in which each element is a dictionary of kwargs passed to utils.random_perturb() to perform the perturbation.

Key

Description

Example

variable

Name of variable to be perturbed.

‘streamfunc’

model_src

Name of the model the variable comes from.

‘qg’

type

Type of random perturbation.

Use ‘,’ to join multiple options.

‘gaussian,exp’

amp

Amplitude of the perturbation.

0.1

hcorr

Horizontal correlation length of the perturbation, in coordinate units

15

tcorr

Temporal correlation length of the perturbation, in hours

0

bounds

If set, the perturbed variable is kept within this value range.

[0, inf]

If no perturbation is needed, you can also leave perturb as None.
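A one-dimensional sketch of what amp, hcorr and bounds control: white Gaussian noise is smoothed with a Gaussian kernel of width hcorr on a periodic domain, rescaled to standard deviation amp, added to the field, and clipped to bounds. This is a generic technique chosen for illustration, not the algorithm in utils.random_perturb(), and clipping is only one possible way to honor bounds; both function names are hypothetical.

```python
import math
import random


def gaussian_perturbation_1d(n, amp, hcorr, dx=1.0, seed=None):
    """Spatially correlated noise on a periodic 1-D grid with standard deviation amp."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if hcorr > 0:
        half = int(3 * hcorr / dx)  # truncate the kernel at 3 correlation lengths
        offsets = range(-half, half + 1)
        kernel = [math.exp(-0.5 * (k * dx / hcorr) ** 2) for k in offsets]
        # convolve with the kernel, wrapping indices around the periodic domain
        white = [sum(w * white[(i + k) % n] for w, k in zip(kernel, offsets))
                 for i in range(n)]
    mean = sum(white) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in white) / n)
    return [amp * (v - mean) / std for v in white]


def apply_perturbation(field, pert, bounds=(-math.inf, math.inf)):
    """Add the perturbation and clip the result to [lower, upper]."""
    lo, hi = bounds
    return [min(max(f + p, lo), hi) for f, p in zip(field, pert)]


pert = gaussian_perturbation_1d(128, amp=0.1, hcorr=15, seed=0)
field = [0.05] * 128
perturbed = apply_perturbation(field, pert, bounds=(0.0, math.inf))
print(min(perturbed) >= 0.0)  # True
```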

Assimilation method

The following parameters help get_scheme() locate the right subclass of Scheme. Currently only the ‘filter’ and ‘forecast’ schemes are implemented. For the specified assimilator type, the corresponding Assimilator subclass is chosen to perform the filter step.

Key

Description

Example

scheme

Type of analysis scheme to use.

‘filter’

assimilator

Type of filter

‘ETKF’ (for batch mode), ‘EAKF’ (for serial mode)
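One common pattern for picking a subclass from a name such as ‘ETKF’ or ‘EAKF’ is a registry keyed by that name. The classes below are hypothetical stubs used only to show the lookup; they are not the NEDAS Assimilator classes, and get_assimilator() is a made-up helper.

```python
class Assimilator:
    """Base class; subclasses register themselves under a short name."""
    registry = {}

    def __init_subclass__(cls, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if name is not None:
            Assimilator.registry[name] = cls


class ETKF(Assimilator, name="ETKF"):  # hypothetical stub
    mode = "batch"


class EAKF(Assimilator, name="EAKF"):  # hypothetical stub
    mode = "serial"


def get_assimilator(c):
    """Instantiate the subclass registered under c['assimilator']."""
    try:
        return Assimilator.registry[c["assimilator"]]()
    except KeyError:
        raise ValueError(f"unknown assimilator: {c.get('assimilator')!r}") from None


print(type(get_assimilator({"assimilator": "EAKF"})).__name__)  # EAKF
```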

Assimilator-specific parameters are defined in the assimilator dictionary.

Key

Description

Example

config_file

YAML configuration file for the assimilator.

If not specified, default.yml in the corresponding assimilator module directory is used. Additional entries listed below overwrite the settings.

None, ‘assimilators/ETKF/default.yml’

Alignment technique configuration is stored in alignment as a dictionary:

Key

Description

Example

interp_displace

If True, use interpolation to find the variables on the displaced analysis grid.

If False, displace the grid coordinates directly.

False

variable

Name of the variable the alignment is based on.

‘streamfunc’

method

Optical flow method.

‘HS_pyramid’

nlevel

Number of resolution levels in the pyramid approach

5

smoothness_weight

Weight in the cost function that enforces smoothness of the displacement vector field

1

Covariance inflation parameters are stored in the inflation entry as a dictionary.

Key

Description

Example

type

Type of inflation (post/prior, multiplicative/RTPP).

‘post,RTPP’

adaptive

Whether to run an adaptive inflation scheme.

True

coef

Static inflation coefficient.

1.0
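The two inflation types named in the table can be written in a few lines for a single scalar variable. Multiplicative inflation scales the ensemble perturbations about the mean by a factor; relaxation to prior perturbations (RTPP) blends posterior perturbations with prior ones, x'a <- (1 - alpha) x'a + alpha x'f. How NEDAS interprets coef for each type, and how its adaptive scheme estimates it, is not covered here; the functions below are a hypothetical sketch.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def multiplicative_inflation(ens, factor):
    """Scale perturbations about the ensemble mean by factor (1.0 means no change)."""
    m = _mean(ens)
    return [m + factor * (x - m) for x in ens]


def rtpp(post, prior, alpha):
    """Relax posterior perturbations toward prior perturbations with weight alpha."""
    m_post, m_prior = _mean(post), _mean(prior)
    return [m_post + (1 - alpha) * (xa - m_post) + alpha * (xf - m_prior)
            for xa, xf in zip(post, prior)]


prior = [1.0, 2.0, 3.0, 4.0]
post = [2.2, 2.4, 2.6, 2.8]
print(rtpp(post, prior, 0.5))  # mean stays 2.5; spread moves halfway back to the prior
print(multiplicative_inflation(post, 1.1))
```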

Covariance localization settings are separately defined for the spatial and temporal components. The localization entry is a dictionary with keys horizontal, vertical and temporal each pointing to a dictionary that defines its localization function parameters.

Key

Description

Example

type

Type of localization function to use: distance-based (GC, step, exp) or correlation-based (NICE)

‘GC’

config_file

YAML configuration file for this type of localization.

None, ‘default.yml’
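The ‘GC’ option refers to the Gaspari-Cohn fifth-order piecewise rational function, a compactly supported taper widely used for distance-based localization; a step function is included for comparison. The sketch assumes the localization distance (e.g. hroi) is the cutoff where the taper reaches zero, so the Gaspari-Cohn half-width is roi / 2; NEDAS may scale it differently, and the function names are hypothetical.

```python
import math


def gaspari_cohn(dist, roi):
    """Gaspari-Cohn taper: 1 at dist=0, 0 for dist >= roi (assumed support radius)."""
    if math.isinf(roi):
        return 1.0  # infinite radius of influence: no localization
    r = abs(dist) / (roi / 2.0)
    if r <= 1.0:
        return (((-0.25 * r + 0.5) * r + 0.625) * r - 5.0 / 3.0) * r * r + 1.0
    if r < 2.0:
        return ((((r / 12.0 - 0.5) * r + 0.625) * r + 5.0 / 3.0) * r - 5.0) * r + 4.0 - 2.0 / (3.0 * r)
    return 0.0


def step_taper(dist, roi):
    """1 inside the radius of influence, 0 outside."""
    return 1.0 if abs(dist) < roi else 0.0


print([round(gaspari_cohn(d, 10), 3) for d in (0, 2.5, 5, 7.5, 10)])
# [1.0, 0.685, 0.208, 0.016, 0.0]
```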

Multiscale approach configuration:

Key

Description

Example

niter

Number of outer-loop iterations, e.g., the number of scale components in a multiscale approach.

1

iter

Current iteration number

0

resolution_level

Resolution level (n) for the analysis grid.

The analysis grid has a grid spacing of dx * 2**n, where dx is the grid spacing defined in grid_def.

[0]

character_length

Characteristic length (in grid coordinate units) for each scale (large to small)

[1]

localize_scale_fac

Scale factor for localization distances.

[1]

obs_err_scale_fac

Scale factor for observation error inflation.

[1]
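For a multiscale run, each entry of the list-valued parameters applies to one outer-loop iteration. The sketch below picks out the settings for iteration iter, using the documented relation that the analysis grid spacing is dx * 2**n. Applying localize_scale_fac to hroi and obs_err_scale_fac to the observation error standard deviation are assumptions for illustration; iteration_settings() is a hypothetical helper, and the top-level hroi and obs_err_std keys stand in for values that normally live in obs_def.

```python
def iteration_settings(c, it):
    """Per-iteration grid spacing, localization radius and obs error for iteration it."""
    if not 0 <= it < c["niter"]:
        raise IndexError(f"iteration {it} outside 0..{c['niter'] - 1}")
    n = c["resolution_level"][it]
    return {
        "dx": c["grid_def"]["dx"] * 2 ** n,
        "hroi": c["hroi"] * c["localize_scale_fac"][it],
        "obs_err_std": c["obs_err_std"] * c["obs_err_scale_fac"][it],
        "character_length": c["character_length"][it],
    }


c = {
    "niter": 3,
    "grid_def": {"dx": 1},
    "resolution_level": [2, 1, 0],         # coarse to fine
    "character_length": [16, 8, 4],        # illustrative values, large to small
    "localize_scale_fac": [2.0, 1.5, 1.0],
    "obs_err_scale_fac": [2.0, 1.4, 1.0],
    "hroi": 10,          # stand-in for an obs_def hroi
    "obs_err_std": 1.0,  # stand-in for an obs_def err std
}
for it in range(c["niter"]):
    print(it, iteration_settings(c, it))
```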

Diagnostic methods

The diag entry is a list in which each element is a dictionary defining a diagnostic method to be run.

Key

Description

Example

method

Name of the diagnostic method

‘misc.convert_output’

config_file

YAML configuration file for the method.

If not specified, default.yml in the corresponding method module directory is used. Additional entries listed below overwrite the settings.

None, ‘diag/misc/convert_output/default.yml’