Configuration file
==================

.. contents::
   :local:
   :depth: 2

Usage
-----

NEDAS configuration is driven by YAML files and runtime argument parsing.

The ``NEDAS/config/default.yml`` file defines all the entries and their default values.

At runtime, a customized configuration file can be supplied with ``-c CONFIG_FILE``. The ``CONFIG_FILE`` does not need to define every entry in ``default.yml``, only the ones relevant to the particular experiment.

In addition, entries of simple types (not compound types such as list, tuple and dict) can be given a new value at runtime with ``--key value``, which makes it easier to rerun the same experiment while changing only one or two parameters in the configuration.

In a Python script, the following code can be included

.. code-block:: python

   from NEDAS.config import Config
   c = Config(parse_args=True)

so that when the script is run on the command line as

.. code-block:: bash

   python script.py -c CONFIG_FILE --key value

the config object ``c`` is created, whose attributes carry the configuration parameters.

Alternatively, in an interactive environment such as a Jupyter notebook, the configuration object ``c`` can be initialized directly with

.. code-block:: python

   from NEDAS.config import Config
   c = Config(config_file='CONFIG_FILE', key=value)

Description of entries
----------------------

System paths and runtime environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 55 25

   * - Entry
     - Description
     - Default
   * - ``work_dir``
     - Working directory for running the analysis scheme.
     - 'work'
   * - ``directories``
     - Runtime directory structure defined by format strings.
     - See details in Table 1.
   * - ``python_env``
     - Initialization script to enter the Python environment. If not None, at runtime ``". {python_env}"`` will source this script before running the Python command.
     - None
   * - ``job_submit``
     - Runtime job submitter settings, which are passed to :func:`NEDAS.utils.shell_utils.run_job` as kwargs.
     - See details in Table 2.
   * - ``nproc``
     - Number of processors to use for the analysis step.
     - 10
   * - ``nproc_mem``
     - Number of processors in a "member group", which splits the MPI communicator ``comm`` of size ``nproc`` into ``comm_mem`` of size ``nproc_mem``. If None, no splitting is done.
     - None
   * - ``nproc_util``
     - Number of processors to use for utility steps, such as preproc, postproc, diagnose, etc. If None, will use the same as ``nproc``.
     - None

.. list-table:: Table 1. Breakdown of ``directories`` dictionary
   :header-rows: 1
   :widths: 20 30 50

   * - Key
     - Description
     - Default
   * - ``cycle_dir``
     - Directory for each analysis cycle
     - '{work_dir}/cycle/{time:%Y%m%d%H%M}'
   * - ``analysis_dir``
     - Directory for the assimilation step
     - '{work_dir}/cycle/{time:%Y%m%d%H%M}/analysis'
   * - ``forecast_dir``
     - Directory for the ensemble forecast step
     - '{work_dir}/cycle/{time:%Y%m%d%H%M}/{model_name}'

.. list-table:: Table 2. Breakdown of ``job_submit`` dictionary.
   :header-rows: 1
   :widths: 20 30 50

   * - Key
     - Description
     - Examples
   * - ``host``
     - Host machine name. Machine-specific behavior in job scheduling can be defined in the corresponding subclass in :doc:`NEDAS.job_submitters`.
     - None, 'laptop', 'betzy', etc.
   * - ``project``
     - Project number for resource allocation
     - None, 'nn2993k', etc.
   * - ``queue``
     - Name of the scheduler queue to submit jobs to
     - None, 'normal', 'devel', etc.
   * - ``scheduler``
     - Scheduler type. Typically a separate :doc:`NEDAS.job_submitters` subclass is defined for each scheduler type.
     - None, 'slurm', 'oar', 'pbs', etc.
   * - ``ppn``
     - Number of available processors per compute node
     - 128

Analysis scheme design parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Key
     - Description
     - Default
   * - ``nens``
     - Ensemble size
     - 20
   * - ``run_preproc``
     - Whether to run the preprocessing step.
     - True
   * - ``run_forecast``
     - Whether to run the ensemble forecast step.
     - True
   * - ``run_analysis``
     - Whether to run the analysis step.
     - True
   * - ``run_diagnose``
     - Whether to run the diagnostic tools.
     - True
   * - ``debug``
     - If True, show extra debug messages and output intermediate data during runtime.
     - False
   * - ``timer``
     - If True, show the elapsed time for each major step in the workflow.
     - True
   * - ``step``
     - Used by :mod:`NEDAS.schemes.filter`. If None, will run the entire workflow. Otherwise, will only run the specified step defined in the workflow at ``time``.
     - None, 'preprocess', 'postprocess', 'filter', 'perturb', 'diagnose', or 'ensemble_forecast'.

Time controls
^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``time_start``
     - Start time of the period of interest.
     - 2001-01-01T00:00:00Z
   * - ``time_end``
     - End time of the period of interest.
     - 2001-01-30T00:00:00Z
   * - ``time_analysis_start``
     - Time of the first analysis cycle.
     - 2001-01-07T00:00:00Z
   * - ``time_analysis_end``
     - Time of the last analysis cycle.
     - 2001-01-28T00:00:00Z
   * - ``cycle_period``
     - Interval in hours between analysis cycles.
     - 24
   * - ``time``
     - Time of the current analysis cycle. If None, will start at ``time_start``.
     - None
   * - ``obs_time_steps``
     - Time steps in hours relative to the analysis for the observations.
     - [0]
   * - ``obs_time_scale``
     - Smoothing window in hours for observations.
     - 0
   * - ``state_time_steps``
     - Time steps in hours relative to the analysis for the state variables.
     - [0]
   * - ``state_time_scale``
     - Smoothing window in hours for state variables.
     - 0

Analysis grid definition
^^^^^^^^^^^^^^^^^^^^^^^^

The ``grid_def`` entry is a dictionary with the following entries:

.. list-table::
   :header-rows: 1
   :widths: 20 55 25

   * - Key
     - Description
     - Example
   * - ``type``
     - Type of grid to use for the analysis step. If 'custom', the other entries will be used as kwargs in initializing a regular grid, see details in Table 3.
       If a model name is specified, the corresponding model grid will be used instead.
     - 'custom', 'qg', etc.
   * - ``mask``
     - Mask for invalid points in the domain. If not None, the model name specifies which model generates the mask for the analysis grid.
     - None, 'qg', etc.

.. list-table:: Table 3. Additional kwargs for custom regular grid generation.
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``proj``
     - Map projection defined as a PROJ4 string
     - None, '+proj=stere +lat_0=90 +lon_0=-45'
   * - ``xmin``
     - X coordinate start
     - 0
   * - ``xmax``
     - X coordinate end
     - 128
   * - ``ymin``
     - Y coordinate start
     - 0
   * - ``ymax``
     - Y coordinate end
     - 128
   * - ``dx``
     - Grid spacing. Note: the coordinates and grid spacing should be in meters, but if ``proj`` is None they can be nondimensional.
     - 1
   * - ``centered``
     - If True, the coordinates are defined at the center of each grid box.
     - False
   * - ``cyclic_dim``
     - The dimension(s) that are cyclic
     - None, 'x', 'y', or 'xy'
   * - ``distance_type``
     - Type of distance function
     - 'cartesian' or 'spherical'

State definition
^^^^^^^^^^^^^^^^

The ``state_def`` entry is a list; each item is a dictionary that defines one model state variable:

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``name``
     - Model state variable name. Corresponds to the keys in ``Model.variables`` implemented in the model interface.
     - 'streamfunc'
   * - ``model_src``
     - Name of the model this variable comes from. Should be one of the keys in ``model_def``.
     - 'qg'
   * - ``var_type``
     - Variable type.
     - 'field', or 'scalar'
   * - ``err_type``
     - Error distribution type.
     - 'normal'

The ``model_def`` entry is a dictionary, with model_name as keys pointing to a dictionary of model-specific configuration parameters.

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``config_file``
     - YAML configuration file for the model.
       If not specified, will use ``default.yml`` in the corresponding model module directory. Additional entries added below will overwrite the settings in the YAML file, making it easier to set up twin experiments.
     - None, 'models/qg/default.yml'
   * - ``model_env``
     - Initialization script for the model. At runtime ``". {model_env}"`` will source this script before running the model forecast.
     - 'setup.src'
   * - ``model_code_dir``
     - Path to the model code.
     - '{nedas_root}/models/qg'
   * - ``nproc_per_run``
     - Number of processors to use for a model forecast.
     - 1
   * - ``nproc_per_util``
     - Number of processors to use for utility functions.
     - 1
   * - ``walltime``
     - Maximum runtime in seconds for a model forecast.
     - 3600
   * - ``restart_dt``
     - Model restart file saving interval in hours.
     - 24
   * - ``forcing_dt``
     - Model boundary condition interval in hours.
     - 24
   * - ``ens_run_strategy``
     - Strategy for running an ensemble of tasks. 'scheduler': run each member as a separate job and distribute the workload using a :class:`Scheduler`. 'batch': run all members in a single job.
     - 'scheduler' or 'batch'
   * - ``use_job_array``
     - Whether to use job array functionality when submitting the jobs via :class:`JobSubmitter`.
     - False
   * - ``ens_init_dir``
     - Directory where the initial ensemble restart files are located.
     - '{work_dir}/init_ens'
   * - ``truth_dir``
     - Directory where the truth files are located. Mandatory if ``use_synthetic_obs`` is True.
     - '{work_dir}/truth'

Observation definition
^^^^^^^^^^^^^^^^^^^^^^

The ``obs_def`` entry is a list; each item is a dictionary that defines one observation variable:

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``name``
     - Observation variable name. Corresponds to the keys in ``Dataset.variables`` implemented in the dataset interface.
     - 'velocity'
   * - ``dataset_src``
     - Name of the dataset the observation comes from. Should be one of the keys in ``dataset_def``.
     - 'qg'
   * - ``model_src``
     - Name of the model from which to compute the observation priors.
     - 'qg'
   * - ``nobs``
     - Number of observations. When generating a synthetic random observation network, use this to control the density.
     - 3000
   * - ``err``
     - Error definition dictionary.
     - See details in Table 4
   * - ``hroi``
     - Horizontal localization distance: the radius of influence beyond which the observation impact is tapered to zero. In the same units as the grid coordinates.
     - inf, 10, etc.
   * - ``vroi``
     - Vertical localization distance, in the same units as ``z_coords``
     - inf
   * - ``troi``
     - Temporal localization distance
     - inf
   * - ``impact_on_state``
     - Impact factors of this observation on the state variables. Unlisted variables have a default impact of 1.0.
     - { 'streamfunc': 0 }, which turns off the impact on streamfunc

.. list-table:: Table 4. Breakdown of the observation error definition dictionary.
   :header-rows: 1
   :widths: 20 55 25

   * - Key
     - Description
     - Example
   * - ``type``
     - Type of error distribution.
     - 'normal'
   * - ``std``
     - Observation error standard deviation.
     - 1.0
   * - ``hcorr``
     - Horizontal correlation length in observation error.
     - 0
   * - ``vcorr``
     - Vertical correlation length in observation error.
     - 0
   * - ``tcorr``
     - Temporal correlation length in observation error.
     - 0
   * - ``cross_corr``
     - Cross-variable correlation in observation error. A dictionary {variable_name: corr} listing the correlation between this variable and each other variable_name. Auto-correlation is always 1, so there is no need to include the variable itself in the dictionary.
     - {'streamfunc': 0}

The ``dataset_def`` entry is a dictionary, with dataset_name as keys pointing to a dictionary of dataset-specific configuration parameters.

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``config_file``
     - YAML configuration file for the dataset. If not specified, will use ``default.yml`` in the corresponding dataset module directory.
       Additional entries added below will overwrite the settings in the YAML file.
     - None, 'dataset/qg/default.yml'
   * - ``dataset_dir``
     - Path to the dataset files
     - 'data'
   * - ``obs_window_min``
     - Start of the observation window, hours relative to the analysis time.
     - -12
   * - ``obs_window_max``
     - End of the observation window, hours relative to the analysis time.
     - 12

Some additional parameters:

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Default
   * - ``use_synthetic_obs``
     - Whether to use synthetic observations generated from the truth.
     - True
   * - ``shuffle_obs``
     - Whether to randomize the order of observations.
     - False
   * - ``z_coords_from``
     - Where the reference vertical coordinates come from.
     - 'mean'

Perturbation
^^^^^^^^^^^^

The ``perturb`` entry is a list; each element is a dictionary with kwargs that will be passed to :func:`utils.random_perturb` to perform the perturbation.

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Default
   * - ``variable``
     - Name of variable to be perturbed.
     - 'streamfunc'
   * - ``model_src``
     - Name of the model the variable comes from.
     - 'qg'
   * - ``type``
     - Type of random perturbation. Use ',' to join multiple options
     - 'gaussian,exp'
   * - ``amp``
     - Amplitude of the perturbation.
     - 0.1
   * - ``hcorr``
     - Horizontal correlation length of the perturbation, in coordinate units
     - 15
   * - ``tcorr``
     - Temporal correlation length of the perturbation, in hours
     - 0
   * - ``bounds``
     - If set, the perturbed variable will remain in the value range.
     - [0, inf]

If no perturbation is needed, you can also leave ``perturb`` as None.

Assimilation method
^^^^^^^^^^^^^^^^^^^

The following parameters help :func:`get_scheme` to locate the right subclass of :class:`Scheme`. Currently only 'filter' and 'forecast' schemes are implemented.

For the specific ``filter_type``, the corresponding :class:`Assimilator` subclass will be chosen to perform the filter step.

.. list-table::
   :header-rows: 1
   :widths: 20 35 45

   * - Key
     - Description
     - Example
   * - ``scheme``
     - Type of analysis scheme to use.
     - 'filter'
   * - ``assimilator``
     - Type of filter
     - 'ETKF' (for batch mode), 'EAKF' (for serial mode)

Assimilator-specific parameters are defined in the ``assimilator`` dictionary.

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``config_file``
     - YAML configuration file for the assimilator. If not specified, will use ``default.yml`` in the corresponding assimilator module directory. Additional entries added below will overwrite the settings.
     - None, 'assimilators/ETKF/default.yml'

Alignment technique configuration is stored in ``alignment`` as a dictionary:

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``interp_displace``
     - If True, use interpolation to find the variables on the displaced analysis grid. If False, displace the grid coordinates directly.
     - False
   * - ``variable``
     - Name of the variable the alignment is based on.
     - 'streamfunc'
   * - ``method``
     - Optical flow method.
     - 'HS_pyramid'
   * - ``nlevel``
     - Number of resolution levels in the pyramid approach
     - 5
   * - ``smoothness_weight``
     - Weight in the cost function to enforce smoothness of the displacement vector field
     - 1

Covariance inflation parameters are stored in the ``inflation`` entry as a dictionary.

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``type``
     - Type of inflation (post/prior, multiplicative/RTPP).
     - 'post,RTPP'
   * - ``adaptive``
     - Whether to run an adaptive inflation scheme.
     - True
   * - ``coef``
     - Static inflation coefficient.
     - 1.0

Covariance localization settings are separately defined for the spatial and temporal components. The ``localization`` entry is a dictionary with keys ``horizontal``, ``vertical`` and ``temporal``, each pointing to a dictionary that defines its localization function parameters.

.. list-table::
   :header-rows: 1
   :widths: 20 65 15

   * - Key
     - Description
     - Example
   * - ``type``
     - Type of localization function to use. Distance-based (GC, step, exp) or correlation-based (NICE)
     - 'GC'
   * - ``config_file``
     - YAML configuration file for this type of localization.
     - None, 'default.yml'

Multiscale approach configuration:

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``niter``
     - Number of outer-loop iterations, e.g. number of scale components in a multiscale approach.
     - 1
   * - ``iter``
     - Current iteration number
     - 0
   * - ``resolution_level``
     - Resolution level (n) for the analysis grid. The analysis grid will have a resolution ``dx * 2**n`` where ``dx`` is the grid spacing defined in ``grid_def``.
     - [0]
   * - ``character_length``
     - Characteristic length (in grid coordinate units) for each scale (large to small)
     - [1]
   * - ``localize_scale_fac``
     - Scale factor for localization distances.
     - [1]
   * - ``obs_err_scale_fac``
     - Scale factor for observation error inflation.
     - [1]

Diagnostic methods
^^^^^^^^^^^^^^^^^^

The ``diag`` entry is a list; each element is a dictionary defining a diagnostic method to be run.

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - Key
     - Description
     - Example
   * - ``method``
     - Name of the diagnostic method
     - 'misc.convert_output'
   * - ``config_file``
     - YAML configuration file for the method. If not specified, will use ``default.yml`` in the corresponding method module directory. Additional entries added below will overwrite the settings.
     - None, 'diag/misc/convert_output/default.yml'
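
As a rough illustration of how the time controls and the ``directories`` format strings (Table 1) fit together, the sketch below lists the analysis cycle times between ``time_analysis_start`` and ``time_analysis_end`` and expands the default ``cycle_dir`` string for each of them. It assumes the format strings are expanded with Python's ``str.format``; the helper names ``analysis_times`` and ``cycle_dir`` are hypothetical and are not part of NEDAS.

.. code-block:: python

   from datetime import datetime, timedelta, timezone

   # Default format string for ``cycle_dir`` as listed in Table 1.
   CYCLE_DIR_FORMAT = '{work_dir}/cycle/{time:%Y%m%d%H%M}'


   def analysis_times(start, end, cycle_period):
       """Return the cycle times from start to end (inclusive).

       cycle_period is the interval between cycles, in hours.
       """
       times = []
       t = start
       while t <= end:
           times.append(t)
           t += timedelta(hours=cycle_period)
       return times


   def cycle_dir(work_dir, time, fmt=CYCLE_DIR_FORMAT):
       """Expand a directory format string (assumed to use str.format)."""
       return fmt.format(work_dir=work_dir, time=time)


   # Example values taken from the "Time controls" table.
   start = datetime(2001, 1, 7, tzinfo=timezone.utc)
   end = datetime(2001, 1, 28, tzinfo=timezone.utc)
   times = analysis_times(start, end, cycle_period=24)
   dirs = [cycle_dir('work', t) for t in times]
   print(len(dirs), dirs[0], dirs[-1])

With the example values above, this yields one directory per daily cycle under ``work/cycle/``. The actual expansion inside NEDAS may supply additional fields such as ``model_name``.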