Configuration

The entry point for the user is the configuration file, by default profit.yaml, which is located in the base directory. Here, parameters, paths, variable names, etc. are stored. For more information on the individual modules, see the corresponding sections in Components.

The configuration file profit.yaml (or a .py file containing Python dictionaries of the parameters) contains all parameters for running the simulation, fitting and active learning. Examples are shown below, together with a full list of available options and their default values, which are defined in the file profit/defaults.py. For the run system, the default values are documented in the individual classes; see the API reference.

Structure

The structure of the configuration reflects the different modes in which proFit can be executed; a skeleton layout is sketched after the following list.

  • Base config

    Declares general parameters, such as the number of training samples, external files to include, and the variables.

  • Run config

    Defines runner, interface and worker, as well as pre- and postprocessing steps.

  • Fit config

    Sets parameters for the surrogate model.

  • Active learning config

    Active learning has a separate configuration, since it can be extensive. It includes the choice of algorithm, acquisition function, number of warmup runs, etc.

  • UI config

    For now, it implements only one option, i.e. whether to directly show a figure after calling profit fit. It is planned to extend this section to set specific parameters inside the GUI.
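As a rough sketch (the values are illustrative, not defaults), the top-level sections line up as follows:

ntrain: 10                 # base config: general parameters
variables:                 # base config: variable definitions
    x: Uniform()
    f: Output
run:                       # run config: runner, interface, worker
    command: ./simulate.x
fit:                       # fit config: surrogate model parameters
    surrogate: GPy
active_learning:           # active learning config
    algorithm: simple
ui:                        # UI config
    plot: True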

Each config first sets its default values, which are then overwritten by the user inputs. Thereafter, inside the method process_entries, the user inputs are standardized (e.g. relative paths are converted to absolute paths, strings are converted to floats where possible, etc.).
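As an illustration (the exact normalizations depend on the respective config class, and the absolute path shown here is hypothetical):

# as written by the user in profit.yaml
fit:
    save: model.hdf5  # relative path
# after process_entries (illustrative result)
fit:
    save: /home/user/study/model.hdf5  # resolved against the base directory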

Some parameters are themselves sub-configurations, e.g. the runner (local or slurm) or the active learning algorithm (standard AL or MCMC). Each of these in turn has its own parameters; both a shorthand and an expanded form are accepted, as sketched below.
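The shorthand string selects a component with its default parameters, while the expanded mapping overrides them (compare the full list of options below):

run:
    runner: local  # shorthand: local runner with default parameters

run:
    runner:        # expanded sub-configuration
        class: local
        parallel: all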

The code structure is similar to the other modules of proFit: a hierarchical class structure, in which custom configurations can be registered (see also Custom extensions). If custom components are implemented but have no corresponding configuration, a DefaultConfig is used, which just returns the user parameters without modification.
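A minimal sketch of this mechanism, assuming a hypothetical file my_worker.py that registers a custom worker under the name my_worker:

include: [my_worker.py]    # load the custom component at startup
run:
    worker:
        class: my_worker   # no registered config: DefaultConfig passes the entries through
        some_option: 42    # hypothetical parameter handled by the custom worker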

Examples

Minimal configuration

Parameters that are mentioned in the following description but do not appear in the configuration file take their default values:

  • With the command profit run, this configuration executes 10 (ntrain) runs of a script (simulate.x) locally on all available CPUs, with one input variable (x) drawn from a uniform random distribution on the interval [0, 1] (for further information on variables, see Variables; for the run system, see The Run System).

  • The input file containing the variable x is found in the template directory.

  • The script writes its output in JSON format to stdout. After all runs are finished, the collected input and output data are saved to input.txt and output.txt, respectively.

  • Using the command profit fit, the default GPySurrogate is used to fit the data, with initial hyperparameters inferred directly from the data, and the model is saved to the file model_GPy.hdf5 (for further information on surrogate models, see Surrogate models).

  • Thereafter, the data and the fit can be viewed in a graphical user interface using profit ui (for more information on the UI, see User Interface).

ntrain: 10
variables:
    x: Uniform()
    f: Output
run:
    command: ./simulate.x

Run on cluster

An example for executing a simulation with GORILLA on a cluster. See Cluster Support for more details.

ntrain: 100
variables:
    # normalized collisionality
    nu_star: LogUniform(1e-3, 1e-1)
    # Mach number
    v_E: Normal(0, 2e-4)
    # Energy in eV
    E: 3000
    # particle species (1 = electrons, 2 = deuterium ions)
    species: 1
    # number of particles (for the Monte Carlo simulation)
    n_particles: 10000
    # mono energetic radial diffusion coefficient
    D11: Output
    D11_std: Output

run:
    runner:
        class: slurm
        openmp: true
        cpus: all
        options:
            job-name: profit-example
            partition: compute
            time: 24:00:00
    interface:
        class: zeromq
        port: 9100
    worker:
        class: command
        command: ./mono_energetic_transp_main.x
        pre:
            class: template
            path: ./template
            param_files: [mono_energetic_transp_coef.inp, gorilla.inp]
        post:
            class: numpytxt
            path: nustar_diffcoef_std.dat
            names: "IGNORE D11 D11_std"

Full list of options

All available options with their respective default values are listed below.

Base config

base_dir: Current working directory  # Directory where the `profit.yaml` file is located.
run_dir: Current working directory  # Directory where the individual runs are generated.
config_file: profit.yaml  # Name of this file.
include: []  # Paths to external files (e.g. custom components), which are loaded in the beginning.
files:
    input: input.txt  # Input variables of all runs.
    output: output.txt  # Collected output of all runs.
ntrain: 10  # Number of training runs.
variables: {}  # Definition of variables.

Run config

run:
    runner: fork  # Local runner with its default parameters (see below).
    interface: memmap  # Numpy memmap interface with its default parameters.
    worker: command  # Command worker with its default parameters.
    debug: false  # Override debug setting for Worker & Runner.
All runners
runner:
    debug: false
    parallel: 0  # maximum number of parallel Workers. 0 means no limit
    sleep: 0.1  # sleep time in s between polling
    logfile: runner.log
Fork runner
runner:
    class: fork  # For fast local execution
    parallel: all  # Number of CPUs used. 'all' infers the number of available CPUs
Local runner
runner:
    class: local  # For local execution.
    parallel: all  # Number of CPUs used. 'all' infers the number of available CPUs
    command: profit-worker  # override command to start the Worker
Slurm runner
runner:
    class: slurm  # For clusters with SLURM interface.
    path: slurm.bash  # Path to SLURM script which is generated.
    custom: False  # Use a custom script instead.
    openmp: False  # Insert OpenMP options in SLURM script.
    cpus: 1  # Number of CPUs to allocate per Worker
    options:  # SLURM options.
        job-name: profit
Memmap interface
interface:
    class: memmap  # Using a memory mapped array (with numpy memmap).
    path: interface.npy  # Path to interface file.
ZeroMQ interface
interface:
    class: zeromq  # Using a lightweight message queue (with ZeroMQ).
    transport: tcp  # ZeroMQ transport protocol
    port: 9000  # port of the Runner Interface
    timeout: 4  # connection timeout when waiting for an answer in seconds (Worker)
    retries: 3  # number of tries to establish a connection (Worker)
    retry_sleep: 1  # sleep time in seconds between each retry (Worker)
    address: ~  # override ip address or hostname of the Runner Interface (default: localhost, automatic with Slurm)
    connection: ~  # override for the ZeroMQ connection spec (Worker side)
    bind: ~  # override for the ZeroMQ bind spec (Runner side)
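For example, on a cluster where the compute nodes must reach the Runner by hostname rather than localhost, the address can be overridden (the hostname here is hypothetical):

interface:
    class: zeromq
    port: 9100
    address: login01  # hypothetical hostname of the node running the Runner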
Command Worker
worker:
    class: command
    command: ./simulation
    pre: template  # Preprocessor
    post: numpytxt  # Postprocessor
    stdout: stdout  # path to log of the simulation's stdout.
    stderr: ~  # path to log of the simulation's stderr. None means output as Worker stderr
    debug: false
    log_path: log
Template preprocessor
pre:
    class: template  # Variables are inserted into the template files.
    clean: true  # whether to clean the run directory after completion
    path: template  # Path to template directory
    param_files: None  # List of relevant files for variable replacement. None: Search all.
JSON postprocessor
post:
    class: json  # Reads output from a json formatted file.
    path: stdout  # Path to simulation output
Numpytxt postprocessor
post:
    class: numpytxt  # Reads output from a tabular text file (e.g. csv, tsv) with numpy genfromtxt.
    path: stdout  # Path to simulation output
    names: ~  # Collect only these variable names from output file.
    options:  # Options for numpy genfromtxt.
        deletechars: ""
HDF5 postprocessor
post:
    class: hdf5  # Reads output from an hdf5 file.
    path: output.hdf5  # Path to simulation output

Fit config

fit:
    surrogate: GPy  # Surrogate model used.
    save: ./model.hdf5  # Path where trained model is saved.
    load: False  # Path to existing model, which is loaded.
    fixed_sigma_n: False  # True constrains the data noise hyperparameter to its initial value.
    encoder:
        - class: Exclude  # Exclude constant variables from fit.
          variables: Constant
          parameters: {}
        - class: Log10  # Transform LogUniform variables logarithmically.
          variables: LogUniform
          parameters: {}
        - class: Normalization  # Normalize all input and output variables (zero mean, unit variance, n-dimensional 1-cube).
          variables: all
          parameters: {}
    kernel: RBF  # Kernel used for fitting. Sum (e.g. RBF+Matern32) and product kernels are also possible.
    hyperparameters:  # Initial hyperparameters of the surrogate model.
        length_scale: None  # None: Inferred from training data.
        sigma_f: None  # Scaling parameter of surrogate model.
        sigma_n: None  # Data noise (standard deviation).
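For instance, a fit section overriding a few of these defaults might look like the following sketch (the values are illustrative):

fit:
    surrogate: GPy
    kernel: RBF+Matern32  # sum kernel, as noted above
    hyperparameters:
        length_scale: 0.3  # illustrative initial value instead of inferring it from the data
        sigma_n: 0.01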

Active learning config

active_learning:
    algorithm: simple  # Algorithm to be used. Either SimpleAL or McmcAL.
    nwarmup: 3  # Number of warmup points.
    batch_size: 1  # Number of candidates which are learned in parallel.
    convergence_criterion: 1e-5  # Not yet implemented.
    nsearch: 50  # Number of candidate points per dimension.
    make_plot: False  # Plot each learning step.
    save_intermediate:  # Save model and data after each learning step.
        model_path: ./model.hdf5
        input_path: ./input.txt
        output_path: ./output.txt
    resume_from: None  # Number of the last run from which AL is resumed with saved model and data files.
Simple active learning
algorithm:
    class: simple  # Standard active learning algorithm.
    acquisition_function: simple_exploration  # Function to select next candidates.
    save: True  # Save active learning model after training.
MCMC
algorithm:
    class: mcmc  # MCMC model.
    reference_data: ./yref.txt  # Path to experimental data.
    warmup_cycles: 1  # Number of MCMC warmup cycles.
    target_acceptance_rate: 0.35  # Optimal acceptance rate to be reached after warmup.
    sigma_n: 0.05  # Estimated data noise (standard deviation).
    initial_points: None  # List of initial MCMC points.
    last_percent: 0.25  # Fraction of the main learning loop used to calculate posterior mean and standard deviation.
    save: ./mcmc_model.hdf5  # Path where MCMC model is saved.
    delayed_acceptance: False  # Use delayed acceptance with a surrogate model of the likelihood function.
Acquisition functions
Simple exploration
acquisition_function:
    class: simple_exploration  # Minimize variance.
    use_marginal_variance: False  # Add variance occurring through hyperparameter changes.
profit.al.acquisition_functions.SimpleExploration
Exploration with distance penalty
acquisition_function:
    class: exploration_with_distance_penalty  # Penalize nearby points.
    use_marginal_variance: False  # Add variance occurring through hyperparameter changes.
    weight: 10  # Exponential weight of penalization.
profit.al.acquisition_functions.ExplorationWithDistancePenalty
Weighted exploration
acquisition_function:
    class: weighted_exploration  # Trade-off between posterior surrogate mean maximization and variance minimization.
    use_marginal_variance: False  # Add variance occurring through hyperparameter changes.
    weight: 0.5  # Balance between mean and variance: weight * mean_part + (1 - weight) * variance_part
profit.al.acquisition_functions.WeightedExploration
Probability of improvement
acquisition_function:
    class: probability_of_improvement
profit.al.acquisition_functions.ProbabilityOfImprovement
Expected improvement
acquisition_function:
    class: expected_improvement
    exploration_factor: 0.01  # 0: only maximization of improvement; 1: emphasis on exploration.
    find_min: False  # Find the minimum of a function instead of the maximum.
profit.al.acquisition_functions.ExpectedImprovement
Expected improvement 2
acquisition_function:
    class: expected_improvement_2  # Same as Expected improvement, but with a different approximation for parallel AL.
    exploration_factor: 0.01  # 0: only maximization of improvement; 1: emphasis on exploration.
    find_min: False  # Find the minimum of a function instead of the maximum.
profit.al.acquisition_functions.ExpectedImprovement2
Alternating exploration
acquisition_function:
    class: alternating_exploration  # Alternating between simple exploration and expected improvement.
    use_marginal_variance: False  # Add variance occurring through hyperparameter changes.
    exploration_factor: 0.01  # 0: only maximization of improvement; 1: emphasis on exploration.
    find_min: False  # Find the minimum of a function instead of the maximum.
    alternating_freq: 1  # Frequency of learning loops to change between expected improvement and exploration.
profit.al.acquisition_functions.AlternatingExploration
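Putting these pieces together, a sketch of an active_learning section that overrides a few defaults (the values are illustrative):

active_learning:
    nwarmup: 5
    batch_size: 2
    algorithm:
        class: simple
        acquisition_function:
            class: expected_improvement
            exploration_factor: 0.05
            find_min: True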

UI config

ui:
    plot: False  # Directly show figure after executing `profit fit`. Only possible for <= 2D.