Getting Started

This guide gives a rough overview of a full proFit workflow. Further configuration examples can be found in examples/ and in tests/integration_tests/. If you only want to use a few features, have a look at examples/api/, the tests and the API reference.

Figure: The typical proFit workflow (_images/profit_workflow.png)

The Configuration File

proFit is controlled almost entirely via a single configuration file which is usually called profit.yaml. Other filenames are permitted and the respective path can be given as an argument to the profit commands. The configuration is typically written using the YAML file format.

The configuration mostly uses a hierarchical structure which closely mirrors the structure of the different components. Different implementations of the same component are usually selected using the class attribute. A shorter notation uses the label of the component directly and uses default values for all its parameters. The following two snippets are therefore equivalent:

run:
  interface:
    class: zeromq

run:
  interface: zeromq

There are also shorthand notations for Variables and Encoders.
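
The equivalence of the long and short notation for the interface can be pictured with a small sketch. The helper expand_shorthand is hypothetical and is not part of proFit. It only illustrates the idea that a bare label stands for a component with that class and default parameters. How proFit resolves the shorthand internally is not shown here.

```python
def expand_shorthand(value):
    """Return the long form of a component entry (hypothetical helper).

    A bare string such as "zeromq" becomes {"class": "zeromq"};
    a mapping is assumed to be in long form already and is copied unchanged.
    """
    if isinstance(value, str):
        return {"class": value}
    return dict(value)


# The two configuration snippets from above, written as Python dictionaries.
long_form = {"run": {"interface": {"class": "zeromq"}}}
short_form = {"run": {"interface": "zeromq"}}

expanded_short = expand_shorthand(short_form["run"]["interface"])
expanded_long = expand_shorthand(long_form["run"]["interface"])
```

Both expanded_short and expanded_long end up as the same dictionary with a single class entry.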

Set up simulation

proFit currently distinguishes two types of simulations:

  • executables with input and output files
    the simulation reads its input parameters from a file, which proFit prepares using a Preprocessor
    the simulation writes its output values to a file, which proFit reads using a Postprocessor
  • Python simulations which are called as a function
    the input parameters are the arguments of the function
    the output parameters are the return values
    proFit can wrap this function using a custom Worker

For executable simulations, the first step is usually to ensure that they are installed properly and run within a dedicated working directory. Find out which environment variables need to be set and link all relevant files. This directory will become the template directory, which proFit will (usually) copy for each run of the simulation. The simulation simply inherits the environment in which proFit was started. A typical directory structure could look like this:

study/
  profit.yaml
  template/
    simulation.x  ->  /path/to/simulation/executable.x
    params.txt

Function simulations can be run without any run directories.
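
For executable simulations, the per-run copy of the template directory can be pictured with the following sketch. It is not proFit's implementation. The function prepare_run_dir and the run_000 naming scheme are assumptions made for illustration, and the example works inside a temporary directory so that it leaves no files behind.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_run_dir(template: Path, study: Path, run_id: int) -> Path:
    """Copy the template directory into a fresh per-run directory.

    The directory name "run_<id>" is a hypothetical convention.
    symlinks=True keeps links (such as a linked executable) as links.
    """
    run_dir = study / f"run_{run_id:03d}"
    shutil.copytree(template, run_dir, symlinks=True)
    return run_dir


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout from above: study/template/params.txt
    study = Path(tmp) / "study"
    template = study / "template"
    template.mkdir(parents=True)
    (template / "params.txt").write_text("{u}, {v}, {n}, 10\n")

    # One independent copy of the template per simulation run.
    run_dirs = [prepare_run_dir(template, study, i) for i in range(3)]
    copied = sorted(p.name for p in study.iterdir())
    all_have_params = all((d / "params.txt").is_file() for d in run_dirs)
```

Each run then works on its own copy of params.txt, so runs cannot overwrite each other's input or output files.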

Variables

The configuration file usually begins with ntrain, which sets the desired number of data points. The next section is usually variables, in which the different input parameters and output values are defined. For each input parameter, a suitable distribution is selected from which the samples will be drawn. Additionally, independent variables allow the output values to be vectors, and constant values can also be set directly from proFit. More details can be found in Variables.

ntrain: 100
variables:
  u: Uniform(4.7, 5.3)
  v: Uniform(0.55, 0.6)
  n: 10000
  f: Output
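
To make the meaning of this configuration concrete, the sketch below draws ntrain input samples with the same bounds using only the Python standard library. It is an illustration under assumptions, not proFit code: the function sample_inputs is hypothetical, and proFit's own sampling may use a different random number generator or sampling strategy. The output f is absent because it is only known after the simulation has run.

```python
import random


def sample_inputs(ntrain, seed=0):
    """Draw ntrain input dictionaries (hypothetical helper).

    u and v are drawn uniformly within the configured bounds,
    n is a constant and therefore identical in every sample.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    return [
        {"u": rng.uniform(4.7, 5.3), "v": rng.uniform(0.55, 0.6), "n": 10000}
        for _ in range(ntrain)
    ]


samples = sample_inputs(100)
```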

Pre- & Postprocessor

An executable simulation needs a Preprocessor and Postprocessor to prepare the input parameters and collect the results.

The recommended Preprocessor is the profit.run.default.TemplatePreprocessor which will fill placeholders in the template directory with the corresponding values.

For the output values proFit currently supports JSON, HDF5 and CSV/TSV via three different Postprocessors. In addition it is easy to add a custom Postprocessor, see Custom extensions. All the configuration options are given in Configuration. A possible configuration is given below:

run:
  worker:
    class: command
    command: ./simulation
    pre:
      class: template
      path: ./path/to/template_directory  # relative to the base directory
      param_files:
        - params.txt
    post: json

The contents of params.txt could look like this:

# just a plain csv
# u, v, n, m
{u}, {v}, {n}, 10
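
What filling the placeholders means for this file can be sketched with Python's built-in str.format. This is a stand-in chosen for illustration. Whether the TemplatePreprocessor uses this exact mechanism is an assumption, and the helper fill_template as well as the parameter values are hypothetical.

```python
# The template file from above as a string.
template_text = "# just a plain csv\n# u, v, n, m\n{u}, {v}, {n}, 10\n"


def fill_template(text, params):
    """Replace each {name} placeholder with the matching parameter value."""
    return text.format(**params)


# Example values for a single run (made up for this sketch).
filled = fill_template(template_text, {"u": 5.0, "v": 0.57, "n": 10000})
```

After filling, the last line of the file reads 5.0, 0.57, 10000, 10, which is what the simulation would then parse. Note that with str.format any literal braces in a template would have to be escaped as {{ and }}.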

Python Simulation

For a simulation which can be called from Python directly, the recommended configuration is different and uses Custom extensions instead. A Python function which takes the input parameters as arguments and returns the output values can be registered with proFit in the following way:

from profit.run import Worker

@Worker.wrap("my_name")
def simulation(u, v) -> "f":
    ...

The return type annotation tells proFit which return value belongs to which output variable if there are several. The configuration is then:

include: path_to_simulation.py
run:
  worker: my_name
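
The role of the return annotation can be seen in plain Python: a string annotation on the return value is stored on the function and can be read back by name. The sketch below leaves out the Worker.wrap decorator so that it runs without proFit installed. The function body is a made-up placeholder model, and how proFit reads the annotation internally is not shown.

```python
def simulation(u, v) -> "f":
    # Placeholder model, chosen only for illustration.
    return u * v


# Python stores the return annotation under the key "return".
output_name = simulation.__annotations__["return"]

# Pair the returned value with the annotated output name.
result = {output_name: simulation(5.0, 0.5)}
```

Here output_name is the string "f", matching the output variable declared in the variables section.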

Interface & Runner

The other two main components of the run system are the Interface and the Runner itself. These play a vital role if the simulation should be scheduled on a cluster, rather than run locally. For more information see Cluster Support.

Next steps

Everything should be ready to run now:

  • calling profit run will start ntrain simulations with different parameters and collect their results

  • calling profit fit will then use these results to fit a surrogate model which is configured in the fit section (see Surrogate models)

  • with Active Learning enabled, the fit will already happen during the run step. Active Learning optimizes the parameters at which the simulation is run to gain as much information per simulation run as possible

  • finally the results can be explored interactively in a browser after starting a plotly/dash server using profit ui (see User Interface)

There is a wide variety of configuration options to customize the run system, the surrogate fitting and the active learning algorithm. Please have a look at the Configuration documentation and don’t hesitate to contact the developers if you encounter any bugs.