YAML Configuration Sub-Keys

This page provides detailed reference documentation for all configuration keys available in info.yaml files. For a tutorial-style introduction to creating analysis productions, see Getting Started.

Note

Keys marked with an asterisk (*) are optional and can be omitted.

input

bk_query

| Key | Type | Meaning |
| --- | --- | --- |
| bk_query | string | The bookkeeping location of the desired input data. |
| n_test_lfns* | int | The number of files to use as input to test jobs. Only to be used for samples with very few output candidates. |
| sample_fraction* | float | The fraction of input LFNs to sample for the production. For example, 0.1 samples 10% of the input files. |
| sample_seed* | string | The seed to use when sampling input LFNs, for example HelloWorld. |
| dq_flags* | sequence of strings | The data quality flags to accept. Any of BAD, OK, UNCHECKED or EXPRESS_OK (the latter only for Runs 1 & 2). Multiple flags can be used at once by writing them as a sequence of values. |
| extended_dq_ok* | sequence of strings | Require specific subsystem DQ flags in addition to the overall DQ. Runs missing these flags are excluded. Example: [VELO_OK, IT_OK]. |
| runs* | sequence of integers | A sequence of data-taking runs to use as input. This can be written either as a plain sequence or as A:B, where runs A to B inclusive are used. Cannot be used with start_run/end_run. |
| start_run* | int | Filter the BK query output such that runs before this run number are excluded. Use with end_run, not with runs. |
| end_run* | int | Filter the BK query output such that runs after this run number are excluded. Use with start_run, not with runs. |
| input_plugin* | string | Input plugin: default, or by-run for run-by-run job submission. Defaults to default. Most users should not change this. |
| keep_running* | bool | Whether to keep running on new data as it comes in. Defaults to true. |
| smog2_state* | string or sequence of strings | The gas injected in SMOG2. Possible choices: Hydrogen, Deuterium, Helium, Nitrogen, Oxygen, Neon, Argon, Krypton, Xenon. Each gas has two possible states, <Name> and <Name>Unstable; either or both may be given. The suffix Unstable indicates that the injected gas pressure is not stable. |

Here is a full example showing a bk_query input using most of the optional keys:

job_name:
  input:
    bk_query: /some/MagUp/bookkeeping/path.DST
    n_test_lfns: 3
    dq_flags:
      - BAD
      - OK
    runs:
      - 269370
      - 269371
      - 269372
      # equivalent to 269370:269372
    input_plugin: by-run
    keep_running: True
    smog2_state:
      - Argon
      - ArgonUnstable

# Alternative using start_run and end_run instead of runs
job_name_alt:
  input:
    bk_query: /some/MagUp/bookkeeping/path.DST
    start_run: 269370
    end_run: 269372
    # This is equivalent to runs: ["269370:269372"]
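
The sampling keys follow the same pattern. Here is a minimal sketch combining them with extended_dq_ok (the values are illustrative):

job_name_sampled:
  input:
    bk_query: /some/MagUp/bookkeeping/path.DST
    sample_fraction: 0.1    # use 10% of the input files
    sample_seed: HelloWorld # fixed seed so the sampling is reproducible
    extended_dq_ok:
      - VELO_OK
      - IT_OK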

job_name

| Key | Type | Meaning |
| --- | --- | --- |
| job_name | string | The name of the job whose output should be the input of this job. |
| filetype* | string | The file type of the input file, for when the input job has multiple output files. |
| sample_fraction* | float | The sampling fraction to use (0.0 to 1.0). |
| sample_seed* | string | The seed to use for reproducible sampling. |

Here is a full example showing two chained jobs, where the second takes its input from the first via job_name:

strip_job:
  input:
    bk_query: /some/MagUp/bookkeeping/path.DST
  options: strip.py

tuple_job:
  input:
    job_name: strip_job
    filetype: DST
  options: tuple.py

transform_ids

| Key | Type | Meaning |
| --- | --- | --- |
| transform_ids | sequence of integers | A sequence of transformation IDs to use as input file sources. |
| filetype | string | The file type of the input file, for when the input job has multiple output files. |
| n_test_lfns* | int | The number of files to use as input to test jobs. Only to be used for samples with very few output candidates. |
| dq_flags* | sequence of strings | The data quality flags to accept. Any of BAD, OK, UNCHECKED or EXPRESS_OK (the latter only for Runs 1 & 2). Multiple flags can be used at once by writing them as a sequence of values. |
| runs* | sequence of integers | A sequence of data-taking runs to use as input. This can be written either as a plain sequence or as A:B, where runs A to B inclusive are used. |
| keep_running* | bool | Whether to keep running on new data as it comes in. |
| sample_fraction* | float | The sampling fraction to use (0.0 to 1.0). |
| sample_seed* | string | The seed to use for reproducible sampling. |

Here is a full example showing a transform_ids input using several of the optional keys:

job_name:
  input:
    transform_ids:
      - 1234
      - 5678
    filetype: DST
    n_test_lfns: 3
    dq_flags:
      - BAD
      - OK
    runs:
      - 269370
      - 269371
      - 269372
      # equivalent to 269370:269372

tags

| Key | Type | Meaning |
| --- | --- | --- |
| wg | string | The working group that owns the sample data. |
| analysis | string | The name of the analysis to query samples for. |
| tags* | dict | Additional tags to filter sample data. |
| at_time* | datetime | The timestamp (UTC) at which to query the sample database. |
| n_test_lfns* | int | The number of files to use as input to test jobs. |
| keep_running* | bool | Whether to keep running on new data as it comes in. |
| sample_fraction* | float | The sampling fraction to use (0.0 to 1.0). |
| sample_seed* | string | The seed to use for reproducible sampling. |

Here is a full example showing a job with tags input:

job_name:
  input:
    wg: B2CC
    analysis: my_analysis
    tags:
      polarity: magup
      year: "2018"
      config: lhcb
    n_test_lfns: 2
    sample_fraction: 0.1
    sample_seed: analysis_2024
  ...
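
The at_time key can pin the query to a fixed point in time, which helps reproducibility. A minimal sketch, assuming the timestamp is accepted in ISO 8601 form (the exact accepted format is an assumption here):

job_name_pinned:
  input:
    wg: B2CC
    analysis: my_analysis
    at_time: "2024-06-01T12:00:00"  # assumed ISO 8601; queries the sample database as it was at this UTC time
  ...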

options

The options configuration defines how to execute the analysis job. There are two formats:

LbExec Options (Run 3+)

For Run 3 and later applications using lbexec:

| Key | Type | Meaning |
| --- | --- | --- |
| entrypoint | string | The entry point in the format 'module:function'. |
| extra_options* | dict | Additional YAML configuration options passed to the application. |
| extra_args* | list of strings | Additional command line arguments passed to the application. |

Example:

job_name:
  ...
  options:
    entrypoint: "my_production.script:my_job"
    extra_options:
      compression:
        optimise_baskets: false
    extra_args:
      - "do_this_thing"

Legacy Options (Run 1/2)

In general, for simple cases with just option files, you should use the shorthand:

job_name:
  options:
    - "data_options.py"
    - "reco_options.py"

For Run 1 and Run 2 applications using gaudirun.py with options files:

| Key | Type | Meaning |
| --- | --- | --- |
| files | list of strings | List of Python options files for the application. |
| command* | list of strings | Optional command used to invoke the application, e.g. gaudirun.py with extra flags. |

Example:

job_name:
  options:
    files:
      - "data_options.py"
      - "reco_options.py"
    command:
      - "gaudirun.py"
      - "-T"

Additional Job Configuration Keys

The following additional keys can be used to configure job behavior:

Auto-Configuration Fields

These fields are used when automatically_configure is enabled; in general you won't need to set them yourself. For Run 3 applications, set application-specific options with extra_options under options instead.

| Key | Type | Meaning |
| --- | --- | --- |
| simulation* | bool | Whether this job processes simulation (MC) data. Auto-detected if automatically_configure is enabled. |
| luminosity* | bool | Whether luminosity information should be included. Auto-detected if automatically_configure is enabled. |
| data_type* | string | The data-taking period/year (e.g. "2018", "2024", "Upgrade"). Auto-detected if automatically_configure is enabled. |
| input_type* | string | The type of the input files ("DST", "MDST", "RAW"). Auto-detected if automatically_configure is enabled. |
| dddb_tag* | string | The detector description database tag to use. |
| conddb_tag* | string | The conditions database tag to use. |

Production Metadata

| Key | Type | Meaning |
| --- | --- | --- |
| priority* | string | DIRAC request priority ("1a", "1b", "2a", "2b"). Default: "1b". |
| completion_percentage* | float | Target completion percentage for the job (10-100%). Default: 100.0. |
| comment* | string | Optional comment for the DIRAC production request. |
| tags* | dict | Additional metadata tags for the job. |

Examples:

my_job:
  # Auto-configure overrides (Run 2 only)
  simulation: true
  data_type: "2018"
  dddb_tag: "dddb-20170721-3"
  conddb_tag: "cond-20170724"

  # Production metadata
  comment: "High priority analysis job"
  tags:
    campaign: "2024_analysis"
    priority: "urgent"

Job Recipes

Recipes are predefined job configurations that can be used to simplify common analysis tasks. The recipe field allows you to specify a recipe that will automatically configure various job parameters.

Split Trees Recipe

The split-trees recipe is used to split ROOT files based on key patterns, allowing you to separate different decay channels or data types into separate output files.

| Key | Type | Meaning |
| --- | --- | --- |
| name | string | Must be "split-trees". |
| split | list of objects | List of splitting configurations, each with "key" and "into" fields. |

Each split configuration has:

| Key | Type | Meaning |
| --- | --- | --- |
| key | string | Regular expression pattern to match ROOT keys (see note below). |
| into | string | Output file type (uppercase, ending in .ROOT). |

Note

ROOT files contain directories (TDirectory) with trees inside them. Keys are used to match directories only and have the format DirectoryName; for example, for a TTree Tuple_Xi/DecayTree you should specify Tuple_Xi.

The key pattern is a regular expression matched against these keys. Common patterns:

  • Tuple_Xi – exact match for a specific tree

  • Tuple_(Bc|B) – match either a Tuple_Bc or Tuple_B tuple directory

  • Tuple_(Bc).*? – match any TDirectory key starting with Tuple_Bc (less specific)

Use rootls -l file.root to inspect the key structure of your ROOT files.
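
To sanity-check a pattern before submitting, you can try it against the key names locally. A quick illustrative snippet (it assumes anchored matching, which mirrors the "exact match" wording above, but the production machinery's exact matching semantics are not guaranteed to be identical):

import re

# Hypothetical directory names, as listed by `rootls -l file.root`
keys = ["Tuple_Xi", "Tuple_B", "Tuple_Bc", "Tuple_Bc2pmumu"]

for pattern in ["Tuple_Xi", "Tuple_(Bc|B)", "Tuple_(Bc).*?"]:
    matched = [k for k in keys if re.fullmatch(pattern, k)]
    print(pattern, "->", matched)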

Example:

split_job:
  wg: B2CC
  inform: [alice]
  input:
    bk_query: /some/path/data.ROOT
  recipe:
    name: "split-trees"
    split:
      - key: "Tuple_SpruceSLB_(Bc).*?/DecayTree"
        into: "BC.ROOT"
      - key: "Tuple_SpruceSLB_(Bu).*?/DecayTree"
        into: "BU.ROOT"

Filter Trees Recipe

The filter-trees recipe is used to apply filtering operations to ROOT trees.

| Key | Type | Meaning |
| --- | --- | --- |
| name | string | Must be "filter-trees". |
| entrypoint | string | The filtering entrypoint in the format "module:function". |
| extra_args* | list of strings | Additional arguments passed to the filtering function. |

Example:

filter_job:
  wg: Charm
  inform: [bob]
  input:
    bk_query: /some/path/raw_data.ROOT
  recipe:
    name: "filter-trees"
    entrypoint: "MyAnalysis.filter_script:run_preselection"
    extra_args:
      - "--source={{sample_type}}"
      - "--year={{year}}"

See Passing arguments to filtering scripts for details on how to use extra_args in your filtering function.
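
Purely as an illustration of consuming extra_args (the module, function name, and signature below are hypothetical; the signature the filter-trees machinery actually expects is described in the linked page), the strings could be parsed with argparse:

# MyAnalysis/filter_script.py -- hypothetical module for the
# entrypoint "MyAnalysis.filter_script:run_preselection" above.
import argparse

def run_preselection(*extra_args):
    parser = argparse.ArgumentParser()
    parser.add_argument("--source")  # filled from "--source={{sample_type}}"
    parser.add_argument("--year")    # filled from "--year={{year}}"
    args = parser.parse_args(list(extra_args))
    ...  # apply the preselection using args.source and args.year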

Expand BK Path Recipe

The expand recipe is used to expand a single job definition into multiple jobs by substituting variables in the bookkeeping path.

| Key | Type | Meaning |
| --- | --- | --- |
| name | string | Must be "expand". |
| path | string | BK path template with format string placeholders. |
| substitute | dict | Variables to substitute in the path. Values can be strings or lists. |

Example:

template_job:
  wg: B2OC
  inform: [charlie]
  recipe:
    name: "expand"
    path: "/LHCb/Collision24/Beam6800GeV-VeloClosed-{polarity}/Real Data/Sprucing{sprucing}/{stream}/CHARM.DST"
    substitute:
      polarity: ["MagUp", "MagDown"]
      sprucing: ["24c3", "24c2"]
      stream: "94000000"
  options: "charm_analysis.py"
  output: "CHARM.ROOT"

This will generate 4 separate jobs (one for each combination of polarity and sprucing) with appropriate BK paths.
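
The expansion behaves like a Cartesian product over the list-valued substitutions. The following standalone Python sketch reproduces the four paths generated above (an illustration of the semantics, not the production code):

from itertools import product

path = ("/LHCb/Collision24/Beam6800GeV-VeloClosed-{polarity}"
        "/Real Data/Sprucing{sprucing}/{stream}/CHARM.DST")
substitute = {
    "polarity": ["MagUp", "MagDown"],
    "sprucing": ["24c3", "24c2"],
    "stream": "94000000",
}

# Treat bare strings as single-element lists, then take the product.
values = {k: v if isinstance(v, list) else [v] for k, v in substitute.items()}
for combo in product(*values.values()):
    print(path.format(**dict(zip(values.keys(), combo))))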