YAML Configuration Sub-Keys#

Note

Keys marked with an asterisk are optional and can be omitted.

input#

bk_query#

bk_query (string)
  The bookkeeping location of the desired input data.

n_test_lfns* (int)
  The number of files to use as input to test jobs. Only to be used for samples with very few output candidates.

sample_fraction* (float)
  The sampling fraction to use when sampling the input LFNs for the production. For example, 0.1 will sample 10% of the input files.

sample_seed* (string)
  The seed to use when sampling input LFNs, for example HelloWorld.

dq_flags* (sequence of strings)
  What quality of data to use. This can be set to any of BAD, OK, UNCHECKED or EXPRESS_OK (only for Runs 1 & 2). Multiple flags can be given by writing them as a sequence of values.

extended_dq_ok* (sequence of strings)
  In addition to requiring data quality (DQ) OK, extended DQ flags can be required so that runs without the specified subsystem DQ OK flag set are excluded.

runs* (sequence of integers or "A:B" strings)
  A sequence of data-taking runs to use as input. This can be written either as a plain sequence or as A:B, where runs from A to B inclusive are used. Cannot be combined with start_run/end_run.

start_run* (int)
  Filter the BK query output so that runs before this run number are excluded. Use together with end_run, not with runs.

end_run* (int)
  Filter the BK query output so that runs after this run number are excluded. Use together with start_run, not with runs.

input_plugin* (string)
  The input plugin setting, either default or by-run. Defaults to default.

keep_running* (bool)
  Whether to keep running on new data as it arrives. Defaults to true.

smog2_state* (sequence of strings)
  The gas injected in SMOG2. Possible choices are Hydrogen, Deuterium, Helium, Nitrogen, Oxygen, Neon, Argon, Krypton and Xenon. Each gas has two possible states, <Name> and <Name>Unstable (either or both may be given); the Unstable suffix indicates that the injected gas pressure was not stable.

Here is a full example showing a bk_query input using several of the optional keys:

job_name:
  input:
    bk_query: /some/MagUp/bookkeeping/path.DST
    n_test_lfns: 3
    dq_flags:
      - BAD
      - OK
    runs:
      - 269370
      - 269371
      - 269372
      # equivalent to 269370:269372
    input_plugin: by-run
    keep_running: True
    smog2_state:
      - Argon
      - ArgonUnstable

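The sampling and extended-DQ keys can be combined with the same query. The values below are illustrative (in particular, the subsystem names under extended_dq_ok are invented examples):

```yaml
job_name_sampled:
  input:
    bk_query: /some/MagUp/bookkeeping/path.DST
    dq_flags:
      - OK
    extended_dq_ok:      # hypothetical subsystem flags
      - VELO
      - RICH1
    sample_fraction: 0.1        # keep 10% of the input files
    sample_seed: HelloWorld     # makes the 10% reproducible
```
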
# Alternative using start_run and end_run instead of runs
job_name_alt:
  input:
    bk_query: /some/MagUp/bookkeeping/path.DST
    start_run: 269370
    end_run: 269372
    # This is equivalent to runs: ["269370:269372"]

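The effect of sample_fraction and sample_seed can be pictured as a seeded random draw over the input LFNs. This is an illustrative sketch only (the file names and sampling routine are invented, not the production implementation):

```python
import random

# 100 hypothetical input LFNs
lfns = [f"/lhcb/data/file_{i:04d}.dst" for i in range(100)]

def sample_lfns(lfns, fraction, seed):
    """Deterministically pick a fraction of the LFNs using a string seed."""
    rng = random.Random(seed)  # same seed -> same selection every time
    k = round(len(lfns) * fraction)
    return rng.sample(lfns, k)

subset = sample_lfns(lfns, 0.1, "HelloWorld")
print(len(subset))  # 10% of 100 files -> 10
```

Because the seed is fixed, re-running the production selects the same subset of files.
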
job_name#

job_name (string)
  The name of the job whose output should be the input of this job.

filetype* (string)
  The file type of the input file, for when the input job has multiple output files.

sample_fraction* (float)
  The sampling fraction to use (0.0 to 1.0).

sample_seed* (string)
  The seed to use for reproducible sampling.

Here is a full example showing a job_name input with the optional filetype key:

strip_job:
  input:
    bk_query: /some/MagUp/bookkeeping/path.DST
  options: strip.py

tuple_job:
  input:
    job_name: strip_job
    filetype: DST
  options: tuple.py

transform_ids#

transform_ids (sequence of integers)
  A sequence of transformation IDs to use as input file sources.

filetype (string)
  The file type of the input file, for when your input job has multiple output files.

n_test_lfns* (int)
  The number of files to use as input to test jobs. Only to be used for samples with very few output candidates.

dq_flags* (sequence of strings)
  What quality of data to use. This can be set to any of BAD, OK, UNCHECKED or EXPRESS_OK (only for Runs 1 & 2). Multiple flags can be given by writing them as a sequence of values.

runs* (sequence of integers or "A:B" strings)
  A sequence of data-taking runs to use as input. This can be written either as a plain sequence or as A:B, where runs from A to B inclusive are used.

keep_running* (bool)
  Whether to keep running on new data as it arrives.

sample_fraction* (float)
  The sampling fraction to use (0.0 to 1.0).

sample_seed* (string)
  The seed to use for reproducible sampling.

Here is a full example showing a transform_ids input using several of the optional keys:

job_name:
  input:
    transform_ids:
      - 1234
      - 5678
    filetype: DST
    n_test_lfns: 3
    dq_flags:
      - BAD
      - OK
    runs:
      - 269370
      - 269371
      - 269372
      # equivalent to 269370:269372

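The A:B range form used by the runs key is inclusive at both ends. In Python terms (a sketch of the semantics, not the production parser):

```python
def expand_run_range(spec):
    """Expand an "A:B" string into the inclusive list of run numbers A..B."""
    if isinstance(spec, str) and ":" in spec:
        first, last = map(int, spec.split(":"))
        return list(range(first, last + 1))  # +1 so B itself is included
    return [int(spec)]

print(expand_run_range("269370:269372"))  # [269370, 269371, 269372]
```
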
tags#

wg (string)
  The working group that owns the sample data.

analysis (string)
  The name of the analysis to query samples for.

tags* (dict)
  Additional tags to filter sample data.

at_time* (datetime)
  The timestamp (UTC) at which to query the sample database.

n_test_lfns* (int)
  The number of files to use as input to test jobs.

keep_running* (bool)
  Whether to keep running on new data as it arrives.

sample_fraction* (float)
  The sampling fraction to use (0.0 to 1.0).

sample_seed* (string)
  The seed to use for reproducible sampling.

Here is a full example showing a job with tags input:

job_name:
  input:
    wg: B2CC
    analysis: my_analysis
    tags:
      polarity: magup
      year: "2018"
      config: lhcb
    n_test_lfns: 2
    sample_fraction: 0.1
    sample_seed: analysis_2024
  ...

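The at_time key is not shown above. Assuming a UTC ISO 8601 timestamp is accepted (the exact accepted format is not specified here), pinning the query to a point in time might look like:

```yaml
job_name:
  input:
    wg: B2CC
    analysis: my_analysis
    at_time: "2024-06-01T12:00:00"  # illustrative UTC timestamp
```
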
options#

The options configuration defines how to execute the analysis job. There are two formats:

LbExec Options (Run 3+)#

For Run 3 and later applications using lbexec:

Example:

job_name:
  ...
  options:
    entrypoint: "my_production.script:my_job"
    extra_options:
      compression:
        optimise_baskets: false
    extra_args:
      - "do_this_thing"

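The entrypoint string follows the common Python module:function convention. As an illustration of how such a string resolves to a callable (a sketch using a stdlib function as a stand-in; my_production.script:my_job above is just an example name):

```python
from importlib import import_module

def resolve_entrypoint(spec):
    """Resolve a "module:function" string to the named callable."""
    module_name, func_name = spec.split(":")
    return getattr(import_module(module_name), func_name)

# Using a stdlib function as a stand-in target:
fn = resolve_entrypoint("os.path:basename")
print(fn("/some/path/data.ROOT"))  # data.ROOT
```
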
Legacy Options (Run 1/2)#

In general, for simple cases with just option files, you should use the shorthand:

job_name:
  options:
    - "data_options.py"
    - "reco_options.py"

For Run 1 and Run 2 applications using gaudirun.py with options files:

files (list of strings)
  List of Python options files for the application.

command* (list of strings)
  The command to invoke when running the application, for example gaudirun.py with extra flags.

Example:

job_name:
  options:
    files:
      - "data_options.py"
      - "reco_options.py"
    command:
      - "gaudirun.py"
      - "-T"

Additional Job Configuration Keys#

The following additional keys can be used to configure job behavior:

Auto-Configuration Fields#

These fields are used when automatically_configure is enabled; in general you will not need to set them yourself. For Run 3 applications, set application-specific options via extra_options under options instead.

Production Metadata#

Examples:

my_job:
  # Auto-configure overrides (run 2 only)
  simulation: true
  data_type: "2018"
  dddb_tag: "dddb-20170721-3"
  conddb_tag: "cond-20170724"

  # Production metadata
  comment: "High priority analysis job"
  tags:
    campaign: "2024_analysis"
    priority: "urgent"

Job Recipes#

Recipes are predefined job configurations that can be used to simplify common analysis tasks. The recipe field allows you to specify a recipe that will automatically configure various job parameters.

Split Trees Recipe#

The split-trees recipe is used to split ROOT files based on key patterns, allowing you to separate different decay channels or data types into separate output files.

name (string)
  Must be "split-trees".

split (list of objects)
  List of splitting configurations, each with "key" and "into" fields.

Each split configuration pairs a key pattern, matched against object names in the input file, with an into output filename for the matching objects.

Example:

split_job:
  wg: B2CC
  inform: [alice]
  input:
    bk_query: /some/path/data.ROOT
  recipe:
    name: "split-trees"
    split:
      - key: "Tuple_SpruceSLB_(Bc).*?/DecayTree"
        into: "BC.ROOT"
      - key: "Tuple_SpruceSLB_(Bu).*?/DecayTree"
        into: "BU.ROOT"

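The key values are regular expressions matched against object paths inside the ROOT file. A sketch of the matching logic (the tree names are invented, and whether the tools anchor with match or fullmatch is an assumption):

```python
import re

# Hypothetical object paths inside the input ROOT file
keys = [
    "Tuple_SpruceSLB_Bc2JpsiMuNu/DecayTree",
    "Tuple_SpruceSLB_Bu2JpsiK/DecayTree",
]

pattern = re.compile(r"Tuple_SpruceSLB_(Bc).*?/DecayTree")
bc_trees = [k for k in keys if pattern.match(k)]
print(bc_trees)  # only the Bc tuple matches, so it would land in BC.ROOT
```
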
Filter Trees Recipe#

The filter-trees recipe is used to apply filtering operations to ROOT trees.

name (string)
  Must be "filter-trees".

entrypoint (string)
  The filtering entrypoint in "module:function" format.

Example:

filter_job:
  wg: Charm
  inform: [bob]
  input:
    bk_query: /some/path/raw_data.ROOT
  recipe:
    name: "filter-trees"
    entrypoint: "MyAnalysis.filter_script:run_preselection"

Expand BK Path Recipe#

The expand recipe is used to expand a single job definition into multiple jobs by substituting variables in the bookkeeping path.

name (string)
  Must be "expand".

path (string)
  BK path template with format-string placeholders.

substitute (dict)
  Variables to substitute in the path. Values can be strings or lists.

Example:

template_job:
  wg: B2OC
  inform: [charlie]
  recipe:
    name: "expand"
    path: "/LHCb/Collision24/Beam6800GeV-VeloClosed-{polarity}/Real Data/Sprucing{sprucing}/{stream}/CHARM.DST"
    substitute:
      polarity: ["MagUp", "MagDown"]
      sprucing: ["24c3", "24c2"]
      stream: "94000000"
  options: "charm_analysis.py"
  output: "CHARM.ROOT"

This will generate 4 separate jobs (one for each combination of polarity and sprucing) with appropriate BK paths.
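
The substitution behaves like Python string formatting over the cross product of the list values. This is a sketch of the semantics, not the production code:

```python
from itertools import product

path = ("/LHCb/Collision24/Beam6800GeV-VeloClosed-{polarity}"
        "/Real Data/Sprucing{sprucing}/{stream}/CHARM.DST")
substitute = {
    "polarity": ["MagUp", "MagDown"],
    "sprucing": ["24c3", "24c2"],
    "stream": "94000000",  # scalar values are reused for every job
}

# Normalise scalars to one-element lists, then take the cross product
lists = {k: v if isinstance(v, list) else [v] for k, v in substitute.items()}
paths = [
    path.format(**dict(zip(lists, combo)))
    for combo in product(*lists.values())
]
print(len(paths))  # 2 polarities x 2 sprucings x 1 stream = 4 jobs
```
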