Advanced Usage

The Expression Parser

A user can define new observables for their analyses. These new observables must be arithmetic expressions of already defined observables and/or parameters. Observables defined in this way benefit from the same optimizations as all built-in observables, including multi-threaded evaluation and caching. This section describes the syntax of the expression parser, which interprets user-defined Python strings as new EOS observables, and illustrates it with examples.

The Construction of Expressions

The Rules

The following basic rules are used to parse the input string:

  • Spaces are ignored and can be added arbitrarily for readability.

  • The parser supports the usual arithmetic operations +, -, *, / and ^ as well as parenthesized expressions; the usual arithmetic precedence rules apply.

  • <<...>> encapsulates the name of an EOS object, which must be the name of either a parameter or an observable. Any such name must therefore adhere to the restrictions of eos.QualifiedName, for example <<mass::mu>> or <<B_u->lnu::BR>>.

The following strings are valid observable expressions:

  • "(<<mass::B_d>>^2 - 4 * <<mass::mu>>^2) ^ 0.5"

  • "1.0 / <<B_u->lnu::BR;l=mu>>"
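
The <<...>> convention can be illustrated without EOS itself. The following is a minimal sketch that only extracts the names referenced in an expression string; the helper referenced_names and its regular expression are hypothetical and not part of EOS, and the sketch does not validate names against eos.QualifiedName.

```python
import re

# Hypothetical helper, not part of EOS: it only locates the <<...>> markers.
# The non-greedy match stops at the first '>>', so a single '>' inside a
# name (as in 'B_u->lnu') does not end the match early.
NAME_PATTERN = re.compile(r"<<(.+?)>>")


def referenced_names(expression: str) -> list[str]:
    """Return the EOS object names referenced in an expression, in order."""
    return NAME_PATTERN.findall(expression)


expressions = [
    "(<<mass::B_d>>^2 - 4 * <<mass::mu>>^2) ^ 0.5",
    "1.0 / <<B->pilnu::dBR/dq2>>",
]
for expression in expressions:
    print(expression, "references", referenced_names(expression))
```

Listing the referenced names in this way can help to spot typos before an expression is handed to the parser.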

Aliasing

By default, the kinematic arguments of the observables are transferred to the expression. For example, the expression 1.0 / <<B->pilnu::dBR/dq2>> will expect a kinematic specification for q2 (either through an eos.Kinematics object or indirectly in a plotting routine). When more than one observable appears in the expression, it is useful to rename the kinematic variables. This can be done via an alias specification <<...>>[...]. Two types of specification are supported:

  • the = operator fixes a kinematic variable to a given value. E.g. <<B->pilnu::BR>>[q2_min=0.1] only expects a specification for q2_max.

  • the => operator renames the kinematic variable on its left-hand side to the name on the right-hand side. E.g. <<B_u->lnu::BR;l=mu>>[q2=>q2_mu] / <<B_u->lnu::BR;l=e>>[q2=>q2_e] requires two kinematic specifications, q2_mu and q2_e.

Note that these specifications can be combined in a comma-separated list, for example [q2_min=1.0, q2_max=>q2_mu].
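
The effect of an alias specification on the kinematic variables that remain to be specified can be sketched in plain Python. The functions parse_alias_spec and required_kinematics below are hypothetical illustrations of the two operators and are not the EOS implementation; in particular, the sketch assumes that fixed values are plain numbers.

```python
def parse_alias_spec(spec: str) -> tuple[dict[str, float], dict[str, str]]:
    """Split a specification such as '[q2_min=1.0, q2_max=>q2_mu]' into
    fixed values (the = operator) and renamed variables (the => operator)."""
    fixed: dict[str, float] = {}
    renamed: dict[str, str] = {}
    body = spec.strip().removeprefix("[").removesuffix("]")
    for item in body.split(","):
        item = item.replace(" ", "")  # spaces carry no meaning
        if not item:
            continue
        if "=>" in item:  # test '=>' first, since it also contains '='
            old, new = item.split("=>", 1)
            renamed[old] = new
        elif "=" in item:
            name, value = item.split("=", 1)
            fixed[name] = float(value)
        else:
            raise ValueError(f"unrecognized alias item: {item!r}")
    return fixed, renamed


def required_kinematics(variables: list[str], spec: str) -> list[str]:
    """Names that still need a kinematic specification after aliasing."""
    fixed, renamed = parse_alias_spec(spec)
    return [renamed.get(name, name) for name in variables if name not in fixed]


# An observable with variables q2_min and q2_max: q2_min is fixed and
# q2_max is renamed, so only q2_mu remains to be specified.
print(required_kinematics(["q2_min", "q2_max"], "[q2_min=1.0, q2_max=>q2_mu]"))
```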

The Insert Method

Once a new observable is defined via its expression string, it can be added to the list of observables via the insert method:

eos.Observables().insert(name, latex, unit, options, expression)

where name, latex and unit are the qualified name, the LaTeX representation and the unit of the new observable; options takes an eos.Options object and specifies global options (i.e. options applied to all observables in the expression); and expression is the expression string to be parsed.

We conclude with a concrete example:

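import eos

# Register the ratio of the muon and electron branching ratios as a new observable.
eos.Observables().insert('B->Kll::R_K_example', R'(R_K)', eos.Unit.Unity(), eos.Options(),
                         '( <<B->Kll::BR;l=mu>>[q2_max=6, q2_min=>q2_mu_min] / <<B->Kll::BR;l=e>>[q2_max=6,q2_min=>q2_e_min] )')

# q2_max is fixed inside the expression, so only the two renamed lower bounds are specified here.
R_K = eos.Observable.make('B->Kll::R_K_example', eos.Parameters.Defaults(), eos.Kinematics(q2_e_min=1.1, q2_mu_min=1.1), eos.Options(**{'tag':'BFS2004'}))

R_K.evaluate()   # should be ~1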

The EOS Command-Line Interface

Although using EOS within an interactive Jupyter notebook on your personal computer or laptop is useful for prototyping an analysis, this approach sometimes suffers from limited computing power. To circumvent this problem, you can alternatively:

  • use EOS in Jupyter interactively on a remote workstation computer via an SSH tunnel (see the FAQ);

  • use EOS on remote workstations or compute clusters via the command-line interface.

In the following we document the command-line interface and the file format used in conjunction with it.

Note

The EOS command-line interface is completely optional and does not provide any functionality beyond that of the interactive Python interface.

The Analysis Description Format

EOS uses a YAML file to describe the individual steps of one or more statistical analyses. At the top level, the format includes the following YAML keys:

  • priors (mandatory) — The list of priors within the analysis.

  • likelihoods (mandatory) — The list of likelihoods within the analysis.

  • posteriors (mandatory) — The list of posteriors within the analysis.

  • predictions (optional) — The list of theory predictions within the analysis.

Describing Priors

The priors key contains a list of named priors. Each prior has two mandatory keys:

  • name (mandatory) — The unique name of this prior.

  • parameters (mandatory) — The ordered list of parameters described by this prior.

The description of each individual parameter follows the prior description used in the Analysis constructor.

Describing Likelihoods

The likelihoods key contains a list of named likelihoods. Each likelihood has two mandatory keys:

  • name (mandatory) — The unique name of this likelihood.

  • constraints (mandatory) — The ordered list of EOS constraint names that comprise this likelihood.

Describing Posteriors

The posteriors key contains a list of named posteriors. Each posterior can contain several keys:

  • name (mandatory) — The unique name of this posterior.

  • global_options (optional) — A key/value map providing global options, i.e., options that apply to all observables used by this posterior.

  • prior (mandatory) — The ordered list of named priors that are used as part of this posterior.

  • likelihood (optional) — The ordered list of named likelihoods that are used as part of this posterior.

  • fixed_parameter (optional) — A key/value map providing values for parameters that deviate from the default values.
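
The mandatory keys listed above can be checked before an analysis is run. The following is a minimal sketch that operates on an analysis description already loaded into a Python dictionary (for instance with a YAML library); the functions missing_keys and undefined_references are hypothetical and not part of EOS, and the sketch only checks the keys documented in this section.

```python
# Mandatory keys as documented above; the table itself is part of the sketch.
MANDATORY_ENTRY_KEYS = {
    "priors": ("name", "parameters"),
    "likelihoods": ("name", "constraints"),
    "posteriors": ("name", "prior"),
}


def missing_keys(analysis: dict) -> list[str]:
    """List the mandatory sections and entry keys that are absent."""
    problems = []
    for section, keys in MANDATORY_ENTRY_KEYS.items():
        if section not in analysis:
            problems.append(section)
            continue
        for index, entry in enumerate(analysis[section]):
            for key in keys:
                if key not in entry:
                    problems.append(f"{section}[{index}].{key}")
    return problems


def undefined_references(analysis: dict) -> list[str]:
    """List priors and likelihoods that a posterior uses but nobody defines."""
    known = {
        "prior": {p.get("name") for p in analysis.get("priors", [])},
        "likelihood": {l.get("name") for l in analysis.get("likelihoods", [])},
    }
    problems = []
    for posterior in analysis.get("posteriors", []):
        for key, names in known.items():
            for reference in posterior.get(key, []):
                if reference not in names:
                    problems.append(f"{posterior.get('name')}: unknown {key} {reference!r}")
    return problems


analysis = {
    "priors": [{"name": "CKM", "parameters": [
        {"parameter": "CKM::abs(V_ub)", "min": 2.0e-3, "max": 5.0e-3, "type": "uniform"},
    ]}],
    "likelihoods": [{"name": "BaBar", "constraints": ["B^0->pi^+lnu::BR@BaBar:2010B"]}],
    "posteriors": [{"name": "th+exp", "prior": ["CKM"], "likelihood": ["BaBar"]}],
}
print(missing_keys(analysis), undefined_references(analysis))
```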

Example

priors:
  - name: CKM
    parameters:
      - parameter: CKM::abs(V_ub)
        min: 2.0e-3
        max: 5.0e-3
        type: uniform

  - name: FF-BCL2008
    parameters:
      - parameter: B->pi::f_+(0)@BCL2008
        min: 0.2
        max: 0.4
        type: uniform
      - parameter: B->pi::b_+^1@BCL2008
        min: -20.0
        max: +20.0
        type: uniform
      - parameter: B->pi::b_+^2@BCL2008
        min: -20.0
        max: +20.0
        type: uniform

likelihoods:
  - name: theory
    constraints:
      - B->pi::f_+@IKMvD:2014A

  - name: BaBar
    constraints:
      - B^0->pi^+lnu::BR@BaBar:2010B
      - B^0->pi^+lnu::BR@BaBar:2012D

  - name: Belle
    constraints:
      - B^0->pi^+lnu::BR@Belle:2010A
      - B^0->pi^+lnu::BR@Belle:2013A

posteriors:
  - name: th+exp
    global_options:
      model: CKM
      form-factors: BCL2008
    prior:
      - CKM
      - FF-BCL2008
    likelihood:
      - theory
      - BaBar
      - Belle

predictions:
  - name: differential
    global_options:
      model: CKM
      form-factors: BCL2008
      l: e
    observables:
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.05 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.10 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.25 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.50 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.75 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  1.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  1.50 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  2.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  2.50 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  3.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  3.50 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  4.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  6.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  8.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2: 10.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2: 12.00 }

The Command-Line Interface

The eos-analysis script provides several subcommands that

  • inspect the analysis file;

  • sample from a posterior density with Monte Carlo methods;

  • perform auxiliary tasks on intermediate results.

The outputs of these commands are stored on disk as directories filled with YAML files (for descriptions and small numerical datasets) and NumPy data files (for samples). The data files can be accessed with the classes documented as part of the eos.data module.

usage: eos-analysis [-h] [-v] [-f ANALYSIS_FILE]
                    {list-priors,list-likelihoods,list-posteriors,list-predictions,sample-mcmc,sample-pmc,plot-samples,find-mode,find-clusters,predict-observables,run}
                    ...

Named Arguments

-v, --verbose

Increases the verbosity of the script

-f, --analysis-file

The analysis file. Defaults to ‘.analysis.yaml’.

Sub-commands:

list-priors

Lists the named prior PDFs defined within the scope of this analysis file.

eos-analysis list-priors [-h]

list-likelihoods

Lists the named likelihoods defined within the scope of this analysis file.

eos-analysis list-likelihoods [-h] [-d]
Named Arguments
-d, --display-details

Whether to display further details for each likelihood.

list-posteriors

Lists the named posterior PDFs defined within the scope of this analysis file.

eos-analysis list-posteriors [-h]

list-predictions

Lists the named prediction sets defined within the scope of this analysis file.

eos-analysis list-predictions [-h]

sample-mcmc

Samples from a named posterior PDF using Markov Chain Monte Carlo (MCMC) methods.

The output file will be stored in EOS_BASE_DIRECTORY/POSTERIOR/mcmc-IDX.

eos-analysis sample-mcmc [-h] [-N N] [-S STRIDE] [-p PRERUNS] [-n PRE_N]
                         [-b BASE_DIRECTORY]
                         POSTERIOR CHAIN-IDX
Positional Arguments
POSTERIOR

The name of the posterior PDF from which to draw the samples.

CHAIN-IDX

The index assigned to the Markov chain. This value is used to seed the RNG for a reproducible analysis.

Named Arguments
-N, --number-of-samples

The number of samples to be stored in the output file.

-S, --stride

The ratio of samples drawn over samples stored. For every S samples, S - 1 will be discarded.

-p, --number-of-preruns

The number of prerun steps, which are used to adapt the MCMC proposal to the posterior.

-n, --number-of-prerun-samples

The number of samples to be used for the adaptation in each prerun step. These samples will be discarded.

-b, --base-directory

The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
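
The interplay of the sample-mcmc options determines how many posterior evaluations a chain requires. The following is a minimal sketch based on one reading of the option descriptions above, namely that the main run draws N * STRIDE samples and that each prerun step draws PRE_N samples; the function mcmc_sample_budget is hypothetical, and the numbers in the example are arbitrary rather than EOS defaults.

```python
def mcmc_sample_budget(n_stored: int, stride: int, preruns: int, prerun_samples: int) -> dict:
    """Count drawn and discarded samples for one Markov chain."""
    drawn_main = n_stored * stride            # -N times -S
    discarded_thinning = drawn_main - n_stored  # S - 1 out of every S samples
    discarded_preruns = preruns * prerun_samples  # -p times -n
    return {
        "stored": n_stored,
        "discarded_thinning": discarded_thinning,
        "discarded_preruns": discarded_preruns,
        "total_drawn": drawn_main + discarded_preruns,
    }


# Arbitrary illustration values, not EOS defaults.
print(mcmc_sample_budget(n_stored=5000, stride=5, preruns=3, prerun_samples=1000))
```

Under these assumptions, increasing the stride reduces the autocorrelation of the stored samples at the cost of a proportionally larger number of posterior evaluations.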

sample-pmc

Samples from a named posterior using Population Monte Carlo (PMC) methods.

The results of the find-clusters command are expected in EOS_BASE_DIRECTORY/POSTERIOR/clusters. The output file will be stored in EOS_BASE_DIRECTORY/POSTERIOR/pmc.

eos-analysis sample-pmc [-h] [-n STEP_N] [-s STEPS] [-t PERPLEXITY_THRESHOLD]
                        [-N FINAL_N] [-c] [-b BASE_DIRECTORY]
                        POSTERIOR
Positional Arguments
POSTERIOR

The name of the posterior PDF from which to draw the samples.

Named Arguments
-n, --number-of-adaptation-samples

The number of samples to be used in each adaptation step. These samples will be discarded.

-s, --number-of-adaptation-steps

The number of adaptation steps, which are used to adapt the PMC proposal to the posterior.

-t, --perplexity-threshold

The threshold for the perplexity in the last step after which further adaptation steps are to be skipped.

-N, --number-of-final-samples

The number of samples to be stored in the output file.

-c, --continue-sampling

Whether to continue sampling from the previous sample-pmc results, or start fresh from the proposal obtained using find-clusters.

-b, --base-directory

The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
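
The perplexity referred to by --perplexity-threshold measures how evenly the importance weights of the samples are distributed. The following is a minimal sketch of the normalized perplexity as commonly defined in the PMC literature, i.e. the exponential of the Shannon entropy of the normalized weights divided by the number of samples; it is an assumption that EOS uses exactly this definition, and the function perplexity is hypothetical.

```python
from math import exp, log


def perplexity(weights: list[float]) -> float:
    """Normalized perplexity of importance weights, between 1/N and 1."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    # Shannon entropy; samples with zero weight do not contribute.
    entropy = -sum(w * log(w) for w in normalized if w > 0.0)
    return exp(entropy) / len(weights)


print(perplexity([1.0, 1.0, 1.0, 1.0]))  # equal weights: proposal matches well
print(perplexity([1.0, 0.0, 0.0, 0.0]))  # one dominant sample: poor proposal
```

A value close to one indicates that the proposal density is well adapted to the posterior, which is why further adaptation steps can be skipped once a threshold is exceeded.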

plot-samples

Plots all samples obtained for a named posterior.

The results of either the sample-mcmc or the sample-pmc command are expected in EOS_BASE_DIRECTORY/POSTERIOR/mcmc-* or EOS_BASE_DIRECTORY/POSTERIOR/pmc, respectively. The plots will be stored as PDF files within the respective sample inputs.

eos-analysis plot-samples [-h] [-B BINS] [-b BASE_DIRECTORY] POSTERIOR
Positional Arguments
POSTERIOR

The name of the posterior PDF from which to draw the samples.

Named Arguments
-B, --bins

The number of bins per histogram.

-b, --base-directory

The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.

find-mode

Finds the mode of the named posterior.

The optimization process can be initialized either with a provided parameter point, or by extracting the point with the largest posterior from among previously obtained MCMC samples. The output will be stored in EOS_BASE_DIRECTORY/POSTERIOR/mode.

eos-analysis find-mode [-h] [-p POINTS] [-i INIT_FILE] [--from-point POINT]
                       [--use-random-seed SEED] [-b BASE_DIRECTORY]
                       POSTERIOR
Positional Arguments
POSTERIOR

The name of the posterior PDF that will be maximized.

Named Arguments
-p, --starting-points

The number of parameter points from which maximization is started.

-i, --init-from-file

The name of an MCMC data file from which the maximization is started.

--from-point

The point from which the minimization is started.

--use-random-seed

The seed used to generate the random starting point of the minimization.

-b, --base-directory

The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.

find-clusters

Finds clusters among posterior MCMC samples, grouped by Gelman-Rubin R value, and creates a Gaussian mixture density.

Finding clusters and creating a Gaussian mixture density is a necessary intermediate step before using the sample-pmc subcommand. The input files are expected in EOS_BASE_DIRECTORY/POSTERIOR/mcmc-*. All MCMC input files present will be used in the clustering. The output files will be stored in EOS_BASE_DIRECTORY/POSTERIOR/clusters.

eos-analysis find-clusters [-h] [-t THRESHOLD] [-c K_G] [-b BASE_DIRECTORY]
                           POSTERIOR
Positional Arguments
POSTERIOR

The name of the posterior PDF from which MCMC samples have previously been drawn.

Named Arguments
-t, --threshold

The R value threshold. If two sample subsets have an R value larger than this threshold, they will be treated as two distinct clusters. (default: 2.0)

-c, --clusters-per-group

The number of mixture components per cluster. (default: 1)

-b, --base-directory

The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
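
The role of the R value threshold can be illustrated with the textbook form of the Gelman-Rubin statistic for a single parameter. The following is a minimal sketch under the assumption of equally long chains; the function r_value is hypothetical, and the exact definition used internally by find-clusters may differ (for example, by omitting the square root).

```python
from math import sqrt
from statistics import mean, variance


def r_value(chains: list[list[float]]) -> float:
    """Textbook Gelman-Rubin R value for two or more equally long chains."""
    n = len(chains[0])
    within = mean([variance(chain) for chain in chains])       # W
    between_over_n = variance([mean(chain) for chain in chains])  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return sqrt(pooled / within)


threshold = 2.0  # the documented default of --threshold
same_mode = [[0.0, 1.0, 2.0, 3.0, 4.0], [0.5, 1.5, 2.5, 3.5, 4.5]]
two_modes = [[0.0, 1.0, 2.0, 3.0, 4.0], [100.0, 101.0, 102.0, 103.0, 104.0]]
print(r_value(same_mode) > threshold)  # chains overlap: one cluster
print(r_value(two_modes) > threshold)  # chains far apart: distinct clusters
```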

predict-observables

Predicts a set of observables based on previously obtained PMC samples.

The input files are expected in EOS_BASE_DIRECTORY/POSTERIOR/pmc. The output files will be stored in EOS_BASE_DIRECTORY/POSTERIOR/pred-PREDICTION.

eos-analysis predict-observables [-h] [-B BEGIN] [-E END] [-b BASE_DIRECTORY]
                                 POSTERIOR PREDICTION
Positional Arguments
POSTERIOR

The name of the posterior PDF from which to draw the samples.

PREDICTION

The name of the set of observables to predict.

Named Arguments
-B, --begin-index

The index of the first sample to use for the predictions.

-E, --end-index

The index beyond the last sample to use for the predictions.

-b, --base-directory

The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
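
The input and output locations of the subcommands documented so far imply an order in which they are typically chained: sample-mcmc, find-clusters, sample-pmc and then predict-observables. The following is a minimal sketch that only assembles the corresponding command lines, using nothing but the flags and positional arguments listed above; the function workflow_commands, the file name, the chain indices starting at zero and the directory in the example are assumptions made for illustration.

```python
import shlex


def workflow_commands(analysis_file: str, posterior: str, prediction: str,
                      chains: int, base_directory: str) -> list[list[str]]:
    """Assemble eos-analysis command lines for a full MCMC-to-prediction run."""
    common = ["eos-analysis", "-f", analysis_file]
    commands = [
        common + ["sample-mcmc", "-b", base_directory, posterior, str(index)]
        for index in range(chains)  # one command per CHAIN-IDX
    ]
    commands.append(common + ["find-clusters", "-b", base_directory, posterior])
    commands.append(common + ["sample-pmc", "-b", base_directory, posterior])
    commands.append(common + ["predict-observables", "-b", base_directory,
                              posterior, prediction])
    return commands


# Print the commands instead of running them; names taken from the example file.
for command in workflow_commands("analysis.yaml", "th+exp", "differential",
                                 chains=2, base_directory="./data"):
    print(shlex.join(command))
```

Each assembled list could be passed to subprocess.run on a machine where eos-analysis is installed; the sample-mcmc commands are independent of each other and are therefore natural candidates for separate cluster jobs.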

run

Runs a list of subcommands.

eos-analysis run [-h] [-b BASE_DIRECTORY]
Named Arguments
-b, --base-directory

The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
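
The directory conventions used by the subcommands above can be summarized in a few lines of Python. The following is a minimal sketch; the function output_directories is hypothetical and not part of EOS, and falling back to the current directory when neither the argument nor the EOS_BASE_DIRECTORY environment variable is set is an assumption of this sketch.

```python
import os
from pathlib import Path
from typing import Optional


def output_directories(posterior: str, chain_indices=(), prediction: Optional[str] = None,
                       base_directory: Optional[str] = None) -> dict:
    """Map each processing step to the directory documented for its output."""
    # Assumption: use '.' if neither the argument nor the variable is set.
    base = Path(base_directory or os.environ.get("EOS_BASE_DIRECTORY", "."))
    root = base / posterior
    directories = {f"mcmc-{index}": root / f"mcmc-{index}" for index in chain_indices}
    directories["clusters"] = root / "clusters"
    directories["pmc"] = root / "pmc"
    directories["mode"] = root / "mode"
    if prediction is not None:
        directories[f"pred-{prediction}"] = root / f"pred-{prediction}"
    return directories


for step, path in output_directories("th+exp", chain_indices=[0, 1],
                                     prediction="differential",
                                     base_directory="./data").items():
    print(step, "->", path)
```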