Advanced Usage

The Expression Parser

A user can define new observables for their analyses. These new observables must be arithmetic expressions of already defined observables and/or parameters. Observables defined in this way benefit from the same optimizations as all built-in observables, including multi-threaded evaluation and caching. This section describes the syntax of the expression parser, which interprets user-defined Python strings as new EOS observables, and illustrates it with examples.

The Construction of Expressions

The Rules

The following basic rules are used to parse the input string:

Spaces are ignored and can be added arbitrarily for readability.

The parser supports the usual arithmetic operations +, -, *, / and ^ as well as parenthesized expressions; the usual arithmetic precedence rules apply.

<<...>> encapsulates the name of an EOS object, which must be the name of either a parameter or an observable. Any such name must therefore adhere to the restrictions of eos.QualifiedName, for example <<mass::mu>> or <<B_u->lnu::BR>>.

The following strings are valid observable expressions:

"(<<mass::B_d>>^2 - 4 * <<mass::mu>>^2) ^ 0.5"

"1.0 / <<B_u->lnu::BR;l=mu>>"

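To make the <<...>> convention concrete, the following sketch lists the EOS names referenced in an expression string. It only illustrates the syntax with a regular expression and is not the parser that EOS uses internally; the helper name referenced_names is hypothetical.

```python
import re

# Matches the text between '<<' and the next '>>'. The match is non-greedy,
# so it stops at the first '>>'; a single '>' as in '->' does not end it.
NAME_PATTERN = re.compile(r"<<(.+?)>>")


def referenced_names(expression):
    """Return the names enclosed in <<...>>, in order of appearance.

    Hypothetical helper for illustration only; not part of EOS.
    """
    return [name.strip() for name in NAME_PATTERN.findall(expression)]


names = referenced_names("(<<mass::B_d>>^2 - 4 * <<mass::mu>>^2) ^ 0.5")
print(names)
```

Each extracted name would then have to be a valid parameter or observable name for the expression to be accepted.
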
Aliasing

By default, the kinematic arguments of the observables are transferred to the expression. For example, the expression 1.0 / <<B->pilnu::dBR/dq2>> will expect a kinematic specification for q2 (either through an eos.Kinematics object or indirectly in a plotting routine). When more than one observable appears in the expression, it is useful to rename the kinematic variables. This can be done via an alias specification <<...>>[...]. Two types of specification are supported:

the = operator fixes a kinematic variable to a given value. E.g. <<B->pilnu::BR>>[q2_min=0.1] only expects a specification for q2_max.

the => operator renames the kinematic variable on its left-hand side to the name on the right-hand side. E.g. <<B_u->lnu::BR;l=mu>>[q2=>q2_mu] / <<B_u->lnu::BR;l=e>>[q2=>q2_e] requires two kinematic specifications, q2_mu and q2_e.

Note that these specifications can be combined in a comma-separated list, for example [q2_min=1.0, q2_max=>q2_mu].
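
The effect of an alias specification can be sketched in plain Python. The two helpers below are hypothetical and not part of EOS; they only mimic how the = and => operators determine which kinematic variables the final expression still expects.

```python
def parse_alias(spec):
    """Split an alias specification into fixed values and renamed variables.

    Hypothetical helper for illustration only; not the EOS parser.
    """
    fixed, renamed = {}, {}
    for item in spec.strip("[] ").split(","):
        item = item.replace(" ", "")  # spaces carry no meaning
        if not item:
            continue
        if "=>" in item:  # test '=>' first, since it contains '='
            old, new = item.split("=>", 1)
            renamed[old] = new
        else:
            name, value = item.split("=", 1)
            fixed[name] = float(value)
    return fixed, renamed


def expected_kinematics(variables, fixed, renamed):
    """Names that must still be provided once the alias has been applied."""
    return [renamed.get(v, v) for v in variables if v not in fixed]


fixed, renamed = parse_alias("[q2_min=1.0, q2_max=>q2_mu]")
print(expected_kinematics(["q2_min", "q2_max"], fixed, renamed))
```

For an observable that depends on q2_min and q2_max, the specification above leaves q2_mu as the only kinematic variable to be supplied.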

The Insert Method

Once a new observable is defined via its expression string, it can be added to the list of observables via the insert method:

eos.Observables().insert(name, latex, unit, options, expression)

where name, latex and unit are the qualified name, the LaTeX representation and the unit of the new observable; options takes an eos.Options object and allows you to specify global options (i.e. options applied to all observables in the expression); and expression is the expression string to be parsed.

We conclude with a concrete example:

import eos

# Define R_K as a ratio: q2_max is fixed to 6, q2_min is renamed per lepton flavor.
eos.Observables().insert('B->Kll::R_K_example', R'(R_K)', eos.Unit.Unity(), eos.Options(),
                         '( <<B->Kll::BR;l=mu>>[q2_max=6, q2_min=>q2_mu_min] / <<B->Kll::BR;l=e>>[q2_max=6,q2_min=>q2_e_min] )')

# Instantiate the new observable with explicit kinematics and options.
R_K = eos.Observable.make('B->Kll::R_K_example', eos.Parameters.Defaults(), eos.Kinematics(q2_e_min=1.1, q2_mu_min=1.1), eos.Options(**{'tag':'BFS2004'}))

R_K.evaluate()   # should be ~1

The EOS Command-Line Interface

Although using EOS within an interactive Jupyter notebook on your personal computer or laptop is useful to prototype an analysis, this approach sometimes suffers from limited computing power. To circumvent this problem, you can alternatively

use EOS in Jupyter interactively on a remote workstation computer via an SSH tunnel (see the FAQ);

use EOS on remote workstations or compute clusters via the command-line interface.

In the following we document the command-line interface and the file format used in conjunction with it.

Note

The EOS command-line interface is completely optional and does not provide any functionality beyond what the interactive Python interface offers.

The Analysis Description Format

EOS uses a YAML file to describe the individual steps of one or more statistical analyses. At the top level, the format includes the following YAML keys:

priors (mandatory) — The list of priors within the analysis.

likelihoods (mandatory) — The list of likelihoods within the analysis.

posteriors (mandatory) — The list of posteriors within the analysis.

predictions (optional) — The list of theory predictions within the analysis.

Describing Priors

The priors key contains a list of named priors. Each prior has two mandatory keys:

name (mandatory) — The unique name of this prior.

parameters (mandatory) — The ordered list of parameters described by this prior.

The description of each individual parameter follows the prior description used in the Analysis constructor.

Describing Likelihoods

The likelihoods key contains a list of named likelihoods. Each likelihood has two mandatory keys:

name (mandatory) — The unique name of this likelihood.

constraints (mandatory) — The ordered list of EOS constraint names that comprise this likelihood.

Describing Posteriors

The posteriors key contains a list of named posteriors. Each posterior can contain several keys:

name (mandatory) — The unique name of this posterior.

global_options (optional) — A key/value map providing global options, i.e., options that apply to all observables used by this posterior.

prior (mandatory) — The ordered list of named priors that are used as part of this posterior.

likelihood (optional) — The ordered list of named likelihoods that are used as part of this posterior.

fixed_parameter (optional) — A key/value map providing values for parameters that deviate from the default values.

Example

Example examples/cli/btopilnu.analysis

priors:
  - name: CKM
    parameters:
      - parameter: CKM::abs(V_ub)
        min: 2.0e-3
        max: 5.0e-3
        type: uniform

  - name: FF-BCL2008
    parameters:
      - parameter: B->pi::f_+(0)@BCL2008
        min: 0.2
        max: 0.4
        type: uniform
      - parameter: B->pi::b_+^1@BCL2008
        min: -20.0
        max: +20.0
        type: uniform
      - parameter: B->pi::b_+^2@BCL2008
        min: -20.0
        max: +20.0
        type: uniform

likelihoods:
  - name: theory
    constraints:
      - B->pi::f_+@IKMvD:2014A

  - name: BaBar
    constraints:
      - B^0->pi^+lnu::BR@BaBar:2010B
      - B^0->pi^+lnu::BR@BaBar:2012D

  - name: Belle
    constraints:
      - B^0->pi^+lnu::BR@Belle:2010A
      - B^0->pi^+lnu::BR@Belle:2013A

posteriors:
  - name: th+exp
    global_options:
      model: CKM
      form-factors: BCL2008
    prior:
      - CKM
      - FF-BCL2008
    likelihood:
      - theory
      - BaBar
      - Belle

predictions:
  - name: differential
    global_options:
      model: CKM
      form-factors: BCL2008
      l: e
    observables:
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.05 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.10 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.25 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.50 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  0.75 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  1.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  1.50 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  2.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  2.50 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  3.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  3.50 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  4.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  6.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2:  8.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2: 10.00 }
      - name: B->pilnu::dBR/dq2
        kinematics: { q2: 12.00 }

The Command-Line Interface

The eos-analysis script provides several subcommands that

inspect the analysis file;

sample from a posterior density with Monte Carlo methods;

perform auxiliary tasks on intermediate results.

The output of these commands is stored on disk as directories filled with YAML files (for descriptions and small numerical datasets) and NumPy data files (for samples). The data files can be accessed with the classes documented as part of the eos.data module.

usage: eos-analysis [-h] [-v] [-f ANALYSIS_FILE]
                    {list-priors,list-likelihoods,list-posteriors,list-predictions,sample-mcmc,sample-pmc,plot-samples,find-mode,find-clusters,predict-observables,run}
                    ...

Named Arguments

-v, --verbose: Increases the verbosity of the script
-f, --analysis-file: The analysis file. Defaults to ‘.analysis.yaml’.

Sub-commands:

list-priors

Lists the named prior PDFs defined within the scope of this analysis file.

eos-analysis list-priors [-h]

list-likelihoods

Lists the named likelihoods defined within the scope of this analysis file.

eos-analysis list-likelihoods [-h] [-d]

Named Arguments

-d, --display-details: Whether to display further details for each likelihood.

list-posteriors

Lists the named posterior PDFs defined within the scope of this analysis file.

eos-analysis list-posteriors [-h]

list-predictions

Lists the named prediction sets defined within the scope of this analysis file.

eos-analysis list-predictions [-h]

sample-mcmc

Samples from a named posterior PDF using Markov Chain Monte Carlo (MCMC) methods.

The output file will be stored in EOS_BASE_DIRECTORY/POSTERIOR/mcmc-IDX.

eos-analysis sample-mcmc [-h] [-N N] [-S STRIDE] [-p PRERUNS] [-n PRE_N]
                         [-b BASE_DIRECTORY]
                         POSTERIOR CHAIN-IDX

Positional Arguments

POSTERIOR: The name of the posterior PDF from which to draw the samples.
CHAIN-IDX: The index assigned to the Markov chain. This value is used to seed the RNG for a reproducible analysis.

Named Arguments

-N, --number-of-samples: The number of samples to be stored in the output file.
-S, --stride: The ratio of samples drawn over samples stored. For every S samples, S - 1 will be discarded.
-p, --number-of-preruns: The number of prerun steps, which are used to adapt the MCMC proposal to the posterior.
-n, --number-of-prerun-samples: The number of samples to be used for an adaptation in each prerun step. These samples will be discarded.
-b, --base-directory: The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
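
Since each chain is identified by its own CHAIN-IDX, several chains are typically produced by repeated invocations. The following sketch only assembles the argument lists for such runs and does not execute anything. The function name, the sample count, the stride and the base directory ./data are hypothetical choices; the file and posterior names are taken from the example analysis file above.

```python
import shlex


def sample_mcmc_command(analysis_file, posterior, chain_idx,
                        samples, stride, base_directory):
    """Assemble the argument list for one 'eos-analysis sample-mcmc' run.

    Hypothetical helper for illustration only; not part of EOS.
    """
    return [
        "eos-analysis", "-f", analysis_file,  # global option precedes the subcommand
        "sample-mcmc",
        "-N", str(samples),
        "-S", str(stride),
        "-b", base_directory,
        posterior, str(chain_idx),
    ]


commands = [
    sample_mcmc_command("examples/cli/btopilnu.analysis", "th+exp", idx,
                        samples=5000, stride=5, base_directory="./data")
    for idx in range(4)
]
for command in commands:
    # Quote each argument so the line can be pasted into a shell or a job script.
    print(" ".join(shlex.quote(argument) for argument in command))
```

With these hypothetical settings, the definition of the stride implies that each chain draws 5 x 5000 = 25000 samples and stores 5000 of them.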

sample-pmc

Samples from a named posterior using Population Monte Carlo (PMC) methods.

The results of the find-clusters command are expected in EOS_BASE_DIRECTORY/POSTERIOR/clusters. The output file will be stored in EOS_BASE_DIRECTORY/POSTERIOR/pmc.

eos-analysis sample-pmc [-h] [-n STEP_N] [-s STEPS] [-t PERPLEXITY_THRESHOLD]
                        [-N FINAL_N] [-c] [-b BASE_DIRECTORY]
                        POSTERIOR

Positional Arguments

POSTERIOR: The name of the posterior PDF from which to draw the samples.

Named Arguments

-n, --number-of-adaptation-samples: The number of samples to be used in each adaptation step. These samples will be discarded.
-s, --number-of-adaptation-steps: The number of adaptation steps, which are used to adapt the PMC proposal to the posterior.
-t, --perplexity-threshold: The threshold for the perplexity in the last step after which further adaptation steps are to be skipped.
-N, --number-of-final-samples: The number of samples to be stored in the output file.
-c, --continue-sampling: Whether to continue sampling from the previous sample-pmc results, or start fresh from the proposal obtained using find-clusters.
-b, --base-directory: The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
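
The perplexity itself is not defined in this section. In the PMC literature it is commonly taken to be the exponential of the Shannon entropy of the normalized importance weights, divided by the number of samples, so that it lies between 1/N and 1. The sketch below assumes that convention; whether EOS computes the quantity in exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def normalized_perplexity(weights):
    """exp(Shannon entropy of the normalized weights) / number of weights.

    Assumed textbook definition; not taken from the EOS source code.
    """
    total = sum(weights)
    # Zero weights contribute nothing to the entropy and are skipped.
    probabilities = [w / total for w in weights if w > 0.0]
    entropy = -sum(p * math.log(p) for p in probabilities)
    return math.exp(entropy) / len(weights)


# Equal weights: the proposal matches the target as well as possible.
print(normalized_perplexity([1.0, 1.0, 1.0, 1.0]))
# One dominant weight: the proposal is poorly adapted.
print(normalized_perplexity([1.0, 0.0, 0.0, 0.0]))
```

Under this definition a value close to 1 indicates a well-adapted proposal, which is the situation in which further adaptation steps can be skipped.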

plot-samples

Plots all samples obtained for a named posterior.

The results of either the sample-mcmc or the sample-pmc command are expected in EOS_BASE_DIRECTORY/POSTERIOR/mcmc-* or EOS_BASE_DIRECTORY/POSTERIOR/pmc, respectively. The plots will be stored as PDF files within the respective sample inputs.

eos-analysis plot-samples [-h] [-B BINS] [-b BASE_DIRECTORY] POSTERIOR

Positional Arguments

POSTERIOR: The name of the posterior PDF from which to draw the samples.

Named Arguments

-B, --bins: The number of bins per histogram.
-b, --base-directory: The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.

find-mode

Finds the mode of the named posterior.

The optimization process can be initialized either with a provided parameter point, or by extracting the point with the largest posterior from among previously obtained MCMC samples. The output will be stored in EOS_BASE_DIRECTORY/POSTERIOR/mode.

eos-analysis find-mode [-h] [-p POINTS] [-i INIT_FILE] [--from-point POINT]
                       [--use-random-seed SEED] [-b BASE_DIRECTORY]
                       POSTERIOR

Positional Arguments

POSTERIOR: The name of the posterior PDF that will be maximized.

Named Arguments

-p, --starting-points: The number of parameter points from which maximization is started.
-i, --init-from-file: The name of an MCMC data file from which the maximization is started.
--from-point: The point from which the minimization is started.
--use-random-seed: The seed used to generate the random starting point of the minimization.
-b, --base-directory: The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.

find-clusters

Finds clusters among posterior MCMC samples, grouped by Gelman-Rubin R value, and creates a Gaussian mixture density.

Finding clusters and creating a Gaussian mixture density is a necessary intermediate step before using the sample-pmc subcommand. The input files are expected in EOS_BASE_DIRECTORY/POSTERIOR/mcmc-*. All MCMC input files present will be used in the clustering. The output files will be stored in EOS_BASE_DIRECTORY/POSTERIOR/clusters.

eos-analysis find-clusters [-h] [-t THRESHOLD] [-c K_G] [-b BASE_DIRECTORY]
                           POSTERIOR

Positional Arguments

POSTERIOR: The name of the posterior PDF from which MCMC samples have previously been drawn.

Named Arguments

-t, --threshold: The R value threshold. If two sample subsets have an R value larger than this threshold, they will be treated as two distinct clusters. (default: 2.0)
-c, --clusters-per-group: The number of mixture components per cluster. (default: 1)
-b, --base-directory: The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
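
To illustrate how the R value separates chains, the following sketch computes the textbook Gelman-Rubin statistic for one-dimensional chains of equal length and compares it with the default threshold of 2.0. The implementation used by EOS may differ in detail, for example in its normalization or its treatment of several parameters, so this is an assumption-laden illustration; the function name is hypothetical.

```python
from statistics import mean, variance


def gelman_rubin_r(chains):
    """Textbook Gelman-Rubin R value for equally long one-dimensional chains.

    Hypothetical helper for illustration only; not the EOS implementation.
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)                   # B: between-chain variance
    pooled = (n - 1) / n * within + between / n           # pooled variance estimate
    return (pooled / within) ** 0.5


overlapping = [[0.9, 1.0, 1.1, 1.0], [1.0, 1.1, 0.9, 1.0]]
separated = [[0.9, 1.0, 1.1, 1.0], [4.9, 5.0, 5.1, 5.0]]
threshold = 2.0  # default value of --threshold

for label, chains in (("overlapping", overlapping), ("separated", separated)):
    r = gelman_rubin_r(chains)
    verdict = "distinct clusters" if r > threshold else "same cluster"
    print(label, round(r, 2), verdict)
```

Chains that explore the same region yield a value near or below 1, while chains stuck in different modes yield a value far above the threshold.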

predict-observables

Predicts a set of observables based on previously obtained PMC samples.

The input files are expected in EOS_BASE_DIRECTORY/POSTERIOR/pmc. The output files will be stored in EOS_BASE_DIRECTORY/POSTERIOR/pred-PREDICTION.

eos-analysis predict-observables [-h] [-B BEGIN] [-E END] [-b BASE_DIRECTORY]
                                 POSTERIOR PREDICTION

Positional Arguments

POSTERIOR: The name of the posterior PDF from which to draw the samples.
PREDICTION: The name of the set of observables to predict.

Named Arguments

-B, --begin-index: The index of the first sample to use for the predictions.
-E, --end-index: The index beyond the last sample to use for the predictions.
-b, --base-directory: The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.
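
The description of -E as the index beyond the last sample suggests a half-open interval, as in Python slicing. Assuming that reading is correct, a large sample can be split into consecutive chunks, for instance to distribute the predictions over several jobs. The helper index_ranges and the numbers used are hypothetical.

```python
def index_ranges(total, chunk):
    """Yield (begin, end) pairs that cover range(total) in half-open chunks.

    Hypothetical helper for illustration only; not part of EOS.
    """
    for begin in range(0, total, chunk):
        yield begin, min(begin + chunk, total)


ranges = list(index_ranges(2500, 1000))
print(ranges)  # [(0, 1000), (1000, 2000), (2000, 2500)]
```

Each pair would then be passed as the -B and -E arguments of one predict-observables invocation.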

run

Runs a list of subcommands.

eos-analysis run [-h] [-b BASE_DIRECTORY]

Named Arguments

-b, --base-directory: The base directory for the storage of data files. Can also be set via the EOS_BASE_DIRECTORY environment variable.