5.2 Predicting the science return of a future experiment

We consider a toy Gaussian linear model in order to illustrate the different approaches to performance forecasting. Although motivated by computational simplicity and the availability of analytical results, a Gaussian model is in fact a fairly close representation of many cases of interest. In Figure 55 we illustrate this point by plotting the parameter constraints expected from a Euclid-like survey alongside the corresponding Gaussian approximation in the Fisher-matrix approach to the likelihood (described below). In these cases, it seems clear that the Gaussian model captures the full probability distribution fairly well. Another example, shown in Figure 56, concerns cosmological constraints from WMAP and SDSS data, where a Gaussian approximation to the likelihood (the so-called Laplace approximation) gives an excellent description of the full distribution obtained numerically via MCMC.

Figure 55: Projected cosmological 8-parameter space for a 20,000 square degree, median redshift z = 0.8, 10-bin tomographic cosmic shear survey. Specifications are based on the Euclid Yellow Book [550], as this figure is representative of a method rather than of a forecast analysis; the discussion remains valid with the more updated Euclid specifications of [551]. The upper panel shows the 1D parameter constraints using analytic marginalization (black) and the Gaussian approximation (Fisher matrix; blue, dark grey). The other panels show the 2D parameter constraints: grey contours are the 1-, 2- and 3-σ levels using analytic marginalization over the extra parameters, solid blue ellipses are the 1-σ contours using the Fisher-matrix approximation to the projected likelihood surface, and solid red ellipses are the 1-σ fully marginalized contours. Image reproduced by permission from [878].

Figure 56: Gaussian (Laplace) approximation to a 6-dimensional posterior distribution for cosmological parameters, from WMAP1 and SDSS data. For each pair of parameters, panels show contours enclosing 68% and 95% of the joint probability from 2 × 10^5 MC samples (black contours), along with the Laplace approximation (red ellipses). The Laplace approximation captures the bulk of the posterior volume in parameter space in this case, where there is little non-Gaussianity in the posterior PDF. Image reproduced from the 2005 preprint of [894].

5.2.1 The Gaussian linear model

Suppose we have N cosmological probes, whose likelihood functions ℒ_i (i = 1, ..., N) are assumed to be multi-dimensional Gaussians, i.e.,

\mathcal{L}_i(\Theta) \equiv p(D_i|\Theta) = \mathcal{L}_i^0 \exp\left[ -\frac{1}{2} (\mu_i - \Theta)^t L_i (\mu_i - \Theta) \right], \qquad (5.2.1)
where Θ are the parameters one is interested in constraining, D_i are the available data from probe i, and μ_i is the location of the maximum-likelihood value in parameter space. The matrix L_i is the inverse of the covariance matrix of the parameters.

The posterior distribution for the parameters from each probe, p (Θ |Di), is obtained by Bayes’ theorem as

p(\Theta|D_i) = \frac{p(\Theta)\, p(D_i|\Theta)}{p(D_i)}, \qquad (5.2.2)
where p(Θ) is the prior and p(D_i) is a normalizing constant (the Bayesian evidence). If we assume a Gaussian prior centered on the origin with inverse covariance matrix Σ, the posterior from each probe is also a Gaussian, with inverse covariance matrix
F_i = L_i + \Sigma \qquad (i = 1, \ldots, N) \qquad (5.2.3)
and posterior mean
\bar{\mu}_i = F_i^{-1} L_i \mu_i. \qquad (5.2.4)
Tighter constraints on the parameters can usually be obtained by combining all available probes (provided there are no systematics; see below). If we combine all probes, we obtain a Gaussian posterior with inverse covariance matrix
F = \sum_{i=1}^{N} L_i + \Sigma \qquad (5.2.5)
and mean
\bar{\mu} = F^{-1} \sum_{i=1}^{N} L_i \mu_i. \qquad (5.2.6)
Notice that the precision of the posterior (i.e., the inverse covariance matrix) does not depend on the degree of overlap of the likelihoods from the individual probes. This is a property of the Gaussian linear model.
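As an illustration, the probe combination of Eqs. (5.2.5)–(5.2.6) can be sketched in a few lines of numpy; the two probe precision matrices and means below are arbitrary toy values, not taken from any survey:

```python
import numpy as np

def combine_probes(L_list, mu_list, Sigma):
    """Combine Gaussian probes with inverse covariances L_i and means mu_i
    under a Gaussian prior with inverse covariance Sigma centred on the
    origin, following Eqs. (5.2.5) and (5.2.6)."""
    F = Sigma + sum(L_list)                       # Eq. (5.2.5)
    rhs = sum(L @ m for L, m in zip(L_list, mu_list))
    mu_bar = np.linalg.solve(F, rhs)              # Eq. (5.2.6)
    return F, mu_bar

# Two toy 2-parameter probes with different degeneracy directions.
L1 = np.array([[4.0, 1.5], [1.5, 1.0]])
L2 = np.array([[1.0, -1.5], [-1.5, 4.0]])
mu1 = np.array([0.3, 0.7])
mu2 = np.array([0.5, 0.4])
Sigma = 0.01 * np.eye(2)                          # weak prior

F, mu_bar = combine_probes([L1, L2], [mu1, mu2], Sigma)
```

Since precisions simply add, the combined posterior is always at least as tight as each individual one, regardless of how much the individual likelihoods overlap.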

For future reference, it is also useful to write down the general expression for the Bayesian evidence. For a normal prior p(Θ ) ∼ 𝒩 (𝜃π,Σ ) and a likelihood

\mathcal{L}(\Theta) = \mathcal{L}_0 \exp\left[ -\frac{1}{2} (\theta_0 - \Theta)^t L (\theta_0 - \Theta) \right], \qquad (5.2.7)
the evidence for data d is given by
p(d) \equiv \int d\Theta\, p(d|\Theta)\, p(\Theta) = \mathcal{L}_0 \frac{|\Sigma|^{1/2}}{|F|^{1/2}} \exp\left[ -\frac{1}{2}\left( \theta_0^t L \theta_0 + \theta_\pi^t \Sigma \theta_\pi - \bar{\theta}^t F \bar{\theta} \right) \right], \qquad (5.2.8)
where F is given by Eq. (5.2.5) with N = 1 and \bar{\theta} = F^{-1}(L\theta_0 + \Sigma\theta_\pi).
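As a sanity check, Eq. (5.2.8) can be verified numerically in one dimension by comparing the closed form against direct quadrature of likelihood times prior; all numerical values below are illustrative:

```python
import numpy as np
from scipy.integrate import quad

# Toy 1D check of the Gaussian-evidence formula, Eq. (5.2.8).
L, Sigma = 4.0, 0.5          # likelihood and prior inverse variances
theta0, theta_pi = 0.8, 0.1  # likelihood peak and prior mean
L0 = 1.0                     # likelihood normalization

F = L + Sigma                                     # Eq. (5.2.3) with N = 1
theta_bar = (L * theta0 + Sigma * theta_pi) / F   # posterior mean

# Closed form of Eq. (5.2.8) in 1D.
evidence_analytic = (L0 * np.sqrt(Sigma / F)
                     * np.exp(-0.5 * (L * theta0**2 + Sigma * theta_pi**2
                                      - F * theta_bar**2)))

def integrand(theta):
    like = L0 * np.exp(-0.5 * L * (theta0 - theta) ** 2)
    prior = np.sqrt(Sigma / (2 * np.pi)) * np.exp(-0.5 * Sigma * (theta - theta_pi) ** 2)
    return like * prior

evidence_numeric, _ = quad(integrand, -20, 20)
```

The two numbers agree to quadrature precision, confirming that the Gaussian integral has been carried out correctly.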

5.2.2 Fisher-matrix error forecast

A general likelihood function for a future experiment (subscript i) can be Taylor-expanded around its maximum-likelihood value, μi. By definition, at the maximum the first derivatives vanish, and the shape of the log-likelihood in parameter space is approximated by the Hessian matrix Hi,

\ln \mathcal{L}_i(\Theta) \approx \ln \mathcal{L}_i(\mu_i) + \frac{1}{2} (\Theta - \mu_i)^t H_i (\Theta - \mu_i), \qquad (5.2.9)
where Hi is given by
(H_i)_{\alpha\beta} \equiv \left. \frac{\partial^2 \ln \mathcal{L}_i}{\partial \Theta_\alpha \partial \Theta_\beta} \right|_{\mu_i}, \qquad (5.2.10)
and the derivatives are evaluated at the maximum-likelihood point. By taking the expectation of Eq. (5.2.9) with respect to many data realizations, we can replace the maximum-likelihood value μ_i with the true value Θ_∗, since the maximum-likelihood estimate is unbiased (in the absence of systematics), i.e., ⟨μ_i⟩ = Θ_∗. We then define the Fisher information matrix as the expectation value of the Hessian,
Fi ≡ ⟨Hi ⟩. (5.2.11 )
The inverse of the Fisher matrix, F_i^{-1}, is an estimate of the covariance matrix of the parameters, and it describes how fast the log-likelihood falls (on average) around the maximum-likelihood value. We thus recover the Gaussian expression for the likelihood, Eq. (5.2.1), with the maximum-likelihood value replaced by the true value of the parameters and the inverse covariance matrix given by the Fisher matrix, L_i = F_i [496]. In general, the derivatives depend on where in parameter space they are taken (except for the simple case of linear models), hence F_i is a function of the fiducial parameters.

Once we have the Fisher matrix, we can estimate the accuracy on the parameters from a future measurement by computing the posterior as in Eq. (5.2.2). If we are only interested in a subset of the parameters, we can easily marginalize over the others: computing the Gaussian integral over the unwanted parameters is equivalent to inverting the Fisher matrix, dropping the rows and columns corresponding to the unwanted parameters (keeping only those of the parameters of interest), and inverting the smaller matrix back. The result is the marginalized Fisher matrix ℱ_i. For example, the 1-σ error for parameter α from experiment i, marginalized over all other parameters, is simply \sigma_\alpha = \sqrt{(F_i^{-1})_{\alpha\alpha}}.
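The marginalization recipe just described (invert, drop rows and columns, invert back) is a short numpy routine; the 3-parameter Fisher matrix below is purely illustrative:

```python
import numpy as np

def marginalized_fisher(F, keep):
    """Marginalize a Fisher matrix over all parameters not in `keep`:
    invert, select the rows/columns of interest, and invert back.
    Returns the 1-sigma marginalized errors and the marginalized Fisher matrix."""
    C = np.linalg.inv(F)                 # full parameter covariance
    C_sub = C[np.ix_(keep, keep)]        # drop the marginalized rows/columns
    return np.sqrt(np.diag(C_sub)), np.linalg.inv(C_sub)

# Illustrative 3-parameter Fisher matrix (not from a real survey).
F = np.array([[10.0, 2.0, 1.0],
              [ 2.0, 8.0, 0.5],
              [ 1.0, 0.5, 5.0]])
sigmas, F_marg = marginalized_fisher(F, keep=[0, 1])
```

Note that the marginalized error on a parameter is never smaller than the conditional error 1/√(F_αα) obtained by fixing all other parameters.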

It remains to compute the Fisher matrix for the future experiment. This can be done analytically for the case where the likelihood function is approximately Gaussian in the data, which is a good approximation for many applications of interest. We can write for the log-likelihood (in the following, we drop the subscript i denoting the experiment under consideration for simplicity of notation)

− 2 ln ℒ = ln |C | + (D − μ)tC −1(D − μ ), (5.2.12 )
where D are the (simulated) data that would be observed by the experiment and, in general, both the mean μ and the covariance matrix C may depend on the parameters Θ we are trying to estimate. The expectation value of the data corresponds to the true mean, ⟨D⟩ = μ, and similarly the expectation value of the data matrix Δ ≡ (D − μ)(D − μ)^t is equal to the true covariance, ⟨Δ⟩ = C. It can then be shown (see, e.g., [884]) that the Fisher matrix is given by
F_{\alpha\beta} = \frac{1}{2} \mathrm{tr}\left[ A_\alpha A_\beta + C^{-1} \langle \Delta_{,\alpha\beta} \rangle \right], \qquad (5.2.13)
where A_α ≡ C^{-1} C_{,α} and the comma denotes a derivative with respect to the parameters, for example C_{,α} ≡ ∂C/∂Θ_α. The fact that this expression depends only on expectation values and not on the particular data realization means that the Fisher matrix can be computed from knowledge of the noise properties of the experiment without having to go through the step of actually generating any simulated data. The specific form of the Fisher matrix then becomes a function of the type of observable being considered and of the experimental parameters.
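For the common special case of a parameter-independent covariance, the A_α terms in Eq. (5.2.13) vanish and the Fisher matrix reduces to F_{αβ} = μ_{,α}^t C^{-1} μ_{,β}, which can be assembled with finite-difference derivatives of the mean. The toy observable and noise level below are assumptions for illustration:

```python
import numpy as np

def fisher_fixed_cov(model, theta, Cinv, h=1e-5):
    """Fisher matrix F_ab = mu_,a^t C^-1 mu_,b for Gaussian data with a
    parameter-independent covariance (the A_a terms of Eq. (5.2.13) vanish).
    Derivatives of the mean are taken by central differences at the fiducial point."""
    p = len(theta)
    derivs = []
    for a in range(p):
        tp, tm = theta.copy(), theta.copy()
        tp[a] += h
        tm[a] -= h
        derivs.append((model(tp) - model(tm)) / (2 * h))
    F = np.empty((p, p))
    for a in range(p):
        for b in range(p):
            F[a, b] = derivs[a] @ Cinv @ derivs[b]
    return F

# Toy observable: a Gaussian bump with free amplitude and width (illustrative only).
x = np.linspace(-3, 3, 50)
model = lambda th: th[0] * np.exp(-x**2 / (2 * th[1] ** 2))
Cinv = np.eye(len(x)) / 0.1**2          # homoscedastic noise, sigma = 0.1
F = fisher_fixed_cov(model, np.array([1.0, 1.0]), Cinv)
```

The resulting matrix is symmetric and positive definite, and, as noted in the text, no simulated data realization was ever generated: only the fiducial model and the noise covariance enter.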

Explicit expressions for the Fisher matrix for cosmological observables can be found in [884] for cosmic microwave background data, in [880] for the matter power spectrum from galaxy redshift surveys (applied to baryonic acoustic oscillations in [815]), and in [454] for weak lensing. These approaches have been discussed in Section 1.7. A useful summary of Fisher-matrix technology is given in the Dark Energy Task Force report [21] and in [919]. A useful numerical package which includes several of the above calculations is the publicly available Matlab code Fisher4Cast [99, 98]. Attempts to include the modelling of systematic errors in this framework can be found in [508, 878, 505].

5.2.3 Figures of merit

It has become customary to describe the statistical power of a future dark-energy probe by the inverse area enclosed by the 68% covariance ellipse marginalized down to the dark-energy parameter space. This measure of statistical performance for probe i (widely known as the DETF FoM [21, 470]) is usually defined (up to multiplicative constants) as

\mathrm{FoM} = |F_i^{-1}|^{-1/2}, \qquad (5.2.14)
where the Fisher matrix F_i is given in Eq. (5.2.11). The authors of [21] suggested using the inverse area of the 95% error ellipse in the w_0–w_a plane (where w_0 and w_a are defined in [584, 229]). This definition was inspired by [470]. In [22] it is suggested to model w(a) as piecewise constant in many small redshift bins (Δa = 0.025), and then to apply a principal-component approach [468] in order to understand the redshifts at which each experiment has the power to constrain w.
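Under these definitions, a DETF-like FoM can be computed from any full Fisher matrix by marginalizing down to the (w_0, w_a) block; the 4-parameter matrix and the parameter ordering below are assumed for illustration:

```python
import numpy as np

def detf_fom(F, idx_w0, idx_wa):
    """DETF-like FoM: marginalize the full Fisher matrix down to (w0, wa)
    and return the inverse square root of the determinant of the resulting
    2x2 covariance block, i.e. the inverse ellipse area up to a constant."""
    C = np.linalg.inv(F)
    C2 = C[np.ix_([idx_w0, idx_wa], [idx_w0, idx_wa])]
    return 1.0 / np.sqrt(np.linalg.det(C2))

# Illustrative 4-parameter Fisher matrix with (w0, wa) in slots 2 and 3.
F = np.diag([50.0, 30.0, 4.0, 1.0]) + 0.5
fom = detf_fom(F, 2, 3)
```

A simple consistency property: doubling the Fisher matrix (i.e., halving the parameter covariance) doubles this FoM, as expected for an inverse-area measure.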

A closely related but more statistically motivated measure of the information gain is the Kullback–Leibler divergence (KL) between the posterior and the prior, representing the information gain obtained when upgrading the prior to the posterior via Bayes’ theorem:

D_{\mathrm{KL}} \equiv \int p(\Theta|D) \ln \frac{p(\Theta|D)}{p(\Theta)}\, d\Theta. \qquad (5.2.15)
The KL divergence measures the relative entropy between the two distributions: it is a dimensionless quantity which expresses the information gain obtained via the likelihood. For the Gaussian likelihood and prior introduced above, the information gain (with respect to the prior Σ) from the combination of all probes is given by [900]
D_{\mathrm{KL}} = \frac{1}{2}\left( \ln|F| - \ln|\Sigma| - \mathrm{tr}\left[ 1 - \Sigma F^{-1} \right] \right). \qquad (5.2.16)
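Eq. (5.2.16) translates directly into code. A useful check is that the gain vanishes when the posterior equals the prior (F = Σ), i.e., when the data carry no information; the toy precision matrices below are illustrative:

```python
import numpy as np

def gaussian_kl(F, Sigma):
    """Information gain of Eq. (5.2.16): KL divergence between a Gaussian
    posterior with precision F = sum_i L_i + Sigma and a Gaussian prior
    with precision Sigma."""
    n = F.shape[0]
    _, logdet_f = np.linalg.slogdet(F)
    _, logdet_s = np.linalg.slogdet(Sigma)
    return 0.5 * (logdet_f - logdet_s
                  - np.trace(np.eye(n) - Sigma @ np.linalg.inv(F)))

Sigma = np.eye(2)              # unit-precision prior
L = np.diag([9.0, 4.0])        # an informative toy probe
dkl = gaussian_kl(L + Sigma, Sigma)
```

Any informative probe yields a strictly positive gain, and the gain grows logarithmically with the posterior-to-prior precision ratio.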

A discussion of other, alternative FoMs (D-optimality, A-optimality) can be found in [96]. In [939] a different FoM for dark energy is suggested: for a set of DE parameters Θ, the FoM is defined as \mathrm{FoM} = 1/\sqrt{\det \mathrm{Cov}(\Theta)}, where Cov(Θ) is the covariance matrix of Θ. This definition is more flexible, since it can be used for any DE parametrization [945].

Given that Euclid can constrain both the expansion history and the growth of structure, it is also useful to introduce a new FoM for the growth of perturbations. In analogy with the DETF FoM, one can define this new FoM as the inverse area of the 95% error ellipse of Ω_m–γ, where γ is the growth index defined via the growth rate f(z) \equiv d\ln G(z)/d\ln a = \Omega_m^\gamma, or as 1/\sqrt{\det \mathrm{Cov}(w_0, w_a, \gamma)}, or similar variants [614, 308]. Instead of γ, other parameters describing the growth can also be employed.

A FoM targeted at evaluating the robustness of a future probe to potential systematic errors has been introduced in [625]. The robustness of a future probe is defined via the degree of overlap between the posterior distribution from that probe and the posterior from other, existing probes. The fundamental notion is that maximizing statistical power (e.g., by designing a future probe to deliver constraints orthogonal to those of current probes) will in general reduce its robustness (by increasing the probability of an incompatible result, for example because of systematic bias). Thus, in evaluating the strength of a probe, both its statistical power and its resilience to plausible systematics ought to be considered.

5.2.4 The Bayesian approach

When considering the capabilities of future experiments, it is common practice to predict their performance in terms of constraints on relevant parameters, assuming a fiducial point in parameter space as the true model (often, the current best-fit model), as explained above. While this is a useful indicator for parameter-inference tasks, many questions in cosmology fall instead in the model-comparison category. Dark energy is a case in point: the science driver for many future probes (including Euclid) is to detect possible departures from a cosmological constant, hence to gather evidence in favor of an evolving dark-energy model. It is therefore preferable to assess the capabilities of future experiments by their ability to answer model-selection questions.

The procedure is as follows (see [677] for details and the application to dark-energy scenarios). At every point in parameter space, mock data from the future observation are generated and the Bayes factor between the competing models is computed, for example between an evolving dark energy and a cosmological constant. One then delimits the region of parameter space where the future data would not be able to deliver a clear model-comparison verdict, for example |ln B_01| < 5 (evidence falling short of the “strong” threshold). Here, B_01 is the Bayes factor, formed from the ratio of the Bayesian evidences of the two models being considered:

B_{01} = \frac{p(d|\mathcal{M}_0)}{p(d|\mathcal{M}_1)}, \qquad (5.2.17)
where the Bayesian evidence is the average of the likelihood under the prior in each model (denoted by a subscript m):
p(d|\mathcal{M}_m) = \int d\Theta_m\, p(d|\Theta_m, \mathcal{M}_m)\, p(\Theta_m|\mathcal{M}_m). \qquad (5.2.18)
The Bayes factor updates the prior probability ratio of the models to the posterior one, indicating the extent to which the data have modified one’s original view on the relative probabilities of the two models. The experiment with the smallest “model-confusion” volume in parameter space is to be preferred, since it achieves the highest discriminative power between models. An application of a related technique to the spectral index from the Planck satellite is presented in [704, 703].
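As a toy example of such a model comparison, consider a 1D nested case where ℳ0 fixes θ = 0 and ℳ1 assigns θ a Gaussian prior, so both evidences follow from Eq. (5.2.8); all numerical values are illustrative. With data lying close to θ = 0, the Occam penalty in the ℳ1 evidence drives ln B_01 > 0, favoring the simpler model:

```python
import numpy as np

def log_evidence_1d(L, theta0, Sigma):
    """Log of Eq. (5.2.8) in one dimension: Gaussian likelihood with inverse
    variance L peaked at theta0, Gaussian prior (inverse variance Sigma)
    centred on zero, and unit likelihood normalization (logL0 = 0)."""
    F = L + Sigma
    theta_bar = L * theta0 / F
    return (0.5 * (np.log(Sigma) - np.log(F))
            - 0.5 * (L * theta0**2 - F * theta_bar**2))

# Toy comparison: M0 fixes theta = 0, M1 lets theta vary with prior width 1.
L, theta0 = 25.0, 0.1                      # sharp likelihood peaked near 0
log_p_M0 = -0.5 * L * theta0**2            # M0 evidence: likelihood at theta = 0
log_p_M1 = log_evidence_1d(L, theta0, Sigma=1.0)
lnB01 = log_p_M0 - log_p_M1                # Eq. (5.2.17) in log form
```

Moving the likelihood peak theta0 far from zero flips the sign of ln B_01, as the data then genuinely require the extra parameter.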

Alternatively, we can investigate the full probability distribution for the Bayes factor from a future observation. This allows us to make probabilistic statements about the outcome of a future model comparison, and in particular to quantify the probability that a new observation will be able to achieve a certain level of evidence for one of the models, given current knowledge. This technique is based on the predictive distribution for a future observation, which gives the expected posterior for an observation with a certain set of experimental capabilities (further details are given in [895]). This method, called PPOD (predictive posterior odds distribution), can be useful in the context of experiment design and optimization.

Hybrid approaches have also been attempted, i.e., defining model-selection oriented FoMs while working in the Fisher-matrix framework, such as the expected Bayesian evidence ratio [429, 31].

The most general approach to performance forecasting involves the use of a suitably defined utility function, and it has recently been presented in [899]. Consider the different levels of uncertainty that are relevant when predicting the probability of a certain model-selection outcome from a future probe, which can be summarized as follows:

Level 1: uncertainty about which model is the correct one;
Level 2: uncertainty about the values of the parameters within each model (the fiducial point);
Level 3: realization noise, i.e., the scatter of the future data around their expectation.

The commonly-used Fisher matrix forecast ignores the uncertainty arising from Levels 1 and 2, as it assumes a fiducial model (Level 1) and fiducial parameter values (Level 2). It averages over realization noise (Level 3) in the limit of an infinite number of realizations. Clearly, the Fisher matrix procedure provides a very limited assessment of what we can expect for the scientific return of a future probe, as it ignores the uncertainty associated with the choice of model and parameter values.

The Bayesian framework allows improvement on the usual Fisher-matrix error forecast thanks to a general procedure which fully accounts for all three levels of uncertainty given above. Following [590], we think of the future data D_f as outcomes, which arise as a consequence of our choice of experimental parameters e (actions). For each action and each outcome, we define a utility function 𝒰(D_f, e). Formally, the utility only depends on the future data realization D_f. However, as will become clear below, the data D_f are realized from a fiducial model and model parameter values, so the utility function implicitly depends on the assumed model and parameters from which the data D_f are generated. The best action is the one that maximizes the expected utility, i.e., the utility averaged over possible outcomes:

\mathcal{EU}(e) \equiv \int dD_f\, p(D_f|e, d)\, \mathcal{U}(D_f, e). \qquad (5.2.19)
Here, p(Df |e,d) is the predictive distribution for the future data, conditional on the experimental setup (e) and on current data (d). For a single fixed model the predictive distribution is given by
p(D_f|e, d) = \int d\Theta\, p(D_f|\Theta, e, d)\, p(\Theta|e, d) = \int d\Theta\, p(D_f|\Theta, e)\, p(\Theta|d), \qquad (5.2.20)

where the last line follows because p(Df |Θ,e,d ) = p (Df |Θ, e) (conditioning on current data is irrelevant once the parameters are given) and p(Θ |e,d) = p(Θ |d) (conditioning on future experimental parameters is irrelevant for the present-day posterior). So we can predict the probability distribution for future data Df by averaging the likelihood function for the future measurement (Level 3 uncertainty) over the current posterior on the parameters (Level 2 uncertainty). The expected utility then becomes

\mathcal{EU}(e) = \int d\Theta\, p(\Theta|d) \int dD_f\, p(D_f|\Theta, e)\, \mathcal{U}(D_f, e). \qquad (5.2.21)

So far, we have tacitly assumed that only one model was being considered for the data. In practice, there will be several models that one is interested in testing (Level 1 uncertainty), and typically there is uncertainty over which one is best. This is in fact one of the main motivations for designing a new dark-energy probe. If M models {ℳ_1, ..., ℳ_M} are being considered, each with parameter vector Θ_m (m = 1, ..., M), the current posterior can be further extended in terms of model averaging (Level 1), weighting each model by its current posterior probability p(ℳ_m|d), obtaining from Eq. (5.2.21) the model-averaged expected utility

\mathcal{EU}(e) = \sum_{m=1}^{M} p(\mathcal{M}_m|d) \int d\Theta_m\, p(\Theta_m|d, \mathcal{M}_m) \int dD_f\, p(D_f|\Theta_m, e, \mathcal{M}_m)\, \mathcal{U}(D_f, e, \mathcal{M}_m). \qquad (5.2.22)

This expected utility is the most general definition of a FoM for a future experiment characterized by experimental parameters e. The usual Fisher-matrix forecast is recovered as a special case of Eq. (5.2.22), as are other ad hoc FoMs that have been defined in the literature. Therefore Eq. (5.2.22) gives us a formalism to define in full generality the scientific return of a future experiment. This result clearly accounts for all three levels of uncertainty in making our predictions: the utility function 𝒰(D_f, e, ℳ_m) (to be specified below) depends on the future data realization D_f (Level 3), which in turn is a function of the fiducial parameter values Θ_m (Level 2), and is averaged over present-day model probabilities (Level 1).
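The expected utility of Eq. (5.2.21) can be estimated by Monte Carlo: draw parameters from the current posterior (Level 2), simulate future data (Level 3), and average a utility over realizations. The sketch below is a deliberately simple 1D setup using |ln B_01| as the utility and an assumed current posterior; all numerical choices are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_evidence_1d(L, theta0, Sigma):
    """1D Gaussian evidence, Eq. (5.2.8), prior centred on zero."""
    F = L + Sigma
    theta_bar = L * theta0 / F
    return (0.5 * (np.log(Sigma) - np.log(F))
            - 0.5 * (L * theta0**2 - F * theta_bar**2))

def expected_utility(sigma_future, n_mc=20000, prior_inv_var=1.0):
    """Monte Carlo estimate of Eq. (5.2.21) for a toy 1D setup: draw theta
    from an assumed current posterior (Level 2), simulate future data
    (Level 3), and average the model-selection utility |ln B01| over
    realizations. M0 fixes theta = 0; M1 has a unit-width Gaussian prior."""
    theta = rng.normal(0.05, 0.1, n_mc)          # current posterior samples (toy)
    d_future = rng.normal(theta, sigma_future)   # simulated future data
    L = 1.0 / sigma_future**2                    # future likelihood precision
    ln_p_M0 = -0.5 * L * d_future**2             # M0 evidence: likelihood at 0
    ln_p_M1 = log_evidence_1d(L, d_future, prior_inv_var)
    return np.mean(np.abs(ln_p_M0 - ln_p_M1))

eu_precise = expected_utility(sigma_future=0.02)
eu_coarse = expected_utility(sigma_future=0.5)
```

Comparing experimental setups e (here simply the noise level sigma_future) by their expected utility implements the optimization logic of the text: the more precise experiment delivers, on average, stronger model-selection verdicts.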

This approach is used in [899] to define two model-selection oriented Figures of Merit: the decisiveness 𝒟, which quantifies the probability that a probe will deliver a decisive result in favor of or against the cosmological constant, and the expected strength of evidence ℰ, which returns a measure of the expected power of a probe for model selection.

