5.3 Measurement of parameters and source reconstruction

We have so far focused on the problem of detection and have not discussed parameter estimation in any concrete way. In principle, parameter estimation should not be considered to be distinct from detection. From a practical point of view, however, certain methods might be computationally efficient in detecting a signal but not necessarily the best for parameter estimation, while the best parameter estimation methods might not be computationally efficient. Thus, the problem of parameter estimation is often treated separately from detection.

The first thing to realize is that we can never be absolutely certain that a signal is present in a data train [159, 161]; we can only give confidence levels about its presence, which could be close to 100% for high values of the SNR. The next thing to realize is that, whatever the SNR may be, we cannot be absolutely certain about the true parameters of the signal: at best we can make an estimate and these estimates are given in a certain range. The width of the range depends on the confidence level required, being larger for higher confidence levels [159].

Maximum likelihood estimates have long been used to measure the parameters of a known signal buried in noisy data. The method consists in maximizing the likelihood ratio – the ratio of the probability that a given signal is present in the data to the probability that the signal is absent [188, 159]. Maximum likelihood estimates are not always minimum uncertainty estimates, as has been demonstrated in the case of binary inspiral signals by Balasubramanian et al. [66, 67]. However, until recently this was the method most widely followed in the gravitational wave literature. What is important to note is that maximum likelihood estimates are unbiased when the SNR is large, so that the mean of the distribution of measured values of the parameters is centered around the true parameter values. This is an important quality that will be useful in our discussion below.

Bayesian estimates, which take into account any prior knowledge that may be available about the distribution of the source parameters, often give much better estimates and do not rely on the availability of an ensemble of detector outputs [343, 274]. However, they are computationally a lot more expensive than maximum likelihood estimates.

In any one measurement, any estimated parameters, however efficient, robust and accurate, are unlikely to be the actual parameters of the signal, since, at any finite SNR, noise alters the input signal. In the geometric language, the signal vector is being altered by the noise vector and our matched filtering aims at computing the projection of this altered vector onto the signal space. The true parameters are expected to lie within an ellipsoid of p dimensions at a certain confidence level – the volume of the ellipsoid increasing with the confidence level at a given SNR but decreasing with the SNR at a given confidence level.

5.3.1 Ambiguity function

The ambiguity function, well known in the statistical theory of signal detection [188], is a very powerful tool in signal analysis: it helps one to assess the number of templates required to span the parameter space of the signal [324], to make estimates of variances and covariances involved in the measurement of various parameters, to compute biases introduced in using a family of templates whose shape is not the same as that of a family of signals intended to be detected, etc. We will see below how the ambiguity function can be used to compute the required number of templates. Towards the end of this section we will use the ambiguity function for the estimation of parameters.

The ambiguity function is defined (see Equation (91) below) as the scalar product of two normalized waveforms maximized over the initial phase of the waveform, in other words, the absolute value of the scalar product. A waveform e is said to be normalized if ⟨e, e⟩^{1/2} = 1, where the inner product is inversely weighted by the PSD, as in Equation (79). Among other things, normalized waveforms help in defining signal strengths: a signal is said to be of strength h₀ if h = h₀ e. Note that the optimal SNR for such a signal of strength h₀ is ⟨h, h⟩^{1/2} = h₀.
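As an illustration, the PSD-weighted inner product and the normalization convention can be sketched in a few lines of Python. This is a discrete approximation under simplified, assumed conditions – a flat one-sided PSD and a toy frequency-domain waveform – and the function names are hypothetical, not those of any search code:

```python
import numpy as np

def inner_product(a, b, psd, df):
    """Discrete approximation to <a, b> = 4 Re ∫ a(f) b*(f) / Sn(f) df,
    the PSD-weighted inner product (cf. Equation (79)).
    a, b : complex frequency-domain waveforms (positive frequencies only),
    psd  : one-sided noise power spectral density at the same frequencies,
    df   : frequency resolution."""
    return 4.0 * df * np.real(np.sum(a * np.conj(b) / psd))

def normalize(h, psd, df):
    """Return e = h / <h, h>^(1/2), so that <e, e> = 1."""
    return h / np.sqrt(inner_product(h, h, psd, df))

# Toy example: flat PSD and a chirp-like amplitude (illustration only).
f = np.linspace(20.0, 1024.0, 4096)
df = f[1] - f[0]
psd = np.ones_like(f)                          # flat PSD, an assumption
h = np.exp(2j * np.pi * 0.01 * f) / f**(7.0 / 6.0)
e = normalize(h, psd, df)

# A signal h = h0 * e has optimal SNR <h, h>^(1/2) = h0.
h0 = 8.0
snr = np.sqrt(inner_product(h0 * e, h0 * e, psd, df))
```

The last two lines check the statement in the text: a signal of strength h₀ built from a normalized waveform has optimal SNR exactly h₀.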

Let e(t; α), where α = {α^i | i = 0, ..., p} is the parameter vector comprised of p + 1 parameters, denote a normalized waveform. It is conventional to choose the parameter α^0 to be the lag τ, which simply corresponds to the coordinate time when an event occurs and is therefore called an extrinsic parameter, while the rest of the p parameters are called the intrinsic parameters and characterize the gravitational wave source.

Given two normalized waveforms e(t; α) and e(t; β), whose parameter vectors are not necessarily the same, the ambiguity 𝒜 is defined as

𝒜(α, β) ≡ |⟨e(α), e(β)⟩|.   (91)
Since the waveforms are normalized, 𝒜(α, α) = 1 and 𝒜(α, β) < 1 if α ≠ β. Here, α can be thought of as the parameters of a (normalized) template while β are those of a signal. With the template parameters α fixed, the ambiguity function is a function of the p signal parameters β^i, giving the SNR obtained by the template for different signals. The region in the signal parameter space for which a template obtains SNRs larger than a chosen value (called the minimal match [278]) is the span of that template. Template families should be chosen so that altogether they span the entire signal parameter space of interest with the least overlap of one another's spans. One can equally well interpret the ambiguity function as the SNR obtained for a given signal by filters of different parameter values.
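For concreteness, here is a Python sketch of the ambiguity function for a toy one-parameter waveform family. The family, its parameter τ, and the flat PSD are illustrative assumptions, not waveforms used in actual searches; taking the absolute value of the complex overlap performs the maximization over the initial phase:

```python
import numpy as np

def inner_product(a, b, psd, df):
    # <a, b> = 4 Re ∫ a b* / Sn df  (discrete approximation)
    return 4.0 * df * np.real(np.sum(a * np.conj(b) / psd))

def complex_overlap(a, b, psd, df):
    # keep the complex overlap: its modulus maximizes over initial phase
    return 4.0 * df * np.sum(a * np.conj(b) / psd)

def ambiguity(alpha, beta, waveform, psd, df):
    """A(alpha, beta) = |<e(alpha), e(beta)>| for normalized waveforms."""
    ea = waveform(alpha)
    eb = waveform(beta)
    ea = ea / np.sqrt(inner_product(ea, ea, psd, df))
    eb = eb / np.sqrt(inner_product(eb, eb, psd, df))
    return abs(complex_overlap(ea, eb, psd, df))

# Toy one-parameter family: a Newtonian-like phase with a chirp-time-like
# parameter tau (purely illustrative).
f = np.linspace(20.0, 512.0, 2048)
df = f[1] - f[0]
psd = np.ones_like(f)

def waveform(tau):
    return f**(-7.0 / 6.0) * np.exp(2j * np.pi * tau * (f / 100.0)**(-5.0 / 3.0))

A_diag = ambiguity(0.5, 0.5, waveform, psd, df)   # = 1 at beta = alpha
A_off = ambiguity(0.5, 0.6, waveform, psd, df)    # < 1 for beta != alpha
```

Scanning β over a grid while holding α fixed traces out the span of the template, the quantity used in template-placement arguments below.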

It is clear that the ambiguity function has a local maximum at the “correct” set of parameters, β = α. Search methods that vary β to find the best fit to the parameter values make use of this property in one way or another. But the ambiguity function will usually have secondary maxima as a function of β with fixed α. If these secondaries are only slightly smaller than the primary maximum, then noise can lead to confusion: it can, at random, sometimes elevate a secondary and suppress the primary, leading to false measurements of the parameters. Search methods need to be designed carefully to avoid this as much as possible. One way would be to fit the known properties of the ambiguity function to an ensemble of maxima. This would effectively average over the noise on individual peaks and point more reliably to the correct one.

It is important to note that in the definition of the ambiguity function there is no need for the functional forms of the template and signal to be the same; the definition holds true for any signal-template pair of waveforms. Moreover, the number of template parameters need not be identical (and usually is not) to the number of parameters characterizing the signal. For instance, a binary can be characterized by a large number of parameters, such as the masses, spins, eccentricity of the orbit, etc., while we may take as a model waveform one involving only the masses. In the context of inspiral waves, e(t; β) is the exact general relativistic waveform emitted by a binary, whose form we do not know, while the template family is a post-Newtonian, or some other, approximation to it, that will be used to detect the true waveform. Another example would be signals emitted by spinning neutron stars, isolated or in binaries, whose time evolution is unknown, either because we cannot anticipate all the physical effects that affect their spin, or because the parameter space is so large that we cannot possibly take all of them into account in a realistic search.

Of course, in such cases we cannot compute the ambiguity function, since one of the arguments to the ambiguity function is unknown. These are, indeed, issues where substantial work is called for. What are all the physical effects to be considered so as not to miss out a waveform from our search? How to make a choice of templates when the functional form of templates is different from those of signals? For this review it suffices to assume that the signal and template waveforms are of identical shape and the number of parameters in the two cases is the same.

5.3.2 Metric on the space of waveforms

The computational cost of a search and the estimation of parameters of a signal afford a lucid geometrical picture developed by Balasubramanian et al. [67] and Owen [278]. Much of the discussion below is borrowed from their work.

Let x_k, k = 1, 2, ..., N, denote the discretely sampled output of a detector. The set of all possible detector outputs satisfies the usual axioms of a vector space. Therefore, x_k can be thought of as an N-dimensional vector. It is more convenient to work in the continuum limit, in which case we have infinite-dimensional vectors and the corresponding vector space. However, all the results are applicable to the realistic case in which detector outputs are treated as finite-dimensional vectors.

Amongst all vectors, of particular interest are those corresponding to gravitational waves from a given astronomical source. While every signal can be thought of as a vector in the infinite-dimensional vector space of the detector outputs, the set of all such signal vectors does not, by itself, form a vector space. However, the set of all normed signal vectors (i.e., signal vectors of unit norm) forms a manifold, the parameters of the signal serving as a coordinate system [66, 67, 278, 280]. Thus, each class of astronomical source forms an n-dimensional manifold 𝒮_n, where n is the number of independent parameters characterizing the source. For instance, the set of all signals from a binary on a quasi-circular orbit inclined to the line of sight at an angle ι, consisting of nonspinning black holes of masses m₁ and m₂, located a distance D from the Earth initially in the direction (𝜃, φ) and expected to merge at a time t_C with the phase of the signal at merger φ_C, forms a nine-dimensional manifold with coordinates {D, 𝜃, φ, m₁, m₂, t_C, φ_C, ι, ψ}, where ψ is the polarization angle of the signal. In the general case of a signal characterized by n parameters we shall denote the parameters by p^α, where α = 1, ..., n.

The manifold 𝒮_n can be endowed with a metric g_αβ that is induced by the scalar product defined in Equation (79). The components of the metric in a coordinate system p^α are defined by

g_αβ ≡ ⟨∂_α ĥ, ∂_β ĥ⟩,   ∂_α ĥ ≡ ∂ĥ/∂p^α.   (92)
The metric can then be used on the signal manifold as a measure of the proper distance dℓ between nearby signals with coordinates p^α and p^α + dp^α, that is, signals ĥ(p^α) and ĥ(p^α + dp^α),

dℓ² = g_αβ dp^α dp^β.   (93)

Now, by Taylor expanding ĥ(p^α + dp^α) around p^α, and keeping only terms to second order in dp^α, it is straightforward to see that the overlap 𝒪 of two infinitesimally close signals can be computed using the metric:

𝒪(dp^α; p^α) ≡ ⟨ĥ(p^α), ĥ(p^α + dp^α)⟩ = 1 − (1/2) g_αβ dp^α dp^β.   (94)

The metric on the signal manifold is nothing but the well-known Fisher information matrix, usually denoted Γ_αβ (see, e.g., [188, 284]), but scaled down by the square of the SNR, i.e., g_αβ = ρ⁻² Γ_αβ. The information matrix is itself the inverse of the covariance matrix C_αβ and is a very useful quantity in signal analysis.

5.3.3 Covariance matrix

Having defined the metric, we next consider the application of the geometric formalism in the estimation of statistical errors involved in the measurement of the parameters. We closely follow the notation of Finn and Chernoff [159, 161, 115].

Let us suppose a signal of known shape with parameters p^α is buried in background noise that is Gaussian and stationary. Since the signal shape is known, one can use matched filtering to dig the signal out of the noise. The measured parameters p̄^α will, in general, differ from the true parameters of the signal. Geometrically speaking, the noise vector displaces the signal vector and the process of matched filtering projects the (noise + signal) vector back onto the signal manifold. Thus, any nonzero noise will make it impossible to measure the true parameters of the signal. The best one can hope for is a proper statistical estimation of the influence of noise.

The posterior probability density function 𝒫 of the parameters p̄^α is given by a multivariate Gaussian distribution:

𝒫(Δp^α) dⁿΔp = dⁿΔp / [(2π)^{n/2} √C] exp[−(1/2) C⁻¹_αβ Δp^α Δp^β],   (95)
where n is the number of parameters, Δp^α = p^α − p̄^α, and C_αβ is the covariance matrix, C being its determinant. Noting that C⁻¹_αβ = ρ² g_αβ, we can rewrite the above distribution as
𝒫(Δp^α) dⁿΔp = [ρⁿ √g dⁿΔp / (2π)^{n/2}] exp[−(ρ²/2) g_αβ Δp^α Δp^β],   (96)
where we have used the fact that C = 1/(ρ^{2n} g), g being the determinant of the metric g_αβ. Note that if we define new parameters p′^α = ρ p^α, then we have exactly the same distribution function for all SNRs, except that the deviations Δp^α are scaled by ρ.

Let us first specialize to one dimension to illustrate the region of the parameter space with which one should associate an event at a given confidence level. In one dimension the distribution of the deviation from the mean of the measured value of the parameter p is given by

𝒫(Δp) dΔp = [dΔp / (√(2π) σ)] exp(−Δp²/(2σ²)) = [ρ √g_pp dΔp / √(2π)] exp(−(ρ²/2) g_pp Δp²),   (97)
where, analogous to the n-dimensional case, we have used σ² = 1/(ρ² g_pp). Now, at a given SNR, what is the volume V_P in the parameter space, such that the probability of finding the measured parameter p̄ inside this volume is P? This volume is defined by

P = ∫_{Δp ∈ V_P} 𝒫(Δp) dΔp.   (98)
Although VP is not unique, it is customary to choose it to be centered around Δp = 0:
P = ∫_{(Δp/σ)² ≤ r²(P)} [dΔp / (√(2π) σ)] exp(−Δp²/(2σ²)) = ∫_{ρ² g_pp Δp² ≤ r²(P)} [ρ √g_pp dΔp / √(2π)] exp(−(ρ²/2) g_pp Δp²),   (99)
where, given P, the above equation can be used to solve for r(P) and it determines the range of integration: −rσ ≤ Δp ≤ rσ. For instance, the volumes V_P corresponding to P ≃ 0.683, 0.954, 0.997, ..., are the familiar intervals [−σ, σ], [−2σ, 2σ], [−3σ, 3σ], ..., and the corresponding values of r are 1, 2, 3. Since σ = 1/√(ρ² g_pp), we see that in terms of g_pp the above intervals translate to
(1/ρ) [−1/√g_pp, 1/√g_pp],   (1/ρ) [−2/√g_pp, 2/√g_pp],   (1/ρ) [−3/√g_pp, 3/√g_pp],   ....   (100)
Thus, for a given probability P, the volume V_P shrinks as 1/ρ. The maximum distance d_max within which we can expect to find “triggers” at a given P depends inversely on the SNR ρ: dℓ = √(g_pp Δp²) = r/ρ. Therefore, for P ≃ 0.954, r = 2 and at an SNR of 5 the maximum distance is 0.4, which corresponds to a match of 𝜖 = 1 − dℓ²/2 = 0.92. In other words, in one dimension 95% of the time we expect our triggers to come from templates that have an overlap greater than or equal to 0.92 with the buried signal when the SNR is five. This interpretation in terms of the match is a good approximation as long as dℓ ≪ 1, which will be true for large SNR events. However, for weaker signals and/or greater values of P we cannot interpret the results in terms of the match, although Equation (98) can be used to determine r(P). As an example, at P ≃ 0.997, r = 3 and at an SNR of ρ = 4, the maximum distance is dℓ = 0.75 and the match is 𝜖 = 23/32 ≃ 0.72, which is significantly smaller than one; the quadratic approximation is not good enough to compute the match.
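The one-dimensional numbers quoted above are easy to reproduce. The Python sketch below uses only the standard library; the bisection solver is simply an illustrative way to invert P = erf(r/√2) for r, after which the quadratic-approximation match is 𝜖 = 1 − dℓ²/2 with dℓ = r/ρ:

```python
import math

def r_of_P(P, tol=1e-12):
    """Solve P = erf(r / sqrt(2)) for r by bisection (1D Gaussian)."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid / math.sqrt(2.0)) < P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def match(P, rho):
    """Quadratic-approximation match eps = 1 - dl^2/2 with dl = r(P)/rho."""
    dl = r_of_P(P) / rho
    return 1.0 - 0.5 * dl**2

# P ~ 0.954 gives r ~ 2; at SNR rho = 5 the maximum distance is
# dl = 2/5 = 0.4 and the match is 1 - 0.08 = 0.92, as in the text.
r_two_sigma = r_of_P(0.9545)
eps = match(0.9545, 5.0)
```

The same routine evaluated at P ≃ 0.683 and 0.997 recovers r ≃ 1 and 3, the familiar one-, two- and three-sigma intervals.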

These results generalize to n dimensions. In n dimensions the volume V_P is defined by

P = ∫_{Δp^α ∈ V_P} 𝒫(Δp^α) dⁿΔp.   (101)
Again, V_P is not unique but it is customary to center the volume around the point Δp^α = 0:

P = ∫_{ρ² g_αβ Δp^α Δp^β ≤ r²(P,n)} [ρⁿ √g dⁿΔp / (2π)^{n/2}] exp[−(ρ²/2) g_αβ Δp^α Δp^β].   (102)
Given P and the parameter space dimension n, one can iteratively solve the above equation for r(P, n). The boundary of the volume V_P is the surface defined by the equation

g_αβ Δp^α Δp^β = (r/ρ)².   (103)
This is the equation of an n-dimensional ellipsoid whose size is defined by r/ρ. For a given r (which determines the confidence level), the size of the ellipsoid is inversely proportional to the SNR, the volume decreasing as 1/ρⁿ. However, the size is not small enough for all combinations of P and ρ to interpret the distance from the center of the ellipsoid to its surface in terms of the overlap or match of the signals at the two locations, except when the distance is close to zero. This is because the expression for the match in terms of the metric is based on the quadratic approximation, which breaks down when the matches are small. However, the region defined by Equation (103) always corresponds to the probability P and there is no approximation here (except that the detector noise is Gaussian).

When the SNR ρ is large and 1 − P is not close to zero, the triggers are found from the signal with matches greater than or equal to 1 − r²(P, n)/(2ρ²). Table 2 lists the value of r for several values of P in one, two and three dimensions and the minimum match 𝜖_MM for SNRs 5, 10 and 20.

Table 2: The value of the (squared) distance dℓ² = r²/ρ² for several values of P and the corresponding smallest match that can be expected between templates and the signal at different values of the SNR.

            P = 0.683          P = 0.954          P = 0.997
  ρ       dℓ²      𝜖_MM      dℓ²      𝜖_MM      dℓ²      𝜖_MM
n = 1
  5       0.04     0.9899    0.16     0.9592    0.36     0.9055
  10      0.01     0.9975    0.04     0.9899    0.09     0.9772
  20      0.0025   0.9994    0.01     0.9975    0.0225   0.9944
n = 2
  5       0.092    0.9767    0.2470   0.9362    0.4800   0.8718
  10      0.023    0.9942    0.0618   0.9844    0.1200   0.9695
  20      0.00575  0.9986    0.0154   0.9961    0.0300   0.9925
n = 3
  5       0.1412   0.9641    0.32     0.9165    0.568    0.8462
  10      0.0353   0.9911    0.08     0.9798    0.142    0.9638
  20      0.00883  0.9978    0.02     0.9950    0.0355   0.9911
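The dℓ² entries of Table 2 follow from inverting the chi-square cumulative distribution with n degrees of freedom, since, in suitable coordinates, ρ² g_αβ Δp^α Δp^β is a sum of squares of n independent unit Gaussians. The Python sketch below uses only the standard library; closed-form CDFs are coded for n ≤ 3 to avoid external dependencies, and the bisection solver is an illustrative way of solving Equation (102) for r²(P, n):

```python
import math

def chi2_cdf(r2, n):
    """CDF of the chi-square distribution with n = 1, 2, 3 degrees of
    freedom, i.e. Prob(rho^2 g_ab dp^a dp^b <= r^2) for Gaussian noise."""
    r = math.sqrt(r2)
    if n == 1:
        return math.erf(r / math.sqrt(2.0))
    if n == 2:
        return 1.0 - math.exp(-r2 / 2.0)
    if n == 3:
        return (math.erf(r / math.sqrt(2.0))
                - math.sqrt(2.0 / math.pi) * r * math.exp(-r2 / 2.0))
    raise ValueError("closed forms coded for n = 1, 2, 3 only")

def r2_of_P(P, n, tol=1e-12):
    """Invert P = chi2_cdf(r^2, n) by bisection, giving r^2(P, n)."""
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Reproduce sample dl^2 = r^2/rho^2 entries of Table 2 at SNR rho = 5.
rho = 5.0
dl2_1d = r2_of_P(0.9545, 1) / rho**2   # ~ 0.16  (r = 2)
dl2_2d = r2_of_P(0.683, 2) / rho**2    # ~ 0.092
```

For n > 3 one would invert the regularized incomplete gamma function instead; the closed forms above suffice for the dimensions tabulated here.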

Table 2 should be interpreted in light of the fact that triggers come from an analysis pipeline in which the templates are laid out with a certain minimal match and one cannot, therefore, expect the triggers from different detectors to be matched better than the minimal match.

From Table 2, we see that, when the SNR is large (say greater than about 10), the dependence of the match 𝜖MM on n is very weak; in other words, irrespective of the number of dimensions, we expect the match between the trigger and the true signal (and for our purposes the match between triggers from different instruments) to be pretty close to 1, and mostly larger than a minimal match of about 0.95 that is typically used in a search. Even when the SNR is in the region of 5, for low P again there is a weak dependence of 𝜖MM on the number of parameters. For large P and low SNR, however, the dependence of 𝜖MM on the number of dimensions becomes important. At an SNR of 5 and P ≃ 0.997, 𝜖MM = 0.91,0.87,0.85 for n = 1,2,3 dimensions, respectively.

Bounds on the estimation errors computed using the covariance matrix are called Cramér–Rao bounds. Cramér–Rao bounds are based on local analysis and do not take into consideration the effect of distant points in the parameter space, such as secondary maxima in the likelihood, on the errors computed at a given point. Though the Cramér–Rao bounds are in disagreement with maximum likelihood estimates, a global analysis, which takes into account the effect of distant points on the estimation of parameters, does indeed give results in agreement with maximum likelihood estimation, as shown by Balasubramanian and Dhurandhar [65].

5.3.4 Bayesian inference

A good example of an efficient detection algorithm that is not a reliable estimator is the time-frequency transform of a chirp. For signals that are loud enough, a time-frequency transform of the data would be a very effective way of detecting the signal, but the transform contains hardly any information about the masses, spins and other information about the source. This is because the time-frequency transform of a chirp is a mapping from the multi-dimensional (17 in the most general case) space of chirps to just the two-dimensional space of time and frequency. Even matched filtering, which would use templates that are defined on the full parameter space of the signal, would not give the parameters at the expected accuracy. This is because the templates are defined only at a certain minimal match and might not resolve the signal well enough, especially for signals that have a high SNR.

In recent times Bayesian inference techniques have been applied with success in many areas in astronomy and cosmology. These techniques are probably the most sensible way of estimating the parameters, and the associated errors, but cannot be used to efficiently search for signals. Bayesian inference is among the simplest of statistical measures to state, but is not easy to compute and is often subject to controversies. Here we shall only discuss the basic tenets of the method and refer the reader for details to an excellent treatise on the subject (see, e.g., Sivia [343]).

To understand the chief ideas behind Bayesian inference, let us begin with some basic concepts in probability theory. Given two hypotheses or statements A and B about an observation, let P(A, B) denote the joint probability of A and B being true. For the sake of clarity, let A denote a statement about the universe and B some observation that has been made. Now, the joint probability can be expressed in terms of the individual probability densities P(A) and P(B) and the conditional probability densities P(A|B) and P(B|A) as follows:

P(A, B) = P(A) P(B|A)   or   P(A, B) = P(B) P(A|B).   (104)
The first of these equations says that the joint probability of A and B both being true is the probability that A is true times the probability that B is true given that A is true, and similarly for the second. We can use the above equations to arrive at Bayes theorem:

P(B) P(A|B) = P(A) P(B|A)   or   P(A|B) = P(A) P(B|A) / P(B).   (105)
The left-hand side of the above equation can be interpreted as a statement about A (the universe) given B (the data). This is the posterior probability density. The right-hand side contains P(B|A), which is the probability that B is obtained given that A is true and is called the likelihood; P(A), which is the probability of A, called the prior probability of A; and P(B) (the prior of B), which is simply a normalizing constant often ignored in Bayesian analysis.

For instance, if A denotes the statement it is going to rain and B the amount of humidity in the air, then the above equation gives us the posterior probability that it rains when the air contains a certain amount of humidity. Clearly, the posterior depends on the likelihood of the air having a certain humidity when it rains and on the prior probability of rain on a given day. If the prior is very small (as it would be in a desert, for example) then you would need a rather large likelihood for the posterior to be large. Even when the prior is not so small, say a 50% chance of rain on any given day (as it would be if you are in Wales), the likelihood has to be large for the posterior probability to say something about the relationship between the level of humidity and the chance of rain.

As another example, and one more relevant to the subject of this review, let s be the statement the data contains a chirp (signal), n the statement the data contains an instrumental transient (noise), and let t be a test that is performed to infer which of the two statements above is true. Let us suppose t is a very good test, in that it discriminates between s and n very well: say the detection probability is as high as P(t|s) = 0.95 with a low false alarm rate P(t|n) = 0.05 (note that P(t|s) and P(t|n) need not necessarily add up to 1). Also, the expected event rate of a chirp during our observation is low, P(s) = 10⁻⁵, but the chance of an instrumental transient is relatively large, P(n) = 0.01. We are interested in knowing what the posterior probability of the data containing a chirp is, given that the test has been passed. By Bayes theorem this is

P(s|t) = P(t|s) P(s) / P(t) = P(t|s) P(s) / [P(t|s) P(s) + P(t|n) P(n)],   (106)
where P(t) (the probability of the test being positive) is taken to result from either the chirp or the instrumental transient. Substituting for various quantities in the above equation we find
P(s|t) = (0.95 × 10⁻⁵) / (0.95 × 10⁻⁵ + 0.05 × 0.01) ≃ 0.02.   (107)
There is only a 2% chance that the data really contains a chirp when the test is positive. On the contrary, for the same data we find that the chance of an instrumental transient given a positive test result is P(n|t) ∼ 98%. Thus, though there is a high (low) probability for the test to be positive in the presence of a signal (noise), when the test is indeed positive we cannot necessarily conclude that a signal is present. This is not surprising, since the prior probability of the signal being present is very low. The situation can be remedied by designing a test that gives a far lower probability of a positive result in the case of an instrumental transient (i.e., a very low false alarm rate).
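The arithmetic of Equations (106) and (107) can be checked in a few lines of Python, which also verify the equivalent likelihood-ratio form of Equation (108) using the same numbers from the text:

```python
# Numbers from the chirp-vs-transient example in the text.
p_t_given_s = 0.95    # detection probability P(t|s)
p_t_given_n = 0.05    # false alarm probability P(t|n)
p_s = 1e-5            # prior probability of a chirp, P(s)
p_n = 0.01            # prior probability of an instrumental transient, P(n)

# Equation (106): the test is positive due to either a chirp or a transient.
p_t = p_t_given_s * p_s + p_t_given_n * p_n
p_s_given_t = p_t_given_s * p_s / p_t      # ~ 0.019, i.e. about 2%
p_n_given_t = p_t_given_n * p_n / p_t      # ~ 0.98

# Equivalent likelihood-ratio form, Equation (108).
L_NS = p_t_given_n / p_t_given_s           # ratio of the two likelihoods
p_SN = p_s / p_n                           # ratio of the priors
p_s_given_t_alt = 1.0 / (1.0 + L_NS / p_SN)
```

Since s and n are the only explanations of a positive test in this toy model, the two posteriors sum to one, and the two expressions for P(s|t) agree exactly.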

Thus, Bayesian inference neatly folds the prior knowledge about sources into the estimation process. One might worry that the outcome of a measurement process would be seriously biased by our preconception of the prior. To understand this better, let us rewrite Equation (106) as follows:

P(s|t) = 1 / [1 + P(t|n) P(n) / (P(t|s) P(s))] = 1 / (1 + L_NS / p_SN),   (108)
where L_NS = P(t|n)/P(t|s) is the ratio of the two likelihoods and p_SN = P(s)/P(n) is the ratio of the priors. The latter is not in the hands of a data analyst; it is determined by the nature of the source being searched for and the properties of the instrument. The only way an analyst can make the posterior probability large is by choosing a test that gives a small value for the ratio of the two likelihoods. When L_NS ≪ p_SN (i.e., when the likelihood of the test being positive when the signal is present is far larger, depending on the priors, than when the transient is present) the posterior will be close to unity.

The above example tells us why we have to work with unusually small false-alarm probabilities in the case of gravitational wave searches. For instance, to search for binary coalescences in ground-based detectors we use a (power) SNR threshold of about 30 to 50. This is because the expected event rate is about 0.04 per year.

Computing the posterior involves multi-dimensional integrals, which are computationally highly expensive when the number of parameters involved is large. This is why it is often not possible to apply Bayesian techniques to continuously streaming data; it would be sensible to reserve the application of Bayesian inference only for candidate events that are selected from inexpensive analysis techniques. Thus, although Bayesian analysis is not used in current detection pipelines, there has been a lot of effort in evaluating its ability to search for [116, 351, 123, 121] and measure the parameters of [117, 122, 377] a signal and in follow-up analysis [378].
