The first thing to realize is that we can never be absolutely certain that a signal is present in a data train [159, 161]; we can only give confidence levels about its presence, which could be close to 100% for high values of the SNR. The next thing to realize is that, whatever the SNR may be, we cannot be absolutely certain about the true parameters of the signal: at best we can make an estimate, and these estimates come with a certain range. The width of the range depends on the confidence level required, being larger for higher confidence levels.
Maximum likelihood estimates have long been used to measure the parameters of a known signal buried in noisy data. The method consists in maximizing the likelihood ratio – the ratio of the probability that a given signal is present in the data to the probability that the signal is absent [188, 159]. Maximum likelihood estimates are not always minimum uncertainty estimates, as has been demonstrated in particular for binary inspiral signals by Balasubramanian, et al. [66, 67]. However, until recently, this has been the method most widely followed in the gravitational wave literature. What is important to note is that maximum likelihood estimates are unbiased when the SNR is large, and the mean of the distribution of measured values of the parameters will be centered around the true parameter values. This is an important quality that will be useful in our discussion below.
Bayesian estimates, which take into account any prior knowledge that may be available about the distribution of the source parameters, often give much better estimates and do not rely on the availability of an ensemble of detector outputs [343, 274]. However, they are computationally a lot more expensive than maximum likelihood estimates.
In any one measurement, the estimated parameters, however efficient, robust and accurate the estimator, are unlikely to be the actual parameters of the signal, since, at any finite SNR, noise alters the input signal. In the geometric language, the signal vector is altered by the noise vector, and our matched filtering aims at computing the projection of this altered vector onto the signal space. The true parameters are expected to lie within an ellipsoid in the parameter space at a certain confidence level – the volume of the ellipsoid increasing with the confidence level at a given SNR, but decreasing with the SNR at a given confidence level.
The ambiguity function, well known in the statistical theory of signal detection, is a very powerful tool in signal analysis: it helps one to assess the number of templates required to span the parameter space of the signal, to make estimates of variances and covariances involved in the measurement of various parameters, to compute biases introduced in using a family of templates whose shape is not the same as that of the family of signals intended to be detected, etc. We will see below how the ambiguity function can be used to compute the required number of templates. Towards the end of this section we will use the ambiguity function for the estimation of parameters.
The ambiguity function is defined (see Equation (91) below) as the scalar product of two normalized waveforms maximized over the initial phase of the waveform, in other words, the absolute value of the scalar product. A waveform $h$ is said to be normalized if $\langle h, h \rangle = 1$, where the inner product is inversely weighted by the PSD, as in Equation (79). Among other things, normalized waveforms help in defining signal strengths: a signal $s$ is said to be of strength $h_0$ if $s = h_0 h$. Note that the optimal SNR for such a signal of strength $h_0$ is simply $\rho = h_0$.
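As a concrete, purely illustrative sketch of these conventions – a PSD-weighted inner product, waveform normalization, and signal strength – the following uses a made-up noise curve and a toy narrow-band waveform (none of it taken from the text):

```python
import numpy as np

def inner(a_f, b_f, psd, df):
    """Discretized noise-weighted inner product 4 Re sum a b* / S_n df."""
    return 4.0 * np.real(np.sum(a_f * np.conj(b_f) / psd)) * df

def normalize(h_f, psd, df):
    """Scale a frequency-domain waveform so that <h, h> = 1."""
    return h_f / np.sqrt(inner(h_f, h_f, psd, df))

# Toy setup: a hypothetical detector noise curve and a narrow-band waveform.
f = np.linspace(20.0, 1024.0, 4096)
df = f[1] - f[0]
psd = 1e-46 * (1.0 + (100.0 / f) ** 4)          # made-up PSD, not a real model
h = np.exp(-0.5 * ((f - 150.0) / 20.0) ** 2) * np.exp(2j * np.pi * f * 0.01)

h_hat = normalize(h, psd, df)                    # <h_hat, h_hat> = 1

# A signal of strength h0 is s = h0 * h_hat; filtering s with h_hat
# returns the optimal SNR, which is just h0.
h0 = 12.0
s = h0 * h_hat
snr = inner(s, h_hat, psd, df)
```

The point of the sketch is only that, once waveforms are normalized with respect to the noise-weighted inner product, the strength and the optimal SNR coincide.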
Let $h(t; \lambda)$, where $\lambda = \{\lambda^\alpha\}$ is the parameter vector, denote a normalized waveform. It is conventional to choose one of the parameters to be the lag $t_0$, which simply corresponds to a coordinate time when an event occurs and is therefore called an extrinsic parameter, while the rest of the parameters are called the intrinsic parameters and characterize the gravitational wave source.
Given two normalized waveforms $h(t; \lambda)$ and $h(t; \mu)$, whose parameter vectors are not necessarily the same, the ambiguity $\mathcal{A}$ is defined as the absolute value of their scalar product, $\mathcal{A}(\lambda, \mu) \equiv |\langle h(\lambda), h(\mu) \rangle|$. The region of the parameter space within which a template's ambiguity with a signal exceeds a chosen threshold (the minimal match) is the span of that template. Template families should be chosen so that altogether they span the entire signal parameter space of interest with the least overlap of one another's spans. One can equally well interpret the ambiguity function as the SNR obtained for a given signal by filters of different parameter values.
It is clear that the ambiguity function has a local maximum at the "correct" set of parameters, $\mu = \lambda$. Search methods that vary $\mu$ to find the best fit to the parameter values make use of this property in one way or another. But the ambiguity function will usually have secondary maxima as a function of $\mu$ for fixed $\lambda$. If these secondaries are only slightly smaller than the primary maximum, then noise can lead to confusion: it can, at random, sometimes elevate a secondary and suppress the primary. This can lead to false measurements of the parameters. Search methods need to be designed carefully to avoid this as much as possible. One way would be to fit the known properties of the ambiguity function to an ensemble of maxima. This would effectively average over the noise on individual peaks and point more reliably to the correct one.
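A toy illustration of such secondary maxima, assuming white noise (so the inner product is a plain time-domain overlap) and a simple finite-duration sinusoid rather than any waveform family from the text:

```python
import numpy as np

T, n = 1.0, 4096
t = np.linspace(0.0, T, n, endpoint=False)

def ambiguity(f_signal, f_template):
    """Overlap of two unit sinusoids, maximized over the initial phase
    by projecting onto both template quadratures."""
    s = np.cos(2.0 * np.pi * f_signal * t)
    s /= np.linalg.norm(s)
    c = np.cos(2.0 * np.pi * f_template * t)
    q = np.sin(2.0 * np.pi * f_template * t)
    c /= np.linalg.norm(c)
    q /= np.linalg.norm(q)
    return np.hypot(np.dot(s, c), np.dot(s, q))

f0 = 100.0
freqs = np.linspace(90.0, 110.0, 2001)
amb = np.array([ambiguity(f0, fr) for fr in freqs])

# The primary maximum sits at the true frequency, but the ambiguity does
# not decay monotonically away from it: side lobes (secondary maxima)
# appear, which noise can at random elevate above the primary.
assert abs(freqs[np.argmax(amb)] - f0) < 0.02
assert ambiguity(f0, f0 + 1.5) > ambiguity(f0, f0 + 1.0)
```

Here the side lobe at a frequency offset of 1.5 cycles is larger than the near-null at an offset of 1.0 cycle, the sinc-like structure responsible for the confusion described above.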
It is important to note that in the definition of the ambiguity function there is no need for the functional forms of the template and signal to be the same; the definition holds true for any signal-template pair of waveforms. Moreover, the number of template parameters need not be identical (and usually is not) to the number of parameters characterizing the signal. For instance, a binary can be characterized by a large number of parameters, such as the masses, spins, eccentricity of the orbit, etc., while we may take as a model waveform one involving only the masses. In the context of inspiral waves, the signal is the exact general relativistic waveform emitted by a binary, whose form we do not know, while the template family is a post-Newtonian, or some other, approximation to it, which will be used to detect the true waveform. Another example would be signals emitted by spinning neutron stars, isolated or in binaries, whose time evolution is unknown, either because we cannot anticipate all the physical effects that affect their spin, or because the parameter space is so large that we cannot possibly take all of them into account in a realistic search.
Of course, in such cases we cannot compute the ambiguity function, since one of its arguments is unknown. These are, indeed, issues where substantial work is called for. What are all the physical effects to be considered so as not to miss a waveform in our search? How does one choose templates when their functional form differs from that of the signals? For this review it suffices to assume that the signal and template waveforms are of identical shape and the number of parameters in the two cases is the same.
The computational cost of a search and the estimation of parameters of a signal afford a lucid geometrical picture developed by Balasubramanian et al. and Owen. Much of the discussion below is borrowed from their work.
Let $x^k$, $k = 0, \ldots, N-1$, denote the discretely sampled output of a detector. The set of all possible detector outputs satisfies the usual axioms of a vector space. Therefore, $x^k$ can be thought of as an $N$-dimensional vector. It is more convenient to work in the continuum limit, in which case we have infinite dimensional vectors and the corresponding vector space. However, all the results are applicable to the realistic case in which detector outputs are treated as finite dimensional vectors.
Amongst all vectors, of particular interest are those corresponding to gravitational waves from a given astronomical source. While every signal can be thought of as a vector in the infinite-dimensional vector space of the detector outputs, the set of all such signal vectors do not, by themselves, form a vector space. However, the set of all normed signal vectors (i.e., signal vectors of unit norm) form a manifold, the parameters of the signal serving as a coordinate system [66, 67, 278, 280]. Thus, each class of astronomical source forms an $n$-dimensional manifold $\mathcal{S}_n$, where $n$ is the number of independent parameters characterizing the source. For instance, the set of all signals from a binary on a quasi-circular orbit inclined to the line of sight at an angle $\iota$, consisting of nonspinning black holes of masses $m_1$ and $m_2$, located a distance $D$ from the Earth initially in the direction $(\theta, \varphi)$ and expected to merge at a time $t_{\rm C}$ with the phase of the signal at merger $\varphi_{\rm C}$, forms a nine-dimensional manifold with coordinates $\{\theta, \varphi, \psi, \iota, D, m_1, m_2, t_{\rm C}, \varphi_{\rm C}\}$, where $\psi$ is the polarization angle of the signal. In the general case of a signal characterized by $n$ parameters we shall denote the parameters by $p^\alpha$, where $\alpha = 1, \ldots, n$.
The manifold $\mathcal{S}_n$ can be endowed with a metric $g_{\alpha\beta}$ that is induced by the scalar product defined in Equation (79). The components of the metric in a coordinate system $p^\alpha$ are defined by
$$g_{\alpha\beta} \equiv \left\langle \frac{\partial h}{\partial p^\alpha}, \frac{\partial h}{\partial p^\beta} \right\rangle.$$
Now, by Taylor expanding $h(p + \mathrm{d}p)$ around $p$, and keeping only terms to second order in $\mathrm{d}p$, it is straightforward to see that the overlap of two infinitesimally close signals can be computed using the metric:
$$\langle h(p + \mathrm{d}p),\, h(p) \rangle = 1 - \frac{1}{2}\, g_{\alpha\beta}\, \mathrm{d}p^\alpha\, \mathrm{d}p^\beta.$$
The metric on the signal manifold is nothing but the well-known Fisher information matrix, usually denoted $\Gamma_{\alpha\beta}$ (see, e.g., [188, 284]), but scaled down by the square of the SNR, i.e., $g_{\alpha\beta} = \Gamma_{\alpha\beta}/\rho^2$. The information matrix is itself the inverse of the covariance matrix and is a very useful quantity in signal analysis.
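The relation between the metric and the overlap of nearby normalized waveforms can be checked numerically. The sketch below assumes white noise (a flat PSD, so the inner product is an ordinary dot product) and a toy one-parameter sinusoidal waveform; it is an illustration, not a computation from the text:

```python
import numpy as np

n = 16384
t = np.linspace(0.0, 1.0, n, endpoint=False)

def h_hat(freq):
    """Unit-norm sinusoid; frequency is the single intrinsic parameter."""
    w = np.cos(2.0 * np.pi * freq * t)
    return w / np.linalg.norm(w)

# Metric component g_ff = <dh/df, dh/df>, via central finite differences.
f0, eps = 100.25, 1e-4
dh = (h_hat(f0 + eps) - h_hat(f0 - eps)) / (2.0 * eps)
g_ff = np.dot(dh, dh)

# Overlap of two nearby waveforms vs. the quadratic (metric) approximation
# 1 - (1/2) g_ff (df)^2.
df = 0.05
overlap = np.dot(h_hat(f0 + df), h_hat(f0))
assert abs(overlap - (1.0 - 0.5 * g_ff * df ** 2)) < 1e-3
```

For this waveform the metric component comes out close to $(2\pi)^2/3 \approx 13$, and the quadratic approximation reproduces the overlap to a few parts in $10^4$ at this separation.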
Having defined the metric, we next consider the application of the geometric formalism in the estimation of statistical errors involved in the measurement of the parameters. We closely follow the notation of Finn and Chernoff [159, 161, 115].
Let us suppose a signal of known shape with parameters $p^\alpha$ is buried in background noise that is Gaussian and stationary. Since the signal shape is known, one can use matched filtering to dig the signal out of the noise. The measured parameters will, in general, differ from the true parameters of the signal. Geometrically speaking, the noise vector displaces the signal vector, and the process of matched filtering projects the (noise + signal) vector back onto the signal manifold. Thus, any nonzero noise will make it impossible to measure the true parameters of the signal. The best one can hope for is a proper statistical estimation of the influence of noise.
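This displacement-and-projection picture can be mimicked in a small Monte Carlo sketch, assuming white Gaussian noise and a toy one-parameter (frequency) template grid; everything here is a hypothetical setup for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2048
t = np.linspace(0.0, 1.0, n, endpoint=False)

def h_hat(freq):
    w = np.sin(2.0 * np.pi * freq * t)
    return w / np.linalg.norm(w)

f0 = 50.0
grid = np.linspace(45.0, 55.0, 201)
bank = np.array([h_hat(fr) for fr in grid])     # precomputed template bank

def spread(rho, trials=100):
    """Scatter of the recovered frequency at a given optimal SNR rho."""
    errs = []
    for _ in range(trials):
        # Noise displaces the signal vector ...
        data = rho * h_hat(f0) + rng.standard_normal(n)
        # ... and matched filtering projects back onto the template grid.
        best = grid[np.argmax(np.abs(bank @ data))]
        errs.append(best - f0)
    return float(np.std(errs))

# The statistical error shrinks as the SNR grows (roughly as 1/rho,
# down to the resolution of the template grid).
assert spread(20.0) < spread(5.0)
```

At low SNR the occasional trial latches onto a secondary maximum of the ambiguity function, inflating the scatter, which is exactly the confusion effect discussed earlier.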
The posterior probability density function of the parameters is given by a multivariate Gaussian distribution:
$$p(\Delta p^\alpha) \propto \exp\left(-\frac{1}{2}\,\Gamma_{\alpha\beta}\, \Delta p^\alpha\, \Delta p^\beta\right),$$
where $\Delta p^\alpha$ denotes the deviation of the measured parameters from their true values. Since $\Gamma_{\alpha\beta} = \rho^2 g_{\alpha\beta}$, one has the same distribution function for all SNRs, except that the deviations are scaled by $1/\rho$.
Let us first specialize to one dimension to illustrate the region of the parameter space with which one should associate an event at a given confidence level. In one dimension the distribution of the deviation $\Delta p$ of the measured value of the parameter from its true value is given by
$$p(\Delta p) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\Delta p^2}{2\sigma^2}\right),$$
where $\sigma$ is the standard deviation implied by the information matrix.
These results generalize to $n$ dimensions. In $n$ dimensions the confidence region is an ellipsoid defined by
$$\rho^2\, g_{\alpha\beta}\, \Delta p^\alpha\, \Delta p^\beta \le r^2(\epsilon, n),$$
where the radius $r(\epsilon, n)$ is fixed by requiring that a $\chi^2$-distributed variable with $n$ degrees of freedom lie below $r^2$ with probability equal to the desired confidence level $\epsilon$.
When the SNR is large and $\epsilon$ is not close to zero, the triggers found from the signal have matches greater than or equal to $1 - r^2(\epsilon, n)/(2\rho^2)$. Table 2 lists the value of $r$ for several values of $\epsilon$ in one, two and three dimensions, and the minimum match $MM$ for SNRs 5, 10 and 20.
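A hedged sketch, assuming the confidence regions are the standard $\chi^2$ ellipsoids and that the worst-case match over the ellipsoid is $1 - r^2/(2\rho^2)$, of how values like those in Table 2 can be generated (stdlib only, using closed-form $\chi^2$ CDFs for one to three degrees of freedom):

```python
import math

def chi2_cdf(x, n):
    """Chi-squared CDF for n = 1, 2, 3 degrees of freedom (closed forms)."""
    if n == 1:
        return math.erf(math.sqrt(x / 2.0))
    if n == 2:
        return 1.0 - math.exp(-x / 2.0)
    if n == 3:
        return (math.erf(math.sqrt(x / 2.0))
                - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("n must be 1, 2 or 3")

def radius(eps, n):
    """Bisection for r such that P(chi2_n <= r^2) = eps."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < eps:
            lo = mid
        else:
            hi = mid
    return math.sqrt(0.5 * (lo + hi))

def min_match(eps, n, rho):
    """Worst-case match inside the eps-confidence ellipsoid at SNR rho."""
    return 1.0 - radius(eps, n) ** 2 / (2.0 * rho ** 2)

mm = min_match(0.95, 3, 10)   # three dimensions, SNR 10  ->  ~0.961
```

At 95% confidence the one-dimensional radius recovers the familiar $r \approx 1.96$, and the worst-case match degrades with more dimensions and lower SNR, in line with the trends discussed around Table 2.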
Table 2 should be interpreted in light of the fact that triggers come from an analysis pipeline in which the templates are laid out with a certain minimal match and one cannot, therefore, expect the triggers from different detectors to be matched better than the minimal match.
From Table 2, we see that, when the SNR is large (say greater than about 10), the dependence of the match on $\epsilon$ is very weak; in other words, irrespective of the number of dimensions, we expect the match between the trigger and the true signal (and, for our purposes, the match between triggers from different instruments) to be quite close to 1, and mostly larger than a minimal match of about 0.95 that is typically used in a search. Even when the SNR is in the region of 5, for low $\epsilon$ there is again only a weak dependence of the match on the number of parameters. For large $\epsilon$ and low SNR, however, the dependence of the match on the number of dimensions becomes important: at an SNR of 5 and high confidence levels, the minimum match degrades noticeably as the number of dimensions increases.
Bounds on the estimation errors computed using the covariance matrix are called Cramér–Rao bounds. Cramér–Rao bounds are based on local analysis and do not take into consideration the effect of distant points in the parameter space on the errors computed at a given point, such as the secondary maxima in the likelihood. Though the Cramér–Rao bounds are in disagreement with maximum likelihood estimates, a global analysis, taking into account the effect of distant points on the estimation of parameters, does indeed give results in agreement with maximum likelihood estimation, as shown by Balasubramanian and Dhurandhar.
A good example of an efficient detection algorithm that is not a reliable estimator is the time-frequency transform of a chirp. For signals that are loud enough, a time-frequency transform of the data would be a very effective way of detecting the signal, but the transform contains hardly any information about the masses, spins and other properties of the source. This is because the time-frequency transform of a chirp is a mapping from the multi-dimensional (17 in the most general case) space of chirps to just the two-dimensional space of time and frequency. Even matched filtering, which uses templates defined on the full parameter space of the signal, would not give the parameters at the expected accuracy. This is because the templates are laid out only at a certain minimal match and might not resolve the signal well enough, especially for signals that have a high SNR.
In recent times, Bayesian inference techniques have been applied with success in many areas in astronomy and cosmology. These techniques are probably the most sensible way of estimating the parameters, and the associated errors, but cannot be used to efficiently search for signals. Bayesian inference is among the simplest of statistical measures to state, but is not easy to compute and is often subject to controversy. Here we shall only discuss the basic tenets of the method and refer the reader for details to an excellent treatise on the subject (see, e.g., Sivia).
To understand the chief ideas behind Bayesian inference, let us begin with some basic concepts in probability theory. Given two hypotheses or statements $X$ and $Y$ about an observation, let $P(X, Y)$ denote the joint probability of $X$ and $Y$ being true. For the sake of clarity, let $X$ denote a statement about the universe and $Y$ some observation that has been made. Now, the joint probability can be expressed in terms of the individual probabilities $P(X)$ and $P(Y)$ and the conditional probabilities $P(Y|X)$ and $P(X|Y)$ as follows:
$$P(X, Y) = P(X)\, P(Y|X) = P(Y)\, P(X|Y).$$
The first equality says that the joint probability of $X$ and $Y$ both being true is the probability that $X$ is true times the probability that $Y$ is true given that $X$ is true, and similarly the second. We can use the above equations to arrive at Bayes theorem:
$$P(X|Y) = \frac{P(Y|X)\, P(X)}{P(Y)},$$
where the left-hand side is the posterior probability density. The right-hand side contains $P(Y|X)$, the probability that $Y$ is obtained given that $X$ is true, called the likelihood; $P(X)$, the prior probability of $X$; and $P(Y)$ (the prior of $Y$), which is simply a normalizing constant often ignored in Bayesian analysis.
For instance, if $X$ denotes the statement it is going to rain and $Y$ the amount of humidity in the air, then the above equation gives us the posterior probability that it rains when the air contains a certain amount of humidity. Clearly, the posterior depends on the likelihood of the air having that humidity when it rains and on the prior probability of rain on a given day. If the prior is very small (as it would be in a desert, for example), then you would need a rather large likelihood for the posterior to be large. Even when the prior is not so small, say a 50% chance of rain on any given day (as it would be if you are in Wales), the likelihood has to be large for the posterior probability to say something about the relationship between the level of humidity and the chance of rain.
As another example, and one more relevant to the subject of this review, let $S$ be the statement the data contains a chirp (signal), $N$ the statement the data contains an instrumental transient (noise), and let $T$ be a test that is performed to infer which of the two statements is true. Let us suppose $T$ is a very good test, in that it discriminates between $S$ and $N$ very well, with a high detection probability $P(T|S)$ and a low false alarm rate $P(T|N)$ (note that these two probabilities need not add up to 1). Also, the expected event rate of a chirp during our observation is low, so the prior $P(S)$ is small, while the chance of an instrumental transient is relatively large. We are interested in the posterior probability that the data contains a chirp, given that the test has been passed. By Bayes theorem this is
$$P(S|T) = \frac{P(T|S)\, P(S)}{P(T|S)\, P(S) + P(T|N)\, P(N)}.$$
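A small numerical sketch of this posterior calculation; the numbers below are hypothetical stand-ins (the text's actual figures are not reproduced here), chosen only to show how a rare signal suppresses the posterior even for a good test:

```python
# Hypothetical probabilities, for illustration only.
p_t_given_s = 0.95    # P(T|S): the test fires when a chirp is present
p_t_given_n = 0.05    # P(T|N): the test fires on an instrumental transient
p_s = 1e-5            # P(S): prior probability of a chirp in the data
p_n = 0.01            # P(N): prior probability of an instrumental transient

# Bayes theorem: P(S|T) = P(T|S) P(S) / [P(T|S) P(S) + P(T|N) P(N)].
posterior = (p_t_given_s * p_s
             / (p_t_given_s * p_s + p_t_given_n * p_n))
# Despite the good test, the tiny prior keeps the posterior below 2%.
```

With these assumed numbers the posterior is only about 0.019: a trigger from even a well-performing test is most likely noise when the source is rare enough, which motivates the very low false-alarm probabilities discussed next.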
Thus, Bayesian inference neatly folds prior knowledge about sources into the estimation process. One might worry that the outcome of a measurement would be seriously biased by our preconception of the prior. To understand this better, let us rewrite Equation (106) as follows:
$$P(S|T) = \left[1 + \frac{P(T|N)}{P(T|S)}\,\frac{P(N)}{P(S)}\right]^{-1},$$
which shows that the posterior depends on the priors only through the ratio $P(N)/P(S)$, weighted by the ratio of the false alarm rate to the detection probability.
The above example tells us why we have to work with unusually-small false-alarm probability in the case of gravitational wave searches. For instance, to search for binary coalescences in ground-based detectors we use a (power) SNR threshold of about 30 to 50. This is because the expected event rate is about 0.04 per year.
Computing the posterior involves multi-dimensional integrals, which are computationally highly expensive when the number of parameters involved is large. This is why it is often not possible to apply Bayesian techniques to continuously streaming data; it is sensible to reserve Bayesian inference for candidate events that are selected by inexpensive analysis techniques. Thus, although Bayesian analysis is not used in current detection pipelines, there has been a lot of effort in evaluating its ability to search for [116, 351, 123, 121] and measure the parameters of [117, 122, 377] a signal, and in follow-up analysis.
This work is licensed under a Creative Commons License.