The first thing to realize is that we can never be absolutely certain that a signal is present in a data train [159, 161]; we can only assign confidence levels to its presence, which could be close to 100% for high values of the SNR. The next thing to realize is that, whatever the SNR may be, we cannot be absolutely certain about the true parameters of the signal: at best we can make an estimate, and these estimates are quoted within a certain range. The width of the range depends on the confidence level required, being larger for higher confidence levels [159].

Maximum likelihood estimates have long been used to measure the parameters of a known signal buried
in noisy data. The method consists in maximizing the likelihood ratio – the ratio of the probability that a
given signal is present in the data to the probability that the signal is absent [188, 159]. Maximum
likelihood estimates are not always minimum uncertainty estimates, as demonstrated
in the case of binary inspiral signals by Balasubramanian et al. [66, 67]. Nevertheless, until recently,
this was the method most widely followed in the gravitational wave literature.
What is important to note is that maximum likelihood estimates are unbiased when the SNR is
large^{3}:
the mean of the distribution of measured values of the parameters is then centered around
the true parameter values. This is an important quality that will be useful in our discussion
below.
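
As a minimal illustration of this unbiasedness (a toy sketch, not from the review), consider maximum likelihood estimation of the amplitude and phase of a sinusoid of known frequency in white Gaussian noise; in white noise the likelihood maximization reduces to a least-squares fit with a closed-form solution via the two quadratures. All parameter values below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
f0, a_true, phi_true = 50.0, 1.0, 0.3   # toy signal parameters (assumed)
sigma = 0.05                             # small noise, i.e., high SNR

def ml_estimate(x):
    """ML estimate of (amplitude, phase) for a*cos(2*pi*f0*t - phi) in white
    noise: reduces to projecting onto the two quadratures."""
    c = 2.0 * np.mean(x * np.cos(2 * np.pi * f0 * t))
    s = 2.0 * np.mean(x * np.sin(2 * np.pi * f0 * t))
    return np.hypot(c, s), np.arctan2(s, c)

a_hats, phi_hats = [], []
for _ in range(200):
    data = (a_true * np.cos(2 * np.pi * f0 * t - phi_true)
            + sigma * rng.standard_normal(t.size))
    a_hat, phi_hat = ml_estimate(data)
    a_hats.append(a_hat)
    phi_hats.append(phi_hat)

# At high SNR the ensemble mean of the estimates sits at the true values
print(np.mean(a_hats), np.mean(phi_hats))
```

Lowering the SNR (raising `sigma`) makes the amplitude estimate visibly biased, which is exactly the caveat attached to the statement above.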

Bayesian estimates, which take into account any prior knowledge that may be available about the distribution of the source parameters, often give much better estimates and do not rely on the availability of an ensemble of detector outputs [343, 274]. However, they are computationally a lot more expensive than maximum likelihood estimates.

In any one measurement, any estimated parameters, however efficient, robust and accurate, are unlikely to be the actual parameters of the signal, since, at any finite SNR, noise alters the input signal. In the geometric language, the signal vector is altered by the noise vector, and our matched filtering aims at computing the projection of this altered vector onto the signal space. The true parameters are expected to lie within an $n$-dimensional ellipsoid around the measured values at a certain confidence level, the volume of the ellipsoid increasing with the confidence level at a given SNR but decreasing with the SNR at a given confidence level.

The ambiguity function, well known in the statistical theory of signal detection [188], is a very powerful tool in signal analysis: it helps one to assess the number of templates required to span the parameter space of the signal [324], to make estimates of variances and covariances involved in the measurement of various parameters, to compute biases introduced in using a family of templates whose shape is not the same as that of a family of signals intended to be detected, etc. We will see below how the ambiguity function can be used to compute the required number of templates. Towards the end of this section we will use the ambiguity function for the estimation of parameters.

The ambiguity function is defined (see Equation (91) below) as the scalar product of two normalized
waveforms maximized over the initial phase of the waveform, in other words, the absolute value of the scalar
product^{4}.
A waveform is said to be normalized if $\langle h, h \rangle = 1$, where the inner product is inversely weighted by
the PSD, as in Equation (79). Among other things, normalized waveforms help in defining signal strengths:
a signal $s$ is said to be of strength $h_0$ if $s = h_0\, h$. Note that the optimal SNR for such a signal of strength
$h_0$ is $\rho = \langle s, s \rangle^{1/2} = h_0$.
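
The inner product, normalization and signal strength can be sketched numerically. Equation (79) is not reproduced in this excerpt, so the code below assumes the conventional PSD-weighted form, $\langle a, b \rangle = 4\,\mathrm{Re} \int \tilde a(f)\, \tilde b^*(f) / S_h(f)\, df$, with a flat toy PSD and a toy waveform; by construction the optimal SNR then comes out equal to the strength $h_0$.

```python
import numpy as np

def inner(a, b, psd, df):
    """Noise-weighted inner product: 4 * Re sum( A * conj(B) / Sh ) * df,
    the conventional matched-filtering form (assumed for Eq. (79))."""
    A, B = np.fft.rfft(a), np.fft.rfft(b)
    return 4.0 * df * np.real(np.sum(A * np.conj(B) / psd))

fs, T = 1024.0, 4.0
n = int(fs * T)
t = np.arange(n) / fs
df = 1.0 / T
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
psd = np.ones_like(freqs)            # flat (white) PSD for illustration

h = np.sin(2 * np.pi * 40.0 * t) * np.exp(-(t - 2.0) ** 2)  # toy waveform
h /= np.sqrt(inner(h, h, psd, df))   # normalize so that <h, h> = 1

h0 = 8.0                             # signal strength
s = h0 * h
rho = np.sqrt(inner(s, s, psd, df))  # optimal SNR, equal to h0 by construction
print(rho)
```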

Let $h(t; \lambda)$, where $\lambda$ is the vector of the signal's parameters, denote a normalized waveform. It is conventional to choose the parameter $\lambda^0$ to be the lag $t_0$, which simply corresponds to a coordinate time when an event occurs and is therefore called an extrinsic parameter, while the rest of the parameters are called the intrinsic parameters and characterize the gravitational wave source.

Given two normalized waveforms $h(\lambda)$ and $h(\lambda')$, whose parameter vectors are not necessarily the same, the ambiguity $\mathcal{A}$ is defined as

$$\mathcal{A}(\lambda, \lambda') \equiv \left| \left\langle h(\lambda),\, h(\lambda') \right\rangle \right|. \tag{91}$$

Since the waveforms are normalized, $\mathcal{A}(\lambda, \lambda) = 1$ and $\mathcal{A}(\lambda, \lambda') < 1$ if $\lambda' \ne \lambda$. Here, $\lambda$ can be thought of as the parameters of a (normalized) template, while $\lambda'$ are those of a signal. With the template parameters $\lambda$ fixed, the ambiguity function is a function of the signal parameters $\lambda'$, giving the SNR obtained by the template for different signals. The region in the signal parameter space for which a template obtains SNRs larger than a chosen value (called the minimal match [278]) is the span of that template. Template families should be chosen so that altogether they span the entire signal parameter space of interest with the least overlap of one another's spans. One can equally well interpret the ambiguity function as the SNR obtained for a given signal by filters of different parameter values.

It is clear that the ambiguity function has a local maximum at the "correct" set of parameters, $\lambda' = \lambda$. Search methods that vary the template parameters to find the best fit make use of this property in one way or another. But the ambiguity function will usually have secondary maxima as a function of $\lambda'$ with $\lambda$ fixed. If these secondaries are only slightly smaller than the primary maximum, then noise can lead to confusion: it can, at random, sometimes elevate a secondary and suppress a primary. This can lead to false measurements of the parameters. Search methods need to be designed carefully to avoid this as much as possible. One way would be to fit the known properties of the ambiguity function to an ensemble of maxima. This would effectively average over the noise on individual peaks and point more reliably to the correct one.
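
As a toy illustration (not from the review), the ambiguity function of a one-parameter family of sinusoidal templates in white noise can be computed directly; the maximization over the initial phase is approximated here by projecting onto both quadratures of the template.

```python
import numpy as np

fs, T = 512.0, 1.0
t = np.arange(int(fs * T)) / fs

def template(f):
    """Toy normalized waveform: a Hann-windowed sinusoid of frequency f.
    In white noise the inner product reduces to a plain dot product."""
    h = np.sin(2 * np.pi * f * t) * np.hanning(t.size)
    return h / np.linalg.norm(h)

def ambiguity(f_signal, f_template):
    # maximize over initial phase by projecting onto both quadratures
    hc = np.sin(2 * np.pi * f_template * t) * np.hanning(t.size)
    hs = np.cos(2 * np.pi * f_template * t) * np.hanning(t.size)
    hc /= np.linalg.norm(hc)
    hs /= np.linalg.norm(hs)
    s = template(f_signal)
    return np.hypot(np.dot(s, hc), np.dot(s, hs))

f_true = 60.0
grid = np.arange(40.0, 80.0, 0.5)       # template bank in frequency
amb = np.array([ambiguity(f_true, f) for f in grid])
print(grid[np.argmax(amb)])             # the maximum sits at the true frequency
```

The ambiguity falls off away from 60 Hz, with small sidelobe secondaries of the kind that, in the presence of noise, can be mistaken for the primary maximum.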

It is important to note that in the definition of the ambiguity function there is no need for the functional forms of the template and signal to be the same; the definition holds true for any signal-template pair of waveforms. Moreover, the number of template parameters need not be identical (and usually is not) to the number of parameters characterizing the signal. For instance, a binary can be characterized by a large number of parameters, such as the masses, spins, eccentricity of the orbit, etc., while we may take as a model waveform one involving only the masses. In the context of inspiral waves, the signal is the exact general relativistic waveform emitted by a binary, whose form we do not know, while the template family is a post-Newtonian, or some other, approximation to it, that will be used to detect the true waveform. Another example would be signals emitted by spinning neutron stars, isolated or in binaries, whose time evolution is unknown, either because we cannot anticipate all the physical effects that affect their spin, or because the parameter space is so large that we cannot possibly take all of them into account in a realistic search.

Of course, in such cases we cannot compute the ambiguity function exactly, since one of its arguments is unknown. These are, indeed, issues where substantial work is called for. What are all the physical effects to be considered so as not to miss a waveform in our search? How should one choose templates when their functional form differs from that of the signals? For this review it suffices to assume that the signal and template waveforms have identical shapes and that the number of parameters in the two cases is the same.

The computational cost of a search and the estimation of parameters of a signal afford a lucid geometrical picture developed by Balasubramanian et al. [67] and Owen [278]. Much of the discussion below is borrowed from their work.

Let $x_k$, $k = 0, \ldots, N - 1$, denote the discretely sampled output of a detector. The set of all possible detector outputs satisfies the usual axioms of a vector space. Therefore, $x_k$ can be thought of as an $N$-dimensional vector. It is more convenient to work in the continuum limit, in which case we have infinite-dimensional vectors and the corresponding vector space. However, all the results are applicable to the realistic case in which detector outputs are treated as finite-dimensional vectors.

Amongst all vectors, of particular interest are those corresponding to gravitational waves from a given
astronomical source. While every signal can be thought of as a vector in the infinite-dimensional vector
space of the detector outputs, the set of all such signal vectors does not, by itself, form a vector
space. However, the set of all normed signal vectors (i.e., signal vectors of unit norm) forms a
manifold, the parameters of the signal serving as a coordinate system [66, 67, 278, 280]. Thus,
each class of astronomical source forms an $n$-dimensional manifold $\mathcal{S}_n$, where $n$ is
the number of independent parameters characterizing the source. For instance, the set of all
signals from a binary on a quasi-circular orbit inclined to the line of sight at an angle $\iota$,
consisting of nonspinning black holes of masses $m_1$ and $m_2$, located a distance $D$ from the
Earth^{5}
initially in the direction $(\theta, \varphi)$ and expected to merge at a time $t_C$ with the phase of the signal at
merger $\varphi_C$, forms a nine-dimensional manifold with coordinates $\{D, \theta, \varphi, \iota, \psi, m_1, m_2, t_C, \varphi_C\}$, where $\psi$
is the polarization angle of the signal. In the general case of a signal characterized by $n$ parameters
we shall denote the parameters by $\lambda^\alpha$, where $\alpha = 1, \ldots, n$.

The manifold $\mathcal{S}_n$ can be endowed with a metric $g_{\alpha\beta}$ that is induced by the scalar product
defined in Equation (79). The components of the metric in a coordinate system $\lambda^\alpha$ are defined
by^{6}

$$g_{\alpha\beta} \equiv \left\langle \frac{\partial h}{\partial \lambda^\alpha},\, \frac{\partial h}{\partial \lambda^\beta} \right\rangle.$$

Now, by Taylor expanding $h(\lambda + \Delta\lambda)$ around $\lambda$, and keeping only terms to second order in $\Delta\lambda$, it is straightforward to see that the overlap of two infinitesimally close signals can be computed using the metric:

$$\left\langle h(\lambda),\, h(\lambda + \Delta\lambda) \right\rangle = 1 - \frac{1}{2}\, g_{\alpha\beta}\, \Delta\lambda^\alpha\, \Delta\lambda^\beta.$$

The metric on the signal manifold is nothing but the well-known Fisher information matrix, usually denoted $\Gamma_{\alpha\beta}$ (see, e.g., [188, 284]), but scaled down by the square of the SNR, i.e., $g_{\alpha\beta} = \Gamma_{\alpha\beta} / \rho^2$. The information matrix is itself the inverse of the covariance matrix and is a very useful quantity in signal analysis.
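
The quadratic overlap formula can be checked numerically on a toy one-parameter family (a sketch, not from the review; white noise is assumed, so the inner product is a plain dot product): compute $g = \langle \partial h/\partial f,\, \partial h/\partial f \rangle$ by finite differences and compare the exact overlap with $1 - g\,\Delta\lambda^2/2$.

```python
import numpy as np

fs, T = 512.0, 1.0
t = np.arange(int(fs * T)) / fs
win = np.hanning(t.size)

def h(f):
    """Normalized toy waveform; each member of the family has unit norm."""
    w = np.sin(2 * np.pi * f * t) * win
    return w / np.linalg.norm(w)

f0, eps = 60.0, 1e-4
dh = (h(f0 + eps) - h(f0 - eps)) / (2 * eps)  # numerical dh/df
g = np.dot(dh, dh)                             # one-parameter metric g_ff

# Compare the exact overlap with the second-order (metric) approximation
df_ = 0.05
overlap = np.dot(h(f0), h(f0 + df_))
print(overlap, 1.0 - 0.5 * g * df_**2)
```

For this small offset the two numbers agree to a few parts in $10^4$; increasing `df_` shows the quadratic approximation breaking down, which is the caveat raised later in the text.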

Having defined the metric, we next consider the application of the geometric formalism in the estimation of statistical errors involved in the measurement of the parameters. We closely follow the notation of Finn and Chernoff [159, 161, 115].

Let us suppose a signal of known shape with parameters $\lambda^\alpha$ is buried in background noise that is
Gaussian and stationary. Since the signal shape is known, one can use matched filtering to dig the signal out
of the noise. The measured parameters $\bar\lambda^\alpha$ will, in general, differ from the true parameters of the
signal^{7}.
Geometrically speaking, the noise vector displaces the signal vector, and the process of matched filtering
projects the (noise + signal) vector back onto the signal manifold. Thus, any nonzero noise will make it
impossible to measure the true parameters of the signal. The best one can hope for is a proper statistical
estimation of the influence of noise.

The posterior probability density function of the parameters is given by a multivariate Gaussian
distribution^{8}:

$$\mathcal{P}(\Delta\lambda^\alpha)\, d^n\!\Delta\lambda = \frac{\rho^n \sqrt{\det g}}{(2\pi)^{n/2}} \exp\left( -\frac{\rho^2}{2}\, g_{\alpha\beta}\, \Delta\lambda^\alpha\, \Delta\lambda^\beta \right) d^n\!\Delta\lambda,$$

where $\Delta\lambda^\alpha = \bar\lambda^\alpha - \lambda^\alpha$ is the difference between the measured and true values of the parameters.

Let us first specialize to one dimension to illustrate the region of the parameter space with which one should associate an event at a given confidence level. In one dimension the distribution of the deviation from the mean of the measured value of the parameter is given by

$$\mathcal{P}(x)\, dx = \frac{\rho}{\sqrt{2\pi}} \exp\left( -\frac{\rho^2 x^2}{2} \right) dx,$$

where, analogous to the $n$-dimensional case, we have used $x = g_{11}^{1/2}\, \Delta\lambda$. Now, at a given SNR, what is the volume $V_P$ in the parameter space, such that the probability of finding the measured parameters inside this volume is $P$? This volume is defined by

$$P = \int_{V_P} \mathcal{P}(x)\, dx.$$

Although $V_P$ is not unique, it is customary to choose it to be centered around $x = 0$:

$$P = \frac{\rho}{\sqrt{2\pi}} \int_{-r/\rho}^{r/\rho} \exp\left( -\frac{\rho^2 x^2}{2} \right) dx = \mathrm{erf}\left( \frac{r}{\sqrt{2}} \right), \tag{98}$$

where, given $P$, the above equation can be used to solve for $r$, and $r$ determines the range of integration: $x \in [-r/\rho,\, r/\rho]$. For instance, the volumes corresponding to $P = 0.683,\ 0.954,\ 0.997$ are the familiar intervals $x \in [-1/\rho, 1/\rho]$, $[-2/\rho, 2/\rho]$, $[-3/\rho, 3/\rho]$, and the corresponding values of $r$ are 1, 2, 3. Since $\sigma = 1/\rho$ for the variable $x$, in terms of $\sigma$ the above intervals translate to the usual $[-\sigma, \sigma]$, $[-2\sigma, 2\sigma]$ and $[-3\sigma, 3\sigma]$. Thus, for a given probability $P$, the volume shrinks as $1/\rho$. The maximum distance $d_{\max}$ within which we can expect to find "triggers" at a given $P$ depends inversely on the SNR: $d_{\max} = r/\rho$. Therefore, for $P = 0.954$ ($r = 2$) and at an SNR of 5 the maximum distance is 0.4, which corresponds to a match of $1 - d_{\max}^2/2 = 0.92$. In other words, in one dimension 95% of the time we expect our triggers to come from templates that have an overlap greater than or equal to 0.92 with the buried signal when the SNR is 5. This interpretation in terms of the match is a good approximation as long as $d_{\max}$ is small, which will be true for large SNR events. However, for weaker signals and/or greater values of $P$ we cannot interpret the results in terms of the match, although Equation (98) can be used to determine $r$. As an example, at $P = 0.997$ ($r = 3$) and at an SNR of 5, the maximum distance is $d_{\max} = 0.6$, which is significantly different from zero, and the quadratic approximation is not good enough to compute the match.

These results generalize to $n$ dimensions. In $n$ dimensions the volume $V_P$ is defined by

$$P = \int_{V_P} \mathcal{P}(\Delta\lambda^\alpha)\, d^n\!\Delta\lambda.$$
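
The one-dimensional bookkeeping above can be reproduced with a few lines of standard-library code: invert $P = \mathrm{erf}(r/\sqrt{2})$ for $r$ by bisection, then form $d_{\max} = r/\rho$ and the quadratic-approximation match.

```python
import math

def r_of_P(P):
    """Invert P = erf(r / sqrt(2)) by bisection (1D Gaussian confidence)."""
    lo, hi = 0.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if math.erf(mid / math.sqrt(2.0)) < P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

P, rho = 0.954, 5.0
r = r_of_P(P)
d_max = r / rho                   # maximum distance on the signal manifold
match = 1.0 - 0.5 * d_max**2      # quadratic approximation to the overlap
print(r, d_max, match)            # roughly 2.0, 0.4, 0.92
```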

Again, $V_P$ is not unique, but it is customary to center the volume around the point $\Delta\lambda^\alpha = 0$:

$$P = \frac{\rho^n \sqrt{\det g}}{(2\pi)^{n/2}} \int_{g_{\alpha\beta} \Delta\lambda^\alpha \Delta\lambda^\beta \le r^2/\rho^2} \exp\left( -\frac{\rho^2}{2}\, g_{\alpha\beta}\, \Delta\lambda^\alpha\, \Delta\lambda^\beta \right) d^n\!\Delta\lambda.$$

Given $P$ and the parameter space dimension $n$, one can iteratively solve the above equation for $r$. The boundary of the volume $V_P$ is the surface defined by the equation

$$g_{\alpha\beta}\, \Delta\lambda^\alpha\, \Delta\lambda^\beta = \frac{r^2}{\rho^2}. \tag{103}$$

This is the equation of an $n$-dimensional ellipsoid whose size is defined by $r/\rho$. For a given $r$ (which determines the confidence level), the size of the ellipsoid is inversely proportional to the SNR, the volume decreasing as $\rho^{-n}$. However, the size is not small enough for all combinations of $P$ and $\rho$ to interpret the distance from the center of the ellipsoid to its surface in terms of the overlap or match of the signals at the two locations, except when the distance is close to zero. This is because the expression for the match in terms of the metric is based on the quadratic approximation, which breaks down when the matches are small. However, the region defined by Equation (103) always corresponds to the probability $P$, and there is no approximation here (except that the detector noise is Gaussian).

When the SNR is large and $d_{\max} = r/\rho$ is not close to zero, the triggers are found from the signal with matches greater than or equal to a minimum match $MM$ determined by $d_{\max}$. Table 2 lists the value of $d_{\max}^2 = r^2/\rho^2$ for several values of $P$ in one, two and three dimensions, and the corresponding minimum match, for SNRs 5, 10 and 20.

n = 1:

ρ | d²_max | MM (P = 0.683) | d²_max | MM (P = 0.954) | d²_max | MM (P = 0.997)
5 | 0.04 | 0.9899 | 0.16 | 0.9592 | 0.36 | 0.9055
10 | 0.01 | 0.9975 | 0.04 | 0.9899 | 0.09 | 0.9772
20 | 0.0025 | 0.9994 | 0.01 | 0.9975 | 0.0225 | 0.9944

n = 2:

ρ | d²_max | MM (P = 0.683) | d²_max | MM (P = 0.954) | d²_max | MM (P = 0.997)
5 | 0.092 | 0.9767 | 0.2470 | 0.9362 | 0.4800 | 0.8718
10 | 0.023 | 0.9942 | 0.0618 | 0.9844 | 0.1200 | 0.9695
20 | 0.00575 | 0.9986 | 0.0154 | 0.9961 | 0.0300 | 0.9925

n = 3:

ρ | d²_max | MM (P = 0.683) | d²_max | MM (P = 0.954) | d²_max | MM (P = 0.997)
5 | 0.1412 | 0.9641 | 0.32 | 0.9165 | 0.568 | 0.8462
10 | 0.0353 | 0.9911 | 0.08 | 0.9798 | 0.142 | 0.9638
20 | 0.00883 | 0.9978 | 0.02 | 0.9950 | 0.0355 | 0.9911

Table 2 should be interpreted in light of the fact that triggers come from an analysis pipeline in which the templates are laid out with a certain minimal match and one cannot, therefore, expect the triggers from different detectors to be matched better than the minimal match.

From Table 2, we see that, when the SNR is large (say greater than about 10), the dependence of the match on $P$ is very weak; in other words, irrespective of the number of dimensions, we expect the match between the trigger and the true signal (and, for our purposes, the match between triggers from different instruments) to be pretty close to 1, and mostly larger than a minimal match of about 0.95 that is typically used in a search. Even when the SNR is in the region of 5, for low $P$ there is again only a weak dependence of $MM$ on the number of parameters. For large $P$ and low SNR, however, the dependence of $MM$ on the number of dimensions becomes important. At an SNR of 5 and $P = 0.997$, $MM = 0.9055,\ 0.8718,\ 0.8462$ for $n = 1, 2, 3$ dimensions, respectively.
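
The $d_{\max}^2$ entries of Table 2 can be reproduced by inverting the CDF of the $\chi$ distribution, which has closed forms for $n = 1, 2, 3$; the three confidence levels are assumed here to be the 1σ, 2σ and 3σ Gaussian probabilities 0.683, 0.954 and 0.9973.

```python
import math

def chi_cdf(r, n):
    """P(|x| <= r) for an n-dimensional standard Gaussian (n = 1, 2, 3)."""
    if n == 1:
        return math.erf(r / math.sqrt(2.0))
    if n == 2:
        return 1.0 - math.exp(-r * r / 2.0)
    if n == 3:
        return (math.erf(r / math.sqrt(2.0))
                - math.sqrt(2.0 / math.pi) * r * math.exp(-r * r / 2.0))
    raise ValueError("only n = 1, 2, 3 in this sketch")

def r_of_P(P, n):
    """Solve chi_cdf(r, n) = P by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi_cdf(mid, n) < P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho = 5.0
for n in (1, 2, 3):
    for P in (0.683, 0.954, 0.9973):
        r = r_of_P(P, n)
        print(n, P, round((r / rho) ** 2, 4))  # compare with d^2_max in Table 2
```

At $\rho = 5$ this yields 0.040, 0.160, 0.360 for $n = 1$ and 0.141, 0.320, 0.567 for $n = 3$, in line with the table; the $n = 2$ entry at the highest level comes out near 0.473 rather than 0.480, presumably reflecting a slightly different rounding of $P$ in the original.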

Bounds on the estimation errors computed using the covariance matrix are called Cramér–Rao bounds. These are based on a local analysis and do not take into consideration the effect of distant points in the parameter space, such as secondary maxima in the likelihood, on the errors computed at a given point. Although the Cramér–Rao bounds can disagree with maximum likelihood estimates, a global analysis, which takes into account the effect of distant points on the estimation of parameters, does indeed give results in agreement with maximum likelihood estimation, as shown by Balasubramanian and Dhurandhar [65].

A good example of an efficient detection algorithm that is not a reliable estimator is the time-frequency transform of a chirp. For signals that are loud enough, a time-frequency transform of the data would be a very effective way of detecting the signal, but the transform contains hardly any information about the masses, spins and other properties of the source. This is because the time-frequency transform of a chirp is a mapping from the multi-dimensional (17 in the most general case) space of chirps to just the two-dimensional space of time and frequency. Even matched filtering, which would use templates defined on the full parameter space of the signal, would not give the parameters at the expected accuracy. This is because the templates are laid out only at a certain minimal match and might not resolve the signal well enough, especially for signals that have a high SNR.

In recent times, Bayesian inference techniques have been applied with success in many areas of astronomy and cosmology. These techniques are probably the most sensible way of estimating parameters and their associated errors, but they cannot be used to efficiently search for signals. Bayesian inference is among the simplest of statistical measures to state, but it is not easy to compute and is often subject to controversy. Here we shall only discuss the basic tenets of the method and refer the reader to an excellent treatise on the subject for details (see, e.g., Sivia [343]).

To understand the chief ideas behind Bayesian inference, let us begin with some basic concepts in probability theory. Given two hypotheses or statements $A$ and $B$ about an observation, let $P(A, B)$ denote the joint probability of $A$ and $B$ being true. For the sake of clarity, let $A$ denote a statement about the universe and $B$ some observation that has been made. Now, the joint probability can be expressed in terms of the individual probabilities $P(A)$ and $P(B)$ and the conditional probabilities $P(B|A)$ and $P(A|B)$ as follows:

$$P(A, B) = P(A)\, P(B|A) = P(B)\, P(A|B).$$

The first of these equalities says that the joint probability of $A$ and $B$ both being true is the probability that $A$ is true times the probability that $B$ is true given that $A$ is true, and similarly the second. We can use the above equations to arrive at Bayes theorem:

$$P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}.$$

The left-hand side of the above equation can be interpreted as a statement about $A$ (the universe) given $B$ (the data). This is the posterior probability density. The right-hand side contains $P(B|A)$, which is the probability that $B$ is obtained given that $A$ is true, called the likelihood; $P(A)$, which is the prior probability of $A$; and $P(B)$ (the prior of $B$), which is simply a normalizing constant often ignored in Bayesian analysis.

For instance, if $A$ denotes the statement it is going to rain and $B$ the amount of humidity in the air, then the above equation gives us the posterior probability that it rains when the air contains a certain amount of humidity. Clearly, the posterior depends on the likelihood of the air having a certain humidity when it rains and on the prior probability of rain on a given day. If the prior is very small (as it would be in a desert, for example), then you would need a rather large likelihood for the posterior to be large. Even when the prior is not so small, say a 50% chance of rain on any given day (as it would be if you are in Wales), the likelihood has to be large for the posterior probability to say something about the relationship between the level of humidity and the chance of rain.

As another example, more relevant to the subject of this review, let $S$ be the statement the data contains a chirp (signal), $N$ the statement the data contains an instrumental transient (noise), and let $T$ be a test that is performed to infer which of the two statements is true. Let us suppose $T$ is a very good test, in that it discriminates between $S$ and $N$ very well, with a high detection probability $P(T|S)$ and a low false alarm probability $P(T|N)$ (note that $P(T|S)$ and $P(T|N)$ need not add up to 1). Also, the expected event rate of a chirp during our observation is low, so the prior $P(S)$ is small, while the chance of an instrumental transient is relatively large. We are interested in knowing the posterior probability of the data containing a chirp, given that the test has been passed. By Bayes theorem this is

$$P(S|T) = \frac{P(T|S)\, P(S)}{P(T)}, \tag{106}$$

where $P(T)$ (the probability of the test being positive) is taken to result from either the chirp or the instrumental transient: $P(T) = P(T|S)\, P(S) + P(T|N)\, P(N)$. Substituting the various quantities into the above equation, one finds that there is only a 2% chance that the data really contains a chirp when the test is positive. On the contrary, for the same data the chance of an instrumental transient given a positive test result is about 98%. Thus, though there is a high (low) probability for the test to be positive in the presence of a signal (noise), when the test is indeed positive we cannot necessarily conclude that a signal is present. This is not surprising, since the prior probability of the signal being present is very low. The situation can be remedied by designing a test that gives a far lower probability of a positive result in the case of an instrumental transient (i.e., a very low false alarm rate).

Thus, Bayesian inference neatly folds prior knowledge about sources into the estimation process. One might worry that the outcome of a measurement process would be seriously biased by our preconception of the prior. To understand this better, let us rewrite Equation (106) as follows:

$$P(S|T) = \left[ 1 + \frac{P(T|N)}{P(T|S)}\, \frac{P(N)}{P(S)} \right]^{-1},$$

where the first factor in the product is the ratio of the two likelihoods and the second is the ratio of the priors. The latter is not in the hands of a data analyst; it is determined by the nature of the source being searched for and the properties of the instrument. The only way an analyst can make the posterior probability large is by choosing a test that gives a small value for the likelihood ratio $P(T|N)/P(T|S)$. When this ratio is small enough compared to $P(S)/P(N)$ (i.e., the likelihood of the test being positive when the signal is present is far larger, depending on the priors, than when the transient is present), the posterior will be close to unity.

The above example tells us why we have to work with unusually small false-alarm probabilities in the case of gravitational wave searches. For instance, to search for binary coalescences in ground-based detectors we use a (power) SNR threshold of about 30 to 50. This is because the expected event rate is about 0.04 per year.
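
The arithmetic of the chirp-versus-transient example above can be sketched in a few lines. The specific values of the detection probability, false-alarm probability and priors below are illustrative assumptions, chosen to be consistent with the roughly 2% posterior quoted in the text; they are not values taken from the review.

```python
# Illustrative Bayes-theorem arithmetic for the chirp-vs-transient example.
# All numbers below are assumed for illustration only.
p_T_given_S = 0.95   # detection probability: test positive given a chirp
p_T_given_N = 0.05   # false-alarm probability: test positive given a transient
p_S = 1e-5           # prior probability of a chirp in the data
p_N = 1e-2           # prior probability of an instrumental transient

p_T = p_T_given_S * p_S + p_T_given_N * p_N  # total prob. of a positive test
p_S_given_T = p_T_given_S * p_S / p_T        # posterior: chirp | positive test
p_N_given_T = p_T_given_N * p_N / p_T        # posterior: transient | positive test

print(p_S_given_T, p_N_given_T)  # about 0.02 and 0.98 with these assumed priors
```

Raising `p_T_given_N` by even a factor of a few drives the chirp posterior further down, which is why a very low false alarm rate is essential.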

Computing the posterior involves multi-dimensional integrals, which are computationally highly expensive when the number of parameters involved is large. This is why it is often not possible to apply Bayesian techniques to continuously streaming data; it is more sensible to reserve Bayesian inference for candidate events selected by computationally inexpensive analysis techniques. Thus, although Bayesian analysis is not used in current detection pipelines, there has been a lot of effort in evaluating its ability to search for [116, 351, 123, 121] and measure the parameters of [117, 122, 377] a signal, and in follow-up analysis [378].

http://www.livingreviews.org/lrr-2009-2
This work is licensed under a Creative Commons License.