## 4 Testing Techniques

### 4.1 Coalescence analysis

Gravitational waves emitted during the inspiral, merger and ringdown of compact binaries are the most studied in the context of data analysis and parameter estimation. In this section, we will review some of the main data analysis techniques employed in the context of parameter estimation and tests of GR. We begin with a discussion of matched filtering and Fisher theory (for a detailed review, see [173, 103, 125*, 174*, 248]). We then continue with a discussion of Bayesian parameter estimation and hypothesis testing (for a detailed review, see [387, 205, 123, 294*]).

#### 4.1.1 Matched filtering and Fisher’s analysis

When the detector noise is Gaussian and stationary, and when the signal is known very well, the optimal detection strategy is matched filtering. For any given realization, such noise can be characterized by its power spectral density , defined via

where the tilde stands for the Fourier transform, the asterisk for complex conjugation and the brackets for the expectation value.The detectability of a signal is determined by its signal-to-noise ratio or SNR, which is defined via

where is a template with parameters and we have defined the inner product If the templates do not exactly match the signal, then the SNR is reduced by a factor of , called the match: where is the mismatch.For the noise assumptions made here, the probability of measuring in the detector output, given a template , is given by

and thus the waveform that best fits the signal is that with best-fit parameters such that the argument of the exponential is minimized. For large SNR, the best-fit parameters will have a multivariate Gaussian distribution centered on the true values of the signal , and thus, the waveform parameters that best fit the signal minimize the argument of the exponential. The parameter errors will be distributed according to where is the Fisher matrix The root-mean-squared () error on a given parameter is then where is the variance-covariance matrix and summation is not implied in Eq. (83*) ( denotes a particular element of the vector ). This root-mean-squared error is sometimes referred to as the statistical error in the measurement of . One can use Eq. (83*) to estimate how well modified gravity parameters can be measured. Put another way, if a gravitational wave were detected and found consistent with GR, Eq. (83*) would provide an estimate of how close to zero these modified gravity parameters would have to be.The Fisher method to estimate projected constraints on modified gravity theory parameters is as follows. First, one constructs a waveform model in the particular modified gravity theory one wishes to constrain. Usually, this waveform will be similar to the GR one, but it will contain an additional parameter, , such that the template parameters are now plus . Let us assume that as , the modified gravity waveform reduces to the GR expectation. Then, the accuracy to which can be measured, or the accuracy to which we can say is zero, is approximately , where the Fisher matrix associated with this variance-covariance matrix must be computed with the non-GR model evaluated at the GR limit (). Such a method for estimating how well modified gravity theories can be constrained was pioneered by Will in [436*, 353], and since then, it has been widely employed as a first-cut estimate of the accuracy to which different theories can be constrained.

The Fisher method described above can dangerously lead to incorrect results if abused [414, 415]. One must understand that this method is suitable only if the noise is stationary and Gaussian and if the SNR is sufficiently large. How large an SNR is required for Fisher methods to work depends somewhat on the signals considered, but usually for applications concerning tests of GR, one would be safe with or so. In real data analysis, the first two conditions are almost never satisfied. Moreover, the first detections that will be made will probably be of low SNR, i.e., , for which again the Fisher method fails. In such cases, more sophisticated parameter estimation methods need to be employed.

#### 4.1.2 Bayesian theory and model testing

Bayesian theory is ideal for parameter estimation and model selection. Let us then assume that a detection has been made and that the gravitational wave signal in the data can be described by some model , parameterized by the vector . Using Bayes’ theorem, the posterior distribution function (PDF) or the probability density function for the model parameters, given data and model , is

Obviously, the global maximum of the PDF in the parameter manifold gives the best fit parameters for that model. The prior probability density represents our prior beliefs of the parameter range in model . The marginalized likelihood or evidence, is the normalization constant which clearly guarantees that the integral of Eq. (84*) integrates to unity. The quantity is the likelihood function, which is simply given by Eq. (80*), with a given normalization. In that equation we used slightly different notation, with being the data and the template associated with model and parameterized by . The marginalized PDF, which represents the probability density function for a given parameter (recall that is a particular element of ), after marginalizing over all other parameters, is given by where the integration is not to be carried out over .Let us now switch gears to model selection. In hypothesis testing, one wishes to determine whether the data is more consistent with hypothesis A (e.g., that a GR waveform correctly models the signal) or with hypothesis B (e.g., that a non-GR waveform correctly models the signal). Using Bayes’ theorem, the PDF for model given the data is

As before, is the prior probability of hypothesis , namely the strength of our prior belief that hypothesis is correct. The normalization constant is given by where the integral is to be taken over all models. Thus, it is clear that this normalization constant does not depend on the model. Similar relations hold for hypothesis by replacing in Eq. (87*).When hypothesis A and B refer to fundamental theories of nature we can take different viewpoints regarding the priors. If we argue that we know nothing about whether hypothesis A or B better describes nature, then we would assign equal priors to both hypotheses. If, on the other hand, we believe GR is the correct theory of nature, based on all previous experiments performed in the solar system and with binary pulsars, then we would assign . This assigning of priors necessarily biases the inferences derived from the calculated posteriors, which is sometimes heavily debated when comparing Bayesian theory to a frequentist approach. However, this “biasing” is really unavoidable and merely a reflection of our state of knowledge of nature (for a more detailed discussion on such issues, please refer to [294]).

The integral over all models in Eq. (88*) can never be calculated in practice, simply because we do not know all models. Thus, one is forced to investigate relative probabilities between models, such that the normalization constant cancels out. The odds ratio is defined by

where is the Bayes factor and the prefactor is the prior odds. The odds-ratio is a convenient quantity to calculate because the evidence , which is difficult to compute, actually cancels out. Recently, Vallisneri [416] has investigated the possibility of calculating the odds-ratio using only frequentist tools and without having to compute full evidences. The odds-ratio should be interpreted as the betting-odds of model over model . For example, an odds-ratio of unity means that both models are equally supported by the data, while an odds-ratio of means that there is a 100 to 1 odds that model better describes the data than model .The main difficulty in Bayesian inference (both in parameter estimation and model selection) is sampling the PDF sufficiently accurately. Several methods have been developed for this purpose, but currently the two main workhorses in gravitational-wave data analysis are Markov Chain Monte Carlo and Nested Sampling. In the former, one samples the likelihood through the Metropolis–Hastings algorithm [314, 221, 122, 367]. This is computationally expensive in high-dimensional cases, and thus, there are several techniques to improve the efficiency of the method, e.g., parallel tempering [402]. Once the PDF has been sampled, one can then calculate the evidence integral, for example via thermodynamic integration [420, 167, 419]. In Nested Sampling, the evidence is calculated directly by laying out a fixed number of points in the prior volume, which are then allowed to move and coalesce toward regions of high posterior probability. With the evidence in hand, one can then infer the PDF. As in the previous case, Nested Sampling can be computationally expensive in high-dimensional cases.

Del Pozzo et al. [142*] were the first to carry out a Bayesian implementation of model selection in the context of tests of GR. Their analysis focused on tests of a particular massive graviton theory, using the gravitational wave signal from quasi-circular inspiral of non-spinning black holes. Cornish et al. [124*, 376*] extended this analysis by considering model-independent deviations from GR, using the parameterized post-Einsteinian (ppE) approach (Section 5.3.4) [467*]. Recently, this was continued by Li et al. [290*, 291], who carried out a similar analysis on a large statistical sample of Advanced LIGO (aLIGO) detections using simulated data and a restricted ppE model. All of these studies suggest that Bayesian tests of GR are possible, given sufficiently-high SNR events. Of course, whether deviations from GR are observable will depend on the strong-field character and strength of the deviation, as well as the availability of sufficiently-accurate GR waveforms.

#### 4.1.3 Systematics in model selection

The model selection techniques described above are affected by other systematics present in data analysis. In general, we can classify these into the following [417*]:

- Mismodeling Systematic, caused by inaccurate models of the gravitational-wave template.
- Instrumental Systematic, caused by inaccurate models of the gravitational-wave response.
- Astrophysical Systematic, caused by inaccurate models of the astrophysical environment.

Mismodeling systematics are introduced due to the lack of an exact solution to the Einstein equations from which to extract an exact template, given a particular astrophysical scenario. Inspiral templates, for example, are approximated through post-Newtonian theory and become increasingly less accurate as the binary components approach each other. Cutler and Vallisneri [127] were the first to carry out a formal and practical investigation of such a systematic in the context of parameter estimation from a frequentist approach.

Mismodeling systematics will prevent us from testing GR effectively with signals that we do not understand sufficiently well. For example, when considering signals from black hole coalescences, if the the total mass of the binary is sufficiently high, the binary will merge in band. The higher the total mass, the fewer the inspiral cycles that will be in band, until eventually only the merger is in band. Since the merger phase is the least understood phase, it stands to reason that our ability to test GR will deteriorate as the total mass increases. Of course, we do understand the ringdown phase very well, and tests of the no-hair theorem would be allowed during this phase, provided a sufficiently-high SNR [65*]. On the other hand, for neutron star binaries or very–low-mass black-hole binaries, the merger phase is expected to be essentially out of band for aLIGO (above 1 kHz), and thus, the noise spectrum itself may shield us from our ignorance.

Instrumental systematics are introduced by our ignorance of the transfer function, which connects the detector output to the incoming gravitational waves. Through sophisticated calibration studies with real data, one can approximate the transfer function very well [4*, 1]. However, this function is not time-independent, because the noise in the instrument is not stationary or Gaussian. Thus, un-modeled drifts in the transfer function can introduce systematics in parameter estimation that are as large as 10% in the amplitude and the phase [4].

Instrumental systematics can affect tests of GR, if these are performed with a single instrument. However, one expects multiple detectors to be online in the future and for gravitational-wave detections to be made in several of them simultaneously. Instrumental systematics should be present in all such detections, but since the noise will be mostly uncorrelated between different instruments, one should be able to ameliorate its effects through cross-correlating outputs from several instruments.

Astrophysical systematics are induced by our lack of a priori knowledge of the gravitational wave source. As explained above, matched filtering requires knowledge of a waveform template with which to filter the data. Usually, we assume the sources are in a perfect vacuum and isolated. For example, when considering inspiral signals, we ignore any third bodies, electric or magnetic fields, neutron star hydrodynamics, the expansion of the universe, etc. Fortunately, however, most of these effects are expected to be small: the probability of finding third bodies sufficiently close to a binary system is very small [463*]; for low redshift events, the expansion of the universe induces an acceleration of the center of mass, which is also very small [468*]; electromagnetic fields and neutron-star hydrodynamic effects may affect the inspiral of black holes and neutron stars, but not until the very last stages, when most signals will be out of band anyways. For example, tidal deformation effects enter a neutron-star–binary inspiral waveform at 5 post-Newtonian order, which therefore affects the signal outside of the most sensitive part of the aLIGO sensitivity bucket.

Perhaps the most dangerous source of astrophysical systematics is due to the assumptions made about the astrophysical systems we expect to observe. For example, when considering neutron-star–binary inspirals, one usually assumes the orbit will have circularized by the time it enters the sensitivity band. Moreover, one assumes that any residual spin angular momentum that the neutron stars may possess is very small and aligned or counter-aligned with the orbital angular momentum. These assumptions certainly simplify the construction of waveform templates, but if they happen to be wrong, they would introduce mismodeling systematics that could also affect parameter estimation and tests of GR.

### 4.2 Burst analyses

In alternative theories of gravity, gravitational-wave sources such as core collapse supernovae may result in the production of gravitational waves in more than just the plus and cross-polarizations [384, 380, 216, 334, 333, 369]. Indeed, the near-spherical geometry of the collapse can be a source of scalar breathing-mode gravitational waves. However, the precise form of the waveform is unknown because it is sensitive to the initial conditions.

When searching for un-modeled bursts in alternative theories of gravity, a general approach involves optimized linear combinations of data streams from all available detectors to form maximum likelihood estimates of the waveforms in the various polarizations, and the use of null streams. In the context of ground-based detectors and GR, these ideas were first explored by Gürsel and Tinto [212*] and later by Chatterji et al. [101*] with the aim of separating false-alarm events from real detections. The main idea was to construct a linear combination of data streams received by a network of detectors, so that the combination contained only noise. Of course, in GR one need only include and polarizations, and thus a network of three detectors suffices. This concept can be extended to develop null tests of GR, as originally proposed by Chatziioannou et al. [102*] and recently implemented by Hayama et al. [228*].

Let us consider a network of detectors with uncorrelated noise and a detection by all detectors. For a source that emits gravitational waves in the direction , a single data point (either in the time-domain, or a time-frequency pixel) from an array of detectors (either pulsars or interferometers) can be written as

Here where is a vector with the noise. The antenna pattern functions are given by the matrix, For simplicity we have suppressed the sky-location dependence of the antenna pattern functions. These can either be the interferometric antenna pattern functions in Eqs. (58*), or the pulsar response functions in Eqs. (69*). For interferometers, since the breathing and longitudinal antenna pattern response functions are degenerate, and even though is a matrix, there are only five linearly-independent vectors [81, 80, 102*, 228*].If we do not know the form of the signal present in our data, we can obtain maximum likelihood estimators for it. For simplicity, let us assume the data are Gaussian and of unit variance (the latter can be achieved by whitening the data). Just as we did in Eq. (80*), we can write the probability of obtaining datum , in the presence of a gravitational wave as

The logarithm of the likelihood ratio, i.e., the logarithm of the ratio of the likelihood when a signal is present to that when a signal is absent, can then be written as If we treat the waveform values for each datum as free parameters, we can maximize the likelihood ratio and obtain maximum likelihood estimators for the gravitational wave, We can further substitute this solution into the likelihood, to obtain the value of the likelihood at the maximum, where The maximized likelihood can be thought of as the power in the signal, and can be used as a detection statistic. is a projection operator that projects the data into the subspace spanned by . An orthogonal projector can also be constructed, which projects the data onto a sub-space orthogonal to . Thus one can construct a certain linear combination of data streams that has no component of a certain polarization by projecting them to a direction orthogonal to the direction defined by the beam pattern functions of this polarization mode This is called a null stream and, in the context of GR, it was introduced as a means of separating false-alarm events due, say, to instrumental glitches from real detections [212*, 101*].With just three independent detectors, we can choose to eliminate the two tensor modes (the plus and cross-polarizations) and construct a GR null stream: a linear combination of data streams that contains no signal consistent within GR, but could contain a signal in another gravitational theory, as illustrated in Figure 4*. With such a GR null stream, one can carry out null tests of GR and study whether such a stream contains any statistically-significant deviations from noise. Notice that this approach does not require a template; if one were parametrically constructed, such as in [102*], more powerful null tests could be applied. In the future, we expect several gravitational wave detectors to be online: the two aLIGO ones in the United States, Advanced VIRGO (adVirgo) in Italy, LIGO-India in India, and KAGRA in Japan. Given a gravitational-wave observation that is detected by all five detectors, one can then construct three GR null streams, each with power in a signal direction.

For pulsar timing experiments where one is dealing with data streams of about a few tens of pulsars, waveform reconstruction for all polarization states, as well as numerous null streams, can be constructed.

### 4.3 Stochastic background searches

Much work has been done on the response of ground-based interferometers to non-tensorial polarization modes, stochastic background detection prospects, and data analysis techniques [299, 323, 191, 329*, 121]. In the context of pulsar timing, the first work to deal with the detection of such backgrounds in the context of alternative theories of gravity is due to Lee et al. [284*], who used a coherence statistic approach to the detection of non-Einsteinian polarizations. They studied the number of pulsars required to detect the various extra polarization modes, and found that pulsar timing arrays are especially sensitive to the longitudinal mode. Alves and Tinto [22*] also found enhanced sensitivity to longitudinal and vector modes. Here we follow the work in [329*, 99*] that deals with the LIGO and pulsar timing cases using the optimal statistic, a cross-correlation that maximizes the SNR.

In the context of the optimal statistic, the derivations of the effect of extra polarization states for ground-based instruments and pulsar timing are very similar. We begin with the metric perturbation written in terms of a plane wave expansion, as in Eq. (50*). If we assume that the background is unpolarized, isotropic, and stationary, we have that

where is the gravitational-wave power spectrum for polarization . is related to the energy density in gravitational waves per logarithmic frequency interval for that polarization through where is the closure density of the universe, and is the energy density in gravitational waves for polarization . It follows from the plane wave expansion in Eq. (51*), along with Eqs. (101*) and (102*) in Eq. (103*), that and thereforeFor both ground-based interferometers and pulsar-timing experiments, an isotropic stochastic background of gravitational waves appears in the data as correlated noise between measurements from different instruments. The data set from the instrument is of the form

where corresponds to the gravitational-wave signal and to noise. The noise is assumed in this case to be stationary and Gaussian, and uncorrelated between different detectors, for .Since the gravitational-wave signal is correlated, we can use cross-correlations to search for it. The cross-correlation statistic is defined as

where is a filter function to be determined. Henceforth, no summation is implied on the detector indices . At this stage it is not clear why depends on the pair of data sets being correlated. We will show how this comes about later. The optimal filter is determined by maximizing the expected SNR Here is the mean and is the square root of the variance .The expressions for the mean and variance of the cross-correlation statistic, and respectively, take the same form for both pulsar timing and ground-based instruments. In the frequency domain, Eq. (109*) becomes

by the convolution theorem, and the mean is then where is the finite time approximation to the delta function, . With this in hand, the mean of the cross-correlation statistic is and the variance in the weak signal limit is where the one-sided power spectra of the noise are defined by in analogy to Eq. (76*), where plays here the role of .The mean and variance can be rewritten more compactly if we define a positive-definite inner product using the noise power spectra of the two data streams

again in analogy to the inner product in Eq. (78*), when considering inspirals. Using this definition where we recall that the capital Latin indices stand for the polarization content. From the definition of the SNR and the Schwartz’s inequality, it follows that the optimal filter is given by where is an arbitrary normalization constant, normally chosen so that the mean of the statistic gives the amplitude of the stochastic background.The differences in the optimal filter between interferometers and pulsars arise only from differences in the overlap reduction functions, . For ground-based instruments, the signal data are the strains given by Eq. (57*). The overlap reduction functions are then given by

where and are the locations of the two interferometers. The integrals in this case all have solutions in terms of spherical Bessel functions [329*], which we do not summarize here for brevity.For pulsar timing arrays, the signal data are the redshifts , given by Eq. (63*). The overlap reduction functions are then given by

where and are the distances to the two pulsars. For all transverse modes pulsar timing experiments are in a regime where the exponential factors in Eq. (121*) can be neglected [30, 99*], and the overlap reduction functions effectively become frequency independent. For the and mode the overlap reduction function becomes where is the angle between the two pulsars. This quantity is proportional to the Hellings and Downs curve [231]. For the breathing mode, the overlap reduction function takes the closed form expression [284*]: For the vector and longitudinal modes the overlap reduction functions remain frequency dependent and there are no general analytic solutions.The result for the combination of cross-correlation pairs to form an optimal network statistic is also the same in both ground-based interferometer and pulsar timing cases: a sum of the cross-correlations of all detector pairs weighted by their variances. The detector network optimal statistic is,

where is a sum over all detector pairs.In order to perform a search for a given polarization mode one first needs to compute the overlap reduction functions (using either Eq. (120*) or (121*)) for that mode. With that in hand and a form for the stochastic background spectrum , one can construct optimal filters for all pairs in the detector network using Eq. (119*), and perform the cross-correlations using either Eq. (109*) (or equivalently Eq. (111*)). Finally, we can calculate the overall network statistic Eq. (124*), by first finding the variances using Eq. (114*).

It is important to point out that the procedure outlined above is straightforward for ground-based interferometers. However, pulsar timing data are irregularly sampled, and have a pulsar-timing model subtracted out. This needs to be accounted for, and generally, a time-domain approach is more appropriate for these data sets. The procedure is similar to what we have outlined above, but power spectra and gravitational-wave spectra in the frequency domain need to be replaced by auto-covariance and cross-covariance matrices in the time domain that account for the model fitting (for an example of how to do this see [162]).

Interestingly, Nishizawa et al. [329*] show that with three spatially-separated detectors the tensor, vector, and scalar contributions to the energy density in gravitational waves can be measured independently. Lee et al. [284] and Alves and Tinto [22] showed that pulsar timing experiments are especially sensitive to the longitudinal mode, and to a lesser extent the vector modes. Chamberlin and Siemens [99] showed that the sensitivity of the cross-correlation to the longitudinal mode using nearby pulsar pairs can be enhanced significantly compared to that of the transverse modes. For example, for the NANOGrav pulsar timing array, two pulsar pairs separated by result in an enhancement of 4 orders of magnitude in sensitivity to the longitudinal mode relative to the transverse modes. The main contribution to this effect is due to gravitational waves that are coming from roughly the same direction as the pulses from the pulsars. In this case, the induced redshift for any gravitational-wave polarization mode is proportional to , the product of the gravitational-wave frequency and the distance to the pulsar, which can be large. When the gravitational waves and the pulse direction are exactly parallel, the redshift for the transverse and vector modes vanishes, but it is proportional to for the scalar-longitudinal mode.

Lee et al. [285] studied the detectability of massive gravitons in pulsar timing arrays through stochastic background searches. They introduced a modification to Eq. (59*) to account for graviton dispersion, and found the modified overlap reduction functions (i.e., modifications to the Hellings–Downs curves Eq. (122*)) for various values of the graviton mass. They conclude that a large number of stable pulsars () are required to distinguish between the massive and massless cases, and that future pulsar timing experiments could be sensitive to graviton masses of about (). This method is competitive with some of the compact binary tests described later in Section 5.3.1 (see Table 2). In addition, since the method of Lee et al. [285] only depends on the form of the overlap reduction functions, it is advantageous in that it does not involve matched filtering (and therefore prior knowledge of the waveforms), and generally makes few assumptions about the gravitational-wave source properties.