In statistics the formal procedure to separate voxels that contain noise from those that also contain signal is to perform a hypothesis test on the magnitude and phase. In hypothesis testing there are four possible outcomes. One can imagine the usual 2×2 table where the first row is labeled “Reject H_{0},” the second row is labeled “Do Not Reject H_{0},” the first column is labeled “H_{0} True,” and the second column is labeled “H_{0} False.” In the top right and bottom left cells correct decisions are made. In the top left cell a Type I error is made in which the null hypothesis is rejected when it is true. The probability of a Type I error is the level of significance denoted by α. In the bottom right cell a Type II error is made in which the null hypothesis is not rejected when it is false. The probability of a Type II error is the level of significance denoted by β. These two error rates will be examined in more detail later.

Formally, this voxel separation procedure can be achieved by testing null and alternative hypotheses H0: ρ=0, θ=0 versus H

_{1}: ρ>0, θ≠0 where ρ is the population magnitude and θ is the population phase. In general, hypotheses can be evaluated with test statistics that are derived via a likelihood ratio statistic [

6,

7]. Under the constrained null hypothesis H

_{0}: ρ=0, θ=0, the maximum likelihood estimators (MLEs) for the magnitude and phase from

Eq. (4) are

while under the unconstrained alternative hypothesis H

_{1}: ρ>0,θ≠0 the MLEs from

Eq. (4) are

where

is the mean of the real channel measurements and

is the mean of the imaginary channel measurements. These estimates in

Eq. (5) and

Eq. (6) are then inserted back into the likelihoods and the ratio

taken. Under some regularity conditions (that are satisfied in this case), − 2log(λ) is asymptotically χ

^{2} distributed with degrees of freedom equal to the difference in the number of estimated parameters between H

_{0} and H

_{1}, (two in this case).

Upon insertion of the MLEs in

Eq. (5) and

Eq. (6) back into the likelihood function in

Eq. (4), the ratio of null hypothesis likelihood over alternative hypothesis

taken. Some terms in the likelihood ratio can be cancelled to become

However, algebra can usually be performed to simplify this into a statistic with known distribution under the null hypothesis. This procedure can be applied to the general linear model to derive the usual t and F statistics. Applying this procedure here, the test statistic

can be arrived at as

*F* = (1−λ

^{1/n}). The probability distribution of this statistic in

Eq. (8) needs to be found. The steps through the logic for the derivation of the distribution for the numerator and the denominator of the

*F* statistic are shown below in .

| **Table 1**Derivation of the distribution of the chi square statistics. |

Since the numerator and denominator terms

*x*_{1} and

*x* are χ

^{2} distributed with 2 and 2

*n* degrees of freedom, dividing them by their degrees of freedom and taking the ratio should result in a statistic that under the null hypothesis has an F distribution with 2 and 2

*n* degrees of freedom. However, this is not true in this case. These two χ

^{2} statistics must be statistically independent for this to be true. The correlation between these two statistics can be derived. First, since these are χ

^{2} distributed, their expectation is their degrees of freedom and their variances are twice their degrees of freedom

*E* (

*x*_{1}) = 2,

*E* (

*x*_{2}) = 2

*n*, var (

*x*_{1}) = 4, var (

*x*_{2}) = 4

*n*. The covariance can be found as cov (

*x*_{1},

*x*_{2}) =

*E* (

*x*_{1} ·

*x*_{2}) −

*E* (

*x*_{1})

*E* (

*x*_{2}), where

The correlation between

*x*_{1} and

*x*_{2} is now

, which is

. This correlation tends to zero in large samples (large

*n*) and the

*F* statistic becomes F distributed with two numerator and

*2n* denominator degrees of freedom. In non-fMRI applications where there may be a very small number of repeated images if any, this asymptotic result does not hold. Thus, critical values from the F distribution do not apply to this

*F* statistic. However, critical values for small

*n* can be achieved by way of Monte Carlo simulation. For a given level of significance (Type I error rate α), we reject H

_{0} (do not threshold voxel) if the test statistic

*F* is larger than the critical value

*F*_{α}(

*2,2n*) and do not reject (threshold voxel) if

*F* is smaller than the critical value

*F*_{α}(

*2,2n*).

To examine the convergence of the distribution of the

*F* statistic to the F distribution, one million simulated data sets was created under the null hypothesis (ρ=

*0* and θ=

*0*) for

*n*=

*5, 9, 25, 50, 100*, and

*250*. Normally distributed independent noise variates were generated for the real and imaginary parts with a variance of σ

^{2}=

*1*. This corresponds to a signal-to-noise ratio (SNR) of zero. It was found (but not shown), that regardless of the sample size

*n*, the two statistics

*x*_{1} and

*x*_{2} in that make up the numerator and denominator are χ

^{2} distributed. Additionally, the

correlation between the numerator and denominator χ

^{2} statistics was verified. For each sample size

*n*, the

*F* statistic was computed for each of the data sets. Cumulative distribution functions (CDFs) from the Monte Carlo empirical distributions (dashed curves) and corresponding CDF for the F distribution (solid curves) are presented in for

*n=5* (red),

*9* (black),

*25* (yellow),

*50* (green),

*100* (blue),

*250* (violet). Note the disparity between the Monte Carlo empirical distributions (dashed curves) and corresponding CDF for the F distribution (solid curves). One can see that it takes a relatively large sample size for this disparity to decrease. Therefore for small samples, Monte Carlo critical values need to be used.

For our application, we will be utilizing the test statistic denoted by

*F* for a sample of size

*n*=

*9*. The Type I error rate was examined in for several values of

*n*. However, the Type II rate is also of interest. To examine the Type II error rate, one million simulated data sets for

*n*=

*9* were created under the alternative hypothesis (ρ≠0 and θ≠0) with ρ=(

*0,1,2,3,5*) and θ=

*0*°. Normally distributed independent noise variates were generated for the real and imaginary parts with a mean of zero and a variance of σ

^{2}=

*1*. Pairs of histograms (not shown) for the one million data sets when ρ=0 and each of the alternative hypothesis cases ρ=(

*1,2,3,5*) were formed. A vertical line denoted by

*F*_{α} is drawn with the false positive rate α being the fraction of variates in the ρ=0 histogram that are larger than the vertical line. The true positive rate

*1*−α is the fraction of variates in the ρ=0 histogram that are smaller than the vertical line at

*F*_{α}. The fraction of variates in the alternative hypothesis case (ρ=

*1,2,3,5*) larger than the vertical line at

*F*_{α} is the true negative rate

*1*−β. The false negative rate β is the fraction of variates in the alternative hypothesis case (ρ=

*1,2,3,5*) smaller than the vertical line at

*F*_{α}. By sliding this vertical line left and right for varying false positive rate areas α to the right of it, we can determine the true positive rate

*1*−β. A plot of α on the horizontal axis and

*1*−β on the vertical axis is called a receiver operating characteristic (ROC) curve [

8]. This curve can be made for each combination of ρ and θ. In are the ROC curves for ρ=(

*0,1,2,3,5*) and θ=

*0*°. When the θ values are not zero, minor variations from the curves for the corresponding ρ value are produced.

To determine more accurate critical values in the upper tail of the *F* statistic for *n=9*, 5×10^{7} data sets were generated. For each set, the F statistic was computed. A histogram of these *F* statistics for *n=9* is presented in . These *F* statistics were ordered and percentiles determined. For example, the 0.95×(5×10^{7})^{th} largest value was taken as the 95th percentile. Selected critical values are presented in for *n=5* and *n=9*. These critical values will be used for thresholding the magnitude and phase of voxels. Additional critical values can reliably be interpolated for significance levels between those that have been presented.

| **Table 2**Critical *F* statistic values for *n=5* and *n=9*. |

From a statistical point of view, we would like to have *n* repeated images. However, we are interested in thresholding high-resolution anatomical images where replicated images are rarely available. We will use the observed magnitude and phase values in each voxel and its surrounding voxels. Each voxel and its eight neighbors (*n*=9) will be used to estimate the magnitude and phase of every voxel with image wrap around. Alternatively, each voxel and its four neighbors (*n*=5) can be used. The estimated values for each voxel will be used to compute the previously described *F* statistic. The map of these *F* statistics is thresholded with the critical values in . A zero-one mask is produced from the thresholded *F* statistics then applied to the original magnitude and phase images.

Since only a single image is available, using a voxel and its neighboring voxel will result in very minor local correlation in the

*F* statistics. A statistical critical value for a given α level might be slightly lower than the values we have used from the Monte Carlo simulation. To investigate this phenomenon, additional simulations were performed where images were created and the

*F* statistics computed using a voxel and its four neighbors. Images of size 352×512 and 1000×1000 were created where each voxel had a magnitude of ρ=(

*0,1,2,3,5*) and a phase of θ=

*0*°. Normally distributed independent noise variates were generated and added to the real and imaginary parts with a mean of zero and a variance of σ

^{2}=1. ROC curves were produced for each magnitude (equivalently SNR) value. The fraction of true positives and false positives were determined. The false positive α and false negative rates β were consistent with previous values and yielded visually identical ROC curves as seen in . Simulations were also performed in which 512×512 images were produced with a circle of radius 128 pixels containing the same magnitudes of ρ=(

*0,1,2,3,5*) and a phase of θ=

*0*°. True and false positive rates were also consistent as were ROC curves. The very small local correlation yields the same critical values in and does not affect the global image threshold as previously reported [

9]. If one were extremely concerned about a potential change in the image threshold due to local correlations (particularly in small images), one could perform larger simulations for more accuracy or use a thresholding method that incorporates a neighborhood about the voxel of interest [

10].