In this work we have developed a segmentation method specifically designed for the cerebral cortex. We evaluated the robustness and accuracy of the segmentation and PV estimation, and the ability to use the segmentation directly for cortical thickness estimation, on synthetic and real data.
In the Atlas dependency study section, a study testing for atlas independence was performed on real data from the ADNI database in order to evaluate the efficacy of the prior relaxation. When segmenting the same datasets using two different normal-population atlases, an algorithm that is less dependent on the prior probability should produce two closely matching segmentations. As expected, the results show that when priors derived from a control population are applied to a control group, there is no change in the average Dice score, since the atlas is representative of that specific population. However, when a control-population atlas is applied to an AD population, increasing the relaxation factor has a positive effect on the segmentation overlap. Although the difference is not significant, there is an upward trend in the average and a decrease in the standard deviation of the Dice score distributions. This shows that after prior relaxation the segmentations become more similar, and thus less dependent on the priors. Visual assessment shows a noticeably better segmentation in the ventricle area of the AD patients, particularly when the ventricles are enlarged (see ), where spatial ambiguity arises because the ventricle edge lies close to the cortical GM. A higher relaxation factor also produces a visually better temporal lobe segmentation when the lobes are highly atrophied. Overall, the extra knowledge about neighbouring tissue structure introduced in the prior relaxation step reduces the atlas-induced bias, and with it the ambiguity over areas that would otherwise be mis-segmented due to anatomical differences.
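The effect of prior relaxation can be sketched in one dimension as follows. This is a minimal illustration only: the function name, the 3-voxel neighbourhood and the linear blending rule are our assumptions, not the paper's exact formulation, which uses neighbouring tissue structure in 3D.

```python
# Sketch of atlas prior relaxation: each voxel's atlas prior is blended
# with a locally averaged prior, so a relaxation factor of 0 keeps the
# atlas unchanged and larger factors trust the local neighbourhood more.

def relax_priors(priors, alpha):
    """priors: per-voxel lists of per-class probabilities along a 1-D strip.
    alpha: relaxation factor in [0, 1]."""
    n = len(priors)
    k = len(priors[0])
    relaxed = []
    for i in range(n):
        # local average over a 3-voxel neighbourhood (clamped at the borders)
        lo, hi = max(0, i - 1), min(n, i + 2)
        local = [sum(priors[j][c] for j in range(lo, hi)) / (hi - lo)
                 for c in range(k)]
        mixed = [(1 - alpha) * priors[i][c] + alpha * local[c]
                 for c in range(k)]
        s = sum(mixed)  # renormalise so each voxel remains a distribution
        relaxed.append([m / s for m in mixed])
    return relaxed

# A hard GM/CSF boundary in the atlas becomes softer after relaxation,
# lowering the penalty for anatomy (e.g. enlarged ventricles) that
# disagrees with the atlas.
atlas = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
soft = relax_priors(atlas, 0.5)
```

After relaxation, the voxels adjacent to the boundary carry non-zero probability for both classes, which is the behaviour that makes the segmentation less dependent on the atlas where the subject's anatomy deviates from it.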
A second experiment showed that the proposed improvements help to extract meaningful thickness measurements directly from the segmentation. Using a digital phantom created specifically for this purpose, the average thickness was measured with the proposed method, without the refinement steps (MAP with MRF), and using only the intensity component of the model (MAP without MRF). The results are displayed in . Consistent results were found for both the low- and high-noise cases. An unweighted MRF causes an overestimation of the thickness and of its standard deviation due to the lack of detail in highly convoluted and PV-corrupted areas. Without any type of MRF, the thickness measurements are much more prone to noise, leading to a number of short paths to mis-segmented voxels and consequently an artificial increase in the standard deviation of the measurement. Counter-intuitively, when the noise level is high, the presence of short paths combined with the lack of detail leads to a more accurate estimate of the average thickness; however, because the standard deviation is much higher than expected, this measurement lacks precision.
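The influence of a single mis-segmented voxel on the thickness statistics can be illustrated with a toy 2-D example (our own construction, not the paper's 3-D phantom): one "short path" through the ribbon barely moves the mean but visibly inflates the standard deviation.

```python
import statistics

def column_thickness(seg, voxel_mm=1.0):
    """Thickness of a GM ribbon per column: count of GM voxels (label 1)
    in each column of a 2-D segmentation, scaled by the voxel size."""
    return [sum(col) * voxel_mm for col in zip(*seg)]

# A clean 3 mm ribbon versus the same ribbon with one mis-segmented voxel.
clean = [[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
noisy = [[1, 1, 1, 1],
         [1, 1, 0, 1],   # mis-segmented voxel creates a short path
         [1, 1, 1, 1]]
t_clean = column_thickness(clean)
t_noisy = column_thickness(noisy)
```

Here `statistics.pstdev(t_clean)` is zero while `statistics.pstdev(t_noisy)` is not: the measurement has lost precision even though the mean has barely changed, mirroring the behaviour described above for the no-MRF case.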
In the Segmentation evaluation section, the Dice score and PV estimation accuracy were evaluated using BrainWeb data. The proposed method and FAST both showed higher PV estimation accuracy than SPM8 and FANTASM, most probably due to the smoothing properties of the MRF, which make the PV estimation more robust. The MRF also ensures a more reliable assignment of voxels surrounded by only one tissue class, making the posterior probabilities more closely resemble partial volume fractions. The small Dice score improvement of the proposed method can be explained by the adaptive nature of its MRF in areas close to sulci and gyri, increasing the level of detail whilst maintaining robustness to noise. Conversely, due to the lack of adaptivity in FAST's MRF, some of the detail is lost, leading to worse PV estimation when compared to the proposed method. SPM8 underperforms both FAST and the proposed method with regard to PV estimation accuracy; we speculate that, for cortical segmentation specifically, the advantages of an iterative segmentation/registration procedure may not compensate for the lack of an MRF. Finally, even though FANTASM is tolerant to noise, it does not model noise explicitly, which might explain why it slightly underperforms the other methods in terms of Dice score at low PV differences. The difference between FANTASM and the proposed method becomes smaller for difference values above 0.3.
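For reference, the two evaluation criteria used in this comparison can be written in a few lines. This is a generic sketch over flat 0/1 masks and fractional PV maps; the helper names and the use of mean absolute error as the PV accuracy measure are our assumptions, not the paper's implementation.

```python
def dice(a, b):
    """Dice overlap between two binary masks given as flat 0/1 lists."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    return 2.0 * inter / (sum(a) + sum(b))

def pv_error(estimated, truth):
    """Mean absolute error between estimated and true PV fractions,
    a simple proxy for PV estimation accuracy (lower is better)."""
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / len(truth)

seg_a = [1, 1, 1, 0, 0, 0]
seg_b = [1, 1, 0, 0, 0, 1]
overlap = dice(seg_a, seg_b)                       # 2*2 / (3+3)
mae = pv_error([0.9, 0.5, 0.1], [1.0, 0.5, 0.0])
```

Note that the two criteria probe different things: Dice is computed on hard labels and rewards correct boundaries, whereas the PV error is computed on the fractional maps and rewards well-calibrated posteriors, which is why a method can do well on one and poorly on the other.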
The proposed method achieved significantly higher Dice scores than FAST, SPM and FANTASM. We hypothesise that the improved overlap between structures is due to the enhanced delineation of the sulci and gyri and the implicit PV modelling. Moreover, because these improvements are iteratively fed back into the segmentation, there is a gradual reduction of the PV-related parameter bias. One might also conclude that SPM outperforms FAST in terms of Dice score because of its iterative segmentation/registration procedure, which improves the overlap of the segmented structures. Another explanation might be the lack of spatial adaptiveness in FAST's MRF: the MRF tends to minimise the boundary length between tissues, which discourages classifications from accurately following the highly convoluted shape of the human cortex. For the proposed method, this problem is reduced because the MRF is spatially adaptive.
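The boundary-shrinking behaviour of a stationary MRF can be seen in a one-dimensional Potts energy. This is an illustrative sketch only; the actual connectivity, weighting scheme and energy terms of the methods compared here differ.

```python
def potts_energy(labels, beta):
    """Pairwise Potts energy on a 1-D strip: beta[i] weights the clique
    between voxels i and i+1, penalising every label change."""
    return sum(beta[i] for i in range(len(labels) - 1)
               if labels[i] != labels[i + 1])

# A thin, PV-corrupted sulcus: CSF (0) between two GM (1) voxels.
sulcus = [1, 0, 1]
filled = [1, 1, 1]

stationary = [1.0, 1.0]   # constant clique weight everywhere
adaptive = [0.1, 0.1]     # weight lowered where the image evidence
                          # indicates a genuine tissue boundary
e_stationary = potts_energy(sulcus, stationary)
e_adaptive = potts_energy(sulcus, adaptive)
```

With the stationary weights the sulcus costs 2.0 while filling it in costs 0, so the smoother (wrong) labelling is energetically preferred; with spatially adaptive weights the penalty drops to 0.2 and the thin structure can survive, which is the intuition behind the Dice improvement discussed above.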
In the fourth experiment, using ADNI data, we compared three segmentation methods in terms of group separation between control subjects and patients diagnosed with Alzheimer's disease (AD). Using the proposed segmentation, we see a statistically significant, clinically expected pattern of difference in cortical thickness between AD patients and controls. Although most of the same expected areas are also statistically significant when using FAST's segmentation, the pattern of atrophy is less symmetric and some of the expected areas (i.e. the right and left middle frontal gyri) do not achieve statistical significance, probably because of the loss of detail caused by the stationary MRF. When using SPM, there is a noticeable overall decrease of statistical significance throughout the brain, with only the middle and inferior temporal areas achieving statistical significance. This is again caused by a lack of detail, mostly due to the need for an artificial threshold to separate pure from non-pure voxels. These findings show how important the presence of an MRF is when segmenting the cortex. Throughout the literature, the vast majority of clinical studies have been carried out using surface-based cortical thickness techniques (Lerch et al., 2005; Du et al., 2007; Lehmann et al., in press; Rosas et al., 2008; Nesvåg et al., 2008; Salat et al., 2004), with a few using voxel-based techniques (Querbes et al., 2009
). Both methods depend on the segmentation step; however, for surface-based techniques the segmentation is only used as an initialisation for a surface mesh. The mesh is typically deformed to fit the cortical GM/WM boundary and then expanded outwards to the GM/CSF boundary. This gives surface-based methods sub-voxel accuracy and robustness to noise. However, due to smoothness and topology constraints, it is difficult to fit the surface correctly to very complex shapes, thus requiring laborious manual corrections. Additionally, the implicit surface modelling can lead to bias in the thickness measurements (MacDonald et al., 2000; Kim et al., 2005). Conversely, voxel-based techniques can potentially cope with any topology or shape because they work on the 3D voxel grid. However, these techniques were never specifically tailored to the highly convoluted shape of the cortex. The proposed segmentation method improves the quality and topology of the cortical segmentation and enhances the detection of PV-corrupted sulci and gyri, enabling the direct use of the segmentation for cortical thickness estimation, as opposed to requiring post-processing techniques (Hutton et al., 2008; Lohmann et al., 2003; Acosta et al., 2009).
In this paper, the focus has been on accurate segmentation tailored specifically to the cortex and on how that can directly influence the thickness measurements. We have not compared cortical thickness results with those of other cortical thickness algorithms. We consider that such a comparison is necessary in order to fully validate the segmentation method for cortical thickness estimation. However, it requires voxel-based and surface-based measurements to be brought together in a common space, which is difficult to achieve without bias towards either approach. For this reason, we believe that a comparison with surface-based methods is beyond the scope of this paper. Future work will compare voxel-, registration- and surface-based cortical thickness estimation techniques.
On the methodological side, future work will investigate the use of Variational Bayes inference and hyperparameter optimisation in a manner similar to Woolrich and Behrens (2006), enabling a unification of the segmentation framework. Furthermore, we would also like to explore the use of topological constraints on the space of solutions in order to obtain a topologically correct segmentation of each structure.