|Home | About | Journals | Submit | Contact Us | Français|
The aim of this paper is to apply a non-parametric statistical tool, Ripley's K-function, to analyze the 3-dimensional distribution of pyramidal neurons. Ripley's K-function is a widely used tool in spatial point pattern analysis. There are several approaches in 2D domains in which this function is executed and analyzed. Drawing consistent inferences on the underlying 3D point pattern distributions in various applications is of great importance as the acquisition of 3D biological data now poses lesser of a challenge due to technological progress. As of now, most of the applications of Ripley's K-function in 3D domains do not focus on the phenomenon of edge correction, which is discussed thoroughly in this paper. The main goal is to extend the theoretical and practical utilization of Ripley's K-function and corresponding tests based on bootstrap resampling from 2D to 3D domains.
The study of anatomy and its relation to function requires that we can quantify the observed anatomical parameters. Just as we can quantify changes in functional properties such as synaptic strength, the study of neuronal circuits will benefit from quantifications of the topographical organization of neurons. Examples of such anatomical features that have been quantified with regard to the spatial distribution includes the bundling of the apical dendrites from layer 5 pyramidal cells (Skoglund et al., 2004; Vercelli et al., 2004; Krieger et al., 2007), the vertical organization of cell bodies in human cortex (Buxhoeveden et al., 2000), amacrine cells in the retina (Diggle, 1986; Costa et al., 2007), peripheral nerve organization (Prodanov et al., 2007) and the location of ponto-cerebellar neurons (Bjaalie et al., 1991). These studies give examples of how the analysis of the spatial distribution is important for testing hypothesis on development, pathological reorganization and information processing in a cortical column.
With new emerging methods that allow us to genetically target cell types with fluorescent markers (Gong et al., 2003; Groh and Krieger, 2010) and subsequently reconstruct their distribution in 3 dimensions we need mathematical tools to analyze these data. Recent efforts in the development of high-resolution 3D digital atlases require the development of analytical tools for the analysis of the data. The digital atlases provide remarkable tools for visualization, but the fundamental advantage of the digital era in anatomy is the use of mathematical tools to analyze large datasets. The methods used for analyzing the spatial distribution of neurons and their processes must thus be expanded to 3D space.
The distribution of neurons can be described as a spatial point pattern, a multi-dimensional stochastic process which can be analyzed with mathematical methods (Baddeley et al., 1993; Diggle, 2003; Illian et al., 2008). In the 2D space the spatial distribution of neurons and neuronal processes have been analyzed using methods such as nearest-neighbor analysis, Voronoi tessellation and Ripley's K-function (Duyckaerts et al., 1994; Duyckaerts and Godefroy, 2000; Prodanov et al., 2007).
Diggle (2003) classifies spatial point patterns in three main classes, namely aggregation (clustering), regularity (inhibition) and complete spatial randomness (CSR). The corresponding theoretical framework to simulate these classes uses Poisson cluster, simple Poisson inhibition and homogenous Poisson process, respectively. In 2D, in theory and in practice, both Voronoi tessellation and Ripley's K-function can be used to analyze spatial organization of events (Prodanov et al., 2007). Ripley's K-function has been, since its introduction in 1976 (Ripley, 1987), used frequently in 2D applications in a wide variety of fields. Attempts have been made at applying 3D Voronoi analysis (Eglen et al., 2008) and Ripley's K-function (Baddeley et al., 1993) to 3D image analysis. The major obstacles when extending Voronoi tessellation to 3D domains are the decomposition of space into polyhedrons and the evaluation of their volumes. Another obstacle is the absence of inferential tools (such as the bootstrap) for Voronoi decompositions. Extending Ripley's K-function from 2D to 3D is from a theoretical point of view straightforward. The real remaining challenge, in the practical aspect, is introducing a valid edge correction term. None of the edge correction terms used in 2D (Ripley, 1987; Stoyan and Stoyan, 1994; Diggle, 2003) can directly be extended and applied to a point pattern process in 3D. Baddeley et al. (1993) proposed three edge correction terms operating in 3D domains. The similarities of one of these correction terms to the one proposed in the present paper will be discussed. Yet another approach is the usage of a probabilistic edge correction term as introduced by Doguwa and Upton (1988) and employed by Reed and Howard (1997). Ripley's K-function can also be used in 3D in a local (non-domain exceeding) form in which case there is no need for an edge correction term (Beil et al., 2005; da Silva et al., 2008). In the present paper we describe one practical relevant procedure for using Ripley's K-function in 3D using numerical evaluations that can be run on a computational platform.
The described screening analysis tool was used to investigate the 3D distribution of two different types of layer 5 pyramidal neurons. These cell types and the acquisition of the data has been previously described in Groh et al. (2010). Briefly, the distribution of layer 5 (L5) etv-expressing pyramidal neurons (etv-pyramids) and L5 glt- expressing pyramidal neurons (glt-pyramids) was determined by manual cell-counting of all EGFP labeled cells in a given area of layer 5 in mouse somatosensory barrel cortex in 50 to 100-μm thick slices.
An essential assumption needed in order to obtain reliable results when samples are processed through Ripley's K-function with constant intensity, is the assumption of stationarity. Spatial intensity is generally defined by:
where N(V) denotes the number of events in a region V. λ(x) in Eq. 1Section “Stationarity” is called the intensity function and called constant intensity or simply intensity as represented in Eq. 2. When stationarity exists the distribution of the overall pattern is invariant under translation and regarded from any fixed point follows the same probability law. Hence the constant value regardless of x.
The assumption of stationarity, as seen above, implies a constant intensity. Although having an intensity function liberates us from adjusting the raw samples, it would indeed imply difficulties in practice due to its direct involvement in the estimation of K- function as we shall see later. Intensity functions have, however, been used and discussed as in Stoyan and Stoyan (2000) and Tscheschel and Chiu (2008).
The assumption of isotropy implies invariance under rotation. Loosely speaking, under isotropy the spatial point pattern of interest appears the same in all directions.
Ripley's K-function is a quantitative tool in assessing the structure of the underlying point pattern in a sample. It has a non-parametric nature and does not rely on prior assumptions on the distribution family of samples (we still have to assume stationarity and isotropy due to our desire to employ constant intensity). Ripley's K-function, regardless of the domain where it is applied, can simply be stated as
We estimate a constant total cell intensity using where n is the observed number of cells in region V, which is in fact the unbiased maximum likelihood estimator if the underlying process fulfills the assumptions of complete spatial randomness (CSR) (Møller, 2003). Intuitively the K-function in 3D is estimated by:
where |V| is the volume of the sample domain of interest, Vt is a spherical neighborhood of radius t, n is the number of events in the sample, ei(t) is the edge correction term for event i (see Edge Correction in 3D), I[·] is the indicator function and D(i,j) represents the Euclidean distance from cell i to cell j.
The expected value of in 3D is:
Throughout this paper we choose to deliver our graphic representations of the estimated K-functions in form of due to the stronger distinguishing factor of this representation when there are deviations from CSR
To acquire the variance of estimated K-functions, we use the bootstrap procedure (see Estimating the Variance of the K-function Using Bootstrap) rather than establishing a theoretical framework for this. There are, nevertheless, numerous ways of representing the variance of theoretically in 2D (Diggle, 2003), none of which is naturally extendable to 3D.
Our interest is not only to observe possible deviations from CSR but to quantify the deviations from CSR in a statistically consistent manner. Since the geometry of samples in general differ to some point, we choose to design individual two-sided null hypothesis corresponding to the geometry of the sample in question. Loosely formulated, the hypothesis is:
After conducting s simulations of CSR-distributed point patterns identical in geometry to the samples of interest, the corresponding obtained estimates of the K-functions are processed to return a probability density function for each t. That is t probability density functions for each point pattern obtained from s simulations.
Acquiring a p-value for the two-sided test, would then be a simple matter of locating on the corresponding probability density function and calculating the area under the curve (see Figure Figure11).
To estimate the variance of the K-functions we adapt the method described by Diggle (2003) to estimate the K-function using replicated data by the bootstrap (Efron and Tibshirani, 1993). Here is how we proceed:
where r is the number of point patterns in each group.
where the are chosen randomly with replacement from the set of all Ri(t): i=1,…,r.
This is repeated, say, 1000 times in our applications and results in an equal number of -functions for each type/group.
The variance of for different values of t is used as an estimate of V ar.
The lower and upper boundaries of the 95% confidence intervals are then constituted by the 25th smallest and 25th largest -functions respectively.
In a test for between-group comparisons, the null hypothesis states that the samples of interest follow the same underlying distribution. The test focuses on significant deviations of the estimated K- functions from a weighted average over the entire group/type (Eq. 7) under repeated sampling. This test is argued to be valid even when the underlying intensities of the estimated K-functions vary within or between groups (Diggle, 2003). To start, we let nij denote the number of events in sample j in group i, the corresponding estimated K-functions and the group estimator according to Eq. 6. We define , our weighted average, as:
The null hypothesis is thus:
where K(t) is estimated by Eq. 7 and Ki(t) by . The statistic suggested by Diggle for this hypothesis is:
where BTSS is the abbreviation for ‘between-treatment sum of squares’ and w(t) is a weight function (we use w(t)=t−2 in our applications). This particular choice of weight function is believed to counter the increasing variance of the K-function for increasing values of t in an effective way.
To proceed, we follow a procedure somewhat similar to the one given in Section “Estimating the Variance of the K-function Using Bootstrap”. Having the notations given earlier intact, we define the residual K-function for sample j in group i as:
We then introduce a set of resampled K-functions under H0 as
where is obtainted by random sampling with replacement from the set of Rij(t):j=1,…,p and i=1,…,r. We then establish BTSSk:k=1,…,s from the according to Eq. 9 and compare our observed value with this set (using a probablity density function from the resampling). We choose to apply this resampling procedure s=1000 times where every resampling procedure leads to estimations of bootstrapped K-functions.
Our interest in this study is to detect deviations from CSR, a model-based study, and to measure the difference between groupwise deviations using inferential statistics, a design-based study using bootstrap.
The brain slice samples are approximately 50-μm thick. Since the analyzed neurons have a cell diameter of 15–20μm the process of edge correction (see Edge Correction in 3D) is sensitive to false geometrical assumptions. One other challenge here is the non-stationary nature of our samples.
Due to the geometry of the samples the assumption of stationarity is vital for the reliability of our results. Assuming a common underlying process in each group (etv-pyramids and glt-pyramids; Groh et al., 2010) to justify partitioning, the following procedure is meant to enhance the assumption of stationarity (see Figure Figure22 for a graphic representation of this procedure):
Due to time efficiency and given that the rotation transformation is performed in 2D, extensive simulations of the K-function in 2D confirmed the satisfactory output of this procedure.
Edge correction in 3D is vastly different and comes with more limitations than in 2D. The idea is, nevertheless, the same as in 2D where only those parts of a sphere's volume outside the sample domain are measured when evaluating Ripley's K-function according to Eq. 4 in Section “Ripley's K-function”.
Baddeley et al. (1993) proposed three edge correction terms of which the ‘isotropic’ term bears some similarities with the edge correction term we propose in this paper. The ‘isotropic’ edge correction term proposed by Baddeley et al. (1993) is ei(t)= w(i,j)s(|i− j|),i≠ j where w(i,j) is an edge correction equal to the proportion of the surface area of the sphere with center at i and radius |i−j| within the sample domain and s(·) is a global geometry correction. The derivation of these terms is demonstrated comprehensively in Baddeley et al. (1993). One distinguishing difference between the ‘isotropic’ edge correction term and the edge correction term outlined in the present paper is the absence of a term treating the surface area of the spherical neighborhoods. Our edge correction operates solely on the volumes of those parts of the spheres outside the sample domain. This and other distinguishing factors between our and the ‘isotropic’ edge correction term are discussed in the Section “Discussion” of this paper.
We establish the following edge correction term for event i with coordinates (xi,yi,zi):
where ci(t) is the outside-of-sample volume of a sphere with radius t intercepted by one plane (cap), wi(t) is the outside-of-sample volume of a wedge caused by the interception of two planes and si(t) is the outside-of-sample volume of a semi-wedge caused by the interception of three planes. These volumes are calculated using analytical formulas and integrals where the latter are computed numerically by the quad function in MATLAB.
Volume of a cap, using the notations in Figure Figure3,3, is evaluated according to:
As for a wedge, using the notations in Figure Figure33:
Semi-wedges occur when the spherical neighborhood is centered around an event in the corner of the domain. Taking these semi-wedges into our calculations would liberate us from eliminating the corner-events and naturally increase the number of events to be observed and thus enhance the edge correction term in accuracy. The cost-benefit analysis here is, however, subjective and depends primarily on the geometry of samples and the contribution of corner events to the overall accuracy, and the performance of the computing device. The volume of a semi-wedge can be expressed by:
where, keeping the notations of Figure Figure33 intact, g is the Euclidean distance from the center of the sphere to a plane, parallel to the page, cutting the sphere. The edge correction term in Eq. 11 is outlined in details in the Section “The Edge Correction Term” in Appendix.
Thus, the edge correction term ei(t) in Eq. 11 consists of 6 possible analytical terms and 20 possible integral terms of which 12 are wedge volumes and 8 are semi-wedge volumes. The computational efficiency of this method depends, to a large extent, on a few intuitively obvious factors such as the upper limit of t, the number of events in the sample and the performance of the computing device.
Figures Figures4A,B4A,B represent the obtained plots from simulations of under CSR without any edge correction term. Figures Figures4C,D4C,D represent the simulations with the edge correction term in Eq. 11. The geometries of the sample domains are identical to the samples used in the 3D application after executing the steps in Section “Data Management”. Clearly, the absence of an edge correction term results in a display of inhibition by the K- function as it regards the ‘empty space’ outside the sample domain as a part of the domain. The simulation results in Figure Figure44 confirm the satisfactory performance of the established edge correction term where the K-function estimations subtracted by the expected value (4πt3/3) are expected to be centered around 0.
Note that the K-function has been evaluated for t=0,…,60μm. Ripley (1987), Diggle (2003) and Costa et al. (2007) recommend a maximum threshold for t equal to the 0.25 of the shortest side length. This is partially the reason why we use simulations to have an understanding of how t-values larger than the one recommended influence the outcome of the K-function. How exceeding the recommended threshold for t influences our results and interpretation is outlined in the Section “Discussion”.
Having performed the steps in Section “Data Management” and established a consistent edge correction term, the samples fulfill the prerequisites to be analyzed by Ripley's K-function. Figure Figure55 represents the estimated K-function values subtracted by its expected value for etv- and glt-pyramids. Values larger than zero given by suggest aggregated underlying point patterns.
In order to investigate the significance of the observed deviations from CSR, the p-values from the hypothesis tests of CSR, the two-sided null hypothesis in Eq. 5, obtainted from 500 CSR point pattern simulations, are given in the table below. Figure Figure66 present an arguably more useful representation of these p-values.
Given that the results obtained from the simulations allow more variance for larger values of t, the etv- and glt-pyramids display significant tendencies towards clustering for values larger than t=47μm and t=38μm, respectively. Loosely interpreted in a geometrical sense, this expresses a more intense clustering in glt-pyramid samples, or alternatively, a higher frequency of empty spaces in the spatial point pattern underlying the glt-pyramids.
To investigate the fact that the glt-pyramids incline towards significant aggregation at lower values of t than the etv-pyramids, we will employ the BTSS statistic presented in Between-group Comparisons in Section “Between-group Comparisons”.
We estimate the variance of the underlying K-functions applying the theoretical framework presented in Section “Estimating the Variance of the K-function Using Bootstrap”. Figure Figure77 represents the variance of the estimated K-functions based on bootstrap resampling. The figure illustrates rather similiar variance functions for etv- and glt pyramids which is reasonable (but not necessary) justification to compare these two groups using the BTSS statistic outlined in Section “Between-group Comparisons”. The estimated variance is also used to establish confidence intervals as seen in Figure Figure88.
The p-values represented below are obtained using the test statistic BTSS and the weight function w(t)= t−2 as represented in Eq. 8 in Section “Between-group Comparisons”. The probability density function of the BTSS test is represented in Figure Figure9.9. A look at the long-tailed density function reveals the large variance of this test statistic. This large variance corresponds to the large variability of the samples in the etv- and glt-pyramids which produce a test sensitive only to relatively large between-group deviations; hence the large p-values.
The analysis in general reveals the aggregated point pattern distribution of the etv- and glt-pyramid samples. This aggregation is significant for values greater than 47μm among the etv-pyramids and values over 38μm among the glt-pyramids. The seemingly stronger inclination of glt-pyramids to demonstrate aggregated behavior is, however, insignificant.
We present a framework in which Ripley's K-function and a corresponding edge correction term can be used to analyze the spatial distribution of events (such as neurons) in 3D space. We exemplify the use of this method by analyzing the distribution of genetically labeled layer 5 pyramidal cells in mouse somatosensory barrel cortex.
The assumption of stationarity after adjusting and dividing the initial samples into smaller ones turned out to be very solid. This is, as emphasized earlier, a very important corner stone of our theoretical design when we wish to use a constant intensity in the estimation of the K-function. The assumption of isotropy, however, was believed to be fulfilled in the default setting of the samples. Employing the K-function in a global form implies establishing a solid edge correction term to avoid underestimating the number of events in the underlying point patterns. The edge correction term in Eq. 11 is constructed to neutralize the false assumption of point pattern vacancy when the observed space is outside the sample domain. The correction is done for every single event at a range of distance values, t, and is therefore called a local edge correction term (not to be confused with the local (non-domain exceeding) form of Ripley's K-function). By subtracting the volumes outside the domain, the edge correction term determines what proportions of the spherical neighborhoods are inside the domain. A large part of these volumes are evaluated using numerical methods in MATLAB (the quad function).
The edge correction method outlined in the present paper bears some similarities in its theoretical design to the ‘isotropic’ edge correction term described by Baddeley et al. (1993) [and subsequently applied in Eglen et al. (2008)]. The differences, however, are firstly that our approach produces the volume integrals directly from the theoretical framework without any transformations or other intermediate calculations. Secondly, in contrast to the ‘isotropic’ edge correction method we do not need to include a term for the proportion of the surface area in the edge correction term. Our edge correction term is consistent and leads to an unbiased estimate of the K-function according to the simulations. An improvement of the correction term could be made along the lines established by Ward and Ferrandino (1999), where the edge correction term is evaluated globally for each value of t.
The recommendation on what maximum value of t is reasonable to employ in the estimation of the K-function (Ripley, 1987; Diggle, 2003), is an additional reason why we use simulations to have an understanding of how t-values larger than the one recommended influence the outcome of the K-function. For this purpose, the simulations were done on two sets (500 samples in each) of samples constituted of Poisson-distributed spatial point patterns. These simulated samples follow the same geometry and point pattern intensity of the etv- and glt-samples on which we conducted the analyses. What is observed is that as the value of t increases, the K-function tends to indicate inhibition (Figures (Figures4A,B).4A,B). Bearing this tendency to inhibition for large t-values in mind, and given that the K-functions obtained from the analysis on the etv- and glt-samples indicate aggregation, we draw the conclusion that using the K-function for values larger than 15μm (which in for these data samples is 0.25 of the shortest sample side), although arguably inconsistent, does not lead to an erroneous indication of aggregation. If anything, the aggregation is rather underestimated. Additionally, given what is biologically relevant and interesting (the cell diameter itself is 15–20μm), t-values smaller than 15μm do not deliver any result of interest in the present case.
The analysis of the 3D-distribution of specifically labeled neuron populations will be an important contribution to the characterization of neuronal types and for studying structure-function relationships. The mouse brain samples investigated in the present study were from the somatosensory barrel cortex and contained cells expressing the fluorescent protein EGFP under the control of a cell-type specific promoter (Gong et al., 2003). The two different cell types labeled were glt-expressing layer 5 pyramidal neurons projecting to thalamus and pons (corticothalamic/pontine cells), and etv-expressing layer 5 pyramidal neurons projecting to striatum (corticostriatal cells; Groh et al., 2010). Using Ripley's K-function adapted to 3D we show that the spatial distribution of the two cell types was largely similar with a significant clustering for t values around 40μm. Although a tendency was observed for more clustering of corticothalamic/pontine (glt-pyramids) compared to corticostriatal cells (etv-pyramids). The larger tendency for clustering of the glt-pyramids in lower layer 5 (the so called, layer 5B) is probably due to a partial overlap between the glt-expressing population and the genetically labeled layer 5B population described in Krieger et al. (2007), where the pyramidal neurons were arranged in cell groups having bundling dendrites. Interestingly the significant clustering appears at t-values in the range of the expected spacing between minicolumns (White and Peters, 1993; Krieger et al., 2007). The emphasis in the present project was to develop a practical tool to study the spatial distribution in 3D, thus a full investigation of the spatial distribution of neurons would require a larger data sample covering the whole extent of the barrel column to account for variations in the cell density within a barrel column (White and Peters, 1993).
In a broader perspective, the emergence of high-resolution 3D digital atlases and tools for 3D morphological reconstructions elucidating various underlying biological structures necessitates the implementation of consistent analytical methodologies in order to utilize the output of these reconstructions in the most optimal way (Allen Brain Atlas1; GENSAT2; Kötter, 2001; Briggman and Denk, 2006; Hjornevik et al., 2007; Martone et al., 2008; Mailly et al., 2009; Oberlaender et al., 2009). Ripley's K-function and its corresponding inferential designs using bootstrap re-sampling is one such methodology for analyzing the spatial distribution of neurons and neuronal processes having the advantage that it is theoretically fairly simple and practically reasonably intuitive.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The station function is based on the two following 2D rotation matrices:
The 2D rotation is even applied to 3D data due to the slight thickness of samples in this paper. The function recognizes the thickness dimension (we choose to call this dimension z) automatically as the dimension with the smallest span of coordinates, and performs a 2D rotation if the angle constituted by the diagonal on the xy-plane and the longest edge exceeds theta degrees. The output is a new set of coordinates constituting a domain which fulfills the assumption of stationarity and a 2D scatter-plot that shows the data before and after the 2D rotation on the xy-plane.
This function divides the initial 2D set of coordinates into subsets where the width and length of the domains embodying the new subsets is as equal as possible. The goal, in other words, is to create quadratically shaped subsets on the 2D plane.
We introduce the following notations for an arbitrary event i with coordinates (xi,yi,zi):
where xmin, for instance, represents the lower sample domain boundary along the x-axis and xmax its corresponding upper boundary.
We establish the following edge correction term for event i with coordinates (xi,yi,zi):
where ci(t) is the outside-of-sample volume of a sphere with radius t intercepted by one plane (that is volume of the cap outside the sample domain), wi(t) is the outside-of-sample volume of a wedge caused by the interception of two planes and si(t) is the outside-of-sample volume of a semi-wedge caused by the interception of three planes.
The term ci(t) expands to:
Likewise wi(t) is expanded to:
The semi-wedge volumes can be expressed by:
Implementing the first semi-wedge integral in MATLAB for a given t, given that (t>dxmin && t>dymin && t>dzmin), corresponds to:
F=@(x,y)(sqrt(t.^2−(x.^2+ y.^2))−dymin). *(t.^2−x.^2−y.^2<= dymin^2);
Vol_sw= dblquad(F,dzmin,i,dxmin,sqrt (t^2−dymin^2),tol);
where tol is a specified tolerance degree in the numerical evaluation by the function dblquad.