|Home | About | Journals | Submit | Contact Us | Français|
We present a novel algorithm, Partial Vector Space Projection (PVSP), for estimation of missing data given a database of similar datasets, and demonstrate its use in restoring the centerlines through simulated occlusions of femoropopliteal arteries, derived from CT angiography data. The algorithm performs Principal Component Analysis (PCA) on a database of centerlines to obtain a set of orthonormal basis functions defined in a scaled and oriented frame of reference, and assumes that any curve not in the database can be represented as a linear combination of these basis functions. Using a database of centerlines derived from 30 normal femoropopliteal arteries, we evaluated the algorithm, and compared it to a correlation-based linear Minimum Mean Squared Error (MMSE) method, by deleting portions of a centerline for several occlusion lengths (OL: 10mm, 25mm, 50mm, 75mm, 100mm, 125mm, 150mm, 175mm and 200mm). For each simulated occlusion, we projected the partially known dataset on the set of basis functions derived from the remaining 29 curves to restore the missing segment. We calculated the maximum point-wise distance (Maximum Departure or MD) between the actual and estimated centerline as the error metric. Mean (standard deviation) of MD increased from 0.18 (0.14) to 4.35 (2.23) as OL increased. The results were fairly accurate even for large occlusion lengths and are clinically useful. The results were consistently better than those using the MMSE method. Multivariate regression analysis found that OL and the root-mean-square error in the 2cm proximal and distal to the occlusion accounted for most of the error.
The problem of estimating missing data is widely encountered in medical imaging. Because human anatomy is fairly consistent from person to person, prior knowledge about typical shapes of organs might be used to estimate missing data from medical images. In contrast medium-enhanced Computed Tomography Angiography (CTA) of the lower extremities, missing data problems appear in the form of occluded blood vessels. In patients with peripheral arterial occlusive disease, complete obstruction of blood flow occurs preferentially in the region of the distal superficial femoral artery and proximal popliteal artery - or femoropopliteal artery (Zarins et al., 2005). Comprehensive visualization of peripheral arterial occlusive disease requires the computation of so-called curved planar reformations through the centerlines of perfused and occluded portions of an artery as shown in Figure 1 (Fleischmann et al., 2006). Since contrast medium-enhanced blood cannot reach occluded portions of the vessel, the CT attenuation values of these occluded segments differ substantially from those with maintained blood flow, making them inconspicuous on a CT scan. Available automatic centerline tracking algorithms based on density or gradient information thus fail to track the clinically occluded segments. In practice, this problem is solved by inspecting the transverse CT images and manually selecting the missing points of the centerline based on the user’s knowledge of arterial anatomy.
We propose an automated knowledge-based approach to solve this problem, consisting of two main aspects, namely, characterization or description of shapes and modifying the description to match the data available from the subject. We characterize the shape of a vessel centerline by expressing it as a combination of certain basis functions. The basis functions are eigenshapes obtained by performing Principal Component Analysis (PCA) (Cootes et al., 1995; Duda et al., 2000) on a database of similar shapes. We also make use of anatomic landmarks to define oriented frames of reference for these vessels. We then express the incomplete dataset as a linear combination of the eigenshapes and use it to estimate the missing data. We correlate the error with several parameters and try to build a statistical model for predicting the error. We also compare the results with those predicted by a minimum mean squared error method.
This paper is organized as follows. In Section 2 an overview of the work done in the field of shape analysis for different applications is presented. In Section 3 the notation used in this paper is explained. Sections 4 and 5 explain how the database of centerlines was acquired and parameterized. Section 6 explains how the database consists of similar shapes and how the linear dependency in shapes can be used to serve our purpose. Section 7 explains our algorithm and a minimum mean squared error method for comparison. In Section 8 we explain an additional step to improve the results. Sections 9 and 10 present the experimental framework and results in detail. In Section 11 we discuss the results and limitations of our method.
When the lumen of a normal arterial segment is filled with contrast medium-opacified blood, tracking can be accomplished using intensity information from the CT data by cost function minimization (Kanitsar et al., 2001), by using the connected components with minimal path technique (Cohen and Deschamps, 2001), by using flux driven algorithms (Bouix et al., 2005) and by skeletonization (Paik et al., 1998). Vessel tracking of Magnetic Resonance Angiography data has been done using these methods as well as by geometry-based methods (Fridman et al., 2004). Model based algorithms use additional prior information about the topology (Golland and Grimson, 2000) or shapes of the structure (Kirbas et al., 2004). However, when one or more segments are non-opacified due to atherosclerotic vessel occlusion, more advanced techniques may be required. To our knowledge, this specific problem of tracking centerlines through occlusions in blood vessels has not been addressed.
There have been attempts to estimate more general shapes from incomplete or corrupted information based on prior knowledge. Level-set based methods using shape priors have been shown to be useful in segmenting occluded objects (Cremers and Soatto, 2003; Fussenegger et al., 2006; Kim et al., 2005) as well as noisy data (Leventon et al., 2000). These approaches do not make use of the knowledge of which part of the object is occluded, whereas our method uses this knowledge to get a more precise estimate. Prior knowledge of shapes have also been used to fill in incomplete 3D laser scans (Pauly et al., 2005). However, in this approach the database consists of clearly distinguishable and labeled objects, whereas in our application the database consists of the objects that are anatomical variations of the same object (a specific vessel centerline). A survey of content based 3D shape retrieval methods (Tangelder and Veltkamp, 2004), mainly used for Computer Aided Detection and visualization, shows that the methods applicable to partial matching of data, as required in our application, tend to lack robustness.
Approaches to define shapes and deformations for the purpose of shape-modeling and classification include using splines or wavelet models (Kybic amd Unser, 2003; Mehaute, 1994), using spherical harmonic basis functions (Keleman et al., 1999), using medial primitives (Styner and Gerig, 2000) and defining active shape models (Cootes et al., 1995). The description of a shape based on the eigenshapes of the family has been used to describe the nature of variation of shapes (Cootes et al., 1995; Baumberg et al., 1995; Kotcheff and Taylor, 1998). This approach has gained popularity in medical imaging applications due to the inherent similarity in the shapes of the same organ imaged in different subjects. Similarity of shapes can be used to classify shapes into different categories (Styner and Gerig, 2000; Gerig et al., 2003).
In our approach, we defined a 3D centerline curve as a collection of points in space, the basic idea of which comes from the Point Distribution Model of shapes (Bowden et al., 2000). Given a collection of such shapes, the data are analyzed with Principal Component Analysis to produce eigenshapes of the family of shapes. For the purpose of performance evaluation, we compare our results with those of a correlation based algorithm (Haykin and Widrow, 2003). We extend the idea of warping models of basic shapes to define new shapes by transformation (rotation and translation), scaling and linear combination of curves in a 3D space in specific oriented frames of reference. The concept of using PCA to find eigenvectors of shapes, as used in our work, is similar to the Active Shape Models proposed by Cootes et al. (1995). Our approach, however, is designed to describe occluded shapes for the purpose of estimating their missing portions.
Let V = f(t) represent a row vector of length p as a function of discrete parameter t = 0, 1, …, N − 1. In other words f(t) is a collection of p functions. If V is unknown for t = m, m + 1,..., m + n − 1, our task is to estimate f(t) for these missing indices. We are given a database of N similar functions Si = fi(t) for t = 0, 1, …, N − 1 and i = 1, 2, …, M, where each Si is a variable row vector of length p. Note that each of Si and V can be completely described as an N × p matrix. For example, for p = 2:
In short, we are trying to reconstruct a missing piece of a multidimensional curve using prior knowledge of similar curves. An example of such a situation would be estimating an occluded portion of the centerline of an artery imaged using CT Angiography, using a database of centerlines obtained from scans of other patients, where we expect similar, though not identical, trajectories.
To construct a knowledge base, we created a database of lower extremity vascular centerlines of 30 subjects (17 men and 13 woman: age 19–86 year) without peripheral arterial occlusive disease, who underwent lower-extremity CT angiography (18 for planning of free fibular flap graft harvesting, 4 for trauma, 5 for non-vascular causes of claudication due to musculoskeletal or neurologic disorders and 3 for miscellaneous reasons). Prior Institutional Review Board approval was obtained for retrospective data collection. CT data were acquired with an 8- or 16-channel multiple detector-row CT scanner. Transverse CT images (512×512 matrix) were reconstructed at field of views ranging from 300 to 450mm with a 1.25mm nominal section thickness and 0.9mm section spacing. Vessel trees were extracted semiautomatically by a vascular radiologist using a vessel tracking and centering algorithm described in Kanitsar et al. (2001). The method only requires the user to identify the beginning and the end of the opacified vessel. The algorithm uses the density and gradient information and minimizes a cost function. This method is shown to perform well within clinically desired accuracy for vessels without any disease. The arteries included in our database were the common iliac, external iliac, femoral, popliteal, tibio-peroneal trunk, peroneal, anterior tibial and posterior tibial arteries. In this study we used only femoropopliteal arteries of the left lower extremities.
Arteries of subjects of different heights may have different segment lengths and proportions. To compensate for these differences we used vascular bifurcations as anatomic landmarks to register different data sets with each other. The landmarks used to identify the femoropopliteal artery were the common femoral artery bifurcation, and the popliteal artery bifurcation. These two landmarks correspond approximately to the hip joint and the knee joint respectively and, therefore, compensate for the flexing of these joints and the distance between them. The femoropopliteal artery was defined in a specific reference frame relative to the anatomic landmarks. The original coordinate system used to describe the CT data has its X, Y and Z axes parallel to the left-right, anterior-posterior and superior-inferior directions respectively. In the new reference frames, used for describing the femoropopliteal artery centerline, the Z axis runs through the common femoral bifurcation (Z = 0) and the popliteal bifurcation (Z = 450). The new X axis is perpendicular to the original Y axis and the new Y axis perpendicular to the original X axis. All of the new axes are perpendicular to each other. Both the original and the new coordinate systems are shown in Figure 2 from anterior-posterior (AP) and lateral views. Point 1 is the common femoral bifurcation and point 2 is the popliteal bifurcation. It is assumed that the locations of these bifurcations can be provided by the user. This implicit scaling of the Z axis to the distance between these bifurcations is an effort to normalize the segment length to patient height. It should be noted that we did not scale the X or Y axes, as there is no known correlation of segment distribution in this plane with any anatomical variation.
Each vessel centerline is then stored as a 3-D curve, uniformly sampled in the Z direction of its reference frame. Each sample of the curve is represented with its (X, Y ) coordinates as a function of its Z coordinate. Each centerline is then resampled to have an equal number of samples, N. Therefore, using the convention defined in Section 3, Si [Xi, Yi], t Z, p = 2 and M = 30. Since there are 450 points in each curve, N = 450. Here we implicitly assumed that the curve does not run parallel to the (X, Y) plane of the modified coordinate system and every value of Z must correspond to a unique point in the curve. This assumption holds true for the femoropopliteal artery, since the artery is always observed to go monotonically from proximal to distal end. We note that although parametrization of 3D curves is often done as a function of arc length, (X(t), Y(t), Z(t)), we did not follow this approach, since for our application we need to parameterize incomplete curves, which is not possible if the curve is not completely known.
Figure 3 shows all 30 femoropopliteal centerlines in the Anterior-Posterior (AP) view, in the oriented reference frames described earlier. The vertical axis represents the line joining the common femoral and popliteal bifurcation. Figure 4 shows the same centerlines in the lateral view. Note that the curves have some fundamental similarities, having the same anatomic origin, although the shapes are not identical and the span of the curves in X and Y dimensions is not the same for every centerline. To examine the linear dependency of the curves on each other, we performed PCA on the data and found the cumulative sum of the eigenvalues of the covariance matrix, with the sum of the eigenvalues normalized to 1 (Figure 5). Note that the first principal component captures about 85% of the variation in the data, whereas about 5 principal components capture around 99%. This shows a significant linear dependence of the data. Figures 6 and and77 show the first and the second principal components in each figure, in AP and lateral views respectively. Note the similarity of shapes between the curves and their principal components.
In this algorithm we assume that the curve to be reconstructed can be approximated as a linear combination of the curves in the database. Given the derived set of best orthonormal basis functions, the curve to be reconstructed is also a linear combination of the basis functions. Let B represent the matrix formed with kb number of basis functions as columns, where 1 ≤ kb ≤ k. We always choose kb basis functions in the decreasing order of the corresponding eigenvalues. Let Vv be the vector form of V, obtained by concatenating the columns of the N × p matrix of variable V. We can write
Where ac is the column vector containing the coefficients of each basis function. The elements of vector Vv can be separated into two vectors: Vvkn of length (N − n)p, corresponding to the known portion of the curve and Vvun of length np, corresponding to the unknown portion. Similarly the rows of matrix B can also be separated to form two matrices Bkn and Bun, corresponding to Vvkn and Vvun respectively. Thus, we have
vun thus estimated can be written as matrix of size n × p. Here is called the pseudo-inverse of Bkn. More details about the pseudo-inverse can be found in Ben-Israel and Greville (2003). Because the PVSP algorithm requires inversion of a matrix, we need to address the issue of what happens if the matrix is rank deficient. In our case, we can always ensure that the matrix to be inverted is full rank, by choosing a smaller number of eigen-curves.
Due to the lack of other automated methods addressing the problem of interpolating missing vessel segments, we decided to compare our algorithm with a linear minimum mean squared error (MMSE) estimate of the missing data. In a standard linear MMSE estimate setup, each missing data-point can be represented as a weighted sum of the known data-points and a constant. Hence we can write
Each Wi is a weight column vector associated with a coordinate of the missing data point. Therefore W is a np × (N − n)p + 1 matrix. If the rows of the data matrix D are separated into two matrices Dkn and Dun corresponding to the known and unknown data of Vv, then the MMSE estimate can be written in the form of a pseudo-inverse as
where and is a row vector of appropriate size with every element 1.
The results obtained by the PVSP method may need further correction for matching end-points of the estimated curve with the known curve, since in our application, we require that the interpolated section is contiguous with the known portions of the centerline. There are 2 constraints in each of the X and Y coordinates for matching the endpoints. A polynomial of degree one has two constraints and hence can be used as a correction curve. The polynomial
is uniquely determined by specifying x1 = f(z1) and x2 = f(z2). We add two linear polynomials, one in each of X and Y coordinates, such that the physical constraints are satisfied at each one of the two points, one proximal and one distal to the occlusion.
The basic assumption behind the PVSP algorithm is that a partially known curve can be represented as linear combination of the curves in the database. To validate this assumption, we performed PCA on the database of all 30 centerlines and observed that the curves are indeed similar to each other. Centerline paths in truly occluded arteries are not known accurately. Hence, we simulated occlusions in order to evaluate the performance of the algorithm presented. We selected each one of 30 patients as a test case and used the remaining 29 as the database, a method also known as leave-one-out cross validation. For each of the 30 femoropopliteal arteries, we created occlusions of nine different lengths: 10mm, 25mm, 50mm, 75mm, 100mm, 125mm, 150mm, 175mm and 200mm. For each of these occlusion lengths, we chose as many locations as would fit the entire vessel segment, allowing 50% overlap between occluded regions. Note that the length was with respect to the associated frame of reference, assuming that the distance between the common femoral and the popliteal bifurcations was 450mm. The occlusions were simulated by simply assuming that the centerline coordinates were not known for the segment of that particular length. Out of the 29 principal components obtained from the PCA, we used only 15 (that is, kb = 15). This is roughly 3 times the number of principal components required to capture 99% of the variation in all 30 centerlines (Fig. 2), and attempts to assure that we will capture this much of the variation in the database for each experiment where one centerline is left out and the database used to reconstruct the missing segment is made up of the remaining 29.
The results were quantitatively assessed by evaluating the Euclidean distance between each estimated point and its corresponding original point. Since in clinical practice the quality of a Curved Planar Reformat (CPR) generated using a centerline is determined by the worst deviation of the centerline from its correct path, we used the maximum point-wise error of each simulation (referred to as the Maximum Departure or MD), as our error metric. It should be noted that our algorithm leaves some residual error in the known section of the curve (even after the endpoint matching step). The RMS value of this error (referred to as the Neighborhood Error or NE) in the portions 2cm proximal and distal to the occlusion was used as one of the predictors for the actual error. We also computed the distance of each subject from the mean of the whole database, by computing the square root of the sum of squares of point-wise distances between two curves, referred to as the Distance Index (DI). This distance tells us how far a curve is from the center of the cluster of all curves. Intuitively, the higher this distance is, the lower is the similarity between this curve and the remaining curves. The data were analyzed by a random-intercept mixed-model regression of MD on occlusion length (OL), NE, age and sex of the patient and DI. Because of skewness in the distributions of MD and OL, log-transformed values were used. All statistical analyses were done with Stata 9.1 (Stata Corp., College Station, TX, USA).
We calculated the mean and standard deviation of MD for each of 9 occlusion lengths for all subjects as shown in Table 1. The distributions of the MD error for each length are shown in Figure 8. Figure 10 compares log-MD for the MMSE and the PVSP methods. The PVSP method consistently performs better (p<0.001, Wilcoxen signed rank test). Table 2 shows the estimated population percentiles for the MD. For example, we estimate that 90% of 75cm occlusions will be reconstructed with a maximum departure from the true centerline of no more than 2.84mm. The numbers in parentheses are upper bounds of the 95% confidence intervals of these estimates, so we can say with 95% confidence that 90% of 75cm occlusions will be interpolated with a MD of no more than 3.26mm.
The random-intercept mixed-model regression analysis indicates that the most important effect on MD is that of the OL (p<0.0001), with a (log-log) coefficient of 1.06 (95% CI: 1.04–1.08): each order-of-magnitude increase in OL increases MD by a factor of 11.5. As a point of comparison, in simple linear regression of log-MD vs. log-OL accounts for 50% of the variability in the former. There were smaller but still significant effects of the NE and the DI, both with p<0.001. Figure 9 shows the plot of log-MD vs log-NE, with a line showing a linear regression. There was no significant effect of age or sex of the patient, but patient-to-patient variation, as reflected in the random regression intercepts, accounted for approximately 30% of the residual variability. The complete regression table is shown in Table 3.
The estimation of arterial centerlines through diseased arteries for subsequent generation of CPR images is an integral part of vascular imaging using CT Angiography (Fleischmann et al., 2006). While automated or semiautomated vessel-tracking algorithms are helpful to identify the centerlines of normal or mildly disease arteries, they fail in clinically important severely diseased and occluded portions of the arterial tree, and these segments have to be processed manually by expert users familiar with vascular anatomy. In the setting of lower extremity CTA, this can be a very time consuming task.
In this paper we have described a new method for estimating missing portions of curves when a database of similar curves can be constructed, and showed its application in restoring missing portions of vascular centerlines in CTA data with simulated femoropopliteal artery occlusions. Our results show that the estimated centerline path stays in the close vicinity of the known path. From the box plots in Figure 8 we can see that while larger errors did occur, it was only for a small fraction of the total simulation. Even for larger occlusion lengths, the estimate is a reasonable guess of where the centerline could be, and has a potential for being used as an initial guess for other methods and/or minimal user intervention. We also showed that it is possible to build a statistical model that can be used to predict the error. The most significant factor affecting the error was the length of the occlusion. This is not surprising since longer occlusions are inherently associated with proportionally shorter known portions of the centerline. The ability to predict the error of an estimated segment is potentially important, because it might be used to prompt the user to intervene if the expected error is too large.
Our study has several limitations. First, our results are based on a database of only 30 left femoropopliteal arteries and thus cannot reflect the total arterial shape variation in humans. Since none of our subjects had arterial occlusive disease, we cannot generalize our findings to this particularly important patient population. Arterial centerlines in patients with true arterial occlusions are not clearly defined, however, and thus we decided to use a set of known centerlines for these initial experiments instead. Our algorithm assumes that the centerline is never perpendicular to the uniformly sampled axis, an assumption generally true for the femoropopliteal and tibial arteries. This approach will have to be modified if this assumption is not valid for other more tortuous arteries. We also assume that the common femoral and popliteal artery bifurcations are reliable landmarks and can be provided by the user. The sensitivity of the estimate to the error in identifying these bifurcations needs to be assessed.
The obvious next step in this research is the generation of a larger database and the evaluation of the accuracy of this approach in patients with peripheral arterial occlusive disease in comparison to expert derived centerlines.
This work was supported in part by grant RO1 HL67194 from the National Institutes of Health (NIH). Justus E. Roos was supported by the Swiss National Science Foundation, PBBEB 106796.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.