Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2904530

Formats

Article sections

Authors

Related links

Proc IEEE Int Symp Biomed Imaging. Author manuscript; available in PMC 2010 July 15.

Published in final edited form as:

Proc IEEE Int Symp Biomed Imaging. 2010 April 14; 2010: 1401–1404.

doi: 10.1109/ISBI.2010.5490260PMCID: PMC2904530

NIHMSID: NIHMS181352

Maria S. Ayad,^{1} Junghoon Lee,^{2} Anton Deguet,^{1} Everette C. Burdette,^{3} and Jerry L. Prince^{2,}^{1}

C-arms are increasingly being used to assist in a large number of surgical procedures. Fairly accurate and fast pose estimates are needed for non-encoded c-arms that are commonly available in most operating rooms in order to attain quantitative feedback from the x-ray images. We propose the use of an image-based fiducial composed of a set of coplanar ellipses to track the c-arm. We adopt an existing method for planar homography and propose a variation consisting of three modifications: including a weighting scheme for the linear system used, orthonormalizing the vectors pertaining to the rotation component of the transformation, and fine tuning the estimates using a constrained optimization step. We show that these variations make the approach more robust to noise that typically arises in fluoroscopy imaging and guarantee the orthonormality of the estimated rotation. The performance of the modified algorithm is demonstrated using realistic x-ray simulations. We also run sensitivity analysis for segmentation and calibration errors that are likely to occur in a practical setting. Preliminary results show mean tracking accuracy within 0.5° and 0.9 mm for segmentation error variance up to 2 pixels squared. The algorithm also proves to be robust to calibration errors up to 1 cm.

C-arm fluoroscopy has been widely used for qualitative assessment of a variety of computer-assisted surgical procedures [1]. Attaining quantitative feedback from the x-ray images will become more feasible and widely available through developing fast, cheap and reliable techniques for calibration and pose estimation for the c-arm. In this paper we present a clinically friendly solution for pose determination of the commonly-available nonencoded c-arms that exist in most operating rooms (OR)s. There exist two major approaches for pose recovery: reliance on auxiliary devices and employing image-based tracking fiducials. External trackers, such as optical and EM trackers have proven to be both expensive and cumbersome in the OR. Optical trackers require a line of sight, whereas EM trackers are sensitive to the presence of metal objects in the work area. Therefore, fiducial-based pose estimation has gained wider clinical acceptance [2].

For a fiducial to be clinically appealing, it needs to be compact in size, easy to include in the surgical working space and noninterfering with the trajectory of the c-arm motion, which often requires a wide field of view. This motivated the idea of a flat fiducial that can be cheaply manufactured and easily placed under the patient and tracked without special mounting or fixation issues. If the x-ray absorption that is created by this fiducial is relatively low, then it will be visible to the clinician, but will not interfere with visualization of the anatomy or surgical tools that are within the field of view. We thus needed to choose suitable features that can satisfy a number of requirements. First, they need to be easily embedded in a simple non-invasive flat fiducial; second, they should accurately recover the pose intraoperatively; third, they must neither interfere with the physical constraints of the space nor negatively impact the quality of the images needed for guidance. Our choice was a set of coplanar ellipses as shown in Fig. 1(b). The study of conics for pose estimation and object recognition has been an active research topic for several groups due to a number of reasons. First, conics are more compact than points or segments. Second, segmentation of conics is more immune to noise due to the large number of points on the curves. Third, they are easier to establish correspondence. Fourth, they provide closed-form solutions. Ellipses in particular are especially attractive since a 3D ellipse projects to an ellipse in the image. It has been shown in [3, 4] that c-arm pose estimation for computer-assisted surgery can be achieved by a single ellipse and a point correspondence.

(a) Different coordinate systems involved in the projection of a conic. (b) X-ray image including ellipse projections.

In the current work, we eliminate the need for points and utilize only planar ellipses with known correspondence. With planar targets, the problem of pose determination amounts to solving a planar homography. Solving camera pose and planar homography in general using conics has been previously presented by several authors including De Ma [5], Forsyth et al. [6], Sugimoto [7] and Kannala et al. [8]. While De Ma [5] proposes that two ellipses can uniquely determine the pose, it is not clear how this estimate can deteriorate in the presence of noise similar to our target application. On the other hand, Sugimoto's method [7] utilizes a minimum of seven conics. Kannala et al. however, present an attractive approach that needs three or more corresponding conics for computing planar homographies. We adopt this method and apply three main modifications to be able to solve the problem of pose estimation for c-arms from the noisy fluoroscopic images that are typically taken during surgical procedures. First, we propose a weighting technique that accounts for segmentation errors, being a dominant source of noise. Second, we enforce the constraints needed to ensure that the homography computed can represent a meaningful pose in terms of the rotational component. Finally, a constrained optimization step is added to the algorithmic flow to fine tune all the pose parameters after the third rotation vector is obtained.

Let **x _{w}** = [

$$\mathbf{\text{x}}=\mathbf{\text{R}}\phantom{\rule{0.3em}{0ex}}{\mathbf{\text{x}}}_{\mathbf{\text{w}}}+\mathbf{\text{t}},$$

(1)

where **R** = [**r _{1}, r_{2}, r_{3}**] is the rotation matrix and

$$\mathbf{\text{x}}=\mathbf{\text{G}}\phantom{\rule{0.3em}{0ex}}{\mathbf{\text{u}}}_{\mathbf{\text{w}}},$$

(2)

where **G** = [**r _{1}, r_{2}, t**] and

C-arm imaging is modeled by the full perspective projection model where the optical center is the origin of the camera frame, the *z*-axis is the optical axis and the image plane is parallel to the *xy*-plane at a distance *f* from the origin. Let [*o _{u}, o_{v}*] be the location of the image center in pixel coordinates and

$$\mathbf{\text{x}}=z\mathbf{\text{Cu}}={\mathbf{\text{Gu}}}_{\mathbf{\text{w}}},$$

(3)

where

$$\mathbf{\text{C}}=\left[\begin{array}{ccc}{s}_{u}/f& 0& -{o}_{u}{s}_{u}/f\\ 0& {s}_{v}/f& -{o}_{v}{s}_{v}/f\\ 0& 0& 1\end{array}\right].$$

(4)

Let the equation of a conic in the *x _{w}y_{w}*-plane in the world frame be given by

$${\mathbf{\text{u}}}_{\mathbf{\text{w}}}{{\displaystyle {}^{\text{T}}\mathbf{\text{Q}}\mathbf{\text{u}}}}_{\mathbf{\text{w}}}=0,$$

(5)

and the equation of the corresponding projected conic in the image plane be given by

$${\mathbf{\text{u}}}^{\text{T}}\phantom{\rule{0.2em}{0ex}}\mathbf{\text{Au}}=0.$$

(6)

$${\mathbf{\text{u}}}_{\text{w}}{{}^{\text{T}}\mathbf{\text{G}}}^{\text{T}}{\mathbf{\text{C}}}^{-\text{T}}{\mathbf{\text{AC}}}^{-1}{\mathbf{\text{Gu}}}_{\text{w}}=0,$$

(7)

where **C**^{−T} is the transposed inverse of **C**. Comparing (5) to (7), we have

$${\mathbf{\text{G}}}^{\text{T}}{\mathbf{\text{C}}}^{-\text{T}}{\mathbf{\text{AC}}}^{-1}\mathbf{\text{G}}={\mathbf{\text{G}}}^{\text{T}}\stackrel{\mathbf{\sim}}{\mathbf{\text{A}}}\mathbf{\text{G}}=k\mathbf{\text{Q}},$$

(8)

where *k* is a constant and **Ã** = **C**^{−T} **AC**^{−1}.

An extension of this formulation for estimating the pose using a single image of multiple conics has been adopted from the method proposed by Kannala et al. [8] and is also written out below. First, consider two coplanar conics **Q _{1}** and

$${\mathbf{\text{G}}}^{\text{T}}\phantom{\rule{0.3em}{0ex}}{\stackrel{\mathbf{\sim}}{\mathbf{\text{A}}}}_{1}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{G}}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}{k}_{1}{\mathbf{\text{Q}}}_{1},$$

(9)

$${\mathbf{\text{G}}}^{\text{T}}\phantom{\rule{0.3em}{0ex}}{\stackrel{\mathbf{\sim}}{\mathbf{\text{A}}}}_{2}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{G}}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}{k}_{2}{\mathbf{\text{Q}}}_{2}.$$

(10)

In order to use the above two equations simultaneously, all the coefficient matrices are normalized to have unit Frobenius norm. Then, the constant *k _{i}* is computed by fixing the determinant of the matrix of unknowns to be unity [8]. Thus

$${k}_{i}={\left(\frac{\text{det}({\stackrel{\mathbf{\sim}}{\mathbf{\text{A}}}}_{i})}{det({\mathbf{\text{Q}}}_{i})}\right)}^{\frac{1}{3}},i=1,2.$$

(11)

Each **Q _{i}** is then replaced by

$${\mathbf{\text{M}}}^{\text{T}}\phantom{\rule{0.3em}{0ex}}{\stackrel{\mathbf{\sim}}{\mathbf{\text{A}}}}_{1}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{M}}={\mathbf{\text{Q}}}_{1},$$

(12)

$${\mathbf{\text{M}}}^{\text{T}}\phantom{\rule{0.3em}{0ex}}{\stackrel{\mathbf{\sim}}{\mathbf{\text{A}}}}_{2}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{M}}={\mathbf{\text{Q}}}_{2}.$$

(13)

where **M** is a constant multiple of **G** and has determinant value equal to one. From (12) and (13) we get that

$${\mathbf{\text{M}}}^{\text{T}}\phantom{\rule{0.3em}{0ex}}{\stackrel{\sim}{\mathbf{\text{A}}}}_{1}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{M}}\phantom{\rule{0.3em}{0ex}}{\mathbf{\text{Q}}}_{1}^{-1}={\mathbf{\text{M}}}^{\mathbf{\text{T}}}\phantom{\rule{0.3em}{0ex}}{\stackrel{\sim}{\mathbf{\text{A}}}}_{2}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{M}}\phantom{\rule{0.3em}{0ex}}{\mathbf{\text{Q}}}_{2}^{-1},$$

(14)

which gives

$${\stackrel{\sim}{\mathbf{\text{A}}}}_{2}^{-1}\phantom{\rule{0.3em}{0ex}}{\stackrel{\sim}{\mathbf{\text{A}}}}_{1}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{M}}=\mathbf{\text{M}}\phantom{\rule{0.3em}{0ex}}{\mathbf{\text{Q}}}_{2}^{-1}\phantom{\rule{0.3em}{0ex}}{\mathbf{\text{Q}}}_{1}.$$

(15)

Let ${\mathbf{\text{P}}}_{A}={\stackrel{\mathbf{\sim}}{\mathbf{\text{A}}}}_{2}^{-1}\phantom{\rule{0.2em}{0ex}}{\stackrel{\mathbf{\sim}}{\mathbf{\text{A}}}}_{1}$ and ${\mathbf{\text{P}}}_{Q}={\mathbf{\text{Q}}}_{2}^{-1}\phantom{\rule{0.2em}{0ex}}{\mathbf{\text{Q}}}_{1}$. Thus, we have

$${\mathbf{\text{P}}}_{A}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{M}}=\mathbf{\text{M}}\phantom{\rule{0.3em}{0ex}}{\mathbf{\text{P}}}_{Q}.$$

(16)

Equation (16) which results from a pair of conics and their corresponding projections can be rewritten as

$${\mathbf{\text{F}}}_{12}\phantom{\rule{0.3em}{0ex}}\mathbf{\text{m}}=0,$$

(17)

where **m** is a 9 × 1 vector containing the elements of **M** and **F**_{12} is a 9 × 9 matrix obtained from the elements of **P*** _{A}* and

In the case of more than two conics, **F** is formed by stacking matrices **F*** _{ij}* arising from the matrix equation relating conics

Equation (17) now is an application of the total least squares problem where the right hand side is a zero vector. The idea as in [9, 10] is to find a rank-deficient least squares estimate of **F**_{0} by finding an error matrix **E** with minimum Frobenius norm that lowers the rank of **F**, i.e.

$$\text{min}\parallel \mathbf{\text{F}}-{\mathbf{\text{F}}}_{0}{\parallel}_{F}\phantom{\rule{1em}{0ex}}\text{s.t.}\phantom{\rule{1em}{0ex}}\text{rank}({\mathbf{\text{F}}}_{0})=8.$$

(18)

By the Eckart-Young-Mirsky theorem [9, 10], we have

$${\mathbf{\text{F}}}_{0}=\sum _{i=1}^{8}{s}_{i}{\mathbf{\text{u}}}_{i}{\mathbf{\text{v}}}_{i}^{\text{T}},$$

(19)

where

$$\mathbf{\text{F}}={\mathbf{\text{USV}}}^{\text{T}}=\sum _{i=1}^{9}{s}_{i}{\mathbf{\text{u}}}_{i}{\mathbf{\text{v}}}_{i}^{\text{T}}$$

(20)

is the singular value decomposition (SVD) of **F** with singular values *s*_{1} ≥ *s*_{2} ≥ .. ≥ *s*_{9}. Now the singular vector corresponding to the least singular value of **F** is used to estimate the vector **m** that solves (17). Scaling and sign are determined as stated above for the case when F has rank equal to 8.

Although this technique with multiple conic pairs has been shown to outperform the homography estimation results obtained using points only and a single pair of conics, the quality of pose estimates degrades dramatically even with a small amount of segmentation errors (which is inevitable according to the existing ellipse segmentation techniques). In this paper we apply three modifications to the algorithm described in the previous section. With such changes, the performance of this algorithm can be improved in the presence of noise beyond what has been achieved by the original algorithm. First, we incorporate a weighting scheme that enables us to take the effect of errors in the input data into consideration while solving the homogeneous total least squares (TLS) problem in (17); next, we find a rotation matrix that is a best orthonormal approximation to the one retrieved by the first step, and finally we refine all the pose parameters including the translation vector through a constrained optimization that minimizes the norm of the right hand side of the weighted system of equations and enforces orthonormality conditions on the rotation.

With the overdetermined system of equations reached in (17), our simulations showed - in agreement with the literature - that in order to achieve reasonable pose estimates in the presence of realistic amounts of noise, the consequences of the error sources need to be inherently taken into account in the unified framework of homography estimation [10]. This is attained by giving different credence to each of the equations stacked in **F**, a process termed equilibration, which relies on the assumption that the error matrix **E** is not of i.i.d structure. This process essentially replaces the metric in (18) by

$$\text{min}\parallel {\mathbf{\text{W}}}_{L}(\mathbf{\text{F}}-{\mathbf{\text{F}}}_{0}){\mathbf{\text{W}}}_{R}{\parallel}_{F},$$

(21)

where **W*** _{L}* and

The estimated parameters that we get from the solution to the TLS system is essentially a general planar homography and due to the presence of noise in the observations, orthonormality of the rotation matrix of the pose in our case is not ensured. So, another SVD operation is performed on the first two columns of the rotation and a best approximate is found by multiplying the matrices of the left and right singular vectors by a matrix containing the first two columns of a 3 × 3 identity matrix. The third column of the rotation is then calculated as the cross product of those two columns.

For a final tuning of the pose - including also the translational element of the pose - retrieved in the previous steps, we solve a constrained optimization problem that essentially maintains orthonormality gained in the previous step and simultaneously finds an optimal pose for our problem.

$$\begin{array}{ccc}\text{min}\parallel {\mathbf{\text{W}}}_{L}\mathbf{\text{Fm}}{\parallel}^{2}& \text{s.t}.& {\mathbf{\text{m}}}^{\text{T}}{\mathbf{\text{B}}}_{1}\mathbf{\text{m}}=0,\\ & & {\mathbf{\text{m}}}^{\text{T}}{\mathbf{\text{B}}}_{2}\mathbf{\text{m}}=1,\\ & \text{and}& {\mathbf{\text{m}}}^{\text{T}}{\mathbf{\text{B}}}_{3}\mathbf{\text{m}}=1.\end{array}$$

(22)

where **B _{1}** is the matrix corresponding to the orthogonality condition of the first two columns of the rotation matrix and

Simulation studies were conducted to examine the effect of several factors on the accuracy of the estimated pose.

Simulated planar sets of 3 ellipses were used (see Fig. 1(b)). In order to simulate ellipse segmentation errors, we uniformly sampled 50 points on each projected ellipse, added Gaussian noise to the sample points, then fitted a new ellipse to the points using the method in [12]. The variance of the Gaussian was increased from 0.5 to 2 pixels squared in increments of 0.5. The focal length was set to 1 m and the pixel size used for our experiments was 0.44 mm for each of the *u* and *v* directions. For each error level, 700 experiments with 7 different poses were simulated; the mean and standard deviation of rotation and translation errors in the pose are shown in Fig. 2(a). In order to demonstrate the effect of weighting described in Section (2.3.1), we present results analogous to the ones in Fig. 2(a) that were obtained by running our code on the same datasets before modifying the algorithm to incorporate the weighting step. The results presented in Fig. 2(b) were attained by directly using the unweighted system (17). It is to be noted that in these experiments the rotation orthonormalization and the final optimization step were both performed as before.

In order to test the effect of the size of ellipses used for the fiducial, the exact same simulations were done using a smaller pattern, i.e. with ellipses whose semi-major and semi-minor axes are half of those in the pattern used to generate the results in Fig. 2(a). The relative positioning and orientation were the same. The results for these simulations (based on a total of 2800 experiments) are shown in Fig. 2(c) and again show degradation in accuracy as the level of segmentation error increases. These results show that for a given configuration, the larger we can make the pattern, the better for pose accuracy.

To assess the effect of miscalibration of the c-arm, we generated a realistic simulation for possible errors in the position of the focal spot. The magnitude of the error was increased from 0 to 15 mm in increments of 1 mm. During simulation, the fact that the uncertainty in the focal length is much larger than that in the image plane was taken into consideration and the direction of the in-plane drift of the principal point was uniformly sampled over all angles. For this experiment, segmentation noise has not been taken into account and therefore the system (17) has a unique solution without need for the weighting matrix **W*** _{L}*. Experiments were done as previously described using the same set of three ellipses and the results are shown in Fig. 2(d). The algorithm proves to be robust to calibration errors upto 1 cm.

We proposed the use of a set of coplanar ellipses for c-arm pose estimation to assist in image-guided procedures. It has been shown in the literature on 3D reconstruction from 2D fluoroscopic images that reasonably accurate pose estimates for the c-arm (within a few degrees for the rotation and several mm for the translation) are sufficient for most clinical purposes. We showed using preliminary simulation studies that even with only three coplanar ellipses, we can attain a tracking accuracy within 0.5° and 0.9 mm for segmentation error variance up to 2 pixels squared and we are currently pursuing to apply this approach for precisely machined mechanical phantoms. Practical issues related to the specific fabrication parameters are still ongoing research issues. The presence of ellipses in the field of view must be considered and understood relative to the surgical procedure as well. In summary, this is a first step towards a simple clinically amiable tracker that may spare the need for proper positioning, mounting and yet provides a wide field of view for image capture in the OR. It provides proof of concept and encourages further investigation of this approach.

This work has been supported by NIH/NCI 2R44CA099374, and a National Science Foundation Graduate Research Fellowship.

1. Hofstetter R, Slomczykowski M, Sati M, Nolte LP. Fluoroscopy as an imaging means for computer-assisted surgical navigation. Computer Aided Surgery. 1999;4(no. 2) [PubMed]

2. Yaniv Z, Joskowicz L. Precise robot-assisted guide positioning for distal locking of intramedullary nails. IEEE Transactions on Medical Imaging. 2005;24(no. 5):624–635. [PubMed]

3. Burkhardt D, Jain A, Fichtinger G. A cheap and easy method for 3d c-arm reconstruction using elliptic curves in. Proceedings of SPIE. 2007;6509:65090B.

4. Chintalapani G, Jain AK, Burkhardt DH, Prince JL, Fichtinger G. Ctrec: C-arm tracking and reconstruction using elliptic curves. Computer Vision and Pattern Recognition Workshops, 2008. CVPR Workshops 2008. IEEE Computer Society Conference on; 2008; pp. 1–7. [PMC free article] [PubMed]

5. De Ma S. Conics-based stereo, motion estimation, and pose determination. International Journal of Computer Vision. 1993;10(no. 1):7–25.

6. Forsyth D, Mundy JL, Zisserman A, Coelho C, Heller A, Rothwell C. Invariant descriptors for 3 d object recognition and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1991;13(no. 10):971–991.

7. Sugimoto A. A linear algorithm for computing the homography from conics in correspondence. Journal of Mathematical Imaging and Vision. 2000;13(no. 2):115–130.

8. Kannala J, Salo M, Heikkila J. Algorithms for computing a planar homography from conics in correspondence. Proc. the 16th British Machine Vision Conference (BMVC; 2006; pp. 77–86. Citeseer.

9. Golub GH, Hoffman A, Stewart GW. A generalization of the Eckart-Young-Mirsky matrix approximation theorem. Linear Algebra Applic. 1987;88:317–328.

10. Muhlich M, Mester R. The role of total least squares in motion analysis. Lecture Notes in Computer Science. 1998;1407:305–321.

11. Muhlich M, Mester R. A considerable improvement in non-iterative homography estimation using tls and equilibration. Pattern Recognition Letters. 2001;22(no. 11):1181–1189.

12. Fitzgibbon A, Pilu M, Fisher RB. Direct least square fitting of ellipses. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1999;21(no. 5):476–480.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |