C-arms are increasingly being used to assist in a large number of surgical procedures. Fairly accurate and fast pose estimates are needed for non-encoded c-arms that are commonly available in most operating rooms in order to attain quantitative feedback from the x-ray images. We propose the use of an image-based fiducial composed of a set of coplanar ellipses to track the c-arm. We adopt an existing method for planar homography and propose a variation consisting of three modifications: including a weighting scheme for the linear system used, orthonormalizing the vectors pertaining to the rotation component of the transformation, and fine tuning the estimates using a constrained optimization step. We show that these variations make the approach more robust to noise that typically arises in fluoroscopy imaging and guarantee the orthonormality of the estimated rotation. The performance of the modified algorithm is demonstrated using realistic x-ray simulations. We also run sensitivity analysis for segmentation and calibration errors that are likely to occur in a practical setting. Preliminary results show mean tracking accuracy within 0.5° and 0.9 mm for segmentation error variance up to 2 pixels squared. The algorithm also proves to be robust to calibration errors up to 1 cm.
C-arm fluoroscopy has been widely used for qualitative assessment of a variety of computer-assisted surgical procedures. Attaining quantitative feedback from the x-ray images will become more feasible and widely available through the development of fast, cheap and reliable techniques for calibration and pose estimation of the c-arm. In this paper we present a clinically friendly solution for pose determination of the commonly available non-encoded c-arms found in most operating rooms (ORs). There are two major approaches to pose recovery: relying on auxiliary devices and employing image-based tracking fiducials. External trackers, such as optical and EM trackers, have proven to be both expensive and cumbersome in the OR. Optical trackers require a line of sight, whereas EM trackers are sensitive to the presence of metal objects in the work area. Therefore, fiducial-based pose estimation has gained wider clinical acceptance.
For a fiducial to be clinically appealing, it needs to be compact in size, easy to include in the surgical working space and non-interfering with the trajectory of the c-arm motion, which often requires a wide field of view. This motivated the idea of a flat fiducial that can be cheaply manufactured and easily placed under the patient and tracked without special mounting or fixation issues. If the x-ray absorption that is created by this fiducial is relatively low, then it will be visible to the clinician, but will not interfere with visualization of the anatomy or surgical tools that are within the field of view. We thus needed to choose suitable features that satisfy a number of requirements. First, they need to be easily embedded in a simple non-invasive flat fiducial; second, they should accurately recover the pose intraoperatively; third, they must neither interfere with the physical constraints of the space nor negatively impact the quality of the images needed for guidance. Our choice was a set of coplanar ellipses as shown in Fig. 1(b). The study of conics for pose estimation and object recognition has been an active research topic for several groups for a number of reasons. First, conics are more compact than points or segments. Second, segmentation of conics is more immune to noise due to the large number of points on the curves. Third, correspondence between conics is easier to establish. Fourth, they provide closed-form solutions. Ellipses in particular are especially attractive since a 3D ellipse projects to an ellipse in the image. It has been shown in [3, 4] that c-arm pose estimation for computer-assisted surgery can be achieved by a single ellipse and a point correspondence.
In the current work, we eliminate the need for points and utilize only planar ellipses with known correspondence. With planar targets, the problem of pose determination amounts to solving a planar homography. Solving camera pose and planar homography in general using conics has been previously presented by several authors including De Ma, Forsyth et al., Sugimoto and Kannala et al. While De Ma proposes that two ellipses can uniquely determine the pose, it is not clear how this estimate deteriorates in the presence of noise similar to that of our target application. On the other hand, Sugimoto's method utilizes a minimum of seven conics. Kannala et al., however, present an attractive approach that needs three or more corresponding conics for computing planar homographies. We adopt this method and apply three main modifications in order to solve the problem of pose estimation for c-arms from the noisy fluoroscopic images that are typically taken during surgical procedures. First, we propose a weighting technique that accounts for segmentation errors, which are a dominant source of noise. Second, we enforce the constraints needed to ensure that the computed homography can represent a meaningful pose in terms of the rotational component. Finally, a constrained optimization step is added to the algorithmic flow to fine-tune all the pose parameters after the third rotation vector is obtained.
Let $\mathbf{x}_w = [x_w, y_w, z_w]^T$ be a point in the world frame (which is the fiducial frame) and $\mathbf{x} = [x, y, z]^T$ be the same point in the camera frame, as shown in Fig. 1(a). They are related by the equation
$$\mathbf{x} = R\,\mathbf{x}_w + \mathbf{t}, \qquad (1)$$
where $R = [\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3]$ is the rotation matrix and $\mathbf{t}$ is the translation vector between the two frames. With no loss of generality, consider points in the $x_w y_w$-plane; for all such points, the world frame coordinates can be given by vectors of the form $\mathbf{x}_w = [x_w, y_w, 0]^T$. In this case, (1) reduces to
$$\mathbf{x} = G\,\mathbf{u}_w, \qquad (2)$$
where $G = [\mathbf{r}_1, \mathbf{r}_2, \mathbf{t}]$ and $\mathbf{u}_w = [x_w, y_w, 1]^T$.
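C-arm imaging is modeled by the full perspective projection model, where the optical center is the origin of the camera frame, the z-axis is the optical axis and the image plane is parallel to the xy-plane at a distance $f$ from the origin. Let $[o_u, o_v]$ be the location of the image center in pixel coordinates and $s_u$, $s_v$ be the pixel sizes in the $u$ and $v$ directions respectively. Then the image point $\mathbf{u} = [u, v, 1]^T$ is obtained from $\mathbf{x}$ by perspective division by $z$, scaling by the pixel sizes and a shift to the image center. Therefore, up to the scale factor $z$, $\mathbf{u}$ and $\mathbf{x}$ are related linearly through a $3 \times 3$ calibration matrix $C$ whose entries depend only on $f$, $s_u$, $s_v$, $o_u$ and $o_v$.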
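The planar pose model and the perspective projection can be illustrated with a minimal NumPy sketch. It assumes the sign convention $u = f x / (s_u z) + o_u$ (and likewise for $v$), which the text above does not spell out, and it names the camera-to-pixel matrix `K`; the matrix called $C$ in the text may correspond to `K` or to its inverse. The focal length and pixel size follow the simulation settings reported later in the paper; the image center, pose and helper names are hypothetical.

```python
import numpy as np

def rot_x(a):
    """Rotation about the x-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(a):
    """Rotation about the z-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def planar_pose_matrix(R, t):
    """G = [r1, r2, t], which maps u_w = [x_w, y_w, 1] to camera coordinates."""
    return np.column_stack((R[:, 0], R[:, 1], t))

def camera_to_pixel_matrix(f, su, sv, ou, ov):
    """Assumed convention: u = f*x/(su*z) + ou and v = f*y/(sv*z) + ov."""
    return np.array([[f / su, 0.0, ou],
                     [0.0, f / sv, ov],
                     [0.0, 0.0, 1.0]])

# Hypothetical pose; lengths in mm.
R = rot_z(0.4) @ rot_x(0.3)
t = np.array([10.0, -20.0, 600.0])
G = planar_pose_matrix(R, t)
K = camera_to_pixel_matrix(f=1000.0, su=0.44, sv=0.44, ou=512.0, ov=512.0)

# A point in the fiducial plane (z_w = 0), mapped with the full pose and with G.
xw, yw = 30.0, 40.0
x_full = R @ np.array([xw, yw, 0.0]) + t
x_planar = G @ np.array([xw, yw, 1.0])

# Perspective projection to homogeneous pixel coordinates [u, v, 1].
u = K @ x_planar
u = u / u[2]
```

Because $z_w = 0$ for every fiducial point, the third rotation column never enters the projection, which is why only $\mathbf{r}_1$, $\mathbf{r}_2$ and $\mathbf{t}$ can be estimated directly from the homography.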
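Let the equation of a conic in the $x_w y_w$-plane in the world frame be given by
$$\mathbf{u}_w^T Q\,\mathbf{u}_w = 0 \qquad (5)$$
and the equation of the corresponding projected conic in the image plane be given by
$$\mathbf{u}^T A\,\mathbf{u} = 0. \qquad (6)$$
Expressing the image conic in the camera frame and substituting (2) gives
$$\mathbf{u}_w^T G^T \tilde{A}\, G\,\mathbf{u}_w = 0, \qquad (7)$$
and since (5) and (7) describe the same conic,
$$k\,Q = G^T \tilde{A}\, G, \qquad (8)$$
where $k$ is a constant and $\tilde{A} = C^{-T} A\, C^{-1}$.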
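The relation between a world conic and its image can be checked numerically. The sketch below assumes a camera-to-pixel matrix `K` (so that $\mathbf{u} \propto K\mathbf{x}$), in which case the camera-frame conic is `K.T @ A @ K`; this matches $\tilde{A} = C^{-T} A C^{-1}$ if $C$ denotes `K`'s inverse, which is an assumption here. The ellipse parameters, pose and helper names are hypothetical, and lengths are in arbitrary units.

```python
import numpy as np

def ellipse_matrix(cx, cy, a, b, theta):
    """3x3 symmetric matrix Q of an ellipse, so that p^T Q p = 0 for p = [x, y, 1]."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    M = rot @ np.diag([1.0 / a**2, 1.0 / b**2]) @ rot.T
    ctr = np.array([cx, cy])
    Q = np.zeros((3, 3))
    Q[:2, :2] = M
    Q[:2, 2] = -M @ ctr
    Q[2, :2] = -M @ ctr
    Q[2, 2] = ctr @ M @ ctr - 1.0
    return Q

def ellipse_points(cx, cy, a, b, theta, n):
    """n homogeneous points [x, y, 1] on the ellipse, as a 3 x n array."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    c, s = np.cos(theta), np.sin(theta)
    x = cx + a * np.cos(t) * c - b * np.sin(t) * s
    y = cy + a * np.cos(t) * s + b * np.sin(t) * c
    return np.vstack((x, y, np.ones(n)))

# Hypothetical planar pose G = [r1, r2, t] and camera-to-pixel matrix K.
ca, sa = np.cos(0.3), np.sin(0.3)
R = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
G = np.column_stack((R[:, 0], R[:, 1], [0.1, -0.2, 6.0]))
K = np.array([[2272.7, 0.0, 512.0], [0.0, 2272.7, 512.0], [0.0, 0.0, 1.0]])

params = (0.5, -0.3, 0.4, 0.2, 0.6)
Q = ellipse_matrix(*params)

# Image conic under the homography H = K G (points map as u ~ H u_w).
H_inv = np.linalg.inv(K @ G)
A = H_inv.T @ Q @ H_inv

# Camera-frame conic and the right-hand side of k Q = G^T A_tilde G.
A_tilde = K.T @ A @ K
rhs = G.T @ A_tilde @ G

# Projected points of the world ellipse should lie on the image conic.
U = K @ G @ ellipse_points(*params, n=50)
U = U / U[2]
residuals = np.sum(U * (A @ U), axis=0)
```

With this construction the constant $k$ equals 1; rescaling `A` by any nonzero factor changes $k$ but not the conic it represents.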
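An extension of this formulation for estimating the pose from a single image of multiple conics is adopted from the method proposed by Kannala et al. and is written out below. First, consider two coplanar conics $Q_1$ and $Q_2$ with respective images $A_1$ and $A_2$ as in (5) and (6). From (8) we have
$$k_1 Q_1 = G^T \tilde{A}_1 G, \qquad k_2 Q_2 = G^T \tilde{A}_2 G.$$
In order to use the above two equations simultaneously, all the coefficient matrices are normalized to have unit Frobenius norm. Then, the constant $k_i$ is computed by fixing the determinant of the matrix of unknowns to be unity. Taking determinants on both sides gives $k_i^3 \det Q_i = \det \tilde{A}_i$. Thus
$$k_i = \left( \frac{\det \tilde{A}_i}{\det Q_i} \right)^{1/3}.$$
Each $Q_i$ is then replaced by $k_i Q_i$ and we now have
$$Q_1 = G^T \tilde{A}_1 G, \qquad Q_2 = G^T \tilde{A}_2 G.$$
Let $P_A = \tilde{A}_1^{-1} \tilde{A}_2$ and $P_Q = Q_1^{-1} Q_2$. Thus, writing $M$ for the matrix of unknowns (that is, $G$ up to scale), we have
$$P_A M - M P_Q = 0. \qquad (16)$$
Equation (16), which results from a pair of conics and their corresponding projections, can be rewritten as
$$F_{12}\,\mathbf{m} = \mathbf{0}, \qquad (17)$$
where $\mathbf{m}$ is a $9 \times 1$ vector containing the elements of $M$ and $F_{12}$ is a $9 \times 9$ matrix obtained from the elements of $P_A$ and $P_Q$.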
In the case of more than two conics, F is formed by stacking the matrices Fij arising from the matrix equation relating conics i and j and their projections. All distinct ordered pairs are considered, resulting in an F matrix of 9n(n − 1) rows and 9 columns for n conics. When F has a rank equal to 8, m is directly obtained. It is then rescaled such that the elements of the first vector of the rotation matrix form a unit vector, and the sign is determined by ensuring that the object lies between the c-arm source and the image plane. However, in practice, F is often of full rank due to the errors in the conic coefficients. In this case, in order to avoid a trivial solution for the system (17), we can write F as F = F0 + E, where F0 is the exact rank-deficient matrix that we would get had there been no errors and E is an error matrix.
Equation (17) is now an instance of the total least squares problem in which the right-hand side is a zero vector. The idea, as in [9, 10], is to find a rank-deficient least squares estimate of $F_0$ by finding an error matrix $E$ with minimum Frobenius norm that lowers the rank of $F$, i.e.
$$\min_{E} \|E\|_F \quad \text{subject to} \quad \operatorname{rank}(F - E) < 9. \qquad (18)$$
Let $F = U S V^T$ be the singular value decomposition (SVD) of $F$ with singular values $s_1 \ge s_2 \ge \dots \ge s_9$. The right singular vector corresponding to the least singular value of $F$ is then used to estimate the vector $\mathbf{m}$ that solves (17). Scaling and sign are determined as stated above for the case when $F$ has rank equal to 8.
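The stacking and SVD steps described above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes each conic pair contributes the equation $P_A M - M P_Q = 0$ with $P_A = \tilde{A}_i^{-1}\tilde{A}_j$ and $P_Q = Q_i^{-1} Q_j$, vectorizes $M$ column by column using Kronecker products, and resolves the sign by requiring a positive $z$ translation. The function and variable names are hypothetical, and the test data are noise-free and in arbitrary units.

```python
import numpy as np
from itertools import permutations

def ellipse_matrix(cx, cy, a, b, theta):
    """3x3 symmetric conic matrix of an ellipse in homogeneous coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    M = rot @ np.diag([1.0 / a**2, 1.0 / b**2]) @ rot.T
    ctr = np.array([cx, cy])
    Q = np.zeros((3, 3))
    Q[:2, :2] = M
    Q[:2, 2] = Q[2, :2] = -M @ ctr
    Q[2, 2] = ctr @ M @ ctr - 1.0
    return Q

def pose_matrix_from_conics(Qs, As):
    """Estimate M ~ [r1, r2, t] from world conics Qs and camera-frame image conics As."""
    # Normalize every coefficient matrix to unit Frobenius norm.
    Qs = [Q / np.linalg.norm(Q) for Q in Qs]
    As = [A / np.linalg.norm(A) for A in As]
    # Fix det(M) = 1: k_i^3 det(Q_i) = det(A_i), then replace Q_i by k_i Q_i.
    Qs = [np.cbrt(np.linalg.det(A) / np.linalg.det(Q)) * Q for Q, A in zip(Qs, As)]
    eye = np.eye(3)
    blocks = []
    for i, j in permutations(range(len(Qs)), 2):      # all ordered pairs
        PA = np.linalg.solve(As[i], As[j])            # inv(A_i) @ A_j
        PQ = np.linalg.solve(Qs[i], Qs[j])            # inv(Q_i) @ Q_j
        # vec(PA M) - vec(M PQ) = (I kron PA - PQ^T kron I) vec(M), column-major vec
        blocks.append(np.kron(eye, PA) - np.kron(PQ.T, eye))
    F = np.vstack(blocks)                             # 9 n (n - 1) rows, 9 columns
    _, s, Vt = np.linalg.svd(F)
    M = Vt[-1].reshape(3, 3, order="F")               # least singular vector
    M = M / np.linalg.norm(M[:, 0])                   # first rotation column has unit norm
    if M[2, 2] < 0:                                   # assumed sign rule: t_z > 0
        M = -M
    return M, s

# Noise-free check with a hypothetical pose and three ellipses.
ca, sa, cz, sz = np.cos(0.3), np.sin(0.3), np.cos(0.4), np.sin(0.4)
Rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
R = Rz @ Rx
G_true = np.column_stack((R[:, 0], R[:, 1], [0.1, -0.2, 6.0]))
Qs = [ellipse_matrix(0.5, -0.3, 0.4, 0.2, 0.6),
      ellipse_matrix(-0.6, 0.4, 0.3, 0.25, -0.4),
      ellipse_matrix(0.1, 0.7, 0.35, 0.15, 1.1)]
G_inv = np.linalg.inv(G_true)
# Camera-frame image conics, each with an arbitrary scale factor.
As = [scale * (G_inv.T @ Q @ G_inv) for scale, Q in zip((2.0, 0.5, 3.0), Qs)]
M_est, singular_values = pose_matrix_from_conics(Qs, As)
```

In the noise-free case the smallest singular value is numerically zero and `M_est` reproduces `G_true`; with noisy conics the same code returns the total least squares estimate.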
Although this technique with multiple conic pairs has been shown to outperform the homography estimation results obtained using points only and a single pair of conics, the quality of the pose estimates degrades dramatically even with small segmentation errors (which are inevitable with existing ellipse segmentation techniques). In this paper we apply three modifications to the algorithm described in the previous section. With these changes, the performance of the algorithm in the presence of noise improves beyond what the original algorithm achieves. First, we incorporate a weighting scheme that enables us to take the effect of errors in the input data into consideration while solving the homogeneous total least squares (TLS) problem in (17); next, we find a rotation matrix that is a best orthonormal approximation to the one retrieved by the first step; and finally we refine all the pose parameters, including the translation vector, through a constrained optimization that minimizes the norm of the right-hand side of the weighted system of equations and enforces orthonormality conditions on the rotation.
With the overdetermined system of equations reached in (17), our simulations showed, in agreement with the literature, that in order to achieve reasonable pose estimates in the presence of realistic amounts of noise, the consequences of the error sources need to be taken into account within the unified framework of homography estimation. This is attained by giving different credence to each of the equations stacked in F, a process termed equilibration, which relies on the assumption that the entries of the error matrix E are not i.i.d. This process essentially replaces the metric in (18) by
$$\|W_L\, E\, W_R\|_F,$$
where $W_L$ and $W_R$ are suitably chosen non-singular weight matrices that handle errors in the rows and columns of $F$ respectively. For our system of equations, we use a left-hand equilibration approach: instead of using $F$ directly, we seek a rank-deficient approximation of $F$ after scaling its rows by weights related to the variations in its elements. In this case we replace $F$ in (17) by $W_L F$, where $W_L$ is a diagonal matrix with ith diagonal entry $1/\sigma_i$, with $\sigma_i^2$ denoting the error variance of the ith row of $F$. To find estimates for each of the $\sigma_i^2$, one possibility is to average the variances of the entries of the ith row of $E$ after simulating the segmentation errors that might have occurred for a given observed image. In practice, one can use the equations of the observed ellipses, together with an assumed model of the noise that has caused the system to be full rank, and simulate such errors numerous times, thus generating many error matrices $E$. Applying the SVD to the matrix $W_L F$ yields its approximate null vector $\mathbf{m}$, which is the singular vector corresponding to its least singular value. Again, $\mathbf{m}$ is then rescaled and its sign is determined as described above.
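A minimal sketch of left-hand equilibration on synthetic data follows. It assumes the ith weight is the reciprocal of the estimated standard deviation of the errors in row i, which is a common choice but is not confirmed by the text, and it replaces the ellipse-based error simulation with directly sampled Gaussian error matrices. All names and noise levels are hypothetical.

```python
import numpy as np

def row_weights(E_samples):
    """Per-row weights from simulated error matrices of shape (trials, rows, cols)."""
    var_entries = E_samples.var(axis=0)     # variance of each entry over the trials
    sigma2 = var_entries.mean(axis=1)       # average over the entries of each row
    return 1.0 / np.sqrt(sigma2)            # assumed weighting: 1 / sigma_i

def weighted_null_vector(F, w):
    """Right singular vector of diag(w) @ F with the least singular value."""
    _, _, Vt = np.linalg.svd(w[:, None] * F, full_matrices=False)
    return Vt[-1]

rng = np.random.default_rng(0)

# Synthetic rank-deficient system F0 @ m0 = 0 with a known unit null vector m0.
m0 = rng.normal(size=9)
m0 /= np.linalg.norm(m0)
F0 = rng.normal(size=(40, 9))
F0 -= np.outer(F0 @ m0, m0)                 # make every row orthogonal to m0

# Row-dependent noise: the first 20 rows are clean, the last 20 are noisy.
sigma = np.r_[np.full(20, 1e-3), np.full(20, 0.3)]
F = F0 + rng.normal(size=(40, 9)) * sigma[:, None]

# Simulate many error matrices from the assumed noise model to estimate the weights.
E_samples = rng.normal(size=(500, 40, 9)) * sigma[None, :, None]
w = row_weights(E_samples)

m_weighted = weighted_null_vector(F, w)
m_plain = weighted_null_vector(F, np.ones(40))

# Alignment error with the true null vector, insensitive to the sign of m.
err_weighted = 1.0 - abs(m_weighted @ m0)
err_plain = 1.0 - abs(m_plain @ m0)
```

Scaling the rows this way keeps unreliable equations from dominating the least singular vector, which is the motivation given above for weighting the stacked system.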
The solution of the TLS system is essentially a general planar homography, and because of noise in the observations, orthonormality of the rotation part of the pose is not ensured. Therefore, another SVD is performed on the 3 × 2 matrix formed by the first two columns of the rotation, and the best orthonormal approximation is found by multiplying the matrices of left and right singular vectors with the first two columns of a 3 × 3 identity matrix placed between them. The third column of the rotation is then calculated as the cross product of those two columns.
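The orthonormalization step can be sketched in a few lines of NumPy. The sketch assumes the standard result that the closest matrix with orthonormal columns (in the Frobenius norm) to a 3 × 2 matrix with SVD $U S V^T$ is $U I_{3\times 2} V^T$; the function name and the sample vectors are hypothetical.

```python
import numpy as np

def orthonormalize_rotation(r1, r2):
    """Return a rotation matrix whose first two columns best approximate r1 and r2."""
    A = np.column_stack((r1, r2))            # 3 x 2 matrix of estimated columns
    U, _, Vt = np.linalg.svd(A)              # U is 3 x 3, Vt is 2 x 2
    B = U @ np.eye(3)[:, :2] @ Vt            # replace the singular values by ones
    r3 = np.cross(B[:, 0], B[:, 1])          # third column from the cross product
    return np.column_stack((B, r3))

# Slightly non-orthonormal columns, as a noisy homography estimate might give.
r1_est = np.array([0.9, 0.1, 0.05])
r2_est = np.array([-0.08, 1.1, 0.2])
R_est = orthonormalize_rotation(r1_est, r2_est)
```

Taking the third column as a cross product makes the determinant +1, so the result is a proper rotation rather than a reflection.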
For a final tuning of the pose retrieved in the previous steps, including its translational component, we solve a constrained optimization problem that maintains the orthonormality gained in the previous step while finding an optimal pose for our problem:
$$\min_{\mathbf{m}} \|W_L F\,\mathbf{m}\|^2 \quad \text{subject to} \quad \mathbf{m}^T B_1 \mathbf{m} = 0, \quad \mathbf{m}^T B_2 \mathbf{m} = 1, \quad \mathbf{m}^T B_3 \mathbf{m} = 1,$$
where $B_1$ is the matrix corresponding to the orthogonality condition on the first two columns of the rotation matrix and $B_2$ and $B_3$ are the matrices enforcing unit norm on the first and second columns respectively. The final pose estimate (consisting of both rotation and translation) thus still yields a near-zero right-hand side for (17).
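One way to write the orthonormality conditions as quadratic forms in m is sketched below. It assumes that m stacks the columns of $[\mathbf{r}_1, \mathbf{r}_2, \mathbf{t}]$ in that order and that the objective is the squared norm of the weighted residual; both are assumptions, since the source does not give the explicit matrices or name a solver. The sketch only sets up the objective and constraints, which could then be passed to any equality-constrained optimizer.

```python
import numpy as np

def constraint_matrices():
    """B1, B2, B3 with m^T B1 m = r1.r2, m^T B2 m = |r1|^2 and m^T B3 m = |r2|^2."""
    I3, Z = np.eye(3), np.zeros((3, 3))
    B1 = 0.5 * np.block([[Z, I3, Z], [I3, Z, Z], [Z, Z, Z]])
    B2 = np.block([[I3, Z, Z], [Z, Z, Z], [Z, Z, Z]])
    B3 = np.block([[Z, Z, Z], [Z, I3, Z], [Z, Z, Z]])
    return B1, B2, B3

def objective(m, WF):
    """Squared norm of the weighted residual WF @ m."""
    r = WF @ m
    return float(r @ r)

def constraint_values(m):
    """Zero vector when the first two rotation columns are orthonormal."""
    B1, B2, B3 = constraint_matrices()
    return np.array([m @ B1 @ m, m @ B2 @ m - 1.0, m @ B3 @ m - 1.0])

# Hypothetical pose: the stacked vector of a true rotation satisfies the constraints.
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
t = np.array([0.1, -0.2, 6.0])
m_true = np.column_stack((R[:, 0], R[:, 1], t)).flatten(order="F")
values_true = constraint_values(m_true)

# Scaling the first rotation column by 1.1 violates only its unit-norm constraint.
m_bad = m_true.copy()
m_bad[:3] *= 1.1
values_bad = constraint_values(m_bad)
```

The translation entries do not appear in any constraint, so the refinement is free to adjust them while the rotation columns stay orthonormal.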
Simulation studies were conducted to examine the effect of several factors on the accuracy of the estimated pose.
Simulated planar sets of 3 ellipses were used (see Fig. 1(b)). In order to simulate ellipse segmentation errors, we uniformly sampled 50 points on each projected ellipse, added Gaussian noise to the sample points, then fitted a new ellipse to the points using the method in . The variance of the Gaussian was increased from 0.5 to 2 pixels squared in increments of 0.5. The focal length was set to 1 m and the pixel size used for our experiments was 0.44 mm for each of the u and v directions. For each error level, 700 experiments with 7 different poses were simulated; the mean and standard deviation of rotation and translation errors in the pose are shown in Fig. 2(a). In order to demonstrate the effect of weighting described in Section (2.3.1), we present results analogous to the ones in Fig. 2(a) that were obtained by running our code on the same datasets before modifying the algorithm to incorporate the weighting step. The results presented in Fig. 2(b) were attained by directly using the unweighted system (17). It is to be noted that in these experiments the rotation orthonormalization and the final optimization step were both performed as before.
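The segmentation-error simulation can be sketched as follows. The fitting step here is a plain algebraic least squares conic fit with coordinate normalization, swapped in for illustration and not the ellipse-fitting method cited above. Sampling uniformly in the ellipse parameter angle is also an assumption, and the ellipse parameters (in pixels) and function names are hypothetical.

```python
import numpy as np

def ellipse_matrix(cx, cy, a, b, theta):
    """3x3 symmetric conic matrix of an ellipse in homogeneous coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    M = rot @ np.diag([1.0 / a**2, 1.0 / b**2]) @ rot.T
    ctr = np.array([cx, cy])
    Q = np.zeros((3, 3))
    Q[:2, :2] = M
    Q[:2, 2] = Q[2, :2] = -M @ ctr
    Q[2, 2] = ctr @ M @ ctr - 1.0
    return Q

def sample_ellipse(cx, cy, a, b, theta, n=50):
    """n points on the ellipse, uniform in the parameter angle, as a 2 x n array."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    c, s = np.cos(theta), np.sin(theta)
    x = cx + a * np.cos(t) * c - b * np.sin(t) * s
    y = cy + a * np.cos(t) * s + b * np.sin(t) * c
    return np.vstack((x, y))

def fit_conic(pts):
    """Algebraic least squares conic through 2 x n points; returns a 3 x 3 matrix."""
    mean = pts.mean(axis=1)
    scale = np.sqrt(((pts - mean[:, None]) ** 2).sum(axis=0).mean())
    # T maps pixel coordinates to centered, scaled coordinates for conditioning.
    T = np.array([[1.0 / scale, 0.0, -mean[0] / scale],
                  [0.0, 1.0 / scale, -mean[1] / scale],
                  [0.0, 0.0, 1.0]])
    x, y = (pts - mean[:, None]) / scale
    D = np.column_stack((x * x, x * y, y * y, x, y, np.ones_like(x)))
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    a, b, c, d, e, f = Vt[-1]
    A_n = np.array([[a, b / 2, d / 2], [b / 2, c, e / 2], [d / 2, e / 2, f]])
    return T.T @ A_n @ T                     # conic expressed in pixel coordinates

def noisy_conic(params, variance, rng, n=50):
    """Sample the ellipse, add isotropic Gaussian noise, and refit a conic."""
    pts = sample_ellipse(*params, n=n)
    pts = pts + rng.normal(scale=np.sqrt(variance), size=pts.shape)
    return fit_conic(pts)

params = (300.0, 250.0, 80.0, 50.0, 0.5)     # center, semi-axes, orientation
rng = np.random.default_rng(0)
A_clean = fit_conic(sample_ellipse(*params))
A_noisy = noisy_conic(params, variance=2.0, rng=rng)
```

Repeating `noisy_conic` many times for each observed ellipse is one way to generate the perturbed conics used for error statistics in experiments of this kind.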
In order to test the effect of the size of ellipses used for the fiducial, the exact same simulations were done using a smaller pattern, i.e. with ellipses whose semi-major and semi-minor axes are half of those in the pattern used to generate the results in Fig. 2(a). The relative positioning and orientation were the same. The results for these simulations (based on a total of 2800 experiments) are shown in Fig. 2(c) and again show degradation in accuracy as the level of segmentation error increases. These results show that for a given configuration, the larger we can make the pattern, the better for pose accuracy.
To assess the effect of miscalibration of the c-arm, we generated a realistic simulation of possible errors in the position of the focal spot. The magnitude of the error was increased from 0 to 15 mm in increments of 1 mm. During simulation, the fact that the uncertainty in the focal length is much larger than that in the image plane was taken into consideration, and the direction of the in-plane drift of the principal point was uniformly sampled over all angles. For this experiment, segmentation noise was not taken into account and therefore the system (17) has a unique solution without the need for the weighting matrix WL. Experiments were done as previously described using the same set of three ellipses and the results are shown in Fig. 2(d). The algorithm proves to be robust to calibration errors up to 1 cm.
We proposed the use of a set of coplanar ellipses for c-arm pose estimation to assist in image-guided procedures. It has been shown in the literature on 3D reconstruction from 2D fluoroscopic images that reasonably accurate pose estimates for the c-arm (within a few degrees for the rotation and several mm for the translation) are sufficient for most clinical purposes. We showed using preliminary simulation studies that even with only three coplanar ellipses, we can attain a tracking accuracy within 0.5° and 0.9 mm for segmentation error variance up to 2 pixels squared, and we are currently working on applying this approach to precisely machined mechanical phantoms. Practical issues related to the specific fabrication parameters are still under investigation. The presence of ellipses in the field of view must also be considered and understood relative to the surgical procedure. In summary, this is a first step towards a simple, clinically friendly tracker that may remove the need for careful positioning and mounting while still providing a wide field of view for image capture in the OR. It provides proof of concept and encourages further investigation of this approach.
This work has been supported by NIH/NCI 2R44CA099374, and a National Science Foundation Graduate Research Fellowship.