|Home | About | Journals | Submit | Contact Us | Français|
We describe a fast mesh-based Monte Carlo (MC) photon migration algorithm for static and time-resolved imaging in 3D complex media. Compared with previous works using voxel-based media discretization, a mesh-based approach can be more accurate in modeling targets with curved boundaries or locally refined structures. We implement an efficient ray-tracing technique using Plücker Coordinates. The Barycentric coordinates computed from Plücker-formed ray-tracing enables us to use linear Lagrange basis functions to model both media properties and fluence distribution, leading to further improvement in accuracy. The Plücker-coordinate ray-polygon intersection test can be extended to hexahedral or high-order elements. Excellent agreement is found when comparing mesh-based MC with the analytical diffusion model and 3D voxel-based MC code in both homogeneous and heterogeneous cases. Realistic time-resolved imaging results are observed for a complex human brain anatomy using mesh-based MC. We also include multi-threading support in the software and will port it to a graphics processing unit platform in the near future.
In bio-optics applications, the Monte Carlo (MC) method is often used to model photon migration inside human tissue [1–5]. Because MC effectively solves the Radiative Transport Equation (RTE) via random sampling, it offers excellent accuracy when simulating photon propagation inside general complex media. In many cases, it was chosen as the gold standard when validating a new algorithm [6–9] or approximation [10–12]. MC has several additional merits such as being easy-to-program and that it’s straightforward to parallelize [13,14]. Compared with finite-element (FE)  diffusion solvers used in many diffuse optical imaging applications, MC produces more accurate solutions, especially when simulating low-scattering media where the diffusion approximation becomes invalid. In many other cases, MC attracts users by avoiding the complex pre-/post-processing steps (such as mesh-generation) required by FE or finite-volume (FV)  methods. In particular, the expense of generating high quality 3D meshes for complex media had been a non-trivial task due to the lack of appropriate software tools. Only in recent years, the development of image-based 3D mesh generation software [17,18] had started to make mesh generation from volumetric data practical and easily accessible for general data processing.
The major drawbacks of MC are the low computational efficiency and the lack of ability to accurately model curved boundaries . A traditional MC simulation can easily take several hours or even over a day to obtain a solution with the desired accuracy [11,20]. In the last couple of years, the fast development of multi-core and many-core processors, especially the graphics processing units (GPU), has opened new possibilities for highly efficient MC simulations by exploring massively parallel computing. As a technology preview, Alerstam et al.  had shown a 1000x acceleration using GPU in a simple homogeneous domain. More realistic GPU-accelerated MC code for 3D heterogeneous media and time-resolved imaging was recently reported by Fang and Boas , obtaining a better than 300x acceleration on a low-cost graphics card compared with single-threaded CPU-based MC simulations. With a modern graphics card, these GPU-based MC codes are capable of producing 3D solutions in less than a minute.
In addition to the dramatic acceleration in speed, significant effort has been made to improve the capability of modeling complex heterogeneous structures. The most widely used MC photon migration code developed by Wang and Jacques, MCML , can handle layered media with a circular symmetry. Boas et al.  and Fang et al.  had developed methods to use a voxelated space to represent arbitrarily complex media, showing great flexibilities in a range of applications [23,11,12]. However, to model regions with curved boundaries with these tools, one has to increase grid density which requires more memory and computation. Margallo-Balbás and French  as well as Ren et al.  have explored triangular-surface-based MC approaches which allow an improved approximation for domain interfaces. However, to test for ray-surface intersection, one has to scan a range of triangles based on a partitioning scheme, which introduces redundant calculations. In addition, the surface representation is unable to model continuously varying complex media. An algorithm that combines the strengths of MC and FE methods, i.e both accuracy for general media and flexibility for representing arbitrary domain shapes, is highly desired.
Inspired by the works from computer graphics , we have developed a mesh-based Monte Carlo method (referred to as MMCM hereafter) by making use of a fast ray-tracing algorithm with Plücker coordinate representation . With this approach, one cannot only model domains with smooth boundaries, but also simulate continuously varying random media, or meshes with complex or mixed element types (tetrahedral, hexahedral, etc). We implemented MMCM in C and multi-threaded programming and developed a software package , including the core MMCM simulation code, validation suite, mesh processing scripts and a toolbox to compute analytical diffusion solutions for a heterogeneous sphere. Combined with the open-source 3D mesh-generation toolbox developed previously , we expect this software to be an accurate and efficient option for MC simulations in biological tissues.
We also noticed an independent study and software (TIM-OS) recently published by Shen and Wang  exploring a similar idea, i.e using tetrahedral meshes for MC simulations. The key differences between TIM-OS and MMCM are 1) MMCM uses Plücker coordinates while TIM-OS uses a Cartesian representation, 2) MMCM allows linear Lagrange basis functions to represent optical properties and fluence continuum while TIM-OS only supports piece-wise constant basis, 3) MMCM can handle more complicated element shapes such as high-order tetrahedral or hexahedral elements, and 4) MMCM can perform both static and time-resolved simulations. Additionally, we provide rigorous validations using analytical solutions and voxel-based MC for both homogeneous and heterogeneous cases to show the accuracy improvement using the mesh-based approach.
In the remaining portion of the paper, we first present the basic ray-polygon intersection test using Plücker coordinates and describe the algorithm work-flow. Then we discuss extensions of the algorithm such as using linear basis functions, supporting complex mesh elements and parallelization. In the Results section, we validate the algorithm using a homogeneous domain with and without a spherical inclusion, and compare the results with analytical diffusion solutions as well as those from a voxel-based MC simulation; then we show an example of time-resolved simulation from a complex human brain atlas. In the last section, we summarize our findings and discuss further expansion of the algorithm, particularly the massively parallel version running on GPU processors.
A 3D ray (with a direction) can be uniquely determined by specifying two distinct 3D points, p = (x1,y1,z1) and q = (x2,y2,z2). As a result, the Cartesian form of a 3D ray can be typically written as
where t is a scalar. In the Plücker coordinates , a 3D ray, R, can be expressed by a vector-pair R(d:m) where d represents displacement and m represents moment, defined as
respectively. One of the attractive properties of the Plücker coordinates is its simplicity in a ray-polygon intersection test. In Fig. 1(a) , we show a diagram where a ray enters (R1), exits (R2) or misses (R3) a 3D triangle (ABC). We assume the edges of a 3D triangle (e1(Ue1:Ve1), e2(Ue2:Ve2) and e3(Ue3:Ve3)) are oriented in counter-clock-wise order. A ray r(Ur:Vr) can be tested for intersection with the triangle by computing the permuted inner products wi as
from which we can assert that
The test in (4) only involves several vector dot-products and tests for signs, thus, it can be very fast. As a by-product from the above test, the Barycentric coordinates , ui, at the intersection point are directly related to the permuted inner products by
and the coordinates of the intersection point can be computed by
For a tetrahedron that is known to intersect with a ray, the above test can quickly identify the indices of the faces where the ray enters and exits the tetrahedron; with a few additional floating-point operations, we can get the coordinates of the entry/exit points. Note that the above test is not limited to tetrahedrons. It can be easily extended to any convex polyhedral elements . A few examples of other commonly used polyhedral elements are shown in Fig. 1(b).
In MMCM, the heterogeneous media is represented by a 3D volumetric mesh, which can be either a uniform grid (with cubic or cuboid elements) such as the one for voxel-based MC, or an unstructured mesh with polyhedral elements (tetrahedra, hexahedra etc). For a given element in the mesh, the indices of the neighboring elements through each shared face are pre-computed as the “face-neighbor list” and stored in the input files. A source position vector (s0) and an incident directional vector (c0) are specified by the user.
After initializing the mesh information, including nodes, elements and face-neighbor list, an initial intersection test is performed for all elements to identify the initial element (e0) that encloses the source (e0 can also be pre-computed). Starting from e0, we simulate the photon propagation using the well-established MC simulation procedures [5,20,22]. Briefly, we produce a random scattering length and scattering (azimuth and zenith) angles based on the exponential distribution and Henyey-Greenstein phase function , respectively, and move each photon along the scattering trajectory recursively. For each element along the path, we perform the above ray-tracing test and calculate the distance (d) between the current photon position (p) to the exit point (px) of the enclosing element along the propagation vector (c). If d is greater than the remaining scattering length, we will move the photon to the end of the scattering path, and generate a new set of scattering length and angles and repeat the above process. If d is less than the remaining scattering path, we move the photon to the intersection point, px, and update the enclosing element ID (e) by looking up the face-neighbor list. Then we repeat the above ray-element test for the new element to further propagate the photon. Encountering an empty face-neighbor (denoted by a zero value) indicates a photon passing through a boundary face. We can either reflect the photon at the exiting face of the element based on the Snell's law, or terminate the photon and launch a new one from the source. The escaped photon energy is stored to an array associated with surface nodes. Because the domain boundary is represented by a surface mesh, MMCM can be more accurate than the boundary reflection scheme developed for the voxelated-space .
In order to compute the volumetric fluence inside the domain, we assign an initial photon weight of 1 for each simulated photon and attenuate this weight by its propagation distance using the Beer-Lambert law based on local absorption coefficients . For each ray-polygon test and the subsequent photon movement, we compute the weight loss of the photon and distribute it to the nodes of the interacting element. Because we have access to the Barycentric coordinates of the entry and exit points, we can use the more accurate linear basis functions to store energy deposit/fluence. In our current implementation, we calculate the Barycentric coordinates at the mid-point between p and px and accumulate the energy deposit to the nodes of the enclosing element. Because photon weight decays exponentially with distance, we also provide an option to set a maximum step-size along the path for the accumulation. This can potentially reduce the error due to non-linearity without noticeably increasing the computation time. When the optical properties are defined on the vertices rather than elements, we can also use the computed Barycentric coordinates to quickly interpolate the local optical properties at the current photon position based on the nodal values of the enclosing element. This enables us to simulate continuously varying media which was impossible with previous MC simulations. The final stored energy within the mesh is converted to volumetric fluence by dividing the Veronoi volume at each node ( where Tk is the k-th tetrahedron sharing the i-th node and is its volume) and the local absorption coefficient at each node.
MMCM supports time-resolved photon migration using a similar approach as in  and . Briefly, we create multiple copies of photon weight array at all nodes for each time gate of the simulation. For each photon, we keep track of the elapsed time of the propagation and accumulate the photon weight loss to the array of the matched time gate. MMCM can produce static solutions when specifying a single time gate with sufficient length. To produce the volumetric fluence distribution, we apply a normalization scheme as in  to get the temporal point-spread function, or TPSF.
The parallelization of MMCM is surprisingly straightforward. A simple multi-threaded implementation using OpenMP  is included in our software. Because the essential part of the calculation in MMCM involves inner and cross products of short vectors, for modern CPUs, this calculation is well suited for Streaming SIMD Extensions (SSE) instructions (inner product is supported in SSE4). Therefore in our implementation, vector operations are written in SSE assembly and can be enabled by compiling switches. These operations are expected to be significantly more efficient on graphics hardware . Further discussion about the massively parallel version of MMCM for the GPU platform using CUDA  and OpenCL  can be found in the last section.
A multi-threaded implementation of MMCM algorithm for multi-core CPU was developed and released under an open-source license . In the current version of the software, we provide support for tetrahedral elements. The extensions to more general elements or mixed types will be added in the future. Two multi-threaded random number generators (RNG) are included: a thread-safe 48bit linear congruential RNG and a Logistic-lattice RNG . In this software package, we also provide a Matlab toolbox to compute the analytical solution for a sphere inside a semi-infinite medium or infinite slab based on diffusion theory.
In this section, we first assess the accuracy improvement for MMCM by comparing its output with the analytical solution from diffusion theory and the output from a voxel-based MC software, Monte Carlo eXtreme (MCX) . We also explore memory utilization and simulation speed with respect to different mesh configurations. Lastly, we show a realistic simulation using complex meshes generated from a human brain atlas. All the computations were performed on a computer running Ubuntu 9.10 (64bit) with an Intel Xeon E5520 (2.3G) Quad-core processor. The voxel-based MC simulator, MCX, ran on the same computer with an nVidia GTX 470 GPU. For all the cases, the FE meshes were produced using our previously published 3D mesh generation software “iso2mesh”  where part of the meshing tasks were performed by the Computational Geometry Algorithms Library (CGAL)  and tetgen .
In the first example, we validate the algorithm using a homogeneous medium in a 60x60x60mm cubic domain. The configuration is similar to that in  and : the point source is located at (30.1,30.2,0) mm (the non-integer coordinates are used to avoid simulating photons near the edges/faces of this particular mesh) with an incident direction vector of (0,0,1); the medium has an absorption coefficient μa = 0.005/mm, scattering coefficient μs = 1/mm, refractive index n = 1 and anisotropy g = 0.01. We simulate 3x107 photons for a time window of 0 to 5 ns and a resolution of 0.1 ns. For MMCM, we generate a tetrahedral mesh by splitting each 2x2x2 cube in a 60x60x60 grid into 5 tetrahedra (i.e. a T5 mesh ). The resulting mesh contains 29791 nodes and 135000 elements.
The temporal response (TPSF) recorded at location (30,14,10) mm is shown in Fig. 2(a) . The continuous wave (CW) fluence profiles  along plane y = 30 mm, represented by contour lines with 10db spacing, are shown in Fig. 2(b). For comparisons, we also plot the analytical solution from the diffusion model  and the simulation results from MCX (running for 3x108 photons) in Fig. 2(a).
In the second example, we changed the background μa to 0.002/mm and embed a sphere centered at (30,30,30) mm with a radius of 10 mm inside the domain. The optical properties for the sphere are μa = 0.05/mm, μs = 5/mm, n = 1.37 and g = 0.9. The source position is set to (30,30,0) mm with an incident vector (0,0,1). We also set n = 1.37 in the background medium to ignore reflection. Two sets of meshes are generated for this case: 1) a high-density uniform mesh and 2) a mesh with higher density around the sphere boundary and source. The node and element numbers are summarized in Table 1 . The cross-cut views of the 2 meshes are shown in Figs. 3(a) and (c) . We simulate 3x108 photons with MCX using a 60x60x60 voxel grid, and 3x107 photons for MMCM using the above meshes. By interpolating the MC solutions to a grid using a piecewise-linear basis functions, the contour plots at plane y = 30 mm are shown in Figs. 3(b) and (d) for the two mesh configurations, respectively. Meanwhile, we compute the analytical solution for a sphere inside an infinite-slab extended from the diffusion models in  and overlap it in both figures. Because of the presence of the vertical boundaries, we only show the analytical solution near the source and inside the sphere for the comparison. The speed and memory utilities for MCX and MMCM using various number of CPU cores/threads are summarized in Table 1.
In the last example, we use MMCM to simulate photon migration in a 3D complex FE mesh generated from a segmented human brain atlas [38,22]. Using the “iso2mesh” toolbox, we first extract four triangular surfaces for scalp, cerebro-spinal fluid (CSF), gray matter and white matter, respectively. Then we apply 4 iterations of the Laplacian + HC smoothing algorithm  to each surface, from which we generate a 3D volumetric FE mesh. We configure the mesh algorithm to generate a denser mesh near the top of the head or close to the complex cortex surface. This helps greatly in reducing overall memory demands of the computation. The total number of nodes and elements of the mesh are 69865 and 425224, respectively. The brain tissue optical properties are summarized in Table 2 and are chosen to be identical to those used in . A point source is positioned at (75.7,67.0,168.2) mm pointing toward the center of the head.
In Figs. 4(a) and (b) , we show the 3D-cut and the sagittal-cut views of the generated mesh. Again, a total of 3x107 photons are simulated for a time-window of 0-3 ns. The CW and the animated time-resolved solutions (in log-scale) extracted at plane x = 76 mm are shown in Figs. 4 (c) and (d), respectively. The computation for mesh generation is under 2 minutes and the MMCM simulation time is 40.5 minutes using 4 CPU cores.
For simulations in the homogeneous medium, the results from MMCM match well with voxel-based MCX output in both temporal and spatial profiles. In Fig. 2(a), both MCX and MMCM solutions show lower fluence in the later temporal windows compared with the diffusion model. This artifact was caused by the domain truncation due to a finite boundary in the two simulations  and disappeared when using a larger volume (not shown). In the heterogeneous case, the MMCM solutions inside the sphere and near the source for both meshes match accurately with the analytical diffusion model. In comparison, the solution from the voxel-space simulation deviates from the analytical/MMCM solutions inside the sphere, confirming that an inaccurate geometric representation of the target may negatively impact the accuracy of the solution .
Comparing Figs. 3(b) and 3(d), we find that the two MMCM solutions using different meshes are quite similar. From Table 1, their run-times are also similar. This is expected because the computational expense in MMCM is roughly proportional to the mesh density multiplied by the fluence distribution, not to the absolute numbers of elements or nodes. However, the essential advantage of the second mesh is its memory efficiency. From Table 1, we can see that the total memory required by the second mesh is only 1/3 of the first one. This may have a significant implication when running simulations on a memory-limited device, such as a GPU. Although MCX does not need extra memory to store the basic mesh information, for each additional time-gate, it needs to duplicate the entire grid. By comparison, the memory cost for each additional time-gate for MMCM is very small. For a device with 1GB memory (a typical size for a high-end graphics card), the above difference can translate to over 12000 time-gates using the coarser MMCM mesh compared to only 1200 time-gates using MCX.
Comparing speeds with different CPU cores, we find that MMCM offers excellent parallel acceleration, which is almost proportional to the number of CPU cores. It is not surprising that MCX is a few hundred times faster than MMCM, as we are comparing parallel implementations of different scales and architectures. However, it is reasonable to believe that by porting the MMCM ray-tracing code to CUDA or OpenCL, we can expect to see a comparable acceleration. In addition to running simulations with many parallel threads, the built-in short-vector operations in GPU hardware are highly efficient; this may further improve the speed of MMCM. On the other hand, each ray-tracing step in MMCM requires accessing mesh data (such as node coordinates and elements) from global memory, which may introduce overhead. Fortunately, by running a large number of threads, this latency can be efficiently “hidden” . To further reduce memory access latency, we can also exploit the constant memory in the GPU and optimal ordering of the mesh nodes to improve data caching efficiency.
In the last example, the calculated CW and time-domain solutions generally agree with what we have observed in : photons travel a longer distance in the low-scattering, low-absorption CSF layer, while attenuating rapidly toward the center of the brain. This indicates that both iso2mesh and MMCM are capable of handling real-world complex heterogeneous media.
In summary, we have described a mesh-based Monte Carlo algorithm and developed a free software package that combines the accuracy of MC and the flexibility of FE method for 3D photon migration modeling. We validate this code for both homogeneous and heterogeneous media and show that the algorithm is capable of producing more accurate solutions when simulating objects with curved boundaries.
In the next stage of this research, we will work on a massively parallel implementation of MMCM algorithm using CUDA and OpenCL. This may lead to a dramatic acceleration in speed. We will also extend the pencil beam source model to support more general source and detector forms, such as Gaussian beam or plane illumination. Additionally, we will add supports for hexahedral elements and high-order elements and take further advantage of the flexibility enabled by the mesh-based representations.
QF would like to thank the funding support from Harvard Catalyst Pilot Grant. He also would like to thank David Boas for the helpful discussions on the analytical diffusion solutions of a sphere.