Near-infrared (NIR) light with wavelength between 600 and 1000 nm can penerate deep into the
tissue [

1], owning primarily to the relatively
low optical absorption of human tissue chromorphores, namely oxy/deoxy-hemoglobin, lipids and water.
This remarkable characteristic makes NIR light a suitable candidate to probe deep tissue physiology
such as angiogenesis and oxygen metabolism in a safe and non-invasive manner. Substantial efforts
have been made in the past decade to utilize NIR imaging to facilitate the investigation of breast
cancer and human brain functions, monitoring therapy responses, and a range of other applications
[

2]. A notable challenge for NIR imaging is
the high scattering within the tissue. The transport of low-energy NIR photons is highly nonlinear
and exhibits a complex and diffusive pattern. Accurate and efficient numerical methods play an
essential role in NIR-based techniques such as diffuse optical tomography (DOT) and fluorescence
molecular tomography (FMT).

The Monte Carlo (MC) method is a numerical technique offering both generality and accuracy in
modeling photon transport inside human tissue, and often serves as the gold standard in biophotonic
applications. After 20 years of evolution, biophotonic MC algorithms have become increasingly more
efficient and capable in handling complex tissue structures. One of the first MC codes, MCML
[

3], can only simulate problems within layered
infinite media. A more general approach was proposed by Boas

*et al.* for 3D
heterogeneous domains using vox-elated representations [

4,

5]. However, to achieve higher accuracy
near curved boundaries in a voxelated domain, one has to increase the global density of the grid
resulting in a significant increase in memory costs and execution time. Over the past few years,
several mesh-based approaches have been pursued to obtain better accuracy in modelling complex
tissue boundaries. In [

6], Li

*et
al.* reported a Mouse Optical Simulation Environment (MOSE) that allows one to define 3D
heterogeneous structures using geometric primitives. Margallo-Balbás

*et al.*
[

7] and Ren

*et al.*
[

8] have reported methods using triangular
surface meshes and ray-surface intersection calculations to propagate photons in a
piece-wise-constant domain. This approach achieves better accuracy near tissue boundaries, however,
the speed of the ray-surface intersection tests depends on the partitioning scheme of the surfaces
[

9] and the traversal of the data structure
often introduces significant computational overhead [

10].

In the past year, two new MC algorithms were proposed to perform photon transport simulations
using a 3D volumetric mesh. These algorithms include the Tetrahedron-based Inhomogeneous MC Optical
Simulator (TIM-OS) by Shen & Wang [

11] and the Mesh-based MC Method (MMCM) by Fang [

12]. It has been shown that both methods pursue a similar approach,
but differ slightly in the ray-tracing formulation used [

13]. Compared to surface-based approaches, the use of a tetrahedral mesh leads to
significant savings in the overhead of testing ray-triangle intersections. On average, only 2.5
ray-triangle intersection tests are required per step in MMCM and four for TIM-OS. Combined with the
recent advances in image-based mesh generation methods [

14],[

15], the development of
these new mesh-based techniques provides a robust path to fast and accurate modeling of photon
propagation inside complex anatomical structures.

Ray-tracing techniques are at the heart of both mesh-based MC techniques. It is natural to
hypothesize that by incorporating the latest ray-tracing methods developed by the computer graphics
community, one can further advance mesh-based MC simulations. In particular, several Single
Instruction Multiple Data (SIMD)-based methods have been proposed recently, including Ward’s
method [

10] and Shevtsov’s method
[

16]. Recently, Havel and Herout proposed a
hybrid approach (referred to as H&H hereafter) that effectively combines both methods with
an efficient Streaming SIMD Extensions (SSE) implementation [

17]. Moreover, in [

16],
Shevtsov

*et al.* described a branch-less ray-tracing strategy to take advantage of
the pipelining mechanism in modern CPU architectures. Incorporating these innovations into
biophotonic MC simulation is expected to improve overall speed. Our main interest in this report is
to discuss the implementation and comparison of four ray-tracing techniques: a Plücker-based
ray-tracer, a Badouel-based ray-tracer and their SSE counterparts, in a mesh-based MC
simulation.

The contribution of this work is to demonstrate the use of compteporary ray-tracing techniques
for efficient MC simulations on a modern CPU processor. We highlight the ray-tracing nature of
biophotonic MC simulations, and suggest that a thorough investigation to the ray-tracing technique
state-of-the-art can not only guide the performance improvement of many existing MC algorithms, but
also provide inspirations for the future generations of MC algorithms. In the remaining sections, we
first introduce the general structure of a mesh-based Monte Carlo photon simulation. We subsequently
examine the computational bottlenecks present in non-SSE implementations with a profiling analysis.
Using the knowledge learned from the profiling study, we improve the simulation performance by
incorporating SSE in the ray-tracing, random number generation and the calculations of math
functions. We then examine the improvement in simulation speed by re-profiling the SSE-enabled codes
and report the improved runtimes. We conclude with a brief discussion on limitations and future
prospects.