The bidomain equations are widely used for the simulation of electrical activity in cardiac tissue. They are especially important for accurately modelling extracellular stimulation, as evidenced by their prediction of virtual electrode polarization before experimental verification. However, solution of the equations is computationally expensive due to the fine spatial and temporal discretization needed. This limits the size and duration of the problem which can be modeled. Regardless of the specific form into which they are cast, the computational bottleneck becomes the repeated solution of a large, linear system. The purpose of this review is to give an overview of the equations, and the methods by which they have been solved. Of particular note are recent developments in multigrid methods, which have proven to be the most efficient.
The set of bidomain equations (Sepulveda et al., 1989) is currently the most complete mathematical model for describing the spread of cardiac electrical activity, especially for simulating activity at the organ level. They were first applied to cardiac tissue in 1978 (Miller and Geselowitz, 1978; Tung, 1978). The system of non-linear partial differential equations models both the intracellular and extracellular domains of the cardiac tissue from an electrostatic point of view. The coupling of the two domains is performed via non-linear models describing the current flow through the cell membrane, which leaves one domain to enter the other. Validation of the bidomain model through animal experiments (Wikswo Jr et al., 1995; Muzikant and Henriquez, 1998), as well as its ability to reproduce cardiac phenomena (Lin and Wikswo, 2001; Roth, 2001; Trayanova et al., 1998; Wikswo Jr, 1994; Trayanova, 2006), makes it a strong candidate for simulating cardiac electrical behavior. In fact, the bidomain equations predicted the existence of virtual electrodes (Sepulveda et al., 1989), which were later experimentally validated (Wikswo Jr et al., 1995), and have become instrumental in understanding defibrillation (Trayanova, 2006). See Fig. 1 for an example of the complex polarization patterns. When bath loading effects, extracellular stimulation, or the magnetic field is to be modeled accurately (Weber dos Santos et al., 2003; Weber dos Santos and Dickstein, 2003), the bidomain model is the only choice.
Several factors combine to make bidomain simulations inherently computationally burdensome. Spatially, a fine discretization is needed to properly capture field effects. When an extracellular shock is applied to tissue, its directly elicited response decays with a space constant on the order of hundreds of micrometers. Thus, to properly capture this interaction, nodes should be spaced on the order of 100 micrometers. Microstructure is proving to be an important factor in the defibrillation process (Plank et al., 2005). With the availability of MRI data and progress in mesh generation techniques, more geometrically accurate and detailed representations of cardiac tissue are being produced, containing many more nodes than previously (Burton et al., 2006). Temporally, phenomena of interest occur on the scale of hundreds of milliseconds if not seconds, while gating kinetics of membrane channels occur on the time scale of microseconds, and even faster under defibrillation-strength shocks. Therefore, tens of thousands of time steps need to be executed. Lastly, ionic membrane models continue to increase in complexity. Since Hodgkin and Huxley developed their three state variable model in 1952 (Hodgkin and Huxley, 1952), newer models have grown to upwards of 20 state variables (Priebe and Beuckelmann, 1998; Hund and Rudy, 2004; Iyer et al., 2004), using fluid compartment models to keep track of intracellular ion concentrations in addition to ion transport across the membrane.
This article will present the equations and discuss methods with which they may be solved to perform large scale cardiac simulations.
The bidomain equations (Plonsey, 1988) represent a homogenization of cardiac tissue, replacing the individually coupled cells with a syncytium. Each point in space is considered to exist in two domains, the intra- and extracellular, which are superimposed on one another (see Fig. 2). Any discrete objects get averaged over the volume. For example, the discrete junctional resistances, along with the myoplasmic conductivity, get combined and smeared over a volume of interest to create an equivalent conductivity. The equations relate the intracellular potential, φi, to the extracellular potential, φe, through the transmembrane current density, Im:

∇·(σ̄i∇φi) = βIm − Itrans      (1)

∇·(σ̄e∇φe) = −βIm − Ie      (2)

Im = Cm ∂Vm/∂t + Iion(Vm, ν)      (3)
where σ̄i and σ̄e are respectively the intracellular and extracellular conductivity tensors, β is the surface-to-volume ratio of the cardiac cells, Itrans is the transmembrane current density stimulus as delivered by an intracellular electrode, Ie is an extracellular current density stimulus, Cm is the membrane capacitance per unit area, and Vm is the transmembrane voltage, defined as φi − φe. Iion is the current density flowing through the membrane ionic channels; it depends on the transmembrane voltage and several other variables, which we represent here by ν. This formulation will be referred to as the coupled set of equations since the intracellular and extracellular potentials are solved simultaneously. The media by themselves are linear, with nonlinearities arising through the current-voltage relationship across the membrane (Eqn. 3), which is described by a set of nonlinear Ordinary Differential Equations (ODEs).
No intracellular current is assumed to flow across the tissue boundary. This results in the boundary condition
where η is a vector normal to the boundary.
Cardiac tissue is often surrounded by a conductive bath, whether it be the blood cavities, the torso volume conductor, or the perfusion fluid for a wedge preparation. This volume conductor is often modeled as isotropic with conductivity σo. The electric potential φo satisfies Laplace's equation, ∇·(σo∇φo) = 0. It is assumed that the interstitial fluid and bath are continuous (Krassowska and Neu, 1994). Thus, the extracellular potential is continuous across the tissue-bath interface, as is the current:
Hence, it is simplest to treat the bath as an extension of the interstitial fluid.
The intracellular domain exists only where there is cardiac tissue, but the extracellular domain may also incorporate a surrounding bath. At the edge of the extracellular space, a no flux boundary condition is also enforced:
Casting the equations in terms of Vm is advantageous since Vm is the quantity of chief interest, controlling membrane behavior as well as being directly observable from the single cell level to the tissue level through optical mapping (Efimov et al., 2004). It is this latter casting of the bidomain equations which is more popular for solution. Other linear combinations of the basic bidomain equations are possible and are discussed in Ref. (Hooke et al., 1994).
At this point, the bidomain equations may be considered as a coupled set of elliptic Partial Differential Equations (PDEs), parabolic PDEs, and ODEs. Due to the highly nonlinear nature of the ODEs, fully implicit solutions are extremely difficult. A sophisticated approach is found in Ref. (Pennacchio, 2004) where an adaptive space-time grid employing mortar finite elements was used. However, the ionic model implemented was not the most elaborate. These sets of ODEs describing cellular membrane dynamics are becoming more complicated as advances are made in the identification and characterization of ionic channels. To describe the cellular membrane response at one node may require over one hundred equations with more than sixty coupled differential equations(Iyer et al., 2004).
Operator splitting(Strikwerda, 1989) is a technique used to compute a large expression as a sequence of the simpler expressions which comprise it. This allows different numerical methods to be used at each step in the sequence to maximize computational efficiency, as well as to eliminate complex dependencies between variables. The disadvantage is a loss in accuracy since the dependencies between variables are not simultaneously enforced. Operator splitting is usually performed to separate the large non-linear system of ordinary differential equations, thus allowing the use of semi-implicit or fully implicit methods for the remaining system of Partial Differential Equations.
Explicit methods, also referred to as forward Euler, are numerical updating formulae which compute the next value of a variable without any dependency on that updated value. The chief advantage is that no equations need to be solved to update the variable, but these methods usually suffer from stability conditions that enforce the use of small time steps. In contrast, implicit methods compute the next value of a variable as a function relying on the future value, necessitating an equation to solve. These schemes are unconditionally stable for the Partial Differential Equations that model cardiac activity. When a discretization involves both explicit and implicit evaluations, the resulting scheme is referred to as semi-implicit.
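The trade-off can be illustrated on a toy linear relaxation equation, dV/dt = −V/τ, a scalar stand-in for the stiff membrane dynamics (the equation and all parameter values are illustrative, not taken from any specific membrane model). For a time step beyond the explicit stability limit, forward Euler diverges while backward Euler remains stable:

```python
import numpy as np

TAU = 1.0        # relaxation time constant (arbitrary units)

def forward_euler(v0, dt, steps):
    # Explicit update: v_{k+1} = v_k + dt * (-v_k / TAU); nothing to solve,
    # but stability requires dt < 2 * TAU.
    v = v0
    for _ in range(steps):
        v = v + dt * (-v / TAU)
    return v

def backward_euler(v0, dt, steps):
    # Implicit update: v_{k+1} = v_k + dt * (-v_{k+1} / TAU). For this linear
    # problem the "equation to solve" has the closed form v_k / (1 + dt/TAU),
    # and the scheme is stable for any dt > 0.
    v = v0
    for _ in range(steps):
        v = v / (1.0 + dt / TAU)
    return v

dt = 3.0         # deliberately beyond the explicit stability limit of 2*TAU
v_exp = forward_euler(1.0, dt, 20)
v_imp = backward_euler(1.0, dt, 20)
print(abs(v_exp), abs(v_imp))   # explicit iterate blows up, implicit decays
```

For nonlinear membrane models the implicit update has no closed form and a (possibly nonlinear) equation must be solved at every step, which is exactly the cost that operator splitting and semi-implicit schemes try to avoid.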
Both explicit (Vigmond et al., 2002; Potse et al., 2006) and implicit (Sundnes et al., 2001) methods have been applied for the time discretization of the non-linear system of ODEs. Explicit methods are more easily implemented but may require small time steps due to the stiffness of the ODE system. Nevertheless, the use of lookup tables to avoid repeated evaluation of expensive mathematical functions can drastically reduce computational effort, with the result that this portion takes no more than 10% of the total computational effort (Vigmond et al., 2002; Potse et al., 2006). Furthermore, the set of ODEs is local to a node, so their integration over the entire domain is embarrassingly parallel.
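As a sketch of the lookup-table idea, a hypothetical Hodgkin-Huxley-style gating rate containing an exponential can be tabulated once over the physiological voltage range and then evaluated by cheap linear interpolation (the rate expression and grid resolution are illustrative assumptions):

```python
import numpy as np

# A hypothetical Hodgkin-Huxley-style gating rate with an exponential
# (illustrative expression, not taken from a specific membrane model):
def alpha_h(vm):
    return 0.07 * np.exp(-(vm + 65.0) / 20.0)

# Build the table once over the physiological voltage range ...
vm_grid = np.linspace(-100.0, 60.0, 3201)   # 0.05 mV resolution
alpha_tab = alpha_h(vm_grid)

# ... then replace each exp() evaluation by a cheap linear interpolation:
def alpha_h_lut(vm):
    return np.interp(vm, vm_grid, alpha_tab)

vm = np.array([-80.3, -55.7, -20.1, 10.4])  # off-grid query voltages
err = np.max(np.abs(alpha_h_lut(vm) - alpha_h(vm)))
print(err)   # interpolation error is negligible at this table resolution
```

In a real simulator the table would be indexed directly from the discretized Vm, replacing every exponential in the rate constants with one multiply-add per lookup.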
Further operator splitting may be employed to ease the cost of calculations. For example, instead of solving Equations 8 and 9 simultaneously, they can be solved sequentially as an elliptic problem (Eqn. 8) followed by a parabolic problem (Eqn. 9), by treating Vm as fixed in Eqn. 8 and φe as fixed in Eqn. 9 (Vigmond et al., 2003). Theoretically, this scheme may lead to results deviating from those obtained by solving the coupled system. As suggested in (Franzone and Guerri, 1993; Pennacchio and Simoncini, 2002), an outer iteration, whereby the sequence of split operations is repeated until the solution converges, has to be employed in order to arrive at the exact solution of the coupled system. In this case, solving the coupled system turned out to be computationally more efficient than the decoupling approach. Typically, fewer than 10 iterations are required to converge, and the overall workload is dominated by the elliptic portion of the inner problem. In most studies (Skouibine et al., 2000; Vigmond et al., 2002; Weber dos Santos et al., 2004a; Plank et al., 2007), however, the decoupled components are considered as independent and no outer iteration is carried out. To what extent this may lead to deviating results, and how relevant these deviations are from an electrophysiological point of view, remains to be investigated. Other ways to split the equations may be found elsewhere (Keener and Sneyd, 1998; Sundnes et al., 2002).
To solve the parabolic portion, regardless of whether it has been split from the elliptic portion, one may choose different types of integration schemes: implicit, semi-implicit, or explicit. Skouibine et al. (2000) used a predictor-corrector scheme to improve accuracy. It should be noted that employing a forward-Euler scheme for the system written as in Eqns. 1 and 2, with operator splitting to solve each equation independently, can be problematic. Since the transmembrane voltage, Vm, is the quantity of prime importance which determines membrane behavior, deriving it by differencing potentials may be inaccurate. For example, during a shock, φi and φe may be on the order of volts while their difference, Vm, may be on the order of millivolts. In addition, if the parabolic equation is solved with an explicit method such as forward Euler, the CFL condition (Courant et al., 1928), which dictates the maximum stable time step as a function of the spatial discretization, severely restricts Δt. An approximation of the CFL condition can be derived by assuming equal anisotropy ratios (σ̄i = ασ̄e) and straight fibers:
where σ* = σi*σe*/(σi* + σe*), with * = l, t denoting the longitudinal and transverse directions. As an example of the imposed restriction, in (Weber dos Santos et al., 2004b) it was numerically verified that, for a problem with Δx = 3.3 μm, the forward-Euler scheme did not converge using Δt = 0.07 μs. For the same problem, a semi-implicit scheme allowed Δt to be more than 100 times greater than that permitted by the explicit method.
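A back-of-the-envelope evaluation of a CFL-type bound of this form can be sketched as follows. The harmonic-mean conductivities follow the text, but the constant factor and all parameter values are illustrative assumptions, not the exact expression of the equation above:

```python
import numpy as np

# Harmonic means of the intra- and extracellular conductivities along (l)
# and across (t) the fibers: sigma* = sigma_i* sigma_e* / (sigma_i* + sigma_e*)
def harmonic(si, se):
    return si * se / (si + se)

# Illustrative parameter values (assumptions, not taken from a specific study):
sigma_il, sigma_it = 0.17, 0.019   # S/m, intracellular, along/across fiber
sigma_el, sigma_et = 0.62, 0.24    # S/m, extracellular, along/across fiber
beta = 1.4e5                       # 1/m, surface-to-volume ratio
Cm = 1e-2                          # F/m^2, membrane capacitance

def dt_bound(dx):
    # Generic CFL-type bound, dt <= beta*Cm*dx^2 / (2*(sigma_l + sigma_t)).
    # The constant factor is a sketch, not the exact form of Eqn. 10; the key
    # point is the quadratic dependence of the admissible dt on dx.
    sl = harmonic(sigma_il, sigma_el)
    st = harmonic(sigma_it, sigma_et)
    return beta * Cm * dx ** 2 / (2.0 * (sl + st))

# Halving dx cuts the admissible explicit time step by a factor of four:
print(dt_bound(100e-6), dt_bound(50e-6))
```

The quadratic scaling is what makes explicit schemes so punishing at fine spatial resolutions: refining the mesh by 10x shrinks the stable time step by 100x.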
Next we present the time discretization for the bidomain equations using a three-step operator splitting scheme that involves the solution of the parabolic PDE (Eqn. 9), the elliptic system (Eqn. 8), and the non-linear system of ordinary differential equations (Eqn. 3) at each time step. Since the CFL condition of the parabolic PDE may severely restrict the time step, we discretize this equation via the Crank-Nicolson method (Eqn. 11), whereas the large non-linear ODE system is discretized via a forward-Euler scheme (Eqn. 13):
where Ai, Ae and Ao are the ∇·(σ̄i∇), ∇·(σ̄e∇) and σoΔ operators; Δt is the time step; Vmk, φek, φok and νk are the time discretizations of Vm, φe, φo and ν, respectively, at time kΔt, with 0 ≤ k ≤ tf/Δt. A von Neumann analysis of the linearized system shows that the above scheme is unconditionally stable.
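The sequencing of the three-step scheme can be sketched as a single time-step routine. The three sub-solvers are placeholders passed in as callables, since only the ordering (parabolic, then elliptic, then ODE) is being illustrated:

```python
import numpy as np

# One time step of the three-step splitting: parabolic update of Vm
# (cf. Eqn. 11), elliptic solve for the extracellular potential (cf. Eqn. 8),
# then the forward-Euler ODE update (cf. Eqn. 13). The sub-solvers here are
# placeholder callables; only the sequencing is being illustrated.
def bidomain_step(vm, phi_e, state, solve_parabolic, solve_elliptic, step_odes):
    vm_half = solve_parabolic(vm, phi_e)            # parabolic PDE step
    phi_e_new = solve_elliptic(vm_half)             # elliptic solve
    vm_new, state_new = step_odes(vm_half, state)   # membrane ODE step
    return vm_new, phi_e_new, state_new

# Trivial stand-in solvers so the sequencing can actually be executed:
vm, phi_e, state = np.zeros(5), np.zeros(5), np.zeros(5)
vm, phi_e, state = bidomain_step(
    vm, phi_e, state,
    solve_parabolic=lambda v, p: v + 0.1,   # placeholder dynamics
    solve_elliptic=lambda v: -0.5 * v,
    step_odes=lambda v, s: (v, s + 0.01),
)
print(vm[0], phi_e[0], state[0])
```

In a real implementation each callable wraps a sparse linear solve (Crank-Nicolson and elliptic steps) or a vectorized membrane-model update; the structure of the loop is unchanged.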
Equations (11)-(13) are time discretizations for the bidomain equations, however, spatial discretization is further needed to allow for numerical computation. Many different approaches are possible. Finite difference has been popular(Potse et al., 2006) for solving the bidomain equations. It is usually applied on a regularly spaced grid which hampers its ability to follow complex surfaces. It can handle anisotropy(Saleheen and Ng, 1998b), and has been expanded to handle irregular spacing(Zozor et al., 2003; Trew et al., 2005b), although this latter approach eliminates one of the main advantages of finite difference, its ease of implementation. Finite elements have also been widely used(Fischer et al., 2000; Plank et al., 2005; Trayanova, 2006) with the advantage that they can accurately follow the complex, curved geometries of the heart. The theory behind the method is well developed, and many meshing applications are available. Finally, some researchers have explored the finite volume method(Jacquemet and Henriquez, 2005; Trew et al., 2005a) which has the appeal of conserving local flux.
Unless Dirichlet conditions are imposed somewhere, the bidomain equations determine the potentials only to within an additive constant. This constant is not important for system behaviour, since the transmembrane voltage is the difference of two potentials, which eliminates the constant, and current flow depends on the gradient of the potential. However, the system matrix is singular, i.e., it is no longer a set of independent equations, which can cause certain solution methods to fail. To generate a unique solution, several approaches may be used. A point in the extracellular region may simply be defined as the ground from which all potentials are referenced (Vigmond et al., 2002). Alternatively, an extra condition may be imposed on the solution, requiring a zero average potential over the whole region, Ω (Colli-Franzone and Pavarino, 2004; Austin et al., 2006):
Finally, certain iterative solvers, like Bi-Conjugate Gradient Stabilized(van der Vorst, 1992), are capable of finding solutions to positive semi-definite systems. In this case, no additional conditions need to be applied and the system evolves to its own constant(Austin et al., 2006; Potse et al., 2006).
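A minimal sketch of the grounding approach, on a toy 1D no-flux Laplacian, shows why it works: the singular system becomes uniquely solvable once one node's equation is replaced by a reference condition, and, for balanced current sources, the grounded solution still satisfies every original equation (matrix and sources are illustrative):

```python
import numpy as np

n = 6
# 1D discrete Laplacian with no-flux ends: every row sums to zero, so the
# matrix is singular and potentials are determined only up to a constant.
A = np.zeros((n, n))
for i in range(n):
    if i > 0:
        A[i, i - 1] = -1.0
        A[i, i] += 1.0
    if i < n - 1:
        A[i, i + 1] = -1.0
        A[i, i] += 1.0

b = np.zeros(n)
b[1], b[4] = 1.0, -1.0            # balanced current source and sink

# Ground one node: replace its equation by phi[0] = 0.
Ag, bg = A.copy(), b.copy()
Ag[0, :] = 0.0
Ag[0, 0] = 1.0
bg[0] = 0.0
phi = np.linalg.solve(Ag, bg)

print(phi[0])                     # the reference node sits at zero
print(np.abs(A @ phi - b).max())  # all original equations still satisfied
```

The zero-mean alternative amounts to shifting this same solution by its average; both pick one representative from the family of solutions that differ by a constant.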
The time discretization and implementation of the boundary conditions deserve some discussion. When numerically implementing a boundary condition to a differential equation it is important: 1) to use a numerical implementation that is a good approximation of the boundary condition; 2) to make sure the implemented boundary condition satisfies the equation it is related to, i.e., that the boundary condition is compatible with the associated differential equation. Next, we briefly discuss these issues concerning the bidomain discretized equations. Without loss of generality, we initially focus on Equation (8) and the associated boundary condition (7). The implementation of the other boundary conditions formulated by Equations (4), (5) and (6) will be discussed next.
We rewrite the time discretization proposed in (13) as follows
We are initially looking for a good discretization for the boundary condition
Let us write this boundary condition in a more general form, which will soon prove to be more appropriate
Equation (18) should be equivalent to (17), and therefore g should be chosen appropriately. We will next look for the g that satisfies the compatibility condition between Equation (16) and boundary condition (18), and that implements a good approximation of (17). By integrating the left hand side of this equation over the boundary of the domain and applying the divergence theorem, we find
Now we would like this to be compatible with equation (16), so that we can write
Applying the divergence theorem again and using (18), we find the compatibility condition
This choice of g clearly satisfies the compatibility condition. In addition, the boundary condition
is a good approximation of (17). With this choice, and assuming that condition (4) holds, equation (22) simplifies accordingly. The same methodology can be used to derive the following implementations of boundary condition (4), which are compatible with (11):
The implementations of the bath-tissue interface (Γto) conditions and of the homogeneous Neumann condition for φo can be derived in a similar fashion:
The time discretization described by Equations (11)-(13), coupled to the boundary conditions implemented by Equations (22)-(26), comprises a consistent and stable method. A numerical method converges only if it is both consistent and stable (Strikwerda, 1989). The implementation of consistent and compatible boundary conditions is therefore an important and delicate issue. Explicit schemes (see (Latimer and Roth, 1998; Skouibine and Krassowska, 2000; Saleheen and Ng, 1998a)) have been implemented with boundary conditions on Ω applied to (11) and (13), respectively. The compatibility condition (21) then does not always hold, and its violation may undermine numerical convergence. Such violation may not be critical for explicit schemes, since the CFL condition restricts Δt to very small values. In (Sundnes et al., 2001), conditions were applied to (11) and (13) on Ω which satisfy the compatibility condition (21). However, these boundary conditions are equivalent to the homogeneous Neumann boundary conditions only if the intra- and extracellular domains have equal anisotropy ratios (σ̄i = ασ̄e). Since in cardiac tissue the intra- and extracellular domains do not have equal anisotropy ratios, these boundary conditions are poor approximations and do not properly implement electric isolation.
Regardless of the exact formulation used, a linear system of the form Ax = b is generated which must be solved at each time step. The matrix A will be sparse, and its dimensions will correspond to the number of nodes in the model. If no operator splitting is performed and the parabolic and elliptic problems remain coupled, the matrix will be square with dimensions of twice the number of nodes composing the tissue plus the number of nodes in the surrounding bath. The volume of the bath may be much greater than that of the myocardium, but with adaptive meshing, the number of bath nodes may be kept to a minimum. See Fig. 3 for an example. If operator splitting is performed, two different linear systems will need to be solved: one comes from the spatial discretization of the parabolic equation (11) and has the dimension of the number of nodes in the tissue; the other comes from the coupled elliptic equation (13) and has the dimension of the number of nodes in the tissue plus bath. Thus, a large reduction in matrix size is possible through operator splitting. To date, simulations of up to 36 million nodes have been performed on supercomputers (Potse et al., 2006).
Direct solvers are methods which arrive at a solution to a linear system by direct matrix manipulation and result in an exact solution. Examples of these include Gaussian elimination, and decomposition techniques whereby a matrix is factored into the product of several matrices with special forms, making it possible to solve the system. One common example is Lower Upper (LU) decomposition where a matrix is decomposed into the product of a lower triangular matrix and an upper triangular matrix. The system can then be solved by a forward substitution followed by a backward substitution. While the initial cost of decomposing a matrix is large, it only needs to be done once with the cost amortized over the entire simulation of some tens of thousands of time steps.
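The amortization argument can be sketched with a sparse LU factorization that is computed once and then reused for many right-hand sides (the matrix is an illustrative stand-in, not an actual bidomain system):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Sparse SPD matrix standing in for a bidomain system matrix (illustrative).
n = 200
A = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csc")

lu = spla.splu(A)                    # expensive factorization: done once
for k in range(5):                   # cheap: one fwd/bwd substitution per step
    b = np.sin(0.01 * k + np.arange(n))
    x = lu.solve(b)

print(np.linalg.norm(A @ x - b))     # residual of the last solve
```

Since the system matrix is fixed across a simulation while only the right-hand side changes each time step, the factorization cost is spread over tens of thousands of cheap substitution solves.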
Direct solvers have been shown to be computationally the fastest to date (Weber dos Santos et al., 2004a; Plank et al., 2007). The matrices generated can be symmetric or asymmetric depending on the exact numerical procedure used to generate them, whether finite difference, finite element, or finite volume. For symmetric matrices, a Cholesky decomposition, a form of LU decomposition in which the lower triangular matrix is the transpose of the upper triangular matrix, or an LDLᵀ decomposition, like Cholesky but with an additional diagonal matrix D, may be used. For asymmetric matrices, LU decomposition is in order. The memory requirements of direct solvers are considerable and depend on the bandwidth of the system, the distance from the main diagonal to the furthest nonzero entry in a row. A decomposed matrix retains the bandwidth of the original matrix, but nonzero entries within the band may be filled in. The factor by which the decomposed matrix exceeds the original matrix increases with problem size, being less than 10 for problems of fewer than 10⁵ nodes, and reaching 50 for systems on the order of a few million. Using this example and assuming double precision, a matrix will consume about 200 megabytes while its LU decomposition will consume 11 gigabytes. Such large storage requirements become an issue, especially as geometrical models now have many more nodes. This memory increase also has performance ramifications, since the number of operations increases with the number of entries per row. The increase in memory is due to an increase in bandwidth, which is determined by the connectivity of the nodes. Node renumbering algorithms, like Reverse Cuthill-McKee (Cuthill and McKee, 1969) or nested dissection (George, 1973), can be used to minimize bandwidth. Since the matrices are sparse, solution time tends to increase linearly with problem size.
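The effect of renumbering can be demonstrated on a small example: a 1D chain whose nodes are deliberately labelled so that neighbours receive far-apart indices, inflating the bandwidth, which Reverse Cuthill-McKee then reduces (sizes are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    # Largest distance of any nonzero from the main diagonal.
    A = A.tocoo()
    return int(np.max(np.abs(A.row - A.col)))

# A 1D chain (path graph) whose nodes are renumbered so that neighbours get
# far-apart indices: even chain positions -> 0..19, odd positions -> 20..39.
n = 40
label = np.empty(n, dtype=int)
label[0::2] = np.arange(n // 2)
label[1::2] = n // 2 + np.arange(n // 2)

rows, cols = [], []
for i in range(n - 1):               # chain edges i -- i+1, both directions
    rows += [label[i], label[i + 1]]
    cols += [label[i + 1], label[i]]
A = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_rcm = A[perm][:, perm]             # symmetric permutation of the matrix
print(bandwidth(A), "->", bandwidth(A_rcm))
```

For a chain the RCM ordering essentially recovers the natural consecutive numbering; on real unstructured cardiac meshes the reduction is less dramatic but still substantially shrinks the fill-in of a subsequent factorization.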
It should be noted that the largest bidomain problems solved by direct methods have not exceeded a few million nodes. Thus, it is still not known whether direct methods are the fastest for larger matrices, as the increase in bandwidth fill-in with size will degrade performance.
Iterative methods start with an approximate solution and attempt to reduce the error by some algorithm, which is applied repeatedly. A criterion to stop iterating is needed; it can be the residual, the change in the residual between iterations, the change in the norm of the residual, or some other measure. In Ref. (Vigmond et al., 2002), a stopping criterion of a 10⁻⁶ relative change in the L2 norm, as normalized by the number of unknowns, was necessary to obtain results indistinguishable from those of the direct solution method.
The conjugate gradient method provided a great improvement over other iterative methods in use at the time of its development. It recasts the original problem of finding the solution of a linear system as an optimization problem with an identical solution. Further increases in performance were achieved with preconditioning, an additional step in the algorithm which performs a transformation to make the optimization surface more spherical. The resulting Preconditioned Conjugate Gradient (PCG) method has become the standard technique for solving very large systems (Shewchuk, 1994). Its efficiency and robustness, however, vary greatly with the preconditioner used, as does the memory usage. The preconditioner can be thought of as attempting to arrive at an approximate solution of the system with as little work as possible. Generally, more expensive preconditioners result in fewer iterations to achieve a desired accuracy, albeit at a higher cost per iteration. Thus, much research has gone into developing a preconditioner that is optimal for cardiac simulations. Early work showed that diagonal preconditioning results in a three-fold improvement in speed over no preconditioner (Potse et al., 2006). Such a preconditioner is trivial to implement, has insignificant memory costs, and has been used for cardiac simulations (Eason and Malkin, 2000; Skouibine and Krassowska, 2000).
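A minimal sketch of diagonal (Jacobi) preconditioning for CG, using a badly scaled SPD test matrix as an illustrative stand-in for a bidomain system, shows the iteration count dropping once the diagonal scaling is factored out of the problem:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# SPD tridiagonal test matrix with a widely varying diagonal, standing in
# for a system with strongly heterogeneous conductivities (illustrative).
n = 300
scale = np.logspace(0, 4, n)
main = 2.0 * scale
off = -0.5 * np.sqrt(scale[:-1] * scale[1:])
A = sp.diags([off, main, off], [-1, 0, 1], format="csr")
b = np.ones(n)

# Diagonal (Jacobi) preconditioner: approximate A^-1 by 1/diag(A).
M = spla.LinearOperator((n, n), matvec=lambda r: r / main)

iters = {"plain": 0, "jacobi": 0}
def counter(key):
    def cb(xk):
        iters[key] += 1
    return cb

x_plain, info_plain = spla.cg(A, b, callback=counter("plain"))
x_prec, info_prec = spla.cg(A, b, M=M, callback=counter("jacobi"))
print(iters)   # the preconditioned solve needs fewer iterations
```

The preconditioner application is a single elementwise division per iteration, which is why its three-fold speedup comes essentially for free.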
A more sophisticated preconditioner which has found much use is Incomplete LU decomposition (ILU). The original matrix is decomposed as in a direct method into upper and lower triangular matrices, but only parts of the decomposed matrices are retained, with entry retention based upon the entry magnitude. The product of the lower and upper matrices no longer recreates the original matrix exactly. At one extreme, the complete LU decomposition is kept and the system is solved in one iteration, albeit a costly one. At the other extreme, the sparsity pattern of the original matrix is preserved with all other entries ignored. Using ILU with this zero fill-in results in further memory savings since the data structures describing the original sparse matrix can also be used for the ILU decomposition. The exact amount of fill-in to allow depends upon problem size and execution environment. More fill-in better represents the true decomposed matrix, so the number of iterations should decrease as more entries are retained.
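A sketch of ILU preconditioning with explicit control over the retained fill is shown below, using SciPy's `spilu` with `drop_tol` and `fill_factor` as the knobs; the 2D Poisson matrix is an illustrative stand-in for the elliptic bidomain system:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 5-point 2D Poisson matrix as a stand-in for the elliptic system (illustrative).
m = 20
T = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(m, m))
S = sp.diags([-1.0, -1.0], [-1, 1], shape=(m, m))
A = (sp.kron(sp.identity(m), T) + sp.kron(S, sp.identity(m))).tocsc()
n = A.shape[0]
b = np.ones(n)

# Incomplete LU factorization; drop_tol and fill_factor control how much
# fill-in is retained (the values here are illustrative). Larger fill gives
# a closer approximation to the exact factors at a higher memory cost.
ilu = spla.spilu(A, drop_tol=1e-3, fill_factor=5.0)
M = spla.LinearOperator((n, n), matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b))
```

Each preconditioner application is one forward/backward substitution through the incomplete factors, so the per-iteration cost grows directly with the amount of fill retained, which is the trade-off discussed above.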
For 2D problems and smaller numbers of processors, more fill-in did, indeed, lead to faster convergence. However, using block-Jacobi domain decomposition (Weber dos Santos et al., 2004a), no fill-in was faster than any fill-in when simulations were performed in parallel over 16 processors in a distributed memory environment. Block-Jacobi decomposition refers to dividing the matrix amongst several processors, with each processor independently solving only its own portion of the matrix, ignoring entries not found on that processor. For example, if a matrix of size 10,000 were divided between four processors, the second processor would solve the submatrix with rows 2,501-5,000 and columns in the same range, thereby solving only for those same entries in the solution vector. Due to the loss of interaction between the four submatrices, the solution vector only approximates the true solution. With more fill-in, the time for each PCG iteration increased in order to perform a denser matrix-vector multiply, and the overall quality of the preconditioner decreased with parallelization, since elements in off-diagonal blocks are ignored with block-Jacobi preconditioning, yielding an overall decrease in speed. Using ILU, Potse recently reported bidomain simulations with 55 million extracellular nodes, and observed that ILU preconditioning was about twice as fast as diagonal preconditioning (Potse et al., 2006).
Parallel performance is also an important aspect to consider. Distributed memory machines have considerable communication overhead, but are relatively cheap compared to shared memory machines. Preconditioned Conjugate Gradient with ILU preconditioning (PCG-ILU) has been implemented on both types of platform. On a shared memory machine, PCG-ILU has displayed scaling of close to 80% efficiency with 96 processors (Potse et al., 2006). On a distributed memory cluster, a speedup of only 5.2 was achieved with 16 processors (Weber dos Santos et al., 2004a). The poor scalability of the latter is due to several factors: the problem solved was not large enough, so communication and synchronization times were significant compared to computation times, and no low-latency interconnect was available, which imposed a significant communication burden. On a distributed memory machine with a high speed, low latency network, the parallel performance was much improved, showing a 3.8-fold speed improvement as the number of processors was increased from 16 to 64 (Plank et al., 2007).
As mentioned above, using a block decomposition approach, the ILU becomes a worse approximation as entries in off-diagonal blocks are ignored. To avoid this kind of degradation, overlapping domain decomposition techniques, such as the Additive Schwarz method (ASM), may be considered. In ASM, each processor block overlaps the neighboring domain block by a certain amount, ovl. A greater ovl means more communication is necessary between the processors. In (Weber dos Santos et al., 2004b), an ILU was performed over each processor block, and both overlapping and non-overlapping preconditioners were compared. It was shown that the ASM preconditioner achieved better performance than the non-overlapping ILU: ASM was 1.5 times faster than ILU, but required 55% more memory on 16 processors.
Symmetric Successive Over-Relaxation (SSOR) has also been used as a preconditioner for the bidomain equations. Essentially, SSOR is the Gauss-Seidel iterative method with a parameter to overweight the correction term to achieve faster convergence (Press et al., 1994). As a preconditioner, it is applied a set number of times instead of iterating until convergence, and it achieved performance similar to ILU (Weber dos Santos, 2002). It was shown to provide a speedup of at least a factor of two over diagonal preconditioning.
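An SSOR preconditioner application can be sketched as a forward Gauss-Seidel sweep followed by a backward sweep, each over-relaxed by a parameter omega and repeated a fixed number of times (the matrix, omega, and pass count are illustrative assumptions):

```python
import numpy as np

# One application of an SSOR preconditioner: a forward Gauss-Seidel sweep
# followed by a backward sweep, each over-relaxed by omega. As a
# preconditioner it is applied a fixed number of passes rather than iterated
# to convergence.
def ssor_apply(A, r, omega=1.3, passes=2):
    n = len(r)
    z = np.zeros(n)
    for _ in range(passes):
        for i in range(n):                  # forward sweep
            s = r[i] - A[i, :] @ z + A[i, i] * z[i]
            z[i] = (1 - omega) * z[i] + omega * s / A[i, i]
        for i in reversed(range(n)):        # backward sweep
            s = r[i] - A[i, :] @ z + A[i, i] * z[i]
            z[i] = (1 - omega) * z[i] + omega * s / A[i, i]
    return z

# Small SPD test system (illustrative):
A = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
r = np.ones(5)
z = ssor_apply(A, r)
print(np.linalg.norm(A @ z - r))   # z roughly approximates A^-1 r
```

Like ILU, the sweeps require no extra storage beyond the matrix itself, which is part of SSOR's appeal as a preconditioner.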
Multigrid methods (MGM) are a class of solution method based upon several discretizations of the same problem domain. The basic idea behind multilevel methods relies on the fact that simple iterative methods are very efficient at reducing high frequency components of the residual, but are very inefficient with respect to the lower frequency components. In other words, simple iterative methods remove abrupt spatial changes in the solution with a few iterations, but require many iterations to change the baseline level of the solution. The multilevel solution to this problem is to project the residual onto a smaller space, a coarser grid of the problem, where the lower spatial frequency components can be handled more efficiently. That is, a solution or residual is obtained on a particular grid and then interpolated or extrapolated to a grid of lower or higher resolution, respectively. Usually, iterative methods are applied at all grid levels but the coarsest, which is solved directly. Many different schemes exist for the pattern in which the different grids are solved. The basic pattern is a V-cycle where one starts with the finest level, works to the coarsest, and then works back to the finest. In MGM terminology, an iteration at a grid level is a smoothing process, transferring values from a finer to a coarser grid is a restriction operation, while the opposite procedure is a prolongation operation. The highest frequencies are smoothed on the fine levels, while the lowest frequencies are smoothed on the coarsest levels.
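The V-cycle just described can be sketched as a two-grid method for a 1D Poisson model problem, with weighted Jacobi as the smoother, full weighting as the restriction, linear interpolation as the prolongation, and a direct solve on the coarse grid (all sizes and parameters are illustrative):

```python
import numpy as np

def poisson(n):
    # Dimensionless 1D Poisson stencil tridiag(-1, 2, -1).
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def jacobi(A, x, b, sweeps, w=2.0 / 3.0):
    # Weighted-Jacobi smoother: damps the high-frequency error components.
    D = np.diag(A)
    for _ in range(sweeps):
        x = x + w * (b - A @ x) / D
    return x

def prolongation(nc, n):
    # Linear interpolation from the coarse grid to the fine grid.
    P = np.zeros((n, nc))
    for j in range(nc):
        P[2 * j, j] += 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] += 0.5
    return P

n, nc = 63, 31
A = poisson(n)
P = prolongation(nc, n)
R = 0.5 * P.T                 # full-weighting restriction (fine -> coarse)
Ac = R @ A @ P                # Galerkin coarse-grid operator

def v_cycle(x, b):
    x = jacobi(A, x, b, sweeps=2)               # pre-smoothing
    ec = np.linalg.solve(Ac, R @ (b - A @ x))   # direct solve, coarsest grid
    x = x + P @ ec                              # coarse-grid correction
    return jacobi(A, x, b, sweeps=2)            # post-smoothing

b = np.ones(n)
x = np.zeros(n)
for _ in range(10):
    x = v_cycle(x, b)
print(np.linalg.norm(b - A @ x))   # residual shrinks by a large factor per cycle
```

A full multigrid hierarchy simply applies this two-grid idea recursively: the coarse "direct solve" is itself replaced by another V-cycle until the coarsest level is cheap enough to factor.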
MGM have been applied directly to the bidomain equations (Keener and Bogar, 1998; Sundnes et al., 2001; Austin et al., 2006; Johnson et al., 2003) with good performance. However, other researchers have used MGM as preconditioners for PCG (Tatebe and Oyanagi, 1994; Anwander et al., 2002; Johnson et al., 2003), hoping to combine the robustness of PCG with the speed of MGM (Press et al., 1994). Holst et al. could not obtain convergence using MGM directly, but did obtain convergence using it as a preconditioner (Holst and Saied, 1995). Results seem to be preconditioner and problem dependent, as Austin et al. reported faster execution using one variant of MGM, called Blackbox MG, directly on a problem with highly discontinuous conductivity (Austin et al., 2006). They found, like Johnson (Johnson et al., 2003), better performance using other MGM approaches as preconditioners rather than directly. Regardless of whether used as a preconditioner or not, MG methods have consistently outperformed ILU by severalfold (Weber dos Santos et al., 2004a), and have the potential to outperform direct methods (Kaltenbacher et al., 2000).
Geometric multigrid (GMG) is the conventional MGM, which relies upon the user explicitly creating meshes of the same geometry at different resolutions. For structured meshes, this is straightforward, achievable by simple downsampling of nodes. For unstructured meshes, the work may be considerable. Unstructured meshes are superior for representing the complex boundaries of organs. Discretizing curved volumes with hexahedra, for example, as is common for structured grids, produces artificial corners that lead to spurious excitations when a large electric field is applied.
Figure 4 shows one V-cycle of a GMG iteration as described in (Weber dos Santos et al., 2004a). Only on the coarsest level is the system solved by a direct method. At all other levels, only one smoothing pass is performed by applying one iteration of PCG using an ILU preconditioner.
Part of maximizing performance for GMG is determining the number of grids, i.e., the number of levels to generate. On a single processor, two levels was the best choice, since this was the closest the preconditioner came to a complete LU factorization. For a distributed memory system, the optimal number of levels depends on the number of processors: since the coarsest grid was solved sequentially on every processor, the cost of using fewer grid levels competes with the gains of parallelism. Therefore, as the number of processors increases, the optimal number of levels also slowly increases. As suggested in (Weber dos Santos et al., 2004a), this problem could possibly be circumvented by employing a parallel direct solver (Li and Demmel, 2003) at the coarsest grid; the additional communication burden imposed by a parallel direct method, however, has to be dealt with by using a low-latency interconnect such as InfiniBand. Preliminary tests demonstrated that these methods do not scale at all when standard gigabit Ethernet interconnects are employed.
Memory usage is another limitation on the number of grid levels, since it is desirable that the smallest grid is solvable by a direct method. Given that grid resolution between different levels cannot change drastically, there is a minimum number of levels which must be used. The optimal number of levels for a 3D problem of one million nodes, without surrounding bath, on 16 processors using the scheme as described previously, was five(Weber dos Santos et al., 2004a). Memory demands for GMG were found to be much less than a direct solver, but about 1.4 times greater than an ILU preconditioner.
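The minimum-level arithmetic can be checked with a short back-of-the-envelope calculation. Assuming full coarsening in 3-D (roughly a factor of eight fewer nodes per level) and a coarsest grid of no more than about a thousand nodes so that a direct solve stays cheap (both figures are our illustrative assumptions, not taken from the cited study), a one-million-node problem indeed lands on five levels:

```python
def num_levels(n_fine, coarsen_factor=8, n_coarse_max=1000):
    # Count grid levels until the coarsest is small enough for a direct solve.
    levels, n = 1, n_fine
    while n > n_coarse_max:
        n //= coarsen_factor
        levels += 1
    return levels, n

levels, n_coarsest = num_levels(1_000_000)
print(levels, n_coarsest)   # 5 levels, coarsest grid of 244 nodes
```

With gentler coarsening ratios, as required on unstructured meshes, more levels would be needed to reach a directly solvable coarsest grid.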
Weber dos Santos et al. reported an increase in speed by factors between two and three using PCG-GMG over PCG-ILU, depending on the test case simulated(Weber dos Santos et al., 2004a). Sundnes et al. reported a significant reduction in iterations using MGM over ILU but did not report times(Sundnes et al., 2002). Again, the parallel execution environment is crucial. Using a distributed memory cluster, the MGM preconditioner did not suffer the same penalty from the domain decomposition as the ILU preconditioner did: the total number of iterations increased only slightly due to the parallel implementation. Nevertheless, the speedup results were poor. The explanation lies in the sequential solution method applied on the coarsest grid level: the cost of this part of the preconditioner was not reduced by increasing the number of processors, and thus limited the total parallel speedup of the preconditioner. For a small number of processors, the speedup was still reasonable, around three times faster on four processors, while using 16 processors only resulted in a fivefold increase in speed. It would be expected that parallelization on a shared memory machine or a cluster equipped with a low-latency interconnect would show better scalability.
Algebraic multigrid (AMG) methods offer an advantage over GMG: the user does not have to explicitly create meshes at different resolutions. Instead, coarser representations of the matrix, along with the prolongation and restriction operators, are produced by examining the matrix structure. This is especially important with unstructured grids, which are necessary to represent the curved surface of the heart under defibrillation shocks; the jagged corners that result from regular meshes introduce spurious excitations. Once the grid hierarchy has been created, solution proceeds as with GMG.
Two AMG implementations have been applied to the bidomain equations: Pebbles(Haase et al., 2002; Haase and Reitzinger, 2005), and Boomer(Chartier et al., 2003; Brezina et al., 2005). The implementations differ in the way they construct the coarser matrices. Boomer uses the parallel Falgout-coarsening strategy, which is a combination of the classical Ruge-Stüben coarsening(Ruge and Stüben, 1986) and CLJP coarsening(Brezina et al., 2005). First, each processor domain is coarsened using the classical algorithm. Then, the coarse grid points in the interior of each processor domain are taken as the first independent set for the CLJP algorithm, which proceeds until all points have been assigned. This way, the interior of a domain is coarsened as in the classical method, while the boundaries are coarsened CLJP-like. Boomer is therefore designed for large numbers of processors, and its coarse grids are redistributed over the processors if necessary. Pebbles, on the other hand, uses a rather simple and fast coarsening strategy, either classical coarsening(Ruge and Stüben, 1986) or a coarsening on auxiliary matrices(Haase et al., 2001). This setup is tailored for a restricted range of problems, which includes potential problems. In contrast to Boomer, the parallel coarsening starts on the domain boundaries, followed by the coarsening of the interior. The solver part of Pebbles is highly optimized algorithmically, with respect to fast smoothers and multiple right-hand sides(Haase and Reitzinger, 2005).
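The classical Ruge-Stüben coarsening that both codes build on can be sketched briefly. The version below implements only the first pass (a strength-of-connection test plus greedy C/F selection) on a small dense matrix, omitting the second pass and all parallel considerations; the function names and the theta = 0.25 threshold are the usual textbook choices, not taken from Pebbles or Boomer:

```python
import numpy as np

def strong_connections(A, theta=0.25):
    # S[i] = set of points j that strongly influence i (classical criterion:
    # -A[i,j] is at least theta times the largest off-diagonal magnitude in row i).
    n = A.shape[0]
    S = [set() for _ in range(n)]
    for i in range(n):
        off = [-A[i, j] for j in range(n) if j != i]
        thresh = theta * max(off) if off else 0.0
        for j in range(n):
            if j != i and -A[i, j] > 0 and -A[i, j] >= thresh:
                S[i].add(j)
    return S

def cf_split(A, theta=0.25):
    # First pass of Ruge-Stüben C/F splitting: repeatedly promote the point
    # with the most strong influences to C, demote its dependents to F.
    n = A.shape[0]
    S = strong_connections(A, theta)
    ST = [set() for _ in range(n)]          # transpose: points that i influences
    for i in range(n):
        for j in S[i]:
            ST[j].add(i)
    measure = [len(ST[i]) for i in range(n)]
    C, F, undecided = set(), set(), set(range(n))
    while undecided:
        # Deterministic tie-break: highest measure, then lowest index.
        i = max(undecided, key=lambda k: (measure[k], -k))
        C.add(i); undecided.discard(i)
        for j in ST[i] & undecided:         # strongly influenced points become F
            F.add(j); undecided.discard(j)
            for k in S[j] & undecided:      # boost their remaining influencers
                measure[k] += 1
    return sorted(C), sorted(F)

# 1-D Laplacian: the split recovers the familiar every-other-point coarse grid.
n = 9
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
C, F = cf_split(A)
```

On this model matrix the algebraic splitting reproduces exactly what geometric coarsening would do, which is why AMG can replace user-built mesh hierarchies on structured problems while still handling unstructured ones.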
Table 1 shows results for a reentry simulation of 111,000 nodes in a slice of rabbit ventricle run on a single processor.
The results of the two AMG methods are shown along with PCG-ILU as a reference and a direct solver (SuperLU). Boomer was also applied directly to the problem, i.e., not as a preconditioner. The AMG methods were about 7.7 to 14.3 times faster than ILU when used as preconditioners. Boomer used directly took about the same time as when used with PCG. The direct solver was the fastest, although by less than a factor of two over Pebbles. Although Pebbles was faster than Boomer, there are many parameters to tune for each of the methods. A parameter search was not performed, so the relative performance of the two AMG methods cannot be judged by this test alone, although they are clearly superior to PCG-ILU. Generally, it is thought that GMG performs better than AMG, although by comparing the results from the previous section, AMG can be seen to be comparable to GMG. An exact comparison between studies is difficult due to implementation differences, and hardware differences like the faster interconnect used in the AMG study.
With respect to memory, PCG-ILU consumed the least while the direct method required about thirty times more than PCG-ILU. The AMG methods required about three times more than PCG-ILU, since they perform PCG-ILU on the original grid and the coarser grids.
Since Boomer was designed with parallelizability in mind, whereas Pebbles was designed around a sequential algorithm, Boomer was further investigated for its scalability when running on multiple processors. Using a cluster with a high-speed, low-latency interconnect(Plank et al., 2007), Boomer showed good scalability, speeding up by a factor of three as the number of processors was quadrupled from four to 16, and by over a factor of two as the number of processors was further quadrupled from 16 to 64. The less-than-optimal performance at 64 processors can be explained by the problem being too small for the resources. The number of iterations required by the method did not change as the number of processors was increased, indicating that it was not the algorithm which slowed down with more processes, but that interprocess communication began to play a significant role. Thus, as noted before, distributed memory machines suffer from communication overhead when the workload assigned to each node drops.
The bidomain equations have been used to simulate electrical activity in cardiac tissue, especially with regard to predicting virtual electrode polarization. Their solution is computationally expensive, with the most burdensome part being a linear system which must be repeatedly solved. Currently, multigrid methods offer the fastest solution times, while having only modest memory requirements. Multigrid methods may be used standalone or as preconditioners for the conjugate gradient method. Their performance is now close to that of a direct decomposition, and may become faster with further development. Algebraic multigrid has the additional advantage of not requiring the user to explicitly create coarser meshes, which can be quite a difficult task. Implementation of these methods, especially in a distributed memory environment, is quite complex. Use of preexisting libraries is recommended unless the user wishes to go beyond simply tuning performance parameters and wants to alter the underlying algorithm. While not as efficient as multigrid preconditioners, PCG-ILU offers a reasonable alternative. It is often available in vendor numerical libraries, or can be implemented with relatively little effort if not available. Fast solution methods for linear systems continue to be a topic of interest and development as our models grow in scope, complexity, and anatomical detail.
Edward Vigmond and Makarand Deo wish to acknowledge support by the Natural Sciences and Engineering Research Council of Canada, and Mathematics of Information Technology and Complex Systems (MITACS).
G. Plank and A. Prassl were supported by the Austrian Science Fund FWF, grant No. R21-N04. G. Plank was also supported by the Marie Curie Fellowship MC-OIF 040190.
R. Weber dos Santos thanks the support provided by the Universidade Federal de Juiz de Fora and the Brazilian Ministry of Science and Technology, CNPq (473852/2004-7).