Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3092301

Formats

Article sections

Authors

Related links

Ann Appl Stat. Author manuscript; available in PMC 2011 May 11.

Published in final edited form as:

Ann Appl Stat. 2010 December 1; 4(4): 1749–1773.

doi: 10.1214/10-AOAS357PMCID: PMC3092301

NIHMSID: NIHMS288657

Zhongyang Zhang^{1}

Department of Statistics University of California, Los Angeles Los Angeles, California 90095 USA ; Email: ude.alcu@yzgnahz

Kenneth Lange^{2}

Department of Biomathematics, Human Genetics and Statistics University of California, Los Angeles Los Angeles, California 90095 USA ; Email: ude.alcu@egnalk

Roel Ophoff^{1}

Center for Neurobehavioral Genetics University of California, Los Angeles Los Angeles, California 90095 USA ; Email: ude.alcu@ffohpo

Departments of HRP and Statistics Stanford University California 94305 USA

Departments of Human Genetics and Statistics University of California, Los Angeles Los Angeles, California 90095 USA ; Email: ude.drofnats@ittabas

See other articles in PMC that cite the published article.

Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and Wang [Biostatistics **9** (2008) 18–29]. We mount a fresh attack on this difficult optimization problem by the following: (a) changing the penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) designing and implementing a new MM (majorization-minimization) algorithm, and (c) applying a fast version of Newton's method to jointly update all model parameters. Together these changes enable us to minimize the fused-lasso criterion in a highly effective way.

We also reframe the reconstruction problem in terms of imputation via discrete optimization. This approach is easier and more accurate than parameter estimation because it relies on the fact that only a handful of possible copy number states exist at each SNP. The dynamic programming framework has the added bonus of exploiting information that the current fused-lasso approach ignores. The accuracy of our imputations is comparable to that of hidden Markov models at a substantially lower computational cost.

New techniques of fine mapping have uncovered many regions of the human genome displaying copy number variants (CNVs) [Iafrate et al. (2004); Redon et al. (2006); Sebat et al. (2004)]. Variation is to be expected in cancer cells, but it also occurs in normal somatic cells subject to Mendelian inheritance. As awareness of the disease implications of CNVs has spread, geneticists have become more interested in screening their association study samples for copy number polymorphisms (CNPs) [Stefansson et al. (2008)]. Fortunately, the Illumina and the Affymetrix platforms used in high-density genotyping yield CNV information at no additional cost. Despite their obvious technical differences, the two platforms generate conceptually very similar CNV reconstruction problems.

Hidden Markov models and algorithms have dominated the field of CNV reconstruction [Colella et al. (2007); Korn et al. (2008); Scharpf et al. (2008); Wang et al. (2007, 2009)]. This statistical framework is flexible enough to accommodate several complications, including variable single nucleotide polymorphism (SNP) frequencies, variable distances between adjacent SNPs, linkage disequilibrium and relationships between study subjects. In the current paper we investigate the potential of penalized estimation for CNV reconstruction. Tibshirani and Wang (2008) introduced the fused-lasso penalty for the detection of CNVs based on generic considerations of smoothness and sparsity [Rudin, Osher and Fatemi (1992); Tibshirani et al. (2005)]. The application of the fused lasso to CNV detection is best motivated by a simplified model. Let the parameter vector **β** = (β_{1},β_{2}, …, β_{n}) quantify DNA levels at *n* successive SNPs. These levels are normalized so that β_{i} = 0 corresponds to the standard copy number 2, where SNP *i* is represented once each on the maternal and paternal chromosomes. Variant regions are rare in the genome and typically involve multiple adjacent SNPs; CNVs range from a few thousand to several million base pairs in length. In high-density genotyping we query SNPs that are on average about five thousand base pairs apart. The true **β** is therefore expected to be piecewise constant, with the majority of values equal to 0 and a few segments with positive values (indicating duplication) and negative values (indicating deletion).

Tibshirani and Wang (2008) proposed the joint use of a lasso and a fused-lasso penalty $p\left(\beta \right)={\Sigma}_{i=2}^{n}{\beta}_{i}-{\beta}_{i-1}$ to enforce this piecewise constant structure. One then estimates **β** by minimizing the objective function *l*(**β**) + λ_{1}**β**_{1} λ_{2}*p*(**β**), where *l*(**β**) is a goodness-of-fit criteria. The nondifferentiability of the objective function makes minimization challenging [Friedman et al. (2007)]. We mount a fresh attack on this difficult optimization problem by the following tactics: (a) changing penalty terms slightly by substituting a smooth approximation to the absolute value function, (b) majorizing the substitute penalties by quadratics and implementing a new MM (majorization–minimization) algorithm based on these substitutions, and (c) solving the minimization step of the MM algorithm by a fast version of Newton's method. When the loss function is quadratic, Newton's method takes a single step. More radically, we also reframe the reconstruction problem in terms of imputation via discrete optimization. Readers familiar with Viterbi's algorithm from hidden Markov models will immediately recognize the value of dynamic programming in this context. For the specific problem of detection of CNVs in DNA from normal cells, discrete imputation has the advantage of choosing among a handful of copy number states rather than estimating a continuous parameter. This fact renders discrete imputation easier to implement and more accurate than imputation via parameter estimation.

The remainder of the paper is organized as follows. In the methods section we briefly review the data generating mechanism for CNV problems. We then present our estimation approach to CNV reconstruction and the MM algorithm that implements it. Finally, we describe our new model and the dynamic programming algorithm for discrete imputation. In the results section we assess the statistical performance and computational speed of the proposed methods on simulated and real data sets.

When reconstructing CNV from genotype data, researchers rely not only on the final genotype calls but also on raw measurements obtained from the genotyping array. The character of these measurements varies slightly depending on the platform adopted. For definiteness, we focus on the data delivered by the Illumina platform at our disposal. A DNA sample from an individual is preprocessed, hybridized to a chip, and queried at *n* SNPs. For convenience, we will call the two alleles A and B at each SNP. The amount of DNA carried by each allele at a queried SNP is measured by recording the luminescence of specifically labeled hybridized DNA fragments. Transformations and normalizations of the luminescences lead to two noisy measurements for each SNP *i*: *y _{i}* (LogR) and

The B-allele frequency (BAF) represents the fraction of the total DNA attributable to allele B. The admissible values for *x _{i}* occur on the interval 0, 1]. When copy number equals 1,

Consider first CNV reconstruction using signal intensities *y _{i}* and neglecting B-allele frequencies

$$f\left(\beta \right)=\frac{1}{2}\sum _{i=1}^{n}{\left({y}_{i}-\sum _{j=1}^{p}{z}_{ij}{\beta}_{j}\right)}^{2}+{\lambda}_{1}\sum _{j=1}^{p}{\beta}_{j}+{\lambda}_{2}\sum _{j=2}^{p}{\beta}_{j}-{\beta}_{j-1}.$$

Here **y** = (*y _{i}*)

$$f\left(\beta \right)=\frac{1}{2}\sum _{i=1}^{n}{({y}_{i}-{\beta}_{i})}^{2}+{\lambda}_{1}\sum _{i=1}^{n}{\beta}_{i}+{\lambda}_{2}\sum _{i=2}^{n}{\beta}_{i}-{\beta}_{i-1}.$$

(1)

Notice that *f*(**β**) is strictly convex and coercive, so a unique minimum exists. When λ_{2} = 0, the objective function can be decomposed into a sum of *n* terms, each depending only on one β_{i}. This makes it very easy to find its minimum using coordinate descent [Friedman et al. (2007); Wu and Lange (2008)]. Unfortunately, this is not the case with λ_{2} ≠ 0 because the kinks in the objective function are no longer confined to the coordinate directions. This makes coordinate descent much less attractive [Friedman et al. (2007)]. Quadratic programming [Tibshirani et al. (2005); Tibshirani and Wang (2008)] is still available, but its computational demands do not scale well as *p* increases.

Inspired by the resolution of similar smoothing dilemmas in imaging [Bioucas-Diaa, Figueiredo and Oliveira (2006); Rudin, Osher and Fatemi (1992)], we simplify the problem by slightly modifying the penalty. The function

$${x2,\epsilon =\sqrt{{x}^{2}+\epsilon}}_{}$$

is both differentiable and strictly convex. For small ε >0 it is an excellent approximation to |*x*|. Figure 2 illustrates the quality of this approximation for the choice ε = 0.001. In practice, we set ε = 10^{−10}. If we substitute *x*_{2,ε} for |*x*|, then the CNV objective function becomes

$${f}_{\epsilon}\left(\beta \right)=\frac{1}{2}\sum _{i=1}^{n}({y}_{i}-{\beta}_{i})+{\lambda}_{1}\sum _{i=1}^{n}{{\beta}_{i}2,\epsilon +{\lambda}_{2}\sum _{i=2}^{n}{{\beta}_{i}-{\beta}_{i-1}2,\epsilon .}_{}}_{}$$

(2)

As ε tends to 0, one can show that the unique minimum point of (2) tends to the unique minimum point of the original objective function.

Contours corresponding to different penalties. Solid gray line: |β_{1}| + |β_{2}| = 1 and ${\beta}_{1}-{\beta}_{2}=\frac{1}{2}$; Dashed line: β_{1}_{2,ε} + β_{2}_{2,ε} = 1 **...**

Another virtue of the substitute penalties is that they lend themselves to majorization by a quadratic function. Given the concavity of the function $t\sqrt{t+\epsilon}$, it is geometrically obvious that

$${x2,\epsilon \le {z2,\epsilon +\frac{1}{2{z2,\epsilon}_{}[{x}^{2}-{z}^{2}],}}_{}}_{}$$

with equality holding if and only if *x* = *z*. This inequality enables a Majorization–Minimization (MM) [Lange (2004)] strategy that searches for the minimum of the objective function. Each step of this iterative approach requires the following: (a) majorizing the objective function by a surrogate equal to it at the current parameter vector and (b) minimizing the surrogate. The better-known EM algorithm is a special case of the MM algorithm. The MM algorithm generates a descent path guaranteed to lead to the optimal solution when one exists. More information can be found in Lange (2004). Returning to our problem, we can replace the objective function by the surrogate function

$${g}_{\epsilon},m(\beta {\beta}^{\left(m\right)})=\frac{1}{2}\sum _{i=1}^{n}{({y}_{i}-{\beta}_{i})}^{2}+\frac{{\lambda}_{1}}{2}\sum _{i=1}^{n}\frac{{\beta}_{i}^{2}}{{{\beta}_{i}^{\left(m\right)}2,\epsilon}_{}+\frac{{\lambda}_{2}}{2}\sum _{i=2}^{n}\frac{{({\beta}_{i}-{\beta}_{i-1})}^{2}}{{{\beta}_{i}^{\left(m\right)}-{\beta}_{i-1}^{\left(m\right)}2,\epsilon}_{}+{c}_{m},}}$$

where *m* indicates iteration number and *c _{m}* is a constant unrelated to

Despite these gains in simplicity, the surrogate function still does not decompose into a sum of *n* terms, with each depending on only one β_{i}. The fact that the even numbered β_{i} do not interact given the odd numbered β_{i} (and vice versa) suggests alternating updates of the two blocks of even and odd numbered parameters. In practice, this block relaxation strategy converges too slowly to be competitive. Fixing β_{i−1} and β_{i+1} leaves too little room to move β_{i}. Fortunately, full minimization of the quadratic is less onerous than one might expect. The surrogate function can be written in a matrix form

$${g}_{\epsilon},m(\beta {\beta}^{\left(m\right)})=\frac{1}{2}{\beta}^{T}{\mathbf{A}}_{m}\beta -{\mathbf{b}}_{m}^{T}\beta +{\stackrel{~}{c}}_{m},$$

(3)

where **A**_{m} is a tridiagonal symmetric matrix. In view of the strict convexity of the surrogate function, **A**_{m} is also positive definite. The nonzero entries of **A**_{m} and **b**_{m} are

$${a}_{1,1}^{\left(m\right)}=1+\frac{{\lambda}_{1}}{{{\beta}_{1}^{\left(m\right)}2,\epsilon}_{}+\frac{{\lambda}_{2}}{{{\beta}_{2}^{\left(m\right)}-{\beta}_{1}^{\left(m\right)}2,\epsilon}_{}}}$$

$${a}_{i,i}^{\left(m\right)}=1+\frac{{\lambda}_{1}}{{{\beta}_{i}^{\left(m\right)}2,\epsilon}_{}+\frac{{\lambda}_{2}}{{{\beta}_{i}^{\left(m\right)}-{\beta}_{i-1}^{\left(m\right)}2,\epsilon}_{}+\frac{{\lambda}_{2}}{{{\beta}_{i+1}^{\left(m\right)}-{\beta}_{i}^{\left(m\right)}2,\epsilon}_{},i=2,\dots ,n-1;}}}$$

$${a}_{n,n}^{\left(m\right)}=1+\frac{{\lambda}_{1}}{{{\beta}_{n}^{\left(m\right)}2,\epsilon}_{}+\frac{{\lambda}_{2}}{{{\beta}_{n}^{\left(m\right)}-{\beta}_{n-1}^{\left(m\right)}2,\epsilon}_{};}}$$

$${a}_{i,i+1}^{\left(m\right)}=-\frac{{\lambda}_{2}}{{{\beta}_{i+1}^{\left(m\right)}-{\beta}_{i}^{\left(m\right)}2,\epsilon}_{},i=1,\dots ,n-1;}$$

$${a}_{i-1,i}^{\left(m\right)}=-\frac{{\lambda}_{2}}{{{\beta}_{i}^{\left(m\right)}-{\beta}_{i-1}^{\left(m\right)}2,\epsilon}_{},i=2,\dots ,n;}$$

$${b}_{i}^{\left(m\right)}={y}_{i},i=1,\dots ,n.$$

The minimum of the quadratic occurs at the point $\beta ={\mathbf{A}}_{m}^{-1}{\mathbf{b}}_{m}$. Thanks to the simple form of **A**_{m}, there is a variant of Gaussian elimination known as the tridiagonal matrix algorithm (TDM) or Thomas's algorithm [Conte and deBoor (1972)] that solves the linear system **A**_{m}**β** = **b**_{m} in just 9*n* floating point operations. Alternatively, one can exploit the fact that the Cholesky decomposition of a banded matrix is banded with the same number of bands. As illustrated in Section 3.5, Thomas's algorithm is a vast improvement over block relaxation.

A few comments on the outlined strategy are in order. By changing the penalty from · _{1} to · _{2,ε}, we favor less sparse solutions. However, spareness is somewhat besides the point. What we really need are criteria for calling deletions and duplications. The lasso penalty is imposed in this problem because most chromo-some regions have a normal copy number where *y _{i}* hovers around 0. The same practical outcome can be achieved by imputing copy number 2 for regions where the estimated β

Perhaps more importantly, our strategy can be adapted to handle more general objective functions, as long as the resulting matrix **A** in (3) is banded, or, at least, sparse. For example, consider the inpainting problem in image reconstruction [Chan and Shen (2002)]. In this two dimensional problem, the intensity levels for certain pixels are lost. Let *S* be the set of pixels with known levels. The objective function

$$f\left(\beta \right)=\frac{1}{2}\sum _{(i,j)S}$$

represents a compromise between imputing unknown values and smoothing. If we majorize the penalties in this objective function by quadratics, then we generate a quadratic surrogate function. The corresponding Hessian of the surrogate is very sparse. (Actually, it is banded, but not in a useful fashion.) Although we can no longer invoke Thomas's algorithm, we can solve the requisite system of linear equations by a sparse conjugate gradient algorithm.

All of the algorithms mentioned so far rely on known values for the tuning constants. We will describe our operational choices for these constants after discussing the problem of imputing chromosome states from estimated parameters in the next section.

Imputation of copy number as just described has the drawbacks of neglecting relevant information and requiring the estimation of a large number of parameters. To overcome these limitations, we now bring in the BAF *x _{i}* and focus on a model with a finite number of states. This setting brings us much closer to the HMM framework, often used for CNV reconstruction. Such similarity will be evident also in the numerical strategy we will use for optimization. However, our approach avoids the distributional assumptions at the basis of an HMM.

We consider 10 possible genotypic states ϕ, A, B, AA, AB, BB, AAA, AAB, ABB and BBB at each SNP. Here ϕ is the null state with a copy number of 0. (Note that, in the interest of parsimony, we contemplate double deletions, but not double duplications. This has more to do with the strength of signal from duplications than their actual frequency, and it is an assumption that can be easily relaxed.) In the model the average signal intensity μ_{c(s)} for a state *s* depends only on its copy number *c*(*s*). Regardless of whether we estimate the μ_{c} or fix them, they provide a more parsimonious description of the data than the β_{i}, which could take on a different value for each SNP. Furthermore, while we still need to impute a state for each SNP *i*, selecting one possible value out of 10 is intrinsically easier than estimation of the continuously varying β_{i}. Table 1 lists the copy number *c*(*s*), the expected value of *y _{i}* and the approximate distribution of

$$f\left(\mathbf{s}\right)=\sum _{i=1}^{n}{L}_{1}({y}_{i},{s}_{i})+\alpha \sum _{i=1}^{n}{L}_{2}({x}_{i},{s}_{i})+{\lambda}_{1}\sum _{i=1}^{n}{\mu}_{c\left({s}_{i}\right)}+{\lambda}_{2}\sum _{i=2}^{n}{\mu}_{c\left({s}_{i}\right)}-{{\mu}_{c}}_{\left({s}_{i-1}\right)},$$

(4)

which again is a linear combination of losses plus penalties. Here α, λ_{1} and λ_{2} are positive tuning constants controlling the relative influences of the various factors. The lasso penalty makes the states with copy number 2 privileged. The fused-lasso penalty discourages changes in state. Minimizing the objective function (4) is a discrete rather than a continuous optimization problem.

Genotype states, corresponding copy numbers, expected values of y_{i} and approximate distributions of x_{i}

Different loss functions may be appropriate in different circumstances. If the intensity values are approximately Gaussian around their means with a common variance, then the choice *L*_{1}(*y, s*) = [*y* − μ_{c(s)}]^{2} is reasonable. For the BAF *x _{i}*, the choice

$${L}_{2}(x,\varphi )={\int}_{0}^{1}{(x-u)}^{2}\mathit{du}=\frac{1}{3}[{x}^{3}+{(1-x)}^{3}].$$

Once the loss functions are set, one can employ dynamic programming to find the state vector **s** minimizing the objective function (4). If we define the partial solutions

$${g}_{i}\left(j\right)=\underset{{s}_{1},\dots ,{s}_{i-1}}{\text{min}}f({s}_{1},\dots ,{s}_{i-1},{s}_{i}=j)$$

for *i* = 1,…,*n*, then the optimal value of the objective function is min_{j }*g _{n}*(

$${g}_{i+1}\left(j\right)=\underset{k}{\text{min}}[{g}_{i}\left(k\right)+{L}_{1}({y}_{i+1},j)+\alpha {L}_{2}({x}_{i+1},j)+{\lambda}_{1}{\mu}_{c\left(j\right)}+{\lambda}_{2}{\mu}_{c\left(j\right)}-{\mu}_{c\left(k\right)}],$$

(5)

with initial conditions

$${g}_{1}\left(j\right)={L}_{1}({y}_{1},j)+\alpha {L}_{2}({x}_{1},j)+{\lambda}_{1}{\mu}_{c\left(j\right)}.$$

The beauty of dynamic programming is that it applies to a variety of loss and penalty functions.

In fact, it is possible to construct an even more parsimonious model whose four states correspond to the four copy numbers 0, 1, 2 and 3. The loss function *L*_{1}(*y, c*) = (*y* − μ_{c})^{2} is still reasonable, but *L*_{2}(*x, c*) should reflect the collapsing of genotypes. Here *c* is the copy number. Two formulations are particularly persuasive. The first focuses on the minimal loss among the genotypes relevant to each copy number. This produces

$${L}_{2}(x,c)=\{\begin{array}{cc}{\int}_{0}^{1}{(x-u)}^{2}\mathit{du}=\frac{1}{3}[{x}^{3}+{(1-x)}^{3}],\hfill & \hfill c=0,\hfill \\ \text{min}\{{(x-0)}^{2},{(x-1)}^{2}\},\hfill & \hfill c=1,\hfill \\ \text{min}\{{(x-0)}^{2},{(x-1/2)}^{2},{(x-1)}^{2}\},\hfill & \hfill c=2,\hfill \\ \text{min}\{{(x-0)}^{2},{(x-1/3)}^{2},{(x-2/3)}^{2},{(x-1)}^{2}\}\hfill & \hfill c=3.\hfill \end{array}\phantom{\}}$$

(6)

The second formulation averages loss weighted by genotype frequency. There are other reasonable loss functions. Among these it is worth mentioning negative log-likelihood, Huber's function and the hinge loss function of machine learning.

Dynamic programming does require specification of the parameters characterizing the distribution of the intensities *y _{i}* and the BAF

In calling deletions and duplications with the fused lasso, we adopt the procedure of Tibshirani and Wang (2008). Originally designed for array-CGH platforms, this procedure aims to control false discovery rate (FDR). Fortunately, it can be readily applied to genotype data. The general idea is to formulate the problem as one of multiple hypothesis testing for nonoverlapping chromosome segments *S*_{1} through *S _{K}*. For each segment

$${\widehat{z}}_{k}=\frac{{\Sigma}_{i\mathrm{{S}_{k}}}\sqrt{{n}_{k}}\widehat{\sigma}}{,}$$

where *n _{k}* is the number of SNPs in segment

$$\widehat{\text{FDR}}\left(q\right)=\frac{{K}_{q}1/K{\Sigma}_{k=1}^{K}{n}_{k}{\Sigma}_{k=1}^{K}{n}_{k}{1}_{({p}_{k}\le q)}}{=}$$

(7)

Here the FDR is defined as the ratio between the number of SNPs in nominal CNV segments with true copy number 2 and the total number of SNPs claimed to be within CNV segments. In the FDR estimate (7), *q* is roughly regarded as the fraction of null (copy number 2) segments among all candidate CNV segments. In the numerator, $\frac{1}{K}{\Sigma}_{k=1}^{K}{n}_{k}$ counts the average SNP number within each segment, and *Kq* estimates the expected number of null segments. In the denominator, ${\Sigma}_{k=1}^{K}{n}_{k}{1}_{({p}_{k}\le q)}$ counts the total number of SNPs claimed to be located in CNV segments. Thus, this approximation is desired according to the SNP-number-based definition.

Once we decide on an FDR level α, the threshold *q* is determined as the largest value satisfying $\widehat{\mathrm{FDR}}\left(q\right)\le \alpha $. We call a segment *S _{k}* a deletion if ${\widehat{z}}_{k}<0$ and

Choice of the tuning constants λ_{1} and λ_{2} is nontrivial. Because they control the sparsity and smoothness of the parameter vector β and therefore drive the process of imputation, it is crucial to make good choices. Both of the references Friedman et al. (2007) and Wu and Lange (2008) discuss the problem and suggest solutions in settings similar to ours. While explicit theoretical expressions for optimal λ_{1} and λ_{2} are currently unavailable, known results can inform practical choices.

Friedman et al. (2007) consider the optimal solution to the fused-lasso problem

$$\widehat{\beta}({\lambda}_{1},{\lambda}_{2})=\text{arg}\underset{\beta}{\text{min}}\frac{1}{2}\sum _{i=1}^{n}{({y}_{i}-{\beta}_{i})}^{2}+{\lambda}_{1}\sum _{i=1}^{n}{\beta}_{i}+{\lambda}_{2}\sum _{i=1}^{n}{\beta}_{i}-{\beta}_{i-1}.$$

They prove that $\widehat{\beta}({\lambda}_{1},{\lambda}_{2})$ for λ_{1} > 0 is a soft-thresholding of $\widehat{\beta}(0,{\lambda}_{2})$ when λ_{1} = 0, namwly,

$${\widehat{\beta}}_{i}({\lambda}_{1},{\lambda}_{2})=\text{sign}\left({\widehat{\beta}}_{i}(0,{\lambda}_{2})\right){({\widehat{\beta}}_{i}(0,{\lambda}_{2})-{\lambda}_{1})+,\phantom{\rule{thickmathspace}{0ex}}i=1,\dots ,n.}_{}$$

(8)

This implies that λ_{1} > 0 will drive to 0 those segments of the piecewise constant solution $\widehat{\beta}(0,{\lambda}_{2})$ whose absolute values are close to 0. It is also important to note that, since $\widehat{\beta}(0,{\lambda}_{2})$ is piecewise constant, its effective dimension is much lower than *n*.

To understand how the optimal values of these tuning parameters depend on the dimension of the vector **β**, let us recall pertinent properties of the Lasso estimator in linear regression. In this setting

$$\widehat{\beta}=\text{arg}\underset{\beta}{\text{min}}\frac{1}{2}{\mathbf{y}-Z\beta {l}_{2}2+\lambda {\beta {l}_{1},}_{}}_{}^{}$$

(9)

where ${\mathbf{y}}_{n\times 1}~\mathcal{N}({\mathbf{Z}}_{n\times p}{\beta}_{p\times 1},{\sigma}^{2}{\mathbf{I}}_{n\times n})$, and · _{1} and · _{2} are the _{1} and _{2} norms. Candès and Plan (2009), Donoho and Johnstone (1994)and Negahban et al. (2009) show that a Lasso estimator with $\lambda =c\sigma \sqrt{\mathrm{log}p}$ for some constant *c* leads to an optimal upper bound on ${\mathbf{Z}\beta -\mathbf{Z}\widehat{\beta}{2}_{}2}_{}^{}$. Our problem with λ_{1} = 0 fits in this framework if we reparameterize via δ_{1} = β_{1} and δ_{i} = β_{i} − β_{i−1} for *i* for *i* = 2, …,*n*. In the revised problem

$$\widehat{\delta}=\text{arg}\underset{\delta}{\text{min}}\frac{1}{2}\sum _{i=1}^{n}{\left({y}_{i}-\sum _{j=1}^{i}{\delta}_{j}\right)}^{2}+{\lambda}_{2}\sum _{i=2}^{n}{\delta}_{i},$$

(10)

*p* = *n*, and the design matrix is lower-triangular with all nonzero entries equal to 1. This finding suggests that we scale λ_{2} by $\sqrt{\mathrm{log}n}$

On the basis of these observations, we explored the choices

$${\lambda}_{1}={\rho}_{1}\sigma \phantom{\rule{thickmathspace}{0ex}}{\lambda}_{2}={\rho}_{2}\sigma \sqrt{\text{log}n},\phantom{\rule{thickmathspace}{0ex}}{\rho}_{1},{\rho}_{2}>0.$$

Here σ relates the tuning parameters to the noise level. Because the effective dimension in (8) is much smaller than *n*, we assumed that λ_{1} does not depend on *n*. Although ρ_{1} and ρ_{2} can be tuned more aggressively by cross-validation or criteria such as BIC, we chose the sensible and operational combination

$${\lambda}_{1}=\sigma ,\phantom{\rule{thickmathspace}{0ex}}{\lambda}_{2}=2\sigma \sqrt{\text{log}n}.$$

(11)

A small scale simulation study suggested that the performance of our methods does not vary substantially for values of ρ_{1} and ρ_{2} close to 1 and 2, respectively. One may also vary ρ_{1} and/or ρ_{2} mildly to achieve different combinations of sensitivity and specificity as defined in Section 3.4. (Data not shown.)

In practice, we do not know the value of σ. Here we estimated a different σ for each individual, using the standard deviation of *y _{i}* values between their 2.5 and 97.5 percentiles. We decided to use only data points within the 95%-interquantile range in order to exclude values of

While most of the experiments in the paper used the values of λ_{1} and λ_{2} suggested in equation (11), we also designed and conducted a more general simulation study to find the optimal values of these tuning parameters; see Section 3.8 for details.

To illustrate the effectiveness of our algorithms, we tested them on simulated data. Real data with empirically validated CNVs would be ideal, but such a gold standard does not exist. Instead, we used data from male and female X chromosomes to construct *in silico* CNV. Since males are equipped with only one X chromosome, we can use their genotype data to approximate the signal generated by deletion regions. A patchwork of female and male data mimics what we expect from an ordinary pair of homologous chromosomes with occasional deletions. Our X chromosome data come from the schizophrenia study sample of Vrijenhoek et al. (2008) genotyped on the Illumina platform. We focus on the 307 male and 344 female controls.

To avoid artifacts, the data needed to be preprocessed. We identified SNP clusters on the X chromosome using the Beadstudio Illumina software on female controls. These clusters permit estimation of parameters typical of a diploid genome. We then normalized the corresponding male SNP signals relative to the corresponding female signals. Finally, to destroy the signature of possible CNVs in the female data, we permuted the order of the SNPs. This action breaks up the patterns expected within CNV regions and eliminates the smooth variation in the intensity signals [Diskin et al. (2008)].

After these preprocessing steps, we generated ordinary copy number regions from the female data and deleted regions from the male data. We also generated duplications by taking the weighted averages

$${y}_{i,dup}={y}_{i,f}+0.55\times \text{median}\left({y}_{f}\right)-\text{median}\left({y}_{m}\right),$$

$${x}_{i,dup}=\frac{1}{3}{x}_{i,m}+\frac{2}{3}{x}_{i,f}$$

for the intensities and BAFs, where the *f* and *m* subscripts refer to females and males. Because duplications show a lesser increase in logR values than the deletions show a decrease, the factor 0.55 multiplies the absolute difference |median(*y _{f}*)−median(

We generated two different data sets to assess the operating characteristics of the proposed algorithms. In both data sets the number of deletions equals the number of duplications. *Data set 1* consists of 3600 sequences, each 13,000 SNPs long, with either a deletion or a duplication in the central position. The CNVs had lengths evenly distributed over the 6 values 5, 10, 20, 30, 40 and 50 SNPs. *Data set 2* consists of 300 sequences with variable numbers of SNPs and either a deletion or duplication in the central position. The sequence lengths were evenly distributed over the values 4000, 8000, 12,000, 16,000 and 20,000 SNPs; the CNV lengths followed the distribution of data set 1.

The sequence and CNV lengths in our simulations were chosen to roughly mimic values expected in real data. For the Illumina HumanHap550 BeadChip platform, the median number of SNPs per chromosome arm is 13,279, with a median absolute deviation of 8172. Current empirical data suggests that there is usually at most one CNV per chromosome arm [Wang et al. (2007)] and that the length of the typical CNV is usually less than 50 SNPs [Jakobsson et al. (2008)]. The sequences from data set 1 represent an average chromosome arm, while the sequences from data set 2 capture the diversity across all chromosome arms. Both data sets have useful lessons to teach.

We will measure accuracy on a SNP by SNP basis, adopting the following indexes: true positive rate (TPR or sensitivity), false positive rate (FPR or 1-specificity), and false discovery rate (FDR). These are defined as the ratios

$$\text{TPR}=\frac{\text{TP}}{\mathrm{P}}=\frac{\text{TP}}{\text{TP}+\text{FN}},$$

$$\text{FPR}=\frac{\text{FP}}{\mathrm{N}}=\frac{\text{FP}}{\text{FP}+\text{TN}},$$

$$\text{FDR}=\frac{\text{FP}}{\text{TP}+\text{FP}},$$

where the capital letters T, F, P, N and R stand for true, false, positive, negative and rate, respectively. For example, the letter P by itself should be interpreted as the number of SNPs with true copy number equal to 0, 1 or 3; the pair of letters FN should be interpreted as the number of SNPs with true copy number 0, 1 or 3 but imputed copy number 2. We will also evaluate the number of iterations until convergence and the overall computational time required by each algorithm.

For benchmarking purposes, we will compare the performance of the proposed algorithms to that of PennCNV [Wang et al. (2007)], a state-of-the-art hidden Markov model for CNV discovery on Illumina data. PennCNV bases the geno-type call for SNP *i* on its *y _{i}* and

We first investigate two versions of the fused-lasso procedure. Both implement the MM algorithm on the objective function (2). The MMTDM algorithm solves the minimization step by the tridiagonal matrix algorithm. The MMB algorithm approximately solves the minimization step by one round of block relaxation. To assess the rate of convergence of MMTDM and MMB, we used data set 1 with 3600 sequences of 13,000 SNPs each. We declared convergence for a run when the difference between the objective function at two consecutive iterations fell below 10^{−}. To limit the computational burden, we set the maximum number of iterations equal to 10,000. Both algorithms started with the values β_{i} = *y _{i}*. Each entry of Table 2 summarizes the results for a different CNV width. The table makes it abundantly clear that MMB is not competitive. Because MMB never converged in these trials, we took one sequence and ran it to convergence under the more stringent convergence criterion of 10

Comparison of convergence rates for the two algorithms MMB and MMTDM for the fused lasso. (*a*) MMTDM converges much faster than MMB. Blue line: MMB; Red line: MMTDM; Black dashed line: minimum value of objective function; (*b*) After *10*^{5} iterations, MMB **...**

Data set 1 also illustrates the advantages of including BAF information in CNV reconstruction. Here we focus on dynamic programming imputation (DPI) based on the objective function (4). Note that this function does not incorporate prior knowledge of the frequency of deletions versus duplications. In running the dynamic programming algorithm, we rely on results from a previous study [Wang et al. (2009)] to initialize the intensity parameters μ* _{k}*. Because the μ

Table 4 reports the values of the accuracy indices for various CNV sizes and types. Here we compare PennCNV, fused-lasso minimization under MMTDM and DPI on data set 1. To avoid overfitting and a false sense of accuracy, we used 3-fold cross-validation to choose α. The accuracy indices reported in the table represent averages over the left-out thirds. Although PennCNV falters a little with the shortest CNVs, it is clearly the best of the three methods. More surprising, DPI achieves comparable FPR and FDR to PennCNV as well as fairly good TPR. In particular, its FDR is uniformly low across CNV sizes and types. Overall, Table 4 demonstrates the promise of DPI. In contrast, the results for fused-lasso minimization are discouraging. Despite its post-processing to control FDR, it does poorly in this regard. Furthermore, it displays substantially worse TPR for duplications than PennCNV and DPI, particularly for duplications spanning only 5 SNPs. This behavior is to be expected given the poor ability of signal strength alone to separate duplications from normal chromosome regions. The performance of fused-lasso minimization underscores the advantages of explicitly modeling the discrete nature of the state space and taking BAF information into account. Nonetheless, it is important to keep in mind that the previous data sets are by design more favorable to PennCNV and DPI. The analysis of tumor samples with ambiguous copy numbers or signals from experimental devices, such as CGH arrays that lack allele-specific information, are bound to cast fused-lasso minimization in a kinder light.

Data set 2 allowed us to assess performance on longer sequences with less frequent SNPs and to gain insight into the impact of the tuning parameters λ_{1} and λ_{2}. For the latter purpose we adopted two strategies: (a) define λ_{1} and λ_{2} by the values displayed in equation (11), and (b) adopt an “oracle” approach that relies on the knowledge of locations of deletions and duplications. Strategy (b) chooses constant values across the individuals to maximize TPR (sensitivity) while keeping FPR and FDR levels comparable to those under strategy (a). The oracle approach is not applicable to real data sets, where locations of deletions and duplications are unknown. We adopted it in this analysis to determine how optimal tuning parameters vary with sequence length.

Tables 5–7 summarize results for PennCNV, fused-lasso minimization and DPI, respectively. As with data set 1, PennCNV achieves the best sensitivity, followed by DPI. The best control of false positives occurs with DPI. The accuracy of the methods and the optimal values of λ_{1} and λ_{2} do not change with sequence length *n*. However, it is clear that the advantages of selecting individual-specific λ values outweigh the benefit of selecting constant λ values that maximize overall performance. In fact, the choice of the oracle λ is excessively influenced by some individuals with poor quality data; to control false discoveries in these subjects, one lowers performance in more favorable settings.

Finally, we compared the computational speeds of the three methods. Although the cost of each scales linearly with the number of SNPs, run times vary considerably in practice (see Figure 4). We base our comparisons on data set 2 run on an Intel Xeon 2.80 GHz processor operating under Linux. The PennCNV distributed software (2008, November 19 version) is a combination of C and Perl. We implemented DPI and the MMTDM algorithm for fused-lasso minimization in Fortran 95. The penalty tuning parameters were chosen according to equation (11). For DPI we set α = 12. Table 8 lists average run times for each sequence sample; standard errors appear in parentheses. As we anticipated, fused-lasso minimization and DPI require less computation per iteration and run much faster than PennCNV. DPI is 2 to 3 times faster than fused-lasso minimization.

Graphical comparison of computation speed as sequence length varies. Solid line: PennCNV; Dashed line: Fused Lasso; Dotted line: DPI.

We tested the three methods on genome scan data on four schizophrenia patients from the study of Vrijenhoek et al. (2008). These patients were selected because they each exhibit one experimentally validated CNV (two deletions and two duplications). The four CNVs disrupt the genes MYT1L, CTNND2, NRXN1 and ASTN2, which play important roles in neuronal functioning and are associated with schizophrenia. This subset of the data is ideal for our purpose. The entire data set was collected as part of a genome-wide association study and consists of blood samples from unrelated individuals. It is expected that only a modest amount of CNV may be present; most CNVs probably represent inherited neutral polymorphisms rather de novo mutations. Unlike cancer cell lines, copy numbers should rarely exceed 3.

We analyzed the entire genomes of these four subjects, applying the three methods to each chromosome arm. In calling CNVs with fused-lasso minimization, we controlled FDR at the 0.05 level. The penalty tuning parameters were chosen according to equation (11). For dynamic programming, we set α 12. It took on average 113.8, 8.6 and 4.7 seconds for the three methods to run on the approximately 550k SNPs typed on each individual. The computational efficiency of DPI displayed here may be a decisive advantage in other data sets with thousands of participants. To focus on signals with a higher chance of being real, we eliminated all CNV calls involving fewer than 5 SNPs.

Table 9 reports the numbers of detected CNVs and their median sizes; median absolute deviations are listed in parentheses. PennCNV produced the largest number of CNVs calls, followed by fused-lasso minimization. The CNVs detected by PennCNV and DPI had similar sizes; those detected by fused-lasso minimization tended to be longer. Table 10 summarizes the overlap between the CNVs calls for the three methods. The vast majority of CNVs detected by DPI are also detected by PennCNV. There is a smaller overlap between PennCNV and the Fused Lasso.

Overlap of CNVs detected by PennCNV, Fused Lasso and DPI. The percentages listed in parentheses refer to the ratio of the number of overlapping CNVs to the total number of unique CNVs detected. For patient 1 DPI treated a large duplication region on the **...**

Three of the experimentally verified CNVs were detected by all three methods. The fourth, a deletion on 9q33.1 in patient 4, was detected only by PennCNV (see Figure 5). It is noteworthy that the quality of the data for this patient is poor. For example, it fails to pass the PennCNV quality control criterion requiring the standard deviation of LogR to be less than 0.2. In this sample the standard deviation is 0.26. It appears that the higher sensitivity of PennCNV comes at the price of allowing too many false positives. PennCNV calls an exceptionally high number (85) of CNVs for patient 4, with limited overlap with the other two methods.

We have proposed two new methods for the reconstruction of CNV. Both methods are much faster than PennCNV, the current state-of-the-art method in CNV discovery. The greater accuracy of DPI versus fused-lasso minimization underscores the importance of using BAF measurements and capitalizing on the discrete nature of CNV imputation. DPI has the additional advantage of outputting the allelic copy numbers so helpful in refining the associations between CNVs and phenotypes. It is hardly surprising that DPI exhibits superior performance in the schizophrenia data where its underlying assumptions hold. By contrast in the analysis of tumor cells, it is much more difficult to fix a priori the number of copies. With its flexibility in fitting piecewise constant functions to LogR intensities, the fused lasso will shine in this less discrete setting.

We would like to emphasize that both proposed methods are rough compared to well-established algorithms like PennCNV. There is definitely room for further performance improvements by redefining the loss and penalty functions. As a concrete example, one could modify the fused-lasso penalties to reflect the distances between adjacent SNPs [Li and Zhu (2007)]. We suggest scaling the difference ${\beta}_{i}-{\beta}_{i-1}$ by the reciprocal of the physical distance ${b}_{i}-{b}_{i-1}$. Anyone wanting to use or modify our statistical procedures is welcome to our Fortran source code. Please contact the first author for a copy.

We can expect to see more applications of penalized estimation throughout genomics. In our view, penalized models are more parsimonious than hidden Markov models and achieve many of the same aims. Our redefinition of the fused-lasso penalty and application of the MM algorithm circumvent some of the toughest issues of penalized estimation in the CNV context and have important implications for other application areas such as time series analysis. For more traditional theoretical and numerical approaches to penalized estimation, we recommend the recent survey paper on _{1} trend filtering [Kim et al. (2009)].

We thank Joe de Young for help in reclustering the genotype signal and Jacobine Buizer-Voskamp for providing us with detailed information in real data analysis. We acknowledge the editor and the anonymous reviewer for their constructive comments on this manuscript.

- Bioucas-Diaa JM, Figueiredo MAT, Oliveira JP. Adaptive total variation image deconvolution: A majorization–minimization approach. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'06); Toulouse, France. 2006.
- Candès EJ, Plan Y. Near-ideal model selection by
_{1}minimization. Ann. Statist. 2009;37:2145–2177. MR2543688. - Chan TF, Shen J. Mathematical models for local nontexture inpainting. SIAM J. Appl. Math. 2002;62:1019–1043. MR1897733.
- Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, Bassett AS, Seller A, Holmes CC, Ragoussis J. QuantiSNP: An objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Research. 2007;35:2013–2025. [PMC free article] [PubMed]
- Conte SD, Deboor C. Elementary Numerical Analysis. McGraw-Hill; New York: 1972.
- Diskin SJ, Li M, Hou C, Yang S, Glessner J, Hakonarson H, Bucan M, Maris JM, Wang K. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Research. 2008;36:e126. [PMC free article] [PubMed]
- Donoho DL, Johnstone IM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81:425–455. MR1311089.
- Friedman J, Hastie T, Höfling H, Tibshirani R. Pathwise coordinate optimization. Ann. Appl. Statist. 2007;1:302–332. MR2415737.
- Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer S, Lee C. Detection of large-scale variation in the human genome. Nature Genetics. 2004;36:949–951. [PubMed]
- Jakobsson M, Scholz SW, Scheet P, Gibbs JR, Vanliere JM, Fung HC, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor BJ, Simon-Sanchez J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M, Cann HM, Hardy JA, Rosenberg NA, Singleton AB. Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008;451:998–1003. [PubMed]
- Kim S-J, Koh K, Boyd S, Gorinevsky D.
_{1}trend filtering. SIAM Review. 2009;51:339–360. MR2505584. - Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, Hubbell E, Veitch J, Collins PJ, Darvishi K, Lee C, Nizzari MM, Gabriel SB, Purcell S, Daly MJ, Altshuler DD. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics. 2008;40:1253–1260. [PMC free article] [PubMed]
- Lange K. Optimization. Springer; New York: 2004. MR2072899.
- Li Y, Zhu J. Analysis of array CGH data for cancer studies using fused quantile regression. Bioinformatics. 2007;23:2470–2476. [PubMed]
- Negahban S, Ravikmuar P, Wainwright MJ, Yu B. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. The Neural Information Processing Systems Conference (NIPS'09); Vancouver, Canada. 2009.
- Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, Mac-Donald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME. Global variation in copy number in the human genome. Nature. 2006;444:444–454. [PMC free article] [PubMed]
- Rudin LI, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms. Physica D. 1992;60:259–268.
- Scharpf RB, Parmigiani G, Pevsner J, Ruczinski I. Hidden Markov models for the assessment of chromosomal alterations using high throughput SNP arrays. Ann. Appl. Statist. 2008;2:687–713. MR2524352. [PMC free article] [PubMed]
- Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson N, Zetterberg A, Wigler M. Large-scale copy number polymorphism in the human genome. Science. 2004;305:525–528. [PubMed]
- Stefansson H, Rujescu D, Cichon S, Pietiläinen OPH, Ingason A, Steinberg S, Fossdal R, Sigurdsson E, Sigmundsson T, Buizer-Voskamp JE, Hansen T, Jakobsen KD, Muglia P, Francks C, Matthews PM, Gylfason A, Halldorsson BV, Gudbjartsson D, Thorgeirsson TE, Sigurdsson A, Jonasdottir A, Jonasdottir A, Bjornsson A, Mattiasdottir S, Blondal T, Haraldsson M, Magnusdottir BB, Giegling I, Möller H-J, Hartmann A, Shianna KV, Ge D, Need AC, Crombie C, Fraser G, Walker N, Lonnqvist J, Suvisaari J, Tuulio-Henriksson A, Paunio T, Toulopoulou T, Bramon E, Di Forti M, Murray R, Ruggeri M, Vassos E, Tosato S, Walshe M, Li T, Vasilescu C, Mühleisen TW, Wang AG, Ullum H, Djurovic S, Melle I, Olesen J, Kiemeney LA, Franke B, Genetic Risk and Outcome in Psychosis (GROUP) Sabatti C, Freimer NB, Gulcher JR, Thorsteinsdottir U, Kong A, Andreassen OA, Ophoff RA, Georgi A, Rietschel M, Werge T, Petursson H, Goldstein DB, Nöthen MM, Peltonen L, Collier DA, St Clair D, Stefansson K. Large recurrent microdeletions associated with schizophrenia. Nature. 2008;455:232–236. [PMC free article] [PubMed]
- Tibshirani R, Wang P. Spatial smoothing and hot spot detection for CGH data using the Fused Lasso. Biostatistics. 2008;9:18–29. [PubMed]
- Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K. Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B. 2005;67:91–108. MR2136641.
- Vrijenhoek T, Buizer-Voskamp JE, van der Stelt I, Strengman E, Genetic Risk and Outcome in Psychosis (GROUP) Consortium. Sabatti C, van Kessel AG, Brunner HG, Ophoff RA, Veltman JA. Recurrent CNVs disrupt three candidate genes in schizophrenia patients. The American Journal of Human Genetics. 2008;83:504–510. [PubMed]
- Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SFA, Hakonarson H, Bucan M. PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research. 2007;17:1665–1674. [PubMed]
- Wang H, Veldink JH, Blauw H, van den Berg LH, Ophoff RA, Sabatti C. Markov models for inferring copy number variations from genotype data on Illumina platforms. Human Heredity. 2009;68:1–22. [PMC free article] [PubMed]
- Wu TT, Lange K. Coordinate descent algorithm for lasso penalized regression. Ann. Appl. Statist. 2008;2:224–244. MR2415601.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |