In the work of Chari et al. entitled "Effect of active smoking on the human bronchial epithelium transcriptome" the authors use SAGE to identify candidate gene expression changes in bronchial brushings from never, former, and current smokers. These gene expression changes are categorized into those that are reversible or irreversible upon smoking cessation. A subset of these identified genes is validated on an independent cohort using RT-PCR. The authors conclude that their results support the notion of gene expression changes in the lungs of smokers which persist even after an individual has quit.
This correspondence raises questions about the validity of the approach used by the authors to analyze their data. The majority of the reported results suffer deficiencies due to the methods used. The most fundamental of these are explained in detail: biases introduced during data processing, lack of correction for multiple testing, and an incorrect use of clustering for gene discovery. A randomly generated "null" dataset is used to show the consequences of these shortcomings.
Most of Chari et al.'s findings are consistent with what would be expected by chance alone. Although there is clear evidence of reversible changes in gene expression, the majority of those identified appear to be false positives. However, contrary to the authors' claims, no irreversible changes were identified. There is a broad consensus that genetic change due to smoking persists once an individual has quit smoking; unfortunately, this study lacks sufficient scientific rigour to support or refute this hypothesis or identify any specific candidate genes. The pitfalls of large-scale analysis, as exemplified here, may not be unique to Chari et al.