Related Articles

1.  Convergence of Mutation and Epigenetic Alterations Identifies Common Genes in Cancer That Predict for Poor Prognosis  
PLoS Medicine  2008;5(5):e114.
Background
The identification and characterization of tumor suppressor genes has enhanced our understanding of the biology of cancer and enabled the development of new diagnostic and therapeutic modalities. Whereas in past decades, a handful of tumor suppressors have been slowly identified using techniques such as linkage analysis, large-scale sequencing of the cancer genome has enabled the rapid identification of a large number of genes that are mutated in cancer. However, determining which of these many genes play key roles in cancer development has proven challenging. Specifically, recent sequencing of human breast and colon cancers has revealed a large number of somatic gene mutations, but virtually all are heterozygous, occur at low frequency, and are tumor-type specific. We hypothesize that key tumor suppressor genes in cancer may be subject to mutation or hypermethylation.
Methods and Findings
Here, we show that combined genetic and epigenetic analysis of these genes reveals many with a higher putative tumor suppressor status than would otherwise be appreciated. At least 36 of the 189 genes newly recognized to be mutated are targets of promoter CpG island hypermethylation, often in both colon and breast cancer cell lines. Analyses of primary tumors show that 18 of these genes are hypermethylated strictly in primary cancers and often with an incidence that is much higher than for the mutations and which is not restricted to a single tumor-type. In the identical breast cancer cell lines in which the mutations were identified, hypermethylation is usually, but not always, mutually exclusive from genetic changes for a given tumor, and there is a high incidence of concomitant loss of expression. Sixteen out of 18 (89%) of these genes map to loci deleted in human cancers. Lastly, and most importantly, the reduced expression of a subset of these genes strongly correlates with poor clinical outcome.
Conclusions
Using an unbiased genome-wide approach, our analysis has enabled the discovery of a number of clinically significant genes targeted by multiple modes of inactivation in breast and colon cancer. Importantly, we demonstrate that a subset of these genes predict strongly for poor clinical outcome. Our data define a set of genes that are targeted by both genetic and epigenetic events, predict for clinical prognosis, and are likely fundamentally important for cancer initiation or progression.
Stephen Baylin and colleagues show that a combined genetic and epigenetic analysis of breast and colon cancers identifies a number of clinically significant genes targeted by multiple modes of inactivation.
Editors' Summary
Background.
Cancer is one of the developed world's biggest killers—over half a million Americans die of cancer each year, for instance. As a result, there is great interest in understanding the genetic and environmental causes of cancer in order to improve cancer prevention, diagnosis, and treatment.
Cancer begins when cells begin to multiply out of control. DNA is the sequence of coded instructions—genes—for how to build and maintain the body. Certain “tumor suppressor” genes, for instance, help to prevent cancer by preventing tumors from developing, but changes that alter the DNA code sequence—mutations—can profoundly affect how a gene works. Modern techniques of genetic analysis have identified genes such as tumor suppressors that, when mutated, are linked to the development of certain cancers.
Why Was This Study Done?
However, in recent years, it has become increasingly apparent that mutations are neither necessary nor sufficient to explain every case of cancer. This has led researchers to look at so-called epigenetic factors, which also alter how a gene works without altering its DNA sequence. An example of this is “methylation,” which prevents a gene from being expressed—deactivates it—by a chemical tag. Methylation of genes is part of the normal functioning of DNA, but abnormal methylation has been linked with cancer, aging, and some rare birth abnormalities.
Previous analysis of DNA from breast and colon cancer cells had revealed 189 “candidate cancer genes”—mutated genes that were linked to the development of breast and colon cancer. However, it was not clear how those mutations gave rise to cancer, and individual mutations were present in only 5% to 15% of specific tumors. The authors of this study wanted to know whether epigenetic factors such as methylation contributed to causing the cancers.
What Did the Researchers Do and Find?
The researchers first identified 56 of the 189 candidate cancer genes as likely tumor suppressors and then determined that 36 of these genes were methylated and deactivated, often in both breast and colon (laboratory-grown) cancer cells. In nearly all cases, the methylated genes were not active but could be reactivated by being demethylated. They further showed that, in normal colon and breast tissue samples, 18 of the 36 genes were unmethylated and functioned normally, but in cells taken from breast and colon cancer tumors they were methylated.
In contrast to the genetic mutations, the 18 genes were frequently methylated across a range of tumor types, and eight genes were methylated in both the breast and colon cancers. The authors found by reviewing the genetics and epigenetics of those 18 genes in breast and colon cancer that they were either mutated, methylated, or both. A literature review showed that at least six of the 18 genes were known to have tumor suppressor properties, and the authors determined that 16 were located in parts of DNA known to be missing from cells taken from a range of cancer tumors.
Finally, the researchers analyzed data on cancer cases to show that methylation of these 18 genes was correlated with reduced function of these genes in tumors and with a greater likelihood that a cancer will be terminal or spread to other parts of the body.
What Do These Findings Mean?
The researchers considered only the 189 candidate cancer genes found in one previous study and not other genes identified elsewhere. They also did not consider the biological effects of the individual mutations found in those genes. Despite this, they have demonstrated that methylation of specific genes is likely to play a role in the development of breast and/or colon cancer cells either together with mutations or independently, most likely by turning off their tumor suppression function.
More broadly, however, the study adds to the evidence that future analysis of the role of genes in cancer should include epigenetic as well as genetic factors. In addition, the authors have shown that a number of these genes may be useful for predicting clinical outcomes for a range of tumor types.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050114.
A December 2006 PLoS Medicine Perspective article reviews the value of examining methylation as a factor in common cancers and its use for early detection
The Web site of the American Cancer Society has a wealth of information and resources on a variety of cancers, including breast and colon cancer
Breastcancer.org is a nonprofit organization providing information about breast cancer on the Web, including research news
Cancer Research UK provides information on cancer research
The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins publishes background information on the authors' research on methylation, setting out its potential for earlier diagnosis and better treatment of cancer
doi:10.1371/journal.pmed.0050114
PMCID: PMC2429944  PMID: 18507500
2.  Discovery of EST-SSRs in Lung Cancer: Tagged ESTs with SSRs Lead to Differential Amino Acid and Protein Expression Patterns in Cancerous Tissues 
PLoS ONE  2011;6(11):e27118.
Tandem repeats are found in both coding and non-coding sequences of higher organisms. These sequences can be used in cancer genetics and diagnosis to unravel the genetic basis of tumor formation and progression. In this study, a possible relationship between SSR distributions and lung cancer was studied by comparative analysis of EST-SSRs in normal and lung cancerous tissues. While the EST-SSR distribution was similar between tumorous tissues, this distribution was different between normal and tumorous tissues. Trinucleotide tandem repeats differed markedly; the number of trinucleotides in ESTs of lung cancer was 3 times higher than in normal tissue. A significant negative correlation between normal and cancerous tissue showed that cancerous tissue generates different types of trinucleotides. GGC and CGC were the most frequently expressed trinucleotides in cancerous tissue, but these SSRs were not expressed in normal tissue. Similar to the EST level, the expression pattern of EST-SSRs-derived amino acids was significantly different between normal and cancerous tissues. Arg, Pro, Ser, Gly, and Lys were the most abundant amino acids in cancerous tissues, and Leu, Cys, Phe, and His were significantly more abundant in normal tissues than in cancerous tissues. Next, the putative functions of triplet SSR-containing genes were analyzed. In cancerous tissue, EST-SSRs produce different types of proteins. Chromodomain helicase DNA binding proteins were one of the major protein products of EST-SSRs in the cancerous library, while these proteins were not produced from EST-SSRs in normal tissue. For the first time, the findings of this study confirmed that EST-SSRs in normal lung tissues are different from those in unhealthy tissues, and that ESTs tagged with SSRs cause remarkable differences in amino acid and protein expression patterns in cancerous tissue. We suggest that EST-SSRs and EST-SSRs differentially expressed in cancerous tissue may be suitable candidate markers for lung cancer diagnosis and prediction.
doi:10.1371/journal.pone.0027118
PMCID: PMC3208562  PMID: 22073269
3.  Gene Discovery in the Auditory System: Characterization of Additional Cochlear-Expressed Sequences  
To identify genes involved in hearing, 8494 expressed sequence tags (ESTs) were generated from a human fetal cochlear cDNA library in two distinct sequencing projects. Analysis of the first set of 4304 ESTs revealed clones representing 517 known human genes, 41 mammalian genes not previously detected in human tissues, 487 ESTs from other human tissues, and 541 cochlear-specific ESTs (http://hearing.bwh.harvard.edu ). We now report results of a DNA sequence similarity (BLAST) analysis of an additional 4190 cochlear ESTs and a comparison to the first set. Among the 4190 new cochlear ESTs, 959 known human genes were identified; 594 were found only among the new ESTs and 365 were found among ESTs from both sequencing projects. COL1A2 was the most abundant transcript among both sets of ESTs, followed in order by COL3A1, SPARC, EEF1A1, and TPTI. An additional 22 human homologs of known nonhuman mammalian genes and 1595 clusters of ESTs, of which 333 are cochlear-specific, were identified among the new cochlear ESTs. Map positions were determined for 373 of the new cochlear ESTs and revealed 318 additional loci. Forty-nine of the mapped ESTs are located within the genetic interval of 23 deafness loci. Reanalysis of unassigned ESTs from the prior study revealed 338 additional known human genes. The total number of known human genes identified from 8494 cochlear ESTs is 1449 and is represented by 4040 ESTs. Among the known human genes are 14 deafness-associated genes, including GJB2 (connexin 26) and KVLQT1. The total number of nonhuman mammalian genes identified is 43 and is represented by 58 ESTs. The total number of ESTs without sequence similarity to known genes is 4055. Of these, 778 also do not have sequence similarity to any other ESTs, are categorized into 700 clusters, and may represent genes uniquely or preferentially expressed in the cochlea. 
Identification of additional known genes, ESTs, and cochlear-specific ESTs provides new candidate genes for both syndromic and nonsyndromic deafness disorders.
doi:10.1007/s101620020005
PMCID: PMC3202364  PMID: 12083723
ESTs; genes; cochlea; cochlear-expressed genes
4.  A Genome-Wide Screen for Promoter Methylation in Lung Cancer Identifies Novel Methylation Markers for Multiple Malignancies  
PLoS Medicine  2006;3(12):e486.
Background
Promoter hypermethylation coupled with loss of heterozygosity at the same locus results in loss of gene function in many tumor cells. The “rules” governing which genes are methylated during the pathogenesis of individual cancers, how specific methylation profiles are initially established, or what determines tumor type-specific methylation are unknown. However, DNA methylation markers that are highly specific and sensitive for common tumors would be useful for the early detection of cancer, and those required for the malignant phenotype would identify pathways important as therapeutic targets.
Methods and Findings
In an effort to identify new cancer-specific methylation markers, we employed a high-throughput global expression profiling approach in lung cancer cells. We identified 132 genes that have 5′ CpG islands, are induced from undetectable levels by 5-aza-2′-deoxycytidine in multiple non-small cell lung cancer cell lines, and are expressed in immortalized human bronchial epithelial cells. As expected, these genes were also expressed in normal lung, but often not in companion primary lung cancers. Methylation analysis of a subset (45/132) of these promoter regions in primary lung cancer (n = 20) and adjacent nonmalignant tissue (n = 20) showed that 31 genes had acquired methylation in the tumors, but did not show methylation in normal lung or peripheral blood cells. We studied the eight most frequently and specifically methylated genes from our lung cancer dataset in breast cancer (n = 37), colon cancer (n = 24), and prostate cancer (n = 24) along with counterpart nonmalignant tissues. We found that seven loci were frequently methylated in both breast and lung cancers, with four showing extensive methylation in all four epithelial tumors.
Conclusions
By using a systematic biological screen we identified multiple genes that are methylated with high penetrance in primary lung, breast, colon, and prostate cancers. The cross-tumor methylation pattern we observed for these novel markers suggests that we have identified a partial promoter hypermethylation signature for these common malignancies. These data suggest that while tumors in different tissues vary substantially with respect to gene expression, there may be commonalities in their promoter methylation profiles that represent targets for early detection screening or therapeutic intervention.
John Minna and colleagues report that a group of genes are commonly methylated in primary lung, breast, colon, and prostate cancer.
Editors' Summary
Background.
Tumors or cancers contain cells that have lost many of the control mechanisms that normally regulate their behavior. Unlike normal cells, which only divide to repair damaged tissues, cancer cells divide uncontrollably. They also gain the ability to move round the body and start metastases in secondary locations. These changes in behavior result from alterations in their genetic material. For example, mutations (permanent changes in the sequence of nucleotides in the cell's DNA) in genes known as oncogenes stimulate cells to divide constantly. Mutations in another group of genes—tumor suppressor genes—disable their ability to restrain cell growth. Key tumor suppressor genes are often completely lost in cancer cells. But not all the genetic changes in cancer cells are mutations. Some are “epigenetic” changes—chemical modifications of genes that affect the amount of protein made from them. In cancer cells, methyl groups are often added to CG-rich regions—this is called hypermethylation. These “CpG islands” lie near gene promoters—sequences that control the transcription of DNA into RNA, the template for protein production—and their methylation switches off the promoter. Methylation of the promoter of one copy of a tumor suppressor gene, which often coincides with the loss of the other copy of the gene, is thought to be involved in cancer development.
Why Was This Study Done?
The rules that govern which genes are hypermethylated during the development of different cancer types are not known, but it would be useful to identify any DNA methylation events that occur regularly in common cancers for two reasons. First, specific DNA methylation markers might be useful for the early detection of cancer. Second, identifying these epigenetic changes might reveal cellular pathways that are changed during cancer development and so identify new therapeutic targets. In this study, the researchers have used a systematic biological screen to identify genes that are methylated in many lung, breast, colon, and prostate cancers—all cancers that form in “epithelial” tissues.
What Did the Researchers Do and Find?
The researchers used microarray expression profiling to examine gene expression patterns in several lung cancer and normal lung cell lines. In this technique, labeled RNA molecules isolated from cells are applied to a “chip” carrying an array of gene fragments. Here, they stick to the fragment that represents the gene from which they were made, which allows the genes that the cells express to be catalogued. By comparing the expression profiles of lung cancer cells and normal lung cells before and after treatment with a chemical that inhibits DNA methylation, the researchers identified genes that were methylated in the cancer cells—that is, genes that were expressed in normal cells but not in cancer cells unless methylation was inhibited. 132 of these genes contained CpG islands. The researchers examined the promoters of 45 of these genes in lung cancer cells taken straight from patients and found that 31 of the promoters were methylated in tumor tissues but not in adjacent normal tissues. Finally, the researchers looked at promoter methylation of the eight genes most frequently and specifically methylated in the lung cancer samples in breast, colon, and prostate cancers. Seven of the genes were frequently methylated in both lung and breast cancers; four were extensively methylated in all the tumor types.
What Do These Findings Mean?
These results identify several new genes that are often methylated in four types of epithelial tumor. The observation that these genes are methylated in multiple independent tumors strongly suggests, but does not prove, that loss of expression of the proteins that they encode helps to convert normal cells into cancer cells. The frequency and diverse patterning of promoter methylation in different tumor types also indicates that methylation is not a random event, although what controls the patterns of methylation is not yet known. The identification of these genes is a step toward building a promoter hypermethylation profile for the early detection of human cancer. Furthermore, although tumors in different tissues vary greatly with respect to gene expression patterns, the similarities seen in this study in promoter methylation profiles might help to identify new therapeutic targets common to several cancer types.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0030486.
US National Cancer Institute, information for patients on understanding cancer
CancerQuest, information provided by Emory University about how cancer develops
Cancer Research UK, information for patients on cancer biology
Wikipedia pages on epigenetics (note that Wikipedia is a free online encyclopedia that anyone can edit)
The Epigenome Network of Excellence, background information and latest news about epigenetics
doi:10.1371/journal.pmed.0030486
PMCID: PMC1716188  PMID: 17194187
5.  Genetic Progression and the Waiting Time to Cancer 
PLoS Computational Biology  2007;3(11):e225.
Cancer results from genetic alterations that disturb the normal cooperative behavior of cells. Recent high-throughput genomic studies of cancer cells have shown that the mutational landscape of cancer is complex and that individual cancers may evolve through mutations in as many as 20 different cancer-associated genes. We use data published by Sjöblom et al. (2006) to develop a new mathematical model for the somatic evolution of colorectal cancers. We employ the Wright-Fisher process for exploring the basic parameters of this evolutionary process and derive an analytical approximation for the expected waiting time to the cancer phenotype. Our results highlight the relative importance of selection over both the size of the cell population at risk and the mutation rate. The model predicts that the observed genetic diversity of cancer genomes can arise under a normal mutation rate if the average selective advantage per mutation is on the order of 1%. Increased mutation rates due to genetic instability would allow even smaller selective advantages during tumorigenesis. The complexity of cancer progression can be understood as the result of multiple sequential mutations, each of which has a relatively small but positive effect on net cell growth.
Author Summary
Cancer is a disease of multicellular organisms that is characterized by a breakdown of cooperation between individual cells. The progression of cancer proceeds from a single genetically altered cell to billions of invasive cells through a series of clonal expansions. During tumorigenesis the cancer cells undergo replication and mutation, thereby increasing the size and invasiveness of the tumor. Recent sequencing projects of cancer cells suggest that mutations in up to 20 different genes might be responsible for driving an individual tumor's development. This insight contrasts with most mathematical models of cancer progression, which assume that the cancer phenotype is driven by mutations in only a few genes. We present a new mathematical model in which tumorigenesis is driven by mutations in many genes, most of which confer only a small selective advantage. Specifically, the progression of a benign tumor of the colon (adenoma) to a malignant tumor (carcinoma) is described by a Wright-Fisher process with growing population size. We explore the basic parameters of the model that are consistent with observed data. We also derive an analytical formula for the expected waiting time for the progression from benign to malignant tumor in terms of the population size, the mutation rate, the selective advantage, and the number of susceptible genes.
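The kind of model described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' model: it assumes a constant population size (the paper uses a growing one), at most one new mutation per cell per generation, and a multiplicative fitness of (1 + s)^j for a cell with j mutations. The function name `waiting_time` and all parameter names are hypothetical.

```python
import random


def waiting_time(pop_size, mutation_rate, advantage, n_genes, k_target,
                 max_generations=100_000, seed=0):
    """Generations until the first cell carrying k_target mutations appears.

    Returns None if that does not happen within max_generations.
    """
    rng = random.Random(seed)
    # counts[j] = number of cells that currently carry j mutations
    counts = [pop_size] + [0] * k_target
    classes = range(k_target + 1)

    for generation in range(1, max_generations + 1):
        # Selection: a cell with j mutations has relative fitness (1 + s)^j.
        weights = [c * (1.0 + advantage) ** j for j, c in enumerate(counts)]
        # Wright-Fisher resampling: each offspring picks a parent class
        # with probability proportional to that class's total fitness.
        parents = rng.choices(classes, weights=weights, k=pop_size)

        new_counts = [0] * (k_target + 1)
        for j in parents:
            # Mutation (simplifying assumption): at most one new hit per cell
            # per generation, with probability u * (unmutated susceptible genes).
            if j < k_target and rng.random() < mutation_rate * (n_genes - j):
                j += 1
            new_counts[j] += 1
        counts = new_counts

        if counts[k_target] > 0:
            return generation
    return None
```

Averaging `waiting_time(...)` over many seeds while varying `advantage`, `pop_size`, or `mutation_rate` gives a rough feel for which parameter shortens the waiting time most; the abstract's claim is that selection matters more than the other two.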
doi:10.1371/journal.pcbi.0030225
PMCID: PMC2065895  PMID: 17997597
6.  High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome 
BMC Genomics  2008;9:312.
Background
Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effective the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation.
Results
With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool comprised of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection in agreement with previous research.
Conclusion
In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition we demonstrated that SNPs sampled in large-scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.
doi:10.1186/1471-2164-9-312
PMCID: PMC2483731  PMID: 18590545
7.  Somatic Mutations, Allele Loss, and DNA Methylation of the Cub and Sushi Multiple Domains 1 (CSMD1) Gene Reveals Association with Early Age of Diagnosis in Colorectal Cancer Patients 
PLoS ONE  2013;8(3):e58731.
Background
The Cub and Sushi Multiple Domains 1 (CSMD1) gene, located on the short arm of chromosome 8, codes for a type I transmembrane protein whose function is currently unknown. CSMD1 expression is frequently lost in many epithelial cancers. Our goal was to characterize the relationships between CSMD1 somatic mutations, allele imbalance, DNA methylation, and the clinical characteristics in colorectal cancer patients.
Methods
We sequenced the CSMD1 coding regions in 54 colorectal tumors using the 454FLX pyrosequencing platform to interrogate 72 amplicons covering the entire coding sequence. We used heterozygous SNP allele ratios at multiple CSMD1 loci to determine allelic balance and infer loss of heterozygosity. Finally, we performed methylation-specific PCR on 76 colorectal tumors to determine DNA methylation status for CSMD1 and known methylation targets ALX4, RUNX3, NEUROG1, and CDKN2A.
Results
Using 454FLX sequencing and confirming with Sanger sequencing, 16 CSMD1 somatic mutations were identified in 6 of the 54 colorectal tumors (11%). The nonsynonymous to synonymous mutation ratio of the 16 somatic mutations was 15∶1, a ratio significantly higher than the expected 2∶1 ratio (p = 0.014). This ratio indicates positive selection for mutations in the CSMD1 protein sequence. CSMD1 allelic imbalance was present in 19 of 37 informative cases (56%). Patients with allelic imbalance and CSMD1 mutations were significantly younger (average age, 41 years) than those without somatic mutations (average age, 68 years). The majority of tumors were methylated at one or more CpG loci within the CSMD1 coding sequence, and CSMD1 methylation significantly correlated with two known methylation targets ALX4 and RUNX3. C:G>T:A substitutions were significantly overrepresented (47%), suggesting extensive cytosine methylation predisposing to somatic mutations.
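The comparison of the observed 15∶1 ratio with the expected 2∶1 ratio can be checked with a short calculation. The abstract does not name the test used; the sketch below assumes a one-sided exact binomial test, in which each of the 16 mutations is nonsynonymous with probability 2/3 under the null hypothesis. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 15 nonsynonymous out of 16 mutations; a 2:1 expected ratio means p = 2/3.
p_value = binomial_upper_tail(15, 16, 2 / 3)
print(round(p_value, 3))  # 0.014
```

Under this assumption the tail probability is about 0.0137, which agrees with the reported p = 0.014.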
Conclusions
Deep amplicon sequencing and methylation-specific PCR reveal that CSMD1 alterations can correlate with earlier clinical presentation in colorectal tumors, thus further implicating CSMD1 as a tumor suppressor gene.
doi:10.1371/journal.pone.0058731
PMCID: PMC3591376  PMID: 23505554
8.  Polymorphisms, Mutations, and Amplification of the EGFR Gene in Non-Small Cell Lung Cancers 
PLoS Medicine  2007;4(4):e125.
Background
The epidermal growth factor receptor (EGFR) gene is the prototype member of the type I receptor tyrosine kinase (TK) family and plays a pivotal role in cell proliferation and differentiation. There are three well described polymorphisms that are associated with increased protein production in experimental systems: a polymorphic dinucleotide repeat (CA simple sequence repeat 1 [CA-SSR1]) in intron one (lower number of repeats) and two single nucleotide polymorphisms (SNPs) in the promoter region, −216 (G/T or T/T) and −191 (C/A or A/A). The objective of this study was to examine distributions of these three polymorphisms and their relationships to each other and to EGFR gene mutations and allelic imbalance (AI) in non-small cell lung cancers.
Methods and Findings
We examined the frequencies of the three polymorphisms of EGFR in 556 resected lung cancers and corresponding non-malignant lung tissues from 336 East Asians, 213 individuals of Northern European descent, and seven of other ethnicities. We also studied the EGFR gene in 93 corresponding non-malignant lung tissue samples from European-descent patients from Italy and in peripheral blood mononuclear cells from 250 normal healthy US individuals enrolled in epidemiological studies including individuals of European descent, African–Americans, and Mexican–Americans. We sequenced the four exons (18–21) of the TK domain known to harbor activating mutations in tumors and examined the status of the CA-SSR1 alleles (presence of heterozygosity, repeat number of the alleles, and relative amplification of one allele) and allele-specific amplification of mutant tumors as determined by a standardized semiautomated method of microsatellite analysis. Variant forms of SNP −216 (G/T or T/T) and SNP −191 (C/A or A/A) (associated with higher protein production in experimental systems) were less frequent in East Asians than in individuals of other ethnicities (p < 0.001). Both alleles of CA-SSR1 were significantly longer in East Asians than in individuals of other ethnicities (p < 0.001). Expression studies using bronchial epithelial cultures demonstrated a trend towards increased mRNA expression in cultures having the variant SNP −216 G/T or T/T genotypes. Monoallelic amplification of the CA-SSR1 locus was present in 30.6% of the informative cases and occurred more often in individuals of East Asian ethnicity. AI was present in 44.4% (95% confidence interval: 34.1%–54.7%) of mutant tumors compared with 25.9% (20.6%–31.2%) of wild-type tumors (p = 0.002). The shorter allele in tumors with AI in East Asian individuals was selectively amplified (shorter allele dominant) more often in mutant tumors (75.0%, 61.6%–88.4%) than in wild-type tumors (43.5%, 31.8%–55.2%, p = 0.003). 
In addition, there was a strong positive association between AI ratios of CA-SSR1 alleles and AI of mutant alleles.
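The findings above report proportions together with 95% confidence intervals, but the abstract gives neither the underlying counts nor the interval method. As a minimal sketch, assuming the common normal-approximation (Wald) interval, the calculation looks as follows. The function name `wald_interval` is hypothetical, and the counts in the usage line (40 of 90) are illustrative values chosen only because 40/90 is 44.4%; they are not taken from the paper.

```python
from math import sqrt


def wald_interval(events, total, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    p = events / total
    half_width = z * sqrt(p * (1.0 - p) / total)
    # Clip to [0, 1] because the approximation can overshoot near the ends.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only.
low, high = wald_interval(40, 90)
print(f"{low:.3f} to {high:.3f}")
```

The Wald interval is simple but can be inaccurate for small samples or proportions near 0 or 1; a Wilson or exact interval is a common alternative in those cases.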
Conclusions
The three polymorphisms associated with increased EGFR protein production (shorter CA-SSR1 length and variant forms of SNPs −216 and −191) were found to be rare in East Asians as compared to other ethnicities, suggesting that the cells of East Asians may make relatively less intrinsic EGFR protein. Interestingly, especially in tumors from patients of East Asian ethnicity, EGFR mutations were found to favor the shorter allele of CA-SSR1, and selective amplification of the shorter allele of CA-SSR1 occurred frequently in tumors harboring a mutation. These distinct molecular events targeting the same allele would both be predicted to result in greater EGFR protein production and/or activity. Our findings may help explain some of the ethnic differences observed in mutational frequencies and responses to TK inhibitors.
Masaharu Nomura and colleagues examine the distribution of EGFR polymorphisms in different populations and find differences that might explain different responses to tyrosine kinase inhibitors in lung cancer patients.
Editors' Summary
Background.
Most cases of lung cancer, the leading cause of cancer deaths worldwide, are “non-small cell lung cancer” (NSCLC), which has a very low cure rate. Recently, however, “targeted” therapies have brought new hope to patients with NSCLC. Like all cancers, NSCLC occurs when cells begin to divide uncontrollably because of changes (mutations) in their genetic material. Chemotherapy drugs treat cancer by killing these rapidly dividing cells, but, because some normal tissues are sensitive to these agents, it is hard to kill the cancer completely without causing serious side effects. Targeted therapies specifically attack the changes in cancer cells that allow them to divide uncontrollably, so it might be possible to kill the cancer cells selectively without damaging normal tissues. Epidermal growth factor receptor (EGFR) was one of the first molecules for which a targeted therapy was developed. In normal cells, messenger proteins bind to EGFR and activate its “tyrosine kinase,” an enzyme that sticks phosphate groups on tyrosine (an amino acid) in other proteins. These proteins then tell the cell to divide. Alterations to this signaling system drive the uncontrolled growth of some cancers, including NSCLC.
Why Was This Study Done?
Molecules that inhibit the tyrosine kinase activity of EGFR (for example, gefitinib) dramatically shrink some NSCLCs, particularly those in East Asian patients. Tumors shrunk by tyrosine kinase inhibitors (TKIs) often (but not always) have mutations in EGFR's tyrosine kinase. However, not all tumors with these mutations respond to TKIs, and other genetic changes—for example, amplification (multiple copies) of the EGFR gene—also affect tumor responses to TKIs. It would be useful to know which genetic changes predict these responses when planning treatments for NSCLC and to understand why the frequency of these changes varies between ethnic groups. In this study, the researchers have examined three polymorphisms—differences in DNA sequences that occur between individuals—in the EGFR gene in people with and without NSCLC. In addition, they have looked for associations between these polymorphisms, which are present in every cell of the body, and the EGFR gene mutations and allelic imbalances (genes occur in pairs but amplification or loss of one copy, or allele, often causes allelic imbalance in tumors) that occur in NSCLCs.
What Did the Researchers Do and Find?
The researchers measured how often three EGFR polymorphisms (the length of a repeat sequence called CA-SSR1, and two single nucleotide variations [SNPs])—all of which probably affect how much protein is made from the EGFR gene—occurred in normal tissue and NSCLC tissue from East Asians and individuals of European descent. They also looked for mutations in the EGFR tyrosine kinase and allelic imbalance in the tumors, and then determined which genetic variations and alterations tended to occur together in people with the same ethnicity. Among many associations, the researchers found that shorter alleles of CA-SSR1 and the minor forms of the two SNPs occurred less often in East Asians than in individuals of European descent. They also confirmed that EGFR kinase mutations were more common in NSCLCs in East Asians than in European-descent individuals. Furthermore, mutations occurred more often in tumors with allelic imbalance, and in tumors where there was allelic imbalance and an EGFR mutation, the mutant allele was amplified more often than the wild-type allele.
What Do These Findings Mean?
The researchers use these associations between gene variants and tumor-associated alterations to propose a model to explain the ethnic differences in mutational frequencies and responses to TKIs seen in NSCLC. They suggest that because of the polymorphisms in the EGFR gene commonly seen in East Asians, people from this ethnic group make less EGFR protein than people from other ethnic groups. This would explain why, if a threshold level of EGFR is needed to drive cells towards malignancy, East Asians have a high frequency of amplified EGFR tyrosine kinase mutations in their tumors—mutation followed by amplification would be needed to activate EGFR signaling. This model, though speculative, helps to explain some clinical findings, such as the frequency of EGFR mutations and of TKI sensitivity in NSCLCs in East Asians. Further studies of this type in different ethnic groups and in different tumors, as well as with other genes for which targeted therapies are available, should help oncologists provide personalized cancer therapies for their patients.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040125.
US National Cancer Institute information on lung cancer and on cancer treatment for patients and professionals
MedlinePlus encyclopedia entries on NSCLC
Cancer Research UK information for patients about all aspects of lung cancer, including treatment with TKIs
Wikipedia pages on lung cancer, EGFR, and gefitinib (note that Wikipedia is a free online encyclopedia that anyone can edit)
doi:10.1371/journal.pmed.0040125
PMCID: PMC1876407  PMID: 17455987
9.  Peanut gene expression profiling in developing seeds at different reproduction stages during Aspergillus parasiticus infection 
Background
Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination.
Results
We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from two cultivated peanut genotypes, one susceptible and one resistant: 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were differences in overall expression patterns in different libraries and genotypes. A number of sequences were expressed throughout all of the libraries, representing constitutively expressed sequences. In order to identify resistance-related genes with significantly differential expression, a statistical analysis estimating relative abundance (R) was used to compare the abundance of each gene transcript across the cDNA libraries. Thirty-six and forty-seven unique EST sequences with a threshold of R > 4 from libraries of 'GT-C20' and 'Tifrunner', respectively, were selected for examination of temporal gene expression patterns according to EST frequencies.
Nine and eight resistance-related genes with significant up-regulation were obtained in 'GT-C20' and 'Tifrunner' libraries, respectively. Among them, three genes were common to both genotypes. Furthermore, a comparison of our EST sequences with other plant sequences in the TIGR Gene Indices libraries showed that the percentage of peanut ESTs matching Arabidopsis thaliana, maize (Zea mays), Medicago truncatula, rapeseed (Brassica napus), rice (Oryza sativa), soybean (Glycine max) and wheat (Triticum aestivum) ESTs at a sequence identity of ≥ 80% ranged from 33.84% to 79.46%. These results revealed that peanut ESTs are more closely related to legume species than to cereal crops, and more homologous to dicot than to monocot plant species.
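The relative-abundance comparison described in these results can be illustrated with a minimal sketch. The abstract does not give the formula for R, so the code below assumes the log-likelihood ratio statistic commonly used for comparing EST counts across cDNA libraries (usually attributed to Stekel and colleagues), with the natural logarithm. The function name, library sizes and counts are hypothetical and are not data from this study.

```python
import math

def relative_abundance_r(counts, library_sizes):
    """R statistic for one gene across several cDNA libraries.

    counts[i] is the number of ESTs of the gene in library i and
    library_sizes[i] is the total number of ESTs in library i.
    Assumed formula: R = sum_i x_i * ln(x_i / (N_i * f)),
    where f is the gene's frequency pooled over all libraries.
    """
    pooled_freq = sum(counts) / sum(library_sizes)
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing (x * ln x tends to 0)
            r += x * math.log(x / (n * pooled_freq))
    return r

# Hypothetical counts for one gene in six libraries of equal size
sizes = [3600] * 6
uniform = relative_abundance_r([5, 5, 5, 5, 5, 5], sizes)
skewed = relative_abundance_r([20, 1, 0, 2, 1, 0], sizes)
print(uniform, skewed, skewed > 4)
```

Under this assumed formula, a gene sampled evenly across libraries scores close to zero, while a gene concentrated in one library scores well above the R > 4 threshold quoted in the abstract.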
Conclusion
The developed ESTs can be used to discover novel sequences or genes, to identify resistance-related genes and to detect the differences among alleles or markers between these resistant and susceptible peanut genotypes. Additionally, this large collection of cultivated peanut EST sequences will make it possible to construct microarrays for gene expression studies and for further characterization of host resistance mechanisms. It will be a valuable genomic resource for the peanut community. The 21,777 ESTs have been deposited in the NCBI GenBank database with accession numbers ES702769 to ES724546.
doi:10.1186/1471-213X-8-12
PMCID: PMC2257936  PMID: 18248674
10.  Identification and evolutionary analysis of novel exons and alternative splicing events using cross-species EST-to-genome comparisons in human, mouse and rat 
BMC Bioinformatics  2006;7:136.
Background
Alternative splicing (AS) is important for evolution and major biological functions in complex organisms. However, the extent of AS in mammals other than human and mouse is largely unknown, making it difficult to study AS evolution in mammals and its biomedical implications.
Results
Here we describe a cross-species EST-to-genome comparison algorithm (ENACE) that can identify novel exons for EST-scanty species and distinguish conserved and lineage-specific exons. The identified exons represent not only novel exons but also evolutionarily meaningful AS events that were not previously annotated. A genome-wide AS analysis in human, mouse and rat using ENACE reveals a total of 758 novel cassette-on exons and 167 novel retained introns that have no EST evidence from the same species. RT-PCR-sequencing experiments validated ~50% to ~80% of the tested exons, indicating a high rate of true exons among those predicted by ENACE. ENACE is particularly powerful when applied to closely related species. In addition, our analysis shows that the ENACE-identified AS exons tend not to pass the nonsynonymous-to-synonymous substitution ratio test and not to contain protein domains, implying that such exons may be under positive selection or relaxed negative selection. These AS exons may contribute to considerable inter-species functional divergence. Our analysis further indicates that a large number of exons may have been gained or lost during mammalian evolution. Moreover, a functional analysis shows that inter-species divergence of AS events may be substantial in protein carriers and receptor proteins in mammals. These exons may be of interest to studies of AS evolution. The ENACE programs and sequences of the ENACE-identified AS events are available for download.
Conclusion
ENACE can identify potential novel cassette exons and retained introns between closely related species using a comparative approach. It can also provide information regarding lineage- or species-specificity in transcript isoforms, which are important for evolutionary and functional studies.
doi:10.1186/1471-2105-7-136
PMCID: PMC1479377  PMID: 16536879
11.  Mutational Signatures of De-Differentiation in Functional Non-Coding Regions of Melanoma Genomes 
PLoS Genetics  2012;8(8):e1002871.
Much emphasis has been placed on the identification, functional characterization, and therapeutic potential of somatic variants in tumor genomes. However, the majority of somatic variants lie outside coding regions and their role in cancer progression remains to be determined. In order to establish a system to test the functional importance of non-coding somatic variants in cancer, we created a low-passage cell culture of a metastatic melanoma tumor sample. As a foundation for interpreting functional assays, we performed whole-genome sequencing and analysis of this cell culture, the metastatic tumor from which it was derived, and the patient-matched normal genomes. When comparing somatic mutations identified in the cell culture and tissue genomes, we observe concordance at the majority of single nucleotide variants, whereas copy number changes are more variable. To understand the functional impact of non-coding somatic variation, we leveraged functional data generated by the ENCODE Project Consortium. We analyzed regulatory regions derived from multiple different cell types and found that melanocyte-specific regions are among the most depleted for somatic mutation accumulation. Significant depletion in other cell types suggests the metastatic melanoma cells de-differentiated to a more basal regulatory state. Experimental identification of genome-wide regulatory sites in two different melanoma samples supports this observation. Together, these results show that mutation accumulation in metastatic melanoma is nonrandom across the genome and that a de-differentiated regulatory architecture is common among different samples. Our findings enable identification of the underlying genetic components of melanoma and define the differences between a tissue-derived tumor sample and the cell culture created from it. Such information helps establish a broader mechanistic understanding of the linkage between non-coding genomic variations and the cellular evolution of cancer.
Author Summary
Here we investigate the relationship between somatic variants and non-coding regulatory regions. To do this, we develop a new algorithm for identifying single nucleotide somatic variants in whole-genome sequencing data and apply it to a metastatic melanoma sample and a cell culture derived from this sample. Our results show that the two genomes are similar at the level of single nucleotide changes and more variable at larger copy number changes. We further observe that patterns of somatic mutation accumulation in non-coding regulatory regions suggest that the metastatic melanoma cells de-differentiated into a more basal regulatory state. That is, by simply looking at mutation accumulation across cell-type-specific non-coding functional regions, one can clearly see patterns that are indicative of cell state de-differentiation. Results from genome-wide functional regulatory region experimental mapping support this observation.
doi:10.1371/journal.pgen.1002871
PMCID: PMC3415438  PMID: 22912592
12.  A Global View of Cancer-Specific Transcript Variants by Subtractive Transcriptome-Wide Analysis 
PLoS ONE  2009;4(3):e4732.
Background
Alternative pre-mRNA splicing (AS) plays a central role in generating complex proteomes and influences development and disease. However, the regulation and etiology of AS in human tumorigenesis are not well understood.
Methodology/Principal Findings
A Basic Local Alignment Search Tool database was constructed for the expressed sequence tags (ESTs) from all available databases of human cancer and normal tissues. An insertion or deletion in the alignment of EST/EST was used to identify alternatively spliced transcripts. Alignment of the ESTs with the genomic sequence was further used to confirm AS. Alternatively spliced transcripts in each tissue were then subtractively cross-screened to obtain tissue-specific variants. We systematically identified and characterized cancer/tissue-specific and alternatively spliced variants in the human genome based on a global view. We identified 15,093 cancer-specific variants of 9,989 genes from 27 types of human cancers and 14,376 normal tissue-specific variants of 7,240 genes from 35 normal tissues, which cover the main types of human tumors and normal tissues. Approximately 70% of these transcripts are novel. These data were integrated into a database HCSAS (http://202.114.72.39/database/human.html, pass:68756253). Moreover, we observed that the cancer-specific AS of both oncogenes and tumor suppressor genes are associated with specific cancer types. Cancer shows a preference in the selection of alternative splice-sites and utilization of alternative splicing types.
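The subtractive cross-screening step can be pictured as a set difference: a variant is kept for a tissue only if it is seen in no other tissue. The sketch below is an illustration of that idea under stated assumptions, not the authors' pipeline; the function name and the variant identifiers are hypothetical and are not taken from the HCSAS database.

```python
def tissue_specific_variants(variants_by_tissue):
    """Return, for each tissue, the variants seen in that tissue and in no other."""
    specific = {}
    for tissue, variants in variants_by_tissue.items():
        others = set()
        for other_tissue, other_variants in variants_by_tissue.items():
            if other_tissue != tissue:
                others |= other_variants
        # Subtract everything observed elsewhere from this tissue's variants
        specific[tissue] = variants - others
    return specific

# Hypothetical spliced-variant identifiers per tissue
observed = {
    "colon_cancer": {"GENE1_v2", "GENE2_v3", "GENE3_v1"},
    "normal_colon": {"GENE1_v2", "GENE4_v1"},
    "normal_liver": {"GENE3_v1", "GENE5_v2"},
}
result = tissue_specific_variants(observed)
print(sorted(result["colon_cancer"]))
```

In this toy input only one colon cancer variant survives the subtraction, because the other two also appear in a normal tissue.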
Conclusions/Significance
These features of human cancer, together with the discovery of huge numbers of novel splice forms for cancer-associated genes, suggest an important and global role of cancer-specific AS during human tumorigenesis. We advise the use of cancer-specific alternative splicing as a potential source of new diagnostic, prognostic, predictive, and therapeutic tools for human cancer. The global view of cancer-specific AS is not only useful for exploring the complexity of the cancer transcriptome but also broadens the scope of clinical research.
doi:10.1371/journal.pone.0004732
PMCID: PMC2648985  PMID: 19266097
13.  Mutational hotspots in the TP53 gene and, possibly, other tumor suppressors evolve by positive selection 
Biology Direct  2006;1:4.
Background
The mutation spectra of the TP53 gene and other tumor suppressors contain multiple hotspots, i.e., sites of non-random, frequent mutation in tumors and/or the germline. The origin of the hotspots remains unclear, the general view being that they represent highly mutable nucleotide contexts which likely reflect effects of different endogenous and exogenous factors shaping the mutation process in specific tissues. The origin of hotspots is of major importance because it has been suggested that mutable contexts could be used to infer mechanisms of mutagenesis contributing to tumorigenesis.
Results
Here we apply three independent tests, accounting for non-uniform base compositions in synonymous and non-synonymous sites, to test whether the hotspots emerge via selection or due to mutational bias. All three tests consistently indicate that the hotspots in the TP53 gene evolve, primarily, via positive selection. The results were robust to the elimination of the highly mutable CpG dinucleotides. By contrast, only one test, the least conservative, reveals the signature of positive selection in BRCA1, BRCA2, and p16. Elucidation of the origin of the hotspots in these genes requires more data on somatic mutations in tumors.
Conclusion
The results of this analysis seem to indicate that positive selection for gain-of-function in tumor suppressor genes is an important aspect of tumorigenesis, blurring the distinction between tumor suppressors and oncogenes.
Reviewers
This article was reviewed by Sandor Pongor, Christopher Lee and Mikhail Blagosklonny.
doi:10.1186/1745-6150-1-4
PMCID: PMC1403748  PMID: 16542006
14.  Important role of indels in somatic mutations of human cancer genes 
BMC Medical Genetics  2010;11:128.
Background
Cancer is clonal proliferation that arises owing to mutations in a subset of genes that confer growth advantage. More and more cancer related genes are found to have accumulated somatic mutations. However, little has been reported about mutational patterns of insertions/deletions (indels) in these genes.
Results
We analyzed indels' abundance and distribution, the relative ratio between indels and somatic base substitutions and the association between those two forms of mutations in a large number of somatic mutations in the Catalogue of Somatic Mutations in Cancer database. We found a strong correlation between indels and base substitutions in cancer-related genes and showed that they tend to concentrate at the same locus in the coding sequences within the same samples. More importantly, a much higher proportion of indels were observed in somatic mutations, as compared to meiotic ones. Furthermore, our analysis demonstrated a great diversity of indels at some loci of cancer-related genes. Particularly in the genes with abundant mutations, the proportion of 3n indels in oncogenes is 7.9 times higher than that in tumor suppressor genes.
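A short sketch may clarify the "3n indels" comparison. It assumes that a 3n indel is one whose length is a multiple of three nucleotides, so that it preserves the reading frame, whereas any other length causes a frameshift. The function names and the example alleles are hypothetical and are not drawn from the Catalogue of Somatic Mutations in Cancer.

```python
def classify_indel(ref, alt):
    """Classify an indel by whether its net length change is a multiple of three."""
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("not an indel: ref and alt have equal length")
    return "in-frame (3n)" if length_change % 3 == 0 else "frameshift"

def proportion_3n(indels):
    """Fraction of (ref, alt) pairs that are in-frame (3n) indels."""
    labels = [classify_indel(ref, alt) for ref, alt in indels]
    return labels.count("in-frame (3n)") / len(labels)

# Hypothetical (reference allele, alternate allele) pairs
example_indels = [("A", "ACTG"), ("ACG", "A"), ("ATTTT", "A"), ("A", "AGGTCCA")]
labels = [classify_indel(ref, alt) for ref, alt in example_indels]
share_3n = proportion_3n(example_indels)
print(labels, share_3n)
```

Computing this proportion separately for oncogenes and for tumor suppressor genes would give the kind of ratio the abstract reports.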
Conclusions
There are three distinct patterns of indel distribution in somatic mutations: high proportion, great abundance and non-random distribution. Because of the great influence of indels on gene function (e.g., the effect of frameshift mutation), these patterns indicate that indels are frequently under positive selection and can often be the 'driver mutations' in oncogenesis. Such driver forces can better explain why far fewer frameshift mutations occur in oncogenes and far more in tumor suppressor genes, given their different functions in oncogenesis. These findings contribute to our understanding of mutational patterns and the relationship between indels and cancer.
doi:10.1186/1471-2350-11-128
PMCID: PMC2940769  PMID: 20807447
15.  Development and characterisation of an expressed sequence tags (EST)-derived single nucleotide polymorphisms (SNPs) resource in rainbow trout 
BMC Genomics  2012;13:238.
Background
There is considerable interest in developing high-throughput genotyping with single nucleotide polymorphisms (SNPs) for the identification of genes affecting important ecological or economical traits. SNPs are evenly distributed throughout the genome and are likely to be functionally relevant. In rainbow trout, in silico screening of EST databases represents an attractive approach for de novo SNP identification. Nevertheless, EST sequencing errors and assembly of EST paralogous sequences can lead to the identification of false positive SNPs which renders the reliability of EST-derived SNPs relatively low. Further validation of EST-derived SNPs is therefore required. The objective of this work was to assess the quality of and to validate a large number of rainbow trout EST-derived SNPs.
Results
A panel of 1,152 EST-derived SNPs was selected from the INRA Sigenae SNP database and was genotyped in standard and double haploid individuals from several populations using the Illumina GoldenGate BeadXpress assay. High-quality genotyping data were obtained for 958 SNPs, representing a genotyping success rate of 83.2%, of which 350 SNPs (36.5%) were polymorphic in at least one population and were designated as true SNPs. They also proved to be a potential tool to investigate genetic diversity of the species, as the set of SNPs successfully sorted individuals into three main groups using STRUCTURE software. Functional annotations revealed 28 non-synonymous SNPs, of which four substitutions were predicted to affect protein functions. A subset of 223 true SNPs were polymorphic in the two INRA mapping reference families and were integrated into the INRA microsatellite-based linkage map.
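The two percentages quoted above follow directly from the reported counts, as this small check shows. The variable names are illustrative. Labelling the remaining successfully genotyped SNPs as monomorphic is an assumption made here for the arithmetic, not a figure stated in the abstract.

```python
selected_snps = 1152   # EST-derived SNPs placed on the panel
high_quality = 958     # SNPs with high-quality genotyping data
polymorphic = 350      # SNPs polymorphic in at least one population ("true SNPs")

success_rate = 100 * high_quality / selected_snps
true_snp_rate = 100 * polymorphic / high_quality
# Assumption: the rest of the successfully genotyped SNPs were monomorphic
monomorphic_rate = 100 - true_snp_rate

print(f"{success_rate:.1f}% {true_snp_rate:.1f}% {monomorphic_rate:.1f}%")
```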
Conclusions
Our study is the first to validate EST-derived SNPs in rainbow trout, a species whose genome sequence is not yet available. We designed several specific filters in order to improve the genotyping yield. Nevertheless, our selection criteria should be further improved in order to reduce the observed high rate of false positive SNPs which results from the occurrence of whole genome duplications.
doi:10.1186/1471-2164-13-238
PMCID: PMC3536561  PMID: 22694767
16.  A systems approach defining constraints of the genome architecture on lineage selection and evolvability during somatic cancer evolution 
Biology Open  2012;2(1):49-62.
Summary
Most clinically distinguishable malignant tumors are characterized by specific mutations, specific patterns of chromosomal rearrangements and a predominant mechanism of genetic instability but it remains unsolved whether modifications of cancer genomes can be explained solely by mutations and selection through the cancer microenvironment.
It has been suggested that internal dynamics of genomic modifications as opposed to the external evolutionary forces have a significant and complex impact on Darwinian species evolution. A similar situation can be expected for somatic cancer evolution as molecular key mechanisms encountered in species evolution also constitute prevalent mutation mechanisms in human cancers. This assumption is developed into a systems approach of carcinogenesis which focuses on possible inner constraints of the genome architecture on lineage selection during somatic cancer evolution. The proposed systems approach can be considered an analogy to the concept of evolvability in species evolution.
The principal hypothesis is that permissive or restrictive effects of the genome architecture on lineage selection during somatic cancer evolution exist and have a measurable impact. The systems approach postulates three classes of lineage selection effects of the genome architecture on somatic cancer evolution: i) effects mediated by changes of fitness of cells of cancer lineage, ii) effects mediated by changes of mutation probabilities and iii) effects mediated by changes of gene designation and physical and functional genome redundancy. Physical genome redundancy is the copy number of identical genetic sequences. Functional genome redundancy of a gene or a regulatory element is defined as the number of different genetic elements, regardless of copy number, coding for the same specific biological function within a cancer cell. Complex interactions of the genome architecture on lineage selection may be expected when modifications of the genome architecture have multiple and possibly opposed effects which manifest themselves at disparate times and progression stages.
Dissection of putative mechanisms mediating constraints exerted by the genome architecture on somatic cancer evolution may provide an algorithm for understanding and predicting as well as modifying somatic cancer evolution in individual patients.
doi:10.1242/bio.20122543
PMCID: PMC3545268  PMID: 23336076
Carcinogenesis; Evolvability; Genome architecture; Somatic cancer evolution
17.  Cancer Evolution Is Associated with Pervasive Positive Selection on Globally Expressed Genes 
PLoS Genetics  2014;10(3):e1004239.
Cancer is an evolutionary process in which cells acquire new transformative, proliferative and metastatic capabilities. A full understanding of cancer requires learning the dynamics of the cancer evolutionary process. We present here a large-scale analysis of the dynamics of this evolutionary process within tumors, with a focus on breast cancer. We show that the cancer evolutionary process differs greatly from organismal (germline) evolution. Organismal evolution is dominated by purifying selection (that removes mutations that are harmful to fitness). In contrast, in the cancer evolutionary process the dominance of purifying selection is much reduced, allowing for a much easier detection of the signals of positive selection (adaptation). We further show that, as a group, genes that are globally expressed across human tissues show a very strong signal of positive selection within tumors. Indeed, known cancer genes are enriched for global expression patterns. Yet, positive selection is prevalent even on globally expressed genes that have not yet been associated with cancer, suggesting that globally expressed genes are enriched for yet undiscovered cancer related functions. We find that the increased positive selection on globally expressed genes within tumors is not due to their expression in the tissue relevant to the cancer. Rather, such increased adaptation is likely due to globally expressed genes being enriched in important housekeeping and essential functions. Thus, our results suggest that tumor adaptation is most often mediated through somatic changes to those genes that are important for the most basic cellular functions. Together, our analysis reveals the uniqueness of the cancer evolutionary process and the particular importance of globally expressed genes in driving cancer initiation and progression.
Author Summary
Cancer is a short-term evolutionary process that occurs within our bodies. Here, we demonstrate that the cancer evolutionary process differs greatly from other evolutionary processes. Most evolutionary processes are dominated by purifying selection (that removes harmful mutations). In contrast, in cancer evolution the dominance of purifying selection is much reduced, allowing for an easier detection of the signals of positive selection (that increases the likelihood beneficial mutations will persist). Mutations affected by positive selection within tumors are particularly interesting, as these are the mutations that allow cancer cells to acquire new capabilities important for transformation, tumor maintenance, drug resistance and metastasis. We demonstrate that, within tumors, positive selection strongly affects somatic mutations occurring within genes that are expressed globally, across all human tissues. Fitting with this, we show that genes that are already known to be involved in cancer tend to more often be globally expressed across tissues. However, even when such known cancer genes are removed from consideration, there is significantly more positive selection on the remaining globally expressed genes, suggesting that they are enriched for yet undiscovered cancer related functions. The results we present are important both for understanding cancer as an evolutionary process and to the continuing quest to identify new genes and pathways contributing to cancer.
doi:10.1371/journal.pgen.1004239
PMCID: PMC3945297  PMID: 24603726
18.  Meta-analytical biomarker search of EST expression data reveals three differentially expressed candidates 
BMC Genomics  2012;13(Suppl 7):S12.
Background
For more than a decade, researchers have identified differentially expressed genes (DEGs) by generating and mining cDNA expressed sequence tags (ESTs). Although the availability of public databases makes possible the comprehensive mining of DEGs among the ESTs from multiple tissue types, existing studies usually employed statistics suitable only for two categories. Multi-class tests have been developed to enable the finding of tissue-specific genes, but the subsequent search for cancer genes involves a separate two-category test on only the ESTs of the tissue of interest. This restricts the amount of data used. On the other hand, simple pooling of cancer and normal genes from multiple tissue types runs the risk of Simpson's paradox. Here we present a different approach, which searches for multi-cancer DEG candidates by analyzing all pertinent ESTs in all categories and narrows down the cancer biomarker candidates via integrative analysis with microarray data, selection of secretory and membrane protein genes, and incorporation of network analysis. Finally, the differential expression patterns of three selected cancer biomarker candidates were confirmed by real-time qPCR analysis.
Results
Seven hundred and twenty-three primary DEG candidates (p-value < 0.05 and lower bound of the confidence interval of the odds ratio ≥ 1.65) were selected from a curated EST database using the Cochran-Mantel-Haenszel (CMH) statistic. GeneGO analysis results indicated that this set was neoplasm enriched. Cross-examination with microarray data further narrowed the list down to 235 genes, among which 96 had membrane or secretory annotations. After examining the candidates in protein interaction networks, public tissue expression databases, and the literature, we selected three genes for further evaluation by real-time qPCR with eight major normal and cancer tissues. The higher-than-normal tissue expression of COL3A1, DLG3, and RNF43 in some of the cancer tissues is in agreement with our in silico predictions.
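The stratified test named above can be sketched with the standard Mantel-Haenszel formulas, which combine one 2x2 table per tissue type instead of pooling all tissues into a single table. This is a minimal illustration rather than the authors' implementation: the function name, the table layout and the counts are hypothetical. The confidence interval of the odds ratio used as a selection criterion in the abstract is not computed here.

```python
def mantel_haenszel(tables, continuity=True):
    """Pooled odds ratio and CMH chi-square for a list of 2x2 tables.

    Each table is (a, b, c, d) for one tissue type (stratum):
    a = gene ESTs in cancer, b = other ESTs in cancer,
    c = gene ESTs in normal, d = other ESTs in normal.
    """
    num = den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        sum_a += a
        # Expected value and variance of a under no association in this stratum
        sum_expected += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    odds_ratio = num / den
    deviation = abs(sum_a - sum_expected)
    if continuity:
        # Continuity correction, floored at zero for very small deviations
        deviation = max(deviation - 0.5, 0.0)
    chi_square = deviation ** 2 / sum_var
    return odds_ratio, chi_square

# Hypothetical EST counts for one gene in two tissue types
strata = [(10, 20, 5, 40), (40, 5, 20, 10)]
pooled_or, cmh_chi2 = mantel_haenszel(strata)
print(pooled_or, cmh_chi2)
```

Because each stratum contributes its own expected count and variance, a tissue with many ESTs cannot reverse the direction of the association seen within tissues, which is the Simpson's paradox risk that simple pooling carries.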
Conclusions
Searching digitized transcriptomes using CMH enabled us to identify multi-cancer differentially expressed gene candidates. Our methodology demonstrates simultaneous analysis of EST data for cancer biomarkers across multiple tissue types. With the revived interest in digitizing transcriptomes by NGS, cancer biomarkers could be detected more precisely from the ESTs. The three candidates identified in this study, COL3A1, DLG3, and RNF43, are valuable targets for further evaluation with a larger sample size of normal and cancer tissue or serum samples.
doi:10.1186/1471-2164-13-S7-S12
PMCID: PMC3521215  PMID: 23282184
19.  Verification of predicted alternatively spliced Wnt genes reveals two new splice variants (CTNNB1 and LRP5) and altered Axin-1 expression during tumour progression 
BMC Genomics  2006;7:148.
Background
Splicing processes might play a major role in carcinogenesis and tumour progression. The Wnt pathway is of crucial relevance for cancer progression. Therefore we focussed on the Wnt/β-catenin signalling pathway in order to validate the expression of sequences predicted as alternatively spliced by bioinformatic methods. Splice variants of its key molecules were selected, which may be critical components for the understanding of colorectal tumour progression and may have the potential to act as biological markers. For some of the Wnt pathway genes the existence of splice variants was either proposed (e.g. β-Catenin and CTNNB1) or described only in non-colon tissues (e.g. GSK3β) or hitherto not published (e.g. LRP5).
Results
Both splice variants – normal and alternative form – of all selected Wnt pathway components were found to be expressed in cell lines as well as in samples derived from tumour, normal and healthy tissues. All splice positions corresponded totally with the bioinformatical prediction as shown by sequencing. Two hitherto not described alternative splice forms (CTNNB1 and LRP5) were detected. Although the underlying EST data used for the bioinformatic analysis suggested a tumour-specific expression neither a qualitative nor a significant quantitative difference between the expression in tumour and healthy tissues was detected. Axin-1 expression was reduced in later stages and in samples from carcinomas forming distant metastases.
Conclusion
We were the first to describe that splice forms of crucial genes of the Wnt pathway are expressed in human colorectal tissue. Newly described splice forms were found for β-Catenin, LRP5, GSK3β, Axin-1 and CtBP1. However, the cancer specificity predicted from the origin of the underlying ESTs was confirmed neither qualitatively nor by a significant quantitative difference. This led us to conclude that EST sequence data can give adequate hints for the existence of alternative splicing in tumour tissues. That no difference in the expression of these splice forms between cancerous tissues and normal mucosa was found may indicate that the existence of different splice forms is of less significance for cancer formation than suggested by the available EST data. The currently available EST data are still insufficient to clearly deduce colon cancer specificity. More EST data from colon (tumour and healthy) are required to make reliable predictions.
doi:10.1186/1471-2164-7-148
PMCID: PMC1523213  PMID: 16772034
20.  Multi Step Selection in Ig H Chains is Initially Focused on CDR3 and Then on Other CDR Regions 
Affinity maturation occurs through two selection processes: the choice of appropriate clones (clonal selection), and the internal evolution within clones, induced by somatic hyper-mutations, where high-affinity mutants are selected for. When a final population of immunoglobulin sequences is observed, the genetic composition of this population is affected by a combination of these two processes. Different immune-induced diseases can result from the failure of regulation of clonal selection or of the regulation of within-clone affinity maturation. In order to understand each of these processes separately, we propose a mixed lineage tree/sequence-based method to detect within-clone selection as defined by the effect of mutations on the average number of offspring. Specifically, we measure the imbalance in the number of leaves in lineage tree branches following synonymous and non-synonymous (NS) mutations. If a mutation is positively selected, we expect the number of leaves in the sub-tree below this mutation to be larger than in the parallel sub-tree without the mutation. The ratio between the number of leaves in such branches following NS mutations can be used to measure selection within a clone. We apply this method to the sampled Ig repertoire from multiple healthy volunteers and show that within-clone selection is positive in the CDR2 region and either positive or negative in the CDR3 and FWR3 regions. Selection occurs already at the IgM isotype level, mainly in the DH gene region, with a strong negative selection in the join region. This is followed in the later memory stages by selection in the CDR2 region. We have not studied here the FWR1 and CDR1 regions. An important advantage of this method is that it is only very weakly affected by the baseline mutation model or by sampling biases, unlike most methods based on the ratio of synonymous to NS mutations.
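The leaf-imbalance idea described above can be sketched as follows. This is only an illustration under stated assumptions: the nested-dictionary tree format, the "mutation" label on each branch and the use of a log2 ratio are hypothetical choices, not the authors' implementation.

```python
from math import log2


def count_leaves(node):
    """Number of leaves (observed sequences) in the sub-tree rooted at node."""
    children = node.get("children", [])
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)


def leaf_log_ratios(node, mutation_type="NS"):
    """For every branch labelled with mutation_type, return
    log2(leaves below that branch / leaves below its sibling branches).

    Positive values mean the mutated branch left more offspring than the
    parallel branches, which is the signal expected under positive selection.
    """
    ratios = []
    children = node.get("children", [])
    for child in children:
        siblings = [c for c in children if c is not child]
        if child.get("mutation") == mutation_type and siblings:
            mutated = count_leaves(child)
            parallel = sum(count_leaves(s) for s in siblings)
            ratios.append(log2(mutated / parallel))
        ratios.extend(leaf_log_ratios(child, mutation_type))
    return ratios


# Hypothetical clone: one branch carries an NS mutation and has three leaves,
# while the parallel branch without it has a single leaf.
example_tree = {
    "children": [
        {"mutation": "NS", "children": [{}, {}, {}]},
        {"mutation": None, "children": []},
    ]
}
example_ratios = leaf_log_ratios(example_tree)
```

Calling the same function with mutation_type="S" (a hypothetical label for synonymous mutations) would give a baseline distribution against which the NS ratios could be compared.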
doi:10.3389/fimmu.2013.00274
PMCID: PMC3775539  PMID: 24062742
adaptive evolution; phylogenetic tree; immune system; micro-evolution; tree shapes
21.  Intra-tumor Genetic Heterogeneity and Mortality in Head and Neck Cancer: Analysis of Data from The Cancer Genome Atlas 
PLoS Medicine  2015;12(2):e1001786.
Background
Although the involvement of intra-tumor genetic heterogeneity in tumor progression, treatment resistance, and metastasis is established, genetic heterogeneity is seldom examined in clinical trials or practice. Many studies of heterogeneity have had prespecified markers for tumor subpopulations, limiting their generalizability, or have involved massive efforts such as separate analysis of hundreds of individual cells, limiting their clinical use. We recently developed a general measure of intra-tumor genetic heterogeneity based on whole-exome sequencing (WES) of bulk tumor DNA, called mutant-allele tumor heterogeneity (MATH). Here, we examine data collected as part of a large, multi-institutional study to validate this measure and determine whether intra-tumor heterogeneity is itself related to mortality.
Methods and Findings
Clinical and WES data were obtained from The Cancer Genome Atlas in October 2013 for 305 patients with head and neck squamous cell carcinoma (HNSCC), from 14 institutions. Initial pathologic diagnoses were between 1992 and 2011 (median, 2008). Median time to death for 131 deceased patients was 14 mo; median follow-up of living patients was 22 mo. Tumor MATH values were calculated from WES results. Despite the multiple head and neck tumor subsites and the variety of treatments, we found in this retrospective analysis a substantial relation of high MATH values to decreased overall survival (Cox proportional hazards analysis: hazard ratio for high/low heterogeneity, 2.2; 95% CI 1.4 to 3.3). This relation of intra-tumor heterogeneity to survival was not due to intra-tumor heterogeneity’s associations with other clinical or molecular characteristics, including age, human papillomavirus status, tumor grade and TP53 mutation, and N classification. MATH improved prognostication over that provided by traditional clinical and molecular characteristics, maintained a significant relation to survival in multivariate analyses, and distinguished outcomes among patients having oral-cavity or laryngeal cancers even when standard disease staging was taken into account. Prospective studies, however, will be required before MATH can be used prognostically in clinical trials or practice. Such studies will need to examine homogeneously treated HNSCC at specific head and neck subsites, and determine the influence of cancer therapy on MATH values. Analysis of MATH and outcome in human-papillomavirus-positive oropharyngeal squamous cell carcinoma is particularly needed.
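The abstract does not spell out how a MATH value is computed. The sketch below assumes the definition commonly given for this measure in the authors' earlier description of it: 100 times the scaled median absolute deviation (MAD) of the mutant-allele fractions at tumor-specific mutated loci, divided by their median. The function name and the input values are hypothetical.

```python
from statistics import median


def math_score(mutant_allele_fractions):
    """Mutant-allele tumor heterogeneity, assuming MATH = 100 * MAD / median.

    The MAD is scaled by 1.4826 so that it approximates the standard
    deviation for normally distributed data (an assumption of this sketch).
    """
    center = median(mutant_allele_fractions)
    mad = 1.4826 * median(abs(f - center) for f in mutant_allele_fractions)
    return 100.0 * mad / center


# Hypothetical mutant-allele fractions from one tumor's exome data.
narrow = math_score([0.28, 0.30, 0.31, 0.29, 0.30])
broad = math_score([0.05, 0.12, 0.30, 0.45, 0.22])
```

Under this definition, a tumor whose mutant-allele fractions cluster tightly around one value scores low, and a tumor with widely spread fractions, as expected when several subclones coexist, scores high.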
Conclusions
To our knowledge this study is the first to combine data from hundreds of patients, treated at multiple institutions, to document a relation between intra-tumor heterogeneity and overall survival in any type of cancer. We suggest applying the simply calculated MATH metric of heterogeneity to prospective studies of HNSCC and other tumor types.
In this study, Rocco and colleagues examine data collected as part of a large, multi-institutional study, to validate a measure of tumor heterogeneity called MATH and determine whether intra-tumor heterogeneity is itself related to mortality.
Editors’ Summary
Background
Normally, the cells in human tissues and organs only reproduce (a process called cell division) when new cells are needed for growth or to repair damaged tissues. But sometimes a cell somewhere in the body acquires a genetic change (mutation) that disrupts the control of cell division and allows the cell to grow continuously. As the mutated cell grows and divides, it accumulates additional mutations that allow it to grow even faster and eventually form a lump, or tumor (cancer). Other mutations subsequently allow the tumor to spread around the body (metastasize) and destroy healthy tissues. Tumors can arise anywhere in the body (there are more than 200 different types of cancer), and about one in three people will develop some form of cancer during their lifetime. Many cancers can now be successfully treated, however, and people often survive for years after a diagnosis of cancer before, eventually, dying from another disease.
Why Was This Study Done?
The gradual acquisition of mutations by tumor cells leads to the formation of subpopulations of cells, each carrying a different set of mutations. This “intra-tumor heterogeneity” can produce tumor subclones that grow particularly quickly, that metastasize aggressively, or that are resistant to cancer treatments. Consequently, researchers have hypothesized that high intra-tumor heterogeneity leads to worse clinical outcomes and have suggested that a simple measure of this heterogeneity would be a useful addition to the cancer staging system currently used by clinicians for predicting the likely outcome (prognosis) of patients with cancer. Here, the researchers investigate whether a measure of intra-tumor heterogeneity called “mutant-allele tumor heterogeneity” (MATH) is related to mortality (death) among patients with head and neck squamous cell carcinoma (HNSCC)—cancers that begin in the cells that line the moist surfaces inside the head and neck, such as cancers of the mouth and the larynx (voice box). MATH is based on whole-exome sequencing (WES) of tumor and matched normal DNA. WES uses powerful DNA-sequencing systems to determine the variations of all the coding regions (exons) of the known genes in the human genome (genetic blueprint).
What Did the Researchers Do and Find?
The researchers obtained clinical and WES data for 305 patients who were treated in 14 institutions, primarily in the US, after diagnosis of HNSCC from The Cancer Genome Atlas, a catalog established by the US National Institutes of Health to map the key genomic changes in major types and subtypes of cancer. They calculated tumor MATH values for the patients from their WES results and retrospectively analyzed whether there was an association between the MATH values and patient survival. Despite the patients having tumors at various subsites and being given different treatments, every 10% increase in MATH value corresponded to an 8.8% increased risk (hazard) of death. Using a previously defined MATH-value cutoff to distinguish high- from low-heterogeneity tumors, compared to patients with low-heterogeneity tumors, patients with high-heterogeneity tumors were more than twice as likely to die (a hazard ratio of 2.2). Other statistical analyses indicated that MATH provided improved prognostic information compared to that provided by established clinical and molecular characteristics and human papillomavirus (HPV) status (HPV-positive HNSCC at some subsites has a better prognosis than HPV-negative HNSCC). In particular, MATH provided prognostic information beyond that provided by standard disease staging among patients with mouth or laryngeal cancers.
What Do These Findings Mean?
By using data from more than 300 patients treated at multiple institutions, these findings validate the use of MATH as a measure of intra-tumor heterogeneity in HNSCC. Moreover, they provide one of the first large-scale demonstrations that intra-tumor heterogeneity is clinically important in the prognosis of any type of cancer. Before the MATH metric can be used in clinical trials or in clinical practice as a prognostic tool, its ability to predict outcomes needs to be tested in prospective studies that examine the relation between MATH and the outcomes of patients with identically treated HNSCC at specific head and neck subsites, that evaluate the use of MATH for prognostication in other tumor types, and that determine the influence of cancer treatments on MATH values. Nevertheless, these findings suggest that MATH should be considered as a biomarker for survival in HNSCC and other tumor types, and raise the possibility that clinicians could use MATH values to decide on the best treatment for individual patients and to choose patients for inclusion in clinical trials.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001786.
The US National Cancer Institute (NCI) provides information about cancer and how it develops and about head and neck cancer (in English and Spanish)
Cancer Research UK, a not-for-profit organization, provides general information about cancer and how it develops, and detailed information about head and neck cancer; the Merseyside Regional Head and Neck Cancer Centre provides patient stories about HNSCC
Wikipedia provides information about tumor heterogeneity, and about whole-exome sequencing (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
Information about The Cancer Genome Atlas is available
A PLOS Blog entry by Jessica Wapner explains more about MATH
doi:10.1371/journal.pmed.1001786
PMCID: PMC4323109  PMID: 25668320
22.  Rarity of Somatic Mutation and Frequency of Normal Sequence Variation Detected in Sporadic Colon Adenocarcinoma Using High-Throughput cDNA Sequencing 
We performed high-throughput cDNA sequencing in colorectal adenocarcinoma and matching normal colorectal epithelium. All six hundred three genes in the UCSC database that were expressed in colon cancers and contained open reading frames of 1000 nucleotides or less were selected for study (total basepairs/bp, 366,686). 304,350 of these 366,686 bp (83.0%) were amplified and sequenced successfully. Seventy-eight sequence variants present in germline (i.e. normal) as well as matching somatic (i.e. tumor) DNA were discovered, yielding a frequency of 1 variant per 3,902 bp. Fifty-one of these sequence variants were homozygous (26 synonymous, 25 non-synonymous), while 27 were heterozygous (11 synonymous, 16 non-synonymous). Cancer tissue contained only one sequence-altered allele of the gene ATP50, which was present heterozygously alongside the wild-type allele in matching normal epithelium. Despite this relatively large number of bp and genes sequenced, no somatic mutations unique to tumor were found. High-throughput cDNA sequencing is a practical approach for detecting novel sequence variations and alterations in human tumors, such as those of the colon.
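The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The variable names are ours; all numbers are taken from the abstract itself.

```python
# Counts reported in the abstract.
total_bp = 366_686      # base pairs selected for study
sequenced_bp = 304_350  # base pairs amplified and sequenced successfully
shared_variants = 78    # variants present in both normal and tumor DNA

# Fraction sequenced and variant density.
percent_sequenced = 100 * sequenced_bp / total_bp
bp_per_variant = sequenced_bp / shared_variants

# Breakdown of the variants by zygosity (synonymous + non-synonymous).
homozygous = 26 + 25
heterozygous = 11 + 16
```

The rounded results agree with the reported 83.0% and 1 variant per 3,902 bp, and the zygosity counts (51 and 27) add up to the 78 variants.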
PMCID: PMC2287164  PMID: 18389087
24.  Analysis of Gene Expression Profiles in Leaf Tissues of Cultivated Peanuts and Development of EST-SSR Markers and Gene Discovery 
Peanut is vulnerable to a range of foliar diseases such as spotted wilt caused by Tomato spotted wilt virus (TSWV), early (Cercospora arachidicola) and late (Cercosporidium personatum) leaf spots, southern stem rot (Sclerotium rolfsii), and sclerotinia blight (Sclerotinia minor). In this study, we report the generation of 17,376 peanut expressed sequence tags (ESTs) from leaf tissues of a peanut cultivar (Tifrunner, resistant to TSWV and leaf spots) and a breeding line (GT-C20, susceptible to TSWV and leaf spots). After trimming vector and discarding low-quality sequences, a total of 14,432 high-quality ESTs were selected for further analysis and deposition to GenBank. Sequence clustering resulted in 6,888 unique ESTs composed of 1,703 tentative consensus (TC) sequences and 5,185 singletons. A large number of ESTs (5,717) representing genes of unknown function were also identified. Among the unique sequences, 856 EST-SSRs were identified. A total of 290 new EST-based SSR markers were developed and examined for amplification and polymorphism in cultivated peanut and wild species. Resequencing of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the SSR regions. In addition, a few INDEL mutations and substitutions were observed in the regions flanking the microsatellites. Some defense-related transcripts were also identified, such as putative oxalate oxidase (EU024476) and NBS-LRR domains. The EST data in this study provide a new source of information for gene discovery and development of SSR markers in cultivated peanut. A total of 16,931 ESTs have been deposited in the NCBI GenBank database with accession numbers ES751523 to ES768453.
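The abstract does not state which criteria were used to call an EST-SSR. As a minimal sketch of how simple sequence repeats might be located in an EST, the following uses a regular expression for di- and tri-nucleotide motifs repeated at least five times; the function name, the motif lengths, the repeat threshold and the example sequence are all assumptions made for illustration.

```python
import re


def find_ssrs(sequence, min_repeats=5):
    """Return (start, motif, repeat_count) for di-/tri-nucleotide repeats.

    min_repeats is an assumed threshold, not the one used in the study.
    """
    # One motif of 2-3 bases followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical EST fragment containing an (AT)6 repeat.
example_hits = find_ssrs("GGCATATATATATATCCG")
```

Differences in repeat count between alleles, like those the resequencing revealed, would show up here as different repeat_count values for the same motif at corresponding positions.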
doi:10.1155/2009/715605
PMCID: PMC2703745  PMID: 19584933
25.  NSAIDs Modulate CDKN2A, TP53, and DNA Content Risk for Progression to Esophageal Adenocarcinoma 
PLoS Medicine  2007;4(2):e67.
Background
Somatic genetic CDKN2A, TP53, and DNA content abnormalities are common in many human cancers and their precursors, including esophageal adenocarcinoma (EA) and Barrett's esophagus (BE), conditions for which aspirin and other nonsteroidal anti-inflammatory drugs (NSAIDs) have been proposed as possible chemopreventive agents; however, little is known about the ability of a biomarker panel to predict progression to cancer or about how NSAID use may modulate progression. We aimed to evaluate somatic genetic abnormalities with NSAIDs as predictors of EA in a prospective cohort study of patients with BE.
Methods and Findings
Esophageal biopsies from 243 patients with BE were evaluated at baseline for TP53 and CDKN2A (p16) alterations, tetraploidy, and aneuploidy using sequencing; loss of heterozygosity (LOH); methylation-specific PCR; and flow cytometry. At 10 y, all abnormalities, except CDKN2A mutation and methylation, contributed to EA risk significantly by univariate analysis, ranging from 17p LOH (relative risk [RR] = 10.6; 95% confidence interval [CI] 5.2–21.3, p < 0.001) to 9p LOH (RR = 2.6; 95% CI 1.1–6.0, p = 0.03). A panel of abnormalities including 17p LOH, DNA content tetraploidy and aneuploidy, and 9p LOH was the best predictor of EA (RR = 38.7; 95% CI 10.8–138.5, p < 0.001). Patients with no baseline abnormality had a 12% 10-y cumulative EA incidence, whereas patients with 17p LOH, DNA content abnormalities, and 9p LOH had at least a 79.1% 10-y EA incidence. In patients with zero, one, two, or three baseline panel abnormalities, there was a significant trend toward EA risk reduction among NSAID users compared to nonusers (p = 0.01). The strongest protective effect was seen in participants with multiple genetic abnormalities, with NSAID nonusers having an observed 10-y EA risk of 79%, compared to 30% for NSAID users (p < 0.001).
Conclusions
A combination of 17p LOH, 9p LOH, and DNA content abnormalities provided better EA risk prediction than any single TP53, CDKN2A, or DNA content lesion alone. NSAIDs are associated with reduced EA risk, especially in patients with multiple high-risk molecular abnormalities.
In a ten-year study of people with Barrett's esophagus, nonsteroidal anti-inflammatory drugs were associated with reduced risk of esophageal adenocarcinoma, especially in patients with multiple high-risk molecular abnormalities.
Editors' Summary
Background.
Normally, the cells in the human body divide only when extra cells are needed, after an injury, for example. Sometimes, however, cells accumulate genetic changes (mutations) that allow them to divide uncontrollably to form a disorganized mass or tumor. If these altered cells also acquire mutations that allow them to spread around the body, a malignant tumor or cancer results. Scientists have identified numerous genetic changes that occur in tumors and are now investigating whether these molecular abnormalities can be used as “biomarkers” to choose the best treatments for patients, to identify who will benefit from cancer-prevention strategies, to detect cancer early, and to predict which cancers are most likely to become life-threatening. This last application is particularly important for cancers with a well-defined premalignant stage. Because the cells in premalignant tissues have acquired some of the genetic changes required for cancer development, they are more likely to become malignant than normal cells. Barrett's esophagus, for example, is a premalignant disorder of the muscular tube that takes food from the mouth to the stomach. People with Barrett's esophagus are much more likely to develop esophageal cancer than the general population.
Why Was This Study Done?
Esophageal cancer is often incurable by the time it is detected, so it would be helpful to know which people with Barrett's esophagus are most likely to develop esophageal cancer—only 1 in 200 of them develop cancer each year. In this study, the researchers evaluated whether a panel of genetic alterations could identify this subset of patients. They also investigated whether the regular use of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs) affects the risk of developing esophageal cancer in people with Barrett's esophagus—other evidence suggests that NSAIDs may help to prevent several types of cancer, including esophageal cancer.
What Did the Researchers Do and Find?
The researchers took esophageal tissue samples from patients with Barrett's esophagus and looked for alterations in the genes encoding the tumor-suppressor proteins TP53 and CDKN2A. These proteins normally stop cells dividing but are often inactivated in cancer cells by mutation of one of the two gene copies that encode each of them and also loss of the other copy (so-called “loss of heterozygosity” or LOH). The researchers also looked for changes in the cellular DNA content of the samples (tumor cells often contain unusual amounts of DNA) and asked the study participants about their NSAID use before waiting to see which participants developed esophageal cancer. After 10 y, the participants whose tissue samples had LOH of the short arms (p) of Chromosome 17 or 9 (the sites of the genes encoding TP53 and CDKN2A, respectively), or an altered DNA content, were more likely to have developed esophageal cancer than those without these abnormalities; those whose samples contained all three abnormalities had the highest risk of developing esophageal cancer. Overall, just 12% of patients with no abnormalities but nearly 80% of patients with three abnormalities developed esophageal cancer. NSAID use reduced the risk of cancer development in all the participants, but its effect was greatest in those with three genetic abnormalities.
What Do These Findings Mean?
These findings suggest that the combined measurement of 17pLOH, 9pLOH, and cellular DNA content might be a powerful way to identify those patients with Barrett's esophagus who are most likely to develop esophageal cancer. They also suggest that NSAID use is associated with a reduced risk of esophageal cancer, particularly in patients with multiple genetic abnormalities. Because very few participants developed cancer during the study, these results need confirming in more patients. Also, the ability of NSAIDs to prevent the progression of Barrett's esophagus to esophageal cancer needs testing in multicenter randomized trials; the use of the panel of abnormalities described here to identify the people with Barrett's esophagus most at risk of developing esophageal cancer should facilitate such studies.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040067.
CancerQuest information from Emory University (Atlanta, Georgia, United States) on cancer biology, including the role of tumor suppressor proteins
US National Cancer Institute patient and physician information on esophageal cancer and its prevention
MedlinePlus encyclopedia pages on Barrett's esophagus and esophageal cancer
Cancerbackup (UK charity) patient information on esophageal cancer and Barrett's esophagus
doi:10.1371/journal.pmed.0040067
PMCID: PMC1808095  PMID: 17326708
