Introduction. Cell surface proteins are ideal targets for cancer therapy and diagnosis. We have identified a set of more than 3700 genes that code for transmembrane proteins believed to be located at the human cell surface. Methods. We used a high-throughput qPCR system to analyze 573 cell surface protein-coding genes in 12 primary breast tumors, 8 breast cell lines, and 21 normal human tissues, including breast. To better understand the role of these genes in breast tumors, we used a series of bioinformatics strategies to integrate different types of datasets, such as KEGG, protein-protein interaction databases, ONCOMINE, and data from the literature. Results. We found that at least 77 genes are overexpressed in primary breast tumors, and at least 2 of them also have a restricted expression pattern in normal tissues. We found common signaling pathways that may be regulated in breast tumors through the overexpression of these cell surface protein-coding genes. Furthermore, we compared the genes found in this report with other genes associated with features clinically relevant for breast tumorigenesis. Conclusions. The expression profiling generated in this study, together with an integrative bioinformatics analysis, allowed us to identify putative targets for breast tumors.
The era of whole-genome sequencing has revealed that gene copy-number changes caused by duplication and deletion events have important evolutionary, functional, and phenotypic consequences. Recent studies have therefore focused on revealing the extent of variation in copy-number within natural populations of humans and other species. These studies have found a large number of copy-number variants (CNVs) in humans, many of which have been shown to have clinical or evolutionary importance. For the most part, these studies have failed to detect an important class of gene copy-number polymorphism: gene duplications caused by retrotransposition, which result in a new intron-less copy of the parental gene being inserted into a random location in the genome. Here we describe a computational approach leveraging next-generation sequence data to detect gene copy-number variants caused by retrotransposition (retroCNVs), and we report the first genome-wide analysis of these variants in humans. We find that retroCNVs account for a substantial fraction of gene copy-number differences between any two individuals. Moreover, we show that these variants may often result in expressed chimeric transcripts, underscoring their potential for the evolution of novel gene functions. By locating the insertion sites of these duplicates, we are able to show that retroCNVs have had an important role in recent human adaptation, and we also uncover evidence that positive selection may currently be driving multiple retroCNVs toward fixation. Together these findings imply that retroCNVs are an especially important class of polymorphism, and that future studies of copy-number variation should search for these variants in order to illuminate their potential evolutionary and functional relevance.
Recent studies of human genetic variation have revealed that, in addition to differing at single nucleotide polymorphisms, individuals differ in copy-number at many regions of the genome. These copy-number variants (CNVs) are caused by duplication or deletion events and often affect functional sequences such as genes. Efforts to reveal the functional impact of CNVs have identified many variants increasing the risk of various disorders, and some that are adaptive. However, these studies mostly fail to detect gene duplications caused by retrotransposition, in which an mRNA transcript is reverse-transcribed and reinserted into the genome, yielding a new intron-less gene copy. Here we describe a method leveraging next-generation sequence data to accurately detect gene copy-number variants caused by retrotransposition, or retroCNVs, and apply this method to hundreds of whole-genome sequences from three different human subpopulations. We find that these variants account for a substantial number of gene copy-number differences between individuals, and that gene retrotransposition may often result in both deleterious and beneficial mutations. Indeed, we present evidence that two of these new gene duplications may be adaptive. These results imply that retroCNVs are an especially important class of CNV and should be included in future studies of human copy-number variation.
Chemokines and their receptors are involved in development and cancer progression. The chemokine CXCL12 interacts with its receptor, CXCR4, to promote cellular adhesion, survival, proliferation and migration. The CXCR4 gene is upregulated in several types of cancers, including skin, lung, pancreas, brain and breast tumors. In pancreatic cancer and melanoma, CXCR4 expression is regulated by DNA methylation within its promoter region. In this study we examined the role of cytosine methylation in the regulation of CXCR4 expression in breast cancer cell lines and also correlated the methylation pattern with the clinicopathological aspects of sixty-nine primary breast tumors from a cohort of Brazilian women. RT-PCR showed that the PMC-42, MCF7 and MDA-MB-436 breast tumor cell lines expressed high levels of CXCR4. Conversely, the MDA-MB-435 cell line only expressed CXCR4 after treatment with 5-Aza-CdR, which suggests that CXCR4 expression is regulated by DNA methylation. To confirm this hypothesis, a 184 bp fragment of the CXCR4 gene promoter region was cloned after sodium bisulfite DNA treatment. Sequencing data showed that cell lines that expressed CXCR4 had only 15% of methylated CpG dinucleotides, while the cell line that did not express CXCR4 had a high density of methylation (91%). Loss of DNA methylation in the CXCR4 promoter was detected in 67% of the breast cancers analyzed. The absence of CXCR4 methylation was associated with tumor stage, size, histological grade, lymph node status, ESR1 methylation, CXCL12 methylation, metastasis and patient death. Kaplan-Meier curves demonstrated that patients with an unmethylated CXCR4 promoter had poorer overall survival and disease-free survival. Furthermore, patients with both CXCL12 methylation and unmethylated CXCR4 had shorter overall survival and disease-free survival.
These findings suggest that the DNA methylation status of both CXCR4 and CXCL12 genes could be used as a biomarker for prognosis in breast cancer.
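The percentage of methylated CpG dinucleotides reported above from bisulfite sequencing can be illustrated with a minimal Python sketch. It relies on the standard readout of bisulfite conversion (an unmethylated C reads as T, a methylated C still reads as C). The reference sequence, the clone reads, and the function names are hypothetical; they are not the CXCR4 promoter or data from this study, and the clones are assumed to be already aligned to the reference without gaps.

```python
def cpg_positions(reference: str) -> list[int]:
    """Return 0-based indices of the C in every CpG dinucleotide."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_fraction(reference: str, clones: list[str]) -> float:
    """Fraction of CpG cytosines read as C (methylated) across all clones.

    After bisulfite conversion and PCR, an unmethylated C reads as T,
    while a methylated C still reads as C.
    """
    sites = cpg_positions(reference)
    methylated = 0
    informative = 0
    for clone in clones:
        read = clone.upper()
        for i in sites:
            if read[i] == "C":
                methylated += 1
                informative += 1
            elif read[i] == "T":
                informative += 1
            # any other base is treated as a sequencing error and skipped
    return methylated / informative if informative else 0.0


# Hypothetical 12 bp reference with three CpG sites (illustration only).
reference = "ACGTTCGAACGT"
clones = [
    "ACGTTTGAATGT",  # 1 of 3 CpG sites still reads as C
    "ATGTTTGAATGT",  # 0 of 3
    "ACGTTCGAACGT",  # 3 of 3
]
fraction = methylation_fraction(reference, clones)
print(f"{fraction:.1%} of CpG sites methylated")
```

Pooling all CpG sites over all clones gives a single density figure of the kind quoted in the abstract; a per-site or per-clone breakdown would need only a small change to the loop.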
MAGE-C1/CT7 encodes a cancer/testis antigen (CTA), is located on chromosomal region Xq26–27, and is highly polymorphic in humans. MAGE-C1/CT7 is frequently expressed in multiple myeloma (MM) and may be a potential target for immunotherapy in this still incurable disease. MAGE-C1/CT7 expression is restricted to malignant plasma cells, and it has been suggested that MAGE-C1/CT7 might play a pathogenic role in MM; however, the exact function of this protein in the pathophysiology of MM is not yet understood. Our objectives were (1) to clarify the role of MAGE-C1/CT7 in the control of cellular proliferation and cell cycle in myeloma and (2) to evaluate the impact of silencing MAGE-C1/CT7 on myeloma cells treated with bortezomib. Myeloma cell line SKO-007 was transduced for stable expression of shRNA-MAGE-C1/CT7. Downregulation of MAGE-C1/CT7 was confirmed by real time quantitative PCR and western blot. Functional assays included cell proliferation, cell invasion, cell cycle analysis and apoptosis. Western blot showed a 70–80% decrease in MAGE-C1/CT7 protein expression in inhibited cells (shRNA-MAGE-C1/CT7) when compared with controls. Functional assays did not indicate a difference in cell proliferation and DNA synthesis when inhibited cells were compared with controls. However, we found a decreased percentage of cells in the G2/M phase of the cell cycle among inhibited cells, but not in the controls (p<0.05). When myeloma cells were treated with bortezomib, we observed a 48% reduction of cells in the G2/M phase among inhibited cells, while controls showed 13% (empty vector) and 9% (ineffective shRNA) reductions, respectively (p<0.01). Furthermore, inhibited cells treated with bortezomib showed an increased percentage of apoptotic cells (Annexin V+/PI-) in comparison with bortezomib-treated controls (p<0.001). We found that MAGE-C1/CT7 protects SKO-007 cells against bortezomib-induced apoptosis.
Therefore, we could speculate that MAGE-C1/CT7 gene therapy could be a strategy for future therapies in MM, in particular in combination with proteasome inhibitors.
1) To correlate the methylation status of the O6-methylguanine-DNA-methyltransferase (MGMT) promoter to its gene and protein expression levels in glioblastoma and 2) to determine the most reliable method for using MGMT to predict the response to adjuvant therapy in patients with glioblastoma.
The MGMT gene is epigenetically silenced by promoter hypermethylation in gliomas, and this modification has emerged as a relevant predictor of therapeutic response.
Fifty-one cases of glioblastoma were analyzed for MGMT promoter methylation by methylation-specific PCR and pyrosequencing, gene expression by real time polymerase chain reaction, and protein expression by immunohistochemistry.
MGMT promoter methylation was found in 43.1% of glioblastoma by methylation-specific PCR and 38.8% by pyrosequencing. A low level of MGMT gene expression was correlated with positive MGMT promoter methylation (p = 0.001). However, no correlation was found between promoter methylation and MGMT protein expression (p = 0.297). The mean survival time of glioblastoma patients submitted to adjuvant therapy was significantly higher among patients with MGMT promoter methylation (log rank = 0.025 by methylation-specific PCR and 0.004 by pyrosequencing), and methylation was an independent predictive factor that was associated with improved prognosis by multivariate analysis.
DISCUSSION AND CONCLUSION:
MGMT promoter methylation status was a more reliable predictor of susceptibility to adjuvant therapy and prognosis of glioblastoma than were MGMT protein or gene expression levels. Methylation-specific polymerase chain reaction and pyrosequencing methods were both sensitive methods for determining MGMT promoter methylation status using DNA extracted from frozen tissue.
Glioblastoma; MGMT promoter methylation; MGMT gene; MGMT protein; Prognosis
Although patterns of somatic alterations have been reported for tumor genomes, little is known about how they compare with alterations present in non-tumor genomes. A comparison of the two would be crucial to better characterize the genetic alterations driving tumorigenesis. We sequenced the genomes of a lymphoblastoid (HCC1954BL) and a breast tumor (HCC1954) cell line derived from the same patient and compared the somatic alterations present in both. The lymphoblastoid genome presents a comparable number and similar spectrum of nucleotide substitutions to that found in the tumor genome. However, a significant difference in the ratio of non-synonymous to synonymous substitutions was observed between the two genomes (P = 0.031). Protein–protein interaction analysis revealed that mutations in the tumor genome preferentially affect hub-genes (P = 0.0017) and are co-selected to present synergistic functions (P < 0.0001). KEGG analysis showed that in the tumor genome most mutated genes were organized into signaling pathways related to tumorigenesis. No such organization or synergy was observed in the lymphoblastoid genome. Our results indicate that endogenous mutagens and replication errors can generate the overall number of mutations required to drive tumorigenesis and that it is the combination rather than the frequency of mutations that is crucial to complete tumorigenic transformation.
To identify potential tumor suppressor genes, genome-wide data from exome and transcriptome sequencing were combined to search for genes with loss of heterozygosity and allele-specific expression. The analysis was conducted on the breast cancer cell line HCC1954, and a lymphoblast cell line from the same individual, HCC1954BL.
By comparing exome sequences from the two cell lines, we identified loss of heterozygosity events at 403 genes in HCC1954 and at one gene in HCC1954BL. The combination of exome and transcriptome sequence data also revealed 86 and 50 genes with allele specific expression events in HCC1954 and HCC1954BL, which comprise 5.4% and 2.6% of genes surveyed, respectively. Many of these genes identified by loss of heterozygosity and allele-specific expression are known or putative tumor suppressor genes, such as BRCA1, MSH3 and SETX, which participate in DNA repair pathways.
Our results demonstrate that the combined application of high throughput sequencing to exome and allele-specific transcriptome analysis can reveal genes with known tumor suppressor characteristics, and a shortlist of novel candidates for the study of tumor suppressor activities.
Technological advances have enabled a better characterization of all the genetic alterations in tumors. A picture that emerges is that tumor cells are much more genetically heterogeneous than originally expected. Thus, a critical issue in cancer genomics is the identification of the genetic alterations that drive the genesis of a tumor. Recently, a systems biology approach has been used to characterize such alterations and find associations between them and the process of gliomagenesis. Here, we discuss some implications of this strategy for the development of new therapeutic and diagnostic protocols for cancer.
Imprinted inactivation of the paternal X chromosome in marsupials is the primordial mechanism of dosage compensation for X-linked genes between females and males in Therians. In Eutherian mammals, X chromosome inactivation (XCI) evolved into a random process in cells from the embryo proper, where either the maternal or paternal X can be inactivated. However, species like mouse and bovine maintained imprinted XCI exclusively in extraembryonic tissues. The existence of imprinted XCI in humans remains controversial, with studies based on the analyses of only one or two X-linked genes in different extraembryonic tissues. Here we readdress this issue in human term placenta by performing a robust analysis of allele-specific expression of 22 X-linked genes, including XIST, using 27 SNPs in transcribed regions. We show that XCI is random in human placenta, and that this organ is arranged in relatively large patches of cells with either maternal or paternal inactive X. In addition, this analysis indicated heterogeneous maintenance of gene silencing along the inactive X, which combined with the extensive mosaicism found in placenta, can explain the lack of agreement among previous studies. Our results illustrate the differences of XCI mechanism between humans and mice, and highlight the importance of addressing the issue of imprinted XCI in other species in order to understand the evolution of dosage compensation in placental mammals.
CXCL12 is a chemokine that is constitutively expressed in many organs and tissues. CXCL12 promoter hypermethylation has been detected in primary breast tumours and contributes to their metastatic potential. It has been shown that the oestrogen receptor α (ESR1) gene can also be silenced by DNA methylation. In this study, we used methylation-specific PCR (MSP) to analyse the methylation status in two regions of the CXCL12 promoter and ESR1 in tumour cell lines and in primary breast tumour samples, and correlated our results with clinicopathological data.
First, we analysed CXCL12 expression in breast tumour cell lines by RT-PCR. We also used 5-aza-2'-deoxycytidine (5-aza-CdR) treatment and DNA bisulphite sequencing to study the promoter methylation for a specific region of CXCL12 in breast tumour cell lines. We evaluated CXCL12 and ESR1 methylation in primary tumour samples by methylation-specific PCR (MSP). Finally, promoter hypermethylation of these genes was analysed using Fisher's exact test and correlated with clinicopathological data using the Chi square test, Kaplan-Meier survival analysis and Cox regression analysis.
CXCL12 promoter hypermethylation in the first region (island 2) and second region (island 4) was correlated with lack of expression of the gene in tumour cell lines. In the primary tumours, island 2 was hypermethylated in 14.5% of the samples and island 4 was hypermethylated in 54% of the samples. The ESR1 promoter was hypermethylated in 41% of breast tumour samples. In addition, the levels of ERα protein expression diminished with increased frequency of ESR1 methylation (p < 0.0001). This study also demonstrated that CXCL12 island 4 and ESR1 methylation occur simultaneously at a high frequency (p = 0.0220).
This is the first study showing a simultaneous involvement of epigenetic regulation for both CXCL12 and ESR1 genes in Brazilian women. The methylation status of both genes was significantly correlated with histologically advanced disease, the presence of metastases and death. Therefore, the methylation pattern of these genes could be used as a molecular marker for the prediction of breast cancer outcome.
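The Kaplan-Meier survival analysis named in the methods above can be illustrated with a minimal Python sketch of the product-limit estimator. The follow-up times, event flags, and the function name `kaplan_meier` are hypothetical and are not data or code from this study; a real analysis would also add a log-rank test and confidence intervals.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with >= 1 event.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if censored
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        leaving = 0
        # group all patients sharing the same follow-up time
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            leaving += 1
            i += 1
        if deaths:
            # product-limit step: multiply by the fraction surviving this time
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months for five patients (illustration only).
months = [6, 12, 18, 24, 30]
died = [1, 0, 1, 0, 1]
curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

Censored patients (event flag 0) leave the risk set without lowering the curve, which is why the estimate drops only at months 6, 18 and 30 in this toy series.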
Treacher Collins syndrome (TCS) is an autosomal dominant craniofacial disorder caused by frameshift deletions or duplications in the TCOF1 gene. These mutations cause premature termination codons, which are predicted to lead to mRNA degradation by nonsense mediated mRNA decay (NMD). Haploinsufficiency of the gene product (treacle) during embryonic development is the proposed molecular mechanism underlying TCS. However, it is still unknown if TCOF1 expression levels are decreased in post-embryonic human cells.
We have estimated TCOF1 transcript levels through real time PCR in mRNA obtained from leucocytes and mesenchymal cells of TCS patients (n = 23) and controls (n = 18). Mutational screening and analysis of NMD were performed by direct sequencing of gDNA and cDNA, respectively.
All 23 patients had typical clinical features of the syndrome, and pathogenic mutations were detected in 19 of them. We demonstrated that the expression level of TCOF1 is 18-31% lower in patients than in controls (p < 0.05), even if we exclude the patients in whom we did not detect the pathogenic mutation. We also observed that the mutant allele is usually less abundant than the wild-type allele in mesenchymal cells.
This is the first study to report decreased expression levels of TCOF1 in adult human cells from TCS patients, but it is still unknown whether this finding is associated with any phenotype in adulthood. In addition, as we demonstrated that alleles harboring the pathogenic mutations have lower expression, we herein corroborate the current hypothesis of NMD of the mutant transcript as the explanation for diminished levels of TCOF1 expression. Further, considering that TCOF1 deficiency in adult cells could be associated with pathologic clinical findings, it will be important to verify whether TCS patients have an impairment in adult stem cell properties, as this could reduce the efficiency of plastic surgery results during rehabilitation of these patients.
Head and neck squamous cell carcinoma (HNSCC) is a heterogeneous disease affecting the epithelium of the oral cavity, pharynx and larynx. Most patients are diagnosed at late stages of the disease, and no sensitive and specific predictors of aggressive behavior have been identified yet. Therefore, early detection and prognostic biomarkers are highly desirable for a more rational management of the disease. Hypermethylation of CpG islands is one of the most important epigenetic mechanisms that leads to gene silencing in tumors and has been extensively used for the identification of biomarkers. In this study, we combined rapid subtractive hybridization and microarray analysis in a hierarchical manner to select genes that are putatively reactivated by the demethylating agent 5-aza-2′-deoxycytidine (5Aza-dC) in HNSCC cell lines (FaDu, UM-SCC-14A, UM-SCC-17A, UM-SCC-38A). This combined analysis identified 78 genes, 35 of which were reactivated in at least 2 cell lines and harbored a CpG island at their 5′ region. Reactivation of 3 of these 35 genes (CRABP2, MX1, and SLC15A3) was confirmed by quantitative real-time polymerase chain reaction (PCR; fold change, ≥3). Bisulfite sequencing of their CpG islands revealed that they are indeed differentially methylated in the HNSCC cell lines. Using methylation-specific PCR, we detected a higher frequency of CRABP2 (58.1% for region 1) and MX1 (46.3%) hypermethylation in primary HNSCC when compared with lymphocytes from healthy individuals. Finally, absence of the CRABP2 protein was associated with decreased disease-free survival rates, supporting a potential use of CRABP2 expression as a prognostic biomarker for HNSCC patients.
ADAM33 protein is a member of the family of transmembrane glycoproteins composed of multidomains. ADAM family members have different activities, such as proteolysis and adhesion, making them good candidates to mediate the extracellular matrix remodelling and changes in cellular adhesion that characterise certain pathologies and cancer development. It was reported that one family member, ADAM23, is down-regulated by promoter hypermethylation. This seems to correlate with tumour progression and metastasis in breast cancer. In this study, we explored the involvement of ADAM33, another ADAM family member, in breast cancer.
First, we analysed ADAM33 expression in breast tumour cell lines by RT-PCR and western blotting. We also used 5-aza-2'-deoxycytidine (5-aza-dCR) treatment and DNA bisulphite sequencing to study the promoter methylation of ADAM33 in breast tumour cell lines. We evaluated ADAM33 methylation in primary tumour samples by methylation-specific PCR (MSP). Finally, ADAM33 promoter hypermethylation was correlated with clinicopathological data using the chi-square test and Fisher's exact test.
The expression analysis of ADAM33 in breast tumour cell lines by RT-PCR revealed gene silencing in 65% of tumour cell lines. The corresponding lack of ADAM33 protein was confirmed by western blotting. We also used 5-aza-2'-deoxycytidine (5-aza-dCR) demethylation and bisulphite sequencing methodologies to confirm that gene silencing is due to ADAM33 promoter hypermethylation. Using MSP, we detected ADAM33 promoter hypermethylation in 40% of primary breast tumour samples. The methylation pattern was not significantly associated with histological grade; tumour stage (TNM); tumour size; ER, PR or ERBB2 status; lymph node status; metastasis or recurrence. Methylation frequency in invasive lobular carcinoma (ILC) was 76.2% compared with 25.5% in invasive ductal carcinoma (IDC), and this difference was statistically significant (p = 0.0002).
ADAM33 gene silencing may be related to the discohesive histological appearance of ILCs. We suggest that ADAM33 promoter methylation may be a useful molecular marker for differentiating ILC and IDC.
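The comparison of methylation frequency between ILC and IDC can be illustrated with a minimal Python sketch of a two-sided Fisher's exact test on a 2x2 table. The abstract reports only percentages, so the counts below are hypothetical, chosen merely to be consistent with 76.2% and 25.5%; the printed p-value should therefore not be expected to reproduce the reported one. The function name is also illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical counts (methylated, unmethylated); not taken from the study.
ilc = (16, 5)    # 16/21 = 76.2%
idc = (12, 35)   # 12/47 = 25.5%
p_value = fisher_exact_two_sided(ilc[0], ilc[1], idc[0], idc[1])
print(f"two-sided p = {p_value:.2g}")
```

An exact test is a natural choice here because one of the groups is small, where the chi-square approximation is less reliable.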
The potency of the immune response has still to be harnessed effectively to combat human cancers. However, the discovery of T-cell targets in melanomas and other tumors has raised the possibility that cancer vaccines can be used to induce a therapeutically effective immune response against cancer. The targets, cancer-testis (CT) antigens, are immunogenic proteins preferentially expressed in normal gametogenic tissues and different histological types of tumors. Therapeutic cancer vaccines directed against CT antigens are currently in late-stage clinical trials testing whether they can delay or prevent recurrence of lung cancer and melanoma following surgical removal of primary tumors. CT antigens constitute a large, but ill-defined, family of proteins that exhibit a remarkably restricted expression. Currently, there is a considerable amount of information about these proteins, but the data are scattered through the literature and in several bioinformatic databases. The database presented here, CTdatabase (http://www.cta.lncc.br), unifies this knowledge to facilitate both the mining of the existing deluge of data, and the identification of proteins alleged to be CT antigens, but that do not have their characteristic restricted expression pattern. CTdatabase is more than a repository of CT antigen data, since all the available information was carefully curated and annotated with most data being specifically processed for CT antigens and stored locally. Starting from a compilation of known CT antigens, CTdatabase provides basic information including gene names and aliases, RefSeq accession numbers, genomic location, known splicing variants, gene duplications and additional family members. Gene expression at the mRNA level in normal and tumor tissues has been collated from publicly available data obtained by several different technologies. 
Manually curated data related to mRNA and protein expression, and antigen-specific immune responses in cancer patients are also available, together with links to PubMed for relevant CT antigen articles.
Leishmania (Leishmania) amazonensis infection in man results in a clinical spectrum of disease manifestations ranging from cutaneous to mucosal or visceral involvement. In the present study, we have investigated the genetic variability of 18 L. amazonensis strains isolated in northeastern Brazil from patients with different clinical manifestations of leishmaniasis. Parasite DNA was analyzed by sequencing of the ITS flanking the 5.8 S subunit of the ribosomal RNA genes, by RAPD and SSR-PCR and by PFGE followed by hybridization with gene-specific probes.
ITS sequencing and PCR-based methods revealed genetic heterogeneity among the L. amazonensis isolates examined and molecular karyotyping also showed variation in the chromosome size of different isolates. Unrooted genetic trees separated strains into different groups.
These results indicate that L. amazonensis strains isolated from leishmaniasis patients from northeastern Brazil are genetically diverse; however, no correlation between genetic polymorphism and phenotype was found.
Analysis of a catalog of S-AS pairs in the human and mouse genomes revealed several putative roles for natural antisense transcripts and showed that some are artifacts of cDNA library construction.
A significant number of genes in mammalian genomes are being found to have natural antisense transcripts (NATs). These sense-antisense (S-AS) pairs are believed to be involved in several cellular phenomena.
Here, we generated a catalog of S-AS pairs occurring in the human and mouse genomes by analyzing different sources of expressed sequences available in the public domain plus 122 massively parallel signature sequencing (MPSS) libraries from a variety of human and mouse tissues. Using this dataset of almost 20,000 S-AS pairs in both genomes we investigated, in a computational and experimental way, several putative roles that have been assigned to NATs, including gene expression regulation. Furthermore, these global analyses allowed us to better dissect and propose new roles for NATs. Surprisingly, we found that a significant fraction of NATs are artifacts produced by genomic priming during cDNA library construction.
We propose an evolutionary and functional model in which alternative polyadenylation and retroposition account for the origin of a significant number of functional S-AS pairs in mammalian genomes.
This work reports the results of analyses of three complete mycoplasma genomes, a pathogenic (7448) and a nonpathogenic (J) strain of the swine pathogen Mycoplasma hyopneumoniae and a strain of the avian pathogen Mycoplasma synoviae; the genome sizes of the three strains were 920,079 bp, 897,405 bp, and 799,476 bp, respectively. These genomes were compared with other sequenced mycoplasma genomes reported in the literature to examine several aspects of mycoplasma evolution. Strain-specific regions, including integrative and conjugal elements, and genome rearrangements and alterations in adhesin sequences were observed in the M. hyopneumoniae strains, and all of these were potentially related to pathogenicity. Genomic comparisons revealed that reduction in genome size implied loss of redundant metabolic pathways, with maintenance of alternative routes in different species. Horizontal gene transfer was consistently observed between M. synoviae and Mycoplasma gallisepticum. Our analyses indicated a likely transfer event of hemagglutinin-coding DNA sequences from M. gallisepticum to M. synoviae.
It has been proposed that human colorectal tumors can be classified into two groups: one in which methylation is rare, and another with methylation of several loci associated with a “CpG island methylated phenotype (CIMP),” characterized by preferential proximal location in the colon, but otherwise poorly defined. There is considerable overlap between this putative methylator phenotype and the well-known mutator phenotype associated with microsatellite instability (MSI). We have examined hypermethylation of the promoter region of five genes (DAPK, MGMT, hMLH1, p16INK4a, and p14ARF) in 106 primary colorectal cancers. A graph depicting the frequency of methylated loci in the series of tumors showed a continuous, monotonically decreasing distribution quite different from the previously claimed discontinuity. We observed a significant association between the presence of three or more methylated loci and the proximal location of the tumors. However, if we remove from analysis the tumors with hMLH1 methylation or those with MSI, the significance vanishes, suggesting that the association between multiple methylations and proximal location was indirect due to the correlation with MSI. Thus, our data do not support the independent existence of the so-called methylator phenotype and suggest that it rather may represent a statistical artifact caused by confounding of associations.
CpG methylation; phenotype; colorectal; cancer; microsatellite instability
Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS) are powerful techniques for gene expression analysis. A crucial step in analyzing SAGE and MPSS data is the assignment of experimentally obtained tags to a known transcript. However, tag to transcript assignment is not a straightforward process since alternative tags for a given transcript can also be experimentally obtained. Here, we have evaluated the impact of Single Nucleotide Polymorphisms (SNPs) on the generation of alternative SAGE and MPSS tags. This was achieved through the construction of a reference database of SNP-associated alternative tags, which has been integrated with SAGE Genie. A total of 2020 SNP-associated alternative tags were catalogued in our reference database and at least one SNP-associated alternative tag was observed for ∼8.6% of all known human genes. A significant fraction (61.9%) of these alternative tags matched a list of experimentally obtained tags, validating their existence. In addition, the origin of four out of five SNP-associated alternative MPSS tags was experimentally confirmed through the use of the GLGI-MPSS protocol (Generation of Long cDNA fragments for Gene Identification). The availability of our SNP-associated alternative tag database will certainly improve the interpretation of SAGE and MPSS experiments.
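How a SNP can give rise to an alternative tag can be illustrated with a minimal Python sketch. It assumes the usual SAGE convention, in which the tag is the short sequence immediately 3′ of the 3′-most NlaIII site (CATG) in the transcript; the 10 bp tag length, the toy transcript, the SNP positions, and the function names are all hypothetical and are not taken from the database described above.

```python
ANCHOR = "CATG"   # NlaIII recognition site, the usual SAGE anchoring enzyme
TAG_LEN = 10      # tag length assumed for this illustration


def sage_tag(transcript: str):
    """Return the tag following the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LEN]
    # simplification: a site too close to the 3' end yields no tag
    return tag if len(tag) == TAG_LEN else None


def apply_snp(transcript: str, position: int, alt: str) -> str:
    """Return the transcript with the base at 0-based `position` replaced by `alt`."""
    return transcript[:position] + alt + transcript[position + 1:]


# Toy transcript with two CATG sites (illustration only).
transcript = "TTCATG" + "AAGGCCTTAG" + "CATG" + "GTCCAGTACT" + "GACATCAAA"

reference_tag = sage_tag(transcript)
# SNP inside the tag itself: the tag sequence changes
tag_snp_in_tag = sage_tag(apply_snp(transcript, 24, "G"))
# SNP destroying the 3'-most CATG: the tag shifts to the upstream site
tag_site_lost = sage_tag(apply_snp(transcript, 19, "A"))

print(reference_tag, tag_snp_in_tag, tag_site_lost)
```

Both cases produce a tag that would not match the reference transcript in a naive lookup, which is the situation a catalogue of SNP-associated alternative tags is meant to resolve.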
Massively Parallel Signature Sequencing (MPSS) is a powerful technique for genome-wide gene expression analysis, which, similar to SAGE, relies on the production of short tags proximal to the 3′ end of transcripts. A single MPSS experiment can generate over 10^7 tags, providing a 10-fold coverage of the transcripts expressed in a human cell. A significant fraction of MPSS tags cannot be assigned to known transcripts (orphan tags) and are likely to be derived from transcripts expressed at very low levels (∼1 copy per cell). In order to explore the potential of MPSS for the characterization of the human transcriptome, we have adapted the GLGI protocol (Generation of Longer cDNA fragments from SAGE tags for Gene Identification) to convert MPSS tags into their corresponding 3′ cDNA fragments. GLGI-MPSS was applied to 83 orphan tags and 41 cDNA fragments were obtained. The analysis of these 41 fragments allowed the identification of novel transcripts, alternative tags generated from polymorphic and alternatively spliced transcripts, as well as the detection of artefactual MPSS tags. A systematic large-scale analysis of the genome by MPSS, in combination with the use of GLGI-MPSS protocol, will certainly provide a complementary approach to generate the complete catalog of human transcripts.
Brazil was heralded for completion of the first genome sequence of a plant pathogen following the development of a virtual research center — a collaborative network of laboratories throughout the state of São Paulo, drawing on the expertise of a dispersed and diverse scientific community and on investment from both the government and the private sector. Strategies key to the success of this model are discussed here in the context of continuing collaborative scientific endeavors in both developed and developing countries.
Based on the analysis of the drafts of the human genome sequence, it is being speculated that our species may possess an unexpectedly low number of genes. The quality of the drafts, the impossibility of accurate gene prediction and the lack of sufficient transcript sequence data, however, render such speculations very premature. The complexity of human gene structure requires additional and extensive experimental verification of transcripts that may result in major revisions of these early estimates of the number of human genes.
A cosmid library was made of the 2.7 Mb genome of the Gram-negative plant pathogenic bacterium Xylella fastidiosa and analysed by hybridisation mapping. Clones taken from the library as well as genomic restriction fragments of rarely cutting enzymes were used as probes. The latter served as a backbone for ordering the initial map contigs and thus facilitated gap closure. Also, the co-linearity of the cosmid map, and thus the eventual sequence, could be confirmed by this process. A subset of the eventual clone coverage was distributed to the Brazilian X. fastidiosa sequencing network. Data from this effort confirmed, more quantitatively, the initial results from the hybridisation mapping: the redundancy of clone coverage ranged between 0 and 45-fold across the genome, while the average was 15-fold by experimental design. Reasons for this not unexpected fluctuation and for the actual gaps are discussed, as is the use of this effect for functional studies.
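The relationship between library size and average clone redundancy can be illustrated with a minimal Python sketch. The genome size (2.7 Mb) and the 15-fold design redundancy come from the text; the 40 kb cosmid insert size is an assumption, as is the idealized model in which clones fall uniformly at random along the genome. The function names are illustrative.

```python
import math


def clones_needed(genome_bp: int, insert_bp: int, redundancy: float) -> int:
    """Number of clones N giving average redundancy c = N * L / G."""
    return math.ceil(redundancy * genome_bp / insert_bp)


def prob_uncovered(redundancy: float) -> float:
    """Poisson probability that a given base is covered by no clone at all."""
    return math.exp(-redundancy)


GENOME_BP = 2_700_000   # from the text
INSERT_BP = 40_000      # assumed typical cosmid insert size
REDUNDANCY = 15         # average redundancy by experimental design

n_clones = clones_needed(GENOME_BP, INSERT_BP, REDUNDANCY)
p_gap = prob_uncovered(REDUNDANCY)
print(f"clones for {REDUNDANCY}-fold coverage: {n_clones}")
print(f"probability that a base is uncovered under the uniform model: {p_gap:.1e}")
```

Under this idealized model, uncovered positions at 15-fold redundancy would be very rare, so the observed range of 0 to 45-fold is a departure from uniform sampling of the kind the abstract sets out to explain.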