We propose a Poisson-compound gamma approach for species richness estimation. Based on the denseness and nesting properties of the gamma mixture, we fix the shape parameter of each gamma component at a unified value, and estimate the mixture using nonparametric maximum likelihood. A least-squares crossvalidation procedure is proposed for the choice of the common shape parameter. The performance of the resulting estimator of the species richness, N, is assessed using numerical studies and genomic data.
Crossvalidation; Nesting property of gamma mixtures; Nonparametric maximum likelihood estimation; Poisson-compound gamma model; Species richness estimation
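As a point of reference for the model named above, the display below sketches one standard way to write a Poisson-compound gamma mixture with a common shape parameter. The symbols (α for the shared shape, θ_j and π_j for the component scales and weights, k for the number of components, D for the number of observed species) and the Horvitz-Thompson-type form of the estimator are assumptions made for illustration; they are not taken from the abstract.

```latex
\begin{align*}
X_i \mid \lambda_i &\sim \mathrm{Poisson}(\lambda_i), \qquad
\lambda_i \sim \sum_{j=1}^{k} \pi_j \,\mathrm{Gamma}(\alpha, \theta_j) \\
p(x) &= \sum_{j=1}^{k} \pi_j \,
  \frac{\Gamma(x+\alpha)}{\Gamma(\alpha)\, x!}\,
  \frac{\theta_j^{x}}{(1+\theta_j)^{x+\alpha}}, \qquad x = 0, 1, 2, \ldots \\
\hat{N} &= \frac{D}{1 - \hat{p}(0)}
\end{align*}
```

Each component of p(x) is a negative binomial probability, which is what a Poisson count with a gamma-distributed rate reduces to. Under this sketch, species with x = 0 are unobserved, so the mixture would be fitted to the zero-truncated counts before plugging the fitted p(0) into the last line.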
The exact positions of nucleosomes along genomic DNA can influence many aspects of chromosome function, yet existing methods for mapping nucleosomes do not provide the necessary single base pair accuracy to determine these positions. Here we develop and apply a new approach for direct mapping of nucleosome centers based on chemical modification of engineered histones. The resulting map locates nucleosome positions genome-wide in unprecedented detail and accuracy. It reveals novel aspects of the in vivo nucleosome organization that are linked to transcription factor binding, RNA polymerase pausing, and the higher order structure of the chromatin fiber.
In Drosophila, post-transcriptional gene silencing occurs when exogenous or endogenous double-stranded RNA (dsRNA) is processed into small interfering RNAs (siRNAs) by Dicer-2 (Dcr-2) in association with a dsRNA-binding protein (dsRBP) cofactor called Loquacious (Loqs-PD). siRNAs are then loaded onto Argonaute-2 (Ago2) by the action of Dcr-2 with another dsRBP cofactor called R2D2. Loaded Ago2 executes the destruction of target RNAs that have sequence complementarity to siRNAs. Although Dcr-2, R2D2, and Ago2 are essential for innate antiviral defense, the mechanism of virus-derived siRNA (vsiRNA) biogenesis and viral target inhibition remains unclear. Here, we characterize the response mechanism mediated by siRNAs against two different RNA viruses that infect Drosophila. In both cases, we show that vsiRNAs are generated by Dcr-2 processing of dsRNA formed during viral genome replication and, to a lesser extent, viral transcription. These vsiRNAs seem to preferentially target viral polyadenylated RNA to inhibit viral replication. Loqs-PD is completely dispensable for silencing of the viruses, in contrast to its role in silencing endogenous targets. Biogenesis of vsiRNAs is independent of both Loqs-PD and R2D2. R2D2, however, is required for sorting and loading of vsiRNAs onto Ago2 and inhibition of viral RNA expression. Direct injection of viral RNA into Drosophila results in replication that is also independent of Loqs-PD. This suggests that triggering of the antiviral pathway is related not to the viral mode of entry but to recognition of intrinsic features of virus RNA. Our results indicate the existence of a vsiRNA pathway that is separate from the endogenous siRNA pathway and is specifically triggered by virus RNA. We speculate that this unique framework might be necessary for a prompt and efficient antiviral response.
The RNA interference (RNAi) pathway utilizes small non-coding RNAs to silence gene expression. In insects, RNAi regulates endogenous genes and functions as an RNA-based immune system against viral infection. Here we have uncovered details of how RNAi is triggered by RNA viruses. Double-stranded RNA (dsRNA) generated as a replication intermediate or from transcription of the RNA virus can be used as substrate for the biogenesis of virus-derived small interfering RNAs (vsiRNAs). Unlike other dsRNAs, virus RNA processing involves Dicer but not its canonical partner protein Loqs-PD. Thus, vsiRNA biogenesis is mechanistically different from biogenesis of endogenous siRNAs or siRNAs derived from other exogenous RNA sources. Our results suggest a specialization of the pathway dedicated to silencing of RNA viruses versus other types of RNAi silencing. The understanding of RNAi mechanisms during viral infection could have implications for the control of insect-borne viruses and the use of siRNAs to treat viral infections in humans.
The human brain possesses a remarkable capacity to interpret and recall novel sounds as spoken language. These linguistic abilities arise from complex processing spanning a widely distributed cortical network and are characterized by marked individual variation. Recently, graph theoretical analysis has facilitated the exploration of how such aspects of large-scale brain functional organization may underlie cognitive performance. Brain functional networks are known to possess small-world topologies characterized by efficient global and local information transfer, but whether these properties relate to language learning abilities remains unknown. Here we applied graph theory to construct large-scale cortical functional networks from cerebral hemodynamic (fMRI) responses acquired during an auditory pitch discrimination task and found that such network properties were associated with participants’ future success in learning words of an artificial spoken language. Successful learners possessed networks with reduced local efficiency but increased global efficiency relative to less successful learners and had a more cost-efficient network organization. Regionally, successful and less successful learners exhibited differences in these network properties spanning bilateral prefrontal, parietal, and right temporal cortex, overlapping a core network of auditory language areas. These results suggest that efficient cortical network organization is associated with sound-to-word learning abilities among healthy, younger adults.
Histone wrapping of DNA into nucleosomes almost certainly evolved in the Archaea, and predates Eukaryotes. In Eukaryotes, nucleosome positioning plays a central role in regulating gene expression and is directed by primary sequence motifs that together form a nucleosome positioning code. The experiments reported were undertaken to determine if archaeal histone assembly conforms to the nucleosome positioning code.
Eukaryotic nucleosome positioning is favored and directed by phased helical repeats of AA/TT/AT/TA and CC/GG/CG/GC dinucleotides, and disfavored by longer AT-rich oligonucleotides. Deep sequencing of genomic DNA protected from micrococcal nuclease digestion by assembly into archaeal nucleosomes has established that archaeal nucleosome assembly is also directed and positioned by these sequence motifs, both in vivo in Methanothermobacter thermautotrophicus and Thermococcus kodakarensis and in vitro in reaction mixtures containing only one purified archaeal histone and genomic DNA. Archaeal nucleosomes assembled at the same locations in vivo and in vitro, with much reduced assembly immediately upstream of open reading frames and throughout the rDNA operons. Providing further support for a common positioning code, archaeal histones assembled into nucleosomes on eukaryotic DNA and eukaryotic histones into nucleosomes on archaeal DNA at the same locations. T. kodakarensis has two histones, designated HTkA and HTkB, and strains with either but not both histones deleted grow normally but do exhibit transcriptome differences. Comparisons of the archaeal nucleosome profiles in the intergenic regions immediately upstream of genes that exhibited increased or decreased transcription in the absence of HTkA or HTkB revealed substantial differences but no consistent pattern of changes that would correlate directly with archaeal nucleosome positioning inhibiting or stimulating transcription.
The results obtained establish that an archaeal histone and a genome sequence together are sufficient to determine where archaeal nucleosomes preferentially assemble and where they avoid assembly. We confirm that the same nucleosome positioning code operates in Archaea as in Eukaryotes and presumably therefore evolved with the histone-fold mechanism of DNA binding and compaction early in the archaeal lineage, before the divergence of Eukaryotes.
Archaea; Nucleosome positioning; Dinucleotide repeats; Histone deletions; rDNA expression; Chromatin evolution
We report a target enrichment method to map nucleosomes of large genomes at unprecedented coverage and resolution by deeply sequencing locus-specific mononucleosomal DNA enriched via hybridization with bacterial artificial chromosomes. We achieved ∼10 000-fold enrichment of specific loci, which enabled sequencing nucleosomes at up to ∼500-fold higher coverage than has been reported in a mammalian genome. We demonstrate the advantages of generating high-sequencing coverage for mapping the center of discrete nucleosomes, and we show the use of the method by mapping nucleosomes during T cell differentiation using nuclei from effector T-cells differentiated from clonal, isogenic, naïve, primary murine CD4 and CD8 T lymphocytes. The analysis reveals that discrete nucleosomes exhibit cell type-specific occupancy and positioning depending on differentiation status and transcription. This method is widely applicable to mapping many features of chromatin and discerning its landscape in large genomes at unprecedented resolution.
Nucleosome positioning on the chromatin strand plays a critical role in regulating accessibility of DNA to transcription factors and chromatin modifying enzymes. Hence, detailed information on nucleosome depletion or movement at cis-acting regulatory elements has the potential to identify predicted binding sites for trans-acting factors. Using a novel method based on enrichment of mononucleosomal DNA by bacterial artificial chromosome hybridization, we mapped nucleosome positions by deep sequencing across 250 kb, encompassing the cystic fibrosis transmembrane conductance regulator (CFTR) gene. CFTR shows tight tissue-specific regulation of expression, which is largely determined by cis-regulatory elements that lie outside the gene promoter. Although multiple elements are known, the repertoire of transcription factors that interact with these sites to activate or repress CFTR expression remains incomplete. Here, we show that specific nucleosome depletion corresponds to well-characterized binding sites for known trans-acting factors, including hepatocyte nuclear factor 1, Forkhead box A1 and CCCTC-binding factor. Moreover, the cell-type selective nucleosome positioning is effective in predicting binding sites for novel interacting factors, such as BAF155. Finally, we identify transcription factor binding sites that are overrepresented in regions where nucleosomes are depleted in a cell-specific manner. This approach recognizes the glucocorticoid receptor as a novel trans-acting factor that regulates CFTR expression in vivo.
As breast cancer cells develop secondary resistance to estrogen deprivation therapy, they increase their utilization of non-genomic signaling pathways. Our prior work demonstrated that estradiol causes an association of ERα with Shc, Src and the IGF-1-R. In cells developing resistance to estrogen deprivation (surrogate for aromatase inhibition) and to the anti-estrogens tamoxifen, 4-OH-tamoxifen, and fulvestrant, an increased association of ERα with c-Src and the EGF-R occurs. At the same time, there is a translocation of ERα out of the nucleus and into the cytoplasm and cell membrane. Blockade of c-Src with the Src kinase inhibitor PP2 causes relocation of ERα into the nucleus. While these changes are not identical in response to each anti-estrogen, ERα binding to the EGF-R is increased in response to 4-OH-tamoxifen when compared with tamoxifen. The changes in EGF-R interactions with ERα impart an enhanced sensitivity of tamoxifen-resistant cells to the inhibitory properties of the specific EGF-R tyrosine kinase inhibitor, AG 1478. However, with long-term exposure of tamoxifen-resistant cells to AG 1478, the cells begin to re-grow but can now be inhibited by the IGF-R tyrosine kinase inhibitor, AG 1024. These data suggest that the IGF-R system becomes the predominant signaling mechanism as an adaptive response to the EGF-R inhibitor. Taken together, this information suggests that both the EGF-R and IGF-R pathways can mediate ERα signaling.
To further examine the effects of fulvestrant on ERα function, we examined the acute effects of fulvestrant on non-genomic functionality. Fulvestrant enhanced ERα association with the membrane IGF-1 receptor (IGF-1R). Using siRNA or expression vectors to knock down or knock in selective proteins, we further demonstrated that the ERα/IGF-1R association is Src-dependent. Fulvestrant rapidly induced IGF-1R and MAPK phosphorylation. The Src inhibitor PP2 and the IGF-1R inhibitor AG1024 greatly blocked the fulvestrant-induced ERα/IGF-1R interaction, leading to a further depletion of total cellular ERα induced by fulvestrant, and further enhanced fulvestrant-induced cell growth arrest. More dramatic was the translocation of ERα to the plasma membrane in combination with the IGF-1R, as shown by confocal microscopy. Taken in aggregate, these studies suggest that secondary resistance to hormonal therapy results in usage of both IGF-R and EGF-R for non-genomic signaling.
Aging is accompanied by substantial changes in brain function, including functional reorganization of large-scale brain networks. Such differences in network architecture have been reported both at rest and during cognitive task performance, but an open question is whether these age-related differences show task-dependent effects or represent only task-independent changes attributable to a common factor (i.e., underlying physiological decline). To address this question, we used graph theoretic analysis to construct weighted cortical functional networks from hemodynamic (functional MRI) responses in 12 younger and 12 older adults during a speech perception task performed in both quiet and noisy listening conditions. Functional networks were constructed for each subject and listening condition based on inter-regional correlations of the fMRI signal among 66 cortical regions, and network measures of global and local efficiency were computed. Across listening conditions, older adult networks showed significantly decreased global (but not local) efficiency relative to younger adults after normalizing measures to surrogate random networks. Although listening condition produced no main effects on whole-cortex network organization, a significant age group × listening condition interaction was observed. Additionally, an exploratory analysis of regional effects uncovered age-related declines in both global and local efficiency concentrated exclusively in auditory areas (bilateral superior and middle temporal cortex), further suggestive of specificity to the speech perception tasks. Global efficiency also correlated positively with mean cortical thickness across all subjects, establishing gross cortical atrophy as a task-independent contributor to age-related differences in functional organization.
Together, our findings provide evidence of age-related disruptions in cortical functional network organization during speech perception tasks, and suggest that although task-independent effects such as cortical atrophy clearly underlie age-related changes in cortical functional organization, age-related differences also demonstrate sensitivity to task domains.
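The network measures named above can be illustrated with a small sketch. It computes global and local efficiency for an unweighted graph obtained by thresholding a correlation matrix. The binarization step, the threshold value, and the function names are assumptions for illustration; the study itself describes weighted networks normalized against surrogate random networks, which this sketch does not reproduce.

```python
from collections import deque


def threshold_network(corr, threshold):
    """Build an undirected adjacency map (node -> set of neighbors) from a
    square correlation matrix, keeping pairs with correlation > threshold."""
    n = len(corr)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Mean inverse shortest-path length over all ordered node pairs.
    Unreachable pairs contribute zero."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        dist = shortest_path_lengths(adj, u)
        total += sum(1.0 / d for v, d in dist.items() if v != u)
    return total / (n * (n - 1))


def local_efficiency(adj):
    """Mean, over nodes, of the global efficiency of the subgraph induced
    by each node's neighbors (the node itself is excluded)."""
    values = []
    for u in adj:
        neighbors = adj[u]
        subgraph = {v: adj[v] & neighbors for v in neighbors}
        values.append(global_efficiency(subgraph))
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy 3-region correlation matrix (made-up numbers).
    corr = [[1.0, 0.8, 0.1],
            [0.8, 1.0, 0.7],
            [0.1, 0.7, 1.0]]
    net = threshold_network(corr, 0.5)
    print(global_efficiency(net), local_efficiency(net))
```

In this formulation, high global efficiency reflects short paths between any two regions, while high local efficiency reflects dense interconnection among each region's neighbors.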
The nucleosome is the fundamental packing unit of DNA in eukaryotic cells. Its detailed positioning on the genome is closely related to chromosome functions. Increasing evidence has shown that genomic DNA sequence itself is highly predictive of nucleosome positioning genome-wide. Therefore, a fast software tool for predicting nucleosome positioning can help in understanding how a genome's nucleosome organization may facilitate genome function.
We present a duration Hidden Markov model for nucleosome positioning prediction by explicitly modeling the linker DNA length. The nucleosome and linker models trained from yeast data are re-scaled when making predictions for other species to adjust for differences in base composition. A software tool named NuPoP is developed in three formats for free download.
Simulation studies show that modeling the linker length distribution and utilizing a base composition re-scaling method both improve the prediction of nucleosome positioning regarding sensitivity and false discovery rate. NuPoP provides a user-friendly software tool for predicting the nucleosome occupancy and the most probable nucleosome positioning map for genomic sequences of any size. When compared with two existing methods, NuPoP shows improved performance in sensitivity.
AIM: To investigate the relationship between 90-kDa ribosomal S6 kinase (p90RSK) and collagen type I expression during the development of hepatic fibrosis in vivo and in vitro.
METHODS: Rat hepatic fibrosis was induced by intraperitoneal injection of dimethylnitrosamine. The protein expression and cellular location of p90RSK and their relationship with collagen type I were determined by co-immunofluorescence and confocal microscopy. Subsequently, an RNAi strategy was employed to silence p90RSK mRNA expression in HSC-T6, an activated hepatic stellate cell (HSC) line. The expression of collagen type I in HSC-T6 cells was assessed by Western blotting and real-time polymerase chain reaction. Furthermore, HSCs were transfected with expression vectors or RNAi constructs of p90RSK to increase or decrease p90RSK expression, and collagen type I promoter activity in the transfected HSCs was then examined by reporter assay. Lastly, HSC-T6 cells transfected with p90RSK siRNA were treated with or without platelet-derived growth factor (PDGF)-BB at a final concentration of 20 μg/L, and cell growth was determined by MTS conversion.
RESULTS: In fibrotic liver tissues, p90RSK was over-expressed in activated HSCs and had a significant positive correlation with collagen type I levels. In HSC-T6 cells transfected with RNAi targeted to p90RSK, the expression of collagen type I was down-regulated (61.8% in mRNA, P < 0.01; 89.1% in protein, P < 0.01). However, collagen type I promoter activity was neither increased by over-expression of p90RSK nor decreased by reduced expression, compared with controls in the same cell line (P = 0.076). Furthermore, p90RSK siRNA inhibited HSC proliferation and also abolished the effect of PDGF on HSC proliferation.
CONCLUSION: p90RSK is over-expressed in activated HSCs and involved in regulating the abnormal expression of collagen type I through initiating the proliferation of HSCs.
90-kDa ribosomal S6 kinase; Collagen type I; Hepatic fibrosis; Hepatic stellate cell; RNAi
Eukaryotic genomes are packaged into nucleosome particles that occlude the DNA from interacting with most DNA binding proteins. Nucleosomes have higher affinity for particular DNA sequences, reflecting the ability of the sequence to bend sharply, as required by the nucleosome structure. However, it is not known whether these sequence preferences have a significant influence on nucleosome position in vivo, and thus regulate the access of other proteins to DNA. Here we isolated nucleosome-bound sequences at high resolution from yeast and used these sequences in a new computational approach to construct and validate experimentally a nucleosome-DNA interaction model, and to predict the genome-wide organization of nucleosomes. Our results demonstrate that genomes encode an intrinsic nucleosome organization and that this intrinsic organization can explain ∼50% of the in vivo nucleosome positions. This nucleosome positioning code may facilitate specific chromosome functions including transcription factor binding, transcription initiation, and even remodelling of the nucleosomes themselves.
Measurement of breast tissue estradiol levels could provide a powerful method to predict the risk of developing breast cancer, but obtaining sufficient amounts of tissue from women is difficult from a practical standpoint. Assessment of aromatase in ductal lavage fluid or fine needle aspirates from the breast might provide a surrogate marker for tissue estrogen levels, but highly sensitive methods would be required. These considerations prompted us to develop an ultra-sensitive, “nested” PCR assay for aromatase which is up to one million fold more sensitive than standard PCR methods. We initially validated this assay using multiple tissues from the aromatase transgenic mouse and found that coefficients of variation for measurement of replicate samples averaged less than 5%. We demonstrated a 60-fold enhancement in aromatase message in the transgenic versus the wild type mouse breast but, surprisingly, levels in the transgenic animals were highly variable, ranging from 0.4 to 27 relative units. The variability of aromatase expression in the transgenic breast did not correlate with the degree of breast development and did not appear to relate to hormonal manipulation of the MMTV promoter, but probably related to lack of exhaustive inbreeding and mixed zygosity of transgenic animals. Extensive validation in mouse tissues provided confidence regarding the assay in human tissues, since nearly identical methods were used. The human assay was sufficiently sensitive to detect aromatase in a single human JAR (choriocarcinoma) cell, in all breast biopsies measured, and in 7/23 ductal lavage fluids.
Aromatase; Breast tissue; Ductal lavage; Estrogen; Fine needle aspirate (FNA); Nested PCR
The exact lengths of linker DNAs connecting adjacent nucleosomes specify the intrinsic three-dimensional structures of eukaryotic chromatin fibers. Some studies suggest that linker DNA lengths preferentially occur at certain quantized values, differing one from another by integral multiples of the DNA helical repeat, ∼10 bp; however, studies in the literature are inconsistent. Here, we investigate linker DNA length distributions in the yeast Saccharomyces cerevisiae genome, using two novel methods: a Fourier analysis of genomic dinucleotide periodicities adjacent to experimentally mapped nucleosomes and a duration hidden Markov model applied to experimentally defined dinucleosomes. Both methods reveal that linker DNA lengths in yeast are preferentially periodic at the DNA helical repeat (∼10 bp), obeying the forms 10n+5 bp (integer n). This 10 bp periodicity implies an ordered superhelical intrinsic structure for the average chromatin fiber in yeast.
Eukaryotic genomic DNA exists as chromatin, with the DNA wrapped locally into a repeating array of protein–DNA complexes (“nucleosomes”) separated by short stretches of unwrapped “linker” DNA. Nucleosome arrays further compact into ∼30-nm-wide higher-order chromatin structures. Despite decades of work, there remains no agreement about the structure of the 30 nm fiber, or even if the structure is ordered or random. The helical symmetry of DNA couples the one-dimensional distribution of nucleosomes along the DNA to an intrinsic three-dimensional structure for the chromatin fiber. Random linker length distributions imply random three-dimensional intrinsic fiber structures, whereas different possible nonrandom length distributions imply different ordered structures. Here we use two independent computational methods, with two independent kinds of experimental data, to experimentally define the probability distribution of linker DNA lengths in yeast. Both methods agree that linker DNA lengths in yeast come in a set of preferentially quantized lengths that differ one from another by ∼10 bp, the DNA helical repeat, with a preferred phase offset of 5 bp. The preferential quantization of lengths implies that the intrinsic three-dimensional structure for the average chromatin fiber is ordered, not random. The 5 bp offset implies a particular geometry for this intrinsic structure.
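To make the notion of quantized linker lengths with a phase offset concrete, the sketch below takes a list of linker lengths and measures how strongly they cluster at a common offset modulo 10 bp. It uses a single-frequency circular statistic, which is a simplified stand-in chosen for illustration and not the Fourier analysis of dinucleotide periodicities or the duration hidden Markov model described above. The function name and the example lengths are hypothetical.

```python
import cmath
import math


def periodic_offset(lengths, period=10.0):
    """Return (strength, offset) for a list of linker lengths.

    strength is the magnitude of the mean unit vector exp(2*pi*i*L/period):
    1.0 when every length shares the same remainder modulo the period, and
    near 0 when remainders are spread evenly.
    offset is the preferred remainder, in bp, within [0, period).
    """
    if not lengths:
        raise ValueError("need at least one linker length")
    mean_vector = sum(
        cmath.exp(2j * math.pi * length / period) for length in lengths
    ) / len(lengths)
    strength = abs(mean_vector)
    offset = (cmath.phase(mean_vector) / (2.0 * math.pi) * period) % period
    return strength, offset


if __name__ == "__main__":
    # Made-up lengths of the form 10n + 5, matching the pattern in the text.
    quantized = [5, 15, 25, 35, 15, 25]
    print(periodic_offset(quantized))

    # Made-up lengths covering every remainder once: no preferred offset.
    uniform = list(range(10, 20))
    print(periodic_offset(uniform))
```

With real data the strength would fall between these two extremes, and its significance would need to be judged against a null distribution, for example by comparing with randomized linker lengths.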
Five independent groups have reported microarray studies that identify dozens of rhythmically expressed genes in the fruit fly Drosophila melanogaster. Limited overlap among the lists of discovered genes makes it difficult to determine which, if any, exhibit truly rhythmic patterns of expression. We reanalyzed data from all five reports and found two sources for the observed discrepancies: the use of different expression pattern detection algorithms and underlying variation among the datasets. To improve upon the methods originally employed, we developed a new analysis that involves compilation of all existing data, application of identical transformation and standardization procedures followed by ANOVA-based statistical prescreening, and three separate classes of post hoc analysis: cross-correlation to various cycling waveforms, autocorrelation, and a previously described fast Fourier transform–based technique [1–3]. Permutation-based statistical tests were used to derive significance measures for all post hoc tests. We find that application of our method, most notably the ANOVA prescreening procedure, significantly reduces the false discovery rate relative to that observed among the results of the original five reports while maintaining desirable statistical power. We identify a set of 81 cycling transcripts previously found in one or more of the original reports as well as a novel set of 133 transcripts not found in any of the original studies. We introduce a novel analysis method that compensates for variability observed among the original five Drosophila circadian array reports. Based on the statistical fidelity of our meta-analysis results, and the results of our initial validation experiments (quantitative RT-PCR), we predict many of our newly found genes to be bona fide cyclers, and suggest that they may lead to new insights into the pathways through which clock mechanisms regulate behavioral rhythms.
Circadian genes regulate many of life's most essential processes, from sleeping and eating to cellular metabolism, learning, and much more. Many of these genes exhibit cyclic transcript expression, a characteristic utilized by an ever-expanding corpus of microarray-based studies to discover additional circadian genes. While these attempts have identified hundreds of transcripts in a variety of organisms, they exhibit a striking lack of agreement, making it difficult to determine which, if any, are truly cycling. Here, we examine one group of these reports (those performed on the fruit fly—Drosophila melanogaster) to identify the sources of observed differences and present a means of analyzing the data that drastically reduces their impact. We demonstrate the fidelity of our method through its application to the original fruit fly microarray data, detecting more than 200 (133 novel) transcripts with a level of statistical fidelity better than that found in any of the original reports. Initial validation experiments (quantitative RT-PCR) suggest these to be truly cycling genes, one of which is now known to be a bona fide circadian gene (cwo). We report the discovery of 133 novel candidate circadian genes as well as the highly adaptable method used to identify them.
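One of the post hoc steps described above, cross-correlation to a cycling waveform with a permutation-based significance measure, can be sketched as follows. This is a minimal illustration under stated assumptions: a single cosine template with a 24 h period, a grid of hourly phases, and a simple label-shuffling null. The function names, parameters, and example data are hypothetical and are not taken from the original analysis, which also includes standardization, ANOVA prescreening, autocorrelation, and a Fourier-based test.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_cosine_correlation(values, times, period=24.0, n_phases=24):
    """Largest correlation between the series and a cosine template,
    scanning n_phases evenly spaced phase shifts across one period."""
    best = -1.0
    for k in range(n_phases):
        phase = k * period / n_phases
        template = [math.cos(2.0 * math.pi * (t - phase) / period) for t in times]
        best = max(best, pearson(values, template))
    return best


def permutation_p_value(values, times, n_perm=200, seed=0):
    """Fraction of time-shuffled series whose best cosine correlation is at
    least as large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = best_cosine_correlation(values, times)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_cosine_correlation(shuffled, times) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up expression series sampled every 4 h over two days, peaking at hour 8.
    times = list(range(0, 48, 4))
    values = [math.cos(2.0 * math.pi * (t - 8) / 24.0) for t in times]
    print(best_cosine_correlation(values, times))
    print(permutation_p_value(values, times))
```

Shuffling the time labels destroys any rhythm while preserving the distribution of expression values, so the p-value reflects how unusual the observed waveform match is for that transcript.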
In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insights into sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed.
We propose a compound Poisson process model that can accurately predict the gene capture in a future EST sample based on an initial EST sample. It also allows estimation of the number of expressed genes in one cDNA library or co-expressed in two cDNA libraries. The superior performance of the new prediction method over an existing approach is established by a simulation study. Our analysis of four Arabidopsis thaliana EST sets suggests that the number of expressed genes present in four different cDNA libraries of Arabidopsis thaliana varies from 9155 (root) to 12005 (silique). An observed fraction of co-expressed genes in two different EST sets as low as 25% can correspond to an actual overlap fraction greater than 65%.
The proposed method provides a convenient tool for gene capture prediction and cDNA library property diagnosis in EST sequencing.
DNA sequences that are present in nucleosomes have a preferential ∼10 bp periodicity of certain dinucleotide signals (1,2), but the overall sequence similarity of the nucleosomal DNA is weak, and traditional multiple sequence alignment tools fail to yield meaningful alignments. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. We assume that a periodic dinucleotide signal of any type emits according to a probability distribution around a series of ‘hot spots’ that are equally spaced along nucleosomal DNA with 10 bp period, but with a 1 bp phase shift across the middle of the nucleosome. We model the three statistically most significant dinucleotide signals, AA/TT, GC and TA, simultaneously, while allowing phase shifts between the signals. The alignment is obtained by maximizing the likelihood of both Watson and Crick strands simultaneously. The resulting alignment of 177 chicken nucleosomal DNA sequences revealed that all 10 distinct dinucleotides are periodic, however, with only two distinct phases and varying intensity. By Fourier analysis, we show that our new alignment has enhanced periodicity and sequence identity compared with center alignment. The significance of the nucleosomal DNA sequence alignment is evaluated by comparing it with that obtained using the same model on non-nucleosomal sequences.
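The Fourier comparison mentioned above can be illustrated with a small sketch that builds a position-wise dinucleotide frequency profile from a set of already aligned sequences and evaluates its spectral power at a chosen period. This covers only the evaluation step; it does not implement the mixture-model alignment itself. The function names and the toy sequences are hypothetical.

```python
import cmath
import math


def dinucleotide_profile(sequences, dinucleotides=("AA", "TT")):
    """Fraction of sequences showing one of the given dinucleotides at each
    position, over the length shared by all sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = min(len(s) for s in sequences)
    counts = [0] * (length - 1)
    for seq in sequences:
        for i in range(length - 1):
            if seq[i:i + 2] in dinucleotides:
                counts[i] += 1
    return [c / len(sequences) for c in counts]


def power_at_period(profile, period):
    """Power of the mean-centered profile at a single period (in bp),
    computed as one term of a discrete Fourier transform."""
    mean = sum(profile) / len(profile)
    component = sum(
        (value - mean) * cmath.exp(-2j * math.pi * i / period)
        for i, value in enumerate(profile)
    )
    return abs(component) ** 2 / len(profile)


if __name__ == "__main__":
    # Toy aligned sequences with AA placed every 10 bp (made-up data).
    aligned = ["AACGCGCGCG" * 10, "AAGCGCGCGC" * 10]
    profile = dinucleotide_profile(aligned)
    print(power_at_period(profile, 10.0), power_at_period(profile, 7.0))
```

A better alignment should concentrate more power at the ∼10 bp period than a center alignment of the same sequences, which is the kind of comparison the abstract reports.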