Centromeres, the sites of spindle attachment during mitosis and meiosis, are located in specific positions in the human genome, normally coincident with diverse subsets of alpha satellite DNA. While there is strong evidence supporting the association of some subfamilies of alpha satellite with centromere function, the basis for establishing whether a given alpha satellite sequence is or is not designated a functional centromere is unknown, and attempts to understand the role of particular sequence features in establishing centromere identity have been limited by the near identity and repetitive nature of satellite sequences. Utilizing a broadly applicable experimental approach to test sequence competency for centromere specification, we have carried out a genomic and epigenetic functional analysis of endogenous human centromere sequences available in the current human genome assembly. The data support a model in which functionally competent sequences confer an opportunity for centromere specification, integrating genomic and epigenetic signals and promoting the concept of context-dependent centromere inheritance.
The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.
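The array-size estimate described above can be illustrated with a simple read-fraction calculation. This is a minimal sketch and not the authors' pipeline: it assumes uniform sequencing coverage, and the function name `estimate_array_size_mb`, the diagnostic k-mer, the toy reads, and the 3,100 Mb genome size are all hypothetical values chosen for illustration.

```python
def estimate_array_size_mb(reads, diagnostic_kmers, genome_size_mb):
    """Estimate array size as (fraction of reads with a diagnostic k-mer) x genome size.

    Assumes uniform coverage and that all diagnostic k-mers share one length.
    """
    if not reads:
        raise ValueError("no reads supplied")
    k = len(next(iter(diagnostic_kmers)))
    hits = 0
    for read in reads:
        read = read.upper()
        # A read counts once if any of its k-mers is in the diagnostic set.
        if any(read[i:i + k] in diagnostic_kmers for i in range(len(read) - k + 1)):
            hits += 1
    return genome_size_mb * hits / len(reads)


# Toy data (made up): two of the four reads carry the hypothetical 5-mer.
reads = ["ATTCCATTCC", "GGGGCCCCAA", "CCATTCCATT", "ACGTACGTAC"]
size = estimate_array_size_mb(reads, {"ATTCC"}, genome_size_mb=3100.0)
```

In practice the diagnostic set would come from a satellite reference database, and the estimate would need correction for coverage biases that this sketch ignores.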
At least 5–10% of the human genome remains unassembled, unmapped, and poorly characterized. The reference assembly annotates these missing regions as multi-megabase heterochromatic gaps, found primarily near centromeres and on the short arms of the acrocentric chromosomes. This missing fraction of the genome consists predominantly of long arrays of near-identical tandem repeats called satellite DNA. Due to the repetitive nature of satellite DNA, sequence assembly algorithms cannot uniquely align overlapping sequence reads, and thus satellite-rich domains have been omitted from the reference assembly and from most genome-wide studies of variation and function. Existing methods for analyzing some satellite DNAs cannot be easily extended to a large portion of satellites whose repeat structures are complex and largely uncharacterized, such as Human Satellites 2 and 3 (HSat2,3). Here we characterize HSat2,3 using a novel approach that does not depend on having a well-defined repeat structure. By classifying genome-wide HSat2,3 sequences into subfamilies and localizing them to chromosomes, we have generated an initial HSat2,3 genomic reference, which serves as a critical foundation for future studies of variation and function in these regions. This approach should be generally applicable to other classes of satellite DNA, in both the human genome and other complex genomes.
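One common way to compare repetitive sequences without aligning them is to compare their k-mer composition. The sketch below illustrates that general idea only; the abstract does not specify the authors' algorithm, and the choice of k = 5, the cosine similarity measure, and the toy reads are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two k-mer count profiles (0.0 if either is empty)."""
    dot = sum(p[kmer] * q[kmer] for kmer in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Toy reads (made up): a and b are the same pentamer repeat in different phases,
# c is an unrelated homopolymer.
read_a = "ATTCCATTCCATTCC"
read_b = "TTCCATTCCATTCCA"
read_c = "GGGGGGGGGGGGGGG"

sim_ab = cosine_similarity(kmer_profile(read_a, 5), kmer_profile(read_b, 5))
sim_ac = cosine_similarity(kmer_profile(read_a, 5), kmer_profile(read_c, 5))
```

Because the profiles ignore position, reads that sample the same repeat in different phases still score as highly similar, which is the property that makes this kind of comparison usable when no consistent repeat unit is available.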
Background: Variable health literacy and genetic knowledge may pose significant challenges to engaging the general public in personal genomics, specifically with respect to promoting risk comprehension and healthy behaviors. Methods: We are conducting a multistage study of individual responses to genomic risk information for Type 2 diabetes mellitus. A total of 300 individuals were recruited from the general public in Durham, North Carolina: 60% self-identified as White, 70% were female, and 65% had a college degree. As part of the baseline survey, we assessed genetic knowledge and attitudes toward genetic testing. Results: Scores of factual knowledge of genetics ranged from 50% to 100% (average=84%), with significant differences in relation to racial group, education level, and age. Scores were significantly higher on questions pertaining to the inheritance and causes of disease (mean score 90%) compared to scientific questions (mean score 77.4%). Scores on the knowledge survey were significantly higher than scores from European populations. Participants' perceived knowledge of the social consequences of genetic testing was significantly lower than their perceived knowledge of the medical uses of testing. More than half agreed with the statement that testing may affect a person's ability to obtain health insurance (51.3%) and 16% were worried about the consequences of testing for chances of finding a job. Conclusions: Despite the relatively high educational status and genetic knowledge of the study population, we find an imbalance of knowledge between scientific and medical concepts related to genetics as well as between the medical applications and societal consequences of testing, suggesting that more effort is needed to present the benefits, risks, and limitations of genetic testing, particularly at the social and personal levels, to ensure informed decision making.
There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.
A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.
The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
While the distribution of RNA polymerase II (PolII) in a variety of complex genomes is correlated with gene expression, the presence of PolII at a gene does not necessarily indicate active expression. Various patterns of PolII binding have been described genome wide; however, whether or not PolII binds at transcriptionally inactive sites remains uncertain. The two X chromosomes in female cells in mammals present an opportunity to examine each of the two alleles of a given locus in both active and inactive states, depending on which X chromosome is silenced by X chromosome inactivation. Here, we investigated PolII occupancy and expression of the associated genes across the active (Xa) and inactive (Xi) X chromosomes in human female cells to elucidate the relationship of gene expression and PolII binding. We find that, while PolII in the pseudoautosomal region occupies both chromosomes at similar levels, it is significantly biased toward the Xa throughout the rest of the chromosome. The general paucity of PolII on the Xi notwithstanding, detectable (albeit significantly reduced) binding can be observed, especially on the evolutionarily younger short arm of the X. PolII levels at genes that escape inactivation correlate with the levels of their expression; however, additional PolII sites can be found at apparently silenced regions, suggesting the possibility of a subset of genes on the Xi that are poised for expression. Consistent with this hypothesis, we show that a high proportion of genes associated with PolII-accessible sites, while silenced in GM12878, are expressed in other female cell lines.
Combinations of histone variants and modifications, conceptually representing a histone code, have been proposed to play a significant role in gene regulation and developmental processes in complex organisms. While various mechanisms have been implicated in establishing and maintaining epigenetic patterns at specific locations in the genome, they are generally believed to be independent of primary DNA sequence on a more global scale.
To address this systematically in the case of the human genome, we have analyzed primary DNA sequences underlying patterns of 19 different methylated histones in human primary T-cells and patterns of three methylated histones across additional human cell lines. We report strong sequence biases associated with most of these histone marks genome-wide in each cell type. Furthermore, the sequence characteristics for such association are distinct for different groups of histone marks.
These findings provide evidence of an influence of genomic sequence on patterns of histone modification associated with gene expression and chromatin programming, and they suggest that the mechanisms responsible for global histone modifications may interpret genomic sequence in various ways.
Centromeres are sites of chromosomal spindle attachment during mitosis and meiosis. While the sequence basis for centromere identity remains a subject of considerable debate, one approach is to examine the genomic organization at these active sites that are correlated with epigenetic marks of centromere function.
We have developed an approach to characterize both satellite and non-satellite centromeric sequences that are missing from current assemblies in complex genomes, using the dog genome as an example. Combining this genomic reference with an epigenetic dataset corresponding to sequences associated with the histone H3 variant centromere protein A (CENP-A), we identify active satellite sequence domains that appear to be both functionally and spatially distinct within the overall definition of satellite families.
These findings establish a genomic and epigenetic foundation for exploring the functional role of centromeric sequences in the previously sequenced dog genome and provide a model for similar studies within the context of less-characterized genomes.
Centromere; Satellite DNAs; CENP-A; Centromere protein A; Canis familiaris (dog)
Many essential aspects of genome function, including gene expression and chromosome segregation, are mediated throughout development and differentiation by changes in the chromatin state. Along with genomic signals encoded in the DNA, epigenetic processes regulate heritable gene expression patterns. Genomic signals such as enhancers, silencers, and repetitive DNA, while required for the establishment of alternative chromatin states, have an unclear role in epigenetic processes that underlie the persistence of chromatin states throughout development. Here, we demonstrate in fission yeast that the maintenance and inheritance of ectopic heterochromatin domains are independent of the genomic sequences necessary for their de novo establishment. We find that both structural heterochromatin and gene silencing can be stably maintained over an ∼10-kb domain for up to hundreds of cell divisions in the absence of genomic sequences required for heterochromatin establishment, demonstrating the long-term persistence and stability of this chromatin state. The de novo heterochromatin, despite the absence of nucleation sequences, is also stably inherited through meiosis. Together, these studies provide evidence for chromatin-dependent, epigenetic control of gene silencing that is heritable, stable, and self-sustaining, even in the absence of the originating genomic signals.
The methylation of cytosines in CpG dinucleotides is essential for cellular differentiation and the progression of many cancers, and it plays an important role in gametic imprinting. To assess variation and inheritance of genome-wide patterns of DNA methylation simultaneously in humans, we applied reduced representation bisulfite sequencing (RRBS) to somatic DNA from six members of a three-generation family. We observed that 8.1% of heterozygous SNPs are associated with differential methylation in cis, which provides a robust signature for Mendelian transmission and relatedness. The vast majority of differential methylation between homologous chromosomes (>92%) occurs on a particular haplotype as opposed to being associated with the gender of the parent of origin, indicating that genotype affects DNA methylation of far more loci than does gametic imprinting. We found that 75% of genotype-dependent differential methylation events in the family are also seen in unrelated individuals and that overall genotype can explain 80% of the variation in DNA methylation. These events are under-represented in CpG islands, enriched in intergenic regions, and located in regions of low evolutionary conservation. Even though they are generally not in functionally constrained regions, 22% (twice as many as expected by chance) of genes harboring genotype-dependent DNA methylation exhibited allele-specific gene expression as measured by RNA-seq of a lymphoblastoid cell line, indicating that some of these events are associated with gene expression differences. Overall, our results demonstrate that the influence of genotype on patterns of DNA methylation is widespread in the genome and greatly exceeds the influence of imprinting on genome-wide methylation patterns.
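Testing whether a heterozygous SNP is associated with differential methylation in cis amounts to asking whether reads carrying one allele are methylated more often than reads carrying the other. The sketch below uses a two-sided Fisher's exact test on a 2x2 table of read counts; the abstract does not state which statistic the authors used, so the choice of test, the function name, and the counts are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: reads carrying allele 1 / allele 2.
    Columns: methylated / unmethylated calls at a linked CpG.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads on allele 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical counts: allele 1 has 8 methylated and 2 unmethylated reads,
# allele 2 has 1 methylated and 5 unmethylated reads.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

A genome-wide analysis would apply such a test at every heterozygous SNP with sufficient coverage and correct for multiple testing, which this sketch leaves out.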
DNA methylation is a dynamic epigenetic mark that is essential for mammalian organismal development. DNA methylation levels can be influenced by environment, a chromosome's parental origin, and genome sequence. In this study, we evaluated the impact that DNA sequence has on DNA methylation by analyzing methylation levels in a three-generation family as well as unrelated individuals. By following DNA methylation patterns through the family along with nearby SNPs, we found that allelic differences between chromosomes play a much larger role in determining DNA methylation than the parental origin of the chromosome, indicating that DNA sequence has a larger impact on DNA methylation than gametic imprinting. We also found that allelic differences in DNA methylation found in the family can also be observed in unrelated individuals. In fact, the majority of variation in DNA methylation can be explained by genotype. Our results emphasize the importance of genome sequence in setting patterns of DNA methylation and indicate that genotype will need to be taken into account when assessing DNA methylation in the context of disease.
Centromeric regions in many complex eukaryotic species contain highly repetitive satellite DNAs. Despite the diversity of centromeric DNA sequences among species, the functional centromeres in all species studied to date are marked by CENP-A, a centromere-specific histone H3 variant. Although it is well established that families of multimeric higher-order alpha satellite are conserved at the centromeres of human and great ape chromosomes and that diverged monomeric alpha satellite is found in old and new world monkey genomes, little is known about the organization, function, and evolution of centromeric sequences in more distant primates, including lemurs. Aye-Aye (Daubentonia madagascariensis) is a basal primate and is located at a key position in the evolutionary tree to study centromeric satellite transitions in primate genomes. Using the approach of chromatin immunoprecipitation with antibodies directed to CENP-A, we have identified two satellite families, Daubentonia madagascariensis Aye-Aye 1 (DMA1) and Daubentonia madagascariensis Aye-Aye 2 (DMA2), related to each other but unrelated in sequence to alpha satellite or any other previously described primate or mammalian satellite DNA families. Here, we describe the initial genomic and phylogenetic organization of DMA1 and DMA2 and present evidence of higher-order repeats in Aye-Aye centromeric domains, providing an opportunity to study the emergence of chromosome-specific modes of satellite DNA evolution in primate genomes.
centromere; CENP-A; satellite repeats; chromatin immunoprecipitation
The extent to which variation in chromatin structure and transcription factor binding may influence gene expression, and thus underlie or contribute to variation in phenotype, is unknown. To address this question, we cataloged both individual-to-individual variation and differences between homologous chromosomes within the same individual (allele-specific variation) in chromatin structure and transcription factor binding in lymphoblastoid cells derived from individuals of geographically diverse ancestry. Ten percent of active chromatin sites were individual-specific; a similar proportion were allele-specific. Both individual-specific and allele-specific sites were commonly transmitted from parent to child, which suggests that they are heritable features of the human genome. Our study shows that heritable chromatin status and transcription factor binding differ as a result of genetic variation and may underlie phenotypic variation in humans.
Characterizing how genomic sequence interacts with trans-acting regulatory factors to implement a program of gene expression in eukaryotic organisms is critical to understanding genome function. One means by which patterns of gene expression are achieved is through the differential packaging of DNA into distinct types of chromatin. While chromatin state exerts a major influence on gene expression, the extent to which cis-acting DNA sequences contribute to the specification of chromatin state remains incompletely understood. To address this, we have used a fission yeast sequence element (L5), known to be sufficient to nucleate heterochromatin, to establish de novo heterochromatin domains in the Schizosaccharomyces pombe genome. The resulting heterochromatin domains were queried for the presence of H3K9 di-methylation and Swi6p, both hallmarks of heterochromatin, and for levels of gene expression. We describe a major effect of genomic sequences in determining the size and extent of such de novo heterochromatin domains. Heterochromatin spreading is antagonized by the presence of genes, in a manner that can occur independent of strength of transcription. Increasing the dosage of Swi6p results in increased heterochromatin proximal to the L5 element, but does not result in an expansion of the heterochromatin domain, suggesting that in this context genomic effects are dominant over trans effects. Finally, we show that the ratio of Swi6p to H3K9 di-methylation is sequence-dependent and correlates with the extent of gene repression. Taken together, these data demonstrate that the sequence content of a genomic region plays a significant role in shaping its response to encroaching heterochromatin and suggest a role of DNA sequence in specifying chromatin state.
Epigenetic packaging of DNA sequence into chromatin is a major force in shaping the function of complex genomes. Different types of chromatin have distinct effects on gene expression, and thus chromatin state imparts distinct features on the associated genomic DNA. Our study focuses on the transition between two opposing chromatin states: euchromatin, which generally correlates with gene expression, and heterochromatin, which is typically refractive to gene expression. While heterochromatin is capable of spreading into euchromatic domains, the parameters that influence such spreading are unknown. We established heterochromatin at ectopic sites in the genome and evaluated whether specific DNA sequences affected the extent of heterochromatin spreading and the transition between heterochromatin and euchromatin. We found that the nature of the genomic DNA neighboring the heterochromatic sequence dramatically affected the extent of heterochromatin spreading. In particular, the presence of genes antagonized the spread of heterochromatin, whereas neutral sequence elements were incorporated into the domain. This study demonstrates that genome sequence and chromatin identity are inextricably linked; features of both interact to determine the structural and functional fate of underlying DNA sequences.
The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence.
In recent years, the completion of the Human Genome Project and other rapid advances in genomics have led to increasing anticipation of an era of genomic and personalized medicine, in which an individual's health is optimized through the use of all available patient data, including data on the individual's genome and its downstream products. Genomic and personalized medicine could transform healthcare systems and catalyze significant reductions in morbidity, mortality, and overall healthcare costs.
Critical to the achievement of more efficient and effective healthcare enabled by genomics is the establishment of a robust, nationwide clinical decision support infrastructure that assists clinicians in their use of genomic assays to guide disease prevention, diagnosis, and therapy. Requisite components of this infrastructure include the standardized representation of genomic and non-genomic patient data across health information systems; centrally managed repositories of computer-processable medical knowledge; and standardized approaches for applying these knowledge resources against patient data to generate and deliver patient-specific care recommendations. Here, we provide recommendations for establishing a national decision support infrastructure for genomic and personalized medicine that fulfills these needs, leverages existing resources, and is aligned with the Roadmap for National Action on Clinical Decision Support commissioned by the U.S. Office of the National Coordinator for Health Information Technology. Critical to the establishment of this infrastructure will be strong leadership and substantial funding from the federal government.
A national clinical decision support infrastructure will be required for reaping the full benefits of genomic and personalized medicine. Essential components of this infrastructure include standards for data representation; centrally managed knowledge repositories; and standardized approaches for leveraging these knowledge repositories to generate patient-specific care recommendations at the point of care.
Heterochromatin formation involves the nucleation and spreading of structural and epigenetic features along the chromatin fiber. Chromatin barriers and associated proteins counteract the spreading of heterochromatin, thereby restricting it to specific regions of the genome. We have performed gene expression studies and chromatin immunoprecipitation on strains in which native centromere sequences have been mutated to study the mechanism by which a tRNAAlanine gene barrier (cen1 tDNAAla) blocks the spread of pericentromeric heterochromatin at the centromere of chromosome 1 (cen1) in the fission yeast, Schizosaccharomyces pombe. Within the centromere, barrier activity is a general property of tDNAs and, unlike previously characterized barriers, requires the association of both transcription factor IIIC and RNA Polymerase III. Although the cen1 tDNAAla gene is actively transcribed, barrier activity is independent of transcriptional orientation. These findings provide experimental evidence for the involvement of a fully assembled RNA polymerase III transcription complex in defining independent structural and functional domains at a eukaryotic centromere.
A significant number of human X-linked genes escape X chromosome inactivation and are thus expressed from both the active and inactive X chromosomes. The basis for escape from inactivation and the potential role of the X chromosome primary DNA sequence in determining a gene's X inactivation status is unclear. Using a combination of the X chromosome sequence and a comprehensive X inactivation profile of more than 600 genes, two independent yet complementary approaches were used to systematically investigate the relationship between X inactivation and DNA sequence features. First, statistical analyses revealed that a number of repeat features, including long interspersed nuclear element (LINE) and mammalian-wide interspersed repeat repetitive elements, are significantly enriched in regions surrounding transcription start sites of genes that are subject to inactivation, while Alu repetitive elements and short motifs containing ACG/CGT are significantly enriched in those that escape inactivation. Second, linear support vector machine classifiers constructed using primary DNA sequence features were used to correctly predict the X inactivation status for >80% of all X-linked genes. We further identified a small set of features that are important for accurate classification, among which LINE-1 and LINE-2 content show the greatest individual discriminatory power. Finally, as few as 12 features can be used for accurate support vector machine classification. Taken together, these results suggest that features of the underlying primary DNA sequence of the human X chromosome may influence the spreading and/or maintenance of X inactivation.
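The classification step can be illustrated with a small linear support vector machine trained by subgradient descent on the regularised hinge loss. This is a sketch under stated assumptions, not the authors' implementation: the two features (LINE and Alu content near the transcription start site), every numeric value, the hyperparameters, and the function names are hypothetical.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Fit weights w and bias b by subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin: move the boundary toward classifying it.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: apply only the regularisation shrinkage.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (escapes inactivation) or -1 (subject to inactivation)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Made-up features per gene: (LINE fraction, Alu fraction) around the start site.
X = [(0.40, 0.05), (0.35, 0.08), (0.45, 0.04),   # subject to inactivation
     (0.10, 0.25), (0.12, 0.30), (0.08, 0.20)]   # escape inactivation
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

With a linear model the learned weights can be read directly as the contribution of each sequence feature, which is one reason a linear kernel suits the kind of feature ranking the abstract describes.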
Female mammals have two X chromosomes while males have one X and one Y chromosome. To equalize dosage of X chromosome genes in males and females, one X in female cells is inactivated, repressing the expression of most genes on the chromosome. Despite the chromosome-wide nature of X inactivation, at least 10%–15% of genes “escape” this inactivation in human females and are still expressed on the inactivated X. Whether a gene escapes or is subject to inactivation is thought to be determined epigenetically, and it is unknown to what extent, if at all, the underlying genomic DNA sequence of the chromosome plays a role. In this work, the authors show that the DNA sequence surrounding genes that escape inactivation is significantly different from the sequence surrounding genes that are subject to inactivation. In fact, a small number of DNA sequence features can be used to predict with high accuracy whether a gene will escape or be subject to this silencing. This establishes strong evidence that epigenetic regulation is, at least in part, dependent on genomic sequence and organization and provides a list of candidate sequence features whose role(s) in X inactivation can now be explored.
Advances in genome technology and other fruits of the Human Genome Project are playing a growing role in the delivery of health care. With the development of new technologies and opportunities for large-scale analysis of the genome, transcriptome, proteome and metabolome, the genome sciences are poised to have a profound impact on clinical medicine. Cancer prognostics will be among the first major test cases for a genomic medicine paradigm, given that all cancer is caused by genomic instability, and microarrays allow assessment of patients' entire expressed genomes. Analysis of breast cancer patients' expression patterns can already be highly correlated with recurrence risks. By integrating clinical data with gene expression profiles, imaging, metabolomic profiles and proteomic data, the prospect for developing truly individualized care becomes ever more real. Notwithstanding these promises, daunting challenges remain for genomic medicine. Success will require planning robust prospective trials, analysing health care economic and outcome data, assuaging insurance and privacy concerns, developing health delivery models that are commercially viable and scaling up to meet the needs of the whole population.
genomic medicine; gene expression profiling; microarrays; proteomics; metabolomics; personalized health care
Human Artificial Chromosomes (HACs) are potentially useful vectors for gene transfer studies and for functional annotation of the genome because of their suitability for cloning, manipulating and transferring large segments of the genome. However, development of HACs for the transfer of large genomic loci into mammalian cells has been limited by difficulties in manipulating high-molecular weight DNA, as well as by the low overall frequencies of de novo HAC formation. Indeed, to date, only a small number of large (>100 kb) genomic loci have been reported to be successfully packaged into de novo HACs.
We have developed novel methodologies to enable efficient assembly of HAC vectors containing any genomic locus of interest. We report here the creation of a novel, bimolecular system based on bacterial artificial chromosomes (BACs) for the construction of HACs incorporating any defined genomic region. We have utilized this vector system to rapidly design, construct and validate multiple de novo HACs containing large (100–200 kb) genomic loci including therapeutically significant genes for human growth hormone (HGH), polycystic kidney disease (PKD1) and β-globin. We report significant differences in the ability of different genomic loci to support de novo HAC formation, suggesting possible effects of cis-acting genomic elements. Finally, as a proof of principle, we have observed sustained β-globin gene expression from HACs incorporating the entire 200 kb β-globin genomic locus for over 90 days in the absence of selection.
Taken together, these results are significant for the development of HAC vector technology, as they enable high-throughput assembly and functional validation of HACs containing any large genomic locus. We have evaluated the impact of different genomic loci on the frequency of HAC formation and identified segments of genomic DNA that appear to facilitate de novo HAC formation. These genomic loci may be useful for identifying discrete functional elements that may be incorporated into future generations of HAC vectors.
Efficient construction of BAC-based human artificial chromosomes (HACs) requires optimization of each key functional unit as well as development of techniques for the rapid and reliable manipulation of high-molecular weight BAC vectors. Here, we have created synthetic chromosome 17-derived alpha-satellite arrays, based on the 16-monomer repeat length typical of natural D17Z1 arrays, in which the consensus CENP-B box elements are either completely absent (0/16 monomers) or increased in density (16/16 monomers) compared to D17Z1 alpha-satellite (5/16 monomers). Using these vectors, we show that the presence of CENP-B box elements is a requirement for efficient de novo centromere formation and that increasing the density of CENP-B box elements may enhance the efficiency of de novo centromere formation. Furthermore, we have developed a novel, high-throughput methodology that permits the rapid conversion of any genomic BAC target into a HAC vector by transposon-mediated modification with synthetic alpha-satellite arrays and other key functional units. Taken together, these approaches offer the potential to significantly advance the utility of BAC-based HACs for functional annotation of the genome and for applications in gene transfer.
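The CENP-B box densities quoted above (0/16, 5/16 and 16/16 monomers) can be checked computationally for any higher-order repeat unit. The following is a minimal sketch, not the authors' method: it assumes the commonly cited 17-bp core consensus NTTCGNNNNANNCGGGN for the CENP-B box, which the abstract itself does not state, and it assumes the repeat unit has already been split into individual monomer strings. The function names `has_box` and `box_density` are hypothetical.

```python
import re

# Assumption (not from the abstract): the commonly cited 17-bp CENP-B box
# core consensus NTTCGNNNNANNCGGGN, written here as a regular expression.
CENPB_BOX = re.compile(r"[ACGT]TTCG[ACGT]{4}A[ACGT]{2}CGGG[ACGT]")

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_box(monomer: str) -> bool:
    """True if the monomer carries the assumed motif on either strand."""
    seq = monomer.upper()
    return bool(CENPB_BOX.search(seq) or CENPB_BOX.search(revcomp(seq)))


def box_density(monomers: list[str]) -> tuple[int, int]:
    """Return (monomers with a box, total monomers) for one repeat unit."""
    return sum(has_box(m) for m in monomers), len(monomers)


# Toy data only: short filler sequences stand in for ~171-bp monomers.
FILLER = "ACGT" * 5
BOX = "CTTCGTTGGAAACGGGA"  # one sequence that fits the assumed consensus
toy_unit = [FILLER + BOX + FILLER] * 5 + [FILLER * 2] * 11
```

Calling `box_density(toy_unit)` on the toy repeat unit gives the number of box-containing monomers and the total monomer count, which can be compared directly with the densities reported for the natural and synthetic arrays. Real monomers would need to be extracted from sequence data first, and a stricter or looser motif definition would change the counts.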
An assay of the formation of heterochromatin and euchromatin on de novo human artificial chromosomes containing alpha satellite DNA revealed that only a small amount of heterochromatin may be required for centromere function and that replication late in S phase is not a requirement for centromere function.
Human centromere regions are characterized by the presence of alpha-satellite DNA, replication late in S phase and a heterochromatic appearance. Recent models propose that the centromere is organized into conserved chromatin domains in which chromatin containing CenH3 (centromere-specific H3 variant) at the functional centromere (kinetochore) forms within regions of heterochromatin. To address these models, we assayed formation of heterochromatin and euchromatin on de novo human artificial chromosomes containing alpha-satellite DNA. We also examined the relationship between chromatin composition and replication timing of artificial chromosomes.
Heterochromatin factors (histone H3 lysine 9 methylation and HP1α) were enriched on artificial chromosomes estimated to be larger than 3 Mb in size but depleted on those smaller than 3 Mb. All artificial chromosomes assembled markers of euchromatin (histone H3 lysine 4 methylation), which may partly reflect marker-gene expression. Replication timing studies revealed that the replication timing of artificial chromosomes was heterogeneous. Heterochromatin-depleted artificial chromosomes replicated in early S phase whereas heterochromatin-enriched artificial chromosomes replicated in mid to late S phase.
Centromere regions on human artificial chromosomes and host chromosomes have similar amounts of CenH3 but exhibit highly varying degrees of heterochromatin, suggesting that only a small amount of heterochromatin may be required for centromere function. The formation of euchromatin on all artificial chromosomes demonstrates that they can provide a chromosome context suitable for gene expression. The earlier replication of the heterochromatin-depleted artificial chromosomes suggests that replication late in S phase is not a requirement for centromere function.
Human artificial chromosomes have been used to model requirements for human chromosome segregation and to explore the nature of sequences competent for centromere function. Normal human centromeres require specialized chromatin that consists of alpha satellite DNA complexed with epigenetically modified histones and centromere-specific proteins. While several types of alpha satellite DNA have been used to assemble de novo centromeres in artificial chromosome assays, the extent to which they fully recapitulate normal centromere function has not been explored. Here, we have used two kinds of alpha satellite DNA, DXZ1 (from the X chromosome) and D17Z1 (from chromosome 17), to generate human artificial chromosomes. Although artificial chromosomes are mitotically stable over many months in culture, when we examined their segregation in individual cell divisions using an anaphase assay, they exhibited more segregation errors than natural human chromosomes (P < 0.001). Naturally occurring but abnormal small ring chromosomes derived from chromosome 17 and the X chromosome also missegregate more often than normal chromosomes, implicating overall chromosome size and/or structure in the fidelity of chromosome segregation. Because missegregation rates differ over a fivefold range among artificial chromosomes, the data suggest that variable centromeric DNA content and/or epigenetic assembly can influence the mitotic behavior of artificial chromosomes.
One of several features acquired by chromatin of the inactive X chromosome (Xi) is enrichment for the core histone H2A variant macroH2A within a distinct nuclear structure referred to as a macrochromatin body (MCB). In addition to localizing to the MCB, macroH2A accumulates at a perinuclear structure centered at the centrosome. To better understand the association of macroH2A1 with the centrosome and the formation of an MCB, we investigated the distribution of macroH2A1 throughout the somatic cell cycle. Unlike Xi-specific RNA, which associates with the Xi throughout interphase, the appearance of an MCB is predominantly a feature of S phase. Although the MCB dissipates during late S phase and G2 before reforming in late G1, macroH2A1 remains associated during mitosis with specific regions of the Xi, including at the X inactivation center. This association yields a distinct macroH2A banding pattern that overlaps with the site of histone H3 lysine-4 methylation centered at the DXZ4 locus in Xq24. The centrosomal pool of macroH2A1 accumulates in the presence of an inhibitor of the 20S proteasome. Therefore, targeting of macroH2A1 to the centrosome is likely part of a degradation pathway, a mechanism common to a variety of other chromatin proteins.
XIST; macroH2A; chromatin; centrosome; aggresome
Chromatin on the mammalian inactive X chromosome differs in a number of ways from that on the active X. One protein, macroH2A, whose amino terminus is closely related to histone H2A, is enriched on the heterochromatic inactive X chromosome in female cells. Here, we report the identification and localization of a novel and more distant histone variant, designated H2A-Bbd, that is only 48% identical to histone H2A. In both interphase and metaphase female cells, using either a myc epitope–tagged or green fluorescent protein–tagged H2A-Bbd construct, the inactive X chromosome is markedly deficient in H2A-Bbd staining, while the active X and the autosomes stain throughout. In double-labeling experiments, antibodies to acetylated histone H4 show a pattern of staining indistinguishable from H2A-Bbd in interphase nuclei and on metaphase chromosomes. Chromatin fractionation demonstrates association of H2A-Bbd with the histone proteins. Separation of micrococcal nuclease–digested chromatin by sucrose gradient ultracentrifugation shows cofractionation of H2A-Bbd with nucleosomes, supporting the idea that H2A-Bbd is incorporated into nucleosomes as a substitute for the core histone H2A. This finding, in combination with the overlap with acetylated forms of H4, raises the possibility that H2A-Bbd is enriched in nucleosomes associated with transcriptionally active regions of the genome. The distribution of H2A-Bbd thus distinguishes chromatin on the active and inactive X chromosomes.
histones; X chromosome inactivation; euchromatin; histone H4 acetylation; macroH2A