It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of proteins in the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, human residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for ∼38% of the human protein residues, there was no information in Pfam about conservation and evolutionary relationship with other protein regions. This uncovered portion of the human proteome was found to be distributed over almost 25 000 distinct protein regions. Comparison with proteins in the UniProtKB database suggested that the human regions that exhibited similarity to thousands of other sequences were often either divergent elements or N- or C-terminal extensions of existing families. Thirty-four per cent of regions, on the other hand, matched fewer than 100 sequences in UniProtKB. Most of these did not appear to share any relationship with existing Pfam-A families, suggesting that thousands of new families would need to be generated to cover them. Also, these latter regions were particularly rich in amino acid compositional bias such as the one associated with intrinsic disorder. This could represent a significant obstacle toward their inclusion into new Pfam families. Based on these observations, a major focus for increasing Pfam coverage of the human proteome will be to improve the definition of existing families. New families will also be built, prioritizing those that have been experimentally functionally characterized.
Database URL: http://pfam.sanger.ac.uk/
Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.
We have identified a new bacterial protein domain that we hypothesise binds to peptidoglycan. This domain is called the YARHG domain after the most highly conserved sequence-segment. The domain is found in the extracellular space and is likely to be composed of four alpha-helices. The domain is found associated with protein kinase domains, suggesting it is associated with signalling in some bacteria. The domain is also found associated with three different families of peptidases. The large number of different domains that are found associated with YARHG suggests that it is a useful functional module that nature has recombined multiple times.
Pfam is a widely used database of protein families, currently containing more than 13 000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the ‘sunburst’ representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
The major histocompatibility complex (MHC) is a group of genes with a variety of roles in the innate and adaptive immune responses. MHC genes form a genetically linked cluster in eutherian mammals, an organization that is thought to confer functional and evolutionary advantages to the immune system. The tammar wallaby (Macropus eugenii), an Australian marsupial, provides a unique model for understanding MHC gene evolution, as many of its antigen presenting genes are not linked to the MHC, but are scattered around the genome.
Here we describe the 'core' tammar wallaby MHC region on chromosome 2q by ordering and sequencing 33 BAC clones, covering over 4.5 MB and containing 129 genes. When compared to the MHC region of the South American opossum, eutherian mammals and non-mammals, the wallaby MHC has a novel gene organization. The wallaby has undergone an expansion of MHC class II genes, which are separated into two clusters by the class III genes. The antigen processing genes have undergone duplication, resulting in two copies of TAP1 and three copies of TAP2. Notably, Kangaroo Endogenous Retroviral Elements are present within the region and may have contributed to the genomic instability.
The wallaby MHC has been extensively remodeled since the American and Australian marsupials last shared a common ancestor. The instability is characterized by the movement of antigen presenting genes away from the core MHC, most likely via the presence and activity of retroviral elements. We propose that the movement of class II genes away from the ancestral class II region has allowed this gene family to expand and diversify in the wallaby. The duplication of TAP genes in the wallaby MHC makes this species a unique model organism for studying the relationship between MHC gene organization and function.
Domains of unknown function (DUFs) are a large set of uncharacterized protein families that structural genomics is helping biologists to understand functionally.
Domains of unknown function (DUFs) are a large set of uncharacterized protein families that are found in the Pfam database. Here, the scale and growth of functionally uncharacterized families in biological databases are surveyed and the prospects for discovering their function are examined. In particular, the important role that structural genomics can play in identifying potential function is evaluated.
structural genomics; domain of unknown function (DUF); uncharacterized protein family (UPF); Pfam
Many Gram-positive lactic acid bacteria (LAB) produce anti-bacterial peptides and small proteins called bacteriocins, which enable them to compete against other bacteria in the environment. These peptides fall structurally into three different classes, I, II, III, with class IIa being pediocin-like single entities and class IIb being two-peptide bacteriocins. Self-protective cognate immunity proteins are usually co-transcribed with these toxins. Several examples of cognates for IIa have already been solved structurally. Streptococcus pyogenes, closely related to LAB, is one of the most common human pathogens, so knowledge of how it competes against other LAB species is likely to prove invaluable.
We have solved the crystal structure of the gene-product of locus Spy_2152 from S. pyogenes, (PDB:2fu2), and found it to comprise an anti-parallel four-helix bundle that is structurally similar to other bacteriocin immunity proteins. Sequence analyses indicate this protein to be a possible immunity protein protective against class IIa or IIb bacteriocins. However, given that S. pyogenes appears to lack any IIa pediocin-like proteins but does possess class IIb bacteriocins, we suggest this protein confers immunity to IIb-like peptides.
Combined structural, genomic and proteomic analyses have allowed the identification and in silico characterization of a new putative immunity protein from S. pyogenes, possibly the first structure of an immunity protein protective against potential class IIb two-peptide bacteriocins. We have named the two pairs of putative bacteriocins found in S. pyogenes pyogenecin 1, 2, 3 and 4.
Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is ∼100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
There are two main classes of natural killer (NK) cell receptors in mammals, the killer cell immunoglobulin-like receptors (KIR) and the structurally unrelated killer cell lectin-like receptors (KLR). While KIR represent the most diverse group of NK receptors in all primates studied to date, including humans, apes, and Old and New World monkeys, KLR represent the functional equivalent in rodents. Here, we report a first digression from this rule in lemurs, where the KLR (CD94/NKG2) rather than KIR constitute the most diverse group of NK cell receptors. We demonstrate that natural selection contributed to such diversification in lemurs and particularly targeted KLR residues interacting with the peptide presented by MHC class I ligands. We further show that lemurs lack a strict ortholog or functional equivalent of MHC-E, the ligands of non-polymorphic KLR in “higher” primates. Our data support the existence of a hitherto unknown system of polymorphic and diverse NK cell receptors in primates and of combinatorial diversity as a novel mechanism to increase NK cell receptor repertoire.
Most receptors of natural killer (NK) cells interact with highly polymorphic major histocompatibility complex (MHC) class I molecules and thereby regulate the activity of NK cells against infected or malignant target cells. Whereas humans, apes, and Old and New World monkeys use the family of killer cell immunoglobulin-like receptors (KIR) as highly diverse NK cell receptors, this function is performed in rodents by the diverse family of lectin-like receptors Ly49. When did this functional separation occur in evolution? We followed this by investigating lemurs, primates that are distantly related to humans. We show here that lemurs employ the CD94/NKG2 family as their highly diversified NK cell receptors. The CD94/NKG2 receptors also belong to the lectin-like receptor family, but are rather conserved in “higher” primates and rodents. We could further demonstrate that lemurs have a single Ly49 gene like other primates but lack functional KIR genes of the KIR3DL lineage and show major deviations in their MHC class I genomic organisation. Thus, lemurs have evolved a “third way” of polymorphic and diverse NK cell receptors. In addition, the multiplied lemur CD94/NKG2 receptors can be freely combined, thereby forming diverse receptors. This is, therefore, the first description of some combinatorial diversity of NK cell receptors.
MHC class I antigens are encoded by a rapidly evolving gene family comprising classical and non-classical genes that are found in all vertebrates and involved in diverse immune functions. However, there is a fundamental difference between the organization of class I genes in mammals and non-mammals. Non-mammals have a single classical gene responsible for antigen presentation, which is linked to the antigen processing genes, including TAP. This organization allows co-evolution of advantageous class Ia/TAP haplotypes. In contrast, mammals have multiple classical genes within the MHC, which are separated from the antigen processing genes by class III genes. It has been hypothesized that separation of classical class I genes from antigen processing genes in mammals allowed them to duplicate. We investigated this hypothesis by characterizing the class I genes of the tammar wallaby, a model marsupial that has a novel MHC organization, with class I genes located within the MHC and 10 other chromosomal locations.
Sequence analysis of 14 BACs containing 15 class I genes revealed that nine class I genes, including one to three classical class I, are not linked to the MHC but are scattered throughout the genome. Kangaroo Endogenous Retroviruses (KERVs) were identified flanking the MHC un-linked class I. The wallaby MHC contains four non-classical class I, interspersed with antigen processing genes. Clear orthologs of non-classical class I are conserved in distant marsupial lineages.
We demonstrate that classical class I genes are not linked to antigen processing genes in the wallaby and provide evidence that retroviral elements were involved in their movement. The presence of retroviral elements most likely facilitated the formation of recombination hotspots and subsequent diversification of class I genes. The classical class I have moved away from antigen processing genes in eutherian mammals and the wallaby independently, but both lineages appear to have benefited from this loss of linkage by increasing the number of classical genes, perhaps enabling response to a wider range of pathogens. The discovery of non-classical orthologs between distantly related marsupial species is unusual for the rapidly evolving class I genes and may indicate an important marsupial specific function.
Numerous genetic association studies have implicated the KIAA0319 gene on human chromosome 6p22 in dyslexia susceptibility. The causative variant(s) remains unknown but may modulate gene expression, given that (1) a dyslexia-associated haplotype has been implicated in the reduced expression of KIAA0319, and (2) the strongest association has been found for the region spanning exon 1 of KIAA0319. Here, we test the hypothesis that variant(s) responsible for reduced KIAA0319 expression resides on the risk haplotype close to the gene's transcription start site. We identified seven single-nucleotide polymorphisms on the risk haplotype immediately upstream of KIAA0319 and determined that three of these are strongly associated with multiple reading-related traits. Using luciferase-expressing constructs containing the KIAA0319 upstream region, we characterized the minimal promoter and additional putative transcriptional regulator regions. This revealed that the minor allele of rs9461045, which shows the strongest association with dyslexia in our sample (max p-value = 0.0001), confers reduced luciferase expression in both neuronal and non-neuronal cell lines. Additionally, we found that the presence of this rs9461045 dyslexia-associated allele creates a nuclear protein-binding site, likely for the transcriptional silencer OCT-1. Knocking down OCT-1 expression in the neuronal cell line SHSY5Y using an siRNA restores KIAA0319 expression from the risk haplotype to nearly that seen from the non-risk haplotype. Our study thus pinpoints a common variant as altering the function of a dyslexia candidate gene and provides an illustrative example of the strategic approach needed to dissect the molecular basis of complex genetic traits.
Dyslexia, or reading disability, is a common disorder caused by both genetic and environmental factors. Genetic studies have implicated a number of genes as candidates for playing a role in dyslexia. We functionally characterized one such gene (KIAA0319) to identify variant(s) that might affect gene expression and contribute to the disorder. We discovered a variant residing outside of the protein-coding region of KIAA0319 that reduces expression of the gene. This variant creates a binding site for the transcription factor OCT-1. Previous studies have shown that OCT-1 binding to a specific DNA sequence upstream of a gene can reduce the expression of that gene. In this case, reduced KIAA0319 expression could lead to improper development of regions of the brain involved in reading ability. This is the first study to identify a functional variant implicated in dyslexia. More broadly, our study illustrates the steps that can be utilized for identifying mutations causing other complex genetic disorders.
The major histocompatibility complex (MHC) is essential for human immunity and is highly associated with common diseases, including cancer. While the genetics of the MHC has been studied intensively for many decades, very little is known about the epigenetics of this most polymorphic and disease-associated region of the genome.
To facilitate comprehensive epigenetic analyses of this region, we have generated a genomic tiling array of 2 Kb resolution covering the entire 4 Mb MHC region. The array has been designed to be compatible with chromatin immunoprecipitation (ChIP), methylated DNA immunoprecipitation (MeDIP), array comparative genomic hybridization (aCGH) and expression profiling, including of non-coding RNAs. The array comprises 7832 features, consisting of two replicates of both forward and reverse strands of MHC amplicons and appropriate controls.
Using MeDIP, we demonstrate the application of the MHC array for DNA methylation profiling and the identification of tissue-specific differentially methylated regions (tDMRs). Based on the analysis of two tissues and two cell types, we identified 90 tDMRs within the MHC and describe their characterisation.
A tiling array covering the MHC region was developed and validated. Its successful application for DNA methylation profiling indicates that this array represents a useful tool for molecular analyses of the MHC in the context of medical genomics.
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.
Major histocompatibility complex; Haplotype; Polymorphism; Retroelement; Genetic predisposition to disease; Population genetics
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror sites in France (http://pfam.jouy.inra.fr/) and South Korea (http://pfam.ccbb.re.kr/).
The Major Histocompatibility Complex (MHC) is essential for immune function. Historically, it has been subdivided into three regions (Class I, II, and III), but a cluster of functionally related genes within the Class III region has also been referred to as the Class IV region or "inflammatory region". This group of genes is involved in the inflammatory response, and includes members of the tumour necrosis family. Here we report the sequencing, annotation and comparative analysis of a tammar wallaby BAC containing the inflammatory region. We also discuss the extent of sequence conservation across the entire region and identify elements conserved in evolution.
Fourteen Class III genes from the tammar wallaby inflammatory region were characterised and compared to their orthologues in other vertebrates. The organisation and sequence of genes in the inflammatory region of both the wallaby and South American opossum are highly conserved compared to known genes from eutherian ("placental") mammals. Some minor differences separate the two marsupial species. Eight genes within the inflammatory region have remained tightly clustered for at least 360 million years, predating the divergence of the amphibian lineage. Analysis of sequence conservation identified 354 elements that are conserved. These range in size from 7 to 431 bases and cover 15.6% of the inflammatory region, representing approximately a 4-fold increase compared to the average for vertebrate genomes. About 5.5% of this conserved sequence is marsupial-specific, including three cases of marsupial-specific repeats. Highly Conserved Elements were also characterised.
Using comparative analysis, we show that a cluster of MHC genes involved in inflammation, including TNF, LTA (or its putative teleost homolog TNF-N), APOM, and BAT3 have remained together for over 450 million years, predating the divergence of mammals from fish. The observed enrichment in conserved sequences within the inflammatory region suggests conservation at the transcriptional regulatory level, in addition to the functional level.
Killer Immunoglobulin-like Receptors (KIR) are essential immuno-surveillance molecules. They are expressed on natural killer and T cells, and interact with human leukocyte antigens. KIR genes are highly polymorphic and contribute vital variability to our immune system. Numerous KIR genes, belonging to five distinct lineages, have been identified in all primates examined thus far and shown to be rapidly evolving. Since few KIR remain orthologous between species, with only one of them, KIR2DL4, shown to be common to human, apes and monkeys, the evolution of the KIR gene family in primates remains unclear.
Using comparative analyses, we have identified the ancestral KIR lineage (provisionally named KIR3DL0) in primates. We show KIR3DL0 to be highly conserved with the identification of orthologues in human (Homo sapiens), common chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), rhesus monkey (Macaca mulatta) and common marmoset (Callithrix jacchus). We predict KIR3DL0 to encode a functional molecule in all primates by demonstrating expression in human, chimpanzee and rhesus monkey. Using the rhesus monkey as a model, we further show the expression profile to be typical of KIR by quantitative measurement of KIR3DL0 from an enriched population of natural killer cells.
One reason why KIR3DL0 may have escaped discovery for so long is that, in human, it maps in between two related leukocyte immunoglobulin-like receptor clusters outside the known KIR gene cluster on Chromosome 19. Based on genomic, cDNA, expression and phylogenetic data, we report a novel lineage of immunoglobulin receptors belonging to the KIR family, which is highly conserved throughout 50 million years of primate evolution.
The innate and adaptive immune systems of vertebrates possess complementary, but intertwined functions within immune responses. Receptors of the mammalian innate immune system play an essential role in the detection of infected or transformed cells and are vital for the initiation and regulation of a full adaptive immune response. The genes for several of these receptors are clustered within the leukocyte receptor complex (LRC). The purpose of this study was to carry out a detailed analysis of the chicken (Gallus gallus domesticus) LRC. Bacterial artificial chromosomes containing genes related to mammalian leukocyte immunoglobulin-like receptors were identified in a chicken genomic library and shown to map to a single microchromosome. Sequencing revealed 103 chicken immunoglobulin-like receptor (CHIR) loci (22 inhibitory, 25 activating, 15 bifunctional, and 41 pseudogenes). A very complex splicing pattern was found using transcript analyses and seven hypervariable regions were detected in the external CHIR domains. Phylogenetic and genomic analysis showed that CHIR genes evolved mainly by block duplications from an ancestral inhibitory receptor locus, with transformation into activating receptors occurring more than once. Evolutionary selection pressure has led not only to an exceptional expansion of the CHIR cluster but also to a dramatic diversification of CHIR loci and haplotypes. This indicates that CHIRs have the potential to complement the adaptive immune system in fighting pathogens.
The immune system developed to cope with a diverse array of pathogens, including infectious organisms. The detection of these pathogens by cells of the immune system is mediated by a large set of specific receptor proteins. Here the authors seek to understand how a particular subset of cell surface receptors of the domestic chicken, the chicken Ig-like receptors (CHIR), has evolved. They demonstrate that at least 103 such receptor loci are clustered on a single microchromosome and provide the first detailed analysis of this region. The sequences of the CHIR genes suggest the presence of inhibitory, activating, and bifunctional receptors, as well as numerous incomplete loci (pseudogenes) that appear to have evolved by duplications of an ancestral inhibitory receptor gene. Multiple regions of very high sequence variability were also identified within CHIR loci which, together with considerable expansion of the number of these genes, suggest that CHIR polypeptides are involved in critical functions in the immune system of the chicken.
The major histocompatibility complex (MHC) is recognised as one of the most important genetic regions in relation to common human disease. Advancement in identification of MHC genes that confer susceptibility to disease requires greater knowledge of sequence variation across the complex. Highly duplicated and polymorphic regions of the human genome such as the MHC are, however, somewhat refractory to some whole-genome analysis methods. To address this issue, we are employing a bacterial artificial chromosome (BAC) cloning strategy to sequence entire MHC haplotypes from consanguineous cell lines as part of the MHC Haplotype Project. Here we present 4.25 Mb of the human haplotype QBL (HLA-A26-B18-Cw5-DR3-DQ2) and compare it with the MHC reference haplotype and with a second haplotype, COX (HLA-A1-B8-Cw7-DR3-DQ2), that shares the same HLA-DRB1, -DQA1, and -DQB1 alleles. We have defined the complete gene, splice variant, and sequence variation contents of all three haplotypes, comprising over 259 annotated loci and over 20,000 single nucleotide polymorphisms (SNPs). Certain coding sequences vary significantly between different haplotypes, making them candidates for functional and disease-association studies. Analysis of the two DR3 haplotypes allowed delineation of the shared sequence between two HLA class II–related haplotypes differing in disease associations and the identification of at least one of the sites that mediated the original recombination event. The levels of variation across the MHC were similar to those seen for other HLA-disparate haplotypes, except for a 158-kb segment that contained the HLA-DRB1, -DQA1, and -DQB1 genes and showed very limited polymorphism compatible with identity-by-descent and relatively recent common ancestry (<3,400 generations). These results indicate that the differential disease associations of these two DR3 haplotypes are due to sequence variation outside this central 158-kb segment, and that shuffling of ancestral blocks via recombination is a potential mechanism whereby certain DR–DQ allelic combinations, which presumably have favoured immunological functions, can spread across haplotypes and populations.
A group of genes involved in the human immune system are contained within a surprisingly short section of Chromosome 6 that has long been recognised as the most important genomic region in relation to disease susceptibility. Discerning the actual genes playing a role in disease has proved difficult mainly because the region contains numerous genes and is also the most genetically variable in the genome. Within this jungle of variation, the research reported here has identified and characterised a discrete segment shared by two individuals that is virtually devoid of variation—a polymorphism desert. The conservation of this segment amongst a background of extreme variation suggests both an ancient origin and genetic exchange in early human history. These observations are important in evolutionary terms as they reveal a potential mechanism whereby certain genetic segments associated with favourable immune functions have spread across human populations. Within medical terms this may also explain contrasting disease risks in people from different ethnic backgrounds. Public access to these data will help researchers find specific variants conferring disease susceptibility or resistance and, as in this report, rule out regions for conveying specificity to certain diseases.