Chromosome 13 has a strikingly low gene density compared with the other finished, annotated autosomes, and a highly variable gene distribution. The overall gene density is the lowest of all the sequenced autosomes (see ), with an average (excluding pseudogenes and ncRNA genes) of 6.5 genes per Mb (refs 2-5, 15, 16). This analysis extends and confirms previous observations (refs 10, 17, 18). Consistent with the low gene density, the G+C content of 38.5% and the predicted CpG island density of 5.4 per Mb are considerably below the genome averages (). Exon coverage of chromosome 13 sequence is also substantially lower (1.3%) than that of other autosomes, except chromosome 7. However, the average gene length on chromosome 13 is 57 kb, which is almost double that of other chromosomes (31 and 26 kb for chromosomes 6 and 22, respectively). As a result, the 633 genes cover 37% of the sequence, which is only slightly lower than that of the other finished, annotated autosomes ().
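The way longer genes offset the low gene density can be checked with a back-of-the-envelope calculation. The Python sketch below multiplies the reported density (6.5 genes per Mb) by the reported mean gene length (57 kb). It assumes that genes do not overlap, which is a simplification, and the function name is illustrative rather than part of the original analysis.

```python
def gene_coverage_fraction(genes_per_mb: float, mean_gene_length_kb: float) -> float:
    """Approximate fraction of sequence covered by genes.

    Assumption (not from the paper): genes do not overlap, so the covered
    fraction is simply density multiplied by mean gene length.
    """
    mean_gene_length_mb = mean_gene_length_kb / 1000.0
    return genes_per_mb * mean_gene_length_mb


# Values reported in the text for chromosome 13.
chr13_fraction = gene_coverage_fraction(genes_per_mb=6.5, mean_gene_length_kb=57)
print(f"Approximate gene coverage: {chr13_fraction:.4f}")
```

Under that assumption the product comes to roughly 0.37, in line with the 37% coverage reported in the text.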
Comparison of chromosome 13 features with those of other sequenced autosomes
Gene density varies considerably along the chromosome, as do other characteristics of the gene-rich and gene-poor regions (see and Supplementary Fig. 1). Here we define ‘gene-rich’ as containing more than 15 genes per Mb, and ‘gene-poor’ as containing fewer than 5 genes per Mb. A detailed picture of the characteristics of an example of each regional class is shown in . A comparison of the two regions can be seen in . There is a 37.8-Mb region, from 52.9 to 90.7 Mb, where the average gene density drops to 3 genes per Mb. This region actually comprises two gene-poor segments (52.9–71.9 Mb and 78.9–89.9 Mb) flanking a section with a gene density of 7 genes per Mb (). The first gene-poor section contains a 3-Mb region with no genes (53.9–56.9 Mb).
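The thresholds defined above can be expressed as a small classifier. The following Python sketch is illustrative only: the label "intermediate" for densities between the two thresholds, the fixed 1-Mb window, and the function names are assumptions introduced here, not part of the published analysis.

```python
import math

GENE_RICH_MIN = 15.0  # more than 15 genes per Mb counts as gene-rich (from the text)
GENE_POOR_MAX = 5.0   # fewer than 5 genes per Mb counts as gene-poor (from the text)


def classify_density(genes_per_mb: float) -> str:
    """Label a region by gene density using the thresholds in the text."""
    if genes_per_mb > GENE_RICH_MIN:
        return "gene-rich"
    if genes_per_mb < GENE_POOR_MAX:
        return "gene-poor"
    return "intermediate"  # hypothetical label; the text names only two classes


def window_densities(gene_starts_mb, start_mb, end_mb, window_mb=1.0):
    """Count gene start positions in consecutive windows; return genes per Mb."""
    n_windows = math.ceil((end_mb - start_mb) / window_mb)
    counts = [0] * n_windows
    for pos in gene_starts_mb:
        if start_mb <= pos < end_mb:
            counts[int((pos - start_mb) // window_mb)] += 1
    return [count / window_mb for count in counts]


# The densities quoted in the text for the 52.9 to 90.7 Mb region.
print(classify_density(3))  # average density of the whole region
print(classify_density(7))  # the section between the two gene-poor segments
```

With these thresholds, the 3 genes per Mb average falls in the gene-poor class, whereas the 7 genes per Mb section falls between the two definitions.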
Figure 1 Genetic and physical characteristics of the chromosome 13 sequence. a, A comparison of physical and genetic distance along chromosome 13. Markers from the deCODE genetic map (ref. 8) were localized in the sequence. Their locations in the genetic map are plotted ...
Figure 2 Characteristics of a gene-rich (a) and a gene-poor (b) region. The overlapping tilepath, labelled by accession number, is shown in yellow. Genetic markers from the deCODE map have been positioned on the sequence. Occurrences of repeats are shown as vertical ...
Comparison of the gene-rich (17.9–21 Mb) and gene-poor (53–56 Mb) regions
The two major gene-rich areas are at either end of the q arm, with 90 of the predicted coding genes lying within 3 Mb of either end of the euchromatic region. As observed with other completed human chromosomes, gene-poor regions have a low G+C content, a low SINE coverage and a high LINE coverage, relative to genome averages. These trends are reversed for the gene-rich regions. Between positions 90 and 100 Mb, although there are very few genes, a large percentage of the region is covered by transcribed gene structures (exons plus introns). This region contains the largest gene on the chromosome (GPC5) as well as two others (GPC6 and HS6ST3), each covering over 500 kb. In sharp contrast to the protein-coding genes, the ncRNA genes are distributed evenly between the gene-rich and gene-poor regions (see ).
A total of 96,894 SNPs from the dbSNP database were mapped onto the sequence. The coding regions of the annotated genes contain 654 SNPs (1 SNP per 1.6 kb). These can be subdivided into 345 synonymous and 309 non-synonymous cSNPs. To analyse the overall distribution along the chromosome, a subset (38,069) of SNPs identified previously by alignment of random shotgun sequence to the draft sequence was plotted separately (see and Supplementary Fig. 1). From this distribution plot, there is no obvious difference in the variation rate between the gene-rich and gene-poor areas. There is one region between positions 18.0 and 18.4 Mb (Figs and ), where the SNP density is substantially higher than the average, reaching one SNP per 0.3 kb (1,329 SNPs in 400 kb). There is a known duplication with chromosome 21 in this region, and it is possible that the apparently high SNP density is due to the presence of paralogous sequence variants, as has been suggested previously (ref. 16).
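The SNP figures in this paragraph can be reproduced with simple arithmetic. The Python sketch below uses only counts quoted in the text; the function and variable names are illustrative and not taken from the original analysis.

```python
def kb_per_snp(n_snps: int, region_kb: float) -> float:
    """Average spacing between SNPs, in kb, for a region of the given size."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return region_kb / n_snps


# High-density region between 18.0 and 18.4 Mb: 1,329 SNPs in 400 kb.
high_density_spacing = kb_per_snp(1329, 400)

# Split of the 654 coding SNPs into synonymous and non-synonymous changes.
synonymous, non_synonymous = 345, 309
non_synonymous_fraction = non_synonymous / (synonymous + non_synonymous)

print(f"Spacing in the 18.0 to 18.4 Mb region: {high_density_spacing:.2f} kb per SNP")
print(f"Non-synonymous share of cSNPs: {non_synonymous_fraction:.2f}")
```

The first value comes to roughly 0.3 kb per SNP, matching the density quoted for the 18.0 to 18.4 Mb region.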
Around 5% of the human genome may be accounted for by segmental duplications, which may play an important role in genetic disease and genome evolution (refs 19, 20). A study by Cheung and colleagues (ref. 19), which classified regions as segmental duplications if they show at least 90% homology over a minimum of 5 kb, suggested that there is approximately 1.8 Mb of duplicated sequence on chromosome 13. This sequence comprises 0.9 Mb of intrachromosomal and 1.2 Mb of interchromosomal duplications (ref. 20). This includes 0.3 Mb of sequence common to both categories. For example, the TPIP gene on chromosome 13 has undergone both inter- and intrachromosomal duplications. Guipponi and colleagues (ref. 21) described phylogenetic analysis of the TPTE gene family, of which TPIP is a member, and suggested that all family members have arisen from a common ancestor since the divergence of the human and mouse lineages, because there is only a single Tpte gene in mouse. There are seven other genes surrounding the mouse Tpte gene on chromosome 8, each of which has a functional homologue on human chromosome 13. This observation, and the fact that four of the genes have homology only to chromosome 13, suggest that the human orthologue of the Tpte gene lies on chromosome 13. There have been a number of duplication events resulting in one functional copy and four pseudogenes of TPIP on chromosome 13. In addition, there is another functional member of the gene family, TPTE, on chromosome 21 and a number of pseudogenes on chromosomes 3, 15, 22 and Y.
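The duplication totals quoted earlier in this paragraph are consistent once the sequence shared by both categories is counted only once. The Python sketch below shows that reconciliation using the values from the text; the function name is illustrative.

```python
def total_duplicated_mb(intra_mb: float, inter_mb: float, shared_mb: float) -> float:
    """Total duplicated sequence, counting sequence in both categories once."""
    return intra_mb + inter_mb - shared_mb


# 0.9 Mb intrachromosomal, 1.2 Mb interchromosomal, 0.3 Mb common to both.
total = total_duplicated_mb(intra_mb=0.9, inter_mb=1.2, shared_mb=0.3)
print(f"Total duplicated sequence: {total:.1f} Mb")
```

Subtracting the 0.3 Mb overlap from the 2.1 Mb sum gives the approximately 1.8 Mb total reported for chromosome 13.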