Gene density varies widely by organism and genomic region and has been measured both as the mean interval between genes and as genes per megabase pair of DNA [1]. Gene density is known to be positively correlated with G+C content [2], and the heterochromatic regions surrounding centromeres and telomeres have a lower than average gene density [3]. In general, measurements of gene density have focused on its correlation with chromosomal structure or base composition [2]. However, to our knowledge no one has examined the correlation of gene density with Quantitative Trait Locus (QTL) density across the genome. Furthermore, gene density on its own is a fairly crude measure of the functional role of specific genomic domains. It would be more informative to combine it with quantitative information about the types of gene annotations found across the genome, but to date this has not been done. In this report we describe the correlation of gene density with chromosomal regions defined on the basis of their association with phenotypic traits (QTL regions), and we determine whether gene annotations associated with the phenotypes in question are over-represented in these same regions. Our model system is the bovine genome because it has a wealth of well-annotated QTL [7] and gene models that have been anchored to a high-quality draft genome sequence assembly.
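To make the two gene density measures mentioned above concrete, the following is a minimal sketch, not the procedure used in this study. The function names, the `(start, end)` tuple representation and the example coordinates are illustrative assumptions; the interval calculation assumes non-nested genes and treats overlapping neighbours as a zero-length gap.

```python
def genes_per_mb(gene_count, region_length_bp):
    """Gene density expressed as genes per megabase pair of DNA."""
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    return gene_count / (region_length_bp / 1_000_000)


def mean_intergenic_interval(genes):
    """Mean gap (bp) between consecutive genes on one chromosome.

    `genes` is a list of hypothetical (start, end) coordinates.
    Assumes genes are not nested; overlapping neighbours count as a gap of 0.
    Returns None when fewer than two genes are supplied.
    """
    ordered = sorted(genes)
    gaps = [
        max(0, next_start - prev_end)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return sum(gaps) / len(gaps) if gaps else None


# Hypothetical region: three genes on a 2 Mb segment.
example_genes = [(100, 200), (500, 600), (1000, 1100)]
density = genes_per_mb(len(example_genes), 2_000_000)
interval = mean_intergenic_interval(example_genes)
```

The two measures are complementary: genes per megabase summarises a whole region, whereas the mean interval is sensitive to how genes are spaced within it.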
While quantifying gene annotations on the basis of gene descriptions is virtually impossible, quantitative distributions of gene function can be determined on the basis of Gene Ontology (GO) term annotations [8]. A gene ontology is a controlled vocabulary within a structured hierarchy that describes gene products in a species-independent manner. For us, GO terms provide a straightforward link from gene coordinates to phenotype. Gene ontologies have been used in many ways: for the quantitative analysis of gene expression profiles, for gene set analysis and for general annotation analyses [9]. From our perspective, identifying over-represented GO terms can provide insight into regional genomic function, and while statistical methods of measuring GO term distribution vary, we have adopted a commonly used method based on the hypergeometric distribution [12].
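As an illustration of how a hypergeometric test for GO term over-representation is typically set up, the sketch below computes the upper-tail probability of seeing at least `k` annotated genes in a region. It is a generic formulation under stated assumptions, not necessarily the exact procedure of [12]; the function name, parameter names and the example counts are hypothetical.

```python
from math import comb


def go_term_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the genome (background) with any GO annotation
    K: background genes annotated with the GO term of interest
    n: genes falling inside the QTL region(s)
    k: genes inside the region(s) annotated with the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)
    # Sum the probability mass for k, k+1, ..., min(K, n) annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical counts: 20000 annotated genes, 150 carry the term,
# 400 lie in QTL regions for the trait, 12 of those carry the term.
p_value = go_term_enrichment_p(20000, 150, 400, 12)
```

A small probability suggests that the term occurs in the region more often than expected by chance. When many GO terms are tested, a multiple-testing correction would normally be applied to the resulting p-values.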
Until now, GO term analysis performed on the bovine genome has focused on very specific gene expression analyses [13]. We have carried out the first genome-wide analysis of GO term use correlated with genomic regions known to control quantitatively regulated phenotypes (QTL). One of the challenges of mining GO terms is that they are numerous and often not grouped very tightly by phenotype. One way of overcoming this problem is to use a GO slim, which is a cut-down version of the GO. A GO slim contains a subset of the terms in the whole GO and facilitates research by streamlining the ontologies for specific areas of interest [8]. At the time we undertook this study there was no bovine GO slim, so we created our own for this analysis and deposited it with the GO Consortium.
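The idea of collapsing detailed GO terms onto a GO slim can be sketched as a walk up the ontology hierarchy until a slim term is reached. The example below is a simplified illustration and not the tool used to build or apply the bovine GO slim; the function name, the parent dictionary and the placeholder term labels are assumptions, and real GO identifiers and relationship types are deliberately omitted.

```python
def map_to_slim(term, parents, slim_terms):
    """Return the nearest slim terms reachable from `term` via parent links.

    parents: dict mapping a term to a list of its direct parent terms
    slim_terms: set of terms that make up the (hypothetical) slim
    A path stops at the first slim term it meets, so only the most
    specific slim ancestors on each path are reported.
    """
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)
            continue  # do not climb past a slim term
        stack.extend(parents.get(current, ()))
    return found


# Toy hierarchy with placeholder labels rather than real GO identifiers.
toy_parents = {
    "leaf": ["mid1", "mid2"],
    "mid1": ["slimA"],
    "mid2": ["slimB"],
    "slimA": ["root"],
    "slimB": ["root"],
}
toy_slim = {"slimA", "slimB"}
mapped = map_to_slim("leaf", toy_parents, toy_slim)
```

Because the GO is a directed acyclic graph rather than a tree, a single detailed term can map to more than one slim term, which is why the function returns a set.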