1.  Recent artificial selection in U.S. Jersey cattle impacts autozygosity levels of specific genomic regions 
BMC Genomics  2015;16(1):302.
Genome signatures of artificial selection in U.S. Jersey cattle were identified by examining changes in haplotype homozygosity for a resource population of animals born between 1953 and 2007. Genetic merit of this population changed dramatically during this period for a number of traits, especially milk yield. The intense selection underlying these changes was achieved through extensive use of artificial insemination (AI), which also increased consanguinity of the population to a few superior Jersey bulls. As a result, allele frequencies are shifted for many contemporary animals, and in numerous cases to a homozygous state for specific genomic regions. The goal of this study was to identify those selection signatures that occurred after extensive use of AI since the 1960s, using analyses of shared haplotype segments or Runs of Homozygosity. When combined with animal birth year information, signatures of selection associated with economically important traits were identified and compared to results from an extended haplotype homozygosity analysis.
Overall, our results reveal that more recent selection increased autozygosity across the entire genome, but some specific regions increased more than others. A genome-wide scan identified more than 15 regions with a substantial change in autozygosity. Haplotypes found to be associated with increased milk, fat and protein yield in U.S. Jersey cattle also consistently increased in frequency.
The analyses used in this study were able to detect directional selection over the last few decades when individual production records for Jersey animals were available.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1500-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4409734  PMID: 25887761
SNP; Runs of homozygosity; Signatures of selection; Jersey cattle
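The runs-of-homozygosity idea behind entry 1 can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 allele counts (1 = heterozygous) and a hypothetical minimum run length counted in SNPs; the abstract does not state the study's actual thresholds or software, so this is not the authors' method.

```python
def find_roh(genotypes, min_len=3):
    """Return (start, end) index pairs (end exclusive) of homozygous runs.

    genotypes: sequence of 0/1/2 allele counts per SNP (1 = heterozygous).
    min_len:   hypothetical minimum number of consecutive homozygous SNPs.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            # Homozygous call: open a run if none is in progress.
            if start is None:
                start = i
        else:
            # Heterozygous call: close the current run, keep it if long enough.
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes)))
    return runs


def autozygosity_fraction(genotypes, min_len=3):
    """Fraction of SNPs covered by runs of homozygosity (a crude proxy)."""
    if not genotypes:
        return 0.0
    covered = sum(end - start for start, end in find_roh(genotypes, min_len))
    return covered / len(genotypes)
```

Comparing such a fraction between animals grouped by birth year would be one simple way to look for regions where autozygosity rose over time.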
2.  Genome wide CNV analysis reveals additional variants associated with milk production traits in Holsteins 
BMC Genomics  2014;15(1):683.
Milk production is an economically important sector of global agriculture. Much attention has been paid to the identification of quantitative trait loci (QTL) associated with milk, fat, and protein yield and the genetic and molecular mechanisms underlying them. Copy number variation (CNV) is an emerging class of variants which may be associated with complex traits.
In this study, we performed a genome-wide association analysis between CNVs and milk production traits in 26,362 Holstein bulls and cows. A total of 99 candidate CNVs were identified using Illumina BovineSNP50 array data, and association tests for each production trait were performed using a linear regression analysis with PCA correction. A total of 34 CNVs on 22 chromosomes were significantly associated with at least one milk production trait after false discovery rate (FDR) correction. Some of those CNVs were located within or near known QTL for milk production traits. We further investigated the relationship between associated CNVs and neighboring SNPs. For all 82 combinations of traits and CNVs (less than 400 kb in length), we found 17 cases where CNVs directly overlapped with tag SNPs and 40 cases where CNVs were adjacent to tag SNPs. In 5 cases, CNVs were in strong linkage disequilibrium with tag SNPs, either within or adjacent to the same haplotype block. There were an additional 20 cases where CNVs did not have a significant association with SNPs, suggesting that the effects of those CNVs were probably not captured by tag SNPs.
We conclude that combining CNV with SNP analyses reveals more genetic variations underlying milk production traits than those revealed by SNPs alone.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-683) contains supplementary material, which is available to authorized users.
PMCID: PMC4152564  PMID: 25128478
Copy number variation (CNV); dPTA; Association; Milk production traits
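Entry 2 reports CNV associations that remained significant after FDR correction. The abstract does not name the procedure used, so the following sketch assumes the widely used Benjamini-Hochberg step-up rule; the function name and the `alpha` default are illustrative choices.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is declared significant.

    Implements the Benjamini-Hochberg step-up rule: find the largest rank k
    such that p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

Applied to one p-value per CNV and trait combination, the `True` entries would correspond to associations retained after correction.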
3.  Genomic divergence of zebu and taurine cattle identified through high-density SNP genotyping 
BMC Genomics  2013;14(1):876.
Natural selection has molded evolution across all taxa. At an arguable date of around 330,000 years ago there were already at least two different types of cattle that became ancestors of nearly all modern cattle, the Bos taurus taurus more adapted to temperate climates and the tropically adapted Bos taurus indicus. After domestication, human selection exponentially intensified these differences. To better understand the genetic differences between these subspecies and detect genomic regions potentially under divergent selection, animals from the International Bovine HapMap Experiment were genotyped for over 770,000 SNP across the genome and compared using smoothed FST. The taurine sample was represented by ten breeds and the contrasting zebu cohort by three breeds.
Each cattle group evidenced similar numbers of polymorphic markers well distributed across the genome. Principal components analyses and unsupervised clustering confirmed the well-characterized main division of domestic cattle. The top 1% smoothed FST, potentially associated with positive selection, contained 48 genomic regions across 17 chromosomes. Nearly half of the top FST signals (n = 22) were previously detected using a lower density SNP assay. Amongst the strongest signals were the BTA7:~50 Mb and BTA14:~25 Mb; both regions harboring candidate genes and different patterns of linkage disequilibrium that potentially represent intrinsic differences between cattle types. The bottom 1% of the smoothed FST values, potentially associated with balancing selection, included 24 regions across 13 chromosomes. These regions often overlap with copy number variants, including the highly variable region at BTA23:~24 Mb that harbors a large number of MHC genes. Within these regions, 318 unique Ensembl genes are annotated with a significant overrepresentation of immune related pathways.
Genomic regions that are potentially linked to purifying or balancing selection processes in domestic cattle were identified. These regions are of particular interest to understand the natural and human selective pressures to which these subspecies were exposed and how the genetic background of these populations evolved in response to environmental challenges and human manipulation.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-14-876) contains supplementary material, which is available to authorized users.
PMCID: PMC4046821  PMID: 24330634
Bos; Taurus; Indicus; FST; Selection; Speciation
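The smoothed FST comparison in entry 3 can be sketched as follows. This is a simplified illustration, assuming two populations of equal weight, Wright's heterozygosity-based FST per SNP, and a plain moving average as the smoother; the abstract does not specify the estimator or smoothing method actually used, and the function names are hypothetical.

```python
def fst_per_snp(p1, p2):
    """Wright's FST for one biallelic SNP from two allele frequencies.

    FST = (H_T - H_S) / H_T, with H_T the expected heterozygosity of the
    pooled population and H_S the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        # Monomorphic in both populations: no differentiation to measure.
        return 0.0
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def smooth(values, window=3):
    """Centered moving average; windows are truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

Ranking the smoothed values along each chromosome and taking the top and bottom 1% would then give candidate regions of the kind described in the abstract.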
4.  Effect of sample stratification on dairy GWAS results 
BMC Genomics  2012;13:536.
Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach.
Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10–15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows and contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. The EMMAX method produced the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and GLS had similar significance levels. Removal of the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three methods for stratification correction, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45 kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows.
Genetic selection and extensive use of artificial insemination contributed to overlapped genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.
PMCID: PMC3496570  PMID: 23039970
5.  Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo) 
BMC Genomics  2012;13:391.
The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world’s poultry meat production. Genetic improvement is attributed largely to selective breeding programs that rely on highly heritable phenotypic traits, such as body size and breast muscle development. Commercial breeding with small effective population sizes and epistasis can result in loss of genetic diversity, which in turn can lead to reduced individual fitness and reduced response to selection. The presence of genomic diversity in domestic livestock species is therefore of great importance and a prerequisite for rapid and accurate genetic improvement of selected breeds in various environments, as well as to facilitate rapid adaptation to potential changes in breeding goals. Genomic selection requires a large number of genetic markers such as single nucleotide polymorphisms (SNPs), the most abundant source of genetic variation within the genome.
Alignment of next generation sequencing data of 32 individual turkeys from different populations was used for the discovery of 5.49 million SNPs, which subsequently were used for the analysis of genetic diversity among the different populations. All of the commercial lines branched from a single node relative to the heritage varieties and the South Mexican turkey population. Heterozygosity of all individuals from the different turkey populations ranged from 0.17-2.73 SNPs/Kb, while heterozygosity of populations ranged from 0.73-1.64 SNPs/Kb. The average frequency of heterozygous SNPs in individual turkeys was 1.07 SNPs/Kb. Five genomic regions with very low nucleotide variation were identified in domestic turkeys that showed fixation of alleles different from the wild alleles.
The turkey genome is much less diverse with a relatively low frequency of heterozygous SNPs as compared to other livestock species like chicken and pig. The whole genome SNP discovery study in turkey resulted in the detection of 5.49 million putative SNPs compared to the reference genome. All commercial lines appear to share a common origin. Presence of different alleles/haplotypes in the SM population highlights that specific haplotypes have been selected in the modern domesticated turkey.
PMCID: PMC3496629  PMID: 22891612
6.  Fine mapping of copy number variations on two cattle genome assemblies using high density SNP array 
BMC Genomics  2012;13:376.
Btau_4.0 and UMD3.1 are two distinct cattle reference genome assemblies. In our previous study using the low density BovineSNP50 array, we reported a copy number variation (CNV) analysis on Btau_4.0 with 521 animals of 21 cattle breeds, yielding 682 CNV regions with a total length of 139.8 megabases.
In this study using the high density BovineHD SNP array, we performed high resolution CNV analyses on both Btau_4.0 and UMD3.1 with 674 animals of 27 cattle breeds. We first compared CNV results derived from these two different SNP array platforms on Btau_4.0. With two thirds of the animals shared between studies, on Btau_4.0 we identified 3,346 candidate CNV regions representing 142.7 megabases (~4.70%) of the genome. With a similar total length but 5 times more event counts, the average CNVR length of the current Btau_4.0 dataset is significantly shorter than the previous one (42.7 kb vs. 205 kb). Although subsets of these two results overlapped, 64% (91.6 megabases) of the current dataset was not present in the previous study. We also performed similar analyses on UMD3.1 using these BovineHD SNP array results. Approximately 50% more and 20% longer CNVs were called on UMD3.1 as compared to those on Btau_4.0. However, a comparable result of CNVRs (3,438 regions with a total length 146.9 megabases) was obtained. We suspect that these results are due to the UMD3.1 assembly's efforts of placing unplaced contigs and removing unmerged alleles. Selected CNVs were further experimentally validated, achieving a 73% PCR validation rate, which is considerably higher than the previous validation rate. About 20-45% of CNV regions overlapped with cattle RefSeq genes and Ensembl genes. Panther and IPA analyses indicated that these genes participate in a wide spectrum of biological processes involving immune system, lipid metabolism, cell, organism and system development.
We present a comprehensive result of cattle CNVs at a higher resolution and sensitivity. We identified over 3,000 candidate CNV regions on both Btau_4.0 and UMD3.1, further compared current datasets with previous results, and examined the impacts of genome assemblies on CNV calling.
PMCID: PMC3583728  PMID: 22866901
Cattle genome; Breed; Copy number variation (CNV); Single nucleotide polymorphism (SNP)
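The region-length comparison in entry 6 follows from simple arithmetic on the reported totals. The sketch below only recomputes those figures from the numbers in the abstract; the helper name is an illustrative choice. The recomputed current mean comes out near 42.6 kb rather than the reported 42.7 kb, which is presumably an effect of rounding in the published totals.

```python
def mean_region_length_kb(total_mb, n_regions):
    """Average region length in kilobases from a total length in megabases."""
    return total_mb * 1000 / n_regions


# Figures taken from the abstract (Btau_4.0 assembly).
current_kb = mean_region_length_kb(142.7, 3346)   # BovineHD array, this study
previous_kb = mean_region_length_kb(139.8, 682)   # BovineSNP50 array, earlier study

# Ratio of region counts (the "5 times more event counts" in the text).
count_ratio = 3346 / 682

# Share of the current dataset not seen in the previous study.
novel_share = 91.6 / 142.7
```

Here `current_kb` is about 42.6, `previous_kb` about 205, `count_ratio` about 4.9 and `novel_share` about 0.64, in line with the values quoted in the abstract.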
7.  Genome-wide association analysis of thirty one production, health, reproduction and body conformation traits in contemporary U.S. Holstein cows 
BMC Genomics  2011;12:408.
Genome-wide association analysis is a powerful tool for annotating phenotypic effects on the genome and knowledge of genes and chromosomal regions associated with dairy phenotypes is useful for genome and gene-based selection. Here, we report results of a genome-wide analysis of predicted transmitting ability (PTA) of 31 production, health, reproduction and body conformation traits in contemporary Holstein cows.
Genome-wide association analysis identified a number of candidate genes and chromosome regions associated with 31 dairy traits in contemporary U.S. Holstein cows. Highly significant genes and chromosome regions include: BTA13's GNAS region for milk, fat and protein yields; BTA7's INSR region and BTAX's LOC520057 and GRIA3 for daughter pregnancy rate, somatic cell score and productive life; BTA2's LRP1B for somatic cell score; BTA14's DGAT1-NIBP region for fat percentage; BTA1's FKBP2 for protein yields and percentage, BTA26's MGMT and BTA6's PDGFRA for protein percentage; BTA18's 53.9-58.7 Mb region for service-sire and daughter calving ease and service-sire stillbirth; BTA18's PGLYRP1-IGFL1 region for a large number of traits; BTA18's LOC787057 for service-sire stillbirth and daughter calving ease; BTA15's CD82, BTA23's DST and the MOCS1-LRFN2 region for daughter stillbirth; and BTAX's LOC520057 and GRIA3 for daughter pregnancy rate. For body conformation traits, BTA11, BTAX, BTA10, BTA5, and BTA26 had the largest concentrations of SNP effects, and PHKA2 of BTAX and REN of BTA16 had the most significant effects for body size traits. For body shape traits, BTAX, BTA19 and BTA3 were most significant. Udder traits were affected by BTA16, BTA22, BTAX, BTA2, BTA10, BTA11, BTA20, BTA22 and BTA25, teat traits were affected by BTA6, BTA7, BTA9, BTA16, BTA11, BTA26 and BTA17, and feet/legs traits were affected by BTA11, BTA13, BTA18, BTA20, and BTA26.
Genome-wide association analysis identified a number of genes and chromosome regions associated with 31 production, health, reproduction and body conformation traits in contemporary Holstein cows. The results provide useful information for annotating phenotypic effects on the dairy genome and for building consensus of dairy QTL effects.
PMCID: PMC3176260  PMID: 21831322
8.  Genomic characteristics of cattle copy number variations 
BMC Genomics  2011;12:127.
Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results are becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits.
We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap of this new dataset and other published CNV studies was less than 40%; however, our discovery of large, high frequency (> 5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms.
We present a comprehensive genomic analysis of cattle CNVs derived from SNP data which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.
PMCID: PMC3053260  PMID: 21345189
9.  A deletion mutation in bovine SLC4A2 is associated with osteopetrosis in Red Angus cattle 
BMC Genomics  2010;11:337.
Osteopetrosis is a skeletal disorder of humans and animals characterized by the formation of overly dense bones, resulting from a deficiency in the number and/or function of bone-resorbing osteoclast cells. In cattle, osteopetrosis can either be induced during gestation by viral infection of the dam, or inherited as a recessive defect. Genetically affected calves are typically aborted late in gestation, display skull deformities and exhibit a marked reduction of osteoclasts. Although mutations in several genes are associated with osteopetrosis in humans and mice, the genetic basis of the cattle disorder was previously unknown.
We have conducted a whole-genome association analysis to identify the mutation responsible for inherited osteopetrosis in Red Angus cattle. Analysis of >54,000 SNP genotypes for each of seven affected calves and nine control animals localized the defective gene to the telomeric end of bovine chromosome 4 (BTA4). Homozygosity analysis refined the interval to a 3.4-Mb region containing the SLC4A2 gene, encoding an anion exchanger protein necessary for proper osteoclast function. Examination of SLC4A2 from normal and affected animals revealed a ~2.8-kb deletion mutation in affected calves that encompasses exon 2 and nearly half of exon 3, predicted to prevent normal protein function. Analysis of RNA from a proven heterozygous individual confirmed the presence of transcripts lacking exons 2 and 3, in addition to normal transcripts. Genotyping of additional animals demonstrated complete concordance of the homozygous deletion genotype with the osteopetrosis phenotype. Histological examination of affected tissues revealed scarce, morphologically abnormal osteoclasts displaying evidence of apoptosis.
These results indicate that a deletion mutation within bovine SLC4A2 is associated with osteopetrosis in Red Angus cattle. Loss of SLC4A2 function appears to induce premature cell death, and likely results in cytoplasmic alkalinization of osteoclasts which, in turn, may disrupt acidification of resorption lacunae.
PMCID: PMC2891616  PMID: 20507629
10.  MicroRNA transcriptome profiles during swine skeletal muscle development 
BMC Genomics  2009;10:77.
MicroRNA (miR) are a class of small RNAs that regulate gene expression by inhibiting translation of protein encoding transcripts. To evaluate the role of miR in skeletal muscle of swine, global microRNA abundance was measured at specific developmental stages including proliferating satellite cells, three stages of fetal growth, day-old neonate, and the adult.
Twelve potential novel miR were detected that did not match previously reported sequences. In addition, a number of miR previously reported to be expressed in mammalian muscle were detected, having a variety of abundance patterns through muscle development. Muscle-specific miR-206 was nearly absent in proliferating satellite cells in culture, but was the most abundant miR at the other time points evaluated. In addition, miR-1 was moderately abundant throughout developmental stages with highest abundance in the adult. In contrast, miR-133 was moderately abundant in adult muscle and either not detectable or of low abundance throughout fetal and neonate development. Changes in abundance of ubiquitously expressed miR were also observed. MiR-432 abundance was highest at the earliest stage of fetal development tested (60 day-old fetus) and decreased throughout development to the adult. Conversely, miR-24 and miR-27 exhibited greatest abundance in proliferating satellite cells and the adult, while abundance of miR-368, miR-376, and miR-423-5p was greatest in the neonate.
These data present a complete set of transcriptome profiles to evaluate miR abundance at specific stages of skeletal muscle growth in swine. Identification of these miR provides an initial group of miR that may play a vital role in muscle development and growth.
PMCID: PMC2646747  PMID: 19208255
11.  Quality assessment parameters for EST-derived SNPs from catfish 
BMC Genomics  2008;9:450.
SNPs are abundant, codominantly inherited, and sequence-tagged markers. They are highly adaptable to large-scale automated genotyping, and therefore, are most suitable for association studies and applicable to comparative genome analysis. However, discovery of SNPs requires genome sequencing efforts through whole genome sequencing or deep sequencing of reduced representation libraries. Such genome resources are not yet available for many species including catfish. A large resource of ESTs is to become available in catfish, allowing identification of a large number of SNPs, but the reliability of EST-derived SNPs is relatively low because of sequencing errors. This project was designed to answer some of the questions relevant to quality assessment of EST-derived SNPs.
Two factors were found to be most significant for validation of EST-derived SNPs: the contig size (number of sequences in the contig) and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate, although the validation rate was reasonably high when the contigs contained four or more EST sequences with the minor allele sequence being represented at least twice in the contigs. Sequence quality surrounding the SNP under test is also crucially important. PCR extension appeared to be limited to a very short distance, prohibiting successful genotyping when an intron was present, a surprising finding.
Stringent quality assessment measures should be used when working with EST-derived SNPs. In particular, contigs containing four or more ESTs should be used and the minor allele sequence should be represented at least twice. Genotyping primers should be designed from a single exon, completely avoiding introns. Application of such quality assessment measures, along with large resources of ESTs, should provide effective means for SNP identification in species where genome sequence resources are lacking.
PMCID: PMC2570692  PMID: 18826589
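The two selection criteria recommended in entry 11 (contigs of four or more ESTs, minor allele represented at least twice) translate directly into a filter. The sketch below is an illustration only; the `CandidateSnp` record and its field names are hypothetical, and the other recommendations (sequence quality around the SNP, primers within a single exon) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class CandidateSnp:
    contig_id: str           # hypothetical identifier of the EST contig
    contig_size: int         # number of EST sequences in the contig
    minor_allele_count: int  # sequences in the contig carrying the minor allele


def passes_quality_filter(snp, min_contig_size=4, min_minor_count=2):
    """Apply the two thresholds stated in the abstract to one candidate SNP."""
    return (snp.contig_size >= min_contig_size
            and snp.minor_allele_count >= min_minor_count)


def filter_candidates(snps):
    """Keep only candidates that satisfy both thresholds."""
    return [s for s in snps if passes_quality_filter(s)]
```

For example, a candidate from a contig of six ESTs with the minor allele seen in two of them would be retained, while one from a contig of three ESTs would be dropped regardless of its minor allele count.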
12.  Effects of increased milking frequency on gene expression in the bovine mammary gland 
BMC Genomics  2008;9:362.
Previous research has demonstrated that increased milking frequency of dairy cattle during the first few weeks of lactation enhances milk yield, and that the effect persists throughout the entire lactation period. The specific mechanisms controlling this increase in milk production are unknown, but suggested pathways include increased mammary epithelial cell number, secretory capacity, and sensitivity to lactogenic hormones. We used serial analysis of gene expression (SAGE) and microarray analysis to identify changes in gene expression in the bovine mammary gland in response to 4× daily milking beginning at d 4 of lactation (IMF4) relative to glands milked 2× daily (Control) to gain insight into physiological changes occurring within the gland during more frequent milking.
Results indicated changes in gene expression related to cell proliferation and differentiation, extracellular matrix (ECM) remodeling, metabolism, nutrient transport, and immune function in IMF4 versus Control cows. In addition, pathways expected to promote neovascularization within the gland appeared to be upregulated in IMF4 cows. To validate this finding, immunolocalization of von Willebrand factor (VWF), an endothelial cell marker, and its co-localization with the nuclear proliferation antigen Ki67 were evaluated in mammary tissue sections at approximately d 7 and d 14 of lactation in cows milked 4× daily versus Controls to estimate endothelial cell abundance and proliferation within the gland. Consistent with expression of genes related to neovascularization, both abundance of VWF and its co-localization with Ki67 appeared to be elevated in cows milked 4× daily, suggesting persistent increased milk yield in response to increased milking frequency may be mediated or complemented by enhanced mammary ECM remodeling and neovascularization within the gland.
Additional study is needed to determine whether changes in ECM remodeling and neovascularization of the mammary gland result in increased milk yield during increased milking frequency, or occur in response to an increased demand for milk production. Gene pathways identified by the current study will provide a basis for future investigations to identify factors mediating the effects of milking frequency on milk yield.
PMCID: PMC2518935  PMID: 18671851
13.  A second generation radiation hybrid map to aid the assembly of the bovine genome sequence 
BMC Genomics  2006;7:283.
Several approaches can be used to determine the order of loci on chromosomes and hence develop maps of the genome. However, all mapping approaches are prone to errors arising either from technical deficiencies or from a lack of statistical support to distinguish between alternative orders of loci. The accuracy of the genome maps could be improved, in principle, if information from different sources was combined to produce integrated maps. The publicly available bovine genomic sequence assembly with 6× coverage (Btau_2.0) is based on whole genome shotgun sequence data and limited mapping data; however, it is recognised that this assembly is a draft that contains errors. Correcting the sequence assembly requires extensive additional mapping information to improve the reliability of the ordering of sequence scaffolds on chromosomes. The radiation hybrid (RH) map described here has been contributed to the international sequencing project to aid this process.
An RH map for the 30 bovine chromosomes is presented. The map was built using the Roslin 3000-rad RH panel (BovGen RH map) and contains 3966 markers including 2473 new loci in addition to 262 amplified fragment-length polymorphisms (AFLP) and 1231 markers previously published with the first generation RH map. Sequences of the mapped loci were aligned with published bovine genome maps to identify inconsistencies. In addition to differences in the order of loci, several cases were observed where the chromosomal assignment of loci differed between maps. All the chromosome maps were aligned with the current 6× bovine assembly (Btau_2.0) and 2898 loci were unambiguously located in the bovine sequence. The order of loci on the RH map for BTA 5, 7, 16, 22, 25 and 29 differed substantially from the assembled bovine sequence. From the 2898 loci unambiguously identified in the bovine sequence assembly, 131 mapped to different chromosomes in the BovGen RH map.
Alignment of the BovGen RH map with other published RH and genetic maps showed higher consistency in marker order and chromosome assignment than with the current 6× sequence assembly. This suggests that the bovine sequence assembly could be significantly improved by incorporating additional independent mapping information.
PMCID: PMC1636650  PMID: 17087818
14.  Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences 
BMC Genomics  2006;7:140.
Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages.
Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10⁻⁹ change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10⁻⁹ change/site/year) was approximately half of the overall rate (1.9–2.0 × 10⁻⁹ change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%.
This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.
PMCID: PMC1525190  PMID: 16759380
15.  Characterization of 954 bovine full-CDS cDNA sequences 
BMC Genomics  2005;6:166.
Genome assemblies rely on the existence of transcript sequence to stitch together contigs, verify assembly of whole genome shotgun reads, and annotate genes. Functional genomics studies also rely on transcript sequence to create expression microarrays or interpret digital tag data produced by methods such as Serial Analysis of Gene Expression (SAGE). Transcript sequence can be predicted based on reconstruction from overlapping expressed sequence tags (EST) that are obtained by single-pass sequencing of random cDNA clones, but these reconstructions are prone to errors caused by alternative splice forms, transcripts from gene families with related sequences, and expressed pseudogenes. These errors confound genome assembly and annotation. The most useful transcript sequences are derived by complete insert sequencing of clones containing the entire length, or at least the full protein coding sequence (CDS) portion, of the source mRNA. While the bovine genome sequencing initiative is nearing completion, there is currently a paucity of bovine full-CDS mRNA and protein sequence data to support bovine genome assembly and functional genomics studies. Consequently, the production of high-quality bovine full-CDS cDNA sequences will enhance the bovine genome assembly and functional studies of bovine genes and gene products. The goal of this investigation was to identify and characterize the full-CDS sequences of bovine transcripts from clones identified in non-full-length enriched cDNA libraries. In contrast to several recent full-length cDNA investigations, these full-CDS cDNAs were selected, sequenced, and annotated without the benefit of the target organism's genomic sequence, by using comparison of bovine EST sequence to existing human mRNA to identify likely full-CDS clones for full-length insert cDNA (FLIC) sequencing.
The predicted bovine protein lengths, 5' UTR lengths, and Kozak consensus sequences from 954 bovine FLIC sequences (bFLICs; average length 1713 nt, representing 762 distinct loci) are all consistent with previously sequenced mammalian full-length transcripts.
In most cases, the bFLICs span the entire CDS of the genes, providing the basis for creating predicted bovine protein sequences to support proteomics and comparative evolutionary research as well as functional genomics and genome annotation. The results demonstrate the utility of the comparative approach in obtaining predicted protein sequences in other species.
PMCID: PMC1314900  PMID: 16305752

