Results 1-7 (7)
 

1.  Error rate for imputation from the Illumina BovineSNP50 chip to the Illumina BovineHD chip 
Background
Imputation of genotypes from low-density to higher density chips is a cost-effective method to obtain high-density genotypes for many animals, based on genotypes of only a relatively small subset of animals (reference population) on the high-density chip. Several factors influence the accuracy of imputation and our objective was to investigate the effects of the size of the reference population used for imputation and of the imputation method used and its parameters. Imputation of genotypes was carried out from 50 000 (moderate-density) to 777 000 (high-density) SNPs (single nucleotide polymorphisms).
Methods
The effect of reference population size was studied in two datasets: one with 548 and one with 1289 Holstein animals, genotyped with the Illumina BovineHD chip (777 k SNPs). A third dataset included the 548 animals genotyped with the 777 k SNP chip and 2200 animals genotyped with the Illumina BovineSNP50 chip. In each dataset, 60 animals were chosen as validation animals, for which all high-density genotypes were masked, except for the Illumina BovineSNP50 markers. Imputation was studied in a subset of six chromosomes, using the imputation software programs Beagle and DAGPHASE.
Results
Imputation with DAGPHASE and Beagle resulted in 1.91% and 0.87% allelic imputation error rates in the dataset with 548 high-density genotypes, when scale and shift parameters were 2.0 and 0.1, and 1.0 and 0.0, respectively. When Beagle was used alone, the imputation error rate was 0.67%. If the information obtained by Beagle was subsequently used in DAGPHASE, imputation error rates were slightly higher (0.71%). When 2200 moderate-density genotypes were added and Beagle was used alone, imputation error rates were slightly lower (0.64%). The lowest imputation error rate was obtained with Beagle in the reference set with 1289 high-density genotypes (0.41%).
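The abstract does not define how the allelic imputation error rate was computed. A minimal sketch, assuming genotypes are coded as counts of one allele (0, 1 or 2) and that each wrongly imputed allele counts as one error out of two alleles per genotype, could look like this. The function name and the toy data are hypothetical and are not taken from the study.

```python
def allelic_error_rate(true_genotypes, imputed_genotypes):
    """Fraction of alleles imputed incorrectly.

    Assumes genotypes are coded 0/1/2 (copies of one allele), so the
    absolute difference between true and imputed codes is the number
    of wrong alleles at that genotype.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    if not true_genotypes:
        raise ValueError("at least one genotype is required")
    wrong_alleles = sum(
        abs(t - i) for t, i in zip(true_genotypes, imputed_genotypes)
    )
    # Each genotype carries two alleles.
    return wrong_alleles / (2 * len(true_genotypes))


# Toy example: masked high-density genotypes of one validation animal.
true_g = [0, 1, 2, 2, 1, 0, 1, 2]
imputed_g = [0, 1, 1, 2, 1, 0, 2, 2]
rate = allelic_error_rate(true_g, imputed_g)
print(f"allelic error rate: {rate:.3f}")
```

Under this definition, a heterozygote imputed as a homozygote contributes one allelic error, whereas an opposite-homozygote mistake contributes two.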
Conclusions
For imputation of genotypes from the 50 k to the 777 k SNP chip, Beagle gave the lowest allelic imputation error rates. Imputation error rates decreased with increasing size of the reference population. For applications for which computing time is limiting, DAGPHASE using information from Beagle can be considered as an alternative, since it reduces computation time and increases imputation error rates only slightly.
doi:10.1186/1297-9686-46-10
PMCID: PMC3929158  PMID: 24495554
2.  Trans-Generational Effect of Maternal Lactation during Pregnancy: A Holstein Cow Model 
PLoS ONE  2012;7(12):e51816.
Epigenetic regulation in mammals begins in the first stages of embryogenesis. This prenatal programming determines, in part, phenotype expression in adult life. Some species, particularly dairy cattle, are conceived during the maternal lactation, which is a period of large energy and nutrient needs. Under these circumstances, embryo and fetal development compete for nutrients with the mammary gland, which may affect prenatal programming and predetermine phenotype at adulthood. Data from a specialized dairy breed were used to determine the transgenerational effect when embryo development coincides with maternal lactation. Longitudinal phenotypic data for milk yield (kg), ratio of fat-protein content in milk during first lactation, and lifespan (d) from 40,065 cows were adjusted for environmental and genetic effects using a Bayesian framework. Then, the effect of different maternal circumstances was determined on the residuals. The maternal-related circumstances were 1) presence of lactation, 2) maternal milk yield level, and 3) occurrence of mastitis during embryogenesis. Females born to mothers that were lactating while pregnant produced 52 kg (Monte Carlo standard error; MCs.e. = 0.009) less milk, lived 16 d (MCs.e. = 0.002) shorter and were metabolically less efficient (+0.42% milk fat/protein ratio; MCs.e. < 0.001) than females whose fetal life developed in the absence of maternal lactation. The greater the maternal milk yield during embryogenesis, the larger the negative effects of prenatal programming, preventing the offspring born to the most productive cows from fully expressing their potential additive genetic merit during their adult life. Our data provide substantial evidence of a transgenerational effect when pregnancy and lactation coincide. Although this effect is relatively low, it should not be ignored when formulating rations for lactating and pregnant cows. Furthermore, breeding, replacement, and management strategies should also take into account whether the individuals were conceived during maternal lactation because, otherwise, their performance may deviate from what could be expected.
doi:10.1371/journal.pone.0051816
PMCID: PMC3527476  PMID: 23284777
3.  Epigenetics: A New Challenge in the Post-Genomic Era of Livestock 
Frontiers in Genetics  2012;2:106.
doi:10.3389/fgene.2011.00106
PMCID: PMC3270332  PMID: 22303400
4.  Genome-wide prediction of discrete traits using bayesian regressions and machine learning 
Background
Genomic selection has gained much attention and the main goal is to increase the predictive accuracy and the genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates) small n (number of observations) problem have dealt only with continuous traits, but there are many important traits in livestock that are recorded in a discrete fashion (e.g. pregnancy outcome, disease resistance). It is necessary to evaluate alternatives to analyze discrete traits in a genome-wide prediction context.
Methods
This study shows two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO) and two machine learning algorithms (boosting and random forest) to analyze discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be observed records. Performances were compared based on the models' predictive ability.
Results
The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals, with a phi correlation up to 81% greater than that of Bayesian regressions. Random Forest outperformed other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data.
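The phi correlation mentioned above measures agreement between two binary classifications. A minimal sketch, assuming it is computed from a 2x2 table of predicted versus observed resistant and susceptible animals, is given below. The counts are made up for illustration and do not come from the study.

```python
import math


def phi_coefficient(tp, tn, fp, fn):
    """Phi correlation for a 2x2 table of predicted vs. observed classes.

    tp/tn: correctly classified animals in each class,
    fp/fn: misclassified animals in each direction.
    """
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column is empty")
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for 100 animals.
phi = phi_coefficient(tp=40, tn=45, fp=5, fn=10)
print(f"phi = {phi:.3f}")
```

Phi ranges from -1 to 1, with 0 corresponding to classification no better than chance and 1 to perfect agreement.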
Conclusions
The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. Among the different alternatives proposed to analyze discrete traits, machine learning showed some advantages over Bayesian regressions. Boosting with a pseudo-Huber loss function showed high accuracy, whereas Random Forest produced more consistent results and an interesting predictive ability. Nonetheless, the best method may be case-dependent, and an initial evaluation of different methods is recommended when dealing with a particular problem.
doi:10.1186/1297-9686-43-7
PMCID: PMC3400433  PMID: 21329522
5.  Detecting single-nucleotide polymorphism by single-nucleotide polymorphism interactions in rheumatoid arthritis using a two-step approach with machine learning and a Bayesian threshold least absolute shrinkage and selection operator (LASSO) model 
BMC Proceedings  2009;3(Suppl 7):S63.
The objective of this study was to detect interactions between relevant single-nucleotide polymorphisms (SNPs) associated with rheumatoid arthritis (RA). Data from Problem 1 of the Genetic Analysis Workshop 16 were used. These data consisted of 868 cases and 1,194 controls genotyped with the 500 k Illumina chip. First, machine learning methods were applied for preselecting SNPs. One hundred SNPs outside the HLA region and 1,500 SNPs in the HLA region were preselected using information-gain theory. The software Weka was used to reduce collinearity and redundancy in the HLA region, resulting in a subset of 6 SNPs out of 1,500. In a second step, a parametric approach to account for interactions between SNPs in the HLA region, as well as HLA-nonHLA interactions, was conducted using a Bayesian threshold least absolute shrinkage and selection operator (LASSO) model incorporating 2,560 covariates. This approach detected some main and interaction effects for SNPs in genes that have previously been associated with RA (e.g., rs2395175, rs660895, rs10484560, and rs2476601). Further, some other SNPs detected in this study may be considered in candidate gene studies.
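The abstract does not state the exact information-gain criterion used for preselecting SNPs. A minimal sketch of the standard definition, the reduction in entropy of case/control status after conditioning on a SNP genotype, is shown below. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n)
        for count in Counter(labels).values()
    )


def information_gain(genotypes, status):
    """Entropy of status minus its expected entropy given the genotype."""
    n = len(status)
    conditional = 0.0
    for g in set(genotypes):
        subset = [s for gt, s in zip(genotypes, status) if gt == g]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(status) - conditional


# Toy data: 1 = case, 0 = control; genotypes coded 0/1/2.
status = [1, 1, 1, 1, 0, 0, 0, 0]
snp_a = [2, 2, 2, 2, 0, 0, 0, 0]  # separates cases from controls
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to status

gain_a = information_gain(snp_a, status)
gain_b = information_gain(snp_b, status)
print(f"gain(snp_a) = {gain_a:.2f}, gain(snp_b) = {gain_b:.2f}")
```

Ranking SNPs by this quantity and keeping the top-scoring ones is one plausible way to carry out such a preselection step; it scores each SNP on its own and does not capture interactions.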
PMCID: PMC2795964  PMID: 20018057
6.  Genome-assisted prediction of a quantitative trait measured in parents and progeny: application to food conversion rate in chickens 
Accuracy of prediction of yet-to-be observed phenotypes for food conversion rate (FCR) in broilers was studied in a genome-assisted selection context. Data consisted of FCR measured on the progeny of 394 sires with SNP information. A Bayesian regression model (Bayes A) and a semi-parametric approach (reproducing kernel Hilbert spaces regression, RKHS) using all available SNPs (p = 3481) were compared with a standard linear model in which future performance was predicted using pedigree indexes in the absence of genomic data. The RKHS regression was also tested on several sets of pre-selected SNPs (p = 400) using alternative measures of the information gain provided by the SNPs. All analyses were performed using 333 genotyped sires as the training set, and predictions were made on 61 birds as the testing set, which were sons of sires in the training set. Accuracy of prediction was measured as the Spearman correlation (r̄_S) between observed and predicted phenotype, with its confidence interval assessed through a bootstrap approach. A large improvement of genome-assisted prediction (up to an almost 4-fold increase in accuracy) was found relative to pedigree index. Bayes A and RKHS regression were equally accurate (r̄_S = 0.27) when all 3481 SNPs were included in the model. However, RKHS with 400 pre-selected informative SNPs was more accurate than Bayes A with all SNPs.
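The accuracy measure described above can be illustrated with a short sketch. It assumes a percentile bootstrap over pairs of observed and predicted values, which is only one possible reading of "a bootstrap approach"; the helper names, the number of resamples and the toy data are all hypothetical.

```python
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation, or None if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the Spearman correlation."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        r = spearman([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples
            stats.append(r)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper


# Toy observed and predicted phenotypes for eight birds.
observed = [1.8, 2.1, 1.9, 2.4, 2.0, 2.3, 1.7, 2.2]
predicted = [1.9, 2.0, 2.0, 2.2, 1.9, 2.3, 1.8, 2.1]
r_s = spearman(observed, predicted)
ci_low, ci_high = bootstrap_ci(observed, predicted)
print(f"r_S = {r_s:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With a testing set as small as 61 birds, such an interval would be expected to be wide, which is why reporting it alongside the point estimate is informative.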
doi:10.1186/1297-9686-41-3
PMCID: PMC3225922  PMID: 19284693
7.  Genetic relationship of discrete-time survival with fertility and production in dairy cattle using bivariate models 
Bivariate analyses of functional longevity in dairy cattle, measured as survival to next lactation (SURV), with milk yield and fertility traits were carried out. A sequential threshold-linear censored model was implemented for the analyses of SURV. Records on 96 642 lactations from 41 170 cows were used to estimate genetic parameters, using animal models, for longevity, 305 d-standardized milk production (MY305), days open (DO) and number of inseminations to conception (INS) in the Spanish Holstein population; 31% and 30% of lactations were censored for DO and INS, respectively. Heritability estimates for SURV and MY305 were 0.11 and 0.27, respectively, while heritability estimates for fertility traits were lower (0.07 for DO and 0.03 for INS). Antagonistic genetic correlations were estimated between SURV and fertility (-0.78 and -0.54 for DO and INS, respectively) or production (-0.53 for MY305), suggesting reduced functional longevity with impaired fertility and increased milk production. Longer days open seem to affect survival more than increased INS. Also, high-producing cows were more problematic, less functional and more liable to being culled. The results suggest that the sequential threshold model, owing to its flexibility, might be considered for evaluating the genetic relationship between discrete-time survival and other traits.
doi:10.1186/1297-9686-39-4-391
PMCID: PMC2682818  PMID: 17612479
discrete-time survival; fertility; production; sequential threshold model
