Results 1-25 (166)

1.  Mapping a gene for rheumatoid arthritis on chromosome 18q21 
BMC Proceedings  2007;1(Suppl 1):S18.
Although single chi-square analysis of the North American Rheumatoid Arthritis Consortium (NARAC) data identifies many single-nucleotide polymorphisms (SNPs) with p-values less than 0.05, none remain significant after Bonferroni correction. In contrast, CHROMSCAN evades heavy Bonferroni correction and auto-correlation between SNPs by using composite likelihood to model association across all markers in a region and permutation to assess significance. Analysis by CHROMSCAN identifies a 36-kb interval that includes the most significant SNP (msSNP) observed in a 10-Mb target suggested by linkage. Unexpectedly, stratification by gender and age of onset shows that association evidence comes almost entirely from females with age of onset less than 40. Combining evidence from a meta-analysis of linkage studies and three subsets of the NARAC data provides significant evidence for a determinant of rheumatoid arthritis in a 36-kb interval and illustrates the principle that estimates of location and its information are more powerful than estimates of p-values alone.
PMCID: PMC2367616  PMID: 18466514
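The contrast drawn in entry 1 between Bonferroni correction and permutation-based significance can be illustrated with a minimal sketch. This is not CHROMSCAN's composite-likelihood model; the function names and the add-one permutation estimator are assumptions chosen for illustration only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def permutation_p_value(observed, null_stats):
    """Empirical p-value of an observed statistic against permuted statistics.

    Uses the common add-one estimator so the p-value is never exactly zero
    (an assumed convention, not taken from the abstract).
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


if __name__ == "__main__":
    # With many correlated SNPs the Bonferroni threshold becomes very strict.
    print(bonferroni_threshold(0.05, 1000))
    # A region-wide statistic is compared with its permutation distribution instead.
    print(permutation_p_value(3.0, [1.0, 2.0, 3.0, 4.0]))
```

Because the permutation distribution is built from the data themselves, correlation between neighbouring SNPs is reflected in the null distribution rather than being penalized test by test.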
2.  Normalizing a large number of quantitative traits using empirical normal quantile transformation 
BMC Proceedings  2007;1(Suppl 1):S156.
Variance-components and regression-based methods are frequently used to map quantitative trait loci. The normality of the trait values is usually assumed and violation of this assumption can have a detrimental effect on the power and type I error of such analyses. Various transformations can be used, but appropriate transformations usually require careful analysis of individual traits, which is not feasible for data sets with a large number of traits like those in Problem 1 of Genetic Analysis Workshop 15 (GAW15). A semiparametric variance-components method can estimate the transformation along with the model parameters, but existing methods are computationally intensive. In this paper, we propose the use of empirical normal quantile transformation to normalize the scaled rank of trait values using an inverse normal transformation. Despite its simplicity and potential loss of information, this transformation is shown, by extensive simulations, to have good control of power and type I error, even when compared with the semiparametric method. To investigate the impact of such a transformation on real data sets, we apply variance-components and variance-regression methods to the expression data of GAW15 and compare the results before and after transformation.
PMCID: PMC2367615  PMID: 18466501
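The empirical normal quantile transformation described in entry 2 (ranking the trait values, scaling the ranks, and applying the inverse normal distribution function) can be sketched as follows. The rank offset of 0.5 and the average-rank handling of ties are assumptions; the abstract does not state which scaling the authors used.

```python
from statistics import NormalDist


def empirical_normal_quantile(values, offset=0.5):
    """Map trait values to standard-normal quantiles of their scaled ranks.

    Tied values receive the average of their ranks. `offset` is an assumed
    scaling choice: a 1-based rank r maps to the probability (r - offset) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


if __name__ == "__main__":
    skewed_trait = [0.1, 0.2, 0.4, 1.5, 30.0]
    print(empirical_normal_quantile(skewed_trait))
```

The transformation keeps only the ordering of the trait values, which is the "potential loss of information" the abstract mentions; in exchange, the transformed values follow an approximately normal distribution regardless of the original skewness.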
3.  Comparison of measures for haplotype similarity 
BMC Proceedings  2007;1(Suppl 1):S128.
Measuring the association of haplotype similarities with phenotype similarities has been used to develop statistical tests of genetic association. Previously, we applied the general approach of Mantel statistics to correlate genetic and phenotype similarity, where genetic similarity was defined by the number of intervals flanked by markers identical by state for pairs of haplotypes. Here we investigated in the case-control study design the effect on power of the Mantel statistics for five different measures of genetic similarity based on haplotypes: 1) the number of shared intervals, 2) the physical length of the shared intervals, 3) the genetic length of the shared intervals in centimorgans, 4) the genetic length of the shared intervals in linkage disequilibrium units (LDU) and 5) Yu's measure that attaches more weight to the sharing of rare than common alleles. With prior knowledge of the answers of Genetic Analysis Workshop 15 Problem 3, we analyzed the simulated data sets in two genomic regions surrounding the disease loci on chromosomes 6 and 18. For the dense map on chromosome 6, all methods showed a very high power of comparable magnitude. For chromosome 18, we observed a power between 19% and 99% at the pointwise 5% significance level using 1000 cases and 1000 controls for all methods except Yu's measure. While it yielded a much lower power, Yu's measure had 80% power around the disease locus.
PMCID: PMC2367614  PMID: 18466470
4.  Statistical corrections of linkage data suggest predominantly cis regulations of gene expression 
BMC Proceedings  2007;1(Suppl 1):S145.
Morley et al. (Nature 2004, 430:743–747) detected significant linkages to the expression levels of 142 genes (of 3554) at a reported threshold of genome-wide p = 0.001 (LOD ≈ 5.3), using 14 three-generation Centre d'Etude du Polymorphisme Humain pedigrees. Most of the linkages (77%) were trans, i.e., more than 5 Mb from the expressed gene. However, the analysis did not account for the expected anti-conservative effect of the skewed distribution of score- or regression-based statistics in large sibships, or for the possible variance distortion due to correlations among tests. Therefore, we re-analyzed their data, using a robust score statistic for the entire pedigrees and correcting the p-values for skewness. We found that a LOD of 5.3 had a skewness-corrected genome-wide p-value of 0.016 instead of 0.001 (a result that we confirmed using simulation), with around 50 expected false positives. We then further corrected for correlation among the (skew-corrected) p-values by using Efron's method for obtaining the empirical null distribution. Setting a threshold of FDR = 10% (Z = 6.4, LOD = 8.9), we detected linkage for the expression levels of 22 genes, 19 of which are cis. Limiting the analysis to cis regions, linkage was detected to the expression levels of 46 genes with 4.6 expected false positives (FDR = 10%).
PMCID: PMC2367613  PMID: 18466489
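The false-positive figures quoted in entry 4 can be checked with simple arithmetic. The sketch below assumes that every one of the 3554 expression traits is null, so that the expected number of false positives is the number of traits multiplied by the genome-wide p-value; the function names are illustrative.

```python
def expected_false_positives(n_traits, genome_wide_p):
    """Expected number of null traits exceeding the threshold."""
    return n_traits * genome_wide_p


def false_discovery_rate(expected_false, n_detected):
    """Expected proportion of detections that are false."""
    return expected_false / n_detected


if __name__ == "__main__":
    # Nominal threshold reported by Morley et al. versus the skewness-corrected value.
    print(expected_false_positives(3554, 0.001))
    print(expected_false_positives(3554, 0.016))
    # Cis-only analysis: 46 detections with 4.6 expected false positives.
    print(false_discovery_rate(4.6, 46))
```

Under this assumption the corrected p-value of 0.016 gives roughly 57 expected false positives, in line with the "around 50" stated in the abstract, and 4.6 expected false positives among 46 detections corresponds to the stated FDR of 10%.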
5.  Functional group-based linkage analysis of gene expression trait loci 
BMC Proceedings  2007;1(Suppl 1):S117.
We explored approaches to using multiple related traits (gene expression levels) in linkage analysis. We first grouped mRNA transcripts according to their functions annotated in biological process of gene ontology (GO). We then compared using sample average, principal-components analysis (PCA), and linear discriminant analysis (LDA) to derive a univariate composite trait. Our results showed that PCA generally yielded stronger evidence for linkage, though the LDA component had the highest heritability. We also developed an algorithm to search for clusters of linkage peaks from multiple traits in the same group and a heuristic method for calculating a p-value to evaluate the linkage peak clustering. Future research is needed to develop rigorous methods for mapping genes affecting the expression of a group of transcripts.
PMCID: PMC2367612  PMID: 18466458
6.  Normalization of microarray expression data using within-pedigree pool and its effect on linkage analysis 
BMC Proceedings  2007;1(Suppl 1):S152.
"Genetical genomics", the study of natural genetic variation combining data from genetic marker-based studies with gene expression analyses, has exploded with the recent development of advanced microarray technologies. To account for systematic variation known to exist in microarray data, it is critical to properly normalize gene expression traits before performing genetic linkage analyses. However, imposing equal means and variances across pedigrees can over-correct for the true biological variation by ignoring familial correlations in expression values. We applied the robust multiarray average (RMA) method to gene expression trait data from 14 Centre d'Etude du Polymorphisme Humain (CEPH) Utah pedigrees provided by GAW15 (Genetic Analysis Workshop 15). We compared the RMA normalization method using within-pedigree pools to RMA normalization using all individuals in a single pool, which ignores pedigree membership, and investigated the effects of these different methods on 18 gene expression traits previously found to be linked to regions containing the corresponding structural locus. Familial correlation coefficients of the expressed traits were stronger when traits were normalized within pedigrees. Surprisingly, the linkage plots for these traits were similar, suggesting that although heritability increases when traits are normalized within pedigrees, the strength of linkage evidence does not necessarily change substantially.
PMCID: PMC2367611  PMID: 18466497
7.  Joint linkage and association analysis for identification of potentially causal polymorphisms in GAW15 data 
BMC Proceedings  2007;1(Suppl 1):S36.
In a small chromosomal region, a number of polymorphisms may be both linked to and associated with a disease. Potentially directly associated causal loci may be distinguished from indirectly associated loci by determining which associations can explain the observed linkage signal. We apply methods for testing whether association with a particular polymorphism or haplotype can explain an observed linkage signal to the Genetic Analysis Workshop 15 simulated (Problem 3) data, to try to identify potentially causal polymorphisms. We compare the power of several methods for testing the null hypothesis that association with a particular variant can explain the observed linkage signal, and discuss scenarios under which the various methods may be advantageous.
PMCID: PMC2367610  PMID: 18466534
8.  Two-stage approach for identifying single-nucleotide polymorphisms associated with rheumatoid arthritis using random forests and Bayesian networks 
BMC Proceedings  2007;1(Suppl 1):S56.
We used the simulated data set from Genetic Analysis Workshop 15 Problem 3 to assess a two-stage approach for identifying single-nucleotide polymorphisms (SNPs) associated with rheumatoid arthritis (RA). In the first stage, we used random forests (RF) to screen large amounts of genetic data using the variable importance measure, which takes into account SNP interaction effects as well as main effects without requiring model specification. We used the simulated 9187 SNPs mimicking a 10 K SNP chip, along with covariates DR (the simulated DRB1 genotype), smoking, and sex as input to the RF analyses with a training set consisting of 750 unrelated RA cases and 750 controls. We used an iterative RF screening procedure to identify a smaller set of variables for further analysis. In the second stage, we used the software program CaMML for producing Bayesian networks, and developed complex etiologic models for RA risk using the variables identified by our RF screening procedure. We evaluated the performance of this method using independent test data sets for up to 100 replicates.
PMCID: PMC2367609  PMID: 18466556
9.  A new score statistic to test for association given linkage in affected sibling pair-control designs 
BMC Proceedings  2007;1(Suppl 1):S39.
To detect association of the DR1 allele with rheumatoid arthritis (RA) given linkage in the affected sibling pairs of the replicates of Problem 3 of Genetic Analysis Workshop 15 (GAW15), we propose a new score statistic that takes into account the linkage information. We knew the answers. Linkage studies are often followed by case-control association studies of candidate genes located under the peak to identify the causes of a linkage peak. One strategy is to type the affected sibling pairs from the original linkage study and a set of unrelated controls for single-nucleotide polymorphisms describing the genetic variation of these genes. For this affected sibling pair-control design, we propose a relative-risk model for the relationship between the disease outcomes of sibling pairs and their genotypes and identity-by-descent status at the locus of interest. From this model, we derive a score statistic to analyze genetic association given linkage. We compare the performance of the new statistic to the method of Li et al. and to a standard association analysis that neglects the information on the identity-by-descent status of the sibling pair. We conclude that for the GAW15 data the new method performs well and that methods that use the linkage information may be more efficient than standard comparisons of genotypes in cases and controls.
PMCID: PMC2367608  PMID: 18466537
10.  Detecting disease-causing genes by LASSO-Patternsearch algorithm 
BMC Proceedings  2007;1(Suppl 1):S60.
The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data sets for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data generation model included nine unobserved trait loci, most of which have one or more of the generated SNPs associated with them. These data sets provide an ideal experimental test bed for evaluating new and old algorithms for selecting SNPs and covariates that can separate cases from controls, because the cases and controls are known as well as the identities of the trait loci. LASSO-Patternsearch is a new multi-step algorithm with a LASSO-type penalized likelihood method at its core specifically designed to detect and model interactions between important predictor variables. In this article the original LASSO-Patternsearch algorithm is modified to handle the large number of SNPs plus covariates. We start with a screen step within the framework of parametric logistic regression. The patterns that survived the screen step were further selected by a penalized logistic regression with the LASSO penalty. Finally, a parametric logistic regression model was built on the patterns that survived the LASSO step. In our analysis of Genetic Analysis Workshop 15 Problem 3 data we have identified most of the associated SNPs and relevant covariates. Upon using the model as a classifier, very competitive error rates were obtained.
PMCID: PMC2367607  PMID: 18466561
11.  Exploration of non-hierarchical classification methods combined with linkage analysis to identify loci influencing clusters of co-regulated transcripts 
BMC Proceedings  2007;1(Suppl 1):S48.
Extensive studies have been performed to analyze variation in gene expression data by using multistage approaches, including a combination of microarray and linkage analysis. Such a method was recently used in the analysis of normal variation in gene expression by Cheung et al. (Nat. Genet. 2003, 33: 422–425) and Morley et al. (Nature 2004, 430: 743–747). Using these data, we also explored a multistage method by first performing non-hierarchical clustering for 3554 genes, which identified 114 clusters with number of genes ranging from 2 to 113. Heritabilities of the first principal component of each cluster were then estimated and 29 highly heritable clusters (i.e., h² > 0.35) were further analyzed using variance components linkage analysis. The highest LOD score was observed on chromosome 1 (LOD = 5.36, 111.71 cM) for a cluster containing two genes [glutathione S-transferase M1 (GSTM1) and glutathione S-transferase M2 (GSTM2)] that are both located on chromosome 1p13.3. These results show that cluster analysis followed by linkage analysis is another useful approach to identify chromosomal locations for genes affecting expression levels of multiple transcripts.
PMCID: PMC2367606  PMID: 18466547
12.  A two-stage classification approach identifies seven susceptibility genes for a simulated complex disease 
BMC Proceedings  2007;1(Suppl 1):S30.
The simulated data set of the Genetic Analysis Workshop 15 provided affection status, four quantitative traits, and a covariate. After studying the relationship between these variables, linkage analysis was undertaken. Analyses were performed in the first replicate only and without any prior knowledge of the underlying model. In addition to the main effect of the DR locus on chromosome 6, significant linkage was also identified on chromosomes 8, 9, 11, and 18. Notably, the power to detect linkage increased after transforming the skewed and kurtotic IgM and anti-CCP distributions. Moreover, genes on chromosome 11 could not be discerned from noise without the transformation, thus highlighting the need in real life situations for careful examination of the phenotypic data prior to genetic analysis. Significant association with one single-nucleotide polymorphism was identified for the regions on chromosome 11 and 18. Haplotype analyses were attempted for the other regions, but only the underlying variation of the DR locus could be identified. Two methods were then applied to predict classification using the factors identified so far. These methods – logistic regression and multifactor dimensionality reduction (MDR) – performed comparably for this data set. Those affected individuals that were misclassified as unaffected were then used in a genome-wide association analysis to identify additional susceptibility loci. Two additional loci were identified in this fashion, illustrating the usefulness of this two-stage classification approach.
PMCID: PMC2367605  PMID: 18466528
13.  Linkage studies of catechol-O-methyltransferase (COMT) and dopamine-beta-hydroxylase (DBH) cDNA expression levels 
BMC Proceedings  2007;1(Suppl 1):S95.
The COMT and DBH genes are physically located at chromosomes 22q11 and 9q34, respectively, and both COMT and DBH are involved in catecholamine metabolism and are strong candidates for certain psychiatric and neurological disorders. Although the genetic determinants for both enzymes' activities have been widely studied, their genetic involvement on gene mRNA expression levels remains unclear. In this study we performed quantitative linkage analysis of COMT and DBH cDNA expression levels, identifying transcriptional regulatory regions for both genes. Multiple Haseman-Elston regression was used to detect both additive and interactive effects between two loci. We found that the master transcriptional regulatory region 20q13 had an additive effect on the COMT expression level. We also found that chromosome 19p13 showed both additive and interactive effects with 9q34 on DBH expression level. Furthermore, a potential interaction between COMT and DBH was indicated.
PMCID: PMC2367604  PMID: 18466599
14.  Dealing with missing phase and missing data in phylogeny-based analysis 
BMC Proceedings  2007;1(Suppl 1):S22.
We recently described a new method to identify disease susceptibility loci, based on the analysis of the evolutionary relationships between haplotypes of cases and controls. However, haplotypes are often unknown and the problem of phase inference is even more crucial when there are missing data. In this work, we suggest using a multiple imputation algorithm to deal with missing phase and missing data, prior to a phylogeny-based analysis. We used the simulated data of Genetic Analysis Workshop 15 (Problem 3, answer known) to assess the power of the phylogeny-based analysis to detect disease susceptibility loci after reconstruction of haplotypes by a multiple-imputation method. We compare, for various rates of missing data, the performance of the multiple imputation method with the performance achieved when considering only the most probable haplotypic configurations or the true phase. When only the phase is unknown, all methods perform approximately the same to identify disease susceptibility sites. In the presence of missing data however, the detection of disease susceptibility sites is significantly better when reconstructing haplotypes by multiple imputation than when considering only the best haplotype configurations.
PMCID: PMC2367603  PMID: 18466519
15.  Impact of gene expression data pre-processing on expression quantitative trait locus mapping 
BMC Proceedings  2007;1(Suppl 1):S153.
We evaluate the impact of three pre-processing methods for Affymetrix microarray data on expression quantitative trait locus (eQTL) mapping, using 14 CEPH Utah families (GAW Problem 1 data). Different sets of expression traits were chosen according to different selection criteria: expression level, variance, and heritability. For each gene, three expression phenotypes were obtained by different pre-processing methods. Each quantitative phenotype was then submitted to a whole-genome scan, using multipoint variance component LODs. Pre-processing methods were compared with respect to their linkage outcomes (number of linkage signals with LODs greater than 3, consistencies in the location of the trait-specific linkage signals, and type of cis/trans-regulating loci). Overall, we found little agreement between linkage results from the different pre-processing methods: most of the linkage signals were specific to one pre-processing method. However, agreement rates varied according to the criteria used to select the traits. For instance, these rates were higher in the set of the most heritable traits. On the other hand, the pre-processing method had little impact on the relative proportion of detected cis and trans-regulating loci. Interestingly, although the number of detected cis-regulating loci was relatively small, pre-processing methods agreed much better in this set of linkage signals than in the trans-regulating loci. Several potential factors explaining the discordance observed between the methods are discussed.
PMCID: PMC2367602  PMID: 18466498
16.  Extracting disease risk profiles from expression data for linkage analysis: application to prostate cancer 
BMC Proceedings  2007;1(Suppl 1):S82.
The genetic factors underlying many complex traits are not well understood. The Genetic Analysis Workshop 15 Problem 1 data present the opportunity to explore whether gene expression data from microarrays can be utilized to define useful phenotypes for linkage analysis in complex diseases. We utilize expression profiles for multiple genes that have been associated with a disease to develop a composite 'risk profile' that can be used to map other loci involved in the same disease process. Using prostate cancer as our disease of interest, we identified 26 genes whose expression levels had previously been associated with prostate cancer and defined three phenotypes: high, neutral, or low risk profiles, based on individual expression levels. Linkage analyses using MCLINK, a Markov-chain Monte Carlo method, and MERLIN were performed for all three phenotypes. Both methods were in very close agreement. Genome-wide suggestive linkage evidence was observed on chromosomes 6 and 4. It was interesting to note that the linkage signals did not appear to be strongly influenced by the location of the original 26 genes used in the phenotype definition, indicating that composite measures may have potential to locate additional genes in the same process. In this example, however, extreme caution is necessary in any extrapolation of the identified loci to prostate cancer due to the lack of data regarding the behavior of these genes' expression level in lymphoblastoid cells. Our results do indicate there exists potential to augment our current knowledge about the relationships among genes associated with complex diseases using expression data.
PMCID: PMC2367601  PMID: 18466585
17.  Application of an iterative Bayesian variable selection method in a genome-wide association study of rheumatoid arthritis 
BMC Proceedings  2007;1(Suppl 1):S109.
Genome-wide association studies usually involve several hundred thousand single-nucleotide polymorphisms (SNPs). Conventional approaches face challenges when there is an enormous number of SNPs but a relatively small number of samples and, in some cases, are not feasible. We introduce here an iterative Bayesian variable selection method that provides a unique tool for association studies with a large number of SNPs (p) but a relatively small sample size (n). We applied this method to the simulated case-control sample provided by the Genetic Analysis Workshop 15 and compared its performance with a stepwise variable selection method. We demonstrated that the results of iterative Bayesian variable selection applied when p » n are comparable to those of stepwise variable selection applied when n » p. When n > p, iterative Bayesian variable selection performs better than stepwise variable selection.
PMCID: PMC2367600  PMID: 18466449
18.  Searching for master regulators of transcription in a human gene expression data set 
BMC Proceedings  2007;1(Suppl 1):S81.
Microarray technologies allow the measurement of the expression levels of thousands of transcripts at the same time. As part of Genetic Analysis Workshop 15 (GAW15), we analyzed a data set that measured the expression of more than 3000 genes in 14 families. Our goal was to identify genomic regions that regulate the expression of several genes at the same time. We tried two different approaches: one was maximum likelihood-based variance-component linkage analysis and the other was a new linkage regression approach. We detected some loci that were linked with the expression level of more genes than would be expected by chance. These loci are candidates for master regulators of transcription (MRT). Finally, for each candidate MRT, we did a gene ontology (GO) analysis to test whether the genes linked to it were biologically related.
PMCID: PMC2367599  PMID: 18466584
19.  Comparing strategies for evaluation of candidate genes in case-control studies using family data 
BMC Proceedings  2007;1(Suppl 1):S31.
The goal of this analysis is to compare different test strategies for genetic association in case-control studies using related individuals. The first test is the trend test that is corrected for related individuals on the basis of identity-by-descent information. The second approach is to use generalized estimating equations to adjust for the correlation between relatives, and the third is the multiple outputation method. We compare the power of these test strategies in a simulation study, and apply these methods to a candidate gene dataset of Genetic Analysis Workshop 15 from the North American Rheumatoid Arthritis Consortium.
PMCID: PMC2367598  PMID: 18466529
20.  Comparison of the power of haplotype-based versus single- and multilocus association methods for gene × environment (gene × sex) interactions and application to gene × smoking and gene × sex interactions in rheumatoid arthritis 
BMC Proceedings  2007;1(Suppl 1):S73.
Accounting for interactions with environmental factors in association studies may improve the power to detect genetic effects and may help identify important environmental effect modifiers. The power of unphased genotype-based versus haplotype-based methods in regions with high linkage disequilibrium (LD), as measured by D', for analyzing gene × environment (gene × sex) interactions was compared using the Genetic Analysis Workshop 15 (GAW15) simulated data on rheumatoid arthritis with prior knowledge of the answers. Stepwise and regular conditional logistic regression (CLR) were performed using a matched case-control sample for an HLA region interacting with sex. Haplotype-based analyses were performed using a haplotype-sharing-based Mantel statistic and a test for haplotype-trait association in a general linear model framework. A step-down minP algorithm was applied to derive adjusted p-values and to allow for power comparisons. These methods were also applied to the GAW15 real data set for PTPN22.
For markers in strong LD, stepwise CLR performed poorly because of the correlation/collinearity between the predictors in the model. The power was high for detecting genetic main effects using simple CLR models and haplotype-based methods and for detecting joint effects using CLR and Mantel statistics. Only the haplotype-trait association test had high power to detect the gene × sex interaction.
In the PTPN22 region with markers characterized by strong LD, all methods indicated a significant genotype × sex interaction in a sample of about 1000 subjects. The previously reported R620W single-nucleotide polymorphism was identified using logistic regression, but the haplotype-based methods did not provide any precise location information.
PMCID: PMC2367597  PMID: 18466575
21.  Impact of marker density on the accuracy of association mapping 
BMC Proceedings  2007;1(Suppl 1):S166.
We studied the impact of marker density on the accuracy of association mapping using Genetic Analysis Workshop 15 simulated dense single-nucleotide polymorphism (SNP) data on chromosome 6. A total of 1500 cases and 2000 unaffected controls genotyped for 17,820 SNPs were analyzed. We applied the approach that combines information from multiple SNPs under the framework of the Malecot model and composite likelihood to non-overlapping regions of the chromosome. We successfully detected the associations with disease Loci C and D, with predicted locations at zero distance from Locus C when it was "typed" and 112 kb from the untyped rare Locus D. Reducing marker density decreased the accuracy of location estimates. However, the predicted locations were robust to variations in the number of SNPs. Generally, the linkage disequilibrium (LD) map reflecting distances between markers in relation to LD produced higher accuracy than the physical map. We also demonstrated that SNP selection based on equal LD distance outperforms that based on equal physical distance or SNP tagging. Furthermore, ignoring rare SNPs diminished the ability to detect rare causal variants.
PMCID: PMC2367596  PMID: 18466512
22.  Genetic heterogeneity and trans regulators of gene expression 
BMC Proceedings  2007;1(Suppl 1):S80.
Heterogeneity poses a challenge to linkage mapping. Here, we apply a latent class extension of Haseman-Elston regression to expression phenotypes with significant evidence of linkage to trans regulators in 14 large pedigrees. We test for linkage, accounting for heterogeneity, and classify individual families as "linked" and "unlinked" on the basis of their contribution to the overall evidence of linkage.
PMCID: PMC2367595  PMID: 18466583
23.  Comparison of genome-wide single-nucleotide polymorphism linkage analyses in Caucasian and Hispanic NARAC families 
BMC Proceedings  2007;1(Suppl 1):S97.
We performed linkage analysis on families with rheumatoid arthritis, stratifying by ethnic origin. We compared results using either Kong and Cox nonparametric LOD scores or MOD score analysis using the software GeneHunter MODSCORE. We first applied SNPLINK to remove markers showing excess linkage disequilibrium from the SNPs in the Illumina IV SNP Linkage panel. In this analysis there were 659 self-reported Caucasian families and 29 self-reported Hispanic families in the NARAC collection. Chromosome 19 yielded MOD scores > 3.00 in the Hispanic group, while chromosomes 2, 6, 7, 11, and XY had MOD scores > 3.00 in the Caucasian group. We performed simulation studies to evaluate the empirical distribution of the MOD score for autosomal loci separately in Hispanics and Caucasians. Results showed genome-wide significant evidence for linkage in Caucasians for chromosomes 2q and 6p, but no significant evidence for any linkages in the Hispanics, including little evidence for linkage to chromosome 6p in this group. An examination of phenotype differences between the two ethnic groups suggested a significantly earlier mean age of onset, a higher percentage of anti-cyclic citrullinated peptide-positive individuals, and a lower percentage of affected individuals carrying shared epitopes in Hispanics than in Caucasians. A larger sample size of the Hispanic group is needed to identify linkage regions.
PMCID: PMC2367594  PMID: 18466601
24.  Mapping of trans-acting regulatory factors from microarray data 
BMC Proceedings  2007;1(Suppl 1):S155.
To explore the mapping of factors regulating gene expression, we have carried out linkage studies using expression data from individual transcripts (from Affymetrix microarrays; Genetic Analysis Workshop 15 Problem 1) and composite data on correlated groups of transcripts. Quality measures for the arrays were used to remove outliers, and arrays with sex mismatches were also removed. Data likely to represent noise were removed by setting a minimum threshold of present calls among the non-redundant set of 190 arrays. SOLAR was used for genetic analysis, with MAS5 signal as the measure of expression. Probe sets with larger CVs generated more linkages (LOD > 2.0). While trans linkages predominated, linkages with the largest LOD scores (>4) were mostly cis. Hierarchical clustering was used to generate correlated groups of genes. We tested four composite measures of expression for the clusters. The average signal, average normalized signal, and the first principal component of the data behaved similarly; in 8/19 clusters tested, the composite measures linked to a region to which some individual probe sets within the cluster also linked. The second principal component only produced one linkage with LOD > 2. One cluster based upon chromosomal location, containing histone genes, linked to two trans regions. This work demonstrates that composite measures for genes with correlated expression can be used to identify loci that affect multiple co-expressed genes.
PMCID: PMC2367593  PMID: 18466500
25.  Incorporating quantitative variables into linkage analysis using affected sib pairs 
BMC Proceedings  2007;1(Suppl 1):S98.
Rheumatoid arthritis is a complex disease in which environmental factors interact with genetic factors that influence susceptibility. Incorporating information about related quantitative traits or environmental factors into linkage mapping could therefore greatly improve the efficiency and precision of identifying the disease locus. Using a multipoint linkage approach that allows the incorporation of quantitative variables into multipoint linkage mapping based on affected sib pairs, we incorporated data on anti-cyclic citrullinated peptide antibodies, immunoglobulin M rheumatoid factor and age at onset into genome-wide linkage scans. The strongest evidence of linkage was observed on chromosome 6p with a p-value of 3.8 × 10⁻¹⁵ for the genetic effect. The trait locus is estimated at approximately 45.51–45.82 cM, with standard errors of the estimates ranging from 0.82 to 1.26 cM, depending on whether and which quantitative variable is incorporated. The standard error of the estimate of the trait locus decreased by about 28% to 35% after incorporating the additional information from the quantitative variables. This mapping technique helps to narrow down the regions of interest when searching for a susceptibility locus and to elucidate underlying disease mechanisms.
PMCID: PMC2367592  PMID: 18466602
