1.  Utilizing Graph Theory to Select the Largest Set of Unrelated Individuals for Genetic Analysis 
Genetic epidemiology  2012;37(2):136-141.
Many statistical analyses of genetic data rely on the assumption of independence among samples. Consequently, relatedness is either modeled in the analysis or samples are removed to “clean” the data of any pairwise relatedness above a tolerated threshold. Current methods do not maximize the number of unrelated individuals retained for further analysis, which is a needless loss of resources. We report a novel application of graph theory that identifies the maximum set of unrelated samples in any dataset, given a user-defined threshold of relatedness, as well as all networks of related samples. We have implemented this method in an open-source program called Pedigree Reconstruction and Identification of a Maximum Unrelated Set (PRIMUS). We show that PRIMUS outperforms the three existing methods, allowing researchers to retain up to 50% more unrelated samples. A unique strength of PRIMUS is its ability to weight the maximum clique selection using additional criteria (e.g., affected status and data missingness). PRIMUS is a permanent solution to identifying the maximum number of unrelated samples for a genetic analysis.
PMCID: PMC3770842  PMID: 22996348
genome-wide association study; Bron–Kerbosch; cryptic relatedness; bioinformatics; sample selection
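The PRIMUS abstract frames sample selection as a maximum clique problem, and its keywords name the Bron–Kerbosch algorithm. The idea can be sketched under stated assumptions: build a graph whose edges join pairs of samples with a kinship coefficient below the relatedness threshold, so that every clique is a set of mutually unrelated samples, then enumerate the maximal cliques and keep the largest. This is a minimal Python sketch, not the PRIMUS implementation; the function names, the kinship dictionary format, the 0.1 threshold, and the tie-breaking `weight` (a hypothetical stand-in for criteria such as affected status) are all assumptions.

```python
from itertools import combinations


def unrelated_graph(samples, kinship, threshold):
    """Adjacency sets linking every pair of samples whose kinship coefficient
    is below `threshold`; pairs at or above it are treated as related."""
    adj = {s: set() for s in samples}
    for u, v in combinations(samples, 2):
        # Pairs absent from `kinship` are assumed unrelated (kinship 0.0).
        k = kinship.get((u, v), kinship.get((v, u), 0.0))
        if k < threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bron_kerbosch(r, p, x, adj, out):
    """Bron–Kerbosch with pivoting: append every maximal clique to `out`."""
    if not p and not x:
        out.append(r)
        return
    # Pivot on the vertex with the most neighbours in P to skip redundant branches.
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def max_unrelated_set(samples, kinship, threshold, weight=None):
    """Largest clique of the unrelated graph. Ties are broken by the summed
    per-sample `weight` (hypothetical), then by sorted sample IDs."""
    adj = unrelated_graph(samples, kinship, threshold)
    cliques = []
    bron_kerbosch(frozenset(), set(samples), set(), adj, cliques)
    weight = weight or (lambda s: 0)
    return max(cliques, key=lambda c: (len(c), sum(weight(s) for s in c), sorted(c)))


if __name__ == "__main__":
    # Hypothetical toy data: A-B and B-C are first-degree pairs (kinship 0.25),
    # D-E is a second-degree pair (kinship 0.125); all other pairs are unrelated.
    samples = ["A", "B", "C", "D", "E"]
    kinship = {("A", "B"): 0.25, ("B", "C"): 0.25, ("D", "E"): 0.125}
    affected = {"E"}
    best = max_unrelated_set(samples, kinship, 0.1,
                             weight=lambda s: 1 if s in affected else 0)
    print(sorted(best))  # ['A', 'C', 'E']
```

Enumerating maximal cliques takes exponential time in the worst case, so a sketch like this is practical only for modest numbers of samples. Because the abstract notes that PRIMUS also identifies every network of related samples, one plausible refinement (an assumption here, not a description of PRIMUS) is to run the search separately within each network and keep every sample that has no relatives.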
2.  TCIRG1-associated Congenital Neutropenia
Human mutation  2014;35(7):824-827.
Severe congenital neutropenia (SCN) is a rare hematopoietic disorder with an estimated incidence of 1 in 200,000 individuals of European descent, many cases of which are inherited in an autosomal dominant pattern. Although several causal genes have been identified, the genetic basis of >30% of cases remains unknown. We report a five-generation family segregating a novel single nucleotide variant (SNV) in TCIRG1. The SNV co-segregates perfectly with congenital neutropenia in this family: all 11 affected individuals, and none of the unaffected individuals, carry this novel SNV. Western blot analysis shows reduced levels of TCIRG1 protein in affected individuals compared with healthy controls. Two unrelated patients with SCN, identified by independent investigators, are heterozygous for different rare, highly conserved coding variants in TCIRG1.
PMCID: PMC4055522  PMID: 24753205
TCIRG1; Congenital neutropenia; SCN; V-ATPase
3.  Secondary findings and carrier test frequencies in a large multiethnic sample 
Genome Medicine  2015;7(1):54.
Besides its growing importance in clinical diagnostics and in understanding the genetic basis of Mendelian and complex diseases, whole exome sequencing (WES) is a rich source of additional information of potential clinical utility for physicians, patients and their families. We analyzed the frequency and nature of single nucleotide variants (SNVs) considered secondary findings, and of recessive disease allele carrier status, in the exomes of 8554 individuals from a large, randomly sampled cohort study and 2514 patients from a study of presumed Mendelian disease, all of whom had undergone WES.
We used the same sequencing platform and data processing pipeline to analyze all samples and characterized the distributions of reported pathogenic (ClinVar, Human Gene Mutation Database (HGMD)) and predicted deleterious variants in the pre-specified American College of Medical Genetics and Genomics (ACMG) secondary findings and recessive disease genes in different ethnic groups.
In the 56 ACMG secondary findings genes, the average number of predicted deleterious variants per individual was 0.74, and the mean number of ClinVar reported pathogenic variants was 0.06. We observed an average of 10 deleterious and 0.78 ClinVar reported pathogenic variants per individual in 1423 autosomal recessive disease genes. In repeated random sampling of pairs of exomes, 0.5% of the randomly generated couples were at 25% risk of having a child affected by an autosomal recessive disorder, based on the ClinVar variants.
By investigating reported pathogenic and novel, predicted deleterious variants we estimated the lower and upper limits of the population fraction for which exome sequencing may reveal additional medically relevant information. We suggest that the observed wide range for the lower and upper limits of these frequency numbers will be gradually reduced due to improvement in classification databases and prediction algorithms.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0171-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4507324  PMID: 26195989
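The 25% figure in the secondary-findings abstract above follows from Mendelian arithmetic: if both partners are heterozygous carriers of pathogenic variants in the same autosomal recessive gene, each transmits a variant allele with probability 1/2, so each child is affected with probability 1/2 × 1/2 = 1/4. The couple-sampling procedure can be sketched in Python as below. The sketch assumes each individual is summarized as the set of recessive genes in which they carry a ClinVar pathogenic variant, treats any shared gene as conferring 25% risk, and ignores homozygous carriers; the abstract does not spell out its exact criteria, and all function names and data here are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations

# Two heterozygous carriers in the same recessive gene each transmit a
# variant allele with probability 1/2, so each child is affected with
# probability 1/2 * 1/2 = 1/4.
OFFSPRING_RISK = Fraction(1, 2) * Fraction(1, 2)


def couple_at_risk(genes_a, genes_b):
    """True when both partners carry a pathogenic variant in a shared gene."""
    return bool(genes_a & genes_b)


def fraction_at_risk(carriers, n_pairs, seed=0):
    """Monte Carlo estimate: draw random couples (two distinct individuals)
    and return the fraction that are at 25% risk."""
    rng = random.Random(seed)
    ids = list(carriers)
    hits = 0
    for _ in range(n_pairs):
        a, b = rng.sample(ids, 2)
        hits += couple_at_risk(carriers[a], carriers[b])
    return hits / n_pairs


def exact_fraction_at_risk(carriers):
    """Exact fraction over all unordered pairs, for checking the estimate."""
    pairs = list(combinations(carriers, 2))
    return sum(couple_at_risk(carriers[a], carriers[b]) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data: individual ID -> genes carrying a pathogenic variant.
    carriers = {"p1": {"CFTR"}, "p2": {"CFTR", "GJB2"}, "p3": {"GJB2"}, "p4": set()}
    print(OFFSPRING_RISK)  # 1/4
    print(exact_fraction_at_risk(carriers))
    print(fraction_at_risk(carriers, 10_000))
```

The exact all-pairs calculation is included only as a check on the sampling estimate; the two should agree closely once enough couples are drawn.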
4.  Factors that Impact Susceptibility to Fiber-Induced Health Effects 
Asbestos and related fibers are associated with a number of adverse health effects, including malignant mesothelioma (MM), an aggressive cancer that generally develops in the surface serosal cells of the pleural, pericardial, and peritoneal cavities. Although approximately 80% of individuals with MM are exposed to asbestos, fewer than 5% of asbestos workers develop MM. In addition to asbestos, other mineralogical, environmental, genetic, and possibly viral factors might contribute to MM susceptibility. Given this complex etiology of MM, understanding susceptibility to MM needs to be a priority for investigators in order to reduce exposure of those most at risk to known environmental carcinogens. In this review, the current body of literature related to fiber-associated disease susceptibility including age, sex, nutrition, genetics, asbestos, and other mineral exposure is addressed with a focus on MM, and critical areas for further study are recommended.
PMCID: PMC3118508  PMID: 21534090
5.  Genome-wide meta-analysis for severe diabetic retinopathy 
Human Molecular Genetics  2011;20(12):2472-2481.
Diabetic retinopathy is a leading cause of blindness. The purpose of this study is to identify novel genetic loci associated with the sight-threatening complications of diabetic retinopathy. We performed a meta-analysis of genome-wide association data for severe diabetic retinopathy, defined as diabetic macular edema or proliferative diabetic retinopathy, in unrelated cases ascertained from two large type 1 diabetic cohorts: the Genetics of Kidneys in Diabetes (GoKinD) and the Epidemiology of Diabetes Interventions and Complications (EDIC) studies. Controls were other diabetic subjects in the cohorts. A combined total of 2829 subjects (973 cases, 1856 controls) were studied on 2 543 887 single nucleotide polymorphisms (SNPs). A sub-analysis of 281 severe retinopathy cases excluded subjects with nephropathy. We also performed an association analysis of 1390 copy number variations (CNVs) using tag SNPs. No associations reached genome-wide significance after correcting for multiple testing. The meta-analysis did identify several associations that can be pursued in future replication studies, including an intergenic SNP, rs476141, on chromosome 1 (P-value 1.2 × 10⁻⁷). The most interesting signal from the CNV analysis came from the subgroup analysis excluding subjects with nephropathy: rs10521145 (P-value 3.4 × 10⁻⁶), in an intron of CCDC101, a histone acetyltransferase. This SNP tags the copy number region CNVR6685.1 on chromosome 16 at 28.5 Mb, a gain/loss site. In summary, this study nominates several novel genetic loci associated with the sight-threatening complications of diabetic retinopathy and anticipates future large-scale consortium-based validation studies.
PMCID: PMC3098732  PMID: 21441570
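The retinopathy abstract above reports that no association survived correction for multiple testing but does not say which correction was applied. Purely as an illustration, assuming a Bonferroni correction at a family-wise α of 0.05 over the 2,543,887 SNPs analyzed, the per-SNP threshold is 0.05 / 2,543,887 ≈ 1.97 × 10⁻⁸. The top meta-analysis SNP, rs476141 (P = 1.2 × 10⁻⁷), falls short of that threshold and also of the widely used 5 × 10⁻⁸ genome-wide level. The arithmetic in Python:

```python
ALPHA = 0.05          # assumed family-wise error rate (not stated in the abstract)
N_SNPS = 2_543_887    # number of SNPs analysed, as reported in the abstract

# Bonferroni: each SNP is tested at alpha divided by the number of tests.
bonferroni_threshold = ALPHA / N_SNPS

# Conventional genome-wide significance level, shown for comparison.
GENOME_WIDE = 5e-8

p_rs476141 = 1.2e-7   # top meta-analysis SNP reported in the abstract

print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"rs476141 significant (Bonferroni): {p_rs476141 < bonferroni_threshold}")
print(f"rs476141 significant (5e-8 level): {p_rs476141 < GENOME_WIDE}")
```

The CNV results are deliberately left out: they come from a separate analysis of 1390 CNVs, whose correction the abstract does not describe.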
6.  Germline BAP1 mutations predispose to malignant mesothelioma 
Nature genetics  2011;43(10):1022-1025.
Because only a small fraction of asbestos-exposed individuals develop malignant mesothelioma¹, and because mesothelioma clustering is observed in some families¹, we searched for genetic predisposing factors. We discovered germline mutations in BAP1 (BRCA1-associated protein 1) in two families with a high incidence of mesothelioma. Somatic alterations affecting BAP1 were observed in familial mesotheliomas, indicating biallelic inactivation. Besides mesothelioma, some BAP1 mutation carriers developed uveal melanoma. Germline BAP1 mutations were also found in two of 26 sporadic mesotheliomas: both patients with mutant BAP1 were previously diagnosed with uveal melanoma. Truncating mutations and aberrant BAP1 expression were common in sporadic mesotheliomas without germline mutations. These results reveal a BAP1-related cancer syndrome characterized by mesothelioma and uveal melanoma. We hypothesize that other cancers may also be involved, and that mesothelioma predominates upon asbestos exposure. These findings will help identify individuals at high risk of mesothelioma who could be targeted for early intervention.
PMCID: PMC3184199  PMID: 21874000
7.  Correlation of rare coding variants in the gene encoding human glucokinase regulatory protein with phenotypic, cellular, and kinetic outcomes 
Defining the genetic contribution of rare variants to common diseases is a major basic and clinical science challenge that could offer new insights into disease etiology and provide potential for directed gene- and pathway-based prevention and treatment. Common and rare nonsynonymous variants in the GCKR gene are associated with alterations in metabolic traits, most notably serum triglyceride levels. GCKR encodes glucokinase regulatory protein (GKRP), a predominantly nuclear protein that inhibits hepatic glucokinase (GCK) and plays a critical role in glucose homeostasis. The mode of action of rare GCKR variants remains unexplored. We identified 19 nonsynonymous GCKR variants among 800 individuals from the ClinSeq medical sequencing project. Excluding the previously described common missense variant p.Pro446Leu, all variants were rare in the cohort. Accordingly, we functionally characterized all variants to evaluate their potential phenotypic effects. Defects were observed for the majority of the rare variants after assessment of cellular localization, ability to interact with GCK, and kinetic activity of the encoded proteins. Comparing the individuals with functional rare variants to those without such variants showed associations with lipid phenotypes. Our findings suggest that, while nonsynonymous GCKR variants, excluding p.Pro446Leu, are rare in individuals of mixed European descent, the majority do affect protein function. In sum, this study utilizes computational, cell biological, and biochemical methods to present a model for interpreting the clinical significance of rare genetic variants in common disease.
PMCID: PMC3248284  PMID: 22182842
8.  Obesity and Hyperinsulinemia in a Family with Pancreatic Agenesis and MODY Due to the IPF1 Mutation Pro63fsX60 
We studied the genetic and clinical features of diabetic subjects in a five-generation Michigan-Kentucky pedigree ascertained through a proband with pancreatic agenesis who was homozygous for the IPF1 mutation Pro63fsX60. Diabetic and nondiabetic family members were genotyped and phenotyped. We also carried out genetic studies to determine the history of the IPF1 mutation in the Michigan-Kentucky family and in a Virginia family with the same mutation. We identified 110 individuals. Thirty-four are currently being treated for diabetes, and ten of these are Pro63fsX60 carriers (i.e., MODY4). Subjects with MODY, as well as those with type 2 diabetes, are characterized by obesity and hyperinsulinemia. Genetic studies suggest that the IPF1 mutation was inherited from an ancestor common to both the Michigan-Kentucky and Virginia families. MODY4 and type 2 diabetes in the Michigan-Kentucky pedigree are associated with obesity and hyperinsulinemia. Obesity and hyperinsulinemia have been observed occasionally in other subtypes of MODY, suggesting that hyperinsulinemia may be a general phenomenon when obesity occurs in MODY subjects. Hypoinsulinemia in non-obese MODY subjects appears to be due to a functional defect in the beta cell. Genetic testing should be considered in multigenerational families with obese diabetic members, particularly when such families contain young diabetic members.
PMCID: PMC2904650  PMID: 20621032
Diabetes Mellitus; MODY4; IPF1; Obesity; Hyperinsulinemia; Insulin Resistance
9.  A Genome-Wide Association Study Identifies a Novel Major Locus for Glycemic Control in Type 1 Diabetes, as Measured by Both A1C and Glucose 
Diabetes  2009;59(2):539-549.
Glycemia is a major risk factor for the development of long-term complications in type 1 diabetes; however, no specific genetic loci have been identified for glycemic control in individuals with type 1 diabetes. To identify such loci in type 1 diabetes, we analyzed longitudinal repeated measures of A1C from the Diabetes Control and Complications Trial (DCCT).
We performed a genome-wide association study using the mean of quarterly A1C values measured over 6.5 years, separately in the conventional (n = 667) and intensive (n = 637) treatment groups of the DCCT. At loci of interest, linear mixed models were used to take advantage of all the repeated measures. We then assessed the association of these loci with capillary glucose and repeated measures of multiple complications of diabetes.
We identified a major locus for A1C levels in the conventional treatment group near SORCS1 (10q25.1, P = 7 × 10⁻¹⁰), which was also associated with mean glucose (P = 2 × 10⁻⁵). This was confirmed using A1C in the intensive treatment group (P = 0.01). Other loci approached genome-wide significance: 14q32.13 (GSC) and 9p22 (BNC2) in the combined treatment groups and 15q21.3 (WDR72) in the intensive group. Further, these loci gave evidence for association with diabetic complications, specifically SORCS1 with hypoglycemia and BNC2 with renal and retinal complications. We replicated the SORCS1 association in Genetics of Kidneys in Diabetes (GoKinD) study control subjects (P = 0.01) and the BNC2 association with A1C in nondiabetic individuals.
A major locus for A1C and glucose in individuals with diabetes is near SORCS1. This may influence the design and analysis of genetic studies attempting to identify risk factors for long-term diabetic complications.
PMCID: PMC2809960  PMID: 19875614

