Related Articles

1.  Identification of Genetic Association of Multiple Rare Variants Using Collapsing Methods 
Genetic Epidemiology  2011;35(Suppl 1):S101-S106.
Next-generation sequencing technology allows investigation of both common and rare variants in humans. Exomes are sequenced on the population level or in families to further study the genetics of human diseases. Genetic Analysis Workshop 17 (GAW17) provided exomic data from the 1000 Genomes Project and simulated phenotypes. These data enabled evaluations of existing and newly developed statistical methods for rare variant sequence analysis for which standard statistical methods fail because of the rareness of the alleles. Various alternative approaches have been proposed that overcome the rareness problem by combining multiple rare variants within a gene. These approaches are termed collapsing methods, and our GAW17 group focused on studying the performance of existing and novel collapsing methods using rare variants. All tested methods performed similarly, as measured by type I error and power. Inflated type I error fractions were consistently observed and might be caused by gametic phase disequilibrium between causal and noncausal rare variants in this relatively small sample as well as by population stratification. Incorporating prior knowledge, such as appropriate covariates and information on functionality of SNPs, increased the power of detecting associated genes. Overall, collapsing rare variants can increase the power of identifying disease-associated genes. However, studying genetic associations of rare variants remains a challenging task that requires further development and improvement in data collection, management, analysis, and computation.
doi:10.1002/gepi.20658
PMCID: PMC3289287  PMID: 22128049
1000 Genomes Project; association; collapsing methods; next-generation sequencing
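The collapsing idea described in the abstract above (combining multiple rare variants within a gene into a single per-individual score) can be illustrated with a minimal sketch. This is not the method of any GAW17 contributor: it is a generic carrier-indicator collapse followed by a label-permutation test, and the function names, the 0.01 minor-allele-frequency threshold, and the toy genotype data are all assumptions made for illustration.

```python
import random

def collapse(genotypes, mafs, maf_threshold=0.01):
    """Return 1 for each individual carrying at least one rare minor allele, else 0.

    genotypes: one row per individual, minor-allele counts (0, 1, 2) per variant.
    mafs: minor-allele frequency per variant; the threshold is an assumed cutoff.
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]

def permutation_pvalue(carrier, is_case, n_perm=2000, seed=0):
    """Permutation p-value for the absolute case/control difference in carrier frequency."""
    rng = random.Random(seed)
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    total_carriers = sum(carrier)

    def diff(labels):
        case_carriers = sum(c for c, y in zip(carrier, labels) if y)
        control_carriers = total_carriers - case_carriers
        return abs(case_carriers / n_cases - control_carriers / n_controls)

    observed = diff(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if diff(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): 8 individuals, 3 variants in one gene.
genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 2, 0],
             [0, 0, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
mafs = [0.005, 0.2, 0.008]          # the second variant is common and is ignored
is_case = [1, 1, 1, 0, 0, 1, 0, 0]  # 1 = case, 0 = control

carrier = collapse(genotypes, mafs)
p_value = permutation_pvalue(carrier, is_case)
```

Permuting case/control labels does not correct for population stratification, which the abstract names as one possible source of inflated type I error; adjusting for covariates would require a regression-based test instead.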
2.  Human genome meeting 2016 
Srivastava, A. K. | Wang, Y. | Huang, R. | Skinner, C. | Thompson, T. | Pollard, L. | Wood, T. | Luo, F. | Stevenson, R. | Polimanti, R. | Gelernter, J. | Lin, X. | Lim, I. Y. | Wu, Y. | Teh, A. L. | Chen, L. | Aris, I. M. | Soh, S. E. | Tint, M. T. | MacIsaac, J. L. | Yap, F. | Kwek, K. | Saw, S. M. | Kobor, M. S. | Meaney, M. J. | Godfrey, K. M. | Chong, Y. S. | Holbrook, J. D. | Lee, Y. S. | Gluckman, P. D. | Karnani, N. | Kapoor, A. | Lee, D. | Chakravarti, A. | Maercker, C. | Graf, F. | Boutros, M. | Stamoulis, G. | Santoni, F. | Makrythanasis, P. | Letourneau, A. | Guipponi, M. | Panousis, N. | Garieri, M. | Ribaux, P. | Falconnet, E. | Borel, C. | Antonarakis, S. E. | Kumar, S. | Curran, J. | Blangero, J. | Chatterjee, S. | Kapoor, A. | Akiyama, J. | Auer, D. | Berrios, C. | Pennacchio, L. | Chakravarti, A. | Donti, T. R. | Cappuccio, G. | Miller, M. | Atwal, P. | Kennedy, A. | Cardon, A. | Bacino, C. | Emrick, L. | Hertecant, J. | Baumer, F. | Porter, B. | Bainbridge, M. | Bonnen, P. | Graham, B. | Sutton, R. | Sun, Q. | Elsea, S. | Hu, Z. | Wang, P. | Zhu, Y. | Zhao, J. | Xiong, M. | Bennett, David A. | Hidalgo-Miranda, A. | Romero-Cordoba, S. | Rodriguez-Cuevas, S. | Rebollar-Vega, R. | Tagliabue, E. | Iorio, M. | D’Ippolito, E. | Baroni, S. | Kaczkowski, B. | Tanaka, Y. | Kawaji, H. | Sandelin, A. | Andersson, R. | Itoh, M. | Lassmann, T. | Hayashizaki, Y. | Carninci, P. | Forrest, A. R. R. | Semple, C. A. | Rosenthal, E. A. | Shirts, B. | Amendola, L. | Gallego, C. | Horike-Pyne, M. | Burt, A. | Robertson, P. | Beyers, P. | Nefcy, C. | Veenstra, D. | Hisama, F. | Bennett, R. | Dorschner, M. | Nickerson, D. | Smith, J. | Patterson, K. | Crosslin, D. | Nassir, R. | Zubair, N. | Harrison, T. | Peters, U. | Jarvik, G. | Menghi, F. | Inaki, K. | Woo, X. | Kumar, P. | Grzeda, K. | Malhotra, A. | Kim, H. | Ucar, D. | Shreckengast, P. | Karuturi, K. | Keck, J. | Chuang, J. | Liu, E. T. | Ji, B. | Tyler, A. | Ananda, G. | Carter, G. | Nikbakht, H. | Montagne, M. 
| Zeinieh, M. | Harutyunyan, A. | Mcconechy, M. | Jabado, N. | Lavigne, P. | Majewski, J. | Goldstein, J. B. | Overman, M. | Varadhachary, G. | Shroff, R. | Wolff, R. | Javle, M. | Futreal, A. | Fogelman, D. | Bravo, L. | Fajardo, W. | Gomez, H. | Castaneda, C. | Rolfo, C. | Pinto, J. A. | Akdemir, K. C. | Chin, L. | Futreal, A. | Patterson, S. | Statz, C. | Mockus, S. | Nikolaev, S. N. | Bonilla, X. I. | Parmentier, L. | King, B. | Bezrukov, F. | Kaya, G. | Zoete, V. | Seplyarskiy, V. | Sharpe, H. | McKee, T. | Letourneau, A. | Ribaux, P. | Popadin, K. | Basset-Seguin, N. | Chaabene, R. Ben | Santoni, F. | Andrianova, M. | Guipponi, M. | Garieri, M. | Verdan, C. | Grosdemange, K. | Sumara, O. | Eilers, M. | Aifantis, I. | Michielin, O. | de Sauvage, F. | Antonarakis, S. | Likhitrattanapisal, S. | Lincoln, S. | Kurian, A. | Desmond, A. | Yang, S. | Kobayashi, Y. | Ford, J. | Ellisen, L. | Peters, T. L. | Alvarez, K. R. | Hollingsworth, E. F. | Lopez-Terrada, D. H. | Hastie, A. | Dzakula, Z. | Pang, A. W. | Lam, E. T. | Anantharaman, T. | Saghbini, M. | Cao, H. | Gonzaga-Jauregui, C. | Ma, L. | King, A. | Rosenzweig, E. Berman | Krishnan, U. | Reid, J. G. | Overton, J. D. | Dewey, F. | Chung, W. K. | Small, K. | DeLuca, A. | Cremers, F. | Lewis, R. A. | Puech, V. | Bakall, B. | Silva-Garcia, R. | Rohrschneider, K. | Leys, M. | Shaya, F. S. | Stone, E. | Sobreira, N. L. | Schiettecatte, F. | Ling, H. | Pugh, E. | Witmer, D. | Hetrick, K. | Zhang, P. | Doheny, K. | Valle, D. | Hamosh, A. | Jhangiani, S. N. | Akdemir, Z. Coban | Bainbridge, M. N. | Charng, W. | Wiszniewski, W. | Gambin, T. | Karaca, E. | Bayram, Y. | Eldomery, M. K. | Posey, J. | Doddapaneni, H. | Hu, J. | Sutton, V. R. | Muzny, D. M. | Boerwinkle, E. A. | Valle, D. | Lupski, J. R. | Gibbs, R. A. | Shekar, S. | Salerno, W. | English, A. | Mangubat, A. | Bruestle, J. | Thorogood, A. | Knoppers, B. M. | Takahashi, H. | Nitta, K. R. | Kozhuharova, A. | Suzuki, A. M. | Sharma, H. | Cotella, D. 
| Santoro, C. | Zucchelli, S. | Gustincich, S. | Carninci, P. | Mulvihill, J. J. | Baynam, G. | Gahl, W. | Groft, S. C. | Kosaki, K. | Lasko, P. | Melegh, B. | Taruscio, D. | Ghosh, R. | Plon, S. | Scherer, S. | Qin, X. | Sanghvi, R. | Walker, K. | Chiang, T. | Muzny, D. | Wang, L. | Black, J. | Boerwinkle, E. | Weinshilboum, R. | Gibbs, R. | Karpinets, T. | Calderone, T. | Wani, K. | Yu, X. | Creasy, C. | Haymaker, C. | Forget, M. | Nanda, V. | Roszik, J. | Wargo, J. | Haydu, L. | Song, X. | Lazar, A. | Gershenwald, J. | Davies, M. | Bernatchez, C. | Zhang, J. | Futreal, A. | Woodman, S. | Chesler, E. J. | Reynolds, T. | Bubier, J. A. | Phillips, C. | Langston, M. A. | Baker, E. J. | Xiong, M. | Ma, L. | Lin, N. | Amos, C. | Lin, N. | Wang, P. | Zhu, Y. | Zhao, J. | Calhoun, V. | Xiong, M. | Dobretsberger, O. | Egger, M. | Leimgruber, F. | Sadedin, S. | Oshlack, A. | Antonio, V. A. A. | Ono, N. | Ahmed, Z. | Bolisetty, M. | Zeeshan, S. | Anguiano, E. | Ucar, D. | Sarkar, A. | Nandineni, M. R. | Zeng, C. | Shao, J. | Cao, H. | Hastie, A. | Pang, A. W. | Lam, E. T. | Liang, T. | Pham, K. | Saghbini, M. | Dzakula, Z. | Chee-Wei, Y. | Dongsheng, L. | Lai-Ping, W. | Lian, D. | Hee, R. O. Twee | Yunus, Y. | Aghakhanian, F. | Mokhtar, S. S. | Lok-Yung, C. V. | Bhak, J. | Phipps, M. | Shuhua, X. | Yik-Ying, T. | Kumar, V. | Boon-Peng, H. | Campbell, I. | Young, M. -A. | James, P. | Rain, M. | Mohammad, G. | Kukreti, R. | Pasha, Q. | Akilzhanova, A. R. | Guelly, C. | Abilova, Z. | Rakhimova, S. | Akhmetova, A. | Kairov, U. | Trajanoski, S. | Zhumadilov, Z. | Bekbossynova, M. | Schumacher, C. | Sandhu, S. | Harkins, T. | Makarov, V. | Doddapaneni, H. | Glenn, R. | Momin, Z. | Dilrukshi, B. | Chao, H. | Meng, Q. | Gudenkauf, B. | Kshitij, R. | Jayaseelan, J. | Nessner, C. | Lee, S. | Blankenberg, K. | Lewis, L. | Hu, J. | Han, Y. | Dinh, H. | Jireh, S. | Walker, K. | Boerwinkle, E. | Muzny, D. | Gibbs, R. | Hu, J. | Walker, K. | Buhay, C. | Liu, X. | Wang, Q. | Sanghvi, R. 
| Doddapaneni, H. | Ding, Y. | Veeraraghavan, N. | Yang, Y. | Boerwinkle, E. | Beaudet, A. L. | Eng, C. M. | Muzny, D. M. | Gibbs, R. A. | Worley, K. C. C. | Liu, Y. | Hughes, D. S. T. | Murali, S. C. | Harris, R. A. | English, A. C. | Qin, X. | Hampton, O. A. | Larsen, P. | Beck, C. | Han, Y. | Wang, M. | Doddapaneni, H. | Kovar, C. L. | Salerno, W. J. | Yoder, A. | Richards, S. | Rogers, J. | Lupski, J. R. | Muzny, D. M. | Gibbs, R. A. | Meng, Q. | Bainbridge, M. | Wang, M. | Doddapaneni, H. | Han, Y. | Muzny, D. | Gibbs, R. | Harris, R. A. | Raveenedran, M. | Xue, C. | Dahdouli, M. | Cox, L. | Fan, G. | Ferguson, B. | Hovarth, J. | Johnson, Z. | Kanthaswamy, S. | Kubisch, M. | Platt, M. | Smith, D. | Vallender, E. | Wiseman, R. | Liu, X. | Below, J. | Muzny, D. | Gibbs, R. | Yu, F. | Rogers, J. | Lin, J. | Zhang, Y. | Ouyang, Z. | Moore, A. | Wang, Z. | Hofmann, J. | Purdue, M. | Stolzenberg-Solomon, R. | Weinstein, S. | Albanes, D. | Liu, C. S. | Cheng, W. L. | Lin, T. T. | Lan, Q. | Rothman, N. | Berndt, S. | Chen, E. S. | Bahrami, H. | Khoshzaban, A. | Keshal, S. Heidari | Bahrami, H. | Khoshzaban, A. | Keshal, S. Heidari | Alharbi, K. K. R. | Zhalbinova, M. | Akilzhanova, A. | Rakhimova, S. | Bekbosynova, M. | Myrzakhmetova, S. | Matar, M. | Mili, N. | Molinari, R. | Ma, Y. | Guerrier, S. | Elhawary, N. | Tayeb, M. | Bogari, N. | Qotb, N. | McClymont, S. A. | Hook, P. W. | Goff, L. A. | McCallion, A. | Kong, Y. | Charette, J. R. | Hicks, W. L. | Naggert, J. K. | Zhao, L. | Nishina, P. M. | Edrees, B. M. | Athar, M. | Al-Allaf, F. A. | Taher, M. M. | Khan, W. | Bouazzaoui, A. | Harbi, N. A. | Safar, R. | Al-Edressi, H. | Anazi, A. | Altayeb, N. | Ahmed, M. A. | Alansary, K. | Abduljaleel, Z. | Kratz, A. | Beguin, P. | Poulain, S. | Kaneko, M. | Takahiko, C. | Matsunaga, A. | Kato, S. | Suzuki, A. M. | Bertin, N. | Lassmann, T. | Vigot, R. | Carninci, P. | Plessy, C. | Launey, T. | Graur, D. | Lee, D. | Kapoor, A. | Chakravarti, A. | Friis-Nielsen, J. 
| Izarzugaza, J. M. | Brunak, S. | Chakraborty, A. | Basak, J. | Mukhopadhyay, A. | Soibam, B. S. | Das, D. | Biswas, N. | Das, S. | Sarkar, S. | Maitra, A. | Panda, C. | Majumder, P. | Morsy, H. | Gaballah, A. | Samir, M. | Shamseya, M. | Mahrous, H. | Ghazal, A. | Arafat, W. | Hashish, M. | Gruber, J. J. | Jaeger, N. | Snyder, M. | Patel, K. | Bowman, S. | Davis, T. | Kraushaar, D. | Emerman, A. | Russello, S. | Henig, N. | Hendrickson, C. | Zhang, K. | Rodriguez-Dorantes, M. | Cruz-Hernandez, C. D. | Garcia-Tobilla, C. D. P. | Solorzano-Rosales, S. | Jäger, N. | Chen, J. | Haile, R. | Hitchins, M. | Brooks, J. D. | Snyder, M. | Jiménez-Morales, S. | Ramírez, M. | Nuñez, J. | Bekker, V. | Leal, Y. | Jiménez, E. | Medina, A. | Hidalgo, A. | Mejía, J. | Halytskiy, V. | Naggert, J. | Collin, G. B. | DeMauro, K. | Hanusek, R. | Nishina, P. M. | Belhassa, K. | Belhassan, K. | Bouguenouch, L. | Samri, I. | Sayel, H. | moufid, FZ. | El Bouchikhi, I. | Trhanint, S. | Hamdaoui, H. | Elotmani, I. | Khtiri, I. | Kettani, O. | Quibibo, L. | Ahagoud, M. | Abbassi, M. | Ouldim, K. | Marusin, A. V. | Kornetov, A. N. | Swarovskaya, M. | Vagaiceva, K. | Stepanov, V. | De La Paz, E. M. Cutiongco | Sy, R. | Nevado, J. | Reganit, P. | Santos, L. | Magno, J. D. | Punzalan, F. E. | Ona, D. | Llanes, E. | Santos-Cortes, R. L. | Tiongco, R. | Aherrera, J. | Abrahan, L. | Pagauitan-Alan, P. | Morelli, K. H. | Domire, J. S. | Pyne, N. | Harper, S. | Burgess, R. | Zhalbinova, M. | Akilzhanova, A. | Rakhimova, S. | Bekbosynova, M. | Myrzakhmetova, S. | Gari, M. A. | Dallol, A. | Alsehli, H. | Gari, A. | Gari, M. | Abuzenadah, A. | Thomas, M. | Sukhai, M. | Garg, S. | Misyura, M. | Zhang, T. | Schuh, A. | Stockley, T. | Kamel-Reid, S. | Sherry, S. | Xiao, C. | Slotta, D. | Rodarmer, K. | Feolo, M. | Kimelman, M. | Godynskiy, G. | O’Sullivan, C. | Yaschenko, E. | Xiao, C. | Yaschenko, E. | Sherry, S. | Rangel-Escareño, C. | Rueda-Zarate, H. | Tayubi, I. A. | Mohammed, R. | Ahmed, I. 
| Ahmed, T. | Seth, S. | Amin, S. | Song, X. | Mao, X. | Sun, H. | Verhaak, R. G. | Futreal, A. | Zhang, J. | Whiite, S. J. | Chiang, T. | English, A. | Farek, J. | Kahn, Z. | Salerno, W. | Veeraraghavan, N. | Boerwinkle, E. | Gibbs, R. | Kasukawa, T. | Lizio, M. | Harshbarger, J. | Hisashi, S. | Severin, J. | Imad, A. | Sahin, S. | Freeman, T. C. | Baillie, K. | Sandelin, A. | Carninci, P. | Forrest, A. R. R. | Kawaji, H. | Salerno, W. | English, A. | Shekar, S. N. | Mangubat, A. | Bruestle, J. | Boerwinkle, E. | Gibbs, R. A. | Salem, A. H. | Ali, M. | Ibrahim, A. | Ibrahim, M. | Barrera, H. A. | Garza, L. | Torres, J. A. | Barajas, V. | Ulloa-Aguirre, A. | Kershenobich, D. | Mortaji, Shahroj | Guizar, Pedro | Loera, Eliezer | Moreno, Karen | De León, Adriana | Monsiváis, Daniela | Gómez, Jackeline | Cardiel, Raquel | Fernandez-Lopez, J. C. | Bonifaz-Peña, V. | Rangel-Escareño, C. | Hidalgo-Miranda, A. | Contreras, A. V. | Polfus, L. | Wang, X. | Philip, V. | Carter, G. | Abuzenadah, A. A. | Gari, M. | Turki, R. | Dallol, A. | Uyar, A. | Kaygun, A. | Zaman, S. | Marquez, E. | George, J. | Ucar, D. | Hendrickson, C. L. | Emerman, A. | Kraushaar, D. | Bowman, S. | Henig, N. | Davis, T. | Russello, S. | Patel, K. | Starr, D. B. | Baird, M. | Kirkpatrick, B. | Sheets, K. | Nitsche, R. | Prieto-Lafuente, L. | Landrum, M. | Lee, J. | Rubinstein, W. | Maglott, D. | Thavanati, P. K. R. | de Dios, A. Escoto | Hernandez, R. E. Navarro | Aldrate, M. E. Aguilar | Mejia, M. R. Ruiz | Kanala, K. R. R. | Abduljaleel, Z. | Khan, W. | Al-Allaf, F. A. | Athar, M. | Taher, M. M. | Shahzad, N. | Bouazzaoui, A. | Huber, E. | Dan, A. | Al-Allaf, F. A. | Herr, W. | Sprotte, G. | Köstler, J. | Hiergeist, A. | Gessner, A. | Andreesen, R. | Holler, E. | Al-Allaf, F. | Alashwal, A. | Abduljaleel, Z. | Taher, M. | Bouazzaoui, A. | Abalkhail, H. | Al-Allaf, A. | Bamardadh, R. | Athar, M. | Filiptsova, O. | Kobets, M. | Kobets, Y. | Burlaka, I. | Timoshyna, I. | Filiptsova, O. | Kobets, M. N. 
| Kobets, Y. | Burlaka, I. | Timoshyna, I. | Filiptsova, O. | Kobets, M. N. | Kobets, Y. | Burlaka, I. | Timoshyna, I. | Al-allaf, F. A. | Mohiuddin, M. T. | Zainularifeen, A. | Mohammed, A. | Abalkhail, H. | Owaidah, T. | Bouazzaoui, A.
Human Genomics  2016;10(Suppl 1):12.
Table of contents
O1 The metabolomics approach to autism: identification of biomarkers for early detection of autism spectrum disorder
A. K. Srivastava, Y. Wang, R. Huang, C. Skinner, T. Thompson, L. Pollard, T. Wood, F. Luo, R. Stevenson
O2 Phenome-wide association study for smoking- and drinking-associated genes in 26,394 American women with African, Asian, European, and Hispanic descents
R. Polimanti, J. Gelernter
O3 Effects of prenatal environment, genotype and DNA methylation on birth weight and subsequent postnatal outcomes: findings from GUSTO, an Asian birth cohort
X. Lin, I. Y. Lim, Y. Wu, A. L. Teh, L. Chen, I. M. Aris, S. E. Soh, M. T. Tint, J. L. MacIsaac, F. Yap, K. Kwek, S. M. Saw, M. S. Kobor, M. J. Meaney, K. M. Godfrey, Y. S. Chong, J. D. Holbrook, Y. S. Lee, P. D. Gluckman, N. Karnani, GUSTO study group
O4 High-throughput identification of specific QT interval modulating enhancers at the SCN5A locus
A. Kapoor, D. Lee, A. Chakravarti
O5 Identification of extracellular matrix components inducing cancer cell migration in the supernatant of cultivated mesenchymal stem cells
C. Maercker, F. Graf, M. Boutros
O6 Single cell allele specific expression (ASE) in T21 and common trisomies: a novel approach to understand Down syndrome and other aneuploidies
G. Stamoulis, F. Santoni, P. Makrythanasis, A. Letourneau, M. Guipponi, N. Panousis, M. Garieri, P. Ribaux, E. Falconnet, C. Borel, S. E. Antonarakis
O7 Role of microRNA in LCL to IPSC reprogramming
S. Kumar, J. Curran, J. Blangero
O8 Multiple enhancer variants disrupt gene regulatory network in Hirschsprung disease
S. Chatterjee, A. Kapoor, J. Akiyama, D. Auer, C. Berrios, L. Pennacchio, A. Chakravarti
O9 Metabolomic profiling for the diagnosis of neurometabolic disorders
T. R. Donti, G. Cappuccio, M. Miller, P. Atwal, A. Kennedy, A. Cardon, C. Bacino, L. Emrick, J. Hertecant, F. Baumer, B. Porter, M. Bainbridge, P. Bonnen, B. Graham, R. Sutton, Q. Sun, S. Elsea
O10 A novel causal methylation network approach to Alzheimer’s disease
Z. Hu, P. Wang, Y. Zhu, J. Zhao, M. Xiong, David A Bennett
O11 A microRNA signature identifies subtypes of triple-negative breast cancer and reveals miR-342-3p as regulator of a lactate metabolic pathway
A. Hidalgo-Miranda, S. Romero-Cordoba, S. Rodriguez-Cuevas, R. Rebollar-Vega, E. Tagliabue, M. Iorio, E. D’Ippolito, S. Baroni
O12 Transcriptome analysis identifies genes, enhancer RNAs and repetitive elements that are recurrently deregulated across multiple cancer types
B. Kaczkowski, Y. Tanaka, H. Kawaji, A. Sandelin, R. Andersson, M. Itoh, T. Lassmann, the FANTOM5 consortium, Y. Hayashizaki, P. Carninci, A. R. R. Forrest
O13 Elevated mutation and widespread loss of constraint at regulatory and architectural binding sites across 11 tumour types
C. A. Semple
O14 Exome sequencing provides evidence of pathogenicity for genes implicated in colorectal cancer
E. A. Rosenthal, B. Shirts, L. Amendola, C. Gallego, M. Horike-Pyne, A. Burt, P. Robertson, P. Beyers, C. Nefcy, D. Veenstra, F. Hisama, R. Bennett, M. Dorschner, D. Nickerson, J. Smith, K. Patterson, D. Crosslin, R. Nassir, N. Zubair, T. Harrison, U. Peters, G. Jarvik, NHLBI GO Exome Sequencing Project
O15 The tandem duplicator phenotype as a distinct genomic configuration in cancer
F. Menghi, K. Inaki, X. Woo, P. Kumar, K. Grzeda, A. Malhotra, H. Kim, D. Ucar, P. Shreckengast, K. Karuturi, J. Keck, J. Chuang, E. T. Liu
O16 Modeling genetic interactions associated with molecular subtypes of breast cancer
B. Ji, A. Tyler, G. Ananda, G. Carter
O17 Recurrent somatic mutation in the MYC associated factor X in brain tumors
H. Nikbakht, M. Montagne, M. Zeinieh, A. Harutyunyan, M. Mcconechy, N. Jabado, P. Lavigne, J. Majewski
O18 Predictive biomarkers to metastatic pancreatic cancer treatment
J. B. Goldstein, M. Overman, G. Varadhachary, R. Shroff, R. Wolff, M. Javle, A. Futreal, D. Fogelman
O19 DDIT4 gene expression as a prognostic marker in several malignant tumors
L. Bravo, W. Fajardo, H. Gomez, C. Castaneda, C. Rolfo, J. A. Pinto
O20 Spatial organization of the genome and genomic alterations in human cancers
K. C. Akdemir, L. Chin, A. Futreal, ICGC PCAWG Structural Alterations Group
O21 Landscape of targeted therapies in solid tumors
S. Patterson, C. Statz, S. Mockus
O22 Genomic analysis reveals novel drivers and progression pathways in skin basal cell carcinoma
S. N. Nikolaev, X. I. Bonilla, L. Parmentier, B. King, F. Bezrukov, G. Kaya, V. Zoete, V. Seplyarskiy, H. Sharpe, T. McKee, A. Letourneau, P. Ribaux, K. Popadin, N. Basset-Seguin, R. Ben Chaabene, F. Santoni, M. Andrianova, M. Guipponi, M. Garieri, C. Verdan, K. Grosdemange, O. Sumara, M. Eilers, I. Aifantis, O. Michielin, F. de Sauvage, S. Antonarakis
O23 Identification of differential biomarkers of hepatocellular carcinoma and cholangiocarcinoma via transcriptome microarray meta-analysis
S. Likhitrattanapisal
O24 Clinical validity and actionability of multigene tests for hereditary cancers in a large multi-center study
S. Lincoln, A. Kurian, A. Desmond, S. Yang, Y. Kobayashi, J. Ford, L. Ellisen
O25 Correlation with tumor ploidy status is essential for correct determination of genome-wide copy number changes by SNP array
T. L. Peters, K. R. Alvarez, E. F. Hollingsworth, D. H. Lopez-Terrada
O26 Nanochannel based next-generation mapping for interrogation of clinically relevant structural variation
A. Hastie, Z. Dzakula, A. W. Pang, E. T. Lam, T. Anantharaman, M. Saghbini, H. Cao, BioNano Genomics
O27 Mutation spectrum in a pulmonary arterial hypertension (PAH) cohort and identification of associated truncating mutations in TBX4
C. Gonzaga-Jauregui, L. Ma, A. King, E. Berman Rosenzweig, U. Krishnan, J. G. Reid, J. D. Overton, F. Dewey, W. K. Chung
O28 North Carolina macular dystrophy (MCDR1): mutations found affecting PRDM13
K. Small, A. DeLuca, F. Cremers, R. A. Lewis, V. Puech, B. Bakall, R. Silva-Garcia, K. Rohrschneider, M. Leys, F. S. Shaya, E. Stone
O29 PhenoDB and GeneMatcher, solving unsolved whole exome sequencing data
N. L. Sobreira, F. Schiettecatte, H. Ling, E. Pugh, D. Witmer, K. Hetrick, P. Zhang, K. Doheny, D. Valle, A. Hamosh
O30 Baylor-Johns Hopkins Center for Mendelian genomics: a four year review
S. N. Jhangiani, Z. Coban Akdemir, M. N. Bainbridge, W. Charng, W. Wiszniewski, T. Gambin, E. Karaca, Y. Bayram, M. K. Eldomery, J. Posey, H. Doddapaneni, J. Hu, V. R. Sutton, D. M. Muzny, E. A. Boerwinkle, D. Valle, J. R. Lupski, R. A. Gibbs
O31 Using read overlap assembly to accurately identify structural genetic differences in an Ashkenazi Jewish trio
S. Shekar, W. Salerno, A. English, A. Mangubat, J. Bruestle
O32 Legal interoperability: a sine qua non for international data sharing
A. Thorogood, B. M. Knoppers, Global Alliance for Genomics and Health - Regulatory and Ethics Working Group
O33 High throughput screening platform of competent SINEUPs that can enhance translation activities of therapeutic target
H. Takahashi, K. R. Nitta, A. Kozhuharova, A. M. Suzuki, H. Sharma, D. Cotella, C. Santoro, S. Zucchelli, S. Gustincich, P. Carninci
O34 The undiagnosed diseases network international (UDNI): clinical and laboratory research to meet patient needs
J. J. Mulvihill, G. Baynam, W. Gahl, S. C. Groft, K. Kosaki, P. Lasko, B. Melegh, D. Taruscio
O36 Performance of computational algorithms in pathogenicity predictions for activating variants in oncogenes versus loss of function mutations in tumor suppressor genes
R. Ghosh, S. Plon
O37 Identification and electronic health record incorporation of clinically actionable pharmacogenomic variants using prospective targeted sequencing
S. Scherer, X. Qin, R. Sanghvi, K. Walker, T. Chiang, D. Muzny, L. Wang, J. Black, E. Boerwinkle, R. Weinshilboum, R. Gibbs
O38 Melanoma reprogramming state correlates with response to CTLA-4 blockade in metastatic melanoma
T. Karpinets, T. Calderone, K. Wani, X. Yu, C. Creasy, C. Haymaker, M. Forget, V. Nanda, J. Roszik, J. Wargo, L. Haydu, X. Song, A. Lazar, J. Gershenwald, M. Davies, C. Bernatchez, J. Zhang, A. Futreal, S. Woodman
O39 Data-driven refinement of complex disease classification from integration of heterogeneous functional genomics data in GeneWeaver
E. J. Chesler, T. Reynolds, J. A. Bubier, C. Phillips, M. A. Langston, E. J. Baker
O40 A general statistic framework for genome-based disease risk prediction
M. Xiong, L. Ma, N. Lin, C. Amos
O41 Integrative large-scale causal network analysis of imaging and genomic data and its application in schizophrenia studies
N. Lin, P. Wang, Y. Zhu, J. Zhao, V. Calhoun, M. Xiong
O42 Big data and NGS data analysis: the cloud to the rescue
O. Dobretsberger, M. Egger, F. Leimgruber
O43 Cpipe: a convergent clinical exome pipeline specialised for targeted sequencing
S. Sadedin, A. Oshlack, Melbourne Genomics Health Alliance
O44 A Bayesian classification of biomedical images using feature extraction from deep neural networks implemented on lung cancer data
V. A. A. Antonio, N. Ono, Clark Kendrick C. Go
O45 MAV-SEQ: an interactive platform for the Management, Analysis, and Visualization of sequence data
Z. Ahmed, M. Bolisetty, S. Zeeshan, E. Anguiano, D. Ucar
O47 Allele specific enhancer in EPAS1 intronic regions may contribute to high altitude adaptation of Tibetans
C. Zeng, J. Shao
O48 Nanochannel based next-generation mapping for structural variation detection and comparison in trios and populations
H. Cao, A. Hastie, A. W. Pang, E. T. Lam, T. Liang, K. Pham, M. Saghbini, Z. Dzakula
O49 Archaic introgression in indigenous populations of Malaysia revealed by whole genome sequencing
Y. Chee-Wei, L. Dongsheng, W. Lai-Ping, D. Lian, R. O. Twee Hee, Y. Yunus, F. Aghakhanian, S. S. Mokhtar, C. V. Lok-Yung, J. Bhak, M. Phipps, X. Shuhua, T. Yik-Ying, V. Kumar, H. Boon-Peng
O50 Breast and ovarian cancer prevention: is it time for population-based mutation screening of high risk genes?
I. Campbell, M.-A. Young, P. James, Lifepool
O53 Comprehensive coverage from low DNA input using novel NGS library preparation methods for WGS and WGBS
C. Schumacher, S. Sandhu, T. Harkins, V. Makarov
O54 Methods for large scale construction of robust PCR-free libraries for sequencing on Illumina HiSeqX platform
H. Doddapaneni, R. Glenn, Z. Momin, B. Dilrukshi, H. Chao, Q. Meng, B. Gudenkauf, R. Kshitij, J. Jayaseelan, C. Nessner, S. Lee, K. Blankenberg, L. Lewis, J. Hu, Y. Han, H. Dinh, S. Jireh, K. Walker, E. Boerwinkle, D. Muzny, R. Gibbs
O55 Rapid capture methods for clinical sequencing
J. Hu, K. Walker, C. Buhay, X. Liu, Q. Wang, R. Sanghvi, H. Doddapaneni, Y. Ding, N. Veeraraghavan, Y. Yang, E. Boerwinkle, A. L. Beaudet, C. M. Eng, D. M. Muzny, R. A. Gibbs
O56 A diploid personal human genome model for better genomes from diverse sequence data
K. C. C. Worley, Y. Liu, D. S. T. Hughes, S. C. Murali, R. A. Harris, A. C. English, X. Qin, O. A. Hampton, P. Larsen, C. Beck, Y. Han, M. Wang, H. Doddapaneni, C. L. Kovar, W. J. Salerno, A. Yoder, S. Richards, J. Rogers, J. R. Lupski, D. M. Muzny, R. A. Gibbs
O57 Development of PacBio long range capture for detection of pathogenic structural variants
Q. Meng, M. Bainbridge, M. Wang, H. Doddapaneni, Y. Han, D. Muzny, R. Gibbs
O58 Rhesus macaques exhibit more non-synonymous variation but greater impact of purifying selection than humans
R. A. Harris, M. Raveenedran, C. Xue, M. Dahdouli, L. Cox, G. Fan, B. Ferguson, J. Hovarth, Z. Johnson, S. Kanthaswamy, M. Kubisch, M. Platt, D. Smith, E. Vallender, R. Wiseman, X. Liu, J. Below, D. Muzny, R. Gibbs, F. Yu, J. Rogers
O59 Assessing RNA structure disruption induced by single-nucleotide variation
J. Lin, Y. Zhang, Z. Ouyang
P1 A meta-analysis of genome-wide association studies of mitochondrial DNA copy number
A. Moore, Z. Wang, J. Hofmann, M. Purdue, R. Stolzenberg-Solomon, S. Weinstein, D. Albanes, C.-S. Liu, W.-L. Cheng, T.-T. Lin, Q. Lan, N. Rothman, S. Berndt
P2 Missense polymorphic genetic combinations underlying Down syndrome susceptibility
E. S. Chen
P4 The evaluation of alteration of ELAM-1 expression in the endometriosis patients
H. Bahrami, A. Khoshzaban, S. Heidari Keshal
P5 Obesity and the incidence of apolipoprotein E polymorphisms in an assorted population from Saudi Arabia population
K. K. R. Alharbi
P6 Genome-associated personalized antithrombotical therapy for patients with high risk of thrombosis and bleeding
M. Zhalbinova, A. Akilzhanova, S. Rakhimova, M. Bekbosynova, S. Myrzakhmetova
P7 Frequency of Xmn1 polymorphism among sickle cell carrier cases in UAE population
M. Matar
P8 Differentiating inflammatory bowel diseases by using genomic data: dimension of the problem and network organization
N. Mili, R. Molinari, Y. Ma, S. Guerrier
P9 Vulnerability of genetic variants to the risk of autism among Saudi children
N. Elhawary, M. Tayeb, N. Bogari, N. Qotb
P10 Chromatin profiles from ex vivo purified dopaminergic neurons establish a promising model to support studies of neurological function and dysfunction
S. A. McClymont, P. W. Hook, L. A. Goff, A. McCallion
P11 Utilization of a sensitized chemical mutagenesis screen to identify genetic modifiers of retinal dysplasia in homozygous Nr2e3rd7 mice
Y. Kong, J. R. Charette, W. L. Hicks, J. K. Naggert, L. Zhao, P. M. Nishina
P12 Ion torrent next generation sequencing of recessive polycystic kidney disease in Saudi patients
B. M. Edrees, M. Athar, F. A. Al-Allaf, M. M. Taher, W. Khan, A. Bouazzaoui, N. A. Harbi, R. Safar, H. Al-Edressi, A. Anazi, N. Altayeb, M. A. Ahmed, K. Alansary, Z. Abduljaleel
P13 Digital expression profiling of Purkinje neurons and dendrites in different subcellular compartments
A. Kratz, P. Beguin, S. Poulain, M. Kaneko, C. Takahiko, A. Matsunaga, S. Kato, A. M. Suzuki, N. Bertin, T. Lassmann, R. Vigot, P. Carninci, C. Plessy, T. Launey
P14 The evolution of imperfection and imperfection of evolution: the functional and functionless fractions of the human genome
D. Graur
P16 Species-independent identification of known and novel recurrent genomic entities in multiple cancer patients
J. Friis-Nielsen, J. M. Izarzugaza, S. Brunak
P18 Discovery of active gene modules which are densely conserved across multiple cancer types reveal their prognostic power and mutually exclusive mutation patterns
B. S. Soibam
P19 Whole exome sequencing of dysplastic leukoplakia tissue indicates sequential accumulation of somatic mutations from oral precancer to cancer
D. Das, N. Biswas, S. Das, S. Sarkar, A. Maitra, C. Panda, P. Majumder
P21 Epigenetic mechanisms of carcinogenesis by hereditary breast cancer genes
J. J. Gruber, N. Jaeger, M. Snyder
P22 RNA direct: a novel RNA enrichment strategy applied to transcripts associated with solid tumors
K. Patel, S. Bowman, T. Davis, D. Kraushaar, A. Emerman, S. Russello, N. Henig, C. Hendrickson
P23 RNA sequencing identifies gene mutations for neuroblastoma
K. Zhang
P24 Participation of SFRP1 in the modulation of TMPRSS2-ERG fusion gene in prostate cancer cell lines
M. Rodriguez-Dorantes, C. D. Cruz-Hernandez, C. D. P. Garcia-Tobilla, S. Solorzano-Rosales
P25 Targeted Methylation Sequencing of Prostate Cancer
N. Jäger, J. Chen, R. Haile, M. Hitchins, J. D. Brooks, M. Snyder
P26 Mutant TPMT alleles in children with acute lymphoblastic leukemia from México City and Yucatán, Mexico
S. Jiménez-Morales, M. Ramírez, J. Nuñez, V. Bekker, Y. Leal, E. Jiménez, A. Medina, A. Hidalgo, J. Mejía
P28 Genetic modifiers of Alström syndrome
J. Naggert, G. B. Collin, K. DeMauro, R. Hanusek, P. M. Nishina
P31 Association of genomic variants with the occurrence of angiotensin-converting-enzyme inhibitor (ACEI)-induced coughing among Filipinos
E. M. Cutiongco De La Paz, R. Sy, J. Nevado, P. Reganit, L. Santos, J. D. Magno, F. E. Punzalan, D. Ona, E. Llanes, R. L. Santos-Cortes, R. Tiongco, J. Aherrera, L. Abrahan, P. Pagauitan-Alan; Philippine Cardiogenomics Study Group
P32 The use of “humanized” mouse models to validate disease association of a de novo GARS variant and to test a novel gene therapy strategy for Charcot-Marie-Tooth disease type 2D
K. H. Morelli, J. S. Domire, N. Pyne, S. Harper, R. Burgess
P34 Molecular regulation of chondrogenic human induced pluripotent stem cells
M. A. Gari, A. Dallol, H. Alsehli, A. Gari, M. Gari, A. Abuzenadah
P35 Molecular profiling of hematologic malignancies: implementation of a variant assessment algorithm for next generation sequencing data analysis and clinical reporting
M. Thomas, M. Sukhai, S. Garg, M. Misyura, T. Zhang, A. Schuh, T. Stockley, S. Kamel-Reid
P36 Accessing genomic evidence for clinical variants at NCBI
S. Sherry, C. Xiao, D. Slotta, K. Rodarmer, M. Feolo, M. Kimelman, G. Godynskiy, C. O’Sullivan, E. Yaschenko
P37 NGS-SWIFT: a cloud-based variant analysis framework using control-accessed sequencing data from dbGaP/SRA
C. Xiao, E. Yaschenko, S. Sherry
P38 Computational assessment of drug induced hepatotoxicity through gene expression profiling
C. Rangel-Escareño, H. Rueda-Zarate
P40 Flowr: robust and efficient pipelines using a simple language-agnostic approach; ultraseq; fast modular pipeline for somatic variation calling using flowr
S. Seth, S. Amin, X. Song, X. Mao, H. Sun, R. G. Verhaak, A. Futreal, J. Zhang
P41 Applying “Big data” technologies to the rapid analysis of heterogenous large cohort data
S. J. Whiite, T. Chiang, A. English, J. Farek, Z. Kahn, W. Salerno, N. Veeraraghavan, E. Boerwinkle, R. Gibbs
P42 FANTOM5 web resource for the large-scale genome-wide transcription start site activity profiles of wide-range of mammalian cells
T. Kasukawa, M. Lizio, J. Harshbarger, S. Hisashi, J. Severin, A. Imad, S. Sahin, T. C. Freeman, K. Baillie, A. Sandelin, P. Carninci, A. R. R. Forrest, H. Kawaji, The FANTOM Consortium
P43 Rapid and scalable typing of structural variants for disease cohorts
W. Salerno, A. English, S. N. Shekar, A. Mangubat, J. Bruestle, E. Boerwinkle, R. A. Gibbs
P44 Polymorphism of glutathione S-transferases and sulphotransferases genes in an Arab population
A. H. Salem, M. Ali, A. Ibrahim, M. Ibrahim
P46 Genetic divergence of CYP3A5*3 pharmacogenomic marker for native and admixed Mexican populations
J. C. Fernandez-Lopez, V. Bonifaz-Peña, C. Rangel-Escareño, A. Hidalgo-Miranda, A. V. Contreras
P47 Whole exome sequence meta-analysis of 13 white blood cell, red blood cell, and platelet traits
L. Polfus, CHARGE and NHLBI Exome Sequence Project Working Groups
P48 Association of ADIPOQ gene with type 2 diabetes and related phenotypes in African American men and women: the Jackson Heart Study
S. Davis, R. Xu, S. Gebeab, P. Riestra, A. Gaye, R. Khan, J. Wilson, A. Bidulescu
P49 Common variants in CASR gene are associated with serum calcium levels in Koreans
S. H. Jung, N. Vinayagamoorthy, S. H. Yim, Y. J. Chung
P50 Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with multiple exponential functions
Y. Zhou, S. Xu
P51 A Bayesian framework for generalized linear mixed models in genome-wide association studies
X. Wang, V. Philip, G. Carter
P52 Targeted sequencing approach for the identification of the genetic causes of hereditary hearing impairment
A. A. Abuzenadah, M. Gari, R. Turki, A. Dallol
P53 Identification of enhancer sequences by ATAC-seq open chromatin profiling
A. Uyar, A. Kaygun, S. Zaman, E. Marquez, J. George, D. Ucar
P54 Direct enrichment for the rapid preparation of targeted NGS libraries
C. L. Hendrickson, A. Emerman, D. Kraushaar, S. Bowman, N. Henig, T. Davis, S. Russello, K. Patel
P56 Performance of the Agilent D5000 and High Sensitivity D5000 ScreenTape assays for the Agilent 4200 Tapestation System
R. Nitsche, L. Prieto-Lafuente
P57 ClinVar: a multi-source archive for variant interpretation
M. Landrum, J. Lee, W. Rubinstein, D. Maglott
P59 Association of functional variants and protein physical interactions of human MUTY homolog linked with familial adenomatous polyposis and colorectal cancer syndrome
Z. Abduljaleel, W. Khan, F. A. Al-Allaf, M. Athar, M. M. Taher, N. Shahzad
P60 Modification of the microbiome constitution in the gut using chicken IgY antibodies resulted in a reduction of acute graft-versus-host disease after experimental bone marrow transplantation
A. Bouazzaoui, E. Huber, A. Dan, F. A. Al-Allaf, W. Herr, G. Sprotte, J. Köstler, A. Hiergeist, A. Gessner, R. Andreesen, E. Holler
P61 Compound heterozygous mutation in the LDLR gene in Saudi patients suffering severe hypercholesterolemia
F. Al-Allaf, A. Alashwal, Z. Abduljaleel, M. Taher, A. Bouazzaoui, H. Abalkhail, A. Al-Allaf, R. Bamardadh, M. Athar
doi:10.1186/s40246-016-0063-5
PMCID: PMC4896275  PMID: 27294413
3.  The Power of Gene-Based Rare Variant Methods to Detect Disease-Associated Variation and Test Hypotheses About Complex Disease 
PLoS Genetics  2015;11(4):e1005165.
Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5×10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
Author Summary
Re-sequencing technologies allow for a more complete interrogation of the role of human variation in complex disease. The inadequate power of single variant methods to assess the role of less common variation has led to the development of numerous statistical methods for testing aggregate groups of variants for association with disease. Such endeavors pose substantial analytical challenges, however, due to the diverse array of genetic hypotheses that need to be considered. In this work, we systematically quantify and compare the performance of a panel of commonly used gene-based association methods under a range of allelic architectures, significance thresholds, locus effect sizes, sample sizes, and filters for neutral variation. We find that MiST, SKAT-O, and KBAC have the highest mean power across simulated datasets. Across all methods, however, the power to detect even loci of relatively large effect is very low at exome-wide significance thresholds for sample sizes comparable with those of ongoing sequencing studies; as such, the absence of signal in studies of a few thousand individuals does not exclude a role for rare variation in complex traits. Finally, we directly compare the results reported by different gene-based methods in order to identify their comparative advantages and disadvantages under distinct locus architectures. Our findings have implications for meaningful interpretation of both positive and negative findings in ongoing and future sequencing studies.
doi:10.1371/journal.pgen.1005165
PMCID: PMC4407972  PMID: 25906071
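The exome-wide threshold quoted in this abstract (α = 2.5×10⁻⁶) is consistent with a Bonferroni correction of a 0.05 family-wise error rate over roughly 20,000 gene-based tests. The gene count and the 0.05 level are assumptions made here for illustration; neither is stated in the abstract. A minimal Python sketch of that arithmetic:

```python
def bonferroni_threshold(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test significance level that bounds the family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


# Assumed inputs (not from the abstract): 0.05 family-wise level, ~20,000 genes.
exome_wide_alpha = bonferroni_threshold(0.05, 20_000)
print(f"{exome_wide_alpha:.1e}")
```

Testing fewer, more targeted genes raises the per-test threshold, which is one way to read the abstract's remark about highly targeted hypothesis testing.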
4.  Methods for Collapsing Multiple Rare Variants in Whole-Genome Sequence Data 
Genetic Epidemiology  2014;38(Suppl 1):S13-S20.
Genetic Analysis Workshop 18 provided whole-genome sequence data in a pedigree-based sample and longitudinal phenotype data for hypertension and related traits, presenting an excellent opportunity for evaluating analysis choices. We summarize the nine contributions to the working group on collapsing methods, which evaluated various approaches for the analysis of multiple rare variants. One contributor defined a variant prioritization scheme, whereas the remaining eight contributors evaluated statistical methods for association analysis. Six contributors chose the gene as the genomic region for collapsing variants, whereas three contributors chose nonoverlapping sliding windows across the entire genome. Statistical methods spanned most of the published methods, including well-established burden tests, variance-components-type tests, and recently developed hybrid approaches. Lesser known methods, such as functional principal components analysis, higher criticism, and homozygosity association, and some newly introduced methods were also used. We found that performance of these methods depended on the characteristics of the genomic region, such as effect size and direction of variants under consideration. Except for MAP4 and FLT3, the performance of all statistical methods to identify rare causal variants was disappointingly poor, providing overall power almost identical to the type I error. This poor performance may have arisen from a combination of (1) small sample size, (2) small effects of most of the causal variants, explaining a small fraction of variance, (3) use of incomplete annotation information, and (4) linkage disequilibrium between causal variants in a gene and noncausal variants in nearby genes. Our findings demonstrate challenges in analyzing rare variants identified from sequence data.
doi:10.1002/gepi.21820
PMCID: PMC4558905  PMID: 25112183
Genetic Analysis Workshop 18; rare variants; whole-genome sequence; burden tests; nonburden tests
5.  Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level 
PLoS Computational Biology  2016;12(6):e1004993.
Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors in diseases. Due to the extreme rarity of mutations and high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR), are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure that can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dispatched as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and quality of significance rankings. Extensive simulation studies across a plethora of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR are exceedingly over-conservative for rare-variant association studies. In the analyses of the CoLaus dataset, AFNC has identified individual variants most responsible for gene-level significances. Moreover, single-variant results using the AFNC have been successfully applied to infer related genes with annotation information.
Author Summary
Next-generation sequencing technologies have allowed genetic association studies of complex traits at the single base-pair resolution, where most genetic variants have extremely low mutation frequencies. These rare variants have been the focus of modern statistical-computational genomics due to their potential to explain missing disease heritability. The identification of individual rare variants associated with diseases can provide new biological insights and enable the precise delineation of disease mechanisms. However, due to the extreme rarity of mutations and large numbers of variants, significances of causative variants tend to be mixed inseparably with a few noncausative ones, and standard multiple testing procedures controlling for false positives fail to provide a meaningful way to include a large proportion of the causative variants. To address the challenge of detecting weak biological signals, we propose a novel statistical procedure, based on false-negative control, to provide a practical approach for variant inclusion in large-scale sequencing studies. By determining those variants that can be confidently dispatched as noncausative, the proposed procedure offers an objective selection of a modest number of potentially causative variants at the single-locus level. Results can be further prioritized or used to infer disease-associated genes with annotation information.
doi:10.1371/journal.pcbi.1004993
PMCID: PMC4927097  PMID: 27355347
6.  Utilizing mutual information for detecting rare and common variants associated with a categorical trait 
PeerJ  2016;4:e2139.
Background. Genome-wide association studies have succeeded in detecting novel common variants which associate with complex diseases. As a result of the fast changes in next generation sequencing technology, a large amount of sequencing data is being generated, which offers great opportunities to identify rare variants that could explain a larger proportion of missing heritability. Many effective and powerful methods have been proposed, although they are usually limited to continuous, dichotomous or ordinal traits. Note that traits having nominal categorical features are commonly observed in complex diseases, especially in mental disorders, which motivates the incorporation of the characteristics of the categorical trait into association studies with rare and common variants.
Methods. We construct two simple and intuitive nonparametric tests, MIT and aMIT, based on mutual information for detecting association between genetic variants in a gene or region and a categorical trait. MIT and aMIT can gauge the difference among the distributions of rare and common variants across a region given every categorical trait value. If there is little association between variants and a categorical trait, MIT or aMIT approximately equals zero. The larger the difference in distributions, the larger the values of MIT and aMIT. Therefore, MIT and aMIT have the potential for detecting functional variants.
Results. We checked the validity of the proposed statistics and compared them to existing ones through extensive simulation studies with varied combinations of the numbers of rare causal, rare non-causal, common causal, and common non-causal variants, of deleterious and protective variants, of minor allele frequencies, and of levels of linkage disequilibrium. The results show that our methods have higher statistical power than conventional ones, including the likelihood-based score test, in most cases: (1) there are multiple genetic variants in a gene or region; (2) both protective and deleterious variants are present; (3) there exist rare and common variants; and (4) more than half of the variants are neutral. The proposed tests are applied to the data from the Collaborative Studies on Genetics of Alcoholism, and a competent performance is exhibited therein.
Discussion. As a complement to the existing methods, which mainly focus on quantitative traits, this study provides the nonparametric tests MIT and aMIT for detecting variants associated with a categorical trait. Furthermore, we plan to investigate the association between rare variants and multiple categorical traits.
doi:10.7717/peerj.2139
PMCID: PMC4918222  PMID: 27350900
Association analysis; Categorical trait; Next generation sequencing data; Mutual information; Rare variant
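The quantity underlying the tests in this abstract is mutual information, which is zero when genotype and trait are independent and grows as their joint distribution departs from independence. The sketch below computes the plain empirical mutual information between one discrete genotype summary and a categorical trait. It is not the authors' MIT or aMIT statistic, whose exact construction is not given in the abstract; the carrier indicator and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(genotypes, traits):
    """Empirical mutual information (in nats) between two discrete sequences."""
    if len(genotypes) != len(traits) or len(genotypes) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(genotypes)
    joint = Counter(zip(genotypes, traits))
    g_counts = Counter(genotypes)
    t_counts = Counter(traits)
    mi = 0.0
    for (g, t), c in joint.items():
        # p(g,t) * log( p(g,t) / (p(g) * p(t)) ), written with raw counts
        mi += (c / n) * log(c * n / (g_counts[g] * t_counts[t]))
    return mi


# Hypothetical toy data: 1 = carries at least one rare allele in the region.
carrier = [0, 0, 1, 1, 0, 1]
diagnosis = ["A", "A", "B", "B", "A", "B"]
print(mutual_information(carrier, diagnosis))
```

In practice a significance level for such a statistic would usually come from permuting the trait labels, which is an assumption here rather than something the abstract specifies.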
7.  Two-stage study designs combining genome-wide association studies, tag single-nucleotide polymorphisms, and exome sequencing: accuracy of genetic effect estimates 
BMC Proceedings  2011;5(Suppl 9):S64.
Genome-wide association studies (GWAS) test for disease-trait associations and estimate effect sizes at tag single-nucleotide polymorphisms (SNPs), which imperfectly capture variation at causal SNPs. Sequencing studies can examine potential causal SNPs directly; however, sequencing the whole genome or exome can be prohibitively expensive. Costs can be limited by using a GWAS to detect the associated region(s) at tag SNPs followed by targeted sequencing to identify and estimate the effect size of the causal variant. Genetic effect estimates obtained from association studies can be inflated because of a form of selection bias known as the winner’s curse. Conversely, estimates at tag SNPs can be attenuated compared to the causal SNP because of incomplete linkage disequilibrium. These two effects oppose each other. Analysis of rare SNPs further complicates our understanding of the winner’s curse because rare SNPs are difficult to tag and analysis can involve collapsing over multiple rare variants. In two-stage analysis of Genetic Analysis Workshop 17 simulated data sets, we find that selection at the tag SNP produces upward bias in the estimate of effect at the causal SNP, even when the tag and causal SNPs are not well correlated. The bias similarly carries through to effect estimates for rare variant summary measures. Replication studies designed with sample sizes computed using biased estimates will be under-powered to detect a disease-causing variant. Accounting for bias in the original study is critical to avoid discarding disease-associated SNPs at follow up.
doi:10.1186/1753-6561-5-S9-S64
PMCID: PMC3287903  PMID: 22373407
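The winner's curse described in this abstract can be seen in a small simulation: when only estimates that pass a significance threshold are carried forward, the retained estimates are biased upward relative to the true effect. The sketch below is an illustration under simplifying assumptions (a single SNP, normally distributed estimates with a known standard error, a one-sided z threshold); the parameter values are hypothetical and it does not reproduce the GAW17 two-stage analysis.

```python
import random
from statistics import mean


def simulate_winners_curse(true_effect, std_error, z_threshold, n_studies, seed=0):
    """Return (mean of all estimates, mean of selected estimates, number selected)."""
    rng = random.Random(seed)
    # Each simulated study yields an unbiased but noisy effect estimate.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Stage 1 selection: keep only estimates whose z statistic passes the threshold.
    selected = [b for b in estimates if b / std_error > z_threshold]
    selected_mean = mean(selected) if selected else None
    return mean(estimates), selected_mean, len(selected)


# Hypothetical values chosen for illustration only.
all_mean, selected_mean, n_selected = simulate_winners_curse(
    true_effect=0.10, std_error=0.05, z_threshold=3.0, n_studies=10_000
)
print(all_mean, selected_mean, n_selected)
```

A replication study sized from the selected mean rather than the true effect would be under-powered, which is the design consequence the abstract points out.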
8.  Adaptive Combination of P-Values for Family-Based Association Testing with Sequence Data 
PLoS ONE  2014;9(12):e115971.
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an ‘adaptive combination of P-values method’ (abbreviated as ‘ADA’). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
doi:10.1371/journal.pone.0115971
PMCID: PMC4277421  PMID: 25541952
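The central idea stated in this abstract is to discard variants with large per-site P-values before combining the rest. The sketch below shows only that truncation step, using a sum of negative log P-values as the combined statistic. The single fixed threshold and the Fisher-style sum are assumptions made for illustration; the abstract does not give the thresholds, weights, or the pedigree-based permutation procedure that ADA uses to assess significance.

```python
from math import log


def truncated_log_p_statistic(p_values, threshold):
    """Sum of -log(p) over variants whose P-value is at or below the threshold.

    Returns the statistic and the number of variants retained.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    kept = [p for p in p_values if 0.0 < p <= threshold]
    statistic = -sum((log(p) for p in kept), 0.0)
    return statistic, len(kept)


# Hypothetical per-site P-values for one gene.
per_site_p = [0.01, 0.50, 0.90, 0.03]
print(truncated_log_p_statistic(per_site_p, threshold=0.05))
```

Because the threshold is chosen after looking at the data in an adaptive scheme, the null distribution of the statistic is not a standard chi-square, which is why a permutation procedure is needed.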
9.  The human splicing code reveals new insights into the genetic determinants of disease 
Science (New York, N.Y.)  2014;347(6218):1254806.
Introduction
Advancing whole-genome precision medicine requires understanding how gene expression is altered by genetic variants, especially those that are outside of protein-coding regions. We developed a computational technique that scores how strongly genetic variants alter RNA splicing, a critical step in gene expression whose disruption contributes to many diseases, including cancers and neurological disorders. A genome-wide analysis reveals tens of thousands of variants that alter splicing and are enriched with a wide range of known diseases. Our results provide insight into the genetic basis of spinal muscular atrophy, hereditary nonpolyposis colorectal cancer and autism spectrum disorder.
Methods
We used machine learning to derive a computational model that takes as input DNA sequences and applies general rules to predict splicing in human tissues. Given a test variant, our model computes a score that predicts how much the variant disrupts splicing. The model was derived in such a way that it can be used to study diverse diseases and disorders, and to determine the consequences of common, rare, and even spontaneous variants.
Results
Our technique is able to accurately classify disease-causing variants and provides insights into the role of aberrant splicing in disease. We scored over 650,000 DNA variants and found that disease-causing variants have higher scores than common variants and even those associated with disease in genome-wide association studies. Our model predicts substantial and unexpected aberrant splicing due to variants within introns and exons, including those far from the splice site. For example, among intronic variants that are more than 30 nucleotides away from a splice site, known disease variants alter splicing nine times more often than common variants; among missense exonic disease variants, those that least impact protein function are over five times more likely to alter splicing than other variants.
Autism has been associated with disrupted splicing in brain regions, so we used our method to score variants detected using whole genome sequencing data from individuals with and without autism. Genes with high scoring variants include many that have been previously linked with autism, as well as new genes with known neurodevelopmental phenotypes. Most of the high scoring variants are intronic and cannot be detected by exome analysis techniques.
When we score clinical variants in spinal muscular atrophy and colorectal cancer genes, up to 94% of variants found to disrupt splicing using minigene reporters are correctly classified.
Discussion
In the context of precision medicine, causal support for variants that is independent of existing studies is greatly needed. Our computational model was trained to predict splicing from DNA sequence alone, without using disease annotations or population data. Consequently, its predictions are independent of and complementary to population data, genome-wide association studies (GWAS), expression-based quantitative trait loci (QTL), and functional annotations of the genome. As such, our technique greatly expands the opportunities for understanding the genetic determinants of disease.
doi:10.1126/science.1254806
PMCID: PMC4362528  PMID: 25525159
10.  A new statistical framework for genetic pleiotropic analysis of high dimensional phenotype data 
BMC Genomics  2016;17:881.
Background
The widely used genetic pleiotropic analyses of multiple phenotypes are often designed for examining the relationship between common variants and a few phenotypes. They are not suited for both high dimensional phenotypes and high dimensional genotype (next-generation sequencing) data.
To overcome limitations of the traditional genetic pleiotropic analysis of multiple phenotypes, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we extend the traditional multivariate SEMs to sparse functional SEMs. To deal with high dimensional phenotype and genotype data, we employ functional data analysis and the alternating direction method of multipliers (ADMM) to reduce data dimension and improve computational efficiency.
Results
Using large scale simulations we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods. Simulations also demonstrate that the gene-based pleiotropic analysis has higher power than the single variant-based pleiotropic analysis. The proposed method is applied to exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) with 11 phenotypes, which identifies a network with 137 genes connected to 11 phenotypes and 341 edges. Among them, 114 genes showed pleiotropic genetic effects and 45 genes were reported to be associated with phenotypes in the analysis or other cardiovascular disease (CVD) related phenotypes in the literature.
Conclusions
Our proposed sparse functional SEMs can incorporate both common and rare variants into the analysis and the ADMM algorithm can efficiently solve the penalized SEMs. Using this model we can jointly infer genetic architecture and causal phenotype network structure, and decompose the genetic effect into direct, indirect and total effect. Using large scale simulations we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-3169-1) contains supplementary material, which is available to authorized users.
doi:10.1186/s12864-016-3169-1
PMCID: PMC5100198  PMID: 27821073
Structural equations; Causal inference; Multiple phenotypes; Quantitative trait; Next-generation sequencing; Pleiotropic analysis
11.  The Role and Challenges of Exome Sequencing in Studies of Human Diseases 
Frontiers in Genetics  2013;4:160.
Recent advances in next-generation sequencing technologies have transformed the genetic study of human diseases; this is an era of unprecedented productivity. Exome sequencing, the targeted sequencing of the protein-coding portion of the human genome, has been shown to be a powerful and cost-effective method for detection of disease variants underlying Mendelian disorders. Increasing effort has been made in the interest of the identification of rare variants associated with complex traits in sequencing studies. Here we provide an overview of the application fields for exome sequencing in human diseases. We describe a general framework of computation and bioinformatics for handling sequencing data. We then demonstrate data quality and agreement between exome sequencing and exome microarray (chip) genotypes using data collected on the same set of subjects in a genetic study of panic disorder. Our results show that, in sequencing data, the data quality was generally higher for variants within the exonic target regions, compared to that outside the target regions, due to the target enrichment. We also compared genotype concordance for variant calls obtained by exome sequencing vs. exome genotyping microarrays. The overall consistency rate was >99.83% and the heterozygous consistency rate was >97.55%. The two platforms share a large amount of agreement over low frequency variants in the exonic regions, while exome sequencing provides much more information on variants not included on exome genotyping microarrays. The results demonstrate that exome sequencing data are of high quality and can be used to investigate the role of rare coding variants in human diseases.
doi:10.3389/fgene.2013.00160
PMCID: PMC3752524  PMID: 24032039
exome sequencing; exome arrays; Mendelian diseases; complex traits; whole-genome sequencing
12.  Incorporating Linkage Information into a Common Disease/Rare Variant Framework 
Genetic Epidemiology  2011;35(Suppl 1):S74-S79.
Recent developments in sequencing technology have allowed the investigation of the common disease/rare variant hypothesis. In the Genetic Analysis Workshop 17 data set, we have sequence data on both unrelated individuals and eight large extended pedigrees with simulated quantitative and qualitative phenotypes. Group 11, whose focus was incorporating linkage information, considered several different ways to use the extended pedigrees to identify causal genes and variants. The first issue was the use of standard linkage or identity-by-descent information to identify regions containing causal rare variants. We found that rare variants of large effect segregating through pedigrees were precisely the bailiwick of linkage analysis. For a common disease, we anticipate many risk loci, so a heterogeneity linkage analysis or an analysis of a single pedigree at a time may be useful. The second issue was using pedigree data to identify individuals for sequencing. If one can identify linked regions and even carriers of risk haplotypes, the sequencing will be substantially more efficient. In fact, sequencing only 2.5% of the genome in carefully selected individuals can detect 52% of the risk variants that would be detected through whole-exome sequencing in a large number of unrelated individuals. Finally, we found that linkage information from pedigrees can provide weights for case-control association tests. We also found that pedigree-based association tests have the same issues of binning variants and variant counting as those in tests of unrelated individuals. Clearly, when pedigrees are available, they can provide great assistance in the search for rare variants that influence common disorders.
doi:10.1002/gepi.20654
PMCID: PMC4558895  PMID: 22128063
linkage analysis; sequencing; LOD; heterogeneity LOD (HLOD); association tests
13.  Gene-based multiple trait analysis for exome sequencing data 
BMC Proceedings  2011;5(Suppl 9):S75.
The common genetic variants identified through genome-wide association studies explain only a small proportion of the genetic risk for complex diseases. The advancement of next-generation sequencing technologies has enabled the detection of rare variants that are expected to contribute significantly to the missing heritability. Some genetic association studies provide multiple correlated traits for analysis. Multiple trait analysis has the potential to improve the power to detect pleiotropic genetic variants that influence multiple traits. We propose a gene-level association test for multiple traits that accounts for correlation among the traits. Gene- or region-level testing for association involves both common and rare variants. Statistical tests for common variants may have limited power for individual rare variants because of their low frequency and multiple testing issues. To address these concerns, we use the weighted-sum pooling method to test the joint association of multiple rare and common variants within a gene. The proposed method is applied to the Genetic Analysis Workshop 17 (GAW17) simulated mini-exome data to analyze multiple traits. Because of the nature of the GAW17 simulation model, increased power was not observed for multiple-trait analysis compared to single-trait analysis. However, multiple-trait analysis did not result in a substantial loss of power because of the testing of multiple traits. We conclude that this method would be useful for identifying pleiotropic genes.
doi:10.1186/1753-6561-5-S9-S75
PMCID: PMC3287915  PMID: 22373189
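Weighted-sum pooling, as named in this abstract, replaces a gene's many variants with one score per individual in which rarer alleles count for more. The sketch below uses inverse-standard-deviation weights, 1/sqrt(q(1-q)) for allele frequency q, which is a common choice in weighted-sum tests but is an assumption here; the abstract does not state the weights used, and how q is estimated (for example, from unaffected individuals only) is left to the caller.

```python
from math import sqrt


def weighted_sum_scores(genotypes, allele_freqs):
    """Per-individual gene score: sum over variants of allele count times weight.

    genotypes    -- list of rows, one per individual, of minor-allele counts (0, 1, 2)
    allele_freqs -- minor allele frequency for each variant, strictly between 0 and 1
    """
    if any(not 0.0 < q < 1.0 for q in allele_freqs):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    if any(len(row) != len(allele_freqs) for row in genotypes):
        raise ValueError("each genotype row must have one entry per variant")
    # Rarer variants (smaller q) receive larger weights.
    weights = [1.0 / sqrt(q * (1.0 - q)) for q in allele_freqs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Hypothetical toy data: two individuals, two variants in one gene.
scores = weighted_sum_scores([[1, 0], [0, 2]], [0.5, 0.1])
print(scores)
```

The resulting scores can then be tested against one or several traits in place of the individual variants, which is the gene-level step the abstract describes.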
14.  Assessing the Power of Exome Chips 
PLoS ONE  2015;10(10):e0139642.
Genotyping chips for rare and low-frequency variants have recently gained popularity with the introduction of exome chips, but the utility of these chips remains unclear. These chips were designed using exome sequencing data from mainly American-European individuals, enriched for a narrow set of common diseases. In addition, it is well-known that the statistical power of detecting associations with rare and low-frequency variants is much lower compared to studies exclusively involving common variants. We developed a simulation program adaptable to any exome chip design to empirically evaluate the power of the exome chips. We implemented the main properties of the Illumina HumanExome BeadChip array. The simulated data sets were used to assess the power of exome chip based studies for varying effect sizes and causal variant scenarios. We applied two widely-used statistical approaches for rare and low-frequency variants, which collapse the variants into genetic regions or genes. Under optimal conditions, we found that a sample size of 20,000 to 30,000 individuals was needed in order to detect modest effect sizes (0.5% < PAR < 1%) with 80% power. For small effect sizes (PAR <0.5%), 60,000–100,000 individuals were needed in the presence of non-causal variants. In conclusion, we found that at least tens of thousands of individuals are necessary to detect modest effects under optimal conditions. In addition, when using rare variant chips on cohorts or diseases they were not originally designed for, the identification of associated variants or genes will be even more challenging.
doi:10.1371/journal.pone.0139642
PMCID: PMC4593624  PMID: 26437075
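The effect-size bands in the abstract above are stated in terms of PAR, which the abstract does not define. As a rough illustration only, the sketch below assumes PAR means the population attributable risk given by Levin's formula, with p the carrier frequency and RR the relative risk; the study's own definition may differ, and the function name is hypothetical.

```python
def levin_par(p, rr):
    """Population attributable risk by Levin's formula.

    p:  fraction of the population carrying the risk variant (assumed input)
    rr: relative risk of disease in carriers versus non-carriers
    """
    excess = p * (rr - 1.0)
    return excess / (1.0 + excess)

# A variant carried by 1% of people that doubles risk gives a PAR just
# under 1%, which would fall in the abstract's "modest" band.
par = levin_par(0.01, 2.0)
```

Under this assumed definition, a rarer variant needs a proportionally larger relative risk to reach the same PAR, which is one way to see why power drops so quickly for rare variants.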
15.  Identification of Grouped Rare and Common Variants via Penalized Logistic Regression 
Genetic Epidemiology  2013;37(6):592-602.
In spite of the success of genome-wide association studies in finding many common variants associated with disease, these variants seem to explain only a small proportion of the estimated heritability. Data collection has turned toward exome and whole genome sequencing, but it is well known that single marker methods frequently used for common variants have low power to detect rare variants associated with disease, even with very large sample sizes. In response, a variety of methods have been developed that attempt to cluster rare variants so that they may gather strength from one another under the premise that there may be multiple causal variants within a gene. Most of these methods group variants by gene or proximity, and test one gene or marker window at a time. We propose a penalized regression method (PeRC) that analyzes all genes at once, allowing grouping of all (rare and common) variants within a gene, along with subgrouping of the rare variants, thus borrowing strength from both rare and common variants within the same gene. The method can incorporate either a burden-based weighting of the rare variants or one in which the weights are data driven. In simulations, our method performs favorably when compared to many previously proposed approaches, including its predecessor, the sparse group lasso [Friedman et al., 2010].
doi:10.1002/gepi.21746
PMCID: PMC3842118  PMID: 23836590
penalized likelihood; lasso; elastic net; association analysis; rare variants
16.  GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm 
PLoS Genetics  2013;9(8):e1003657.
Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex Linkage Disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n>100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. 
This provides a powerful tool for the analysis of diverse genomic features, for instance including gene expression and exome sequencing data, where complex dependencies are present in the predictor space.
Author Summary
Nowadays, the availability of cheaper and accurate assays to quantify multiple (endo)phenotypes in large population cohorts allows multi-trait studies. However, these studies are limited by the lack of flexible models integrated with efficient computational tools for genome-wide multi SNPs-traits analyses. To overcome this problem, we propose a novel Bayesian analysis strategy and a new algorithmic implementation which exploits parallel processing architecture for fully multivariate modeling of groups of correlated phenotypes at the genome-wide scale. In addition to increased power of our algorithm over alternative Bayesian and well-established non-Bayesian multi-phenotype methods, we provide an application to a real case study of several blood lipid traits, and show how our method recovered most of the major associations and is better at refining multi-trait polygenic associations than alternative methods. We reveal and replicate in independent cohorts new associations with two phenotypic groups that were not detected by competing multivariate approaches and not noticed by a large meta-GWAS. We also discuss the applicability of the proposed method to large meta-analyses involving hundreds of thousands of individuals and to diverse genomic datasets where complex dependencies in the predictor space are present.
doi:10.1371/journal.pgen.1003657
PMCID: PMC3738451  PMID: 23950726
17.  Exome Sequencing Reveals Novel Rare Variants in the Ryanodine Receptor and Calcium Channel Genes in Malignant Hyperthermia Families 
Anesthesiology  2013;119(5):1054-1065.
Background
About half of malignant hyperthermia (MH) cases are associated with skeletal muscle ryanodine receptor 1 (RYR1) and calcium channel, voltage-dependent, L type, α1S subunit (CACNA1S) gene mutations, leaving many with an unknown cause. We chose to apply a sequencing approach to uncover causal variants in unknown cases. Sequencing the exome, the protein-coding region of the genome, has power at low sample sizes and identified the cause of over a dozen Mendelian disorders.
Methods
We considered four families with multiple MH cases but in whom no mutations in RYR1 and CACNA1S had been identified by Sanger sequencing of complementary DNA. Exome sequences of two affected individuals per family, chosen for maximum genetic distance, were compared. Variants were ranked by allele frequency, protein change, and measures of conservation among mammals to assess likelihood of causation. Finally, putative pathogenic mutations were genotyped in other family members to verify cosegregation with MH.
Results
Exome sequencing revealed 1 rare RYR1 nonsynonymous variant in each of 3 families (Asp1056His, Val2627Met, Val4234Leu), and 1 CACNA1S variant (Thr1009Lys) in a 4th family. These were not seen in variant databases or in our control population sample of 5379 exomes. Follow-up sequencing in other family members verified cosegregation of alleles with MH.
Conclusions
Using both exome sequencing and allele frequency data from large sequencing efforts may aid genetic diagnosis of MH. In our sample, it was more sensitive for variant detection in known genes than Sanger sequencing of complementary DNA, and allows for the possibility of novel gene discovery.
doi:10.1097/ALN.0b013e3182a8a998
PMCID: PMC4115638  PMID: 24013571
18.  Combining effects from rare and common genetic variants in an exome-wide association study of sequence data 
BMC Proceedings  2011;5(Suppl 9):S44.
Recent breakthroughs in next-generation sequencing technologies allow cost-effective methods for measuring a growing list of cellular properties, including DNA sequence and structural variation. Next-generation sequencing has the potential to revolutionize complex trait genetics by directly measuring common and rare genetic variants within a genome-wide context. Because for a given gene both rare and common causal variants can coexist and have independent effects on a trait, strategies that model the effects of both common and rare variants could enhance the power of identifying disease-associated genes. To date, little work has been done on integrating signals from common and rare variants into powerful statistics for finding disease genes in genome-wide association studies. In this analysis of the Genetic Analysis Workshop 17 data, we evaluate various strategies for association of rare, common, or a combination of both rare and common variants on quantitative phenotypes in unrelated individuals. We show that the analysis of common variants only using classical approaches can achieve higher power to detect causal genes than recently proposed rare variant methods and that strategies that combine association signals derived independently in rare and common variants can slightly increase the power compared to strategies that focus on the effect of either the rare variants or the common variants.
doi:10.1186/1753-6561-5-S9-S44
PMCID: PMC3287881  PMID: 22373328
19.  Quality Control Issues and the Identification of Rare Functional Variants with Next-Generation Sequencing Data 
Genetic Epidemiology  2011;35(Suppl 1):S22-S28.
Next-generation sequencing of large numbers of individuals presents challenges in data preparation, quality control, and statistical analysis because of the rarity of the variants. The Genetic Analysis Workshop 17 (GAW17) data provide an opportunity to survey existing methods and compare these methods with novel ones. Specifically, the GAW17 Group 2 contributors investigate existing and newly proposed methods and study design strategies to identify rare variants, predict functional variants, and/or examine quality control. We introduce the eight Group 2 papers, summarize their approaches, and discuss their strengths and weaknesses. For these investigations, some groups used only the genotype data, whereas others also used the simulated phenotype data. Although the eight Group 2 contributions covered a wide variety of topics under the general idea of identifying rare variants, they can be grouped into three broad categories according to their common research interests: functionality of variants and quality control issues, family-based analyses, and association analyses of unrelated individuals. The aims of the first subgroup were quite different. These were population structure analyses that used rare variants to predict functionality and examine the accuracy of genotype calls. The aims of the family-based analyses were to select which families should be sequenced and to identify high-risk pedigrees; the aim of the association analyses was to identify variants or genes with regression-based methods. However, power to detect associations was low in all three association studies. Thus this work shows opportunities for incorporating rare variants into the genetic and statistical analyses of common diseases.
doi:10.1002/gepi.20645
PMCID: PMC3268158  PMID: 22128054
1000 Genomes Project; association; collection of rare variants; family data; next-generation sequencing; regression; quality control
20.  Characterisation and Validation of Insertions and Deletions in 173 Patient Exomes 
PLoS ONE  2012;7(12):e51292.
Recent advances in genomics technologies have spurred unprecedented efforts in genome and exome re-sequencing aiming to unravel the genetic component of rare and complex disorders. While in rare disorders this allowed the identification of novel causal genes, the missing heritability paradox in complex diseases remains so far elusive. Despite rapid advances of next-generation sequencing, both the technology and the analysis of the data it produces are in their infancy. At present there is abundant knowledge pertaining to the role of rare single nucleotide variants (SNVs) in rare disorders and of common SNVs in common disorders. Although the 1000 Genomes Project has clearly highlighted the prevalence of rare variants and more complex variants (e.g. insertions, deletions), their role in disease is as yet far from elucidated.
We set out to analyse the properties of sequence variants identified in a comprehensive collection of exome re-sequencing studies performed on samples from patients affected by a broad range of complex and rare diseases (N = 173). Given the known potential for Loss of Function (LoF) variants to be false positives, we performed an extensive validation of the common, rare and private LoF variants identified, which indicated that most of the private and rare variants identified were indeed true, while common novel variants had a significantly higher false positive rate. Our results indicated a strong enrichment of very low-frequency insertion/deletion variants, so far under-investigated, which might be difficult to capture with low coverage and imputation approaches and for which most study designs would be under-powered. These insertions and deletions might play a significant role in disease genetics, contributing specifically to the underlying rare and private variation predicted to be discovered through next generation sequencing.
doi:10.1371/journal.pone.0051292
PMCID: PMC3522676  PMID: 23251486
21.  Exome Sequencing of Phenotypic Extremes Identifies CAV2 and TMC6 as Interacting Modifiers of Chronic Pseudomonas aeruginosa Infection in Cystic Fibrosis 
PLoS Genetics  2015;11(6):e1005273.
Discovery of rare or low frequency variants in exome or genome data that are associated with complex traits often will require use of very large sample sizes to achieve adequate statistical power. For a fixed sample size, sequencing of individuals sampled from the tails of a phenotype distribution (i.e., extreme phenotypes design) maximizes power and this approach was recently validated empirically with the discovery of variants in DCTN4 that influence the natural history of P. aeruginosa airway infection in persons with cystic fibrosis (CF; MIM219700). The increasing availability of large exome/genome sequence datasets that serve as proxies for population-based controls affords the opportunity to test an alternative, potentially more powerful and generalizable strategy, in which the frequency of rare variants in a single extreme phenotypic group is compared to a control group (i.e., extreme phenotype vs. control population design). As proof-of-principle, we applied this approach to search for variants associated with risk for age-of-onset of chronic P. aeruginosa airway infection among individuals with CF and identified variants in CAV2 and TMC6 that were significantly associated with group status. These results were validated using a large, prospective, longitudinal CF cohort and confirmed a significant association of a variant in CAV2 with increased age-of-onset of P. aeruginosa airway infection (hazard ratio = 0.48, 95% CI=[0.32, 0.88]) and variants in TMC6 with diminished age-of-onset of P. aeruginosa airway infection (HR = 5.4, 95% CI=[2.2, 13.5]). A strong interaction between CAV2 and TMC6 variants was observed (HR=12.1, 95% CI=[3.8, 39]) for children with the deleterious TMC6 variant and without the CAV2 protective variant. Neither gene showed a significant association using an extreme phenotypes design, and conditions for which the power of an extreme phenotype vs. control population design was greater than that for the extreme phenotypes design were explored.
Author Summary
Whole exome and whole genome sequencing provide the opportunity to test for associations between expressed traits and genetic variants that cannot be tested with chip technology, particularly variants that are too rare to be included on chips designed for genome-wide association analysis. We used exome sequencing to identify variants in CAV2 and TMC6 that modify the age-of-onset of chronic Pseudomonas aeruginosa infection among children with cystic fibrosis, and validated our findings in a large cohort of children with cystic fibrosis. For a fixed number of study participants, it is known that the extreme phenotypes design provides greater statistical power than a random sampling design. In the extreme phenotypes design, one compares the frequency of a given set of genetic variants in one extreme of age-of-onset (early onset) to that in the other extreme (late onset). Here, we employed an alternative design that compares genetic frequencies in exomes sampled from one extreme to that among exomes from a large set of controls. We show that this design confers substantially greater statistical power for discovery of CAV2 and TMC6 and provide general conditions under which this single extreme versus control design is more powerful than the extreme phenotypes design.
doi:10.1371/journal.pgen.1005273
PMCID: PMC4457883  PMID: 26047157
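The single-extreme versus control design described above comes down to comparing how often rare variants in a gene are carried in the extreme group and in the control set. The abstract does not say which test was used, so the sketch below assumes a one-sided Fisher exact test on a 2x2 carrier table; the function name and the toy counts are hypothetical.

```python
from math import comb

def fisher_exact_enrichment(a, b, c, d):
    """One-sided Fisher exact p-value for carrier enrichment.

    a: carriers in the extreme-phenotype group
    b: non-carriers in the extreme-phenotype group
    c: carriers among controls
    d: non-carriers among controls
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    group_size = a + b
    carriers = a + c
    total = a + b + c + d
    denom = comb(total, group_size)
    upper = min(group_size, carriers)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, group_size - k)
        for k in range(a, upper + 1)
    )
    return tail / denom

# Toy table: 3 of 4 extreme individuals carry a variant, 1 of 4 controls does.
p_value = fisher_exact_enrichment(3, 1, 1, 3)
```

Because the control set can be far larger than any phenotyped tail, the carrier frequency under the null is estimated more precisely, which gives an intuition for the power gain the authors report for this design.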
22.  A Novel Adaptive Method for the Analysis of Next-Generation Sequencing Data to Detect Complex Trait Associations with Rare Variants Due to Gene Main Effects and Interactions 
PLoS Genetics  2010;6(10):e1001156.
There is solid evidence that rare variants contribute to complex disease etiology. Next-generation sequencing technologies make it possible to uncover rare variants within candidate genes, exomes, and genomes. Working in a novel framework, the kernel-based adaptive cluster (KBAC) was developed to perform powerful gene/locus based rare variant association testing. The KBAC combines variant classification and association testing in a coherent framework. Covariates can also be incorporated in the analysis to control for potential confounders including age, sex, and population substructure. To evaluate the power of KBAC: 1) variant data were simulated using rigorous population genetic models for both Europeans and Africans, with parameters estimated from sequence data, and 2) phenotypes were generated using models motivated by complex diseases including breast cancer and Hirschsprung's disease. It is demonstrated that the KBAC has superior power compared to other rare variant analysis methods, such as the combined multivariate and collapsing method and the weighted sum statistic. In the presence of variant misclassification and gene interaction, association testing using KBAC is particularly advantageous. The KBAC method was also applied to test for associations, using sequence data from the Dallas Heart Study, between energy metabolism traits and rare variants in ANGPTL 3,4,5 and 6 genes. A number of novel associations were identified, including the associations of high density lipoprotein and very low density lipoprotein with ANGPTL4. The KBAC method is implemented in a user-friendly R package.
Author Summary
It has been demonstrated that both rare and common variants are involved in complex disease etiology. Until recently it was only possible to perform large scale analysis of common variants. With the development of next-generation sequencing technologies, detection and mapping of rare variants have been made possible. However, methods used to analyze common variants are not powerful for the analysis of rare variants. To address the problems of rare variant analysis working in a novel framework, the kernel-based adaptive cluster (KBAC) method was developed to perform gene/locus based analysis. The KBAC combines variant classification and association testing in a coherent framework. Through simulations motivated by population genetic and disease data, it is demonstrated that the KBAC has superior power to other rare variant analysis methods, especially in the presence of variant misclassification and gene interaction. Using data from the Dallas Heart Study, the KBAC method was applied to test for associations between energy metabolism traits and rare variants in ANGPTL 3,4,5 and 6 genes. A number of novel associations were identified. The KBAC method is implemented in a user-friendly R package.
doi:10.1371/journal.pgen.1001156
PMCID: PMC2954824  PMID: 20976247
23.  A Hybrid Likelihood Model for Sequence-Based Disease Association Studies 
PLoS Genetics  2013;9(1):e1003224.
In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitation of classical single-marker association analysis for rare variants has been a challenge in such studies. A new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is to use burden-style collapsing methods, which combine rare variant data within individuals across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets that are nominally significant at the 0.05 level, of which one MAPK signaling pathway (KEGG) reaches trend-level significance after correcting for multiple testing.
Author Summary
Inexpensive, high-throughput sequencing has transformed the field of case-control association studies. For the first time, it may be possible to identify the genetic underpinnings of complex diseases, by sequencing the DNA of hundreds (even thousands) of cases and controls and comparing patterns of DNA sequence variation. However, complex diseases are likely to be caused by many variants, some of which are very rare. Taken one at a time, the association between variant and disease phenotype may not be detectable by current statistical methods. One strategy is to identify regions where important variants occur by “collapsing” variants into groups. Here, we present a new collapsing approach, capable of detecting subtle genetic differences between cases and controls. We show, in extensive simulations and using a benchmark set of genes involved in human triglyceride levels, that the approach is potentially more powerful than existing methods. We apply the new method to an ongoing sequencing study of bipolar cases and controls and identify a set of genes found in neuronal synapses, which may be implicated in bipolar disorder.
doi:10.1371/journal.pgen.1003224
PMCID: PMC3554549  PMID: 23358228
24.  Identification of multiple rare variants associated with a disease 
BMC Proceedings  2011;5(Suppl 9):S103.
Identifying rare variants that are responsible for complex disease has been promoted by advances in sequencing technologies. However, statistical methods that can handle the vast amount of data generated and that can interpret the complicated relationship between disease and these variants have lagged. We apply a zero-inflated Poisson regression model to take into account the excess of zeros caused by the extremely low frequency of the 24,487 exonic variants in the Genetic Analysis Workshop 17 data. We grouped the 697 subjects in the data set as Europeans, Asians, and Africans based on principal components analysis and found the total number of rare variants per gene for each individual. We then analyzed these collapsed variants based on the assumption that rare variants are enriched in a group of people affected by a disease compared to a group of unaffected people. We also tested the hypothesis with quantitative traits Q1, Q2, and Q4. Analyses performed on the combined 697 individuals and on each ethnic group yielded different results. For the combined population analysis, we found that UGT1A1, which was not part of the simulation model, was associated with disease liability and that FLT1, which was a causal locus in the simulation model, was associated with Q1. Of the causal loci in the simulation models, FLT1 and KDR were associated with Q1 and VNN1 was correlated with Q2. No significant genes were associated with Q4. These results show the feasibility and capability of our new statistical model to detect multiple rare variants influencing disease risk.
doi:10.1186/1753-6561-5-S9-S103
PMCID: PMC3287826  PMID: 22373445
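The two ingredients named in the abstract above, per-gene collapsing of rare variant counts and a zero-inflated Poisson distribution for the resulting counts, can be sketched as follows. This is an illustration under assumptions: the function names, the gene labels, and the parameter values are hypothetical, and the regression step that the authors fit on top of this distribution is not shown.

```python
import math

def collapse_by_gene(genotypes, variant_gene):
    """Total rare-allele count per gene for each individual.

    genotypes:    one row per individual with allele counts per variant
    variant_gene: gene label for each variant column (hypothetical labels)
    """
    collapsed = []
    for row in genotypes:
        totals = {}
        for count, gene in zip(row, variant_gene):
            totals[gene] = totals.get(gene, 0) + count
        collapsed.append(totals)
    return collapsed

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of observing count k.

    With probability pi the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

# Toy data: two individuals, three variants in two hypothetical genes.
counts = collapse_by_gene([[1, 0, 2], [0, 0, 0]], ["GENE_A", "GENE_A", "GENE_B"])
prob_zero = zip_pmf(0, 1.0, 0.5)
```

The extra mass at zero is the reason for this choice of distribution: most individuals carry no rare allele in a given gene, so a plain Poisson model would underestimate the number of zeros.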
25.  Bayesian Detection of Causal Rare Variants under Posterior Consistency 
PLoS ONE  2013;8(7):e69633.
Identification of causal rare variants that are associated with complex traits poses a central challenge to genome-wide association studies. However, most current research focuses only on testing the global association, that is, whether the rare variants in a given genomic region are collectively associated with the trait. Although some recent work, e.g., the Bayesian risk index method, has tried to address this problem, it is unclear whether the causal rare variants can be consistently identified by it in the small-n-large-P situation. We develop a new Bayesian method, the so-called Bayesian Rare Variant Detector (BRVD), to tackle this problem. The new method simultaneously addresses two issues: (i) (Global association test) Are any of the variants associated with the disease, and (ii) (Causal variant detection) Which variants, if any, are driving the association. The BRVD ensures that the causal rare variants are consistently identified in the small-n-large-P situation by imposing appropriate prior distributions on the model and model-specific parameters. The numerical results indicate that the BRVD is more powerful for testing the global association than the existing methods, such as the combined multivariate and collapsing test, weighted sum statistic test, RARECOVER, sequence kernel association test, and Bayesian risk index, and also more powerful for identification of causal rare variants than the Bayesian risk index method. The BRVD has also been successfully applied to the Early-Onset Myocardial Infarction (EOMI) Exome Sequence Data. It identified a few causal rare variants that have been verified in the literature.
doi:10.1371/journal.pone.0069633
PMCID: PMC3724943  PMID: 23922764
