PMCC

Related Articles

1.  Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly 
PLoS Computational Biology  2014;10(5):e1003628.
The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.
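The abstract reports per-individual HSat3 array sizes on the Y chromosome but does not say how they were computed from the sequence data. One common read-depth approach, assumed here and not necessarily the authors' procedure, scales the fraction of sequencing reads assigned to an array by the total amount of DNA sampled. The function name, the read counts, and the 6.2e9 bp genome size below are hypothetical values chosen only for illustration.

```python
def estimate_array_size_mb(satellite_reads: int, total_reads: int,
                           genome_size_bp: float) -> float:
    """Estimate the length in Mb of a satellite array present once per genome.

    Assumes reads are sampled uniformly across `genome_size_bp` of DNA, so
    the fraction of reads assigned to the array approximates the fraction
    of the sampled DNA that the array occupies.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= satellite_reads <= total_reads:
        raise ValueError("satellite_reads must lie between 0 and total_reads")
    fraction = satellite_reads / total_reads
    return fraction * genome_size_bp / 1e6


# Hypothetical read counts; 6.2e9 bp is an assumed diploid male genome size,
# not a value reported in the study.
TOTAL_READS = 1_000_000_000
for sample, hits in [("sample_A", 1_200_000), ("sample_B", 14_000_000)]:
    size = estimate_array_size_mb(hits, TOTAL_READS, 6.2e9)
    print(f"{sample}: ~{size:.1f} Mb")
```

Because the estimate is linear in the read fraction, any bias in read sampling (for example GC-related coverage loss) carries straight through to the size estimate.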
Author Summary
At least 5–10% of the human genome remains unassembled, unmapped, and poorly characterized. The reference assembly annotates these missing regions as multi-megabase heterochromatic gaps, found primarily near centromeres and on the short arms of the acrocentric chromosomes. This missing fraction of the genome consists predominantly of long arrays of near-identical tandem repeats called satellite DNA. Due to the repetitive nature of satellite DNA, sequence assembly algorithms cannot uniquely align overlapping sequence reads, and thus satellite-rich domains have been omitted from the reference assembly and from most genome-wide studies of variation and function. Existing methods for analyzing some satellite DNAs cannot be easily extended to a large portion of satellites whose repeat structures are complex and largely uncharacterized, such as Human Satellites 2 and 3 (HSat2,3). Here we characterize HSat2,3 using a novel approach that does not depend on having a well-defined repeat structure. By classifying genome-wide HSat2,3 sequences into subfamilies and localizing them to chromosomes, we have generated an initial HSat2,3 genomic reference, which serves as a critical foundation for future studies of variation and function in these regions. This approach should be generally applicable to other classes of satellite DNA, in both the human genome and other complex genomes.
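To make the idea of an alignment-free approach concrete, the sketch below assigns a read to whichever reference subfamily shares the most k-mers with it, with no alignment step. This is a generic k-mer technique assumed for illustration, not the published method; the subfamily names, toy sequences, and choice of k = 5 are hypothetical. K-mers are stored in canonical form (the smaller of a k-mer and its reverse complement) so that reads from either strand match.

```python
from typing import Dict, Optional, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of `seq`, each stored as the lexicographically smaller
    of itself and its reverse complement so either strand matches."""
    seq = seq.upper()
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        result.add(min(kmer, rc))
    return result


def classify_read(read: str, subfamilies: Dict[str, Set[str]],
                  k: int) -> Optional[str]:
    """Return the subfamily sharing the most k-mers with `read`,
    or None if no subfamily shares any k-mer."""
    read_kmers = canonical_kmers(read, k)
    best_name, best_shared = None, 0
    for name, ref_kmers in subfamilies.items():
        shared = len(read_kmers & ref_kmers)
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name


# Hypothetical toy subfamilies built from short repeat sequences.
subfamilies = {
    "toy_A": canonical_kmers("GGAATGGAATGGAAT", 5),
    "toy_B": canonical_kmers("CCGGACCGGACCGGA", 5),
}
print(classify_read("GGAATGGAATC", subfamilies, 5))
print(classify_read("GATTCCATTCC", subfamilies, 5))  # reverse complement of the read above
```

A real classifier would also need a minimum-score threshold and a rule for ties between closely related subfamilies, neither of which is specified here.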
doi:10.1371/journal.pcbi.1003628
PMCID: PMC4022460  PMID: 24831296
2.  Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing 
BMC Genomics  2007;8:330.
Background
Soybean, Glycine max (L.) Merr., is a well-documented paleopolyploid. What remains relatively undercharacterized is the level of sequence identity in retained homeologous regions of the genome. Recently, the Department of Energy Joint Genome Institute and the United States Department of Agriculture jointly announced the sequencing of the soybean genome. One of the initial concerns is what effect sequence identity in homeologous regions would have on whole-genome shotgun sequence assembly.
Results
Seventeen BACs representing ~2.03 Mb were sequenced as representative potential homeologous regions of the soybean genome. Genetic mapping of each BAC shows that 11 of the 20 chromosomes are represented. Sequence comparisons between homeologous BACs show that the soybean genome is a mosaic of retained paleopolyploid regions. Some regions appear to be highly conserved, while others have diverged significantly. Large-scale "batch" reassembly of all 17 BACs combined showed that even the most similar homeologous BACs, with upwards of 95% sequence identity, resolve into their respective homeologous sequences. Potential assembly errors were caused by tandemly duplicated pentatricopeptide repeat-containing genes and long simple sequence repeats. Analysis of a whole-genome shotgun assembly of 80,000 randomly chosen JGI-DOE sequence traces reveals some new soybean-specific repeat sequences.
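The Results quote percent sequence identity between homeologous BACs. The abstract does not state which convention was used, so the sketch below assumes one common choice: identity is computed from an existing pairwise alignment, counting only columns where neither sequence has a gap. Dividing by the full alignment length instead would give a lower value whenever gaps are present. The function name is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns, skipping any column that has
    a gap ('-') in either sequence."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # one mismatch in ten columns
print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # one gap column, one mismatch
```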
Conclusion
This analysis investigated both the structure of the paleopolyploid soybean genome and the potential effects retained homeology will have on assembling the whole genome shotgun sequence. Based upon these results, homeologous regions similar to those characterized here will not cause major assembly issues.
doi:10.1186/1471-2164-8-330
PMCID: PMC2077340  PMID: 17880721
3.  Intergenic Locations of Rice Centromeric Chromatin 
PLoS Biology  2008;6(11):e286.
Centromeres are sites for assembly of the chromosomal structures that mediate faithful segregation at mitosis and meiosis. Plant and animal centromeres are typically located in megabase-sized arrays of tandem satellite repeats, making their precise mapping difficult. However, some rice centromeres are largely embedded in nonsatellite DNA, providing an excellent model to study centromere structure and evolution. We used chromatin immunoprecipitation and 454 sequencing to define the boundaries of nine of the 12 centromeres of rice. Centromere regions from chromosomes 8 and 9 were found to share synteny, most likely reflecting an ancient genome duplication. For four centromeres, we mapped discrete subdomains of binding by the centromeric histone variant CENH3. These subdomains were depleted in both intact and nonfunctional genes relative to interspersed subdomains lacking CENH3. The intergenic location of rice centromeric chromatin resembles the situation for human neocentromeres and supports a model of the evolution of centromeres from gene-poor regions.
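The abstract describes mapping discrete subdomains of CENH3 binding from ChIP data but gives no details of the calling procedure. As a simplified illustration only, and not the method used in the study, the sketch below reports runs of consecutive genomic bins whose ChIP-to-input read ratio meets a threshold. It assumes both tracks use the same fixed-width bins and have already been scaled to equal sequencing depth; the threshold, pseudocount, and bin counts are hypothetical. Dedicated peak callers model the background statistically rather than applying a fixed ratio.

```python
from typing import List, Optional, Sequence, Tuple


def enriched_domains(chip: Sequence[float], control: Sequence[float],
                     min_ratio: float = 2.0,
                     pseudocount: float = 1.0) -> List[Tuple[int, int]]:
    """Return half-open (start_bin, end_bin) runs of consecutive bins whose
    ChIP/control ratio is at least `min_ratio`."""
    if len(chip) != len(control):
        raise ValueError("chip and control must cover the same bins")
    domains: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (c, n) in enumerate(zip(chip, control)):
        # The pseudocount avoids division by zero in empty input bins.
        enriched = (c + pseudocount) / (n + pseudocount) >= min_ratio
        if enriched and start is None:
            start = i  # open a new domain
        elif not enriched and start is not None:
            domains.append((start, i))  # close the current domain
            start = None
    if start is not None:
        domains.append((start, len(chip)))  # domain runs to the last bin
    return domains


# Hypothetical binned read counts for one region.
chip_counts = [4, 5, 30, 28, 6, 40, 5]
input_counts = [5, 5, 5, 5, 5, 5, 5]
print(enriched_domains(chip_counts, input_counts))
```

With these toy counts the enriched bins form two separate runs, which is the kind of discontinuous pattern the abstract describes for CENH3.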
Author Summary
Before a cell divides, its chromosomes must be duplicated and then separated to provide each daughter cell with an identical genome copy. To accomplish this separation, the cell-division apparatus attaches to structures on the chromosomes called centromeres. Most plant and animal centromeres contain highly repetitive DNA sequences and specific proteins such as CENH3; however, it is not known which of the many repeats bind CENH3. Some rice centromeres, by contrast, consist largely of single-copy DNA, providing a tractable model for investigating CENH3-binding patterns. Using modern DNA sequencing technology and an antibody to CENH3, we were able to find which sequences in the rice genome are bound by CENH3. We uncovered evidence that one centromere, Cen8, which has lost much of its repetitive content through a rearrangement within approximately the last 5 million years, is derived from a highly repetitive centromeric region that was duplicated along with the rest of the genome 50–70 million years ago. We also found that CENH3 is bound discontinuously in centromeric subdomains that have fewer genes than subdomains lacking CENH3. These results suggest not only that centromeres evolve in gene-poor regions but also how centromeres might evolve from single-copy to repetitive sequences.
A key centromere protein is found to bind discontinuously to subdomains of centromeres that are depleted in genes, suggesting that centromeres evolve in gene-poor regions.
doi:10.1371/journal.pbio.0060286
PMCID: PMC2586382  PMID: 19067486
4.  Use of targeted SNP selection for an improved anchoring of the melon (Cucumis melo L.) scaffold genome assembly 
BMC Genomics  2015;16(1):4.
Background
The genome of the melon (Cucumis melo L.) double-haploid line DHL92 was recently sequenced, with 87.5% and 80.8% of the scaffold assembly anchored and oriented, respectively, to the 12 linkage groups. However, insufficient marker coverage and a lack of recombination left several large, gene-rich scaffolds unanchored and some anchored scaffolds unoriented. To improve the anchoring and orientation of the melon genome assembly, we used resequencing data from the parental lines of DHL92 to develop a new set of SNP markers from the unanchored scaffolds.
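The Background describes developing new SNP markers on unanchored scaffolds from parental resequencing data. A common selection rule, assumed here because the abstract does not spell out the filters, keeps only sites where the two parents are homozygous for different alleles: such sites give a heterozygous F1 and segregate 1:2:1 in an F2 population, so every F2 genotype is informative. The record layout, function names, and scaffold IDs below are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical record: (scaffold, position, parent1_genotype, parent2_genotype),
# with genotypes written as two-letter strings such as "AA" or "AG".
SiteCall = Tuple[str, int, str, str]


def is_homozygous(genotype: str) -> bool:
    """True for a diploid call whose two alleles are identical."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def select_markers(calls: Iterable[SiteCall],
                   target_scaffolds: Set[str]) -> List[Tuple[str, int]]:
    """Keep sites on target (e.g. unanchored) scaffolds where the parents are
    homozygous for different alleles."""
    selected = []
    for scaffold, pos, p1, p2 in calls:
        if scaffold not in target_scaffolds:
            continue  # only mine scaffolds that still need anchoring
        if is_homozygous(p1) and is_homozygous(p2) and p1[0] != p2[0]:
            selected.append((scaffold, pos))
    return selected


example_calls = [
    ("scaf_7", 100, "AA", "GG"),  # kept: parents homozygous and different
    ("scaf_7", 250, "AG", "GG"),  # dropped: parent 1 heterozygous
    ("scaf_9", 80, "CC", "TT"),   # kept
    ("scaf_1", 500, "TT", "CC"),  # dropped: scaffold not targeted
]
print(select_markers(example_calls, {"scaf_7", "scaf_9"}))
```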
Results
A high-resolution genetic map composed of 580 SNPs was used to anchor 354.8 Mb of sequence, contained in 141 scaffolds (average size 2.5 Mb) and corresponding to 98.2% of the scaffold assembly, to the 12 melon chromosomes. Over 325.4 Mb (90%) of the assembly was oriented. The genetic map revealed regions of segregation distortion favoring SC alleles, as well as regions of recombination suppression coinciding with putative centromere, 45S, and 5S rDNA sites. New chromosome-scale pseudomolecules were created by incorporating into the previous v3.5 version an additional 38.3 Mb of anchored sequence, contained in 55 scaffolds and representing 1,837 predicted genes. Using fluorescent in situ hybridization (FISH) with BACs that produced chromosome-specific signals, the melon chromosomes corresponding to the twelve linkage groups were identified, and a standardized karyotype of melon inbred line T111 was developed.
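Anchoring places a scaffold on a chromosome, while orienting it requires at least two markers at different genetic positions, which is why a lack of recombination can leave anchored scaffolds unoriented. The sketch below is a simplified illustration under stated assumptions, not the authors' pipeline: it anchors a scaffold to the linkage group holding most of its markers, then orients it by the sign of the covariance between physical position on the scaffold and genetic position (cM). The record layout, function name, and marker values are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical record: (linkage_group, position_on_scaffold_bp, genetic_position_cM)
Marker = Tuple[str, int, float]


def anchor_and_orient(markers: List[Marker]) -> Tuple[Optional[str], Optional[str]]:
    """Anchor a scaffold to its majority linkage group, then orient it "+" if
    genetic position rises along the scaffold or "-" if it falls.

    Orientation is None with fewer than two markers on that group, or when
    all of them share one cM position (no recombination between them).
    """
    if not markers:
        return None, None
    group = Counter(lg for lg, _, _ in markers).most_common(1)[0][0]
    on_group = [(bp, cm) for lg, bp, cm in markers if lg == group]
    if len(on_group) < 2:
        return group, None
    mean_bp = sum(bp for bp, _ in on_group) / len(on_group)
    mean_cm = sum(cm for _, cm in on_group) / len(on_group)
    # Sign of the covariance between physical and genetic position.
    cov = sum((bp - mean_bp) * (cm - mean_cm) for bp, cm in on_group)
    if cov > 0:
        return group, "+"
    if cov < 0:
        return group, "-"
    return group, None


print(anchor_and_orient([("LG3", 10_000, 12.0), ("LG3", 500_000, 13.5),
                         ("LG3", 1_200_000, 15.0)]))
print(anchor_and_orient([("LG5", 10_000, 20.0), ("LG5", 900_000, 20.0)]))
```

The second call shows an anchored but unoriented scaffold: both markers map to the same cM position, so the data cannot say which end comes first.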
Conclusions
By utilizing resequencing data and targeted SNP selection combined with a large F2 mapping population, we substantially increased the proportion of the melon scaffold genome assembly that is anchored and oriented. Combining genome information with FISH mapping provided the first cytogenetic map of an inodorus melon type. These results made it possible to draw inferences about melon chromosome structure by relating zones of recombination suppression to centromeres and to the 45S and 5S heterochromatic regions. This study represents the first steps towards integrating the high-resolution genetic and cytogenetic maps with the melon genomic sequence, which will provide more information on genome organization and allow the melon draft genome sequence to be improved.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-014-1196-3) contains supplementary material, which is available to authorized users.
doi:10.1186/s12864-014-1196-3
PMCID: PMC4316794  PMID: 25612459
Melon; SNP; Genome; Scaffold; Pseudomolecules; FISH; Karyotype
5.  Human genome meeting 2016 
Srivastava, A. K. | Wang, Y. | Huang, R. | Skinner, C. | Thompson, T. | Pollard, L. | Wood, T. | Luo, F. | Stevenson, R. | Polimanti, R. | Gelernter, J. | Lin, X. | Lim, I. Y. | Wu, Y. | Teh, A. L. | Chen, L. | Aris, I. M. | Soh, S. E. | Tint, M. T. | MacIsaac, J. L. | Yap, F. | Kwek, K. | Saw, S. M. | Kobor, M. S. | Meaney, M. J. | Godfrey, K. M. | Chong, Y. S. | Holbrook, J. D. | Lee, Y. S. | Gluckman, P. D. | Karnani, N. | Kapoor, A. | Lee, D. | Chakravarti, A. | Maercker, C. | Graf, F. | Boutros, M. | Stamoulis, G. | Santoni, F. | Makrythanasis, P. | Letourneau, A. | Guipponi, M. | Panousis, N. | Garieri, M. | Ribaux, P. | Falconnet, E. | Borel, C. | Antonarakis, S. E. | Kumar, S. | Curran, J. | Blangero, J. | Chatterjee, S. | Kapoor, A. | Akiyama, J. | Auer, D. | Berrios, C. | Pennacchio, L. | Chakravarti, A. | Donti, T. R. | Cappuccio, G. | Miller, M. | Atwal, P. | Kennedy, A. | Cardon, A. | Bacino, C. | Emrick, L. | Hertecant, J. | Baumer, F. | Porter, B. | Bainbridge, M. | Bonnen, P. | Graham, B. | Sutton, R. | Sun, Q. | Elsea, S. | Hu, Z. | Wang, P. | Zhu, Y. | Zhao, J. | Xiong, M. | Bennett, David A. | Hidalgo-Miranda, A. | Romero-Cordoba, S. | Rodriguez-Cuevas, S. | Rebollar-Vega, R. | Tagliabue, E. | Iorio, M. | D’Ippolito, E. | Baroni, S. | Kaczkowski, B. | Tanaka, Y. | Kawaji, H. | Sandelin, A. | Andersson, R. | Itoh, M. | Lassmann, T. | Hayashizaki, Y. | Carninci, P. | Forrest, A. R. R. | Semple, C. A. | Rosenthal, E. A. | Shirts, B. | Amendola, L. | Gallego, C. | Horike-Pyne, M. | Burt, A. | Robertson, P. | Beyers, P. | Nefcy, C. | Veenstra, D. | Hisama, F. | Bennett, R. | Dorschner, M. | Nickerson, D. | Smith, J. | Patterson, K. | Crosslin, D. | Nassir, R. | Zubair, N. | Harrison, T. | Peters, U. | Jarvik, G. | Menghi, F. | Inaki, K. | Woo, X. | Kumar, P. | Grzeda, K. | Malhotra, A. | Kim, H. | Ucar, D. | Shreckengast, P. | Karuturi, K. | Keck, J. | Chuang, J. | Liu, E. T. | Ji, B. | Tyler, A. | Ananda, G. | Carter, G. | Nikbakht, H. | Montagne, M. 
| Zeinieh, M. | Harutyunyan, A. | Mcconechy, M. | Jabado, N. | Lavigne, P. | Majewski, J. | Goldstein, J. B. | Overman, M. | Varadhachary, G. | Shroff, R. | Wolff, R. | Javle, M. | Futreal, A. | Fogelman, D. | Bravo, L. | Fajardo, W. | Gomez, H. | Castaneda, C. | Rolfo, C. | Pinto, J. A. | Akdemir, K. C. | Chin, L. | Futreal, A. | Patterson, S. | Statz, C. | Mockus, S. | Nikolaev, S. N. | Bonilla, X. I. | Parmentier, L. | King, B. | Bezrukov, F. | Kaya, G. | Zoete, V. | Seplyarskiy, V. | Sharpe, H. | McKee, T. | Letourneau, A. | Ribaux, P. | Popadin, K. | Basset-Seguin, N. | Chaabene, R. Ben | Santoni, F. | Andrianova, M. | Guipponi, M. | Garieri, M. | Verdan, C. | Grosdemange, K. | Sumara, O. | Eilers, M. | Aifantis, I. | Michielin, O. | de Sauvage, F. | Antonarakis, S. | Likhitrattanapisal, S. | Lincoln, S. | Kurian, A. | Desmond, A. | Yang, S. | Kobayashi, Y. | Ford, J. | Ellisen, L. | Peters, T. L. | Alvarez, K. R. | Hollingsworth, E. F. | Lopez-Terrada, D. H. | Hastie, A. | Dzakula, Z. | Pang, A. W. | Lam, E. T. | Anantharaman, T. | Saghbini, M. | Cao, H. | Gonzaga-Jauregui, C. | Ma, L. | King, A. | Rosenzweig, E. Berman | Krishnan, U. | Reid, J. G. | Overton, J. D. | Dewey, F. | Chung, W. K. | Small, K. | DeLuca, A. | Cremers, F. | Lewis, R. A. | Puech, V. | Bakall, B. | Silva-Garcia, R. | Rohrschneider, K. | Leys, M. | Shaya, F. S. | Stone, E. | Sobreira, N. L. | Schiettecatte, F. | Ling, H. | Pugh, E. | Witmer, D. | Hetrick, K. | Zhang, P. | Doheny, K. | Valle, D. | Hamosh, A. | Jhangiani, S. N. | Akdemir, Z. Coban | Bainbridge, M. N. | Charng, W. | Wiszniewski, W. | Gambin, T. | Karaca, E. | Bayram, Y. | Eldomery, M. K. | Posey, J. | Doddapaneni, H. | Hu, J. | Sutton, V. R. | Muzny, D. M. | Boerwinkle, E. A. | Valle, D. | Lupski, J. R. | Gibbs, R. A. | Shekar, S. | Salerno, W. | English, A. | Mangubat, A. | Bruestle, J. | Thorogood, A. | Knoppers, B. M. | Takahashi, H. | Nitta, K. R. | Kozhuharova, A. | Suzuki, A. M. | Sharma, H. | Cotella, D. 
| Santoro, C. | Zucchelli, S. | Gustincich, S. | Carninci, P. | Mulvihill, J. J. | Baynam, G. | Gahl, W. | Groft, S. C. | Kosaki, K. | Lasko, P. | Melegh, B. | Taruscio, D. | Ghosh, R. | Plon, S. | Scherer, S. | Qin, X. | Sanghvi, R. | Walker, K. | Chiang, T. | Muzny, D. | Wang, L. | Black, J. | Boerwinkle, E. | Weinshilboum, R. | Gibbs, R. | Karpinets, T. | Calderone, T. | Wani, K. | Yu, X. | Creasy, C. | Haymaker, C. | Forget, M. | Nanda, V. | Roszik, J. | Wargo, J. | Haydu, L. | Song, X. | Lazar, A. | Gershenwald, J. | Davies, M. | Bernatchez, C. | Zhang, J. | Futreal, A. | Woodman, S. | Chesler, E. J. | Reynolds, T. | Bubier, J. A. | Phillips, C. | Langston, M. A. | Baker, E. J. | Xiong, M. | Ma, L. | Lin, N. | Amos, C. | Lin, N. | Wang, P. | Zhu, Y. | Zhao, J. | Calhoun, V. | Xiong, M. | Dobretsberger, O. | Egger, M. | Leimgruber, F. | Sadedin, S. | Oshlack, A. | Antonio, V. A. A. | Ono, N. | Ahmed, Z. | Bolisetty, M. | Zeeshan, S. | Anguiano, E. | Ucar, D. | Sarkar, A. | Nandineni, M. R. | Zeng, C. | Shao, J. | Cao, H. | Hastie, A. | Pang, A. W. | Lam, E. T. | Liang, T. | Pham, K. | Saghbini, M. | Dzakula, Z. | Chee-Wei, Y. | Dongsheng, L. | Lai-Ping, W. | Lian, D. | Hee, R. O. Twee | Yunus, Y. | Aghakhanian, F. | Mokhtar, S. S. | Lok-Yung, C. V. | Bhak, J. | Phipps, M. | Shuhua, X. | Yik-Ying, T. | Kumar, V. | Boon-Peng, H. | Campbell, I. | Young, M. -A. | James, P. | Rain, M. | Mohammad, G. | Kukreti, R. | Pasha, Q. | Akilzhanova, A. R. | Guelly, C. | Abilova, Z. | Rakhimova, S. | Akhmetova, A. | Kairov, U. | Trajanoski, S. | Zhumadilov, Z. | Bekbossynova, M. | Schumacher, C. | Sandhu, S. | Harkins, T. | Makarov, V. | Doddapaneni, H. | Glenn, R. | Momin, Z. | Dilrukshi, B. | Chao, H. | Meng, Q. | Gudenkauf, B. | Kshitij, R. | Jayaseelan, J. | Nessner, C. | Lee, S. | Blankenberg, K. | Lewis, L. | Hu, J. | Han, Y. | Dinh, H. | Jireh, S. | Walker, K. | Boerwinkle, E. | Muzny, D. | Gibbs, R. | Hu, J. | Walker, K. | Buhay, C. | Liu, X. | Wang, Q. | Sanghvi, R. 
| Doddapaneni, H. | Ding, Y. | Veeraraghavan, N. | Yang, Y. | Boerwinkle, E. | Beaudet, A. L. | Eng, C. M. | Muzny, D. M. | Gibbs, R. A. | Worley, K. C. C. | Liu, Y. | Hughes, D. S. T. | Murali, S. C. | Harris, R. A. | English, A. C. | Qin, X. | Hampton, O. A. | Larsen, P. | Beck, C. | Han, Y. | Wang, M. | Doddapaneni, H. | Kovar, C. L. | Salerno, W. J. | Yoder, A. | Richards, S. | Rogers, J. | Lupski, J. R. | Muzny, D. M. | Gibbs, R. A. | Meng, Q. | Bainbridge, M. | Wang, M. | Doddapaneni, H. | Han, Y. | Muzny, D. | Gibbs, R. | Harris, R. A. | Raveenedran, M. | Xue, C. | Dahdouli, M. | Cox, L. | Fan, G. | Ferguson, B. | Hovarth, J. | Johnson, Z. | Kanthaswamy, S. | Kubisch, M. | Platt, M. | Smith, D. | Vallender, E. | Wiseman, R. | Liu, X. | Below, J. | Muzny, D. | Gibbs, R. | Yu, F. | Rogers, J. | Lin, J. | Zhang, Y. | Ouyang, Z. | Moore, A. | Wang, Z. | Hofmann, J. | Purdue, M. | Stolzenberg-Solomon, R. | Weinstein, S. | Albanes, D. | Liu, C. S. | Cheng, W. L. | Lin, T. T. | Lan, Q. | Rothman, N. | Berndt, S. | Chen, E. S. | Bahrami, H. | Khoshzaban, A. | Keshal, S. Heidari | Bahrami, H. | Khoshzaban, A. | Keshal, S. Heidari | Alharbi, K. K. R. | Zhalbinova, M. | Akilzhanova, A. | Rakhimova, S. | Bekbosynova, M. | Myrzakhmetova, S. | Matar, M. | Mili, N. | Molinari, R. | Ma, Y. | Guerrier, S. | Elhawary, N. | Tayeb, M. | Bogari, N. | Qotb, N. | McClymont, S. A. | Hook, P. W. | Goff, L. A. | McCallion, A. | Kong, Y. | Charette, J. R. | Hicks, W. L. | Naggert, J. K. | Zhao, L. | Nishina, P. M. | Edrees, B. M. | Athar, M. | Al-Allaf, F. A. | Taher, M. M. | Khan, W. | Bouazzaoui, A. | Harbi, N. A. | Safar, R. | Al-Edressi, H. | Anazi, A. | Altayeb, N. | Ahmed, M. A. | Alansary, K. | Abduljaleel, Z. | Kratz, A. | Beguin, P. | Poulain, S. | Kaneko, M. | Takahiko, C. | Matsunaga, A. | Kato, S. | Suzuki, A. M. | Bertin, N. | Lassmann, T. | Vigot, R. | Carninci, P. | Plessy, C. | Launey, T. | Graur, D. | Lee, D. | Kapoor, A. | Chakravarti, A. | Friis-Nielsen, J. 
| Izarzugaza, J. M. | Brunak, S. | Chakraborty, A. | Basak, J. | Mukhopadhyay, A. | Soibam, B. S. | Das, D. | Biswas, N. | Das, S. | Sarkar, S. | Maitra, A. | Panda, C. | Majumder, P. | Morsy, H. | Gaballah, A. | Samir, M. | Shamseya, M. | Mahrous, H. | Ghazal, A. | Arafat, W. | Hashish, M. | Gruber, J. J. | Jaeger, N. | Snyder, M. | Patel, K. | Bowman, S. | Davis, T. | Kraushaar, D. | Emerman, A. | Russello, S. | Henig, N. | Hendrickson, C. | Zhang, K. | Rodriguez-Dorantes, M. | Cruz-Hernandez, C. D. | Garcia-Tobilla, C. D. P. | Solorzano-Rosales, S. | Jäger, N. | Chen, J. | Haile, R. | Hitchins, M. | Brooks, J. D. | Snyder, M. | Jiménez-Morales, S. | Ramírez, M. | Nuñez, J. | Bekker, V. | Leal, Y. | Jiménez, E. | Medina, A. | Hidalgo, A. | Mejía, J. | Halytskiy, V. | Naggert, J. | Collin, G. B. | DeMauro, K. | Hanusek, R. | Nishina, P. M. | Belhassa, K. | Belhassan, K. | Bouguenouch, L. | Samri, I. | Sayel, H. | moufid, FZ. | El Bouchikhi, I. | Trhanint, S. | Hamdaoui, H. | Elotmani, I. | Khtiri, I. | Kettani, O. | Quibibo, L. | Ahagoud, M. | Abbassi, M. | Ouldim, K. | Marusin, A. V. | Kornetov, A. N. | Swarovskaya, M. | Vagaiceva, K. | Stepanov, V. | De La Paz, E. M. Cutiongco | Sy, R. | Nevado, J. | Reganit, P. | Santos, L. | Magno, J. D. | Punzalan, F. E. | Ona, D. | Llanes, E. | Santos-Cortes, R. L. | Tiongco, R. | Aherrera, J. | Abrahan, L. | Pagauitan-Alan, P. | Morelli, K. H. | Domire, J. S. | Pyne, N. | Harper, S. | Burgess, R. | Zhalbinova, M. | Akilzhanova, A. | Rakhimova, S. | Bekbosynova, M. | Myrzakhmetova, S. | Gari, M. A. | Dallol, A. | Alsehli, H. | Gari, A. | Gari, M. | Abuzenadah, A. | Thomas, M. | Sukhai, M. | Garg, S. | Misyura, M. | Zhang, T. | Schuh, A. | Stockley, T. | Kamel-Reid, S. | Sherry, S. | Xiao, C. | Slotta, D. | Rodarmer, K. | Feolo, M. | Kimelman, M. | Godynskiy, G. | O’Sullivan, C. | Yaschenko, E. | Xiao, C. | Yaschenko, E. | Sherry, S. | Rangel-Escareño, C. | Rueda-Zarate, H. | Tayubi, I. A. | Mohammed, R. | Ahmed, I. 
| Ahmed, T. | Seth, S. | Amin, S. | Song, X. | Mao, X. | Sun, H. | Verhaak, R. G. | Futreal, A. | Zhang, J. | Whiite, S. J. | Chiang, T. | English, A. | Farek, J. | Kahn, Z. | Salerno, W. | Veeraraghavan, N. | Boerwinkle, E. | Gibbs, R. | Kasukawa, T. | Lizio, M. | Harshbarger, J. | Hisashi, S. | Severin, J. | Imad, A. | Sahin, S. | Freeman, T. C. | Baillie, K. | Sandelin, A. | Carninci, P. | Forrest, A. R. R. | Kawaji, H. | Salerno, W. | English, A. | Shekar, S. N. | Mangubat, A. | Bruestle, J. | Boerwinkle, E. | Gibbs, R. A. | Salem, A. H. | Ali, M. | Ibrahim, A. | Ibrahim, M. | Barrera, H. A. | Garza, L. | Torres, J. A. | Barajas, V. | Ulloa-Aguirre, A. | Kershenobich, D. | Mortaji, Shahroj | Guizar, Pedro | Loera, Eliezer | Moreno, Karen | De León, Adriana | Monsiváis, Daniela | Gómez, Jackeline | Cardiel, Raquel | Fernandez-Lopez, J. C. | Bonifaz-Peña, V. | Rangel-Escareño, C. | Hidalgo-Miranda, A. | Contreras, A. V. | Polfus, L. | Wang, X. | Philip, V. | Carter, G. | Abuzenadah, A. A. | Gari, M. | Turki, R. | Dallol, A. | Uyar, A. | Kaygun, A. | Zaman, S. | Marquez, E. | George, J. | Ucar, D. | Hendrickson, C. L. | Emerman, A. | Kraushaar, D. | Bowman, S. | Henig, N. | Davis, T. | Russello, S. | Patel, K. | Starr, D. B. | Baird, M. | Kirkpatrick, B. | Sheets, K. | Nitsche, R. | Prieto-Lafuente, L. | Landrum, M. | Lee, J. | Rubinstein, W. | Maglott, D. | Thavanati, P. K. R. | de Dios, A. Escoto | Hernandez, R. E. Navarro | Aldrate, M. E. Aguilar | Mejia, M. R. Ruiz | Kanala, K. R. R. | Abduljaleel, Z. | Khan, W. | Al-Allaf, F. A. | Athar, M. | Taher, M. M. | Shahzad, N. | Bouazzaoui, A. | Huber, E. | Dan, A. | Al-Allaf, F. A. | Herr, W. | Sprotte, G. | Köstler, J. | Hiergeist, A. | Gessner, A. | Andreesen, R. | Holler, E. | Al-Allaf, F. | Alashwal, A. | Abduljaleel, Z. | Taher, M. | Bouazzaoui, A. | Abalkhail, H. | Al-Allaf, A. | Bamardadh, R. | Athar, M. | Filiptsova, O. | Kobets, M. | Kobets, Y. | Burlaka, I. | Timoshyna, I. | Filiptsova, O. | Kobets, M. N. 
| Kobets, Y. | Burlaka, I. | Timoshyna, I. | Filiptsova, O. | Kobets, M. N. | Kobets, Y. | Burlaka, I. | Timoshyna, I. | Al-allaf, F. A. | Mohiuddin, M. T. | Zainularifeen, A. | Mohammed, A. | Abalkhail, H. | Owaidah, T. | Bouazzaoui, A.
Human Genomics  2016;10(Suppl 1):12.
Table of contents
O1 The metabolomics approach to autism: identification of biomarkers for early detection of autism spectrum disorder
A. K. Srivastava, Y. Wang, R. Huang, C. Skinner, T. Thompson, L. Pollard, T. Wood, F. Luo, R. Stevenson
O2 Phenome-wide association study for smoking- and drinking-associated genes in 26,394 American women with African, Asian, European, and Hispanic descents
R. Polimanti, J. Gelernter
O3 Effects of prenatal environment, genotype and DNA methylation on birth weight and subsequent postnatal outcomes: findings from GUSTO, an Asian birth cohort
X. Lin, I. Y. Lim, Y. Wu, A. L. Teh, L. Chen, I. M. Aris, S. E. Soh, M. T. Tint, J. L. MacIsaac, F. Yap, K. Kwek, S. M. Saw, M. S. Kobor, M. J. Meaney, K. M. Godfrey, Y. S. Chong, J. D. Holbrook, Y. S. Lee, P. D. Gluckman, N. Karnani, GUSTO study group
O4 High-throughput identification of specific QT interval modulating enhancers at the SCN5A locus
A. Kapoor, D. Lee, A. Chakravarti
O5 Identification of extracellular matrix components inducing cancer cell migration in the supernatant of cultivated mesenchymal stem cells
C. Maercker, F. Graf, M. Boutros
O6 Single-cell allele-specific expression (ASE) in T21 and common trisomies: a novel approach to understand Down syndrome and other aneuploidies
G. Stamoulis, F. Santoni, P. Makrythanasis, A. Letourneau, M. Guipponi, N. Panousis, M. Garieri, P. Ribaux, E. Falconnet, C. Borel, S. E. Antonarakis
O7 Role of microRNA in LCL to IPSC reprogramming
S. Kumar, J. Curran, J. Blangero
O8 Multiple enhancer variants disrupt gene regulatory network in Hirschsprung disease
S. Chatterjee, A. Kapoor, J. Akiyama, D. Auer, C. Berrios, L. Pennacchio, A. Chakravarti
O9 Metabolomic profiling for the diagnosis of neurometabolic disorders
T. R. Donti, G. Cappuccio, M. Miller, P. Atwal, A. Kennedy, A. Cardon, C. Bacino, L. Emrick, J. Hertecant, F. Baumer, B. Porter, M. Bainbridge, P. Bonnen, B. Graham, R. Sutton, Q. Sun, S. Elsea
O10 A novel causal methylation network approach to Alzheimer’s disease
Z. Hu, P. Wang, Y. Zhu, J. Zhao, M. Xiong, D. A. Bennett
O11 A microRNA signature identifies subtypes of triple-negative breast cancer and reveals miR-342-3p as regulator of a lactate metabolic pathway
A. Hidalgo-Miranda, S. Romero-Cordoba, S. Rodriguez-Cuevas, R. Rebollar-Vega, E. Tagliabue, M. Iorio, E. D’Ippolito, S. Baroni
O12 Transcriptome analysis identifies genes, enhancer RNAs and repetitive elements that are recurrently deregulated across multiple cancer types
B. Kaczkowski, Y. Tanaka, H. Kawaji, A. Sandelin, R. Andersson, M. Itoh, T. Lassmann, the FANTOM5 consortium, Y. Hayashizaki, P. Carninci, A. R. R. Forrest
O13 Elevated mutation and widespread loss of constraint at regulatory and architectural binding sites across 11 tumour types
C. A. Semple
O14 Exome sequencing provides evidence of pathogenicity for genes implicated in colorectal cancer
E. A. Rosenthal, B. Shirts, L. Amendola, C. Gallego, M. Horike-Pyne, A. Burt, P. Robertson, P. Beyers, C. Nefcy, D. Veenstra, F. Hisama, R. Bennett, M. Dorschner, D. Nickerson, J. Smith, K. Patterson, D. Crosslin, R. Nassir, N. Zubair, T. Harrison, U. Peters, G. Jarvik, NHLBI GO Exome Sequencing Project
O15 The tandem duplicator phenotype as a distinct genomic configuration in cancer
F. Menghi, K. Inaki, X. Woo, P. Kumar, K. Grzeda, A. Malhotra, H. Kim, D. Ucar, P. Shreckengast, K. Karuturi, J. Keck, J. Chuang, E. T. Liu
O16 Modeling genetic interactions associated with molecular subtypes of breast cancer
B. Ji, A. Tyler, G. Ananda, G. Carter
O17 Recurrent somatic mutation in the MYC associated factor X in brain tumors
H. Nikbakht, M. Montagne, M. Zeinieh, A. Harutyunyan, M. Mcconechy, N. Jabado, P. Lavigne, J. Majewski
O18 Predictive biomarkers to metastatic pancreatic cancer treatment
J. B. Goldstein, M. Overman, G. Varadhachary, R. Shroff, R. Wolff, M. Javle, A. Futreal, D. Fogelman
O19 DDIT4 gene expression as a prognostic marker in several malignant tumors
L. Bravo, W. Fajardo, H. Gomez, C. Castaneda, C. Rolfo, J. A. Pinto
O20 Spatial organization of the genome and genomic alterations in human cancers
K. C. Akdemir, L. Chin, A. Futreal, ICGC PCAWG Structural Alterations Group
O21 Landscape of targeted therapies in solid tumors
S. Patterson, C. Statz, S. Mockus
O22 Genomic analysis reveals novel drivers and progression pathways in skin basal cell carcinoma
S. N. Nikolaev, X. I. Bonilla, L. Parmentier, B. King, F. Bezrukov, G. Kaya, V. Zoete, V. Seplyarskiy, H. Sharpe, T. McKee, A. Letourneau, P. Ribaux, K. Popadin, N. Basset-Seguin, R. Ben Chaabene, F. Santoni, M. Andrianova, M. Guipponi, M. Garieri, C. Verdan, K. Grosdemange, O. Sumara, M. Eilers, I. Aifantis, O. Michielin, F. de Sauvage, S. Antonarakis
O23 Identification of differential biomarkers of hepatocellular carcinoma and cholangiocarcinoma via transcriptome microarray meta-analysis
S. Likhitrattanapisal
O24 Clinical validity and actionability of multigene tests for hereditary cancers in a large multi-center study
S. Lincoln, A. Kurian, A. Desmond, S. Yang, Y. Kobayashi, J. Ford, L. Ellisen
O25 Correlation with tumor ploidy status is essential for correct determination of genome-wide copy number changes by SNP array
T. L. Peters, K. R. Alvarez, E. F. Hollingsworth, D. H. Lopez-Terrada
O26 Nanochannel based next-generation mapping for interrogation of clinically relevant structural variation
A. Hastie, Z. Dzakula, A. W. Pang, E. T. Lam, T. Anantharaman, M. Saghbini, H. Cao, BioNano Genomics
O27 Mutation spectrum in a pulmonary arterial hypertension (PAH) cohort and identification of associated truncating mutations in TBX4
C. Gonzaga-Jauregui, L. Ma, A. King, E. Berman Rosenzweig, U. Krishnan, J. G. Reid, J. D. Overton, F. Dewey, W. K. Chung
O28 North Carolina macular dystrophy (MCDR1): mutations found affecting PRDM13
K. Small, A. DeLuca, F. Cremers, R. A. Lewis, V. Puech, B. Bakall, R. Silva-Garcia, K. Rohrschneider, M. Leys, F. S. Shaya, E. Stone
O29 PhenoDB and GeneMatcher: solving unsolved whole exome sequencing data
N. L. Sobreira, F. Schiettecatte, H. Ling, E. Pugh, D. Witmer, K. Hetrick, P. Zhang, K. Doheny, D. Valle, A. Hamosh
O30 Baylor-Johns Hopkins Center for Mendelian genomics: a four year review
S. N. Jhangiani, Z. Coban Akdemir, M. N. Bainbridge, W. Charng, W. Wiszniewski, T. Gambin, E. Karaca, Y. Bayram, M. K. Eldomery, J. Posey, H. Doddapaneni, J. Hu, V. R. Sutton, D. M. Muzny, E. A. Boerwinkle, D. Valle, J. R. Lupski, R. A. Gibbs
O31 Using read overlap assembly to accurately identify structural genetic differences in an Ashkenazi Jewish trio
S. Shekar, W. Salerno, A. English, A. Mangubat, J. Bruestle
O32 Legal interoperability: a sine qua non for international data sharing
A. Thorogood, B. M. Knoppers, Global Alliance for Genomics and Health - Regulatory and Ethics Working Group
O33 High-throughput screening platform of competent SINEUPs that can enhance translation activities of therapeutic target
H. Takahashi, K. R. Nitta, A. Kozhuharova, A. M. Suzuki, H. Sharma, D. Cotella, C. Santoro, S. Zucchelli, S. Gustincich, P. Carninci
O34 The Undiagnosed Diseases Network International (UDNI): clinical and laboratory research to meet patient needs
J. J. Mulvihill, G. Baynam, W. Gahl, S. C. Groft, K. Kosaki, P. Lasko, B. Melegh, D. Taruscio
O36 Performance of computational algorithms in pathogenicity predictions for activating variants in oncogenes versus loss of function mutations in tumor suppressor genes
R. Ghosh, S. Plon
O37 Identification and electronic health record incorporation of clinically actionable pharmacogenomic variants using prospective targeted sequencing
S. Scherer, X. Qin, R. Sanghvi, K. Walker, T. Chiang, D. Muzny, L. Wang, J. Black, E. Boerwinkle, R. Weinshilboum, R. Gibbs
O38 Melanoma reprogramming state correlates with response to CTLA-4 blockade in metastatic melanoma
T. Karpinets, T. Calderone, K. Wani, X. Yu, C. Creasy, C. Haymaker, M. Forget, V. Nanda, J. Roszik, J. Wargo, L. Haydu, X. Song, A. Lazar, J. Gershenwald, M. Davies, C. Bernatchez, J. Zhang, A. Futreal, S. Woodman
O39 Data-driven refinement of complex disease classification from integration of heterogeneous functional genomics data in GeneWeaver
E. J. Chesler, T. Reynolds, J. A. Bubier, C. Phillips, M. A. Langston, E. J. Baker
O40 A general statistical framework for genome-based disease risk prediction
M. Xiong, L. Ma, N. Lin, C. Amos
O41 Integrative large-scale causal network analysis of imaging and genomic data and its application in schizophrenia studies
N. Lin, P. Wang, Y. Zhu, J. Zhao, V. Calhoun, M. Xiong
O42 Big data and NGS data analysis: the cloud to the rescue
O. Dobretsberger, M. Egger, F. Leimgruber
O43 Cpipe: a convergent clinical exome pipeline specialised for targeted sequencing
S. Sadedin, A. Oshlack, Melbourne Genomics Health Alliance
O44 A Bayesian classification of biomedical images using feature extraction from deep neural networks implemented on lung cancer data
V. A. A. Antonio, N. Ono, Clark Kendrick C. Go
O45 MAV-SEQ: an interactive platform for the Management, Analysis, and Visualization of sequence data
Z. Ahmed, M. Bolisetty, S. Zeeshan, E. Anguiano, D. Ucar
O47 Allele specific enhancer in EPAS1 intronic regions may contribute to high altitude adaptation of Tibetans
C. Zeng, J. Shao
O48 Nanochannel based next-generation mapping for structural variation detection and comparison in trios and populations
H. Cao, A. Hastie, A. W. Pang, E. T. Lam, T. Liang, K. Pham, M. Saghbini, Z. Dzakula
O49 Archaic introgression in indigenous populations of Malaysia revealed by whole genome sequencing
Y. Chee-Wei, L. Dongsheng, W. Lai-Ping, D. Lian, R. O. Twee Hee, Y. Yunus, F. Aghakhanian, S. S. Mokhtar, C. V. Lok-Yung, J. Bhak, M. Phipps, X. Shuhua, T. Yik-Ying, V. Kumar, H. Boon-Peng
O50 Breast and ovarian cancer prevention: is it time for population-based mutation screening of high risk genes?
I. Campbell, M.-A. Young, P. James, Lifepool
O53 Comprehensive coverage from low DNA input using novel NGS library preparation methods for WGS and WGBS
C. Schumacher, S. Sandhu, T. Harkins, V. Makarov
O54 Methods for large scale construction of robust PCR-free libraries for sequencing on Illumina HiSeqX platform
H. Doddapaneni, R. Glenn, Z. Momin, B. Dilrukshi, H. Chao, Q. Meng, B. Gudenkauf, R. Kshitij, J. Jayaseelan, C. Nessner, S. Lee, K. Blankenberg, L. Lewis, J. Hu, Y. Han, H. Dinh, S. Jireh, K. Walker, E. Boerwinkle, D. Muzny, R. Gibbs
O55 Rapid capture methods for clinical sequencing
J. Hu, K. Walker, C. Buhay, X. Liu, Q. Wang, R. Sanghvi, H. Doddapaneni, Y. Ding, N. Veeraraghavan, Y. Yang, E. Boerwinkle, A. L. Beaudet, C. M. Eng, D. M. Muzny, R. A. Gibbs
O56 A diploid personal human genome model for better genomes from diverse sequence data
K. C. C. Worley, Y. Liu, D. S. T. Hughes, S. C. Murali, R. A. Harris, A. C. English, X. Qin, O. A. Hampton, P. Larsen, C. Beck, Y. Han, M. Wang, H. Doddapaneni, C. L. Kovar, W. J. Salerno, A. Yoder, S. Richards, J. Rogers, J. R. Lupski, D. M. Muzny, R. A. Gibbs
O57 Development of PacBio long range capture for detection of pathogenic structural variants
Q. Meng, M. Bainbridge, M. Wang, H. Doddapaneni, Y. Han, D. Muzny, R. Gibbs
O58 Rhesus macaques exhibit more non-synonymous variation but greater impact of purifying selection than humans
R. A. Harris, M. Raveenedran, C. Xue, M. Dahdouli, L. Cox, G. Fan, B. Ferguson, J. Hovarth, Z. Johnson, S. Kanthaswamy, M. Kubisch, M. Platt, D. Smith, E. Vallender, R. Wiseman, X. Liu, J. Below, D. Muzny, R. Gibbs, F. Yu, J. Rogers
O59 Assessing RNA structure disruption induced by single-nucleotide variation
J. Lin, Y. Zhang, Z. Ouyang
P1 A meta-analysis of genome-wide association studies of mitochondrial DNA copy number
A. Moore, Z. Wang, J. Hofmann, M. Purdue, R. Stolzenberg-Solomon, S. Weinstein, D. Albanes, C.-S. Liu, W.-L. Cheng, T.-T. Lin, Q. Lan, N. Rothman, S. Berndt
P2 Missense polymorphic genetic combinations underlying Down syndrome susceptibility
E. S. Chen
P4 The evaluation of alteration of ELAM-1 expression in the endometriosis patients
H. Bahrami, A. Khoshzaban, S. Heidari Keshal
P5 Obesity and the incidence of apolipoprotein E polymorphisms in an assorted population from Saudi Arabia
K. K. R. Alharbi
P6 Genome-associated personalized antithrombotic therapy for patients with high risk of thrombosis and bleeding
M. Zhalbinova, A. Akilzhanova, S. Rakhimova, M. Bekbosynova, S. Myrzakhmetova
P7 Frequency of Xmn1 polymorphism among sickle cell carrier cases in UAE population
M. Matar
P8 Differentiating inflammatory bowel diseases by using genomic data: dimension of the problem and network organization
N. Mili, R. Molinari, Y. Ma, S. Guerrier
P9 Vulnerability of genetic variants to the risk of autism among Saudi children
N. Elhawary, M. Tayeb, N. Bogari, N. Qotb
P10 Chromatin profiles from ex vivo purified dopaminergic neurons establish a promising model to support studies of neurological function and dysfunction
S. A. McClymont, P. W. Hook, L. A. Goff, A. McCallion
P11 Utilization of a sensitized chemical mutagenesis screen to identify genetic modifiers of retinal dysplasia in homozygous Nr2e3rd7 mice
Y. Kong, J. R. Charette, W. L. Hicks, J. K. Naggert, L. Zhao, P. M. Nishina
P12 Ion torrent next generation sequencing of recessive polycystic kidney disease in Saudi patients
B. M. Edrees, M. Athar, F. A. Al-Allaf, M. M. Taher, W. Khan, A. Bouazzaoui, N. A. Harbi, R. Safar, H. Al-Edressi, A. Anazi, N. Altayeb, M. A. Ahmed, K. Alansary, Z. Abduljaleel
P13 Digital expression profiling of Purkinje neurons and dendrites in different subcellular compartments
A. Kratz, P. Beguin, S. Poulain, M. Kaneko, C. Takahiko, A. Matsunaga, S. Kato, A. M. Suzuki, N. Bertin, T. Lassmann, R. Vigot, P. Carninci, C. Plessy, T. Launey
P14 The evolution of imperfection and imperfection of evolution: the functional and functionless fractions of the human genome
D. Graur
P16 Species-independent identification of known and novel recurrent genomic entities in multiple cancer patients
J. Friis-Nielsen, J. M. Izarzugaza, S. Brunak
P18 Discovery of active gene modules which are densely conserved across multiple cancer types reveals their prognostic power and mutually exclusive mutation patterns
B. S. Soibam
P19 Whole exome sequencing of dysplastic leukoplakia tissue indicates sequential accumulation of somatic mutations from oral precancer to cancer
D. Das, N. Biswas, S. Das, S. Sarkar, A. Maitra, C. Panda, P. Majumder
P21 Epigenetic mechanisms of carcinogenesis by hereditary breast cancer genes
J. J. Gruber, N. Jaeger, M. Snyder
P22 RNA direct: a novel RNA enrichment strategy applied to transcripts associated with solid tumors
K. Patel, S. Bowman, T. Davis, D. Kraushaar, A. Emerman, S. Russello, N. Henig, C. Hendrickson
P23 RNA sequencing identifies gene mutations for neuroblastoma
K. Zhang
P24 Participation of SFRP1 in the modulation of TMPRSS2-ERG fusion gene in prostate cancer cell lines
M. Rodriguez-Dorantes, C. D. Cruz-Hernandez, C. D. P. Garcia-Tobilla, S. Solorzano-Rosales
P25 Targeted Methylation Sequencing of Prostate Cancer
N. Jäger, J. Chen, R. Haile, M. Hitchins, J. D. Brooks, M. Snyder
P26 Mutant TPMT alleles in children with acute lymphoblastic leukemia from México City and Yucatán, Mexico
S. Jiménez-Morales, M. Ramírez, J. Nuñez, V. Bekker, Y. Leal, E. Jiménez, A. Medina, A. Hidalgo, J. Mejía
P28 Genetic modifiers of Alström syndrome
J. Naggert, G. B. Collin, K. DeMauro, R. Hanusek, P. M. Nishina
P31 Association of genomic variants with the occurrence of angiotensin-converting-enzyme inhibitor (ACEI)-induced coughing among Filipinos
E. M. Cutiongco De La Paz, R. Sy, J. Nevado, P. Reganit, L. Santos, J. D. Magno, F. E. Punzalan, D. Ona, E. Llanes, R. L. Santos-Cortes, R. Tiongco, J. Aherrera, L. Abrahan, P. Pagauitan-Alan; Philippine Cardiogenomics Study Group
P32 The use of “humanized” mouse models to validate disease association of a de novo GARS variant and to test a novel gene therapy strategy for Charcot-Marie-Tooth disease type 2D
K. H. Morelli, J. S. Domire, N. Pyne, S. Harper, R. Burgess
P34 Molecular regulation of chondrogenic human induced pluripotent stem cells
M. A. Gari, A. Dallol, H. Alsehli, A. Gari, M. Gari, A. Abuzenadah
P35 Molecular profiling of hematologic malignancies: implementation of a variant assessment algorithm for next generation sequencing data analysis and clinical reporting
M. Thomas, M. Sukhai, S. Garg, M. Misyura, T. Zhang, A. Schuh, T. Stockley, S. Kamel-Reid
P36 Accessing genomic evidence for clinical variants at NCBI
S. Sherry, C. Xiao, D. Slotta, K. Rodarmer, M. Feolo, M. Kimelman, G. Godynskiy, C. O’Sullivan, E. Yaschenko
P37 NGS-SWIFT: a cloud-based variant analysis framework using controlled-access sequencing data from dbGaP/SRA
C. Xiao, E. Yaschenko, S. Sherry
P38 Computational assessment of drug induced hepatotoxicity through gene expression profiling
C. Rangel-Escareño, H. Rueda-Zarate
P40 Flowr: robust and efficient pipelines using a simple language-agnostic approach; ultraseq: fast modular pipeline for somatic variation calling using flowr
S. Seth, S. Amin, X. Song, X. Mao, H. Sun, R. G. Verhaak, A. Futreal, J. Zhang
P41 Applying “Big data” technologies to the rapid analysis of heterogeneous large cohort data
S. J. Whiite, T. Chiang, A. English, J. Farek, Z. Kahn, W. Salerno, N. Veeraraghavan, E. Boerwinkle, R. Gibbs
P42 FANTOM5 web resource for the large-scale genome-wide transcription start site activity profiles of wide-range of mammalian cells
T. Kasukawa, M. Lizio, J. Harshbarger, S. Hisashi, J. Severin, A. Imad, S. Sahin, T. C. Freeman, K. Baillie, A. Sandelin, P. Carninci, A. R. R. Forrest, H. Kawaji, The FANTOM Consortium
P43 Rapid and scalable typing of structural variants for disease cohorts
W. Salerno, A. English, S. N. Shekar, A. Mangubat, J. Bruestle, E. Boerwinkle, R. A. Gibbs
P44 Polymorphism of glutathione S-transferases and sulphotransferases genes in an Arab population
A. H. Salem, M. Ali, A. Ibrahim, M. Ibrahim
P46 Genetic divergence of CYP3A5*3 pharmacogenomic marker for native and admixed Mexican populations
J. C. Fernandez-Lopez, V. Bonifaz-Peña, C. Rangel-Escareño, A. Hidalgo-Miranda, A. V. Contreras
P47 Whole exome sequence meta-analysis of 13 white blood cell, red blood cell, and platelet traits
L. Polfus, CHARGE and NHLBI Exome Sequence Project Working Groups
P48 Association of the ADIPOQ gene with type 2 diabetes and related phenotypes in African American men and women: the Jackson Heart Study
S. Davis, R. Xu, S. Gebeab, P. Riestra, A. Gaye, R. Khan, J. Wilson, A. Bidulescu
P49 Common variants in the CASR gene are associated with serum calcium levels in Koreans
S. H. Jung, N. Vinayagamoorthy, S. H. Yim, Y. J. Chung
P50 Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with multiple exponential functions
Y. Zhou, S. Xu
P51 A Bayesian framework for generalized linear mixed models in genome-wide association studies
X. Wang, V. Philip, G. Carter
P52 Targeted sequencing approach for the identification of the genetic causes of hereditary hearing impairment
A. A. Abuzenadah, M. Gari, R. Turki, A. Dallol
P53 Identification of enhancer sequences by ATAC-seq open chromatin profiling
A. Uyar, A. Kaygun, S. Zaman, E. Marquez, J. George, D. Ucar
P54 Direct enrichment for the rapid preparation of targeted NGS libraries
C. L. Hendrickson, A. Emerman, D. Kraushaar, S. Bowman, N. Henig, T. Davis, S. Russello, K. Patel
P56 Performance of the Agilent D5000 and High Sensitivity D5000 ScreenTape assays for the Agilent 4200 Tapestation System
R. Nitsche, L. Prieto-Lafuente
P57 ClinVar: a multi-source archive for variant interpretation
M. Landrum, J. Lee, W. Rubinstein, D. Maglott
P59 Association of functional variants and protein physical interactions of human MUTY homolog linked with familial adenomatous polyposis and colorectal cancer syndrome
Z. Abduljaleel, W. Khan, F. A. Al-Allaf, M. Athar, M. M. Taher, N. Shahzad
P60 Modification of the microbiome constitution in the gut using chicken IgY antibodies resulted in a reduction of acute graft-versus-host disease after experimental bone marrow transplantation
A. Bouazzaoui, E. Huber, A. Dan, F. A. Al-Allaf, W. Herr, G. Sprotte, J. Köstler, A. Hiergeist, A. Gessner, R. Andreesen, E. Holler
P61 Compound heterozygous mutation in the LDLR gene in Saudi patients suffering severe hypercholesterolemia
F. Al-Allaf, A. Alashwal, Z. Abduljaleel, M. Taher, A. Bouazzaoui, H. Abalkhail, A. Al-Allaf, R. Bamardadh, M. Athar
doi:10.1186/s40246-016-0063-5
PMCID: PMC4896275  PMID: 27294413
6.  Tandemly repeated DNA families in the mouse genome 
BMC Genomics  2011;12:531.
Background
Functional and morphological studies of tandem DNA repeats, which make up a large portion of most genomes, remain limited because these genome elements are incompletely characterized. We report here a genome-wide analysis of the large tandem repeats (TR) found in the mouse genome assemblies.
Results
Using a bioinformatics approach, we identified large TR with array sizes greater than 3 kb in two mouse whole genome shotgun (WGS) assemblies. Large TR were classified based on sequence similarity, chromosome position, monomer length, array variability, and GC content; we identified four superfamilies, eight families, and 62 subfamilies, including 60 not previously described. 1) The centromeric minor satellite superfamily is found only in the unassembled part of the reference genome. 2) The pericentromeric major satellite is the most abundant superfamily and shows a higher-order repeat structure. 3) The transposable element-related superfamily contains two families. 4) The superfamily of heterogeneous tandem repeats includes four families. One family is found only in the WGS assemblies, while two families represent tandem repeats with either a single-locus or a multi-locus location. Despite its multi-locus location, TRPC-21A-MM is placed in a separate family because of its abundance, strictly pericentromeric location, and resemblance to large human satellites.
To confirm our data, we next performed in situ hybridization with three repeats from distinct families. The TRPC-21A-MM probe hybridized to chromosomes 3 and 17, the multi-locus TR-22A-MM probe hybridized to ten chromosomes, and the single-locus TR-54B-MM probe hybridized to the long loops that emerge from chromosome ends. In addition to the chromosomes predicted in silico, several extra chromosomes were positive for TR by in situ analysis, potentially indicating inaccurate assembly of the heterochromatic regions of the genome.
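Monomer length and GC content are two of the features used above to classify large TR. The sketch below shows one simple way to compute both for a single array. It assumes the array is available as an uninterrupted plain DNA string. The self-shift identity heuristic, the 0.8 identity threshold, the 500-bp period cap and the function names are illustrative assumptions, not the authors' pipeline.

```python
from typing import Optional


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def shift_identity(seq: str, period: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + period]."""
    n = len(seq) - period
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if seq[i] == seq[i + period])
    return matches / n


def estimate_monomer_length(
    seq: str, max_period: int = 500, min_identity: float = 0.8
) -> Optional[int]:
    """Smallest period whose self-shift identity reaches min_identity.

    Hypothetical heuristic: a tandem array with monomer length p matches
    itself closely when shifted by p. Returns None if no period up to
    max_period (or half the array length) qualifies.
    """
    seq = seq.upper()
    for period in range(1, min(max_period, len(seq) // 2) + 1):
        if shift_identity(seq, period) >= min_identity:
            return period
    return None


# Toy 10-bp repeat unit, not a real mouse satellite sequence.
toy_array = "ATTCGAATGG" * 50
print(estimate_monomer_length(toy_array), round(gc_content(toy_array), 2))
```

Real satellite arrays contain diverged monomers. That is why the sketch accepts a period at less than perfect identity rather than requiring exact matches.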
Conclusions
Chromosome-specific TR had been predicted for mouse, but no reliable cytogenetic probes were previously available. Our analysis identified in silico, and confirmed in situ, the chromosome 3/17-specific probe TRPC-21-MM. The new classification has thus proven to be a useful tool for further genome study, and the annotated TR can serve as a valuable source of cytogenetic probes for chromosome recognition.
doi:10.1186/1471-2164-12-531
PMCID: PMC3218096  PMID: 22035034
7.  The Mitochondrial Genome of Soybean Reveals Complex Genome Structures and Gene Evolution at Intercellular and Phylogenetic Levels 
PLoS ONE  2013;8(2):e56502.
Determining mitochondrial genomes is important for elucidating vital activities of seed plants. Mitochondrial genomes are specific to each plant species because of their variable size, complex structures and patterns of gene losses and gains during evolution. This complexity has made research on the soybean mitochondrial genome difficult compared with its nuclear and chloroplast genomes. The present study helps to solve a 30-year mystery regarding the most complex mitochondrial genome structure, showing that pairwise rearrangements among the many large repeats may produce an enriched molecular pool of 760 circles in seed plants. The soybean mitochondrial genome harbors 58 genes of known function in addition to 52 predicted open reading frames of unknown function. The genome contains sequences of multiple identifiable origins, including 6.8 kb and 7.1 kb DNA fragments that have been transferred from the nuclear and chloroplast genomes, respectively, and some horizontal DNA transfers. The soybean mitochondrial genome has lost 16 genes, including nine protein-coding genes and seven tRNA genes; however, it has acquired five chloroplast-derived genes during evolution. Four tRNA genes, common among the three genomes, are derived from the chloroplast. Sizeable DNA transfers to the nucleus, with pericentromeric regions as hotspots, are observed, including DNA transfers of 125.0 kb and 151.6 kb identified unambiguously from the soybean mitochondrial and chloroplast genomes, respectively. The soybean nuclear genome has acquired five genes from its mitochondrial genome. These results provide biological insights into the mitochondrial genome of seed plants, and are especially helpful for deciphering vital activities in soybean.
doi:10.1371/journal.pone.0056502
PMCID: PMC3576410  PMID: 23431381
8.  Development and Evaluation of SoySNP50K, a High-Density Genotyping Array for Soybean 
PLoS ONE  2013;8(1):e54985.
The objective of this research was to identify single nucleotide polymorphisms (SNPs) and to develop an Illumina Infinium BeadChip that contained over 50,000 SNPs from soybean (Glycine max L. Merr.). A total of 498,921,777 reads 35–45 bp in length were obtained from DNA sequence analysis of reduced representation libraries from several soybean accessions that included six cultivated and two wild soybean (G. soja Sieb. et Zucc.) genotypes. These reads were mapped to the soybean whole genome sequence and 209,903 SNPs were identified. After applying several filters, a total of 146,161 of the 209,903 SNPs were determined to be ideal candidates for Illumina Infinium II BeadChip design. To equalize the distance between selected SNPs, increase assay success rate, and minimize the number of SNPs with low minor allele frequency, an iteration algorithm based on a selection index was developed and used to select 60,800 SNPs for Infinium BeadChip design. Of the 60,800 SNPs, 50,701 were targeted to euchromatic regions and 10,000 to heterochromatic regions of the 20 soybean chromosomes. In addition, 99 SNPs were targeted to unanchored sequence scaffolds. Of the 60,800 SNPs, a total of 52,041 passed Illumina’s manufacturing phase to produce the SoySNP50K iSelect BeadChip. Validation of the SoySNP50K chip with 96 landrace genotypes, 96 elite cultivars and 96 wild soybean accessions showed that 47,337 SNPs were polymorphic and generated successful SNP allele calls. In addition, 40,841 of the 47,337 SNPs (86%) had minor allele frequencies ≥10% among the landraces, elite cultivars and the wild soybean accessions. A total of 620 and 42 candidate regions that may be associated with domestication and recent selection, respectively, were identified. The SoySNP50K iSelect SNP beadchip will be a powerful tool for characterizing soybean genetic diversity and linkage disequilibrium, and for constructing high resolution linkage maps to improve the soybean whole genome sequence assembly.
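The SNP selection step above balances even spacing against assay success and minor allele frequency. The abstract does not give the details of the iterative selection-index algorithm. What follows is therefore a hypothetical single-pass simplification. Each chromosome is split into equal-width windows, and the SNP with the highest index in each window is kept. The `Snp` fields, the design score and the index weights are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str
    pos: int       # physical position on the chromosome (bp)
    maf: float     # minor allele frequency in the discovery panel (0 to 0.5)
    design: float  # hypothetical assay-design score in [0, 1]


def selection_index(snp: Snp, maf_weight: float = 0.5) -> float:
    """Hypothetical index: weighted blend of design score and MAF scaled to [0, 1]."""
    return (1 - maf_weight) * snp.design + maf_weight * (2 * snp.maf)


def pick_evenly(snps: list[Snp], chrom_len: int, n_target: int) -> list[Snp]:
    """Keep at most one SNP per equal-width window; the highest index wins."""
    width = chrom_len / n_target
    best: dict[int, Snp] = {}
    for snp in snps:
        # Clamp so a SNP at the very end of the chromosome falls in the last window.
        window = min(int(snp.pos // width), n_target - 1)
        if window not in best or selection_index(snp) > selection_index(best[window]):
            best[window] = snp
    return [best[w] for w in sorted(best)]


candidates = [
    Snp("a", 100, 0.05, 0.9),
    Snp("b", 400, 0.30, 0.8),
    Snp("c", 1500, 0.20, 0.7),
]
print([s.name for s in pick_evenly(candidates, chrom_len=2000, n_target=2)])
```

In this toy run, SNP "b" beats "a" in the first window despite a lower design score, because its higher MAF raises its index. This mirrors the stated goal of minimizing low-MAF SNPs.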
doi:10.1371/journal.pone.0054985
PMCID: PMC3555945  PMID: 23372807
9.  High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence 
BMC Genomics  2010;11:38.
Background
The Soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy, but its marker density was only sufficient to properly orient 66% of the sequence scaffolds. The discovery and genetic mapping of more single nucleotide polymorphism (SNP) markers were needed to anchor and orient the remaining genome sequence. To that end, next generation sequencing and high-throughput genotyping were combined to obtain a much higher resolution genetic map that could be used to anchor and orient most of the remaining sequence and to help validate the integrity of the existing scaffold builds.
Results
A total of 7,108 to 25,047 SNPs were predicted from a reduced representation library that was subsequently sequenced by the Illumina sequencing-by-synthesis method on the clonal single molecule array platform. Across the multiple SNP prediction methods used, the validation rate of these SNPs ranged from 79% to 92.5%. A high-resolution genetic map was created from 444 recombinant inbred lines using 1,790 SNP markers. Of these 1,790 mapped markers, 1,240 had been selectively chosen to target existing unanchored or unoriented sequence scaffolds, increasing the amount of anchored sequence to 97%.
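Anchoring a scaffold requires at least one mapped marker on it, while orienting it requires at least two markers at different genetic positions. Below is a minimal sketch of that logic. It assumes each marker is given as a (physical position in bp, genetic position in cM) pair on one scaffold. Using the sign of the covariance between the two positions is an illustrative choice, not the authors' stated method.

```python
def scaffold_status(markers: list[tuple[int, float]]) -> str:
    """Classify one scaffold from its mapped markers.

    markers: (physical position on the scaffold in bp, genetic position in cM).
    """
    if not markers:
        return "unanchored"
    if len({cm for _, cm in markers}) < 2:
        # All markers co-segregate: the scaffold is placed but its direction is unknown.
        return "anchored, unoriented"
    n = len(markers)
    mean_bp = sum(bp for bp, _ in markers) / n
    mean_cm = sum(cm for _, cm in markers) / n
    # Positive covariance: cM increases along the scaffold; negative: it decreases.
    cov = sum((bp - mean_bp) * (cm - mean_cm) for bp, cm in markers)
    if cov > 0:
        return "anchored, forward"
    if cov < 0:
        return "anchored, reverse"
    return "anchored, unoriented"


print(scaffold_status([(100, 5.0), (90_000, 5.8)]))
```

This is why the targeted markers mattered. A scaffold carrying a single marker, or only co-segregating markers, can be anchored but not oriented.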
Conclusion
We have demonstrated how next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs. Those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8× whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism.
doi:10.1186/1471-2164-11-38
PMCID: PMC2817691  PMID: 20078886
10.  Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey 
BMC Genomics  2007;8:132.
Background
Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High-throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA.
Results
We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis).
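Copy number can be estimated from a survey by comparing two quantities. The first is how many survey bases align to a sequence. The second is how many would be expected for a single-copy sequence of the same length. Below is a sketch of that arithmetic, assuming alignment counts are already in hand. The ~1.1 Gb soybean genome size and the 7,090 aligned bases in the example are assumptions for illustration; only the 78 Mb survey size comes from the abstract.

```python
def survey_coverage(survey_bp: float, genome_bp: float) -> float:
    """Average fold-coverage of the genome provided by the survey reads."""
    return survey_bp / genome_bp


def estimate_copy_number(aligned_bp: float, target_len_bp: float, coverage: float) -> float:
    """Copies per genome: observed aligned bases / bases expected for one copy."""
    expected_single_copy_bp = target_len_bp * coverage
    return aligned_bp / expected_single_copy_bp


# Assumption: haploid soybean genome of ~1.1 Gb; 78 Mb of survey sequence is from the abstract.
cov = survey_coverage(78e6, 1.1e9)
print(round(cov, 3), round(estimate_copy_number(7_090, 1_000, cov)))
```

With these numbers the survey covers roughly 0.07x of the genome. A 1-kb sequence receiving about 7 kb of aligned survey bases would therefore be estimated at about 100 copies.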
Conclusion
This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.
doi:10.1186/1471-2164-8-132
PMCID: PMC1894642  PMID: 17524145
11.  An improved genome release (version Mt4.0) for the model legume Medicago truncatula 
BMC Genomics  2014;15:312.
Background
Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal of deciphering sequences from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011.
Results
Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on a de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequence spanning 390 Mb, of which ~330 Mb align perfectly with the optical map. This is a drastic improvement over the BAC-based Mt3.5, which contained only 70% (~250 Mb) of the sequence in the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidence. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0, which overlapped with ~82% of the gene loci annotated in Mt3.5. Of the remaining Mt3.5 genes, 14% have been deprecated to an “unsupported” status and 4% are absent from the Mt4.0 predictions.
Conclusions
Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site (http://www.jcvi.org/medicago). The assembly and annotation have been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics.
doi:10.1186/1471-2164-15-312
PMCID: PMC4234490  PMID: 24767513
Medicago; Legume; Genome assembly; Gene annotation; Optical map
12.  Characterization of repetitive DNA landscape in wheat homeologous group 4 chromosomes 
BMC Genomics  2015;16(1):375.
Background
The number and complexity of repetitive elements vary between species, and repeats are generally most abundant in species with larger genomes. Combining the flow-sorted chromosome arms approach to genome analysis with second generation DNA sequencing technologies provides a unique opportunity to study the repetitive portion of each chromosome, enabling comparisons among them. Additionally, different sequencing approaches may provide different depths of insight into repeatome content and structure. In this work we analyze and characterize the repetitive sequences of Triticum aestivum cv. Chinese Spring homeologous group 4 chromosome arms, obtained through Roche 454 and Illumina sequencing technologies, hereinafter marked by subscripts 454 and I, respectively.
Repetitive sequences were identified with the RepeatMasker software using the interspersed repeat database mips-REdat_v9.0p. The input sequences consisted of our 4DS_454 and 4DL_454 scaffolds and the 4AS_I, 4AL_I, 4BS_I, 4BL_I, 4DS_I and 4DL_I contigs downloaded from the International Wheat Genome Sequencing Consortium (IWGSC).
Results
Repetitive sequence content varied from 55% to 63% for all chromosome arm assemblies except 4DL_I, in which the repeat content was 38%. Transposable elements, small RNA, satellites, simple repeats and low-complexity sequences were analyzed. SSR frequency was one per 24 to 27 kb for all chromosome assemblies except 4DL_I, where it was three times higher. Dinucleotides and trinucleotides were the most abundant SSR repeat units. (GA)n/(TC)n was the most abundant SSR except in 4DL_I, where the most frequently identified SSR was (CCG/CGG)n. Retrotransposons, followed by DNA transposons, were the most highly represented repeats, mainly composed of the Gypsy and CACTA/En-Spm superfamilies, respectively. This whole-chromosome sequence analysis identified three new LTR retrotransposon families belonging to the Copia superfamily, one belonging to the Gypsy superfamily, and two TRIM retrotransposon families. Their physical distribution in the wheat genome was analyzed by fluorescent in situ hybridization (FISH), and one of them, the Carmen retrotransposon, was found to be specific to the centromeric regions of all wheat chromosomes.
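SSR frequencies such as "one per 24 to 27 kb" come from scanning assemblies for short tandem motifs and dividing sequence length by the number of hits. The regex-based scan below is a sketch that finds perfect repeats only. The minimum copy numbers, the homopolymer filter and the function names are assumptions. It is not the SSR detection tool the authors used, which the abstract does not name.

```python
import re


def find_ssrs(seq: str, motif_len: int, min_copies: int) -> list[tuple[int, str, int]]:
    """Perfect SSRs as (start, motif, copies); homopolymer runs are skipped."""
    # ([ACGT]{k}) captures a motif; \1{m-1,} requires at least m-1 further exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:  # e.g. "AA" or "AAA" is a mononucleotide run
            continue
        hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


def kb_per_ssr(seq_len_bp: int, n_ssrs: int) -> float:
    """Average spacing, expressed as one SSR per this many kb."""
    return seq_len_bp / 1000 / n_ssrs


toy = "GGG" + "GA" * 10 + "TTTT" + "CCG" * 6 + "AAAA"
print(find_ssrs(toy, 2, 6), find_ssrs(toy, 3, 4))
```

For example, 2 SSRs found in 50 kb of sequence correspond to one SSR per 25 kb.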
Conclusion
This work is the first in-depth report of wheat repetitive sequences analyzed at the chromosome arm level, providing the first insight into the repeatome of T. aestivum chromosomes of homeologous group 4.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1579-0) contains supplementary material, which is available to authorized users.
doi:10.1186/s12864-015-1579-0
PMCID: PMC4440537  PMID: 25962417
13.  Evaluation of genetic variation among Brazilian soybean cultivars through genome resequencing 
BMC Genomics  2016;17:110.
Background
Soybean [Glycine max (L.) Merrill] is one of the most important legumes cultivated worldwide, and Brazil is one of the main producers of this crop. Since the sequencing of its reference genome, interest in structural and allelic variations of cultivated and wild soybean germplasm has grown. To investigate the genetics of the Brazilian soybean germplasm, we selected soybean cultivars based on the year of commercialization, geographical region and maturity group and resequenced their genomes.
Results
We resequenced the genomes of 28 Brazilian soybean cultivars with an average genome coverage of 14.8X. A total of 5,835,185 single nucleotide polymorphisms (SNPs) and 1,329,844 InDels were identified across the 20 soybean chromosomes. Of these variants, 541,762 SNPs and 98,922 InDels were exclusive to the 28 Brazilian cultivars, as were 1,093 copy-number variations (CNVs). In addition, 668 allelic variations of 327 genes were shared among all of the Brazilian cultivars, including genes related to DNA-dependent transcription elongation, photosynthesis, ATP synthesis-coupled electron transport, cellular respiration, and generation of precursor metabolites and energy. The Brazilian soybean germplasm also showed a very homogeneous structure, and we observed 41 regions putatively influenced by positive selection. Finally, we detected 3,880 regions with CNVs that could help to explain the divergence among the accessions evaluated.
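Once each variant is keyed by chromosome, position and alleles, two counts reduce to set operations: variants exclusive to one panel, and variants shared by every cultivar in it. Below is a minimal sketch, assuming variants are already called and normalized. The comparison sets and the toy variants are hypothetical.

```python
# A variant keyed as (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def exclusive_variants(panel: set[Variant], others: list[set[Variant]]) -> set[Variant]:
    """Variants present in `panel` but absent from every comparison set."""
    return panel - set().union(*others)


def shared_by_all(per_sample: list[set[Variant]]) -> set[Variant]:
    """Variants carried by every sample; empty if no samples are given."""
    if not per_sample:
        return set()
    return set.intersection(*per_sample)


# Toy data only: two cultivars and one hypothetical external catalog.
cultivar_1 = {("Gm01", 1200, "A", "G"), ("Gm05", 880, "C", "T")}
cultivar_2 = {("Gm01", 1200, "A", "G"), ("Gm09", 42, "G", "A")}
external = {("Gm05", 880, "C", "T")}
print(exclusive_variants(cultivar_1 | cultivar_2, [external]))
print(shared_by_all([cultivar_1, cultivar_2]))
```

Exact-key matching like this depends on variants being normalized consistently, for example with InDels left-aligned, across all call sets.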
Conclusions
The large number of allelic and structural variations identified in this study can be used in marker-assisted selection programs to detect unique SNPs for cultivar fingerprinting. The results presented here suggest that, despite the diversification of modern Brazilian cultivars, the soybean germplasm remains very narrow, as shown by the large number of genome regions that exhibit low diversity. These results emphasize the need to introduce new alleles to increase the genetic diversity of the Brazilian germplasm.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-2431-x) contains supplementary material, which is available to authorized users.
doi:10.1186/s12864-016-2431-x
PMCID: PMC4752768  PMID: 26872939
Glycine max; Allelic variation; Genetic diversity; Positive selection; CNV
14.  Epigenetic Remodeling of Meiotic Crossover Frequency in Arabidopsis thaliana DNA Methyltransferase Mutants 
PLoS Genetics  2012;8(8):e1002844.
Meiosis is a specialized eukaryotic cell division that generates haploid gametes required for sexual reproduction. During meiosis, homologous chromosomes pair and undergo reciprocal genetic exchange, termed crossover (CO). Meiotic CO frequency varies along the physical length of chromosomes and is determined by hierarchical mechanisms, including epigenetic organization, for example methylation of the DNA and histones. Here we investigate the role of DNA methylation in determining patterns of CO frequency along Arabidopsis thaliana chromosomes. In A. thaliana the pericentromeric regions are repetitive, densely DNA methylated, and suppressed for both RNA polymerase-II transcription and CO frequency. DNA hypomethylated methyltransferase1 (met1) mutants show transcriptional reactivation of repetitive sequences in the pericentromeres, which we demonstrate is coupled to extensive remodeling of CO frequency. We observe elevated centromere-proximal COs in met1, coincident with pericentromeric decreases and distal increases. Importantly, total numbers of CO events are similar between wild type and met1, suggesting a role for interference and homeostasis in CO remodeling. To understand recombination distributions at a finer scale we generated CO frequency maps close to the telomere of chromosome 3 in wild type and demonstrate an elevated recombination topology in met1. Using a pollen-typing strategy we have identified an intergenic nucleosome-free CO hotspot 3a, and we demonstrate that it undergoes increased recombination activity in met1. We hypothesize that modulation of 3a activity is caused by CO remodeling driven by elevated centromeric COs. These data demonstrate how regional epigenetic organization can pattern recombination frequency along eukaryotic chromosomes.
Author Summary
The majority of eukaryotes reproduce via a specialized cell division called meiosis, which generates gametes with half the number of chromosomes. During meiosis, homologous chromosomes pair and undergo a process of reciprocal exchange, called crossing-over (CO), which generates new combinations of genetic variation. The relative chance of a CO occurring is variable along the chromosome, for example COs are suppressed in the centromeric regions that attach to the spindle during chromosome segregation. These patterns correlate with domains of epigenetic organization along chromosomes, including methylation of the DNA and histones. DNA methylation occurs most densely in the centromeric regions of Arabidopsis thaliana chromosomes, where it is required for transcriptional suppression of repeated sequences. We demonstrate that mutants that lose DNA methylation (met1) show epigenetic remodeling of crossover frequencies, with increases in the centromeric regions and compensatory changes in the chromosome arms, though the total number of crossovers remains the same. As crossover numbers and distributions are subject to homeostatic mechanisms, we propose that these drive crossover remodeling in met1 in response to epigenetic change in the centromeric regions. Together these data demonstrate how domains of epigenetic organization are important for shaping patterns of crossover frequency along eukaryotic chromosomes.
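Crossover frequency maps of this kind are commonly expressed in centimorgans per megabase. Below is a sketch of the conversion from a recombinant count over an interval. It assumes the simple approximation cM = 100 × recombinant fraction, with no mapping-function correction, so it is only reasonable for short intervals. The function name is hypothetical, and the example counts are invented for illustration, not data from the study.

```python
def cm_per_mb(recombinants: int, total: int, interval_bp: int) -> float:
    """Crossover rate in cM/Mb from a recombinant count over one interval.

    Uses cM = 100 * recombinant fraction, which ignores map-function
    corrections and is only reasonable for short intervals.
    """
    if total <= 0 or interval_bp <= 0:
        raise ValueError("total and interval_bp must be positive")
    centimorgans = 100.0 * recombinants / total
    return centimorgans / (interval_bp / 1_000_000)


# Invented example: 50 recombinants among 10,000 scored molecules over 100 kb.
print(cm_per_mb(50, 10_000, 100_000))
```

Comparing such per-interval rates between wild type and met1 would show the kind of local increase and decrease described above. The total crossover number can stay constant at the same time.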
doi:10.1371/journal.pgen.1002844
PMCID: PMC3410864  PMID: 22876192
15.  BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes 
Plant Biotechnology Journal  2016;14(7):1523-1531.
Summary
The assembly of a reference genome sequence of bread wheat is challenging due to specific features such as its 17-Gbp genome size, its polyploid nature and the prevalence of repetitive sequences. BAC‐by‐BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high‐resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high‐resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules obtained by our approach facilitated chromosome‐scale analysis of repetitive sequences and revealed a ~800‐kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone‐by‐clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC‐contig physical map and validate sequence assembly on a chromosome‐arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome‐by‐chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules.
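The 7DS map above is summarized by its contig N50 (1.3 Mb). N50 is the length L such that contigs at least L long together cover at least half of the total map length. A minimal Python sketch of that definition follows; the function name and the contig lengths are hypothetical, chosen only for illustration and not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembled length."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in kb (illustrative only, not data from the study).
contigs = [2000, 1500, 1300, 800, 400]
print(n50(contigs))  # 1500
```

Comparing `2 * running` with `total` keeps the half-length test exact for integer lengths.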
doi:10.1111/pbi.12513
PMCID: PMC5066648  PMID: 26801360
optical mapping; wheat; sequencing; physical map; flow sorting; chromosomes
16.  Nucleosomes Shape DNA Polymorphism and Divergence 
PLoS Genetics  2014;10(7):e1004457.
An estimated 80% of genomic DNA in eukaryotes is packaged as nucleosomes, which, together with the remaining interstitial linker regions, generate higher order chromatin structures [1]. Nucleosome sequences isolated from diverse organisms exhibit ∼10 bp periodic variations in AA, TT and GC dinucleotide frequencies. These sequence elements generate intrinsically curved DNA and help establish the histone-DNA interface. We investigated an important unanswered question concerning the interplay between chromatin organization and genome evolution: do the DNA sequence preferences inherent to the highly conserved histone core exert detectable natural selection on genomic divergence and polymorphism? To address this question, we isolated nucleosomal DNA sequences from Drosophila melanogaster embryos and examined the underlying genomic variation within and between species. We found that divergence along the D. melanogaster lineage is periodic across nucleosome regions with base changes following preferred nucleotides, providing new evidence for systematic evolutionary forces in the generation and maintenance of nucleosome-associated dinucleotide periodicities. Further, Single Nucleotide Polymorphism (SNP) frequency spectra show striking periodicities across nucleosomal regions, paralleling divergence patterns. Preferred alleles occur at higher frequencies in natural populations, consistent with a central role for natural selection. These patterns are stronger for nucleosomes in introns than in intergenic regions, suggesting selection is stronger in transcribed regions where nucleosomes undergo more displacement, remodeling and functional modification. In addition, we observe a large-scale (∼180 bp) periodic enrichment of AA/TT dinucleotides associated with nucleosome occupancy, while GC dinucleotide frequency peaks in linker regions.
Divergence and polymorphism data also support a role for natural selection in the generation and maintenance of these super-nucleosomal patterns. Our results demonstrate that nucleosome-associated sequence periodicities are under selective pressure, implying that structural interactions between nucleosomes and DNA sequence shape sequence evolution, particularly in introns.
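Positional dinucleotide frequencies like those discussed above can be tabulated by aligning nucleosomal sequences on a common coordinate (for example, the nucleosome dyad) and counting, at each offset, how many sequences carry a target dinucleotide. The sketch below assumes the sequences are already aligned and trimmed to equal length. The function name and the toy input are hypothetical, and this is not the authors' pipeline.

```python
def dinucleotide_profile(seqs, targets=("AA", "TT")):
    """For aligned, equal-length sequences, return the fraction of sequences
    carrying any target dinucleotide that starts at each 0-based position."""
    if not seqs:
        raise ValueError("no sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    wanted = {t.upper() for t in targets}
    profile = []
    for i in range(length - 1):
        hits = sum(1 for s in seqs if s[i:i + 2].upper() in wanted)
        profile.append(hits / len(seqs))
    return profile


# Toy input: one sequence with an AA step every 10 bp, one with none.
toy = ["AAGCGCGCGC" * 3, "GC" * 15]
profile = dinucleotide_profile(toy)
print([i for i, f in enumerate(profile) if f > 0])  # [0, 10, 20]
```

In real data the profile would be noisy, and a ∼10 bp signal would typically be assessed with a periodicity measure (for example, a Fourier or autocorrelation analysis) rather than by listing nonzero positions.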
Author Summary
In eukaryotic cells, the majority of DNA is packaged in nucleosomes composed of ∼147 bp of DNA wound tightly around the highly conserved histone octamer. Nucleosomal DNA from diverse organisms shows an anti-correlated ∼10 bp periodicity of AT-rich and GC-rich dinucleotides. These sequence features influence DNA bending and shape, facilitating structural interactions. We asked whether natural selection mediated through the periodic sequence preferences of nucleosomes shapes the evolution of non-protein-coding regions of D. melanogaster by examining the inter- and intra-species genomic variation relative to these fundamental chromatin building blocks. The sequence changes across nucleosome-bound regions on the melanogaster lineage mirror the observed nucleosome dinucleotide periodicities. Importantly, we show that the frequencies of polymorphisms in natural populations vary across these regions, paralleling divergence, with higher frequencies of preferred alleles. These patterns are most evident for intronic regions and indicate that non-protein-coding regions are evolving toward sequences that facilitate the canonical association with the histone core. This result is consistent with the hypothesis that interactions between DNA and the core have systematic impacts on function that are subject to natural selection and are not solely due to mutational bias. These ubiquitous interactions with the histone core partially account for the evolutionary constraint observed in unannotated genomic regions, and may drive broad changes in base composition.
doi:10.1371/journal.pgen.1004457
PMCID: PMC4081404  PMID: 24991813
17.  Histone Modifications within the Human X Centromere Region 
PLoS ONE  2009;4(8):e6602.
Human centromeres are multi-megabase regions of highly ordered arrays of alpha satellite DNA that are separated from chromosome arms by unordered alpha satellite monomers and other repetitive elements. Complexities in assembling such large repetitive regions have limited detailed studies of centromeric chromatin organization. However, a genomic map of the human X centromere has provided new opportunities to explore the genomic architecture of a complex locus. We used ChIP to examine the distribution of modified histones within centromere regions of multiple X chromosomes. Methylation of H3 at lysine 4 coincided with DXZ1 higher order alpha satellite, the site of CENP-A localization. Heterochromatic histone modifications were distributed across the 400–500 kb pericentromeric regions. The large arrays of alpha satellite and gamma satellite DNA were enriched for both euchromatic and heterochromatic modifications, implying that some pericentromeric repeats have multiple chromatin characteristics. Partial truncation of the X centromere resulted in a reduction in the size of the CENP-A/Cenp-A domain and increased heterochromatic modifications in the flanking pericentromere. Although the deletion removed ∼1/3 of centromeric DNA, the ratio of CENP-A to alpha satellite array size was maintained in the same proportion, suggesting that a limited, but defined linear region of the centromeric DNA is necessary for kinetochore assembly. Our results indicate that the human X centromere contains multiple types of chromatin, is organized similarly to smaller eukaryotic centromeres, and responds to structural changes by expanding or contracting domains.
doi:10.1371/journal.pone.0006602
PMCID: PMC2719913  PMID: 19672304
18.  Structure and Function of Centromeric and Pericentromeric Heterochromatin in Arabidopsis thaliana 
The centromere is a specific chromosomal region where the kinetochore assembles to ensure the faithful segregation of sister chromatids during mitosis and meiosis. Centromeres are defined by a local enrichment of the specific histone variant CenH3, mostly at repetitive satellite sequences. The centromere is surrounded by a larger pericentromeric region that contains repetitive sequences and transposable elements, adopts a particular chromatin state characterized by specific histone variants and post-translational modifications, and forms a transcriptionally repressive chromosomal environment. In the model organism Arabidopsis thaliana, centromeric and pericentromeric domains form conspicuous heterochromatin clusters called chromocenters in interphase. Here we discuss, using Arabidopsis as an example, recent insight into mechanisms involved in the maintenance and establishment of centromeric and pericentromeric chromatin signatures, as well as in chromocenter formation.
doi:10.3389/fpls.2015.01049
PMCID: PMC4663263  PMID: 26648952
centromere; chromocenter; histone variants; 3D nucleus; lamina; nuclear envelope
19.  Whole genome co-expression analysis of soybean cytochrome P450 genes identifies nodulation-specific P450 monooxygenases 
BMC Plant Biology  2010;10:243.
Background
Cytochrome P450 monooxygenases (P450s) catalyze oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze soybean putative P450s at a genome scale. Co-expression analysis using an available soybean microarray and Illumina sequencing data provides clues for functional annotation of these enzymes. This approach is based on the assumption that genes that have similar expression patterns across a set of conditions may have a functional relationship.
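The co-expression assumption stated above is often put into practice by correlating expression profiles across conditions. The sketch below uses Pearson correlation and a fixed cutoff. Both are assumptions for illustration, since this summary does not specify the measure or threshold used, and the gene names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant vector has undefined correlation")
    return cov / (sx * sy)


def coexpressed_with(query, expression, threshold=0.8):
    """Genes whose profiles correlate with the query gene at or above threshold
    (the 0.8 default is a hypothetical cutoff)."""
    q = expression[query]
    return sorted(
        gene for gene, profile in expression.items()
        if gene != query and pearson(q, profile) >= threshold
    )


# Hypothetical expression values across four conditions (illustrative only).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(coexpressed_with("geneA", expression))  # ['geneB']
```

Here geneC is perfectly anti-correlated with geneA (r = −1), so a signed cutoff excludes it. Whether anti-correlated genes should count as "co-expressed" is a separate analytical choice.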
Results
We have identified a total of 332 full-length P450 genes and 378 pseudogenes from the soybean genome. Of the full-length sequences, 195 genes belong to the A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A-type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress-responsive soybean P450s were retrieved from 99 microarray soybean libraries. We also utilized Illumina transcriptome sequencing technology to analyze the expression of all 332 soybean P450 genes. This dataset contains total RNAs isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed, and the expression of a representative set of genes was confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays. First we confirmed that CYP93C5 (an isoflavone synthase gene) is co-expressed with several genes encoding isoflavonoid-related metabolic enzymes. We then focused on nodulation-induced P450s and found that CYP728H1 was co-expressed with the genes involved in phenylpropanoid metabolism. Similarly, CYP736A34 was highly co-expressed with lipoxygenase, lectin and CYP83D1, all of which are involved in root and nodule development.
Conclusions
The genome scale analysis of P450s in soybean reveals many unique features of these important enzymes in this crop although the functions of most of them are largely unknown. Gene co-expression analysis proves to be a useful tool to infer the function of uncharacterized genes. Our work presented here could provide important leads toward functional genomics studies of soybean P450s and their regulatory network through the integration of reverse genetics, biochemistry, and metabolic profiling tools. The identification of nodule-specific P450s and their further exploitation may help us to better understand the intriguing process of soybean and rhizobium interaction.
doi:10.1186/1471-2229-10-243
PMCID: PMC3095325  PMID: 21062474
20.  Primary analysis of repeat elements of the Asian seabass (Lates calcarifer) transcriptome and genome 
Frontiers in Genetics  2014;5:223.
As part of our Asian seabass genome project, we are generating an inventory of repeat elements in the genome and transcriptome. The karyotype showed a diploid number of 2n = 24 chromosomes with a variable number of B-chromosomes. The transcriptome and genome of Asian seabass were searched for repetitive elements with experimental and bioinformatics tools. Six different types of repeats constituting 8–14% of the genome were characterized. Repetitive elements were clustered in the pericentromeric heterochromatin of all chromosomes, but some of them preferentially accumulated in pretelomeric and pericentromeric regions of several chromosome pairs and had chromosome-specific arrangements. From the dispersed class of fish-specific non-LTR retrotransposons, Rex1 and MAUI-like repeats were analyzed. They were widespread in both the genome and the transcriptome and accumulated in the pericentromeric and peritelomeric areas of all chromosomes. Every analyzed repeat was represented in the Asian seabass transcriptome, and some showed differential expression between the gonads. The other group of repeats analyzed belongs to the rRNA multigene family. The FISH signal for 5S rDNA was located on a single pair of chromosomes, whereas that for 18S rDNA was found on two pairs. A BAC-derived contig containing rDNA was sequenced and assembled into a scaffold containing incomplete fragments of 18S rDNA. Its assembly and chromosomal position revealed that this part of the Asian seabass genome is extremely rich in repeats containing evolutionarily conserved and novel sequences. In summary, transcriptome assemblies and cDNA data are suitable for the identification of repetitive DNA from unknown genomes and for comparative investigation of conserved elements between teleosts and other vertebrates.
doi:10.3389/fgene.2014.00223
PMCID: PMC4110674  PMID: 25120555
repeated DNA; Asian seabass; transcriptome; rDNA; chromosomes
21.  A Single Molecule Scaffold for the Maize Genome 
PLoS Genetics  2009;5(11):e1000711.
About 85% of the maize genome consists of highly repetitive sequences that are interspersed by low-copy, gene-coding sequences. The maize community has dealt with this genomic complexity by the construction of an integrated genetic and physical map (iMap), but this resource alone was not sufficient for ensuring the quality of the current sequence build. For this purpose, we constructed a genome-wide, high-resolution optical map of the maize inbred line B73 genome containing >91,000 restriction sites (averaging 1 site/∼23 kb) accrued from mapping genomic DNA molecules. Our optical map comprises 66 contigs, averaging 31.88 Mb in size and spanning 91.5% (2,103.93 Mb/∼2,300 Mb) of the maize genome. A new algorithm was created that considered both optical map and unfinished BAC sequence data for placing 60/66 (2,032.42 Mb) optical map contigs onto the maize iMap. The alignment of optical maps against numerous data sources yielded comprehensive results that proved revealing and productive. For example, gaps were uncovered and characterized within the iMap, the FPC (fingerprinted contigs) map, and the chromosome-wide pseudomolecules. Such alignments also suggested amended placements of FPC contigs on the maize genetic map and proactively guided the assembly of chromosome-wide pseudomolecules, especially within complex genomic regions. Lastly, we think that the full integration of B73 optical maps with the maize iMap would greatly facilitate maize sequence finishing efforts that would make it a valuable reference for comparative studies among cereals, or other maize inbred lines and cultivars.
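The summary figures reported above are mutually consistent. The check below uses the stated 2,103.93 Mb spanned, the approximate 2,300 Mb genome size, the 66 contigs, and roughly 91,000 restriction sites. Because the site count is given only as a lower bound (>91,000), the computed spacing is an approximate upper value.

```latex
\begin{aligned}
\frac{2{,}103.93\ \text{Mb}}{2{,}300\ \text{Mb}} &\approx 0.915 = 91.5\% \ \text{of the genome spanned}\\[4pt]
\frac{2{,}103.93\ \text{Mb}}{66\ \text{contigs}} &\approx 31.88\ \text{Mb per contig}\\[4pt]
\frac{2{,}103{,}930\ \text{kb}}{91{,}000\ \text{sites}} &\approx 23.1\ \text{kb per restriction site}
\end{aligned}
```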
Author Summary
The maize genome contains abundant repeats interspersed by low-copy, gene-coding sequences that make it a challenge to sequence; consequently, current BAC sequence assemblies average 11 contigs per clone. The iMap deals with such complexity by the judicious integration of IBM genetic and B73 physical maps, but the B73 genome structure could differ from the IBM population because of genetic recombination and subsequent rearrangements. Accordingly, we report a genome-wide, high-resolution optical map of the maize B73 genome that was constructed from the direct analysis of genomic DNA molecules without using genetic markers. The integration of optical and iMap resources with comparisons to FPC maps enabled a uniquely comprehensive and scalable assessment of a given BAC's sequence assembly, its placement within an FPC contig, and the location of this FPC contig within a chromosome-wide pseudomolecule. As such, the overall utility of the maize optical map for the validation of sequence assemblies has been significant and demonstrates the inherent advantages of single molecule platforms. Construction of the maize optical map represents the first physical map of a eukaryotic genome larger than 400 Mb that was created de novo from individual genomic DNA molecules.
doi:10.1371/journal.pgen.1000711
PMCID: PMC2774507  PMID: 19936062
22.  Evolutionary and comparative analyses of the soybean genome 
Breeding Science  2012;61(5):437-444.
The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods.
doi:10.1270/jsbbs.61.437
PMCID: PMC3406793  PMID: 23136483
Glycine max; soybean; legume evolution; polyploidy; SoyBase; Legume Information System; Legumebase; Phytozome
23.  The unique genomic landscape surrounding the EPSPS gene in glyphosate resistant Amaranthus palmeri: a repetitive path to resistance 
BMC Genomics  2017;18:91.
Background
The expanding number and global distributions of herbicide resistant weedy species threaten food, fuel, fiber and bioproduct sustainability and agroecosystem longevity. Amongst the most competitive weeds, Amaranthus palmeri S. Wats has rapidly evolved resistance to glyphosate primarily through massive amplification and insertion of the 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) gene across the genome. Increased EPSPS gene copy numbers result in higher titers of the EPSPS enzyme, the target of glyphosate, and confer resistance to glyphosate treatment. To understand the genomic unit and mechanism of EPSPS gene copy number proliferation, we developed and used a bacterial artificial chromosome (BAC) library from a highly resistant biotype to sequence the local genomic landscape flanking the EPSPS gene.
Results
By sequencing overlapping BACs, a 297 kb sequence was generated, hereafter referred to as the “EPSPS cassette.” This region included several putative genes, dense clusters of tandem and inverted repeats, putative helitron and autonomous replication sequences, and regulatory elements. Whole genome shotgun sequencing (WGS) of two biotypes exhibiting high and no resistance to glyphosate was performed to compare genomic representation across the EPSPS cassette. Mapping of sequences for both biotypes to the reference EPSPS cassette revealed significant differences in upstream and downstream sequences relative to EPSPS with regard to both repetitive units and coding content between these biotypes. The differences in sequence may have resulted from a compounded-building mechanism such as repetitive transpositional events. The association of putative helitron sequences with the cassette suggests a possible amplification and distribution mechanism. Flow cytometry revealed that the EPSPS cassette added measurable genomic content.
Conclusions
The adoption of glyphosate resistant cropping systems in major crops such as corn, soybean, cotton and canola coupled with excessive use of glyphosate herbicide has led to evolved glyphosate resistance in several important weeds. In Amaranthus palmeri, the amplification of the EPSPS cassette, characterized by a complex array of repetitive elements and putative helitron sequences, suggests an adaptive structural genomic mechanism that drives amplification and distribution around the genome. The added genomic content not found in glyphosate sensitive plants may be driving evolution through genome expansion.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-3336-4) contains supplementary material, which is available to authorized users.
doi:10.1186/s12864-016-3336-4
PMCID: PMC5240378  PMID: 28095770
Weedy species; Herbicide resistance; Amaranthus palmeri; EPSPS cassette; Transposable elements; Adaptive evolution
24.  Genome-wide analysis of MATE transporters and expression patterns of a subgroup of MATE genes in response to aluminum toxicity in soybean 
BMC Genomics  2016;17:223.
Background
The multidrug and toxic compound extrusion (MATE) family is an important group of multidrug efflux transporters that extrude organic compounds, transporting a broad range of substrates such as organic acids, plant hormones and secondary metabolites. However, genome-wide analysis of the MATE family in plant species is limited, and no such studies have been reported in soybean.
Results
A total of 117 genes encoding MATE transporters were identified from the whole genome sequence of soybean (Glycine max) and were designated GmMATE1 - GmMATE117. These 117 GmMATE genes were unevenly distributed on soybean chromosomes 1 to 20, with both tandem and segmental duplication events detected, and most genes showed tissue-specific expression patterns. The soybean MATE family could be classified into four subfamilies comprising ten smaller subgroups, with diverse potential functions such as transport and accumulation of flavonoids or alkaloids, extrusion of plant-derived or xenobiotic compounds, regulation of disease resistance, and response to abiotic stresses. Eight soybean MATE transporters that clustered with previously reported MATE proteins related to aluminum (Al) detoxification and iron translocation were analyzed further. Seven stress-responsive cis-elements such as ABRE, ARE, HSE, LTR and MBS, as well as a cis-element of ART1 (Al resistance transcription factor 1), GGNVS, were identified in the upstream regions of these eight GmMATE genes. Differential expression analysis of these eight GmMATE genes in response to Al stress identified GmMATE75 as a candidate gene for Al tolerance in soybean: its relative transcript abundance increased at 6, 12 and 24 h after Al treatment, with larger fold changes in the Al-tolerant than in the Al-sensitive cultivar, which is consistent with previously reported Al-tolerance-related MATE genes.
Conclusions
A total of 117 MATE transporters were identified in soybean, and their potential functions were proposed by phylogenetic analysis with known plant MATE transporters. The cis-elements and expression patterns of eight soybean MATE genes related to Al detoxification/iron translocation were analyzed, and GmMATE75 was identified as a candidate gene for Al tolerance in soybean. This study provides a first insight into the soybean MATE family and its potential roles in the soybean response to abiotic stresses, especially Al toxicity.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-2559-8) contains supplementary material, which is available to authorized users.
doi:10.1186/s12864-016-2559-8
PMCID: PMC4788864  PMID: 26968518
Abiotic stress; Aluminum toxicity; cis-element; Duplication; Expression analysis; MATE; Phylogenetic analysis; Soybean
25.  Sequence-based ultra-dense genetic and physical maps reveal structural variations of allopolyploid cotton genomes 
Genome Biology  2015;16(1):108.
Background
SNPs are the most abundant polymorphism type, and have been explored in many crop genomic studies, including rice and maize. SNP discovery in allotetraploid cotton genomes has lagged behind that of other crops due to their complexity and polyploidy. In this study, genome-wide SNPs are detected systematically using next-generation sequencing and efficient SNP genotyping methods, and used to construct a linkage map and characterize the structural variations in polyploid cotton genomes.
Results
We construct an ultra-dense inter-specific genetic map comprising 4,999,048 SNP loci distributed unevenly in 26 allotetraploid cotton linkage groups and covering 4,042 cM. The map is used to order tetraploid cotton genome scaffolds for accurate assembly of G. hirsutum acc. TM-1. Recombination rates and hotspots are identified across the cotton genome by comparing the assembled draft sequence and the genetic map. Using this map, genome rearrangements and centromeric regions are identified in tetraploid cotton by combining information from the publicly-available G. raimondii genome with fluorescent in situ hybridization analysis.
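Comparing a genetic map with an assembled sequence, as described above, amounts to relating each marker's genetic position (cM) to its physical position (Mb). A minimal sketch, assuming markers are already anchored and sorted by physical position, computes cM/Mb for each adjacent interval and flags intervals far above the chromosome-wide average. The three-fold cutoff, the function names and the marker values are hypothetical, because the study's actual hotspot criterion is not given in this summary.

```python
def interval_rates(markers):
    """markers: (physical_position_mb, genetic_position_cm) pairs sorted by
    physical position. Returns (start_mb, end_mb, cM_per_Mb) for each adjacent
    pair. Negative rates indicate local order disagreements between the maps,
    which may reflect rearrangements or mis-assemblies."""
    rates = []
    for (p1, g1), (p2, g2) in zip(markers, markers[1:]):
        if p2 <= p1:
            raise ValueError("markers must have strictly increasing physical positions")
        rates.append((p1, p2, (g2 - g1) / (p2 - p1)))
    return rates


def hotspots(markers, fold=3.0):
    """Intervals whose rate is at least `fold` times the chromosome-wide mean
    rate (a hypothetical, illustrative criterion)."""
    if len(markers) < 2:
        raise ValueError("need at least two markers")
    total_cm = markers[-1][1] - markers[0][1]
    total_mb = markers[-1][0] - markers[0][0]
    mean_rate = total_cm / total_mb
    return [(s, e, r) for s, e, r in interval_rates(markers) if r >= fold * mean_rate]


# Hypothetical anchored markers as (Mb, cM); illustrative only.
markers = [(0.0, 0.0), (10.0, 5.0), (11.0, 15.0), (30.0, 20.0), (50.0, 25.0)]
print(hotspots(markers))  # [(10.0, 11.0, 10.0)]
```

The mean rate here is 25 cM / 50 Mb = 0.5 cM/Mb, so only the 10–11 Mb interval (10 cM/Mb) clears the three-fold cutoff.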
Conclusions
We report the genotyping-by-sequencing method used to identify millions of SNPs between G. hirsutum and G. barbadense. We construct and use an ultra-dense SNP map to correct sequence mis-assemblies, merge scaffolds into pseudomolecules corresponding to chromosomes, detect genome rearrangements, and identify centromeric regions in allotetraploid cottons. We find that the centromeric retro-element sequence of tetraploid cotton derived from the D subgenome progenitor might have invaded the A subgenome centromeres after allotetraploid formation. This study serves as a valuable genomic resource for genetic research and breeding of cotton.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0678-1) contains supplementary material, which is available to authorized users.
doi:10.1186/s13059-015-0678-1
PMCID: PMC4469577  PMID: 26003111
