This data article contains multi-species alignments of the regulatory region of the C. elegans LIM-HOX gene lin-11 and lists of transcription factors that are predicted to bind to lin-11 enhancers and regulate expression in amphid neurons. For further details and experimental findings, please refer to the article by Amon and Gupta in Developmental Biology (S. Amon, B.P. Gupta, 2017).
Value of the data
Six Caenorhabditis species were used to perform sequence alignments of lin-11 intron 3: C. briggsae, C. sinica, C. nigoni, C. remanei, C. brenneri, and C. elegans. The MussaGL program (http://woldlab.caltech.edu/cgi-bin/mussa) was used at 70% and 80% window thresholds. Multiple alignments that included C. briggsae and C. nigoni were carried out (Fig. 1). In general, conservation decreases as the number of species and the alignment threshold increase. Four-way alignments reveal six distinct conserved blocks at the 70% threshold. Some of these blocks are part of larger stretches in 2-way and 3-way alignments. At the 80% threshold, block 2 lacks conservation when any one of C. remanei, C. brenneri and C. elegans is included. Additionally, block 1 is lost when C. elegans is included. The three sequence blocks described in the accompanying article (C3-1, C3-2 and C3-3), which are conserved among C. elegans, C. brenneri, C. remanei and C. briggsae, correspond to blocks 3, 5 and 6, respectively (Fig. 1).
We used the computational tool CIS-BP (http://cisbp.ccbr.utoronto.ca/TFTools.php) to search for transcription factors (TFs) that may bind to conserved blocks in introns 3 (C3-1) and 7 (C7-1 and C7-2) and potentially regulate lin-11 expression in neurons. A total of 35 TF genes were identified for C3-1, 37 for C7-1, and 46 for C7-2 (Table 1). In addition, we searched the modENCODE dataset (http://www.modencode.org) and found eight TFs that bind to intron 7 sequences (Table 1).
The lin-11 intronic sequences from Caenorhabditis species were aligned using MussaGL (multi-species sequence analysis, version 1.1.0 for Mac OS X), an N-way sequence alignment program developed by the Wold lab (Caltech, USA). The conservation threshold was set at 70% (21 per 30-nucleotide sliding window) and 80% (27 per 30-nucleotide sliding window).
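The sliding-window threshold described above can be illustrated with a minimal Python sketch. It is not MussaGL's implementation: it assumes a plain ungapped, forward-strand comparison of every window in one sequence against every window in another, and the function names and toy sequences are hypothetical.

```python
def count_matches(window_a, window_b):
    """Count positions at which two equal-length windows carry the same base."""
    return sum(a == b for a, b in zip(window_a, window_b))


def conserved_windows(seq_a, seq_b, window=30, min_matches=21):
    """Return (start_a, start_b, matches) for each window pair meeting the threshold.

    Assumption: ungapped, forward-strand, all-versus-all window comparison.
    The defaults mirror the 21-per-30 setting given in the text.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = count_matches(seq_a[i:i + window], seq_b[j:j + window])
            if matches >= min_matches:
                hits.append((i, j, matches))
    return hits


# Toy sequences (hypothetical, not lin-11 data): 21 of 30 positions agree.
toy_a = "A" * 30
toy_b = "A" * 21 + "C" * 9

print(conserved_windows(toy_a, toy_b))                  # 21-per-30 setting
print(conserved_windows(toy_a, toy_b, min_matches=27))  # stricter setting from the text
```

The all-versus-all loop is quadratic in sequence length, which is acceptable for intron-sized inputs but would need indexing or seeding for longer regions.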
To identify the putative TF genes for C. elegans introns 3 and 7, we used the CIS-BP database software. The settings included the motif model ‘PWMs-LogOdds’ and a threshold of 8. According to the website, this motif model option scores each position in each sequence with all position weight matrices, using a standard log-odds scoring method. For more details, see the help page on the website (http://cisbp.ccbr.utoronto.ca/help.html).
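As a rough illustration of log-odds scanning with a score cutoff, the following Python sketch scores every position of a sequence against a single position weight matrix. It is a generic textbook formulation, not the CIS-BP code: the uniform background, the pseudocount, the use of log base 2, the function names, and the toy motif are all assumptions made for the example.

```python
import math

BASES = "ACGT"
# Assumption: uniform background base frequencies.
BACKGROUND = {base: 0.25 for base in BASES}


def log_odds_matrix(frequencies, background=BACKGROUND, pseudocount=0.01):
    """Convert per-position base frequencies into log2 odds against the background."""
    matrix = []
    for column in frequencies:
        total = sum(column.get(b, 0.0) for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2(((column.get(b, 0.0) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix


def scan_sequence(sequence, matrix, threshold=8.0):
    """Return (start, score) for every site whose summed log-odds score meets the threshold.

    Assumes an uppercase sequence containing only A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, site))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 6-bp motif that strictly prefers TAATTA (not a CIS-BP entry).
TOY_MOTIF = [{"T": 1.0}, {"A": 1.0}, {"A": 1.0}, {"T": 1.0}, {"T": 1.0}, {"A": 1.0}]

toy_matrix = log_odds_matrix(TOY_MOTIF)
for start, score in scan_sequence("GGTAATTAGG", toy_matrix):
    print(start, round(score, 2))
```

A full search in the spirit of the text would repeat this scan for every matrix in the collection and for both strands, then map the matching matrices back to TF genes.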
We are grateful to Erich Schwarz, Da Yin, Caitlin Schartner, Edward Ralston, Asher Cutter, Barbara Meyer, and Eric Haag for providing the C. nigoni lin-11 sequence data. This work was supported by the Natural Sciences and Engineering Research Council of Canada Discovery grant to BG.
Appendix A. Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2017.03.027.
Supplementary material: Excel file containing lists of putative lin-11-regulating TF genes based on CIS-BP searches and modENCODE ChIP-seq data.