1. Illumina sequencing and sequence assembly
We dissected a complete adult O. vulgaris to obtain the CNS, including the cerebral, visceral and pedal ganglia and part of the peripheral nervous system (). A non-normalized cDNA library was constructed following the manufacturer's instructions (Illumina). High-throughput Solexa paired-end sequencing yielded a total of 13,753,396 reads of 90 bp, with a Q20 percentage of 96.38%. After 834,465 low-quality reads were removed, the remaining high-quality reads were assembled with Velvet, and a summary of the assembly is given in Table S1. To obtain the best assembly, four important parameters were tested: k-mer size, coverage, minimum contig length and number of contigs. In total, 12,918,931 reads were assembled into 59,859 contigs, with a maximum length of 8,970 bp and an N50 of 450 bp. CD-HIT was then run on the contigs, and only 48 (<0.1%) sequences had significant similarity (>98% identity and >95% coverage) to other sequences in the dataset, indicating that the assembly was complete and of high quality. The length distribution of all contigs is presented in . Among these contigs, a total of 31,315 open reading frames (ORFs ≥50 aa) were detected (Figure S1).
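The N50 of 450 bp reported above is a standard assembly statistic: the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of that textbook definition is shown below; it is not the code path used by Velvet, and the lengths in the usage line are toy values rather than data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    sorted_lengths = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:  # half of all bases now covered
            return length
    raise ValueError("lengths must be a non-empty collection of positive integers")


# Toy usage: 1,500 bp in total, so the 500 + 400 bp contigs pass the 750 bp mark.
print(n50([100, 200, 300, 400, 500]))
```

Sorting once and accumulating keeps the computation O(n log n), which is adequate for tens of thousands of contigs.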
Dissection of the central nervous system (CNS) of O. vulgaris.
Length distribution of contigs obtained from O. vulgaris central nervous system (CNS) transcriptome library.
2. BLASTX searches in Swiss-Prot and NR protein databases
Contig gene names were annotated through BLASTx and BLASTn searches against Swiss-Prot, NR (the NCBI non-redundant protein database) and NT (the NCBI nucleotide sequence database). Using Perl, the description of the most relevant hit with an E-value below 1e−5 was assigned to each query sequence. This search revealed that only 10,412 (17.39%) and 1,815 (3.03%) sequences had a significant BLAST hit (E-value < 1e−5). The large fraction of sequences (79.58%) without a significant hit is probably due to potentially novel genes and the lack of molecular information from closely related cephalopod species.
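The best-hit assignment described above was done in Perl; the sketch below shows the same idea in Python under stated assumptions. It assumes the searches were written in BLAST+ default 12-column tabular format (`-outfmt 6`), keeps hits with E-value < 1e−5, and retains the lowest E-value per query (ties broken by the higher bit score). The function name `best_hits` is hypothetical, and descriptions would still have to be looked up from the subject IDs separately.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # threshold stated in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)}, keeping the best hit per query.

    Assumes BLAST+ default tabular columns (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue
        current = best.get(query)
        # Prefer the lower E-value; break ties with the higher bit score.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best
```

The annotated fraction is then `100 * len(best_hits(text)) / n_contigs`; for example, 10,412 of 59,859 contigs gives the 17.39% quoted above.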
is a list of the 20 sequences with the highest read numbers, which represent the most highly expressed proteins or peptides. All of these highly represented sequences were annotated. This result illustrates the biological characteristics of the CNS transcriptome, which include several neurohormones, neuron-constituent proteins and stress-response proteins. Some proteins may reflect features specific to O. vulgaris, such as fatty acid-binding proteins and apolipophorins, which reflect the strong lipid metabolism of O. vulgaris, and retinol dehydrogenase, which converts retinol into retinal to maintain vision.
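Ranking contigs by read number, as in the top-20 list above, amounts to a partial sort. A minimal sketch follows, assuming a precomputed mapping from contig ID to the number of reads assigned to it; how reads were assigned to contigs is not stated in the text, and `top_expressed` is a hypothetical name.

```python
import heapq


def top_expressed(read_counts, n=20):
    """Return the n (contig_id, read_count) pairs with the most reads, highest first.

    read_counts: mapping of contig ID -> number of reads assigned to that contig
    (an assumed input; the read-to-contig assignment method is not specified).
    """
    return heapq.nlargest(n, read_counts.items(), key=lambda item: item[1])


# Toy usage with three contigs.
print(top_expressed({"contig_a": 5, "contig_b": 50, "contig_c": 20}, n=2))
```

`heapq.nlargest` avoids sorting the full list of roughly 60,000 contigs when only the top 20 are needed.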
List of top 20 sequences sorted by reads.
3. Functional annotation based on GO and KEGG analysis
To determine the functions of these sequences, 12,227 genes were selected for annotation against the GO database with GoMiner. shows the number of genes assigned to each level-2 GO term.
Distribution of second level GO annotation in three categories.
Selecting GO terms defined by the keywords “neuro” and “nervous” yielded more than 50 GO terms, with between 1 and 338 matching genes each. The largest CNS-related GO category is nervous system development (GO:0007399). presents the number of genes in the 15 neuronal-function GO terms with the most matches. This shows that our transcript database contains a large number of neural sequences, which will provide a valuable resource for further research into the development, function and regulatory mechanisms of the O. vulgaris central nervous system.
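The keyword selection above can be sketched as a filter over GO term names followed by a per-term gene count. This assumes the keywords are matched against term names (the text could also mean term definitions) and that annotations are available as a gene-to-GO-ID mapping; `neural_go_terms` is a hypothetical helper, and the terms in the usage example are illustrative only.

```python
def neural_go_terms(term_names, gene_to_terms, keywords=("neuro", "nervous")):
    """Count genes per GO term whose name contains any of the keywords.

    term_names: {GO ID: term name}
    gene_to_terms: {gene ID: set of GO IDs annotated to that gene}
    Returns (GO ID, term name, gene count) tuples, largest count first,
    omitting selected terms with no matching genes.
    """
    selected = {go_id: name for go_id, name in term_names.items()
                if any(k in name.lower() for k in keywords)}
    counts = {go_id: 0 for go_id in selected}
    for terms in gene_to_terms.values():
        for go_id in terms:
            if go_id in counts:
                counts[go_id] += 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(go_id, selected[go_id], n) for go_id, n in ranked if n > 0]


names = {"GO:0007399": "nervous system development",
         "GO:0048666": "neuron development",
         "GO:0008152": "metabolic process"}
genes = {"g1": {"GO:0007399", "GO:0048666"}, "g2": {"GO:0007399"}, "g3": {"GO:0008152"}}
print(neural_go_terms(names, genes))
```

Taking the first 15 entries of the returned list would give a ranking analogous to the one described in the text.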
GO categories with highest number of sequences corresponding to CNS function.
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a pathway-based categorization of orthologous genes that provides useful information for predicting the functional profiles of genes. To further characterize the biological pathways active in the CNS of O. vulgaris, 3,825 genes were mapped to signaling pathways in KEGG.
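Once each gene has been assigned zero or more KEGG pathway IDs (the assignment tool is not named in the text, so the input mapping here is an assumption), the number of mapped genes and the genes per pathway can be tallied as sketched below. `kegg_summary` is a hypothetical helper; map04530 and map04520 are the KEGG reference maps for tight junction and adherens junction.

```python
from collections import Counter


def kegg_summary(gene_to_pathways):
    """Summarize a gene -> KEGG pathway mapping.

    Returns (number of genes mapped to at least one pathway,
             Counter of pathway ID -> number of distinct genes in it).
    """
    per_pathway = Counter()
    mapped = 0
    for pathways in gene_to_pathways.values():
        unique = set(pathways)  # count each gene once per pathway
        if unique:
            mapped += 1
            per_pathway.update(unique)
    return mapped, per_pathway


mapped, counts = kegg_summary({"g1": ["map04530", "map04520"], "g2": ["map04530"], "g3": []})
print(mapped, counts.most_common())
```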
4. Comparison with the previously published CNS databases of the mollusks A. californica and L. stagnalis
The published CNS transcriptome databases of A. californica and L. stagnalis give us an opportunity to understand the diversity between O. vulgaris and other mollusk species. and provide the BLASTn and tBLASTx hits between O. vulgaris and L. stagnalis (or A. californica). When O. vulgaris sequences were compared with L. stagnalis ESTs (10,375 ESTs assembled into 7,712 unique sequences) by BLASTn and tBLASTx at an E-value cutoff of 1e−5, about 7.83% (BLASTn) and 26.53% (tBLASTx) of L. stagnalis cDNAs had hits in the O. vulgaris CNS library. Conversely, approximately 1.03% (BLASTn) and 5.77% (tBLASTx) of the sequences in the O. vulgaris CNS library had a hit among L. stagnalis cDNAs. Because both databases were constructed from CNS tissue, the low percentages of BLAST hits are most likely caused by genetic divergence between the two species. These results may also be explained by: 1) sequences shorter than 200 bp are unlikely to produce a BLAST hit; 2) some genes may exist in both species but be expressed below the detection limit of the transcriptome studies; and 3) expression profiles differ between species and developmental stages.
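Each percentage above is directional: the denominator is the number of sequences in whichever dataset was used as the query, which is why the L. stagnalis-to-O. vulgaris and O. vulgaris-to-L. stagnalis figures differ. A minimal sketch follows, assuming BLAST+ `-outfmt 6` output (query ID in column 1, E-value in column 11); `hit_percentage` is a hypothetical name and must be run once per direction.

```python
def hit_percentage(query_ids, tabular_text, cutoff=1e-5):
    """Percentage of query sequences with at least one hit below the E-value cutoff.

    Assumes BLAST+ -outfmt 6 rows: query ID in column 1, E-value in column 11.
    Queries with several hits are counted once.
    """
    queries = set(query_ids)
    hit = set()
    for line in tabular_text.splitlines():
        fields = line.split("\t")
        # Comment lines and malformed rows have fewer than 12 fields and are skipped.
        if len(fields) >= 12 and fields[0] in queries and float(fields[10]) < cutoff:
            hit.add(fields[0])
    return 100.0 * len(hit) / len(queries)
```

Swapping query and subject datasets changes both the hit set and the denominator, so the two directions need not agree.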
Comparison between the O. vulgaris CNS dataset and L. stagnalis CNS dataset using BLASTn and tBLASTx.
Comparison between the O. vulgaris CNS dataset and A. californica CNS dataset using BLASTn and tBLASTx.
The CNS database of A. californica consists of two small EST libraries, a normalized library from the pedal-pleural ganglia and a normalized CNS library (juvenile 1), which together provided 40,607 sequences after assembly. Of the A. californica ESTs, 3.97% (BLASTn) and 18.00% (tBLASTx) have a hit in our transcriptome data, while 1.90% (BLASTn) and 12.85% (tBLASTx) of O. vulgaris sequences have a significant match in A. californica. The low similarity may be due to: 1) the large evolutionary distance between the two species; 2) the inclusion of both adult and juvenile CNS tissue in the A. californica database (gene expression may vary between growth stages); and 3) although both parts of the A. californica database are normalized, some sequences may be present in both parts, which could considerably lower the hit percentage of O. vulgaris cDNAs against the A. californica EST database.
5. Comparison with CNS sequence databases of other model organisms
We also chose three CNS databases, from a fish (Tilapia), an arthropod (Schistocerca gregaria) and an annelid (Hirudo medicinalis), all recently published in NCBI, to explore the diversity among these model organisms from different phyla. We ran BLASTn/tBLASTx searches against the CNS databases of the fish, desert locust and leech. Results are shown in . With an E-value cutoff of 1e−5, 3,544 cDNAs have a hit in Tilapia, compared with 5,870 in the locust and 6,568 in the leech. Considering the sizes of these CNS databases and of the L. stagnalis EST database, these hit percentages suggest that O. vulgaris has similarity with all organisms except the annelid (H. medicinalis). This suggests that O. vulgaris may have both ancestral and derived characters at the transcriptome level, and this comparison will be helpful for further research on homologous and non-homologous genes of O. vulgaris.
Comparison between O. vulgaris dataset and four different species CNS datasets using BLASTn and tBLASTx.
shows the distribution of sequence hits in the locust, fish and leech CNS databases. A total of 8,695 sequences hit at least one of the databases, and only 2,213 (25.45%) hit all three, a much lower proportion than in the same comparison in previous studies, which indicates a low percentage of highly conserved genes. This comparison shows that O. vulgaris shares a large number of homologous genes with both vertebrates and invertebrates, suggesting that it may be a good model organism for neurobiology.
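The overlap figures above follow from simple set operations on the IDs of O. vulgaris sequences with a significant hit in each database. The sketch below assumes those per-database ID sets are already available (for instance from the hit filtering described earlier); `overlap_summary` is a hypothetical name and the example sets are toy data.

```python
def overlap_summary(hits_by_db):
    """Summarize which query sequences hit which databases.

    hits_by_db: {database name: set of query IDs with a significant hit}
    Returns (size of the union, size of the intersection,
             percentage of the union found in every database).
    """
    sets = list(hits_by_db.values())
    union = set().union(*sets)
    shared = set.intersection(*sets) if sets else set()
    percent = 100.0 * len(shared) / len(union) if union else 0.0
    return len(union), len(shared), percent


toy = {"leech": {"a", "b", "c"}, "locust": {"b", "c", "d"}, "tilapia": {"c", "e"}}
print(overlap_summary(toy))
```

With the study's counts, the same ratio is 100 × 2,213 / 8,695 ≈ 25.45%.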
Distribution of BLASTn and tBLASTx hits of O. vulgaris sequences in the three organisms (green: H. medicinalis, blue: S. gregaria, red: Tilapia) CNS datasets with E-value threshold of 1e−5.
6. Phylogenetic analysis of two CNS-related genes
We identified nervous-system-related genes present in all four databases (leech, mollusks, locust and fish) using a Perl script that selected genes meeting three conditions: length >1,000 bp, E-value <1e−10 and coverage >30. We then built the corresponding phylogenetic trees. The tree in is generated from synaptotagmin-7, a member of the synaptotagmin family believed to be important in the docking and fusion of synaptic vesicles with the plasma membrane, as in neurotransmitter release. We found that the O. vulgaris sequence is most closely related to that of the mollusk A. californica. Synaptotagmin-7 is widespread in metazoans, and its phylogenetic tree separates the different animal groups well, indicating that it is a good marker for phylogenetic analysis. Furthermore, the synaptotagmin-7 of mollusks is more closely related to that of arthropods and fish, which is consistent with the results of the CNS database comparison between O. vulgaris
Phylogenetic tree of synaptotagmin-7.
We further chose synaptophysin because it was found in A. californica and L. stagnalis, and constructed the corresponding phylogenetic tree (). Synaptophysin is a protein that serves as a marker for neuroendocrine tumors and for the quantification of synapses. Even without an ortholog in A. californica or L. stagnalis, the result still reveals the closer relation of O. vulgaris
Phylogenetic tree of synaptophysin.
7. A putative vertebrate-like Blood-Brain Barrier in O. vulgaris
Cephalopods have a well-developed nervous system, reflected in its precise internal structure and the evident fusion of the ganglia, which are covered with a cortex. In addition, the CNS is protected by a surrounding cartilaginous skull. These characteristics are similar to those of vertebrates, suggesting that cephalopods may represent an evolutionary transition toward vertebrate-like brain function. By studying these transitions, we can better understand how such features arose. The Blood-Brain Barrier (BBB) is an internal barrier system, related to internal immunity, that prevents pathogenic microorganisms and other macromolecules in the blood circulation from entering brain tissue, thereby maintaining the basic stability of the internal environment. It also plays an important role in maintaining the normal physiological state of the central nervous system.
The vertebrate Blood-Brain Barrier has three histological bases: brain microvascular endothelial cells (BMVEC) and the junctions between them; a continuous basement membrane around the BMVEC; and five different types of neighboring cells, such as astrocytes, perivascular pericytes, microglia and surrounding neurons. In addition, to ensure that central neurotransmitters cannot pass the BBB and that neurotransmitter concentrations remain stable, BMVEC have a unique enzyme system that inactivates central neurotransmitters, including monoamine oxidase, aromatic L-amino acid decarboxylase (AAAD) and catechol-O-methyltransferase (COMT). Here we interrogate our CNS dataset and compare it with three other invertebrate model organisms to determine whether cephalopods have specific gene expression indicative of a vertebrate-like BBB.
Genes related to junctions between endothelial cells (EC), transport pathways across the BBB and the enzyme barrier system were selected to determine whether the molecular foundation of a vertebrate-like BBB exists in four invertebrates (O. vulgaris, A. californica, S. gregaria and H. medicinalis). First, we focused on the important genes involved in the structure of tight junctions (TJ) and adherens junctions (AJ), such as claudins, occludin, junctional adhesion molecules (JAM), cytoplasmic proteins, cadherins and catenins. A search of the O. vulgaris gene name annotation first identified two TJ-related genes and two AJ-related genes at E-value < 1e−10. In addition, we ran tBLASTn searches against all CNS datasets using the amino acid sequences of TJ- and AJ-related proteins downloaded from NCBI. The results were filtered with a Perl script applying several strict conditions to ensure accuracy: E-value < 1e−10, more than 100 amino acids aligned to the query sequence, and coverage of more than 80% of the subject sequence. After these two screening steps, O. vulgaris cDNAs hit nine of the twelve proteins. Given the alignment characteristics of BLAST and the low conservation of some proteins between species, the actual number of these proteins present is likely to be higher than the number of hits. This result indicates that O. vulgaris has complete BMVEC junctions. After the same filtering steps, A. californica cDNAs hit only five proteins, and S. gregaria and H. medicinalis cDNAs hit four, showing that the other organisms lack significant matches to proteins involved in junctions between EC ().
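The tBLASTn screening conditions above (E-value < 1e−10, more than 100 aligned amino acids, more than 80% of the subject covered) can be sketched as a per-row filter. This assumes the search was run with the custom BLAST+ format `-outfmt "6 qseqid sseqid evalue length sstart send slen"`, so `length` is the alignment length in amino acids and `sstart`/`send`/`slen` are nucleotide coordinates on the subject contig; the column choice and the function names are assumptions, not the authors' Perl script.

```python
def passes_filter(line, max_evalue=1e-10, min_aligned_aa=100, min_subject_cov=0.8):
    """Apply the three screening conditions to one tBLASTn tabular row.

    Assumes -outfmt "6 qseqid sseqid evalue length sstart send slen".
    """
    _query, _subject, evalue, length, sstart, send, slen = line.rstrip("\n").split("\t")
    evalue, length = float(evalue), int(length)
    sstart, send, slen = int(sstart), int(send), int(slen)
    # abs() handles minus-strand hits, where send < sstart.
    subject_cov = (abs(send - sstart) + 1) / slen
    return evalue < max_evalue and length > min_aligned_aa and subject_cov > min_subject_cov


def proteins_hit(lines):
    """Return the set of query proteins with at least one row passing the filter."""
    return {line.split("\t")[0] for line in lines if line.strip() and passes_filter(line)}
```

Counting `len(proteins_hit(rows))` per species would give numbers comparable to the nine-of-twelve style summary above.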
List of alignment results of proteins related to TJ and AJ.
Besides specific junctions, elaborate systems for transporting macromolecules and eliminating neurotransmitters are an important part of BBB function. In O. vulgaris, all of the well-studied transporters and enzymes passed the filter steps except Caveolae. In A. californica and S. gregaria, by contrast, the absence of glucose transporter-1 reveals the absence of a vertebrate-like BBB in these two species, because the CNS cannot function without an energy supply once a physical barrier is established. The search for enzymes such as monoamine oxidase, COMT and AAAD also suggests that O. vulgaris has a thorough enzyme system for preventing neurotransmitter feedback on the central nervous system. The other organisms again produced few matches to the specific transporters and enzymes ().
List of the alignment results of proteins involved in specific transporters and enzymes.
Furthermore, we compared the profiles of genes involved in the TJ signaling pathway, which is considered responsible for the barrier properties, between O. vulgaris and A. californica. With default parameters, different numbers of genes mapped to the TJ signaling pathway (47 in O. vulgaris and 32 in A. californica). This comparative analysis also revealed three important genes unique to O. vulgaris: two transmembrane proteins (JAM and claudins) and the cytoplasmic TJ accessory protein ZO-1. Claudins are considered responsible for permeability restriction in TJ, and JAM is involved in various TJ functions, such as cell-to-cell adhesion and structural organization, and takes part in TJ formation as an integral membrane protein together with claudins. ZO-1, whose carboxy-terminal region binds actin and links the TJ to the cytoskeleton, acts as a central organizer of the TJ complex. This result implies that O. vulgaris has more complex and integrated TJ functions than the model organism A. californica.
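Identifying genes unique to one species within a pathway, as above, is a set difference between the genes each species maps to that pathway. The sketch assumes both species' TJ pathway memberships are available as sets of gene symbols; `compare_pathway_genes` is a hypothetical helper, and the toy sets do not reproduce the 47 and 32 genes found in the study.

```python
def compare_pathway_genes(species_a, species_b):
    """Compare two sets of gene symbols mapped to the same KEGG pathway."""
    return {
        "only_a": sorted(species_a - species_b),
        "only_b": sorted(species_b - species_a),
        "shared": sorted(species_a & species_b),
    }


# Toy sets for illustration only.
octopus = {"JAM", "CLDN", "ZO-1", "ACTB"}
aplysia = {"ACTB"}
print(compare_pathway_genes(octopus, aplysia))
```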
To verify the constitutive expression of the genes corresponding to BBB structure and function, specific primers were designed based on the assembled contigs and quantitative real-time PCR was performed. All of the genes were expressed in every tissue examined, including brain, liver, heart, gill and muscle (Figure S3). The transcription patterns of these BBB-related genes have been determined in vertebrates, and their expression in different kinds of tissues indicates that they may be involved not only in the BBB but also in other physiological functions. For example, ZO-1 is mostly expressed in endothelial and epithelial cells, where it takes part in TJ assembly, but it is also expressed in tissues that do not form TJs, where it may be involved in signal transduction at cell-cell junctions. Glucose transporter 1 is expressed in erythrocytes and in the endothelial cells of barrier tissues, and it has also been identified in muscle, fat and tissues with acute insulin-stimulated glucose transport. Monoamine oxidases, which catalyze the oxidative deamination of monoamines, are found in neurons and astroglia and also in the liver, gastrointestinal tract and placenta. Similar results were obtained for other genes related to TJ (myosin), AJ (α-catenin, β-catenin and cadherins), specific transporters (P-glycoprotein, multidrug resistance-associated protein and organic anion transporter) and enzymes (catechol-O-methyltransferase and aromatic L-amino acid decarboxylase). These observations not only support the accuracy of the contig assembly but also demonstrate that, as in vertebrates, all of the target genes are expressed in a range of tissues, including the CNS, in O. vulgaris.
Notably, the CNS of O. vulgaris expresses a large number of proteins involved in specific junctions, transporters and enzymes, which are indispensable for a system that may possess most vertebrate BBB functions. The results in A. californica and S. gregaria indicate that species with an open vascular system, like D. melanogaster, may use a different strategy to maintain the basic stability of the internal environment. The low hit percentage in H. medicinalis suggests that although H. medicinalis has a similar circulatory system, the loose structure of its central nervous system has limited the development of a vertebrate-like BBB. Based on these results, only O. vulgaris among the four species examined has the molecular basis of a vertebrate-like BBB, highlighting its use as a model organism for in-depth study of the phylogenetics, structure and function of the BBB.