BMC Genomics. 2009; 10: 317.
Published online Jul 16, 2009. doi: 10.1186/1471-2164-10-317
PMCID: PMC2724416
Identification and characterization of pseudogenes in the rice gene complement
Françoise Thibaud-Nissen,1,2 Shu Ouyang,1,3 and C Robin Buell1,4
1 The J. Craig Venter Institute, 9712 Medical Center Dr, Rockville, MD 20850 USA
2 Current address: National Center for Biotechnology Information, National Institutes of Health, 9000 Rockville Pike, Bethesda MD 20892 USA
3 Current address: Suite 205, 1003 W. 7th Street, Frederick, MD 21701 USA
4 Department of Plant Biology, Michigan State University, East Lansing, MI 48824 USA
Corresponding author: C Robin Buell.
Françoise Thibaud-Nissen: thibaudf/at/ncbi.nlm.nih.gov; Shu Ouyang: ouyangsn/at/mail.nih.gov; C Robin Buell: buell/at/msu.edu
Received December 11, 2008; Accepted July 16, 2009.
Abstract
Background
The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. As such, it is likely to mis-annotate pseudogenes as functional genes. A total of 22,033 gene models within the Osa1 Release 5 were investigated as potential pseudogenes as these genes exhibit at least one feature potentially indicative of pseudogenes: lack of transcript support, short coding region, long untranslated region, or, for genes residing within a segmentally duplicated region, lack of a paralog or significantly shorter corresponding paralog.
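The four pseudogene features listed above can be read as a simple screening filter over gene models. The following is a minimal sketch of such a filter, not the pipeline used in the study: the `GeneModel` fields, the feature labels, and all three numeric cutoffs are assumptions made for illustration, since the abstract does not state the thresholds that were applied.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative cutoffs only (assumed); the abstract does not give the values used.
MIN_CDS_LENGTH = 300       # nucleotides
MAX_UTR_LENGTH = 1000      # nucleotides
MAX_PARALOG_RATIO = 0.5    # gene CDS length relative to its paralog's CDS length


@dataclass
class GeneModel:
    """Hypothetical summary of one annotated gene model."""
    gene_id: str
    cds_length: int
    utr_length: int
    has_transcript_support: bool
    in_segmental_duplication: bool = False
    paralog_cds_length: Optional[int] = None  # None means no paralog was found


def pseudogene_features(gene: GeneModel) -> List[str]:
    """Return the labels of the pseudogene features that a gene model exhibits."""
    features = []
    if not gene.has_transcript_support:
        features.append("no_transcript_support")
    if gene.cds_length < MIN_CDS_LENGTH:
        features.append("short_cds")
    if gene.utr_length > MAX_UTR_LENGTH:
        features.append("long_utr")
    # The paralog criteria apply only to genes in segmentally duplicated regions.
    if gene.in_segmental_duplication:
        if gene.paralog_cds_length is None:
            features.append("missing_paralog")
        elif gene.cds_length < MAX_PARALOG_RATIO * gene.paralog_cds_length:
            features.append("shorter_than_paralog")
    return features


def is_candidate(gene: GeneModel) -> bool:
    """A gene model is a candidate if it shows at least one feature."""
    return len(pseudogene_features(gene)) > 0
```

Under these assumed cutoffs, a gene model with transcript support, a 900-nucleotide coding region, and a short untranslated region would not be flagged, while an unsupported 150-nucleotide model whose paralog is 900 nucleotides long would carry three features.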
Results
A total of 1,439 pseudogenes, identified among genes with pseudogene features, were characterized by similarity to fully-supported gene models and the presence of frameshifts or premature translational stop codons. Significant difference in the length of duplicated genes within segmentally-duplicated regions was the optimal indicator of pseudogenization. Among the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events while 25% were the result of retrotransposition events. A total of 12% of the pseudogenes were expressed. Finally, F-box proteins, BTB/POZ proteins, terpene synthases, chalcone synthases and cytochrome P450 protein families were found to harbor large numbers of pseudogenes.
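One of the two disabling features named above, the premature translational stop codon, is easy to illustrate. The sketch below scans a coding sequence in frame and reports any stop codon that occurs before the final codon. It is an illustration under simplifying assumptions (an in-frame DNA sequence starting at the first codon, the standard genetic code), and the function name is hypothetical. Frameshift detection is not shown, because it requires alignment against a similar, fully-supported gene model.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stop_positions(cds: str) -> list:
    """Return 0-based nucleotide offsets of in-frame stop codons
    that occur before the last complete codon of the sequence."""
    seq = cds.upper()
    # Start offset of the last complete codon; a stop there is the normal terminator.
    last_codon_start = len(seq) - (len(seq) % 3) - 3
    positions = []
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS and i < last_codon_start:
            positions.append(i)
    return positions
```

For example, `premature_stop_positions("ATGAAATAAGGGTGA")` reports the `TAA` at offset 6, while the terminal `TGA` is treated as the expected stop and ignored.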
Conclusion
These pseudogenes still have a detectable open reading frame and are thus distinct from pseudogenes detected within intergenic regions which typically lack definable open reading frames. Families containing the highest number of pseudogenes are fast-evolving families involved in ubiquitination and secondary metabolism.
Articles from BMC Genomics are provided here courtesy of BioMed Central