The rapid diversification of experimental techniques, expertise, and public-domain data has necessitated a shift away from the traditional institution-centric research paradigm. Indeed, the trend toward comprehensive, genome-wide approaches to biological research means that no single institution is likely to contain the critical mass of physical and intellectual resources necessary to address certain broad biological questions. We describe herein an approach to this challenge that focuses on the creation of inter-institutional research teams that leverage existing internet technologies to bring together wide-ranging expertise in an efficient and effective analysis system.
While research teams are common at the institutional or local level, they rarely span several institutions, largely for logistical reasons. Effective distributed collaborations require an infrastructure that handles a fundamental array of information processes unique to non-local research communities. Researchers must have mechanisms for exhaustive electronic data storage, curation, and sharing. They must be able to record observations about the data and the experimental process, and they must have access to computational tools that assist in extracting new knowledge from the common warehouse of shared data. Concurrently, researchers in a distributed collaboration must find the bioinformatics core flexible enough to handle the immense diversity of information produced by modern experimental techniques, yet structured enough to enforce machine-readable data types for future analysis. Finally, distributed data systems must meet ease-of-use requirements while simultaneously enforcing explicit control over who has access to data sets and observations.
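To make these requirements concrete, the following is a minimal sketch of how typed, machine-readable records with explicit access control might be modeled. The record fields, roles, and can_view helper are hypothetical illustrations chosen for this example, not MuTrack's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Hypothetical access roles within a distributed collaboration."""
    CURATOR = "curator"
    MEMBER = "member"
    EXTERNAL = "external"


@dataclass
class PhenotypeRecord:
    """A typed, machine-readable observation shared across sites."""
    mouse_id: str
    test_domain: str   # e.g. "behavior" or "histology"
    value: float       # enforcing a numeric type keeps the record analyzable
    observer: str
    # Explicit visibility list: which roles may read this record.
    visibility: set[Role] = field(
        default_factory=lambda: {Role.CURATOR, Role.MEMBER}
    )


def can_view(record: PhenotypeRecord, role: Role) -> bool:
    """Access control check: only roles listed on the record may read it."""
    return role in record.visibility


record = PhenotypeRecord("TMGC-0001", "behavior", 12.7, "site_A_tech")
assert can_view(record, Role.MEMBER)
assert not can_view(record, Role.EXTERNAL)
```

In a production system such records would live in a shared database rather than in memory, but the same combination of enforced field types and per-record visibility captures the tension between flexibility and structure described above.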
The criteria for effective distributed collaborations have been tested in theoretical scenarios [1] and as limited implementations of expanded and distributed laboratory information management systems (LIMS) [3], but the literature lacks examples of comprehensive bioinformatics systems that support data collection, curation, and analysis. Here, we utilize an opportunity presented by a federally funded attempt to perform a genome-wide survey of heritable mutant phenotypes in the mouse. The test case for our distributed computational system is the Tennessee Mouse Genome Consortium (TMGC) [5]. In contrast to other funded phenotyping efforts, the TMGC is unique in its attempt to use the geographically distributed resources of consortium members to perform domain-specific phenotypic analysis of mutagenized mouse pedigrees (Figure 1).
Figure 1. Primary collaboration relationships represented in MuTrack. This partial depiction of the collaborative effort to study genome-scale mouse mutagenesis reveals the complexity of distributed collaborations. Mice are mutagenized at two separate …
The utility of employing the mouse as a model for human disease is well documented [6]. Traditional methods of site-directed in vivo mutagenesis are tedious and require prior knowledge of gene function and location [10]. Alternative approaches, developed to induce primarily single base-pair changes in a genome region of interest [11], are also effective at producing recessive and dominant heritable mutations in the mouse [12] but lack the specificity of traditional approaches. As a result, any single mutation event may be silent or effective, and may lie within a gene directing a visible phenotypic characteristic, within a gene without phenotypic consequence, or in a non-coding region [11]. To produce substantive phenotypic anomalies in large-scale germ-cell strategies, such as N-ethyl-N-nitrosourea (ENU)-directed mutagenesis, vast numbers of mouse pedigrees must be produced and phenotypically classified from birth through senescence and death.
The system implemented to satisfy this bioinformatics task, named MuTrack, has evolved into the central mechanism supporting the functions of the broad-based TMGC. It is a collection of database-backed, online analysis tools that track mouse breeding schemes, the shipment of mutant mice throughout the consortium, and the exchange of physical samples ranging from sperm to histological sections. It collects raw and processed data and observations from twenty-two discrete phenotype testing domains and provides real-time statistical analysis of possible phenodeviant mouse lineages based on the collected experimental data. It simultaneously allows member researchers to select mice for secondary and tertiary study to test mutant heritability, and provides a means to distribute new mutant strains to researchers outside the collaboration. To date, it has aided in the successful identification of 75 new mutant mouse strains and has screened more than 22,500 individual mice.
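As a rough illustration of the kind of screen implied by real-time statistical analysis of possible phenodeviant lineages, the sketch below flags lineages whose mean score on a phenotype test falls far outside a baseline distribution. The flag_phenodeviants function, its z-score criterion, and the three-standard-deviation threshold are assumptions made for this example, not MuTrack's published method.

```python
import statistics


def flag_phenodeviants(lineage_scores: dict[str, list[float]],
                       baseline: list[float],
                       threshold: float = 3.0) -> list[str]:
    """Flag lineages whose mean test score deviates from the baseline
    distribution by more than `threshold` standard deviations."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    flagged = []
    for lineage, scores in lineage_scores.items():
        z = abs(statistics.mean(scores) - mu) / sigma
        if z > threshold:
            flagged.append(lineage)
    return flagged


# Hypothetical scores from one phenotype test domain.
baseline = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2, 10.3, 9.7]
lineages = {"ped-17": [10.0, 10.2], "ped-42": [14.9, 15.3]}
print(flag_phenodeviants(lineages, baseline))  # ['ped-42']
```

A real screen would account for sample size, test-specific variance, and multiple-testing corrections across twenty-two domains, but the basic outlier logic is the same: candidate lineages are those whose measurements deviate markedly from the background strain.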
Successful development of heritable mouse mutations will contribute to our understanding of human disease states through the development of new mouse models. Of equal consequence, the implementation of a workable, collaborative data-sharing architecture represents a significant advance in the way researchers bring comprehensive high-throughput analysis to bear in biology's information-rich environment.