Wikipedia hosts the largest and most prolific community of producers of textual knowledge on the Web. While the volume of knowledge represented within Wikipedia is vast, it is difficult to extract that knowledge in the form of structured data for computation and analysis. Many users within the Wikipedia community have noted the large potential benefit of integrating a fully semantically aware infrastructure, but previous proposals have involved technical changes that the governing body of Wikipedia has resisted (8).
Here, we implement a partial solution by decoupling the encoding of semantic relationships from the querying and use of those relationships. We provide a mechanism for the massive Wikipedia community to participate in assembling not just a textual resource, but a far more powerful structured knowledge base. That knowledge base can then be mined and used by third-party tools without requiring any changes to Wikipedia itself.
Of course, this approach is not without its challenges. Since anyone can create an SWL and there is no way to enforce the use of a particular set of relationship types, some inconsistency in how they are used is inevitable. For example, one editor might use the relationship type ‘phosphorylated by’ while another might insert ‘phosphate group added by’. Such inconsistencies disrupt semantic queries made over aggregated content, such as the queries available through Gene Wiki+. However, this problem is not unique to SWLs; it applies equally to other aspects of Wikipedia articles. Over time, the Wikipedia community has gradually moved toward consensus regarding the use of categories, when to insert normal WikiLinks, when to insert references and how to format them, how to format articles and even how to style the text in the articles. Because Wikipedia is a continuously changing social artifact, there will always be exceptions to the socially defined rules that have emerged to govern its content, but overall the basic structures remain remarkably stable. If the Wikipedia community takes up SWLs, we expect the same kind of social consensus to emerge with respect to their use. The community will evolve rules defining which properties should apply to which kinds of entities and will police the articles for adherence to these rules in the same way it does now for other article attributes. The key question is whether the community will in fact buy into the SWL idea.
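To make the query problem concrete, the following is a minimal Python sketch, assuming that SWL relationships have already been extracted from articles into (subject, relation, object) triples. The triple format, the placeholder entity names (PROTEIN_A, KINASE_X, and so on) and the function `subjects_with_relation` are hypothetical illustrations, not part of Gene Wiki+. The sketch only shows how an exact-label query silently misses a fact recorded under a different relationship label.

```python
# Hypothetical triples aggregated from several articles: (subject, relation, object).
# Entity and relation labels are illustrative placeholders, not real SWL content.
triples = [
    ("PROTEIN_A", "phosphorylated by", "KINASE_X"),
    ("PROTEIN_B", "phosphate group added by", "KINASE_X"),  # same meaning, different label
    ("PROTEIN_C", "phosphorylated by", "KINASE_Y"),
]


def subjects_with_relation(triples, relation, obj):
    """Return the sorted subjects linked to `obj` by exactly the label `relation`."""
    return sorted(s for s, r, o in triples if r == relation and o == obj)


# Querying for substrates of KINASE_X with one label finds only PROTEIN_A;
# PROTEIN_B is missed because its editor chose a different relationship label.
print(subjects_with_relation(triples, "phosphorylated by", "KINASE_X"))
```

Running this prints `['PROTEIN_A']`, even though the aggregated content also states that KINASE_X phosphorylates PROTEIN_B. This is the kind of gap that community consensus on relationship types would close.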
Currently, there are only a handful of SWLs active on Wikipedia, all of which have been deposited by our team as demonstrations to seed the process. If the SWL concept succeeds, this number should rapidly increase into the tens of thousands, but for that to happen, the Wikipedia editor community must become involved as semantic link authors. To recruit this labor, the value of such work needs to be clearly apparent. Since Wikipedia itself does not process the semantic relationships, external applications that make the value of these contributions clear need to be developed and promoted, thereby providing editors with vital positive feedback. Applications like Gene Wiki+ and the infobox-generating userscript are first steps in this direction. They demonstrate that adding semantic relationships can substantially enhance users' interactions with the knowledge in Wikipedia and can enable novel applications relevant to biocuration activities.
Whether these applications will be enough to motivate the Wikipedia community to accept and use SWLs is an open question. In our early interactions with the editor community, there has been some resistance to the use of the SWL template on the grounds that it makes the WikiText more difficult to edit. However, there have also been some enthusiastic responses from community members who see the potential of the idea. As the project unfolds, we will work with the editors to reach a solution that the community can wholeheartedly accept.