Public online databases supporting life sciences research have become valuable resources for researchers who depend on data for cheminformatics, bioinformatics, systems biology, translational medicine, and drug repositioning efforts, to name just a few of the potential end user groups. Funding agencies worldwide (governments and not-for-profits) have invested in public domain chemistry platforms. In the United States these include PubChem, ChemIDPlus, and the Environmental Protection Agency's ACToR, while the United Kingdom has funded ChEMBL and ChemSpider, among others, and new databases continue to appear annually.
We have argued recently that the quality of the data contained within many of these databases is suspect and that scientists should consider issues of data quality when using these resources. By bringing various data sources together and meshing data on drugs, proteins, and diseases, these databases, combined with network and computational methods, may help accelerate drug discovery efforts. The development of related cheminformatics platforms or derived models without care given to data quality is a poor strategy for long-term science
as errors become perpetuated in additional databases. There is real evidence that the integration of large, heterogeneous sets of databases and other types of content is “unreasonably effective” at accelerating the conversion of data into knowledge. This implies the need for technical and semantic work to bring together databases that were never designed for interoperability, which is in itself a significant task. As we and others have argued previously, there is more to interoperability than technical formats and ontological agreement, and even in those cases where data are freely available for download and reuse there are often no clear definitions of the terms of reuse. Many databases simply “cut and paste” prohibitive copyright schemes from traditional websites, or fail to address download and reintegration entirely (ibid). Since copyright law requires explicit permission in advance to make use of copyrighted works, it is certainly unsafe to assume data licensing rights for any database that does not explicitly grant them.
The availability of data for download and reuse is an important offering to the community, as these data may be used for modeling to develop prediction tools. In addition, data can be ingested into internal systems inside pharmaceutical companies to mesh with their existing private data, incorporated into the expanding Linked Open Data cloud or into freely available online databases, and downloaded and used to enhance existing content and to establish links between data. The Open PHACTS project
utilizes a semantic web approach to integrate chemistry and biology data across a myriad of data sources, including ChEBI, ChEMBL, and DrugBank for chemistry, and UniProt, Wikipathways, and many others for biology. The chemical structure representations are obtained from ChemSpider, which has previously imported the chemical databases, standardized them according to its own data model, and is making the data available as open data to the project. Many of the primary online databases already have multiple links to external systems. This linking may be achieved by using available database services to form transitory links, for example by using a chemical representation such as an InChI to probe an application programming interface, search for the compound, and generate the linking URL in real time. Commonly, however, the links are more permanent in nature and are generated by downloading data from the various data sources, depositing a subset of the data (generally the chemical compound and associated database identifier), and using the particular database URL structure to form permanent links. Downloading and depositing data from multiple sources in this way commonly mixes their various licenses, where licenses are declared at all; in many cases they are not.
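The two linking styles described above, and the license mixing that accompanies the second, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `SourceRecord` type, the `source_a` and `source_b` databases, their URL templates, and the `example.org` lookup address are hypothetical and do not correspond to any real database or service.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import quote


@dataclass(frozen=True)
class SourceRecord:
    """One compound record as downloaded from a (hypothetical) source database."""
    source: str                 # name of the source database
    identifier: str             # that database's identifier for the compound
    inchi: str                  # chemical representation used for linking
    license_id: Optional[str]   # declared license, or None if none is declared


# Hypothetical URL structures; a real database documents its own.
URL_TEMPLATES: Dict[str, str] = {
    "source_a": "https://source-a.example.org/compound/{identifier}",
    "source_b": "https://source-b.example.org/record?id={identifier}",
}


def transitory_link(lookup_url: str, inchi: str) -> str:
    """Form a link on the fly by passing an InChI to a search endpoint."""
    return f"{lookup_url}?inchi={quote(inchi, safe='')}"


def permanent_links(
    records: List[SourceRecord],
) -> Dict[str, List[Tuple[str, Optional[str]]]]:
    """Group fixed URLs by compound, carrying each source's license along."""
    links: Dict[str, List[Tuple[str, Optional[str]]]] = {}
    for record in records:
        url = URL_TEMPLATES[record.source].format(
            identifier=quote(record.identifier, safe="")
        )
        links.setdefault(record.inchi, []).append((url, record.license_id))
    return links


def license_report(records: List[SourceRecord]) -> Tuple[Set[str], List[str]]:
    """Return the distinct declared licenses and the sources declaring none."""
    declared = {r.license_id for r in records if r.license_id}
    undeclared = sorted({r.source for r in records if not r.license_id})
    return declared, undeclared
```

Calling `license_report` on the deposited records before publishing the merged set makes the mixture explicit: more than one declared license, or any source in the undeclared list, signals that the reuse rights of the combined data cannot simply be assumed.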
In some ways, there are analogous difficulties in the exchange of computational models such as quantitative structure activity relationship (QSAR) datasets: while there are efforts to standardize how the data and models are stored, queried, and exchanged, there has been little consideration of the licenses required to make the sharing of open source models a reality. Similarly, one could consider the creation of maps of disease and how they are shared and reused in the same manner.
The potential legal fragility of knowledge products derived from online databases with poorly understood licensing for each of the databases is a real problem, and one that will only increase in severity over time. This realization is not novel; indeed, the chemical blogosphere has been host to many discussions regarding the need for clear data licensing definitions on chemistry-related data. Many scientists likely echo these comments, but we will provide some examples. In particular, Peter Murray-Rust espouses the value of “open data” to the scientific discovery process and encourages clear licensing of all chemistry data according to the Open Knowledge Definition (OKD)
Herein we provide an extensive background to the intellectual property around data and databases in the sciences involved in drug discovery, those of biology, chemistry, and related fields, as well as discussion of open data licensing, openness, and open license limitations (Text S1
). More importantly, we provide a set of rules that practitioners might apply when making data or databases available via the Internet or mobile apps 
. Our ultimate goal is to illuminate the legal fragility of the database ecosystem in the drug discovery sciences, and to initiate a conversation about creating best practices.