Scaling the walls of discovery: using semantic metadata for integrative problem solving
Current data integration approaches by bioinformaticians frequently involve extracting data from a wide variety of public and private data repositories, each with a unique vocabulary and schema, via scripts. These separate data sets must then be normalized through the tedious and lengthy process of resolving naming differences and collecting information into a single view. Attempts to consolidate such diverse data using data warehouses or federated queries add significant complexity and have shown limitations in flexibility. The alternative of complete semantic integration of data requires a massive, sustained effort in mapping data types and maintaining ontologies. We focused instead on creating a data architecture that leverages semantic mapping of experimental metadata, to support the rapid prototyping of scientific discovery applications with the twin goals of reducing architectural complexity while still leveraging semantic technologies to provide flexibility, efficiency and more fully characterized data relationships. A metadata ontology was developed to describe our discovery process. A metadata repository was then created by mapping metadata from existing data sources into this ontology, generating RDF triples to describe the entities. Finally an interface to the repository was designed which provided not only search and browse capabilities but complex query templates that aggregate data from both RDF and RDBMS sources. We describe how this approach (i) allows scientists to discover and link relevant data across diverse data sources and (ii) provides a platform for development of integrative informatics applications.