A computer-implemented method integrates data from remote, disparate data sources by executing instructions stored on a non-transitory medium. The instructions detect data sets in different formats hosted in a plurality of heterogeneous databases that are accessible through a distributed network. The method extracts schema data from the plurality of heterogeneous databases and identifies related fields in two or more of the heterogeneous databases. The method links the related fields in the two or more heterogeneous databases and makes the data accessible through a virtual warehouse. As schemas change and as new data sources and analysis artifacts are created, the computer-implemented method and system can act as a metadata store, a provenance tracking device, and/or a knowledge management service.
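By way of illustration only, the schema extraction and field linking steps summarized above may be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: it uses two in-memory SQLite databases as stand-ins for the heterogeneous databases, and it treats two fields as related when their names match after lowercasing and removing underscores. The function names `extract_schema`, `normalize`, `find_related_fields`, and `demo`, and the sample tables, are hypothetical.

```python
import sqlite3
from itertools import combinations


def extract_schema(conn):
    """Return {table: [column, ...]} for one SQLite connection."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in cols]
    return schema


def normalize(name):
    """Assumed matching rule: ignore case and underscores in field names."""
    return name.lower().replace("_", "")


def find_related_fields(schemas):
    """Link fields across every pair of sources whose normalized names match."""
    links = []
    for (src_a, schema_a), (src_b, schema_b) in combinations(sorted(schemas.items()), 2):
        for table_a, cols_a in schema_a.items():
            for col_a in cols_a:
                for table_b, cols_b in schema_b.items():
                    for col_b in cols_b:
                        if normalize(col_a) == normalize(col_b):
                            links.append(
                                ((src_a, table_a, col_a), (src_b, table_b, col_b))
                            )
    return links


def demo():
    """Build two hypothetical sources and return the links found between them."""
    clinic = sqlite3.connect(":memory:")
    clinic.execute("CREATE TABLE patients (patient_id INTEGER, name TEXT)")
    lab = sqlite3.connect(":memory:")
    lab.execute("CREATE TABLE results (PatientID INTEGER, assay TEXT, value REAL)")
    schemas = {"clinic": extract_schema(clinic), "lab": extract_schema(lab)}
    return find_related_fields(schemas)


if __name__ == "__main__":
    for left, right in demo():
        print(left, "<->", right)
```

In this sketch the list of links plays the role of the metadata held by the virtual warehouse; a fuller system would presumably also compare data types and sample values before linking fields, which is omitted here.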
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT
The invention was made with United States government support under Contract No. DE-AC05-00OR22725 awarded by the United States Department of Energy. The United States government has certain rights in the invention.