Data Integration

KBpedia is a computable scaffolding designed to be extended with your own domain schema, vocabularies, and data. The basic process has three steps: decide which portions of KBpedia (or all of it) to use as a starting foundation; map in your own schema and vocabularies to create your own extended knowledge graph; and then incorporate your own instance data, leveraging attributes and relations to automate best placements. Some ETL and staging are generally necessary for the data migration.
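To make the idea of attribute-driven placement concrete, here is a minimal sketch in Python. It is not KBpedia's actual placement method: the class names, the attribute lists, and the choice of Jaccard overlap as the scoring measure are all illustrative assumptions. The sketch only shows how the attributes present on an instance record can suggest a candidate class in the knowledge graph.

```python
from typing import Dict, Set, Tuple

# Hypothetical reference classes and the attributes typically found on
# their instances. A real deployment would derive these from the graph.
CLASS_ATTRIBUTES: Dict[str, Set[str]] = {
    "Person": {"name", "birthDate", "nationality"},
    "Organization": {"name", "foundingDate", "headquarters"},
    "Place": {"name", "latitude", "longitude"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of attributes common to both sets, relative to their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_placement(record: Dict[str, object]) -> Tuple[str, float]:
    """Return the candidate class whose attributes best overlap the record's."""
    attrs = set(record)
    scores = {
        cls: jaccard(attrs, expected)
        for cls, expected in CLASS_ATTRIBUTES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Example instance record (invented data, for illustration only).
record = {
    "name": "Acme Corp",
    "foundingDate": "1947-03-01",
    "headquarters": "Toledo",
}
placement = best_placement(record)
```

The returned score can serve as a rough confidence value when deciding whether a placement needs human review. A fuller approach would also weigh relations and attribute values, not just attribute names.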

The largely automatic placements are then followed by a semi-automatic review of final assignments before they are committed for deployment. These processes need to be embedded in a repeatable workflow with appropriate governance controls. Machine learning tests or other analytics may be inserted at multiple points to speed processing and increase scope. Each version needs to be tested for coherence and logical consistency before deployment. All of these steps need to be assisted by processing scripts. The overall workflows and scripts are documented using notebooks such as org-mode or Jupyter.
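One way to picture the split between automatic placement and semi-automatic review is a simple triage step, sketched below in Python. The `Placement` record, the `triage` function, and the 0.8 threshold are hypothetical names and values chosen for illustration; they are not part of KBpedia. High-scoring placements are accepted, and the rest are queued for a human reviewer.

```python
from typing import List, NamedTuple, Tuple


class Placement(NamedTuple):
    """A proposed assignment of an instance to a class (hypothetical shape)."""
    instance_id: str
    target_class: str
    score: float


def triage(
    placements: List[Placement], threshold: float = 0.8
) -> Tuple[List[Placement], List[Placement]]:
    """Split placements into auto-accepted and needs-review lists.

    The threshold is an assumed tuning parameter, not a recommended value.
    """
    accepted = [p for p in placements if p.score >= threshold]
    review = [p for p in placements if p.score < threshold]
    return accepted, review


# Invented sample data for illustration.
proposed = [
    Placement("rec-001", "Organization", 0.95),
    Placement("rec-002", "Person", 0.55),
    Placement("rec-003", "Place", 0.80),
]
accepted, review = triage(proposed)
```

Coherence and logical consistency testing of the resulting graph would normally be a separate step, for example with an OWL reasoner, and is not covered by this sketch.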

Some of the current use cases relevant to data integration are: