Machine Learning

The focal objective of KBpedia is to exploit large, public knowledge bases to support artificial intelligence using both supervised and unsupervised machine learning methods. KBpedia is explicitly designed to expose rich and meaningful feature sets to support the broadest range of machine learning methods.


In supervised learning, labeled examples (positive and, often, negative) define the objective function for the machine learner. Accurate labeling is essential but expensive, because it ultimately requires manual vetting. Methods that automate as much of this effort as possible, such as providing candidate labels, preparing training sets, or evaluating results, are key to reducing overall setup costs. In distant supervision, a knowledge base informs parts of these steps; even sparse portions of the knowledge base can inform semi-supervised learning.
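To make distant supervision concrete, here is a minimal sketch of the general idea: a knowledge base lookup proposes candidate labels for raw text, which a person then spot-checks instead of labeling from scratch. It assumes a tiny hypothetical dictionary (TOY_KB) standing in for a real knowledge base, and a hypothetical helper, distant_labels; neither is part of any KBpedia API, and the whitespace tokenization is deliberately naive.

```python
# Hypothetical stand-in for a knowledge base: entity name -> type label.
# These entries are illustrative only and are not drawn from KBpedia.
TOY_KB = {
    "Paris": "City",
    "Berlin": "City",
    "Danube": "River",
    "Rhine": "River",
}


def distant_labels(sentences, kb, target_type):
    """Return (sentence, label) candidate pairs derived from KB lookups.

    A sentence is a positive candidate (1) if it mentions an entity whose
    KB type equals target_type, a negative candidate (0) if it mentions
    only entities of other types, and is skipped if it mentions no entity
    the KB knows about (no evidence either way).
    """
    labeled = []
    for sentence in sentences:
        # Naive tokenization: drop periods and split on whitespace.
        tokens = sentence.replace(".", "").split()
        types = {kb[tok] for tok in tokens if tok in kb}
        if not types:
            continue  # no KB evidence; leave this one for manual vetting
        labeled.append((sentence, 1 if target_type in types else 0))
    return labeled


sentences = [
    "Paris hosted the summit.",
    "The Danube floods in spring.",
    "Nothing known here.",
]
print(distant_labels(sentences, TOY_KB, "City"))
```

Here the first sentence becomes a positive candidate for "City", the second a negative one, and the third is skipped. The labels are only as good as the knowledge base coverage, which is why the output is treated as candidates to vet rather than ground truth.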

KBpedia is specifically structured to enable meaningful splits across a myriad of dimensions, from entities to relations to types, any of which can be selected to create positive and negative training sets from multiple perspectives. The disjointness of the SuperTypes that organize the 58,000 reference concepts in KBpedia provides a powerful selection and testing mechanism. The coherency of KBpedia also provides a basis for logic tests that further improve accuracy, including the creation of local gold standards at acceptable cost.
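The way disjoint SuperTypes support both selection and testing can be sketched in a few lines. This is a minimal illustration under stated assumptions: the SuperType names, their members, and the disjointness assertions below are hypothetical toy data, not the actual KBpedia ontology, and training_split and disjointness_violations are illustrative helpers rather than KBpedia functions. Positives are drawn from a target SuperType, negatives only from SuperTypes asserted disjoint with it, and a simple logic test flags any item assigned to two disjoint SuperTypes.

```python
from itertools import combinations

# Hypothetical SuperType memberships and disjointness assertions; the names
# are illustrative and are not taken from the actual KBpedia ontology.
MEMBERS = {
    "Animals": {"dog", "eagle", "salmon"},
    "Places": {"paris", "alps"},
    "Products": {"laptop", "bicycle"},
}
DISJOINT = {
    frozenset({"Animals", "Places"}),
    frozenset({"Animals", "Products"}),
    frozenset({"Places", "Products"}),
}


def training_split(target, members, disjoint):
    """Positives come from the target SuperType; negatives come only from
    SuperTypes asserted disjoint with it, so none can be a hidden positive."""
    positives = set(members[target])
    negatives = set()
    for other, items in members.items():
        if other != target and frozenset({target, other}) in disjoint:
            negatives |= items
    return positives, negatives


def disjointness_violations(members, disjoint):
    """Logic test: list (item, type_a, type_b) for every item assigned to
    two SuperTypes that are declared disjoint."""
    violations = []
    for a, b in combinations(sorted(members), 2):
        if frozenset({a, b}) in disjoint:
            for item in sorted(members[a] & members[b]):
                violations.append((item, a, b))
    return violations


pos, neg = training_split("Animals", MEMBERS, DISJOINT)
print(sorted(pos), sorted(neg))
print(disjointness_violations(MEMBERS, DISJOINT))
```

The design choice worth noting is that SuperTypes not asserted disjoint with the target are never mined for negatives, since they may legitimately share members with it. Running the violation check after adding, say, "paris" to Products would report ("paris", "Places", "Products"), which is the kind of coherency test that helps catch labeling errors before training.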

However, with its extremely rich feature sets across all aspects of the knowledge base, KBpedia is also an excellent basis for unsupervised learning. It is often advisable to include some initial unsupervised learning within a broader supervised learning workflow.