Data and Knowledge Structures

Seven large-scale public knowledge bases are "core" to the KBpedia knowledge structure. These seven sources are:[1] 1) Wikipedia - five million articles that capture the key concepts and entities of the base general knowledge encyclopedia, often with structured data and many linkages; 2) Wikidata - structured data records for tens of millions of individual entities; 3) schema.org - an entity and property tagging vocabulary, sponsored by the leading search engines, that is used by more than 10 million Web sites; 4) DBpedia - a machine-readable version of parts of Wikipedia in RDF; 5) GeoNames - a geographical database of some 10 million places linked to about 800 distinct feature classes; 6) OpenCyc - an extract of Cyc that represents the common sense and vetted relationships amongst KBpedia's base 55,000 concepts; and 7) UMBEL - the initial organizational structure for the knowledge graph.


Each of these sources has been mapped and re-expressed into the single, coherent knowledge system of KBpedia. The universal categories and semiotic logic of Charles Sanders Peirce have informed the basic organization of KBpedia. The resulting KBpedia knowledge graph is split along the fundamental lines of concepts and topics, entities, events, attributes, annotations, and relations, together with their associated natural classifications or types. This combination gives KBpedia a rich set of structural components.

KBpedia is organized by a knowledge graph, the KBpedia Knowledge Ontology (KKO), whose upper structure is based on this Peircean logic. KBpedia's knowledge base grammar has also been mapped into OWL, the semantic Web language. Thus, most W3C standards may be applied against the KBpedia structure.
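Because the structure is expressed in OWL, it can be viewed as a set of subject-predicate-object statements over which class hierarchies are traversed. The following is a minimal sketch in plain Python, not KBpedia's actual tooling: the concept names, the triples, and the helpers `build_index`, `superclasses`, and `inferred_types` are all hypothetical, chosen only to illustrate how types propagate up a subclass hierarchy.

```python
from collections import defaultdict

# Hypothetical triples for illustration only; these are NOT actual
# KBpedia identifiers or relations.
TRIPLES = [
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Animal", "subClassOf", "Thing"),
    ("Fido", "type", "Dog"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for quick lookup."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def superclasses(index, cls):
    """Return every ancestor of cls reachable through subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in index.get((current, "subClassOf"), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(index, entity):
    """Return the entity's direct types plus all of their ancestors."""
    direct = index.get((entity, "type"), set())
    result = set(direct)
    for cls in direct:
        result |= superclasses(index, cls)
    return result


INDEX = build_index(TRIPLES)
```

In this toy graph, `inferred_types(INDEX, "Fido")` includes `Dog` and every class above it. A standards-based RDF/OWL toolkit would perform the same kind of inference over the real ontology.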


The resulting combined structure brings consistency across all source knowledge bases. This "core" structure, in turn, is mapped to a further 20 of the most common ontologies and vocabularies (see here). A variety of other knowledge structures, such as finite state transducers or specialty lists, are also used internally for efficient selections and manipulations. External and domain data are also transformed into these canonical forms for interacting with the overall structure.
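The idea of transforming external data into a canonical form can be sketched as a lookup from a (vocabulary, term) pair to a single core concept. This is an illustrative sketch only: the mapping table, the canonical concept names, the record layout, and the `to_canonical` function are assumptions made for the example, not KBpedia's mapping files or API.

```python
# Hypothetical mapping from (vocabulary, external term) to a canonical
# concept name. The canonical names are invented for this example.
MAPPINGS = {
    ("schema", "Person"): "Person",
    ("foaf", "Person"): "Person",
    ("geonames", "P.PPL"): "PopulatedPlace",
}


def to_canonical(records, mappings):
    """Split records into those mapped to a canonical concept and the rest."""
    mapped, unmapped = [], []
    for record in records:
        key = (record["vocab"], record["term"])
        concept = mappings.get(key)
        if concept is None:
            # No known mapping: keep the record for later review.
            unmapped.append(record)
        else:
            mapped.append({"label": record["label"], "concept": concept})
    return mapped, unmapped


# Example external records (hypothetical data).
RECORDS = [
    {"vocab": "schema", "term": "Person", "label": "Ada Lovelace"},
    {"vocab": "foaf", "term": "Person", "label": "Alan Turing"},
    {"vocab": "geonames", "term": "P.PPL", "label": "Iowa City"},
    {"vocab": "custom", "term": "Widget", "label": "Part 42"},
]

MAPPED, UNMAPPED = to_canonical(RECORDS, MAPPINGS)
```

Records from different vocabularies that denote the same kind of thing end up under one concept, while unmapped records are set aside rather than guessed at.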

[1] These references are based on the English versions of Wikipedia, Wikidata and DBpedia, though there are versions in up to 200 different human languages. Given the structure of KBpedia, any of these other languages may be substituted or related to the standard English version.