Data and Knowledge Structures

Seven large-scale public knowledge bases are "core" to the KBpedia knowledge structure. These seven sources are:[1] 1) Wikipedia - five million articles that capture the key concepts and entities of the base general knowledge encyclopedia, often including structured data and with many linkages; 2) Wikidata - structured data records for tens of millions of individual entities; 3) schema.org - an entity and property tagging vocabulary, sponsored by the leading search engines, that is used by more than 10 million Web sites; 4) DBpedia - a machine-readable version of parts of Wikipedia in RDF; 5) GeoNames - a geographical database of some 10 million places linked to about 800 distinct feature classes; 6) OpenCyc - an extract of Cyc that represents the common sense and vetted relationships amongst KBpedia's base 58,000 concepts; and 7) UNSPSC products and services - an eCommerce and logistics resource using the entire top three levels of this standard taxonomy from the UN, plus thousands of additional lowest-level products and services.

 
 

Each of these sources has been mapped and re-expressed into the single, coherent knowledge system of KBpedia. The universal categories and semiotic logic of Charles Sanders Peirce have informed the basic organization of KBpedia. The resulting KBpedia knowledge graph is split along the fundamental lines of concepts and topics, entities, events, attributes, annotations, and relations, together with their associated natural classifications or types. This combination gives KBpedia a rich set of structural components.

KBpedia is organized into a knowledge graph, the KBpedia Knowledge Ontology (KKO), with an upper structure based on this Peircean logic. KBpedia's knowledge base grammar has also been mapped into the semantic Web language of OWL. Thus, most W3C standards may be applied against the KBpedia structure.
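
As a rough illustration of the kind of processing an OWL or RDF expression allows, the sketch below stores a few statements as subject-predicate-object triples and collects the superclasses of a class by following `rdfs:subClassOf` links. The `ex:` class names and the `superclasses` helper are made up for this example; they are assumptions, not identifiers or functions taken from KKO or any KBpedia tooling.

```python
# Illustrative only: the ex: names are invented and are not actual KKO identifiers.
SUBCLASS_OF = "rdfs:subClassOf"

triples = [
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:Animal", SUBCLASS_OF, "ex:LivingThing"),
    ("ex:Dog", "rdfs:label", "dog"),  # an annotation, ignored by the traversal
]


def superclasses(node, triples):
    """Return every class reachable from node via rdfs:subClassOf links."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == SUBCLASS_OF and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


print(sorted(superclasses("ex:Dog", triples)))
# prints ['ex:Animal', 'ex:LivingThing', 'ex:Mammal']
```

In practice a standard RDF library or OWL reasoner would do this work; the plain-Python version only shows the shape of the data and the transitive nature of the subclass relation.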

 
 

The resulting combined structure brings consistency across all source knowledge bases. This "core" structure, in turn, is mapped to a further 20 of the most common ontologies and vocabularies (see here). A diversity of other knowledge structures, such as finite state transducers or specialty lists, are also used internally for efficient selections and manipulations. External and domain data are also transformed into these canonical forms for interacting with the overall structure.
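
One simple way to picture the transformation of external data into canonical forms is a lookup table from external vocabulary terms to a shared concept identifier. The sketch below assumes such a table; the `CANONICAL` mapping, the `ex:` target identifiers, and the `to_canonical` helper are all hypothetical and do not reflect KBpedia's actual mapping files or code.

```python
# Hypothetical mapping table: external vocabulary terms -> canonical concept IDs.
# The ex: targets are invented for illustration.
CANONICAL = {
    "schema:Person": "ex:Person",
    "dbo:Person": "ex:Person",
    "schema:Place": "ex:Place",
    "gn:Feature": "ex:Place",
}


def to_canonical(record):
    """Return a copy of record with its type rewritten to the canonical concept.

    Unknown types map to None; the original type is kept under 'source_type'.
    """
    external_type = record["type"]
    canonical_type = CANONICAL.get(external_type)
    return {**record, "type": canonical_type, "source_type": external_type}


records = [
    {"id": "a1", "type": "schema:Person"},
    {"id": "b2", "type": "dbo:Person"},
    {"id": "c3", "type": "other:Widget"},
]

for record in records:
    print(to_canonical(record))
```

Records from two different sources (`schema:Person` and `dbo:Person`) end up with the same canonical type, which is the consistency the combined structure is meant to provide; the unmapped record keeps its source type so it can be reviewed later.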

 
 
 
 
1. These references are based on the English versions of Wikipedia, Wikidata, DBpedia, and UNSPSC, though versions exist in up to 200 different human languages, depending on the source. Given the structure of KBpedia, any of these other languages may be substituted for or related to the standard English version. Note that until 2019 KBpedia also mapped to the upper-level UMBEL ontology, which was retired in that year. UNSPSC was added in early 2020 to v 2.50.