'Core' Knowledge Bases

The core structure for KBpedia is derived from seven main knowledge bases: Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL. The conceptual relationships in the KBpedia Knowledge Ontology (KKO) are largely drawn from OpenCyc and UMBEL, though any of the other sources may contribute local knowledge graph structure. Additional reference concepts (RCs) are contributed primarily by Wikipedia, schema.org and GeoNames. Wikidata contributes the bulk of the instance data, though instance records are drawn from all sources. DBpedia, Wikidata, and schema.org are also the primary sources for attribute characterizations of the instances.
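The division of labor described above can be summarized as a small lookup table. The following is only a minimal sketch: the role labels are transcribed from the paragraph, while the dictionary layout and the helper name `roles_for` are illustrative assumptions and not part of any KBpedia code or API.

```python
# Primary roles per source, transcribed from the paragraph above.
# The structure and names here are hypothetical, for illustration only.
SOURCE_ROLES = {
    "conceptual relationships": ["OpenCyc", "UMBEL"],
    "reference concepts": ["Wikipedia", "schema.org", "GeoNames"],
    "instance data": ["Wikidata"],
    "attribute characterizations": ["DBpedia", "Wikidata", "schema.org"],
}


def roles_for(source):
    """Return the primary roles the text assigns to a given source."""
    return [role for role, sources in SOURCE_ROLES.items() if source in sources]


# Collect every source named in at least one role.
ALL_SOURCES = sorted({s for sources in SOURCE_ROLES.values() for s in sources})

print(roles_for("Wikidata"))
print(len(ALL_SOURCES))
```

The table records only the primary contributions; as the text notes, any source may also add local graph structure or instance records.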

Two characteristics define a core contributor to the KBpedia structure: 1) the scale and completeness of the source; and 2) its contribution of a large number of RCs to the overall KKO knowledge graph. Each of the seven core knowledge bases is discussed in more detail here:

Wikipedia: Wikipedia is a free Internet encyclopedia written and maintained by volunteer editors. Wikipedia is the largest and most popular general reference work on the Internet and is ranked among the ten most popular websites. It has more than 200 language versions; the English version has more than 5 million articles. The English Wikipedia is the main source of reference concept content in KBpedia and the source of most text. Though local relationships may be used, KBpedia replaces the Wikipedia category structure with its own KBpedia Knowledge Ontology (KKO). Concept and entity linkages (via Wikipedia articles) provide the means for adopting multiple language versions of KBpedia.
Wikidata: Wikidata is a collaboratively edited knowledge base of (mostly) entity records, intended to provide a common source of data that can be used by Wikimedia projects such as Wikipedia. There are more than 30 million records in Wikidata, which are also linked to key Wikipedia articles. Many are also linked to Wikimedia images. Wikidata is the primary source of entity and attribute data and images in KBpedia. Most Wikidata records are multilingual, which also provides the means for adopting multiple language versions of KBpedia.
schema.org: schema.org is a collaborative, community vocabulary with a mission to create, maintain, and promote schemas for structured data on the Internet, in Web pages and in email. The schema.org controlled vocabulary covers entities, relationships between entities and actions; it has a well-documented extension model for enlarging the vocabulary for domain-specific purposes. There are more than 800 types in the vocabulary. Over 10 million sites use schema.org to mark up their content. schema.org can be used with many different encodings, including RDFa, Microdata and JSON-LD. It was founded by the major search engines Google, Microsoft, Yahoo and Yandex, which, along with many others, use schema.org to power rich, extensible applications.
DBpedia: DBpedia is a project that extracts structured content from the infoboxes on Wikipedia and makes it available using semantic technologies. DBpedia is the source of much of the structured information for Wikipedia concepts and entities contained within KBpedia. The separate DBpedia ontology is also mapped to KBpedia as an external linkage.
GeoNames: The GeoNames knowledge base contains over 10,000,000 geographical names corresponding to over 7,500,000 unique features. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes. Beyond names of places in various languages, the data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes. All coordinates use the World Geodetic System 1984 (WGS84). GeoNames' feature codes were used to organize all of the geographical entity types (geopolitical, settlements, natural areas, and developed areas) in KBpedia.
OpenCyc: Cyc is an artificial intelligence project that combines a comprehensive ontology with a knowledge base of everyday common sense. OpenCyc is an open source part of Cyc with more than 300,000 concepts. Cyc has been manually developed since 1984 by Cycorp of Austin, TX. It has been developed and refined through more than 1,000 person-years of effort. OpenCyc provides most of the starting reference concepts and relationships used in the KBpedia Knowledge Ontology (KKO), especially in the areas of entity types and in the KBpedia typologies. Though OpenCyc is English only, KKO is multilingual, using the Cyc concepts as a language-neutral backbone. As of March 2017, OpenCyc has been withdrawn from public distribution, though thousands of earlier copies remain.
UMBEL: UMBEL (Upper Mapping and Binding Exchange Layer) is a logically organized knowledge graph used to aid data interoperability. UMBEL promotes the semantic interoperability of information in two ways. It is 1) an ontology of about 35,000 reference concepts, designed to provide common mapping points for relating different ontologies or schema to one another, and 2) a vocabulary for making such mappings, particularly to external sources. The UMBEL design, which separates relations, attributes, concepts and entity types (and typologies), is a key intellectual underpinning of the KBpedia system.
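
To make the schema.org entry above more concrete, here is a minimal sketch of what a schema.org description looks like in the JSON-LD encoding. The types and properties used (`Place`, `GeoCoordinates`, `name`, `geo`, `latitude`, `longitude`) are terms from the public schema.org vocabulary; the helper name `make_jsonld` and the coordinate values are illustrative assumptions and do not come from KBpedia.

```python
import json


def make_jsonld(schema_type, **properties):
    """Build a schema.org JSON-LD document as a plain dict.

    Hypothetical helper for illustration; not part of any schema.org tooling.
    """
    doc = {"@context": "https://schema.org", "@type": schema_type}
    doc.update(properties)
    return doc


# Approximate, illustrative coordinates (WGS84 decimal degrees).
place = make_jsonld(
    "Place",
    name="Austin",
    geo={"@type": "GeoCoordinates", "latitude": 30.27, "longitude": -97.74},
)

# Serialize to the JSON text that would be embedded in a Web page.
print(json.dumps(place, indent=2))
```

The same description could equally be expressed in RDFa or Microdata; JSON-LD is used here only because it maps directly onto a Python dictionary.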