The core structure for KBpedia is derived from seven (7) main knowledge bases: Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and the UNSPSC products and services taxonomy. The conceptual relationships in the KBpedia Knowledge Ontology (KKO) are largely drawn from OpenCyc, though any of the other sources may contribute local knowledge graph structure. Additional reference concepts (RCs) are contributed primarily from Wikipedia, schema.org, GeoNames, and UNSPSC. Wikidata contributes the bulk of the instance data, though instance records are drawn from all sources. DBpedia, Wikidata, and schema.org are also the primary sources for attribute characterizations of the instances.
Two characteristics define a core contributor to the KBpedia structure: 1) the scale and completeness of the source; and 2) its contribution of a large number of RCs to the overall KKO knowledge graph. Here is more discussion of these seven core knowledge bases:
Wikipedia | Wikipedia is a free Internet encyclopedia written and maintained by volunteer editors. Wikipedia is the largest and most popular general reference work on the Internet and is ranked among the ten most popular websites. It has more than 200 language versions; the English version has more than 5 million articles. The English Wikipedia is the main source of reference concept content in KBpedia and the source of most text. Though local relationships may be used, KBpedia replaces the Wikipedia category structure with its own KBpedia Knowledge Ontology (KKO). Concept and entity linkages (via Wikipedia articles) provide the means for adopting multiple language versions of KBpedia. |
Wikidata | Wikidata is a collaboratively edited knowledge base of (mostly) entity records, intended to provide a common source of data that can be used by Wikimedia projects such as Wikipedia. There are more than 30 million records in Wikidata, which are also linked to key Wikipedia articles. Many are also linked to Wikimedia images. Wikidata is the primary source of entity and attribute data and images in KBpedia. Most Wikidata records are multilingual, which also provides the means for adopting multiple language versions of KBpedia. |
schema.org | schema.org is a collaborative, community vocabulary with a mission to create, maintain, and promote schemas for structured data on the Internet in Web pages and email. The schema.org controlled vocabulary covers entities and relationships between entities and actions; it has a well-documented extension model for enlarging the vocabulary for domain-specific purposes. There are more than 800 types in the vocabulary. Over 10 million sites use schema.org to mark up their content. schema.org can be used with many different encodings, including RDFa, Microdata and JSON-LD. schema.org was founded by the major search engines Google, Microsoft, Yahoo and Yandex; these companies and many others use it to power rich, extensible applications. |
DBpedia | DBpedia is a project that extracts structured content from the infoboxes on Wikipedia and makes it available using semantic technologies. DBpedia is the source of much of the structured information for Wikipedia concepts and entities contained within KBpedia. The separate DBpedia ontology is also mapped to KBpedia as an external linkage. |
GeoNames | The GeoNames knowledge base contains over 10,000,000 geographical names corresponding to over 7,500,000 unique features. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes. Beyond names of places in various languages, the data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes. All coordinates use the World Geodetic System 1984 (WGS84). The GeoNames feature codes were used to organize all of the geographical entity types (geopolitical, settlements, natural areas, and developed areas) in KBpedia. |
OpenCyc | Cyc is an artificial intelligence project that combines a comprehensive ontology with a knowledge base of everyday common sense. OpenCyc is an open source part of Cyc with more than 300,000 concepts. Cyc has been manually developed since 1984 by Cycorp of Austin, TX, and refined through more than 1,000 person-years of effort. OpenCyc provides most of the starting reference concepts and relationships used in the KBpedia Knowledge Ontology (KKO) knowledge graph, especially in the areas of entity types and in the KBpedia typologies. Though OpenCyc is English only, KKO is multilingual, using the Cyc concepts as a language-neutral backbone. As of March 2017, OpenCyc has been removed from the public domain, though thousands of earlier copies remain. |
UNSPSC | The United Nations Standard Products and Services Code (UNSPSC) is a taxonomy of products and services for use in eCommerce and logistics. It is a four-level hierarchy (segment - family - class - commodity) with each entry coded as an eight-digit number, with an optional fifth level adding two more digits. UNSPSC segments split fairly cleanly into three main economic sectors: primary (raw) products; secondary (manufactured) products; and tertiary services. All three UNSPSC upper levels are included in KBpedia, as well as thousands of the most common commodity products and services at the fourth level. KBpedia uses version 20, which is highly consistent in its upper three levels with the latest version (22) of the UNSPSC. |
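
The UNSPSC coding scheme described above can be illustrated with a minimal Python sketch. It assumes that each of the four levels occupies two digits of the eight-digit code, and that a parent entry is written by zero-padding its prefix; both are assumptions made for illustration, not statements about how KBpedia processes UNSPSC. The names `UnspscCode`, `parse_unspsc`, and `ancestor_codes` are hypothetical, and the sample code `43211503` is used only for its digit pattern.

```python
from typing import List, NamedTuple, Optional


class UnspscCode(NamedTuple):
    """Hypothetical container for the levels of one UNSPSC code."""
    segment: str
    family: str
    class_: str
    commodity: str
    # Optional fifth level (two extra digits), absent on eight-digit codes.
    fifth_level: Optional[str] = None


def parse_unspsc(code: str) -> UnspscCode:
    """Split an 8- or 10-digit code into two-digit levels (assumed layout)."""
    digits = code.strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (8, 10):
        raise ValueError(f"expected 8 or 10 digits, got {code!r}")
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return UnspscCode(*pairs)


def ancestor_codes(code: str) -> List[str]:
    """Return the segment, family, class, and commodity codes for `code`.

    Assumes parent entries are the shared prefix padded with zeros
    to eight digits.
    """
    levels = parse_unspsc(code)[:4]
    return ["".join(levels[:depth]).ljust(8, "0") for depth in range(1, 5)]


if __name__ == "__main__":
    print(parse_unspsc("43211503"))
    print(ancestor_codes("43211503"))
```

Under these assumptions, the first three entries returned by `ancestor_codes` correspond to the upper three levels that KBpedia includes in full, and the last entry is the fourth-level commodity.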