KBpedia Adds Major eCommerce Capabilities



CORALVILLE, IA (06/15/2020) -- KBpedia, the open-source knowledge graph that incorporates seven leading public knowledge bases, got a major upgrade today to add e-commerce and logistics to its capabilities. The enhancement comes from adding the United Nations Standard Products and Services Code as KBpedia's seventh core knowledge base. UNSPSC is a comprehensive and logically organized taxonomy for products and services, organized into four levels, with codes and third-party crosswalks to many economic and demographic data sources. It is a leading standard for many industrial, e-commerce, and logistics applications.

"This was a heavy lift for us to incorporate," said Michael Bergman, KBpedia's lead editor. "Given the time and effort involved, we decided to tackle a host of other refinements we had on our plate." Bergman noted many thousands of person-hours and more than 200 complete builds from scratch were devoted to this new version. "This release really fulfills the vision we had when we first began KBpedia's development," Bergman said. "We are excited to make broad outreach with this new version in 2020," he added. The extent of changes caused the editors to advance KBpedia's version numbering from 2.21 to 2.50.

KBpedia is a knowledge graph that provides a computable overlay for interoperating and conducting machine learning across its constituent public knowledge bases of Wikipedia, Wikidata, GeoNames, DBpedia, schema.org, OpenCyc, and, now, UNSPSC. KBpedia contains more than 58,000 reference concepts and their mappings to these knowledge bases, structured into a logically consistent knowledge graph that may be reasoned over and manipulated. KBpedia acts as a computable scaffolding over these broad knowledge bases with the twin goals of data interoperability and knowledge-based artificial intelligence (KBAI).

KBpedia is built from an expandable set of simple text 'triples' files, specified as tuples of subject-predicate-object, that define how to construct the entire knowledge graph from scratch. This process enables many syntax and logical tests, especially consistency, coherency, and satisfiability, to be invoked at build time. A build may take from one to a few hours on a commodity workstation, depending on the tests. The build process results in validated ontology (knowledge graph) files in the standard W3C OWL 2 semantic language and mappings to individual instances in the contributing knowledge bases.
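
As a rough illustration of how build-time checks over triples files can work, the Python sketch below parses a handful of subject-predicate-object lines and flags any class that inherits from two classes declared disjoint. The comma-separated layout, the predicate names (subClassOf, disjointWith), and the class names are assumptions made for this example; they are not KBpedia's actual file format or build code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input: one subject,predicate,object triple per line.
# Identifiers and predicate names are illustrative, not KBpedia's own.
TRIPLES = """\
Mammal,subClassOf,Animal
Dog,subClassOf,Mammal
Animal,disjointWith,Product
Camera,subClassOf,Product
"""


def load_triples(text):
    """Parse comma-separated lines into (subject, predicate, object) tuples."""
    return [tuple(row) for row in csv.reader(io.StringIO(text)) if row]


def ancestors(node, parents):
    """Return every class reachable from node by following subClassOf links."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(triples):
    """Find classes whose lineage contains both members of a disjoint pair."""
    parents = defaultdict(set)
    disjoint = set()
    for s, p, o in triples:
        if p == "subClassOf":
            parents[s].add(o)
        elif p == "disjointWith":
            disjoint.add(frozenset((s, o)))
    bad = []
    for cls in list(parents):
        lineage = ancestors(cls, parents) | {cls}
        if any(pair <= lineage for pair in disjoint):
            bad.append(cls)
    return sorted(bad)


if __name__ == "__main__":
    print(unsatisfiable_classes(load_triples(TRIPLES)))
```

A production build would hand such checks to an OWL 2 reasoner rather than hand-written traversal; the sketch only shows why a single misplaced assertion can fail a build and force another iteration.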

"We continue to streamline and improve our build procedures," said Fred Giasson, KBpedia's co-editor. "Major changes like what we have just gone through, be it adding a main source like UNSPSC or swapping out or adding a new SuperType, require multiple build iterations to pass the system's consistency and satisfiability checks. We need these build processes to be as easy and efficient as possible, which also was a focus of our latest efforts," he said. Giasson noted that one of the project's next major objectives is to release KBpedia's build and maintenance codes, perhaps including a Python option.

Incorporation of UNSPSC

Though UNSPSC is consistent with KBpedia's existing three-sector economic model (raw products, manufactured products, services), adding it did require structural changes throughout the system. With more than 150,000 listed products and services in UNSPSC, incorporating it had to be balanced against KBpedia's existing generality and scope. The approach was to include 100% of the top three levels of UNSPSC -- segments, families, and classes -- plus the more common and expected product and service 'commodities' in its fourth level. This design maintains balance while providing a framework to tie in any remaining UNSPSC commodities of interest to specific domains or industries. This approach led to integrating 56 segments, 412 families, 3700+ classes, and 2400+ commodities into KBpedia. Since some 1300 of these additions overlapped with existing KBpedia reference concepts, all duplicates were checked, consolidated, and reconciled.
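
The four-level structure can be read directly off a code. The Python sketch below assumes the common 8-digit UNSPSC representation, in which each successive pair of digits adds one level and higher-level codes are padded with trailing zeros. The function names are hypothetical and the sample code is used only for its digits.

```python
def unspsc_levels(code):
    """Split an 8-digit UNSPSC code into its four hierarchical prefixes."""
    digits = str(code).strip()
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected an 8-digit UNSPSC code, got {code!r}")
    return {
        "segment": digits[:2],
        "family": digits[:4],
        "class": digits[:6],
        "commodity": digits,
    }


def unspsc_depth(code):
    """Name the deepest level a zero-padded code actually specifies."""
    digits = unspsc_levels(code)["commodity"]
    if digits[2:] == "000000":
        return "segment"
    if digits[4:] == "0000":
        return "family"
    if digits[6:] == "00":
        return "class"
    return "commodity"


if __name__ == "__main__":
    print(unspsc_levels("43211503"))
    print(unspsc_depth("43211500"))
```

Under this scheme, including everything at the top three levels means keeping every code whose depth is segment, family, or class, and then selecting commodities case by case.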

All added reference concepts (RCs) were fully specified and integrated with the existing KBpedia structure, and then mapped to all of the other major contributing knowledge bases in KBpedia. Through this process, for example, the editors were able to greatly expand the coverage of UNSPSC items on Wikidata from 1000 or so Q (entity) identifiers to more than 6500. Contributing such mappings back to the community is another effort the KBpedia project will undertake next.

Other Major Refinements

These changes were broad in scope. Effecting them took time and broke open core structures. Opportunities to rebuild the structure in cleaner ways arise when the Tinkertoys get re-assembled. Some of the other major refinements the project undertook during this version upgrade were to:

  • Further analyze and refine the disjointedness between KBpedia's 70 or so typologies. Disjoint assertions are a key mechanism for sub-set selections, various machine learning tasks, querying, and reasoning
  • Increase the number of disjointedness assertions 62% over the prior version, resulting in better modularity. (However, note that the actual number of RCs affected by these improvements is lower than this percentage, since many were already specified in prior disjoint pools)
  • Add 37% more external mappings to the system (DBpedia and UNSPSC, principally)
  • Complete 100% of the definitions for RCs across KBpedia
  • Greatly expand the altLabel entries for thousands of RCs
  • Improve the naming consistency across RC identifiers
  • Further clean the structure to ensure that a given RC is specified only once to its proper parent in an inheritance (subsumption) chain, which removes redundant assertions and improves maintainability, readability, and inference efficiency
  • Expand and update the explanations within the demo of the upper KBpedia Knowledge Ontology (KKO) (see kko-demo.n3). This non-working ontology makes it easier to relate the KKO upper structure to the universal categories of Charles Sanders Peirce, which provides the basic organizational framework for KKO and KBpedia, and
  • Integrate the mapping properties for core knowledge bases within KBpedia's formal ontology (as opposed to only offering as separate mapping files); see kbpedia-reference-concepts-mappings.n3 in the distro.
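
The cleanup of redundant parent assertions listed above can be pictured with a small Python sketch. It treats a direct parent as redundant when that parent is already reachable through one of the concept's other direct parents. The class names and the dictionary layout are assumptions for illustration, and this is not the project's own tooling.

```python
def redundant_parents(parents):
    """Return (child, parent) pairs already implied through another parent.

    parents maps each class name to the set of its directly asserted parents.
    """
    def ancestors(node):
        # Collect everything reachable upward from node.
        seen, stack = set(), [node]
        while stack:
            for p in parents.get(stack.pop(), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    redundant = []
    for child, direct in parents.items():
        for parent in direct:
            others = direct - {parent}
            if any(parent in ancestors(other) for other in others):
                redundant.append((child, parent))
    return sorted(redundant)


# Hypothetical hierarchy: DigitalCamera is asserted under both Camera and
# Product, although Camera is itself already a kind of Product.
hierarchy = {
    "DigitalCamera": {"Camera", "Product"},
    "Camera": {"Product"},
}

if __name__ == "__main__":
    print(redundant_parents(hierarchy))
```

Dropping the flagged pairs leaves each concept attached only to its nearest parents, and a reasoner can still infer the removed links through the subsumption chain.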

Current Status of the Knowledge Graph

The result of these structural and scope changes was to add about 6,000 new reference concepts to KBpedia, then remove the duplicates, resulting in a total of more than 58,200 RCs in the system. This has increased KBpedia's size about 9% over the prior release. KBpedia is now structured into about 73 mostly disjoint typologies under the scaffolding of the KKO upper ontology. KBpedia has fully vetted, unique mappings (nearly all one-to-one) to these key sources:

  • Wikipedia - 53,323 (including some categories)
  • DBpedia - 44,476
  • Wikidata - 43,766
  • OpenCyc - 31,154
  • UNSPSC - 6,553
  • schema.org - 842
  • DBpedia ontology - 764
  • GeoNames - 680
  • Extended vocabularies - 249.

The mappings to Wikidata alone link to more than 40 million unique Q instance identifiers. These mappings may be found in the KBpedia distro. Most of the class mappings are owl:equivalentClass, but a minority may be subClass, superClass, or isAbout predicates as well.

KBpedia also includes about 5,000 properties, organized into a multi-level hierarchy of attributes, direct relations, and representations, most derived from Wikidata and schema.org. Exploiting these properties and sub-properties is also one of the next initiatives for KBpedia.

To Learn More

The KBpedia Web site provides a working KBpedia explorer and demo of how the system may be applied to local content for tagging or analysis. KBpedia splits between entities and concepts, on the one hand, and, on the other, splits predicates into attributes, external relations, and pointers or indexes, all informed by Charles Peirce's prescient theories of knowledge representation. Mappings to all external sources are provided in the linkages to the external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources keep their own record files, and KBpedia distributions provide only the links to them. However, you can access these entities through the KBpedia explorer on the project's Web site (see these entity examples for cameras, cakes, and canyons; clicking on any of the individual entity links will bring up the full instance record). Such reachthroughs are straightforward to construct. See the GitHub site for further downloads. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

About KBpedia

The KBpedia knowledge structure combines seven (7) public knowledge bases - Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and the UNSPSC products and services code - into an integrated whole. These core KBs are supplemented with mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia, Wikidata, or linked data content. KBpedia greatly reduces the time and effort traditionally required for knowledge-based artificial intelligence (KBAI) tasks. KBpedia was first released in October 2016 with some open source aspects, and was made fully open in 2018. KBpedia is sponsored by Cognonto Corporation.

Press Contact

Mike Bergman, Cognonto Corp.