Wikidata Coverage Nearly Complete (98%)

04/08/2019


CORALVILLE, IA (04/08/2019) -- Cognonto Corporation today released v 2.10 of KBpedia, which, the company says, extends KBpedia's mappings to more than 98% of Wikidata instances and markedly improves their quality. The developers also note that coverage has increased to very high levels for other aspects of structure and properties within Wikidata. The developers reported that they manually inspected all 45,000 mappings of KBpedia reference concepts to Wikidata instances, resulting in many changes and improvements. Cognonto claims the quality of mappings in KBpedia has never been higher.

KBpedia is an open-source, computable knowledge graph that sits astride Wikipedia, Wikidata, and other leading knowledge bases. Its baseline 55,000 reference concepts provide a flexible and expandable means for relating data records to a common basis for reasoning and inferring logical relations, and for mapping to virtually any external data source or schema. The framework is a clean starting basis for doing knowledge-based artificial intelligence (KBAI) and for training and using virtual agents.

The company reported that almost all work on KBpedia v 2.10 focused on Wikidata, although, given the close alliance between Wikidata and Wikipedia, many changes were also reflected in the Wikipedia mappings. As noted with the v 2.00 release, this new version began by mapping Q items (IDs) that have broad instance coverage but were missing from prior mappings. This effort resulted in a net addition of 973 Q IDs to KBpedia. That number understates the changes, however, since the manual inspection phases removed many duplicates (approx. 2,100) and upgraded earlier mappings to category Q IDs (approx. 2,700) to their more specific instance Q IDs. Thus, nearly 6,000 Q IDs differ between this version and the prior version 2.00. Since many of the Q IDs also map directly to a Wikipedia counterpart, those mappings were updated as well. Apart from incidental improvements to definitions, linkages, and labels that arose during these inspections, and which were addressed whenever encountered, no other major changes were made in this release.

KBpedia is now in very good shape with respect to the mapping and coverage of Wikidata (with a similar profile for Wikipedia). Across a breadth of measures, Wikidata coverage is high (see the related blog post for implementation documentation):


Wikidata Item   No. Items  No. Mapped Items  Coverage
Q IDs          45,306,576            45,882      0.1%
Q instances    45,306,576        44,458,015     98.1%
Q classes       2,493,795         2,312,116     92.7%
Properties          5,910             3,970     67.2%
P Statements  256,298,963       246,055,199     96.0%
P Qualifiers   38,866,255        31,756,937     81.7%
P References   24,582,259        20,121,794     81.9%
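The Coverage column in this table is the number of mapped items divided by the total number of Wikidata items in each row. As a quick check, the short Python sketch below recomputes each percentage from the reported counts; the function name coverage_pct is an illustrative choice, not part of any KBpedia tooling.

```python
# Counts as reported in the KBpedia v 2.10 coverage table:
# row name -> (total Wikidata items, KBpedia-mapped items)
COVERAGE_TABLE = {
    "Q IDs": (45_306_576, 45_882),
    "Q instances": (45_306_576, 44_458_015),
    "Q classes": (2_493_795, 2_312_116),
    "Properties": (5_910, 3_970),
    "P Statements": (256_298_963, 246_055_199),
    "P Qualifiers": (38_866_255, 31_756_937),
    "P References": (24_582_259, 20_121_794),
}


def coverage_pct(total: int, mapped: int) -> float:
    """Return mapped/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * mapped / total, 1)


if __name__ == "__main__":
    # Print each row's recomputed coverage, aligned for readability.
    for name, (total, mapped) in COVERAGE_TABLE.items():
        print(f"{name:<14} {coverage_pct(total, mapped):5.1f}%")
```

Each recomputed value matches the Coverage column to one decimal place.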

One of the first observations that jumps out of the table is how relatively few mappings (~45 K, or 0.1% of Q IDs) are sufficient to capture nearly all (98%) of the instances contained in Wikidata. This is because a Q ID may be either an individual instance or a parent of multiple instances. The KBpedia mappings focus on the parents, through which the individual instances may be obtained. By virtue of the additions and Q mapping improvements in this version, KBpedia has expanded its instance reach from about 30 million entities to about 45 million.
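To illustrate why a small number of parent mappings can reach most instances, the minimal Python sketch below assumes that instances hang off their parent Q IDs through Wikidata's "instance of" (P31) statements and that only the parents are mapped. All Q IDs and the reachable_instances helper are hypothetical placeholders; this illustrates the idea rather than KBpedia's actual build process.

```python
from collections import defaultdict

# Hypothetical toy data: (instance, parent) pairs standing in for
# Wikidata "instance of" (P31) statements. IDs are placeholders.
INSTANCE_OF = [
    ("Q_cam1", "Q_camera"),
    ("Q_cam2", "Q_camera"),
    ("Q_cake1", "Q_cake"),
    ("Q_canyon1", "Q_canyon"),
    ("Q_misc1", "Q_unmapped"),
]

# Hypothetical mappings: only parent Q IDs are mapped, never instances.
MAPPED_PARENTS = {"Q_camera", "Q_cake", "Q_canyon"}


def reachable_instances(instance_of, mapped_parents):
    """Collect the instances whose P31 parent is one of the mapped Q IDs."""
    by_parent = defaultdict(set)
    for instance, parent in instance_of:
        by_parent[parent].add(instance)
    reached = set()
    for parent in mapped_parents:
        reached |= by_parent.get(parent, set())
    return reached


reached = reachable_instances(INSTANCE_OF, MAPPED_PARENTS)
all_instances = {instance for instance, _ in INSTANCE_OF}
print(len(MAPPED_PARENTS), "mappings reach", len(reached),
      "of", len(all_instances), "instances")
```

Here three parent mappings reach four of the five toy instances; the instance under the unmapped parent stays unreached, which is loosely analogous to the small uncovered remainder in the real data.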

Another observation is that KBpedia now also captures a large portion (93%) of the structure of Wikidata, by virtue of its mappings to Q IDs with significant subClassOf (P279) connections, which is where the knowledge base's taxonomy is defined. A third observation is that KBpedia covers the use of Wikidata properties at similarly high levels (96% of property statements and about 82% of qualifiers and references), although only 67% of the properties themselves are mapped. According to the editors, Michael Bergman and Fred Giasson, properties remain the least developed area of KBpedia with respect to use cases and cross-knowledge base mappings.
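Because the Wikidata taxonomy lives in the subClassOf (P279) links, one way to picture how a mapped class carries structure with it is to walk P279 edges downward from that class. The sketch below does this with a breadth-first search over hypothetical toy Q IDs; the descendants helper is illustrative only, and this is not presented as the method the editors used to compute the 93% figure.

```python
from collections import defaultdict, deque

# Hypothetical toy "subclass of" (P279) edges: (subclass, superclass).
SUBCLASS_OF = [
    ("Q_digital_camera", "Q_camera"),
    ("Q_slr_camera", "Q_camera"),
    ("Q_dslr", "Q_digital_camera"),
    ("Q_layer_cake", "Q_cake"),
]


def descendants(subclass_of, roots):
    """Return the roots plus every class below them via P279 edges (BFS)."""
    children = defaultdict(list)
    for sub, sup in subclass_of:
        children[sup].append(sub)
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:  # guard against revisits and cycles
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(descendants(SUBCLASS_OF, {"Q_camera"})))
```

Mapping the single toy class Q_camera brings its whole subtree (digital cameras, SLR cameras, DSLRs) into reach, which is why mapping well-connected classes captures a disproportionate share of the taxonomy.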

The KBpedia Web site provides a working KBpedia explorer and a demo of how the system may be applied to local content for tagging or analysis. KBpedia distinguishes between entities and concepts, on the one hand, and, on the other, splits its predicates into attributes, external relations, and pointers or indexes, all informed by Charles Peirce's prescient theories of knowledge representation. Mappings to all external sources are provided in the linkages to external resources file in the KBpedia downloads. (A larger inferred version is also available.) The external sources maintain their own record files; KBpedia distributions provide the links to them. You can, however, access these entities through the KBpedia explorer on the project's Web site (see these entity examples for cameras, cakes, and canyons; clicking on any individual entity link will bring up the full instance record. Such reachthroughs are straightforward to construct.) See the GitHub site for further downloads. All resources are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
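Assuming the reachthrough target is Wikidata's own public record for a mapped Q ID (the KBpedia explorer's URL scheme is not assumed here), such links can be built from the Q ID alone. The Python sketch below uses Wikidata's standard entity page URL and its Special:EntityData JSON endpoint; the wikidata_links function name is hypothetical.

```python
import re

# Wikidata item IDs are "Q" followed by a positive integer, e.g. Q42.
_QID = re.compile(r"Q[1-9][0-9]*")


def wikidata_links(qid: str) -> dict:
    """Build page and JSON reachthrough URLs for one Wikidata item ID."""
    if not _QID.fullmatch(qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return {
        # Human-readable entity page.
        "page": f"https://www.wikidata.org/wiki/{qid}",
        # Machine-readable record via the Special:EntityData endpoint.
        "json": f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json",
    }


print(wikidata_links("Q42")["page"])
```

The function only formats strings and makes no network calls, so it can be applied to every Q ID in a mappings file before any records are fetched.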

About KBpedia

The KBpedia knowledge structure combines seven (7) public knowledge bases (Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL) into an integrated whole. These core KBs are supplemented with mappings to more than a score of additional leading vocabularies. The entire KBpedia structure is computable, meaning it can be reasoned over and logically sliced-and-diced to produce training sets and reference standards for machine learning and data interoperability. KBpedia provides a coherent overlay for retrieving and organizing Wikipedia or Wikidata content. KBpedia greatly reduces the time and effort traditionally required for knowledge-based artificial intelligence (KBAI) tasks. KBpedia was first released in October 2016 with some open-source aspects; the remaining restrictions have since been removed. KBpedia is sponsored by Cognonto Corporation.

Press Contact

Mike Bergman, Cognonto Corp.
1-319-339-0650
mike@cognonto.com