USE CASE |
|
Title: | Browsing the KBpedia Knowledge Graph |
Short Description: |
We describe how to browse the KBpedia knowledge graph
|
Problem: |
We want to understand the coverage and linkages within a
knowledge graph (also called an ontology)
|
Approach: | KBpedia is a computable structure, logically coherent and structurally consistent. The online knowledge graph showcases many of these structural aspects -- concepts, entities, attributes, typologies and aspects -- and also supports direct and inferential browsing of the structure. This online example is not necessarily what may be used by specific clients. The first difference, of course, is that most clients need to expand or bridge off of KBpedia with their own domain schema and data. The second difference is that the look-and-feel and some functionality may be modified for specific uses. |
Key Findings: |
The uses for browsing a knowledge graph include:
These uses do not include the work-related tasks in natural language processing or knowledge-based artificial intelligence discussed under other use cases. |
The example we present herein is based on the concept of ‘currency’, which you may interactively inspect for yourself online.
The KBpedia knowledge structure combines six major knowledge bases and maps to about 20 other common vocabularies. KBpedia contains more than 53,000 reference concepts (RCs), organized into a knowledge graph as defined by the KBpedia Knowledge Ontology (KKO). KKO is a logically organized and computable structure that supports inference and reasoning.
About 85% of the RCs are themselves entity types — that is, 47,000 natural classes of similar entities such as astronauts or zoo animals — that are organized into about 30 “core” typologies that are mostly disjoint (non-overlapping) with one another. By definition an entity type is also a ‘reference concept’, or RC.
KBpedia’s typologies provide a powerful means for slicing-and-dicing the knowledge structure. The individual entity types provide the tie-in points to about 32 million individual entities. The remaining RCs are devoted to other logical divisions of the knowledge graph, specifically attributes, relations and topics. It is this structure, often together with connections to about 20 leading external vocabularies, that forms the basis of the KBpedia Knowledge Graph.
For the standard RC, each concept has a record with potentially eight (8) main panels or sections, each of which is described below:
Panels are only displayed when there are results for them.
Each entry begins with a header:
Above the header to the left is the listing for the current KBpedia version and its date of release. Next to it is a link for sending an email to a graph administrator should there be a problem with the current entry. Above the header to the right is the search box, itself the topic of another application case.
The Header consists of these possible entries:
http://kbpedia.org/kko/rc/Currency
If there is an image for the RC, it is also displayed.
The Core Structure for KBpedia is the next panel. Two characteristics define what is a core contributor to the KBpedia structure: 1) the scale and completeness of the source; and 2) its contribution of a large number of RCs to the overall KKO knowledge graph. The KBs in the core structure play a central role in the scope and definition of KBpedia. This core structure of KBpedia is supplemented by mappings to about 20 additional external linkages, which are highly useful for interoperability purposes, but do not themselves contribute as much to the RC scope of the KKO graph. The Core Structure is derived from the six (6) main knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, and OpenCyc.
The conceptual relationships in the KBpedia Knowledge Ontology (KKO) are largely drawn from OpenCyc, UMBEL, or Wikipedia, though any of the other sources may contribute local knowledge graph structure. Additional reference concepts are contributed primarily from GeoNames. Wikidata contributes the bulk of the instance data, though instance records are actually drawn from all sources. DBpedia and Wikidata are also the primary sources for attribute characterizations of the instances. Instance data, by definition, are not part of the core structure.
Here is the Core Structure panel:
The Core Structure panel, like the other panels, has a panel title followed by a brief description. The Core Structure panel lists the equivalent class (owl:equivalentClass), parent super classes (kko:superClassOf), child sub classes (rdfs:subClassOf), or a closely related concept (kko:isCloselyRelated) (not shown). These relationships define the edges between the nodes in the graph structure, and are also the basis for logical inferencing.
Sub-classes and super-classes may be either directly asserted or inferred from parent-child relationships in the Knowledge Graph. An inferred relationship includes any ancestor or descendant; a direct relationship covers only the immediate parent or child. Picking one of these links restricts the display to the concepts related to that category. As with familial relationships, the closer two concepts are in lineage, the more likely they are to share attributes or characteristics. Such lineage inferences arise from the relations in the KBpedia Knowledge Ontology (KKO).
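The distinction between direct and inferred super-classes can be illustrated with a minimal sketch. The concept names and parent links below are hypothetical toy data, not actual KBpedia assertions, and the traversal is only one simple way to collect ancestors; it is not a description of how KBpedia itself computes inferences.

```python
# Toy direct assertions: each concept maps to its immediate parents.
# All names here are hypothetical and chosen only for illustration.
DIRECT_PARENTS = {
    "Currency": {"MediumOfExchange"},
    "MediumOfExchange": {"EconomicObject"},
    "EconomicObject": {"Thing"},
    "Thing": set(),
}


def inferred_parents(concept, direct_parents):
    """Return every ancestor of `concept` by following parent links upward."""
    seen = set()
    stack = list(direct_parents.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(direct_parents.get(parent, ()))
    return seen


# Direct view: only the immediate parent.
direct = DIRECT_PARENTS["Currency"]
# Inferred view: the immediate parent plus all of its ancestors.
inferred = inferred_parents("Currency", DIRECT_PARENTS)

print(sorted(direct))
print(sorted(inferred))
```

In this sketch the direct listing holds a single parent, while the inferred listing also picks up the grandparent and the root, which mirrors the toggle between the two views.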
Each of the related concepts is presented as a live link, which if clicked, will take you to a new entry for that concept. Some of the icons and information for equivalent classes are discussed under other panels below.
In addition to the Core Structure, KBpedia RCs are linked to thousands of classes defined in nearly 20 external ontologies used to describe all kinds of public and private datasets. Some of the prominent external vocabularies include schema.org, the major structured data system for search engines, and Dublin Core, a key vocabulary from the library community. Other external vocabularies cover music, organizations, projects, social media, and the like.
Here is how the External Linkages panel looks, which has many parallels to the Core Structure panel:
The external links, like the core ones, are shown as live links with an icon associated to each source. For RCs that are entity types, the entry might also display the count of entities (orange background with count) or related-aspect entities (blue background with count) linked to that RC (either directly or inferred, depending on the option chosen). Clicking on the specific RC link will take you to that reference concept. Clicking on the highlighted background will take you to a listing of the entities for that RC (based on either its direct or inferred option).
Also, as with the short descriptions on each of these panels, clicking the more link expands the available description:
Entities are distinct, nameable, individual things. There are more than 30 million of them in the baseline KBpedia.
Entities may be physical objects or conceptual works or discrete ideas, so long as they may be characterized by attributes shared by other instances within similar kinds or types. Entities may be parts of other things, so long as they have a distinct identity and character. Entities with shared attributes that are the essences of the things may be grouped into natural types, called entity types. These entity types may be further related to other entity types in natural groupings or hierarchies depending on the attributes and their essences that are shared among them.
Here is how the general Entities panel appears:
In this case for currency, there are 2003 instances (individual entities) in the current KBpedia knowledge base. The first few of these are shown in the panel, with the live links taking you to an entity report for that instance. Similarly, you can click the Browse all entities button, which allows you to scroll through the entire listing of entities. Here is how that subsidiary page, in part, appears:
Nearly 85%, or 47,000, of the reference concepts within the KBpedia Knowledge Ontology (KKO) are entity types, the natural classes of entities. They are key leverage points for interoperability and mapping. Instances (or entities) are related to the KKO graph via the rdf:type predicate, which assigns an entity to one or more parental classes. It is through this link that you view the individual entities.
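As a rough sketch of how type assertions connect entities to classes, the following indexes a handful of (entity, class) pairs and then gathers entities either directly or through sub-classes. The entities, classes, and sub-class link are hypothetical toy data, and the sketch assumes the sub-class links contain no cycles.

```python
from collections import defaultdict

# Hypothetical (entity, class) pairs standing in for type assertions.
TYPE_ASSERTIONS = [
    ("Euro", "Currency"),
    ("Yen", "Currency"),
    ("Bitcoin", "Cryptocurrency"),
]

# Hypothetical sub-class links: child class -> direct parent class.
SUBCLASS_OF = {"Cryptocurrency": "Currency"}


def entities_by_class(assertions):
    """Index entities under the class each one is directly typed with."""
    index = defaultdict(set)
    for entity, cls in assertions:
        index[cls].add(entity)
    return index


def inferred_entities(cls, index, subclass_of):
    """Entities typed with `cls` directly or with any of its descendants."""
    result = set(index.get(cls, ()))
    for child, parent in subclass_of.items():
        if parent == cls:
            result |= inferred_entities(child, index, subclass_of)
    return result


index = entities_by_class(TYPE_ASSERTIONS)
direct_entities = set(index.get("Currency", ()))
all_entities = inferred_entities("Currency", index, SUBCLASS_OF)

print(len(direct_entities), len(all_entities))
```

The two counts correspond to the direct and inferred options: the first counts only entities typed with the class itself, the second also counts entities typed with its sub-classes.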
Entities may also be characterized according to one or more of about 80 aspects. Aspects help to group related entities by situation rather than by identity or definition. Aspects thus provide a secondary means for organizing entities independent of their nature, but helpful for placing the entity in real-world contexts. Not all aspects relate to a given entity.
The Aspects panel has a similar presentation to the other panels:
If an entity with a related aspect occurs in the knowledge system, its aspect label is shown, followed by a listing of the top entities for that aspect. Each of these entities is clickable and takes you to the standard entity record. A button to Browse all entities means there are more entities for that aspect than the short listing will allow; click on it to paginate through the full listing of related entities.
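The grouping of entities by aspect, with a short listing and a signal that more entries exist, might be sketched as follows. The aspect names, entities, and the `limit` parameter are hypothetical and serve only to illustrate the idea; they do not reflect the actual aspect vocabulary or the panel's real cut-off.

```python
from collections import defaultdict

# Hypothetical (entity, aspect) pairs; names are made up for illustration.
ENTITY_ASPECTS = [
    ("Euro", "Economics"),
    ("Yen", "Economics"),
    ("Dollar", "Economics"),
    ("Euro", "Europe"),
]


def group_by_aspect(pairs):
    """Collect entities under each aspect they are related to."""
    groups = defaultdict(list)
    for entity, aspect in pairs:
        groups[aspect].append(entity)
    return dict(groups)


def short_listing(groups, aspect, limit=2):
    """Return the first `limit` entities and whether more remain to browse."""
    members = groups.get(aspect, [])
    return members[:limit], len(members) > limit


groups = group_by_aspect(ENTITY_ASPECTS)
economics_top, economics_has_more = short_listing(groups, "Economics")
europe_top, europe_has_more = short_listing(groups, "Europe")

print(economics_top, economics_has_more)
print(europe_top, europe_has_more)
```

An aspect with no related entities simply does not appear in `groups`, which parallels the rule that panels are shown only when they have results.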
Note, as well, on this panel that we are also highlighting the down arrow at the upper right of the panel. Clicking that causes the entire panel to collapse, leaving only the title. Clicking on the arrow again causes the panel to expand. This convention applies to all of the panels discussed here.
About 85% of all of the reference concepts (RCs) in KBpedia represent classes of entities, which themselves are organized into about 30 core typologies. Most of these typologies are disjoint (lack overlap) from one another, which provides an efficient mechanism for testing subsets and filtering entities into smaller groups for computational purposes. (Another 30 or so SuperTypes provide extended organization of these entities.)
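To see why disjointness helps with filtering, consider this minimal sketch. The typology names and members are hypothetical toy data; the point is only that when typologies share no members, an entity found in one need not be tested against the others.

```python
from itertools import combinations

# Hypothetical typologies with toy members, for illustration only.
TYPOLOGIES = {
    "Animals": {"Lion", "Eagle"},
    "Plants": {"Oak", "Fern"},
    "Currencies": {"Euro", "Yen"},
}


def overlapping_pairs(typologies):
    """Return pairs of typology names whose member sets share any entity."""
    return [
        (name_a, name_b)
        for (name_a, members_a), (name_b, members_b)
        in combinations(typologies.items(), 2)
        if not members_a.isdisjoint(members_b)
    ]


def typology_of(entity, typologies):
    """With pairwise-disjoint typologies, the first match is the only match."""
    for name, members in typologies.items():
        if entity in members:
            return name
    return None


overlaps = overlapping_pairs(TYPOLOGIES)
euro_typology = typology_of("Euro", TYPOLOGIES)

print(overlaps)
print(euro_typology)
```

Because no pair overlaps in this toy data, a lookup can stop at the first typology that contains the entity, which is the efficiency gain that disjoint groupings offer.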
The Typologies panel follows some of the standard design of the other panels. Only the typologies to which the current entry belongs, in this case currency, are shown:
As noted, the major groupings of types reside in core typologies, which is where the largest degree of disjointness occurs. There are some upper typologies (such as Living Things over Plants, Animals, etc.) that are used mostly for organizational purposes; these are the extended ones. The core typologies are the key ones to focus upon for distinguishing large groupings of entities.
The last panel section for a concept presents both the parental (Broader) and child (Narrower) concepts for the current entry (again, in this case, currency).
Broader concepts represent the parents (or grandparental lineage in the case of inference) for the current reference concept. The broader concept relationship is expressed using the transitive kko:superClassOf property. This property is the inverse of the rdfs:subClassOf property.
Narrower concepts represent the children (or grandchild lineages in the case of inference) for the current RC. The narrower concept relationship is expressed using the transitive rdfs:subClassOf property. This property is the inverse of the kko:superClassOf property.
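The inverse relationship between the two properties can be sketched with plain tuples. The triples below are hypothetical toy data, the property names are used only as string labels, and only direct (non-inferred) links are considered.

```python
# Hypothetical sub-class triples: (child, predicate, parent).
SUBCLASS_TRIPLES = [
    ("Currency", "rdfs:subClassOf", "MediumOfExchange"),
    ("Cryptocurrency", "rdfs:subClassOf", "Currency"),
]


def invert(triples, inverse_predicate):
    """Swap subject and object and relabel with the inverse predicate."""
    return [(obj, inverse_predicate, subj) for subj, _, obj in triples]


# Each sub-class triple yields one super-class triple in the other direction.
SUPERCLASS_TRIPLES = invert(SUBCLASS_TRIPLES, "kko:superClassOf")


def broader(concept):
    """Direct parents: subjects that are a super-class of `concept`."""
    return {subj for subj, _, obj in SUPERCLASS_TRIPLES if obj == concept}


def narrower(concept):
    """Direct children: subjects that are a sub-class of `concept`."""
    return {subj for subj, _, obj in SUBCLASS_TRIPLES if obj == concept}


broader_of_currency = broader("Currency")
narrower_of_currency = narrower("Currency")

print(broader_of_currency)
print(narrower_of_currency)
```

Storing one direction and deriving the other, as done here, keeps the two listings consistent by construction; whether a given system stores both directions explicitly is a separate design choice.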
Here is the side-by-side panel presentation for these relationships:
Like some of the prior panels, it is possible to toggle between direct and inferred listings of these related concepts. If the RC is an entity type, it may also show counts for all entities subsumed under that type (orange color) or that have aspects of that type (blue color). Clicking on these count icons will take you to a listing of these entities.
This browsing and discovery use case is based on the standard configuration and the baseline KBpedia. Client variants may change the design and functionality of the application. More importantly, however, client applications are invariably extensions to the base KBpedia knowledge structure. These sometimes have some typologies removed because they are not relevant, but more likely have been expanded with the mapping of domain schema, vocabularies, and instances. In these cases, the actual content to be browsed may differ significantly from what is shown.