KBpedia Structural Components
The KBpedia Knowledge Ontology (KKO) has a number of structural dimensions or components, any one of which may be used for feature generation or characterization of the knowledge space. These components, and definitions for them, are:
-
Annotations — an annotation, specifically as an annotation property, is a way to provide metadata or to describe vocabularies and properties used within an ontology; annotations do not participate in reasoning or coherency testing for knowledge graphs (ontologies);
-
Aspects — are aggregations of an entity type that are grouped according to features or views different from the type itself. As examples, the type of "music composer" may have an aspect of being from the 19th century, or "authors" may have the aspects of being Russian or writing novellas. The organization of aspects closely parallels that for SuperTypes;
-
Attributes — are the characteristics, qualities or descriptors that signify individual entities. These attributes are also known through the terms of depth, comprehension, significance, meaning or connotation. Key-value pairs match an attribute with a value; it may be an actual value, one of a set of values, or a descriptive label or string. (If the value is a reference to a different entity, it is known as a relation.) In an RDF statement, an attribute is expressed as a property (or predicate); most in OWL are data properties. In intensional logic, all attributes or characteristics of similarly classifiable items define the membership in that set. Attributes are properties;
-
Attribute Types — an aggregation (or class) of multiple attributes that have similar characteristics amongst themselves (for example, colors or ranks or metrics). As with other types, shared characteristics are subsumed over some essence(s) that give the type its unique character;
-
Core Structure — is derived from seven (7) main knowledge bases: Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and the UNSPSC products and services classification. The conceptual relationships in the KBpedia Knowledge Ontology (KKO) are largely drawn from OpenCyc, though any of the other sources may contribute local knowledge graph structure. Additional reference concepts (RCs) are contributed primarily from Wikipedia, schema.org, GeoNames, and UNSPSC. Wikidata contributes the bulk of the instance data, though instance records are actually drawn from all sources. DBpedia, Wikidata, and schema.org are also the primary sources for attribute characterizations of the instances;
-
Datatypes — are pre-defined ways that attribute values may be expressed, including various literals and strings (by language), URIs, Booleans, numbers, date-times, etc. See XSD (XML Schema Definition) for more information;
-
Documents (or Articles or Records) — most RCs and entities within KBpedia are accompanied by one or more documents, which may be in the form of articles or data records. This information may be further mined using natural language processing or other extractors to supplement the "official" structural components within KBpedia;
-
Entities — are the basic, "real" things (including concepts, beliefs and fictions) in our domain of interest. An entity is an individual object or member of a class; when affixed with a proper name or label it is also known as a named entity (thus, named entities are a subset of all entities). Entities are described and characterized by attributes. Entities are connected or related to one another through relations;
-
Entity Types — are the aggregations or collections or classes of similar entities, which also share some essence;
-
External Linkages (or Mappings) — are any of the relational properties that may be used to map external datasets and schemas to KBpedia. In its base form, which can be expanded, KBpedia has mappings to more than 20 external sources; see further Extended Mappings;
-
Instances (or Individuals) — are the basic, “ground level” components of an ontology. An instance is an individual member of a class; the terms individual, member, and entity are used somewhat interchangeably with it. The instances in KKO may include concrete objects such as people, animals, tables, automobiles, molecules, and planets, as well as abstract instances such as numbers and words;
-
Knowledge Base (or Corpus) — a knowledge base (abbreviated KB or kb) is a special kind of database for knowledge management. A knowledge base provides a means for information to be collected, organized, shared, searched and utilized. There are seven KBs in the core structure of KBpedia (see Core Structure);
-
Metadata — supplementary data that provides information about one or more aspects of the content at hand such as means of creation, purpose, when created or modified, author or provenance, where located, topic or subject matter, standards used, or other annotation characteristics. It is “data about data”, or the means by which data objects or aggregations can be described. Contrasted to an attribute, which is an individual characteristic intrinsic to a data object or instance, metadata is a description about that data, such as how or when created or by whom;
-
Preferred Label (or prefLabels or Title) — the preferred label is the readable string (name) for each of the Reference Concepts in KBpedia. The labels are provided as a readable convenience; the actual definition of the concept comes from the totality of its description, prefLabel, altLabels, and connections (placement) within the KKO knowledge graph. KBpedia's approach is in keeping with the idea of "things not strings";
-
Predicates — see Properties;
-
Properties — are the ways in which classes and instances can either be: 1) described and characterized, in which case they are attributes; or 2) related to one another, in which case they are relations. Between objects, properties are thus a relationship, and are also known as predicates;
-
Reference Concepts (or RefConcepts or RCs) — are a distinct subset of the more broadly understood notion of a concept, such as that used in the SKOS RDFS controlled vocabulary, in formal concept analysis, or in the very general or abstract concepts common to some upper ontologies. RefConcepts are selected for their use as concrete, subject-related or commonly used notions for describing tangible ideas and referents in human experience and language. RCs are classes, the members of which are nameable instances or named entities, which by definition are held as distinct from these concepts. The KKO knowledge graph is a coherently organized structure (or reference "backbone") of these RefConcepts. There are more than 58,000 RCs in KBpedia;
-
Relations — a connection between any two objects, entities or types. Relations are properties;
-
Relation Types — an aggregation (or class) of multiple relations that have similar characteristics amongst themselves. As with other types, shared characteristics are subsumed over some essence(s) that give the type its unique character;
-
Semsets (or Synsets or Alternative Labels or altLabels) — are collections of alternate labels and terms to describe a concept or entity. These alternatives include true synonyms, but may also be more expansive and include jargon, buzzwords, acronyms, epithets, slang, pejoratives, metonyms, stage names, diminutives, pen names, derogatives, nicknames, hypocorisms, sobriquets, cognomens, abbreviations, or pseudonyms; in short, any term or phrase that can be a reference to a given entity;
-
SuperTypes (also Super Types) — are a collection of (mostly) similar Reference Concepts. Most of the SuperType classes have been designed to be (mostly) disjoint from the other SuperType classes; these are termed "core". Other SuperTypes, used mostly for organizational purposes, are termed "extended". There are about 80 SuperTypes in total, with 30 or so deemed as "core". SuperTypes thus provide a higher level of clustering and organization of Reference Concepts for use in user interfaces and for reasoning purposes;
-
Types (or Classes or Kinds) — are aggregations of entities with many shared attributes (though not necessarily the same values for those attributes) and that share a common essence, which is the defining determinant of the type. See further the description for the type-token distinction;
-
Typologies — are flat, hierarchical taxonomies composed of related entity types within the context of a given KBpedia SuperType (ST). Typologies have the most general types at the top of the hierarchy and the more specific at the bottom. Typologies are a critical connection point between the TBox (RCs) and ABox (instances), with each type in the typology providing a possible tie-in point to external content.
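Several of the definitions above depend on one another: an entity is described by attributes, linked to other entities by relations, grouped into types, and named by a preferred label plus a semset. The following Python sketch is one way to picture those distinctions. It is an illustration only and not part of KBpedia: the class name `Entity`, the `add_property` and `ancestors` helpers, the property keys, and the sample typology are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical model of an individual entity (instance)."""
    pref_label: str                                  # preferred label
    types: set = field(default_factory=set)          # entity types it belongs to
    semset: set = field(default_factory=set)         # alternative labels
    attributes: dict = field(default_factory=dict)   # key -> literal value
    relations: dict = field(default_factory=dict)    # key -> another Entity

    def add_property(self, key, value):
        """File a property as a relation if its value is an entity,
        otherwise as an attribute (a key-value pair with a literal value)."""
        if isinstance(value, Entity):
            self.relations[key] = value
        else:
            self.attributes[key] = value


# Hypothetical typology fragment: child type -> parent type.
# The most general type sits at the top of the hierarchy.
TYPOLOGY = {"Composer": "Musician", "Musician": "Person"}


def ancestors(type_name, typology):
    """Walk from a type up to the most general type in its typology."""
    chain = []
    while type_name in typology:
        type_name = typology[type_name]
        chain.append(type_name)
    return chain


# Illustrative instances; the names and values are made up.
country = Entity("Example Country", types={"Country"})
composer = Entity("Example Composer", types={"Composer"}, semset={"E. Composer"})
composer.add_property("birthYear", 1840)   # literal value, so an attribute
composer.add_property("bornIn", country)   # entity value, so a relation

print(composer.attributes)
print(sorted(composer.relations))
print(ancestors("Composer", TYPOLOGY))
```

The single `add_property` entry point mirrors the definition of Properties: the same mechanism yields an attribute or a relation depending only on whether the value refers to another entity.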
To learn further about the structural possibilities and uses of features for machine learners, see A (Partial) Taxonomy of Machine Learning Features.
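As a rough illustration of how these structural components might feed feature generation, the sketch below flattens one entity record into a feature dictionary, with one feature group per component. The record layout, the key names, and the `structural_features` function are assumptions made for this example, not a KBpedia API.

```python
def structural_features(record):
    """Flatten a (hypothetical) entity record into a feature dictionary."""
    features = {}
    # Types and SuperTypes become binary indicator features.
    for type_name in record.get("types", []):
        features["type=" + type_name] = 1
    for super_type in record.get("super_types", []):
        features["supertype=" + super_type] = 1
    # Attributes keep their literal values.
    for key, value in record.get("attributes", {}).items():
        features["attr:" + key] = value
    # Relations are reduced to the presence of the relation itself.
    for key in record.get("relations", {}):
        features["rel:" + key] = 1
    # The size of the semset is a simple count feature.
    features["n_alt_labels"] = len(record.get("alt_labels", []))
    return features


# Illustrative record; all names and values are made up.
record = {
    "types": ["Composer"],
    "super_types": ["Persons"],
    "attributes": {"birthYear": 1840},
    "relations": {"bornIn": "Example Country"},
    "alt_labels": ["E. Composer", "The Composer"],
}
features = structural_features(record)
print(features)
```

Which components are worth encoding, and how, depends on the learner and the task; the grouping here is only one possible choice.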