Open Standards

KBpedia employs open standards and best practices in order to: 1) obtain the most accurate results; and 2) facilitate interoperability with external data and systems. Our open standards are mostly based on those from the World Wide Web Consortium (W3C), which established the standards for the original Web and the design of Web pages and Web protocols. Specific W3C standards used by KBpedia include:

World Wide Web Consortium (W3C)
  • Resource Description Framework (RDF, v 1.1) is the basic data model and language for the semantic Web. A statement, which is also an assertion, is composed of a subject - predicate - object triple (or s-p-o). As the standard states, "The abstract syntax has two key data structures: RDF graphs are sets of subject-predicate-object triples, where the elements may be IRIs, blank nodes, or datatyped literals. They are used to express descriptions of resources. RDF datasets are used to organize collections of RDF graphs, and comprise a default graph and zero or more named graphs." RDF gives us the basic scaffolding for knowledge graphs and the description of resources
  • RDF Schema (RDFS, v 1.1) is a data modeling vocabulary extension to RDF that gives us the constructs for defining classes and instances, property domains and ranges, and the subsumption hierarchy capabilities so essential to the basic logic of knowledge graphs
  • Web Ontology Language (OWL 2, so designated because it is the second version) is the fullest language specification for our knowledge graphs. It provides a complete vocabulary and grammar for constructing knowledge graphs that are decidable and testable using description logics. Our implementations build on RDF and RDFS, and supplement the vocabulary with SKOS
  • Simple Knowledge Organization System (SKOS) provides a basic vocabulary for knowledge organization systems, such as thesauri, taxonomies, classification schemes and subject heading systems, and a richer pool of label and annotation primitives. All of these are useful when integrating across multiple knowledge bases and schema
  • SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is a set of specifications that provide a query language and protocols to retrieve and manipulate RDF graph content. SPARQL is typically accessed via a Web endpoint to a triple store knowledge base. We may also use the SPARUL (SPARQL/Update) extension to update the RDF store with INSERT and DELETE operations
  • Semantic Web Rule Language (SWRL), though only a W3C submission, is nonetheless commonly used as an extension to OWL to provide if-then rule statements. SWRL includes a high-level abstract syntax for Horn-like rules in OWL.
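
To make the triple model and the RDFS subsumption hierarchy concrete, here is a minimal sketch in plain Python. It is not how KBpedia or any RDF library implements these standards: the graph is just a set of (subject, predicate, object) tuples, the `ex:` names are hypothetical, and the helper `rdfs_closure` is an illustrative name that applies only two RDFS entailment patterns (subclass transitivity and type inheritance).

```python
# Sketch only: an RDF graph modeled as a set of s-p-o triples.
# The ex: identifiers are hypothetical; real graphs use full IRIs.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

graph = {
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}


def rdfs_closure(triples):
    """Apply two RDFS entailment patterns until no new triples appear.

    1. (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    2. (x type A),       (A subClassOf B)  =>  (x type B)
    """
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # Only chain through subClassOf triples whose subject
                # matches the object of the first triple.
                if o != s2 or p2 != SUBCLASS_OF:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(graph)
types_of_rex = sorted(
    o for (s, p, o) in closure if s == "ex:rex" and p == RDF_TYPE
)
print(types_of_rex)  # ['ex:Animal', 'ex:Dog', 'ex:Mammal']
```

Only `ex:rex rdf:type ex:Dog` was asserted; the other two types follow from the class hierarchy. A production reasoner covers far more of RDFS and OWL 2 than this.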

Other standards, such as HTML, are also used where appropriate. We also employ many standard open source libraries and tools, most prominently the Protégé ontology IDE, the OWL API, and the Lucene search engine.

In using these standards, we apply best practices, many of which we have developed through our client work. These include using semsets to capture the multiple labels that might be applied to a given thing; constructing and managing ontologies (also known as knowledge graphs); ensuring multi-lingual capabilities; and organizing build and management workflows.
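
As an illustration of the semset idea, the following sketch groups the labels for a concept and resolves any of them back to that concept. It is an assumption-laden toy, not KBpedia's actual data structure: the concepts, the labels, and the helper names `build_label_index` and `resolve` are all hypothetical, and the `pref`/`alt` split only loosely mirrors `skos:prefLabel` and `skos:altLabel`.

```python
# Hypothetical semsets: each concept has one preferred label plus
# alternative labels that refer to the same thing.
semsets = {
    "ex:Automobile": {"pref": "automobile", "alt": ["car", "motor car", "auto"]},
    "ex:Bicycle": {"pref": "bicycle", "alt": ["bike", "cycle"]},
}


def build_label_index(semsets):
    """Map each case-folded label to the set of concepts that use it."""
    index = {}
    for concept, labels in semsets.items():
        for label in [labels["pref"], *labels["alt"]]:
            index.setdefault(label.casefold(), set()).add(concept)
    return index


def resolve(label, index):
    """Return the sorted concepts whose semset contains the label."""
    return sorted(index.get(label.strip().casefold(), set()))


index = build_label_index(semsets)
print(resolve("Car", index))    # ['ex:Automobile']
print(resolve("train", index))  # []
```

Mapping labels to a set of concepts, instead of a single concept, leaves room for ambiguous labels that appear in more than one semset.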

Most supporting KBpedia code is written in Clojure, in part due to its ability to run on the Java Virtual Machine (JVM).