Amount of data

  1. Number of triples
  2. Level of detail
  3. Scope

Number of triples

The number of triples in the KG can be obtained in two ways. The preferred way is to count the triples directly by querying the SPARQL endpoint. The alternative is to read the value from the KG metadata, in particular the triples key. The metadata is used only when the triples cannot be counted through the SPARQL endpoint, because the metadata is not updated along with the content of the KG. The following query is used to count the number of triples:

SELECT (COUNT(?s) AS ?triples)
WHERE { ?s ?p ?o }

To quantify the metric, we assign 1 if the number of triples in the KG can be counted, and 0 otherwise.
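
The fallback logic described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual implementation: `run_query` is a hypothetical callable that sends a SPARQL query to the endpoint and returns the count as an integer, and the metadata is assumed to be a dictionary whose `triples` key holds the declared number of triples.

```python
from typing import Callable, Mapping, Optional, Tuple

TRIPLE_COUNT_QUERY = "SELECT (COUNT(?s) AS ?triples) WHERE { ?s ?p ?o }"


def count_triples(
    run_query: Optional[Callable[[str], int]],
    metadata: Mapping[str, object],
) -> Tuple[Optional[int], int]:
    """Return (number of triples, metric score).

    run_query is a hypothetical helper that executes a query on the
    SPARQL endpoint; pass None when no endpoint is available.
    """
    count: Optional[int] = None

    # Preferred source: count the triples on the live endpoint.
    if run_query is not None:
        try:
            count = int(run_query(TRIPLE_COUNT_QUERY))
        except Exception:
            count = None  # endpoint unreachable or answer unusable

    # Fallback: the (possibly outdated) value declared in the metadata.
    if count is None:
        raw = metadata.get("triples")
        try:
            count = int(raw) if raw is not None else None
        except (TypeError, ValueError):
            count = None

    # The metric is 1 when a count could be obtained, 0 otherwise.
    score = 1 if count is not None else 0
    return count, score
```

Keeping the endpoint first and the metadata second mirrors the reasoning above: the metadata value is only trusted when nothing fresher is available.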

Level of detail

This value can only be obtained by executing a SPARQL query. In particular, the number of properties is returned by the following query:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT (COUNT(DISTINCT ?o) AS ?triples)
WHERE {
{ ?o a rdf:Property }
UNION
{ ?o a owl:DatatypeProperty }
UNION
{ ?o a skos:Property }
UNION
{ ?o a owl:AnnotationProperty }
UNION
{ ?o a owl:OntologyProperty }
UNION
# rdfs:subPropertyOf is a predicate, not a class: match its subjects
{ ?o rdfs:subPropertyOf ?parent }
UNION
{ ?o a rdfs:Property }
}

To quantify the metric, we assign 1 if the number of properties in the KG can be counted, and 0 otherwise.
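
As an illustration of how the count and the score could be derived from the endpoint's answer, the sketch below parses a response in the standard SPARQL 1.1 JSON results format. The function names `extract_count` and `metric_score` are hypothetical, and the sketch assumes the endpoint was asked for JSON results and that the count is bound to the `triples` variable, as in the query above.

```python
import json
from typing import Optional


def extract_count(response_text: str, variable: str = "triples") -> Optional[int]:
    """Read the integer bound to `variable` in a SPARQL JSON result.

    Returns None when the response is not valid JSON, has no rows,
    or does not bind the variable to an integer.
    """
    try:
        bindings = json.loads(response_text)["results"]["bindings"]
        return int(bindings[0][variable]["value"])
    except (KeyError, IndexError, TypeError, ValueError):
        return None


def metric_score(count: Optional[int]) -> int:
    """1 if the value could be counted, 0 otherwise."""
    return 1 if count is not None else 0
```

The same two helpers would work for any of the counting queries in this section, since they all bind their result to a single variable.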

Scope

In this case we recover the value by searching for the triple with the void:entities predicate inside the VoID file. Alternatively, if no VoID file is available, we execute the following query on the SPARQL endpoint:

PREFIX void:<http://rdfs.org/ns/void#>
SELECT ?triples
WHERE {?s void:entities ?triples}

Both of these methods, however, rely on the assumption that the provider of the dataset has inserted a triple with this information in the KG. This is often not the case, so when the information is not provided we use another method. We first recover the URI regex pattern or the URI space of the KG. The regex pattern is recovered with the following query:

PREFIX void: <http://rdfs.org/ns/void#>
SELECT DISTINCT ?o
WHERE {?s void:uriRegexPattern ?o}

or, for the URI space, with this one:

PREFIX void: <http://rdfs.org/ns/void#>
SELECT DISTINCT ?o
WHERE {?s void:uriSpace ?o}

If the regex is not available and we only have the URI space (which is a plain URI prefix, not a regex), we transform it into a regex so that it can be used for matching. Once we have the regex, we use the following query to count the number of entities:

SELECT (COUNT(DISTINCT ?s) AS ?triples)
WHERE {
?s ?p ?o
FILTER(regex(str(?s), "%s"))
}

(The %s placeholder in the regex function is replaced with the regex obtained through the mechanisms described above.)
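
The text does not specify how the URI space is turned into a regex, so the following Python sketch shows one possible approach under stated assumptions: the regex metacharacters in the URI space are escaped and the result is anchored at the start of the string. The function names and the query template are illustrative; the template in this sketch counts distinct subjects and applies str() to the subject before matching.

```python
# Query template for the sketch; %s is the placeholder for the regex.
ENTITY_COUNT_TEMPLATE = (
    "SELECT (COUNT(DISTINCT ?s) AS ?triples)\n"
    "WHERE {\n"
    "  ?s ?p ?o\n"
    '  FILTER(regex(str(?s), "%s"))\n'
    "}"
)

# Characters with a special meaning in the regex syntax used by SPARQL.
_REGEX_META = set("\\.^$*+?()[]{}|-")


def uri_space_to_regex(uri_space: str) -> str:
    """Turn a literal URI prefix into a regex anchored at the start."""
    escaped = "".join("\\" + ch if ch in _REGEX_META else ch for ch in uri_space)
    return "^" + escaped


def escape_for_sparql_string(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_entity_count_query(pattern: str) -> str:
    """Fill the %s placeholder of the template with the given regex."""
    return ENTITY_COUNT_TEMPLATE % escape_for_sparql_string(pattern)


if __name__ == "__main__":
    # Hypothetical URI space, used only for illustration.
    regex = uri_space_to_regex("http://example.org/resource/")
    print(build_entity_count_query(regex))
```

Escaping matters here because a URI space such as `http://example.org/resource/` contains a dot, which would otherwise match any character, and because backslashes in the regex must be doubled once more inside a SPARQL string literal.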

To quantify the metric, we assign 1 if the number of entities in the KG can be counted, and 0 otherwise.