2% makes all the difference on the LOD cloud

In a few talks I have asserted that on the LOD cloud rules are greatly outnumbered by facts. And in fact, I used a picture like this to visualise the ratio (yes, the small dot is really there):

Total size of all rules on LOD vs. size of all facts on LOD.

Today, Alan Bundy asked me if I could back this up with a quotable number, and of course when one’s PhD supervisor asks, one does not refuse. So with the help of Joe Raad, we dug out the numbers from the LOD-a-lot crawl at http://lod-a-lot.lod.labs.vu.nl/. Here’s what we got:

First some Background considerations:
Because of the way the Semantic Web representations work, we have to think about what it means to be a “rule”. After all, the entire Semantic Web / Linked Open Data cloud consists of atomic triples <subject predicate object>.

All triples in RDF Schema and OWL with the following predicates would count as “rules”, since they are ontological statements that allow the derivation of other statements: owl:sameAs, owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, rdfs:subPropertyOf, owl:disjointWith, owl:differentFrom, rdfs:domain, rdfs:range.
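To make the "rules are just triples" point concrete, here is a minimal sketch of how two of these predicates (rdfs:domain and rdfs:subClassOf) let a reasoner derive new statements. It is plain Python with tuples standing in for triples, not a real RDF library, and all the ex: names are made up for illustration.

```python
# Triples are plain (subject, predicate, object) tuples.
# All "ex:" identifiers are hypothetical, chosen only for this example.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def derive(triples):
    """Apply two RDFS entailment patterns until nothing new appears.

    Returns only the triples that were derived, not the input ones.
    """
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            if p == SUBCLASS:
                # (x type s) and (s subClassOf o)  =>  (x type o)
                new |= {(x, RDF_TYPE, o)
                        for x, q, c in known if q == RDF_TYPE and c == s}
            elif p == DOMAIN:
                # (x s y) and (s domain o)  =>  (x type o)
                new |= {(x, RDF_TYPE, o) for x, q, _ in known if q == s}
        if new <= known:  # fixpoint reached
            return known - set(triples)
        known |= new


facts_and_rules = [
    ("ex:hasSupervisor", DOMAIN, "ex:PhDStudent"),     # a "rule"
    ("ex:PhDStudent", SUBCLASS, "ex:Person"),          # a "rule"
    ("ex:Frank", "ex:hasSupervisor", "ex:AlanBundy"),  # a plain fact
]

derived = derive(facts_and_rules)
for triple in sorted(derived):
    print(triple)
```

Two "rule" triples and one fact yield two statements that were never asserted: that ex:Frank is an ex:PhDStudent and that ex:Frank is an ex:Person.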

The numbers:
We retrieved the following numbers from the LOD-a-lot, the largest publicly available queryable crawl of the Linked Open Data cloud:

  • owl:sameAs: 558.9 M
  • rdfs:subClassOf: 4.4 M
  • owl:equivalentClass: 1 M
  • owl:disjointWith: 450 K
  • rdfs:domain: 206 K
  • rdfs:range: 197.5 K
  • rdfs:subPropertyOf: 80 K
  • owl:equivalentProperty: 8.4 K
  • owl:differentFrom: 3.6 K

So in total that is just over 565M statements that would classify as “rules”. The total size of the LOD-a-lot crawl is 28.3B unique statements (the crawl is fully deduplicated), so rules make up just under 2% of the entire LOD cloud. (Notice also the very skewed frequency distribution of these statements: without owl:sameAs it would be only 0.02% of the entire LOD cloud.)

Philosophical musings:
So unlike in traditional symbolic AI / KR / KBS / theorem proving thinking, the power of the LOD knowledge base does not come from a huge amount of rules; it comes from the huge amount of “boring” things it knows about the world: “Edinburgh is in Scotland”, “Frank got his PhD in Edinburgh”, “Alan Bundy was his PhD supervisor” (all of these are actually in the LOD cloud). The rules play the role of “yeast”: they are a very small part of the total, but they allow the thing to grow and to derive interesting new things (e.g. Alan Bundy lives in Scotland, which is not explicitly asserted anywhere on the LOD cloud).
I actually think this also says something about human intelligence. Our ability to successfully operate in the physical and social world comes in large part from us knowing an endless store of boring atomic facts, plus a little bit of reasoning with comparatively small rulesets (small in comparison to the number of atomic facts). But of course this last paragraph is just idle speculation.

The LOD-a-lot crawl is publicly available at http://lod-a-lot.lod.labs.vu.nl/

It was first published at ISWC 2017 and is online at https://epub.wu.ac.at/6484/
    and https://doi.org/10.1007/978-3-319-68204-4_7

The BibTeX reference is
 author = {Javier David Fernandez Garcia and Wouter Beek and
           Miguel A. Mart{\'i}nez-Prieto and Mario Arias},
 booktitle = {The Semantic Web - ISWC 2017},
 editor = {C. D'Amato and M. Fern{\'a}ndez and V. Tamma and
           F. Lecue and P. Cudr{\'e}-Mauroux and J. Sequeda and
           C. Lange and J. Heflin},
 address = {Cham},
 title = {LOD-a-lot: A Queryable Dump of the LOD Cloud},
 publisher = {Springer International Publishing},
 year = {2017},
 pages = {75--83},
 url = {https://epub.wu.ac.at/6484/}

Comments on Jerome Euzenat’s “A map without a legend”

The Semantic Web Journal is running a 10 year anniversary edition, and Jerome Euzenat wrote a very nice piece for that special issue, entitled “A map without a legend”. It’s actually a rather bold and unashamed case for the explicit representation of knowledge:

  1. Jerome articulates very clearly the value of explicitly expressed knowledge, both in humans and machines, in forms that can be communicated (as opposed to actionable but implicit knowledge that has to be relearned all the time).
  2. He unashamedly expresses the ambition that such explicit knowledge, in formats that are interpretable by machines, can contribute to the next step in the knowledge ecosystem (in a progression from storytelling to teaching, book writing, monasteries, universities and semantic webs).
  3. He uses eScience as a good illustrator for what could be achieved (a good choice, because eScience is a field where more progress in “real semantics” has been made than elsewhere).

A minor complaint would be that the final section on knowledge dynamics (and the role of evolutionary mechanisms in knowledge dynamics) is rather disconnected from the main thesis of the rest of the paper. The whole “in defense of explicit knowledge” argument of the paper could have been made without that final section.

Finally, I’d like to point out that Euzenat’s whole argument about the value of explicit knowledge in a form processable by machines is also very relevant to the major debate that’s currently raging in Artificial Intelligence: should we not just rely fully on statistical techniques that learn actionable patterns from data? This paper is a clear articulation of the viewpoint that the answer to this question is ‘no’:

“Nowadays, web users are not expected to provide knowledge, nor to access it. It seems that they are mere data provider, mostly through their actions, e.g. click, buy, like. These data are machine processable, but not open. They are kept secret, in silos, to the exclusive exploitation of a single organisation. They are processed by corporations which eventually learn knowledge from that data. But this knowledge, in turn, is not shared nor even prone to be communicated because not necessarily expressed in an articulated language. Instead, it is directly actioned. Hence, knowledge does not improve.”

Amen to that.

Thoughts on Rich Sutton’s “Bitter Lesson”

I’ve been thinking on and off about Rich Sutton’s piece The Bitter Lesson. Here are three corners that I think Sutton is cutting too quickly:

1. Moore’s Law is way too slow to make any real progress in AI

Of course we should aim for technology that scales with compute power (how could you possibly disagree with this?). But compute power alone actually scales too slowly. Moore’s Law scales as 2^(N/1.5) (compute power doubles every 1.5 years). But search spaces grow much faster: chess is 35^N, Go is 250^N, natural language ambiguity interpretation is somewhere between 1 and 10 per word (estimate by Piek Vossen), etc.
So yes, Moore’s Law helps, but not a lot. The real progress has to come from algorithms and representations. Of course those should scale with compute, but that’s just nice icing on the cake, not the essence of the breakthroughs.
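One way to read this comparison is to ask how long you would have to wait for Moore’s Law to pay for a single extra step of search depth. The sketch below does that back-of-the-envelope calculation. It assumes naive exhaustive search with no pruning, where a branching factor b makes each extra step cost b times more compute, and it takes the 1.5-year doubling time at face value; the function name is mine.

```python
import math

DOUBLING_TIME_YEARS = 1.5  # Moore's Law as stated above: 2^(N/1.5)


def years_per_extra_step(branching_factor):
    """Years of Moore's Law needed to search one step deeper.

    One extra step multiplies the cost by the branching factor b, so we
    need 2^(t / 1.5) = b, which gives t = 1.5 * log2(b).
    """
    return DOUBLING_TIME_YEARS * math.log2(branching_factor)


# Branching factors quoted in the text above.
for game, b in [("chess", 35), ("Go", 250)]:
    print(f"{game}: about {years_per_extra_step(b):.1f} years per extra step")
```

Under these assumptions, brute force alone buys one extra step of lookahead in chess every seven to eight years, and in Go only about every twelve years.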

2. Compute power is not the only bottleneck

Saying that our methods should scale with compute is not the same as saying that we should aim for technologies whose only bottleneck is compute power. If that were possible, it would be great: just wait for the chips to get faster. But even the methods that Sutton says scale with compute power (read: ML) actually don’t scale with compute power alone. Lack of training data (for example) is a major bottleneck for many ML techniques, no matter how much compute power increases. So yes, of course things should scale with compute power, but there will always be other bottlenecks too, and Sutton is pretending that for ML there aren’t.

3. Knowledge-based methods do scale with compute

Sutton asserts: “the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation”. That ignores KR from the past decade. One of the main reasons that we can now compute with knowledge bases on the order of a billion edges is that memory got so cheap. So these methods do scale with compute. And yes, they have other bottlenecks (knowledge acquisition suffers from a size/quality trade-off), and therefore more compute does not simply mean more power. But something similar applies to ML: other bottlenecks (e.g. data availability) stop it from scaling arbitrarily with more compute.

So in short:

1. Of course good methods should scale with more compute power, but other bottlenecks will kick in with all methods, with ML as much as with KR.
2. Relying on Moore’s Law will put you in for a very long wait, because its growth curve is much too slow for the size of search spaces.
3. KR methods do scale with compute (in particular with memory)

Other interesting responses to The Bitter Lesson