Comments on Jerome Euzenat’s “A map without a legend”

The Semantic Web Journal is running a 10-year anniversary edition, and Jerome Euzenat wrote a very nice piece for that special issue, entitled “A map without a legend”. It’s actually a rather bold and unashamed case for the explicit representation of knowledge:

  1. Jerome articulates very clearly the value of explicitly expressed knowledge, both in humans and machines, in forms that can be communicated (as opposed to actionable but implicit knowledge that has to be relearned all the time).
  2. He unashamedly expresses the ambition that such explicit knowledge, in formats that are interpretable by machines, can contribute to the next step in the knowledge ecosystem (in a progression from storytelling to teaching, book writing, monasteries, universities and semantic webs).
  3. He uses eScience as a good illustration of what could be achieved (a good choice, because eScience is a field where more progress in “real semantics” has been made than elsewhere).

A minor complaint would be that the final section on knowledge dynamics (and the role of evolutionary mechanisms in knowledge dynamics) is rather disconnected from the main thesis of the rest of the paper. The whole “in defense of explicit knowledge” argument of the paper could have been made without that final section.

Finally, I’d like to point out that Euzenat’s whole argument about the value of explicit knowledge in a machine-processable form is also very relevant to the major debate currently raging in Artificial Intelligence: should we not just rely fully on statistical techniques that learn actionable patterns from data? This paper is a clear articulation of the view that the answer to this question is ‘no’:

“Nowadays, web users are not expected to provide knowledge, nor to access it. It seems that they are mere data provider, mostly through their actions, e.g. click, buy, like. These data are machine processable, but not open. They are kept secret, in silos, to the exclusive exploitation of a single organisation. They are processed by corporations which eventually learn knowledge from that data. But this knowledge, in turn, is not shared nor even prone to be communicated because not necessarily expressed in an articulated language. Instead, it is directly actioned. Hence, knowledge does not improve.”

Amen to that.

Thoughts on Rich Sutton’s “Bitter Lesson”

I’ve been thinking on and off about Rich Sutton’s piece The Bitter Lesson. Here are three corners that I think Sutton is cutting too quickly:

1. Moore’s Law is way too slow to make any real progress in AI

Of course we should aim for technology that scales with compute power (how could you possibly disagree with this?). But compute power alone actually scales too slowly. Moore’s Law scales as 2^(N/1.5) after N years (compute power doubles every 1.5 years). But search spaces grow much faster: chess grows as 35^N and Go as 250^N for a search depth of N moves, natural language ambiguity multiplies the number of interpretations by somewhere between 1 and 10 per word (an estimate by Piek Vossen), etc.
So yes, Moore’s Law helps, but not a lot. The real progress has to come from algorithms and representations. Of course those should scale with compute, but that’s just the icing on the cake, not the essence of the breakthroughs.
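To make the mismatch concrete, here is a rough sketch that asks how many extra moves of lookahead a given number of years of Moore’s Law buys. It assumes naive exhaustive search whose cost is b^N for branching factor b and depth N, and takes the 1.5-year doubling period and the branching factors (35 for chess, 250 for Go) from the paragraph above; the function names are hypothetical, chosen for illustration only.

```python
import math

# Assumption (from the text): compute doubles every 1.5 years,
# so after `years` years it has grown by a factor 2 ** (years / 1.5).
DOUBLING_PERIOD_YEARS = 1.5

# Approximate branching factors quoted in the text.
BRANCHING = {"chess": 35, "go": 250}


def compute_growth(years: float) -> float:
    """Factor by which compute grows after `years` years of Moore's Law."""
    return 2 ** (years / DOUBLING_PERIOD_YEARS)


def extra_depth(years: float, branching_factor: float) -> float:
    """Extra search depth (in moves) the added compute pays for,
    assuming naive exhaustive search costing branching_factor ** depth."""
    # Solve branching_factor ** d == compute_growth(years) for d.
    return math.log(compute_growth(years)) / math.log(branching_factor)


if __name__ == "__main__":
    for game, b in BRANCHING.items():
        for years in (15, 30):
            print(f"{game}: {years} years of Moore's Law buys "
                  f"{extra_depth(years, b):.2f} extra moves of lookahead")
```

With 15 years of doublings (a factor of 2^10 = 1024 more compute), this comes out at just under two extra moves for chess and about one and a quarter for Go, and doubling the time span only doubles that extra depth. Practical engines search much deeper than this naive bound suggests because of pruning and better evaluation, which is an algorithmic gain rather than a compute gain.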

2. Compute power is not the only bottleneck

Saying that our methods should scale with compute is not the same as saying that we should aim for technologies whose only bottleneck is compute power. If that were possible, it would be great: just wait for the chips to get faster. But even the methods that Sutton says scale with compute power (read: ML) actually don’t scale with compute power alone. Lack of training data, for example, is a major bottleneck for many ML techniques, no matter how much compute power increases. So yes, of course things should scale with compute power, but there will always be other bottlenecks too, and Sutton is pretending that for ML there aren’t.

3. Knowledge-based methods do scale with compute

Sutton asserts: “the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation”. That ignores KR from the past decade. One of the main reasons that we can now compute with knowledge bases on the order of a billion edges is that memory got so cheap. So these methods do scale with compute. And yes, they have other bottlenecks (knowledge acquisition suffers from a size/quality trade-off), and therefore more compute does not simply mean more power. But something similar applies to ML: other bottlenecks (e.g. data availability) stop it from scaling arbitrarily with more compute.
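A back-of-the-envelope sketch of why cheap memory matters for billion-edge knowledge bases. It assumes a deliberately simple, hypothetical layout in which each edge is stored as a (subject, predicate, object) triple of three 64-bit integer IDs, with no indexes, string dictionaries or compression; it does not describe any particular triple store.

```python
# Hypothetical layout (assumption): one edge = (subject, predicate, object),
# each stored as a 64-bit integer ID; no indexes, dictionaries or compression.
BYTES_PER_ID = 8
IDS_PER_EDGE = 3


def edge_table_bytes(num_edges: int) -> int:
    """Raw bytes needed to store num_edges triples of integer IDs."""
    return num_edges * IDS_PER_EDGE * BYTES_PER_ID


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    n = 1_000_000_000  # "on the order of a billion edges"
    raw = edge_table_bytes(n)
    print(f"{n:,} edges -> {raw:,} bytes ~ {to_gib(raw):.1f} GiB")
```

For a billion edges this gives 24,000,000,000 bytes, about 22.4 GiB. Real triple stores typically add several index orderings and a dictionary mapping IDs to strings, so the true footprint is some multiple of this, but plausibly still within reach of a single large-memory server.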

So in short:

1. Of course good methods should scale with more compute power, but other bottlenecks will kick in with all methods, with ML as much as with KR.
2. Relying on Moore’s Law means you’re in for a very long wait, because its growth curve is much too slow for the size of search spaces
3. KR methods do scale with compute (in particular with memory)

Other interesting responses to The Bitter Lesson