On the Bitter Lesson

#ai #business

The original essay: The Bitter Lesson, by Rich Sutton.

Central idea: in the long term, it is far better to let computation bear the burden of finding better AI systems through brute-force training and search/inference algorithms. Trying to find them ourselves by leveraging human domain knowledge has led to performance ceilings and wasted time and resources. I guess a Zen master would say: don't design the AI. Let it happen instead.

The idea is ever more relevant on the trajectory of LLMs. It helps explain academia's obsessive emphasis on understanding GPT's scaling laws and emergent behaviors, and the feverish pursuit of ever-larger architectures, which is currently constrained by hardware, social, ethical, and regulatory forces.
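To make "scaling laws" concrete, here is a minimal sketch of the power-law shape they describe. The constants are ballpark figures from Kaplan et al. (2020), "Scaling Laws for Neural Language Models"; exact values vary by dataset and architecture, and this is an illustration, not anyone's production fit.

```python
# Sketch of the neural scaling-law shape: loss falls as a power law of model size.
# Constants are approximate values from Kaplan et al. (2020); treat as illustrative.

def loss_from_params(n_params: float,
                     n_c: float = 8.8e13,    # critical parameter scale (approx.)
                     alpha_n: float = 0.076  # power-law exponent (approx.)
                     ) -> float:
    """Predicted cross-entropy loss as a function of parameter count alone."""
    return (n_c / n_params) ** alpha_n

# The curve keeps falling as parameters grow -- the "brute force" knob
# the Bitter Lesson says to keep turning.
for n in (1.5e9, 175e9, 1e12):
    print(f"{n:.1e} params -> predicted loss ~ {loss_from_params(n):.2f}")
```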

The business worldview, on the other hand, capitalizes on the delay and uneven distribution of technological advancement. Growth is a staircase-shaped curve, not a gradual slope. An inefficient market (inefficient because of those external forces) is also a highly profitable one. No sensible CEO would miss the opportunity to capture market value by being the first to step onto the next tread of that metaphorical staircase. Every step is a new gold rush, but it also heralds the fall of companies built on the previous treads - companies must pivot to AI now because their specialized, human-domain-knowledge applications will be replaced by the generalized skills unlocked by the latest step.

It feels like a market of competing forces here:

My assumption is that the force of intellectual pursuit does not come even close to the force of capital. Think Sam (representing the force of $) vs. the OpenAI board (representing the force of intellectual pursuit and ethical thinking). Sam is driven to maximize profit at each step of the staircase, even at the cost of delaying the research that unlocks the next step - and there is no reason not to suspect such delays are intentional - while researchers, with eyes on the top of the staircase, want to reach AGI as soon as possible.

For my team, my advice is to zoom out and think about where we are in the big picture: what forces we are working with, and how to make the most of them, both short-term and long-term.

p.s. To put "brute-force scaling" into perspective, this visualization shows the relative sizes of the GPT-2 and GPT-3 architectures: https://bbycroft.net/llm
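For a back-of-the-envelope sense of that jump, here is the parameter-count ratio behind the visualization, using the publicly reported figures (1.5B for the largest GPT-2, 175B for GPT-3); round numbers, purely illustrative.

```python
# Rough scale comparison between GPT-2 and GPT-3, using publicly reported counts.
gpt2_params = 1.5e9    # GPT-2, largest released variant
gpt3_params = 175e9    # GPT-3 (davinci)

print(f"GPT-3 is roughly {gpt3_params / gpt2_params:.0f}x the parameter count of GPT-2")
# -> roughly 117x
```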