The Compression Problem
Generating hypotheses is a storage problem. Choosing which ones to advance is not.
In biotech, progress is often framed as a data problem.
More sequencing.
More screening.
More real-world evidence.
And now, increasingly—
More AI.
The implicit assumption is straightforward—if we can generate and access enough information, insight will follow.
That assumption is worth revisiting.
Biology does not lack for information.
Over the past several decades, the industry has built one of the largest filing cabinets in any field of science:
Genomic datasets
High-throughput screens
Clinical trial results
Real-world evidence
The constraint has never been storage.
It has been understanding.
The human genome was the ultimate filing cabinet. Thirty years later the constraint is still understanding, not storage.
There is a difference between storing information and compressing it.
A system that stores information can recall facts.
A system that compresses information extracts structure.
It learns what matters.
This distinction sits at the center of the current AI narrative.
Today’s systems are remarkably effective at accessing and organizing information. With the right inputs, they can synthesize large volumes of data into coherent outputs.
They retrieve.
They recombine.
They respond.
But they do not, in any meaningful sense, learn from experience after deployment. New information is incorporated into context—temporarily—not into the underlying structure of the system.
A larger filing cabinet is still a filing cabinet.
This raises a more important question.
If AI is being positioned as a driver of drug discovery, what exactly is it improving?
In most industries, the answer to that question can remain vague. AI shows up in operating margins, headcount efficiency, or workflow optimization: diffuse benefits that are difficult to isolate and often challenging to quantify.
Biotech is different.
Value is created at discrete inflection points:
Target selection
Preclinical validation
Clinical outcomes
Each of these ultimately resolves into a small set of variables:
Time
Cost
Probability of success
For AI to be economically meaningful in drug development, it must move at least one of them.
If it does not, it is not changing value. It is changing workflow.
This becomes particularly relevant in the current wave of AI-native biotech companies.
These are not companies that simply use AI. They are companies whose reason for existence is AI.
Implicitly, they are making a stronger claim:
That AI can improve the core economics of drug discovery.
That claim can be evaluated.
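One way to make the three variables concrete is a simple risk-adjusted value calculation. The sketch below is an illustration, not a valuation model: the function name, the 10% discount rate, and every number are hypothetical, and cost is treated as spent up front to keep the arithmetic short. It shows that a change in time, cost, or probability of success moves the value of a program, while a change that touches none of them leaves it exactly where it was.

```python
def risk_adjusted_value(payoff, p_success, years, cost, discount_rate=0.10):
    """Expected present value of one program.

    The payoff is weighted by the probability of success and discounted
    over the development time; cost is subtracted as if paid up front.
    (A simplification chosen for illustration.)
    """
    return p_success * payoff / (1 + discount_rate) ** years - cost


# Hypothetical program, arbitrary currency units.
base = dict(payoff=1000.0, p_success=0.10, years=10, cost=30.0)

scenarios = {
    "baseline": base,
    "faster (8 years)": {**base, "years": 8},
    "cheaper (cost 20)": {**base, "cost": 20.0},
    "better odds (p = 0.15)": {**base, "p_success": 0.15},
    "workflow only (nothing moves)": dict(base),
}

values = {name: risk_adjusted_value(**params) for name, params in scenarios.items()}

for name, value in values.items():
    print(f"{name:32s} {value:8.2f}")
```

Under these made-up inputs, the first three changes each raise the value above baseline, and the "workflow only" row is identical to it.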
If AI primarily increases idea generation, the result is predictable:
More targets
More programs
More shots on goal
But not necessarily a higher probability that any individual program succeeds.
In that world, activity increases.
Productivity may not.
If, on the other hand, AI improves selection—the ability to distinguish signal from noise, causality from correlation—then the impact is different.
Fewer, better programs.
Capital allocated more efficiently.
Higher probability of success.
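The difference between the two paths can be put in arithmetic. The sketch below assumes, purely for illustration, that every program in a portfolio costs the same and has the same independent probability of success; the function name and all figures are hypothetical. Under those assumptions the expected cost per success is the cost per program divided by the probability of success, so the number of programs cancels out.

```python
def portfolio(n_programs, p_success, cost_per_program):
    """Return (expected successes, total cost, expected cost per success).

    Assumes identical, independent programs (an illustrative simplification).
    """
    expected_successes = n_programs * p_success
    total_cost = n_programs * cost_per_program
    return expected_successes, total_cost, total_cost / expected_successes


# Hypothetical portfolios, arbitrary currency units.
baseline = portfolio(n_programs=10, p_success=0.10, cost_per_program=50.0)
more_shots = portfolio(n_programs=20, p_success=0.10, cost_per_program=50.0)
better_selection = portfolio(n_programs=10, p_success=0.20, cost_per_program=50.0)

for label, result in [
    ("baseline", baseline),
    ("more shots on goal", more_shots),
    ("better selection", better_selection),
]:
    successes, cost, cost_per_success = result
    print(f"{label:20s} successes={successes:.1f} cost={cost:.0f} "
          f"cost/success={cost_per_success:.0f}")
```

Doubling the number of programs doubles both expected successes and spending, so cost per success does not move. Doubling the probability of success produces the same expected number of successes for half the capital.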
Early on, these two paths are difficult to distinguish.
Both produce:
Compelling narratives
Large datasets
Platform claims
References to proprietary models
Only one changes outcomes.
This is where the distinction between storage and compression becomes more than conceptual.
Generating hypotheses is a storage problem.
Choosing which ones to advance is a compression problem.
Biology has historically struggled with the second.
There is also a practical challenge.
Even if AI is improving outcomes, it may be difficult to prove in real time.
Drug development operates on long timelines.
Sample sizes are small.
Attribution is unclear.
Was the success due to better biology—or better selection?
The answer is often unknowable in the moment.
Which creates a dynamic very familiar in biotech.
Narratives emerge before evidence.
Signals are inferred before they are measured.
And credibility becomes a function of belief rather than data—at least temporarily.
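The small-sample point can be illustrated with a binomial calculation. The sketch below reads "sample size" as the number of programs in a portfolio, which is an interpretation rather than something stated above, and the success rates (10% versus 15%) and the portfolio size of 20 are hypothetical. It computes how often each portfolio would show at least a given number of successes.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical figures: 20 programs, baseline vs. improved success rate.
n_programs = 20
baseline_p = 0.10
improved_p = 0.15

print("successes >= k | baseline | improved")
for k in range(0, 6):
    base = prob_at_least(k, n_programs, baseline_p)
    improved = prob_at_least(k, n_programs, improved_p)
    print(f"{k:14d} | {base:8.3f} | {improved:8.3f}")
```

With these assumed numbers, a portfolio with no improvement at all still shows three or more successes about a third of the time, so a handful of wins cannot by itself separate better selection from luck.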
None of this is to suggest that AI will not matter in drug discovery.
It already does.
The question is where.
If it remains a retrieval system, it will expand the universe of ideas.
If it becomes a learning system, it may change how those ideas are evaluated.
That distinction is not semantic.
It is economic.
Biology does not lack for hypotheses.
It lacks a way of knowing which ones deserve to survive.
The next bottleneck in drug development may not be generating ideas—
but developing systems, human or otherwise, that know which ones to kill.
