Paper 02 - Generative Chemistry - Z-Screen Pilot Release

Why it matters

A conventional screen shows which wells looked interesting. It often cannot say why a chemical series worked, which parts of the molecule mattered, or what to make next.

Z-Screen keeps the chemistry recipe attached to every RNA response. With that linkage, the system can learn which building blocks tend to push cells toward which states, and then use that learning to prioritize compounds that have not been measured yet. The data accumulates rather than expires.

What we did

We analyzed 615,793 RNA profiles across 12 combinatorial libraries and 4 cell lines. Every profile was linked to the compound's building-block identity, so the model could learn from chemical structure instead of treating each compound as an isolated label.

The analysis worked through six questions in order: Is the RNA signal reproducible? Do building blocks predict held-out compound responses? Does the full chemical structure add information beyond building-block identity? Can new tuples land near known reference drugs? Do chemical programs recur across cell lines? Does imaging add an unbiased secondary signal?

We also compared overlapping control compounds with LINCS L1000, the largest public transcriptional reference. That check asks a basic sanity question: is Z-Screen seeing biology the rest of the field would recognize?

What we found

Six results that turn a screen into a learning model.

FINDING 01

The readout is stable enough to model.

Named controls reproduced strongly across all four cell lines, with reproducibility of 0.99 or higher in three of them. That sets the floor: when the model misses a compound later, "the RNA was too noisy" cannot be the easy explanation.

Control reproducibility: 0.99+ (3/4 cell lines)
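The release does not say how control reproducibility is scored. One common choice, assumed here purely for illustration, is a split-half Pearson correlation: average half of a control's replicate profiles, average the other half, and correlate the two means. The function names and toy profiles below are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_half_reproducibility(replicates):
    """Correlate the mean profile of even-indexed replicates with that of odd-indexed ones."""
    half_a = [mean(gene) for gene in zip(*replicates[0::2])]
    half_b = [mean(gene) for gene in zip(*replicates[1::2])]
    return pearson(half_a, half_b)


# Hypothetical: four replicate profiles of one control over four genes.
control_replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.0, 3.1, 3.9],
    [0.9, 2.1, 2.9, 4.1],
    [1.0, 1.9, 3.0, 4.0],
]
r = split_half_reproducibility(control_replicates)
```

Averaging within each half before correlating damps per-well noise; a median over replicate pairs or a rank correlation would be equally reasonable choices.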

FINDING 02

The chemical recipe predicts biology.

In the strongest systems, building-block models predicted held-out compound RNA states better than a centroid baseline. The model is learning which chemical pieces carry which biological programs.

Prediction error: -33.2%
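A centroid baseline predicts the same average profile for every held-out compound, while a building-block model lets each block in a compound's tuple shift that prediction. The sketch below assumes a purely additive model (training mean plus one mean-residual effect per block per position), toy two-gene profiles, and percent change in mean squared error as the metric. None of these choices is confirmed by the release, and the toy output is not meant to reproduce the -33.2% figure.

```python
from statistics import mean


def vec_mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [mean(col) for col in zip(*vectors)]


def fit_additive(train):
    """Fit the mean profile plus a mean-residual effect for each (position, block).

    train: list of (blocks, profile), where blocks is a tuple like ("A1", "B0").
    """
    mu = vec_mean([profile for _, profile in train])
    grouped = {}
    for blocks, profile in train:
        resid = [x - m for x, m in zip(profile, mu)]
        for pos, block in enumerate(blocks):
            grouped.setdefault((pos, block), []).append(resid)
    effects = {key: vec_mean(resids) for key, resids in grouped.items()}
    return mu, effects


def predict(mu, effects, blocks):
    """Training mean plus each block's effect; blocks unseen in training add nothing."""
    pred = list(mu)
    for pos, block in enumerate(blocks):
        effect = effects.get((pos, block), [0.0] * len(mu))
        pred = [p + e for p, e in zip(pred, effect)]
    return pred


def mse(pred, true):
    return mean((p - t) ** 2 for p, t in zip(pred, true))


# Hypothetical two-position library with two-gene profiles.
train = [
    (("A0", "B0"), [0.0, 0.0]),
    (("A0", "B1"), [0.0, 2.0]),
    (("A1", "B0"), [2.0, 0.0]),
]
held_out = [(("A1", "B1"), [2.0, 2.0])]

mu, effects = fit_additive(train)
baseline_err = mean(mse(mu, y) for _, y in held_out)  # centroid baseline
model_err = mean(mse(predict(mu, effects, b), y) for b, y in held_out)
pct_change = 100.0 * (model_err - baseline_err) / baseline_err
```

On this toy data the additive model lowers held-out MSE by 75% relative to the centroid baseline. A block never seen in training contributes no shift, so compounds built only from unseen blocks fall back to the baseline prediction.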

FINDING 03

Full structure adds extra signal.

Building-block identity was the most reliable coordinate. In the best-sampled setting, layering in full chemical-structure information improved the model further. As library coverage grows, the model should be able to move from library grammar toward broader chemical reasoning.

Error with recipe + structure: -29.3%

FINDING 04

New library compounds land near recognizable drug states.

Some library tuples produced RNA neighborhoods near known reference compounds. These are not mechanism calls; they are candidate state neighborhoods a discovery team can triage and test quickly.

Cosine to ZF-104: 0.901 (ZEL024 / HEK293)
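The 0.901 figure is a cosine similarity between a library tuple's RNA signature and a reference compound's signature. A minimal nearest-reference lookup could look like the following, assuming signatures are vectors over a shared, fixed gene order. The reference names, the 0.8 cutoff, and all vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two signature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reference_neighbors(query, references, min_cosine=0.8):
    """Return (reference, cosine) pairs at or above min_cosine, best match first."""
    scored = [(name, cosine(query, sig)) for name, sig in references.items()]
    hits = [pair for pair in scored if pair[1] >= min_cosine]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference panel and one library tuple's signature.
references = {
    "ref_A": [1.0, 0.9, -0.2, 0.0],
    "ref_B": [-0.5, 0.1, 1.0, 0.8],
    "ref_C": [0.0, 0.1, 0.0, 0.1],
}
query = [0.9, 1.0, -0.1, 0.1]
neighbors = reference_neighbors(query, references)
```

Cosine ignores overall signature magnitude, so a weak and a strong perturbation pointing the same way still score as neighbors. That suits triage, where the question is the direction of the cell-state shift rather than a mechanism call.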

FINDING 05

Some chemical programs survive a change in cell type.

At chemistry-resolved positions, several building-block programs pointed in similar directions across paired cell lines. The transfer is partial. It is also strong enough to make cross-cell reuse worth pursuing seriously.

Cross-cell agreement signal: 0.602
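The release reports cross-cell agreement as 0.602 without spelling out the statistic. One plausible reading, assumed here only for illustration, is the mean cosine between each building block's effect vector estimated separately in two cell lines, taken over blocks resolved in both. The block names and effect vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two effect vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cross_cell_agreement(effects_a, effects_b):
    """Per-block cosine between two cell lines' effect vectors, plus their mean.

    effects_x maps building-block ID -> effect vector over a shared gene order.
    Only blocks present in both cell lines are compared.
    """
    shared = sorted(set(effects_a) & set(effects_b))
    if not shared:
        raise ValueError("no building blocks shared between cell lines")
    per_block = {block: cosine(effects_a[block], effects_b[block]) for block in shared}
    return per_block, sum(per_block.values()) / len(per_block)


# Hypothetical two-gene effect vectors for building blocks in two cell lines.
cell_line_1 = {"bb1": [1.0, 0.0], "bb2": [0.0, 1.0], "bb3": [1.0, 1.0]}
cell_line_2 = {"bb1": [2.0, 0.0], "bb2": [1.0, 1.0], "bb4": [1.0, 0.0]}
per_block, overall = cross_cell_agreement(cell_line_1, cell_line_2)
```

A mean of 1 would mean every shared block points the same way in both lines, and a mean near 0 would mean no consistent direction. Keeping the per-block scores also shows which programs transfer and which do not.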

FINDING 06

An outside reference sees the same direction.

For the compounds that overlap with LINCS L1000, Z-Screen RNA signatures agreed with the public reference without any Z-Screen-specific calibration. The overlap is small, but it confirms the pilot is reading transcriptional biology the field would recognize.

Matched LINCS comparisons: 11/11 positive
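A matched comparison can be sketched as a per-compound correlation over the genes both datasets report, followed by a count of how many compounds correlate positively. The sketch assumes Pearson correlation on per-gene signature scores keyed by gene name. The compound IDs, gene names, and values are hypothetical, and the code does not access LINCS data.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation; a flat signature is treated as no agreement (an assumption)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def matched_comparisons(zscreen, reference):
    """Correlate each overlapping compound's signatures over their shared genes.

    Both arguments map compound ID -> {gene: score}.
    """
    results = {}
    for compound in sorted(set(zscreen) & set(reference)):
        genes = sorted(set(zscreen[compound]) & set(reference[compound]))
        x = [zscreen[compound][g] for g in genes]
        y = [reference[compound][g] for g in genes]
        results[compound] = pearson(x, y)
    return results


zscreen = {
    "cpd_1": {"G1": 1.0, "G2": -0.5, "G3": 2.0, "G4": 0.0},
    "cpd_2": {"G1": 1.0, "G2": 2.0, "G3": 3.0},
    "cpd_3": {"G1": 0.5, "G2": 0.5},  # no reference counterpart, so it is skipped
}
reference = {
    "cpd_1": {"G1": 0.8, "G2": -0.2, "G3": 1.5, "G4": 0.1, "G5": 3.0},
    "cpd_2": {"G1": 3.0, "G2": 2.0, "G3": 1.0},
}
r_by_compound = matched_comparisons(zscreen, reference)
n_positive = sum(r > 0 for r in r_by_compound.values())
```

Restricting each comparison to shared genes matters because the two platforms need not report the same genes. With only a small overlap of compounds, the count of positive matches works as a sanity check rather than a calibration.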

What this enables

A chemistry-biology design map that compounds across campaigns.

Once a screen has learned a relationship between chemistry and cell state, each campaign feeds the next. Paper 03 stress-tests that map on harder chemistry. Paper 04 uses it to project responses across cell types. Paper 05 connects it to CRISPR atlases. What carries forward between campaigns is the compounding chemistry-biology map itself, not a single ranked list of hits.

A screen that learns from every molecule it tests.