PAPER 02 - Generative Chemistry

A screen that learns from every molecule it tests.

Most screens end as a spreadsheet of hits and misses. Z-Screen leaves behind a map between chemical building blocks and the cell states they produce. In the best-sampled pilot systems, that map predicted RNA responses for held-out compounds and helped identify measured tuples that landed near known reference-state neighborhoods.

Paper 02 · 14 pages · April 2026 · CC-BY 4.0
[Figure: Chemistry → phenotype, building-block model vs. centroid baseline. ZEL024 / HEK293: 33% less error; ZEL031 / THP1: 21% less error; ZEL024 / H1650 and ZEL031 / A549: positive gain.]
Hero figure - Building-block models predicted held-out RNA responses better than simple baselines in the strongest systems.
TL;DR

Because Z-Screen knows the recipe for every compound, the dataset can ask which chemical pieces move which cell programs. In the strongest pilot case (ZEL024 / HEK293), the model predicted held-out RNA states with about 33% less error than a simple centroid baseline, so the screen functions as a design map you can keep using rather than a one-shot ranked list.

Why it matters

A normal screen measures which wells looked interesting. It often cannot tell why a chemical series worked, which pieces of the molecule mattered, or what to make next.

Z-Screen keeps the chemistry recipe attached to every RNA response. With that linkage, the system can learn which building blocks tend to push cells toward which states, and then use that learning to prioritize compounds that have not been measured yet. The data accumulates rather than expires.

What we did

We analyzed 615,793 RNA profiles across 12 combinatorial libraries and 4 cell lines. Every profile was linked to the compound's building-block identity, so the model could learn from chemical structure instead of treating each compound as an isolated label.
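To make the contrast with isolated labels concrete, here is a minimal sketch of one way a building-block identity could be encoded, assuming a library in which each compound is a tuple with one building block per position. The position lists, block names (`A1`, `B2`, ...) and function names are hypothetical, and the paper does not state that Z-Screen uses this particular one-hot encoding.

```python
from typing import List, Sequence

# Hypothetical 3-position combinatorial library; names are illustrative only.
POSITION_BLOCKS: List[List[str]] = [
    ["A1", "A2", "A3"],  # building blocks allowed at position 1
    ["B1", "B2"],        # building blocks allowed at position 2
    ["C1", "C2", "C3"],  # building blocks allowed at position 3
]


def one_hot_tuple(blocks: Sequence[str]) -> List[int]:
    """Encode a compound tuple as concatenated one-hot vectors, one per position."""
    if len(blocks) != len(POSITION_BLOCKS):
        raise ValueError("tuple length does not match the number of positions")
    features: List[int] = []
    for position, block in enumerate(blocks):
        choices = POSITION_BLOCKS[position]
        if block not in choices:
            raise ValueError(f"unknown block {block!r} at position {position}")
        features.extend(1 if block == choice else 0 for choice in choices)
    return features


def shared_features(a: Sequence[str], b: Sequence[str]) -> int:
    """Count feature columns two compounds share, i.e. building blocks in common."""
    return sum(x & y for x, y in zip(one_hot_tuple(a), one_hot_tuple(b)))
```

Two compounds that share a building block share a feature column, so a response measured for one tuple can inform predictions for its relatives; an isolated compound label would share nothing with any other compound.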

The analysis worked through six questions in order: Is the RNA signal reproducible? Do building blocks predict held-out compound responses? Does the full chemical structure add information beyond building-block identity? Can new tuples land near known reference drugs? Do chemical programs recur across cell lines? Does imaging add an unbiased secondary signal?

We also compared overlapping control compounds with LINCS L1000, the largest public transcriptional reference. That check asks the basic sanity question: is Z-Screen seeing biology the rest of the field would recognize?

What we found

Six results that turn a screen into a learning model.

FINDING 01

The readout is stable enough to model.

Named controls reproduced strongly in all four cell lines, exceeding 0.99 in three of them. That sets the floor: when the model misses a compound later, "the RNA was too noisy" cannot be the easy explanation.

0.99+ · Control reproducibility (3/4 cell lines)
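The paper does not specify its reproducibility metric here. As a sketch only, one common choice is the mean pairwise Pearson correlation between replicate signatures of the same control; the function names below are hypothetical, and this may not be the statistic behind the 0.99+ figure.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length signatures (needs nonzero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def replicate_reproducibility(replicates: Sequence[Sequence[float]]) -> float:
    """Mean pairwise Pearson correlation across replicate signatures of one control."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    return mean(pearson(a, b) for a, b in combinations(replicates, 2))
```

A value near 1 means replicate profiles of the same control point in the same direction, which is the kind of floor this finding describes.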
FINDING 02

The chemical recipe predicts biology.

In the strongest systems, building-block models predicted held-out compound RNA states better than a centroid baseline. The model is learning which chemical pieces carry which biological programs.

-33.2% · Prediction error vs. centroid baseline
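The comparison in this finding can be sketched under stated assumptions: the centroid baseline predicts the mean training profile for every held-out compound, the "building-block model" is a deliberately simple additive model (global mean plus one average offset per building block and position), and error is mean squared error. The paper's actual model and error metric may differ, and all names here are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Profile = List[float]
Compound = Tuple[str, ...]


def centroid(profiles: Sequence[Sequence[float]]) -> Profile:
    """Per-gene mean across profiles (the baseline prediction)."""
    return [mean(col) for col in zip(*profiles)]


def fit_additive(train: Sequence[Tuple[Compound, Profile]]):
    """Global mean plus one offset per (position, building block)."""
    global_mean = centroid([p for _, p in train])
    groups: Dict[Tuple[int, str], List[Profile]] = {}
    for compound, profile in train:
        for pos, block in enumerate(compound):
            groups.setdefault((pos, block), []).append(profile)
    offsets = {
        key: [c - g for c, g in zip(centroid(profs), global_mean)]
        for key, profs in groups.items()
    }
    return global_mean, offsets


def predict_additive(model, compound: Compound) -> Profile:
    """Sum the offsets of each block in the tuple; unseen blocks contribute 0."""
    global_mean, offsets = model
    pred = list(global_mean)
    for pos, block in enumerate(compound):
        off = offsets.get((pos, block))
        if off is not None:
            pred = [p + o for p, o in zip(pred, off)]
    return pred


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return mean((x - y) ** 2 for x, y in zip(a, b))


def error_reduction(train, test) -> float:
    """1 - model_error / baseline_error on held-out compounds (baseline error must be > 0)."""
    base = centroid([p for _, p in train])
    model = fit_additive(train)
    base_err = mean(mse(base, p) for _, p in test)
    model_err = mean(mse(predict_additive(model, c), p) for c, p in test)
    return 1.0 - model_err / base_err
```

A return value of 0.332 would correspond to a 33.2% error reduction; a negative value would mean the model did worse than the baseline on the held-out compounds.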
FINDING 03

Full structure adds extra signal.

Building-block identity was the most reliable coordinate. In the best-sampled setting, layering in full chemical-structure information improved the model further. As library coverage grows, the model can move from library grammar toward broader chemical reasoning.

-29.3% · Error with recipe + structure
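One way to see why structure can add signal beyond identity: one-hot building-block features treat two different blocks as unrelated, while a structural fingerprint can score them as similar. The sketch below uses Tanimoto similarity on sets of fingerprint bit positions, a standard cheminformatics measure. The bit sets and function names are hypothetical, and the paper does not say which structure representation it layered in.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(bits_a: FrozenSet[int], bits_b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


def identity_similarity(block_a: str, block_b: str) -> float:
    """Similarity implied by one-hot identity features: 1 if same block, else 0."""
    return 1.0 if block_a == block_b else 0.0


def combined_features(
    one_hot: Sequence[int], fingerprint_bits: FrozenSet[int], n_bits: int
) -> List[float]:
    """Concatenate identity features with a dense fingerprint vector (recipe + structure)."""
    dense = [1.0 if i in fingerprint_bits else 0.0 for i in range(n_bits)]
    return [float(v) for v in one_hot] + dense
```

Two hypothetical blocks `A1` and `A2` with fingerprints `{1, 2, 3}` and `{2, 3, 4}` get identity similarity 0 but Tanimoto 0.5, so a model that sees both feature types can borrow strength between them.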
FINDING 04

New library compounds land near recognizable drug states.

Some library tuples produced RNA neighborhoods near known reference compounds. These are not mechanism calls; they are candidate state neighborhoods a discovery team can triage and test quickly.

0.901 · Cosine to ZF-104 (ZEL024 / HEK293)
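Finding a reference neighborhood reduces to a similarity search. A minimal sketch, assuming each signature is a vector over the same gene order and using cosine similarity, the measure named in the 0.901 figure; the reference names and the 0.9 cutoff are hypothetical, not the paper's settings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two signatures over the same gene order."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for an all-zero signature")
    return dot / (norm_a * norm_b)


def reference_neighbors(
    query: Sequence[float],
    references: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical cutoff, not the paper's
) -> List[Tuple[str, float]]:
    """Return (reference name, cosine) pairs at or above threshold, best first."""
    scored = [(name, cosine(query, sig)) for name, sig in references.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)
```

A hit from a search like this is a candidate state neighborhood to triage, not a mechanism call.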
FINDING 05

Some chemical programs survive a change in cell type.

At chemistry-resolved positions, several building-block programs pointed in similar directions across paired cell lines. The transfer is partial. It is also strong enough to make cross-cell reuse worth pursuing seriously.

0.602 · Cross-cell agreement signal
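The paper does not define its cross-cell agreement statistic here. As an illustration only, one way to ask whether a building-block program "points in a similar direction" in two cell lines is the cosine between that block's effect vectors in each line, averaged over blocks measured in both. It assumes both effect vectors use the same gene order; block names and values are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two effect vectors over the same gene order."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)


def cross_cell_agreement(
    effects_a: Dict[str, Sequence[float]], effects_b: Dict[str, Sequence[float]]
) -> float:
    """Mean cosine of per-block effect vectors for blocks measured in both cell lines."""
    shared = sorted(set(effects_a) & set(effects_b))
    if not shared:
        raise ValueError("no building blocks measured in both cell lines")
    return mean(cosine(effects_a[k], effects_b[k]) for k in shared)
```

A mean of 1 would mean every shared block pushes both cell lines in the same direction; values between 0 and 1 correspond to the partial transfer this finding describes.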
FINDING 06

An outside reference sees the same direction.

For the compounds that overlap with LINCS L1000, Z-Screen RNA signatures agreed with the public reference without any Z-Screen-specific calibration. The overlap is small, but it confirms the pilot is reading transcriptional biology the field would recognize.

11 / 11 positive · Matched LINCS comparisons
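A tally like "11 / 11 positive" can be sketched as a sign test: for each compound present in both sources, correlate the two signatures over the genes they share and count how many correlations are positive. This assumes signatures keyed by gene symbol and Pearson correlation; the paper does not state its exact comparison, and the compound and gene names below are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Tuple

# Hypothetical layout: gene symbol -> differential expression score.
Signature = Dict[str, float]


def pearson_on_shared_genes(a: Signature, b: Signature) -> float:
    """Pearson correlation restricted to genes present in both signatures."""
    genes = sorted(set(a) & set(b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [a[g] for g in genes]
    y = [b[g] for g in genes]
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


def count_positive_matches(
    zscreen: Dict[str, Signature], lincs: Dict[str, Signature]
) -> Tuple[int, int]:
    """Return (positively correlated compounds, compounds present in both sources).

    Raw scores are compared directly; no rescaling or calibration is applied."""
    shared = sorted(set(zscreen) & set(lincs))
    positives = sum(
        1 for cpd in shared if pearson_on_shared_genes(zscreen[cpd], lincs[cpd]) > 0
    )
    return positives, len(shared)
```

Pearson correlation is unchanged by shifting or positively rescaling either signature, which is one reason a sign check like this can work across platforms without instrument-specific calibration.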
What this enables

A chemistry-biology design map that compounds across campaigns.

Once a screen has learned a relationship between chemistry and cell state, each campaign feeds the next. Paper 03 stress-tests that map on harder chemistry. Paper 04 uses it to project responses across cell types. Paper 05 connects it to CRISPR atlases. What carries forward between campaigns is the compounding chemistry-biology map itself, not a single ranked list of hits.

Access

Preprint, data, and analysis repo.

Public release. The preprint PDF is hosted here; the canonical workspace and per-paper analysis scripts live on Zenodo with a citable DOI.