Most screens end as a spreadsheet of hits and misses. Z-Screen leaves behind a map between chemical building blocks and the cell states they produce. In the best-sampled pilot systems, that map predicted RNA responses for held-out compounds and helped identify measured building-block tuples that landed near known reference-compound neighborhoods.
Because Z-Screen records the recipe for every compound, we can ask of the dataset which chemical pieces move which cell programs. In the strongest pilot case, the model predicted held-out RNA states with error well below that of a simple centroid baseline, so the screen works as a design map you can keep using rather than a one-shot ranked list.
A conventional screen measures which wells looked interesting. It usually cannot tell you why a chemical series worked, which parts of the molecule mattered, or what to make next.
Z-Screen keeps the chemistry recipe attached to every RNA response. With that linkage, the system can learn which building blocks tend to push cells toward which states, and then use that learning to prioritize compounds that have not been measured yet. The data accumulates instead of expiring.
We analyzed 615,793 RNA profiles across 12 combinatorial libraries and 4 cell lines. Every profile was linked to the compound's building-block identity, so the model could learn from chemical structure instead of treating each compound as an isolated label.
The analysis worked through six questions in order. Is the RNA signal reproducible? Do building blocks predict held-out compound responses? Does the full chemical structure add information beyond building-block identity? Can new tuples land near known reference drugs? Do chemical programs recur across cell lines? Does imaging add an unbiased secondary signal?
We also compared overlapping control compounds with LINCS L1000, the largest public transcriptional reference. That check asks a basic sanity question: is Z-Screen seeing biology the rest of the field would recognize?
Named controls reproduced strongly across all four cell lines. That sets the floor: when the model misses a compound later, "the RNA was too noisy" cannot be the easy explanation.
In the strongest systems, building-block models predicted held-out compound RNA states better than a centroid baseline, which indicates that the model is learning which chemical pieces carry which biological programs.
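The text does not specify the model or the exact centroid baseline, so the following is only a minimal sketch under stated assumptions. It assumes a compound is a tuple of building-block indices (one per synthesis position) and that its RNA profile is roughly additive in per-block effects. The baseline predicts the training-set mean profile for every held-out compound; the hypothetical block model fits ridge-regularized per-block effects. All function names, the additive form, and the synthetic data are illustrative, not the Z-Screen pipeline.

```python
import itertools
import numpy as np

# Hypothetical sketch: additive per-block model vs. centroid baseline (toy data only).

def one_hot(tuples, n_blocks_per_pos):
    """Encode (n, P) building-block indices as a concatenated one-hot design matrix."""
    offsets = np.concatenate([[0], np.cumsum(n_blocks_per_pos)[:-1]])
    X = np.zeros((tuples.shape[0], int(np.sum(n_blocks_per_pos))))
    rows = np.arange(tuples.shape[0])
    for p, off in enumerate(offsets):
        X[rows, off + tuples[:, p]] = 1.0
    return X

def fit_block_model(tuples, profiles, n_blocks_per_pos, lam=1.0):
    """Ridge fit of profile ~ mean + sum over positions of a per-block effect vector."""
    X = one_hot(tuples, n_blocks_per_pos)
    mean = profiles.mean(axis=0)
    A = X.T @ X + lam * np.eye(X.shape[1])  # ridge term handles collinear one-hot blocks
    W = np.linalg.solve(A, X.T @ (profiles - mean))
    return mean, W

def predict_block_model(model, tuples, n_blocks_per_pos):
    mean, W = model
    return mean + one_hot(tuples, n_blocks_per_pos) @ W

def heldout_comparison(seed=0, n_blocks=8, n_pos=3, n_genes=50, noise=0.5):
    """Return (centroid_mse, model_mse) on a random 20% of held-out synthetic tuples."""
    rng = np.random.default_rng(seed)
    sizes = [n_blocks] * n_pos
    effects = [rng.normal(size=(n_blocks, n_genes)) for _ in range(n_pos)]
    tuples = np.array(list(itertools.product(range(n_blocks), repeat=n_pos)))
    profiles = sum(effects[p][tuples[:, p]] for p in range(n_pos))
    profiles = profiles + noise * rng.normal(size=profiles.shape)
    order = rng.permutation(len(tuples))
    n_test = len(tuples) // 5
    test, train = order[:n_test], order[n_test:]
    # Centroid baseline: one mean profile predicted for every held-out compound.
    centroid = profiles[train].mean(axis=0)
    centroid_mse = float(np.mean((profiles[test] - centroid) ** 2))
    model = fit_block_model(tuples[train], profiles[train], sizes)
    pred = predict_block_model(model, tuples[test], sizes)
    model_mse = float(np.mean((profiles[test] - pred) ** 2))
    return centroid_mse, model_mse

print(heldout_comparison())
```

On this synthetic, purely additive data the block model should beat the centroid by a wide margin. Real profiles may include interactions between positions that an additive model cannot capture.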
Building-block identity was the most reliable coordinate. In the best-sampled setting, adding full chemical-structure information improved the model further, which suggests that as library coverage grows, the model can move from library grammar toward broader chemical reasoning.
Some library tuples produced RNA neighborhoods near known reference compounds. These are not mechanism calls; they are candidate state neighborhoods a discovery team can triage and test quickly.
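The text does not say how "near" is measured. One common choice, assumed here, is cosine similarity between profile vectors (for example, per-gene scores relative to vehicle). The sketch below ranks hypothetical reference compounds for each tuple and keeps only neighbors above an illustrative threshold; `nearest_references`, `k`, `min_cos`, and the toy profiles are all assumptions.

```python
import numpy as np

# Hypothetical sketch: rank reference compounds by cosine similarity to each tuple profile.
def nearest_references(tuple_profiles, ref_profiles, ref_names, k=3, min_cos=0.5):
    """Return, for each tuple, up to k (name, cosine) pairs with cosine >= min_cos."""
    a = tuple_profiles / np.linalg.norm(tuple_profiles, axis=1, keepdims=True)
    b = ref_profiles / np.linalg.norm(ref_profiles, axis=1, keepdims=True)
    sims = a @ b.T  # shape (n_tuples, n_refs)
    hits = []
    for row in sims:
        top = np.argsort(row)[::-1][:k]  # indices of the k most similar references
        hits.append([(ref_names[j], float(row[j])) for j in top if row[j] >= min_cos])
    return hits

rng = np.random.default_rng(1)
refs = rng.normal(size=(4, 20))
names = ["ref_A", "ref_B", "ref_C", "ref_D"]
# Toy tuple profile: a scaled, slightly noisy copy of ref_C.
query = 2.0 * refs[[2]] + 0.05 * rng.normal(size=(1, 20))
print(nearest_references(query, refs, names))
```

A high cosine only places a tuple in a reference neighborhood to triage; consistent with the text above, it does not establish a shared mechanism.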
At chemistry-resolved positions, several building-block programs pointed in similar directions across paired cell lines. The transfer is partial, but it is strong enough to make cross-cell reuse worth pursuing seriously.
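"Pointed in similar directions" can be made concrete in several ways. As one hypothetical reading, define a block's program at a given position as the mean profile of compounds carrying that block minus the library mean, then compare matched programs between two cell lines by cosine similarity on the same genes. The function names, the 0.7 sharing factor, and the simulated data below are illustrative assumptions only.

```python
import itertools
import numpy as np

# Hypothetical sketch: per-block "programs" in two cell lines and their directional agreement.
def block_programs(tuples, profiles, position, n_blocks):
    """Mean profile of compounds carrying each block at `position`, minus the library mean."""
    center = profiles.mean(axis=0)
    return np.stack([profiles[tuples[:, position] == b].mean(axis=0) - center
                     for b in range(n_blocks)])

def cross_cell_cosine(prog_a, prog_b):
    """Cosine similarity between matched block programs (rows) measured on the same genes."""
    num = np.sum(prog_a * prog_b, axis=1)
    den = np.linalg.norm(prog_a, axis=1) * np.linalg.norm(prog_b, axis=1)
    return num / den

def simulate(eff, tuples, rng, noise=0.5):
    """Toy additive profiles: sum of per-position block effects plus Gaussian noise."""
    clean = sum(eff[p][tuples[:, p]] for p in range(eff.shape[0]))
    return clean + noise * rng.normal(size=clean.shape)

rng = np.random.default_rng(2)
n_blocks, n_pos, n_genes = 8, 3, 50
tuples = np.array(list(itertools.product(range(n_blocks), repeat=n_pos)))
eff_a = rng.normal(size=(n_pos, n_blocks, n_genes))
eff_b = 0.7 * eff_a + 0.7 * rng.normal(size=eff_a.shape)  # partial sharing across lines
prof_a, prof_b = simulate(eff_a, tuples, rng), simulate(eff_b, tuples, rng)
agreement = cross_cell_cosine(block_programs(tuples, prof_a, 0, n_blocks),
                              block_programs(tuples, prof_b, 0, n_blocks))
print(np.round(agreement, 2))
```

Because the toy library is full-factorial, effects at other positions cancel when averaging over one block, so each program isolates that block's contribution; sparser real libraries would not balance this way.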
For the compounds that overlap with LINCS L1000, Z-Screen RNA signatures agreed with the public reference without any Z-Screen-specific calibration. The overlap is small, but it confirms the pilot is reading transcriptional biology the field would recognize.
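The text does not name the agreement statistic. A simple option, assumed here, is a Spearman rank correlation computed only over genes measured by both platforms (L1000 directly measures a fixed panel of landmark genes). The helper names and the gene identifiers in the example are hypothetical, and ties are not averaged.

```python
import numpy as np

# Hypothetical sketch: rank agreement between two signatures on their shared genes.
def spearman(x, y):
    """Spearman rank correlation for 1-D arrays without ties (ties are not averaged)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

def signature_agreement(zscreen, reference):
    """Compare {gene: score} signatures on shared genes; return (rho, n_shared)."""
    shared = sorted(set(zscreen) & set(reference))
    x = np.array([zscreen[g] for g in shared])
    y = np.array([reference[g] for g in shared])
    return spearman(x, y), len(shared)

z_sig = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": -1.0}
ref_sig = {"G1": 0.5, "G2": 0.9, "G3": 2.0, "G5": 7.0}
print(signature_agreement(z_sig, ref_sig))
```

Restricting to shared genes matters because the two platforms measure different gene sets; no calibration step is applied, in line with the comparison described above.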
Once a screen has learned a relationship between chemistry and cell state, each campaign feeds the next. Paper 03 stress-tests that map on harder chemistry. Paper 04 uses it to project responses across cell types. Paper 05 connects it to CRISPR atlases. What carries forward between campaigns is not a single ranked list of hits but a compounding map of chemical biology itself.
Public release. The preprint PDF is hosted here; the canonical workspace and per-paper analysis scripts live on Zenodo with a citable DOI.