
Google AI Releases C2S-Scale 27B Model that Translates Complex Single-Cell Gene Expression Data into ‘Cell Sentences’ that LLMs Can Understand


A team of researchers from Google Research, Google DeepMind, and Yale released C2S-Scale 27B, a 27-billion-parameter foundation model for single-cell analysis built on Gemma-2. The model formalizes single-cell RNA-seq (scRNA-seq) profiles as “cell sentences” (ordered lists of gene symbols) so that a language model can natively parse and reason over cellular states. Beyond benchmark gains, the research team reports an experimentally validated, context-dependent pathway: CK2 inhibition (silmitasertib/CX-4945) combined with low-dose interferon amplifies antigen presentation, a mechanism that could make “cold” tumors more responsive to immunotherapy. The result is a roughly 50% increase in antigen presentation in vitro under the combined condition.

Understanding the model

C2S-Scale converts a high-dimensional expression vector into text by rank-ordering genes by expression and emitting the top-K gene symbols as a sequence of gene names. This representation aligns single-cell data with standard LLM toolchains and lets tasks such as cell-type prediction, tissue classification, cluster captioning, perturbation prediction, and biological QA be phrased as text prompts and completions.
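The rank-and-truncate step described above can be sketched in a few lines. This is a minimal illustration, not the cell2sentence library's implementation: the function name `to_cell_sentence`, the choice to drop zero-expression genes, and the alphabetical tie-breaking are assumptions, and the gene values are made up for the example.

```python
from typing import Dict


def to_cell_sentence(expression: Dict[str, float], k: int = 100) -> str:
    """Rank genes by expression (highest first) and keep the top-k symbols.

    Hypothetical helper: zero-expression genes are dropped and ties are
    broken alphabetically so the output is deterministic.
    """
    # Unexpressed genes carry no rank information, so filter them out.
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by descending expression, then by gene name for ties.
    ranked = sorted(expressed, key=lambda gv: (-gv[1], gv[0]))
    # The "sentence" is just the space-separated gene symbols in rank order.
    return " ".join(gene for gene, _ in ranked[:k])


# Illustrative (not real) expression values for one cell.
cell = {"CD3E": 12.0, "MALAT1": 85.0, "CD8A": 7.5, "HBB": 0.0, "B2M": 40.0}
print(to_cell_sentence(cell, k=3))  # → MALAT1 B2M CD3E
```

Because only the ordering survives, absolute expression magnitudes are discarded; the sentence encodes relative abundance, which is what lets a standard text tokenizer consume it.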

https://github.com/vandijklab/cell2sentence

Training data, stack, and release

C2S-Scale-Gemma-2-27B is built on Gemma-2 27B (a decoder-only Transformer), trained on Google TPU v5, and released under CC-BY-4.0. The training corpus aggregates >800 public scRNA-seq datasets spanning >57M cells (human and mouse) with associated metadata and textual context; pretraining unifies transcriptomic tokens and biological text into a single multimodal corpus.
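To make the idea of mixing gene tokens with metadata text concrete, here is a sketch of turning one annotated cell into a prompt/completion pair for cell-type prediction. The `CellRecord` schema and the prompt template are hypothetical and chosen for illustration; they are not the format actually used to train C2S-Scale.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CellRecord:
    # Hypothetical schema: one cell's sentence plus associated metadata.
    cell_sentence: str
    organism: str
    tissue: str
    cell_type: str


def to_prompt_completion(rec: CellRecord, n_genes: int = 100) -> Tuple[str, str]:
    """Render a cell as a (prompt, completion) pair for cell-type prediction.

    The template is illustrative only, not the C2S-Scale training format.
    """
    # Keep the first n genes; the sentence is assumed to be rank-ordered already.
    genes = " ".join(rec.cell_sentence.split()[:n_genes])
    prompt = (
        f"Organism: {rec.organism}. Tissue: {rec.tissue}. "
        f"Genes ranked by expression: {genes}. Cell type:"
    )
    # The metadata label becomes the text the model learns to complete.
    completion = f" {rec.cell_type}"
    return prompt, completion


rec = CellRecord("MALAT1 B2M CD3E CD8A", "Homo sapiens", "blood", "CD8-positive T cell")
prompt, completion = to_prompt_completion(rec, n_genes=3)
print(prompt + completion)
```

The same pattern extends to the other tasks the article lists (tissue classification, perturbation prediction) by changing which metadata field sits in the completion.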

The key result: an interferon-conditional amplifier

The research team built a dual-context virtual screen over >4,000 drugs to find compounds that boost antigen presentation (the MHC-I program) only in immune-context-positive settings, i.e., primary patient samples with low interferon tone, while having negligible effect in immune-context-neutral cell-line data. The model predicted a striking context split for silmitasertib (a CK2 inhibitor): strong MHC-I upregulation with low-dose interferon, and little to none without interferon. The research team reports in-lab validation in human neuroendocrine models unseen in training, with the combination (silmitasertib + low-dose interferon) producing a marked, synergistic increase in antigen presentation (≈50% in their assays).
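The selection logic of a dual-context screen can be sketched as a filter over per-drug predictions: keep compounds whose predicted MHC-I effect is large in the immune-context-positive setting and near zero in the neutral one, then rank by the size of the split. Everything here is an assumption for illustration: the `ScreenResult` fields, the thresholds, and the drug names are hypothetical, and the paper's actual scoring is not described in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScreenResult:
    drug: str
    # Hypothetical model-predicted change in an MHC-I program score per context.
    delta_immune_positive: float  # e.g. primary samples with low interferon tone
    delta_immune_neutral: float   # e.g. cell lines without immune context


def context_split_hits(results: List[ScreenResult],
                       min_effect: float = 0.5,
                       max_neutral: float = 0.1) -> List[str]:
    """Return drugs that act strongly in the immune-positive context but
    barely in the neutral one, sorted by the size of the context split."""
    hits = [
        r for r in results
        if r.delta_immune_positive >= min_effect
        and abs(r.delta_immune_neutral) <= max_neutral
    ]
    hits.sort(key=lambda r: r.delta_immune_positive - r.delta_immune_neutral,
              reverse=True)
    return [r.drug for r in hits]


# Toy inputs with placeholder names, not results from the study.
screen = [
    ScreenResult("drug_A", 0.9, 0.05),  # strong, context-specific
    ScreenResult("drug_B", 0.8, 0.7),   # strong everywhere: excluded
    ScreenResult("drug_C", 0.6, 0.0),   # moderate, context-specific
    ScreenResult("drug_D", 0.2, 0.0),   # too weak: excluded
]
print(context_split_hits(screen))  # → ['drug_A', 'drug_C']
```

Requiring a near-zero neutral effect is what distinguishes a conditional amplifier (like the reported silmitasertib behavior) from a compound that simply raises MHC-I everywhere.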

The amplifier lowers the response threshold to interferon rather than initiating antigen presentation de novo; flow-cytometry readouts show HLA-A,B,C upregulation only under combined treatment (including with IFN-β and IFN-γ), across two neuroendocrine models, with representative MFI gains (e.g., 13.6% at 10 nM and 34.9% at 1000 nM silmitasertib in one model).
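Percent MFI gains like those quoted above are relative changes in fluorescence intensity against a reference condition. The article does not state which condition serves as the reference; the sketch below assumes a generic reference (for example, interferon alone) and uses arbitrary normalized units, not measured values.

```python
def percent_mfi_gain(mfi_treated: float, mfi_reference: float) -> float:
    """Relative change in fluorescence intensity, in percent.

    Assumption: the reference is whichever condition the assay compares
    against (e.g. interferon alone); the article does not specify it.
    """
    if mfi_reference <= 0:
        raise ValueError("reference MFI must be positive")
    return (mfi_treated - mfi_reference) / mfi_reference * 100.0


# Arbitrary normalized units (reference = 100), for illustration only.
print(round(percent_mfi_gain(134.9, 100.0), 1))  # → 34.9
```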

Key Takeaways

  • C2S-Scale 27B (Gemma-2) encodes scRNA-seq profiles as textual “cell sentences,” enabling LLM-native single-cell analysis workflows.
  • In a two-context virtual screen (>4,000 compounds), the model predicted an interferon-conditional amplifier: CK2 inhibition (silmitasertib) boosts MHC-I antigen presentation only with low-dose IFN.
  • Wet-lab tests in human neuroendocrine cell models confirmed the prediction, with a ~50% antigen-presentation increase for silmitasertib+IFN versus either alone; this remains preclinical/in vitro.
  • Open weights and usage docs are live on Hugging Face (vandijklab), with both 27B and 2B Gemma variants for research use.

C2S-Scale 27B is a technically credible step for LLMs in biology: translating scRNA-seq into “cell sentences” lets a Gemma-2 model run programmatic queries over cell states and perturbations, and in practice it surfaced an interferon-conditional amplifier, silmitasertib (CK2 inhibition), that increases MHC-I antigen presentation only with low-dose IFN, a mechanism the team then validated in vitro. The value here isn’t headline rhetoric but the workflow: text-native screening across >4k compounds under dual immune contexts to propose a context-dependent pathway that may shift immune-“cold” tumors toward visibility. That said, all evidence is preclinical and bench-scale; the right reading is “hypothesis-generating AI” with open weights enabling replication and stress-testing, not a clinical claim.


Check out the Technical Paper, Model on HF, GitHub Page and Technical details.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
