RNA & protein pocket discovery · structural ranking

Structural analysis for RNA and protein small-molecule targets — with benchmarked limits, and where we won’t guess

From sequence or structure: predicted 3D structure, a conformational ensemble, a ranked pocket shortlist with cross-frame stability, and integrated evidence — family classification, precedent ligands, interaction fingerprints, functional annotations. Validated on real drug discovery targets: HCV IRES IIa recovered from the apo structure alone at strict top-1 (RNA); CDK2, KRAS G12C, and MDM2 rank-1 recovery verified against PDB ligand-contact ground truth (protein).

For RNA, H-type pseudoknots and G-quadruplexes are excluded, not guessed at. For protein, pockets in low-confidence structure regions (pLDDT < 0.7) are flagged as lower-confidence, not silently ranked.

See worked examples · Read the findings

3 / 7 · Strict @1 with our ensemble + ranker (vs 0/7 single-frame fpocketR)

6 / 7 · Near @1 with our ensemble + ranker (vs 2/7 single-frame fpocketR)

78 · Apo/holo pairs surveyed; binding-site RMSD exceeded global RMSD in 75% of pairs

// Get in touch

A structural biology partner, not a subscription

We validate on real benchmarks, publish our own limitations, and recover real drug sites from sequence alone — HCV IRES IIa at strict top-1 (RNA) and CDK2, KRAS G12C, and MDM2 at rank-1 against PDB ligand-contact ground truth (protein) are examples, not cherry picks.

Work with us as:

→A dedicated inference server built for your program — RNA, protein, or both
→Integration into an existing discovery pipeline
→A single-target assessment to evaluate the pipeline on your own data

Engagements are scoped per project — no tiers, no subscription.

Get in touch · See worked examples

// What we built

Why ensemble ranking matters for RNA pocket discovery

v0.2 architectural contribution

The field has converged on the view that standard cavity detection over-predicts on RNA: fpocket’s default parameters mistake the polar grooves of duplex RNA for binding pockets. The Weeks lab formalised the fix as an RNA-tuned wrapper, fpocketR (Veenbaas et al., PNAS 2025), and we use the same RNA-tuned parameters. Detection alone, however, is not where customer recovery happens.

On the same seven cleft-binder targets, single-frame fpocketR-style detection (and ours; the two are empirically equivalent) places the rank-1 pocket at the experimental binding site in only 2 of 7 cases at the near threshold and 0 of 7 at strict. The v0.2 contribution is what we add on top: a five-frame ANM conformational ensemble, cross-frame pocket clustering, and a cluster ranker based on persistence × binding-residue stability. Both ranker features are RNA-applicable by construction.

Rank-1 recovery on 7 cleft-binder targets
With our ensemble + geometric ranker: 3 / 7 strict · 6 / 7 near
With fpocketR-style single-frame detection alone: 0 / 7 strict · 2 / 7 near

Locked benchmark, deterministic re-run. Per-target lift figures and the full comparison table (vanilla fpocket / fpocketR params / our params / ensemble + ranker) on the methodology page. strict@1 = rank-1 cluster overlaps the experimental binding site by ≥ 50% of residues; near@1 = ≥ 30%.
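
The strict@1 / near@1 thresholds above can be sketched as a small classifier. This is an illustration only: the function name is hypothetical, and it assumes that overlap is measured as the fraction of experimental binding-site residues covered by the rank-1 cluster, which the definition above does not spell out.

```python
from typing import Set


def classify_rank1(predicted: Set[int], binding_site: Set[int],
                   strict: float = 0.50, near: float = 0.30) -> str:
    """Label a rank-1 cluster as 'strict', 'near' or 'neither'.

    Assumption: overlap = |predicted ∩ binding_site| / |binding_site|.
    """
    if not binding_site:
        raise ValueError("binding_site must not be empty")
    overlap = len(predicted & binding_site) / len(binding_site)
    if overlap >= strict:
        return "strict"
    if overlap >= near:
        return "near"
    return "neither"


# Toy binding site of ten residues (numbers are made up for illustration).
site = set(range(1, 11))
labels = [
    classify_rank1({1, 2, 3, 4, 5, 6, 7, 99}, site),  # 7 of 10 covered
    classify_rank1({1, 2, 3, 4, 50, 51}, site),       # 4 of 10 covered
    classify_rank1({1, 60, 61}, site),                # 1 of 10 covered
]
```

Under the 50% and 30% thresholds, the three toy clusters are labelled strict, near and neither respectively.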

Read full methodology

// Capabilities

What the platform actually does

One integrated workflow, end to end, for RNA or protein. Sequence in, ranked top-3 pocket shortlist out, with integrated evidence, full per-cluster metadata, and a customer-facing PDF report.

01 · 3D structure prediction from sequence

Predict the 3D tertiary structure of any target from sequence alone. RNA uses RhoFold; protein uses AlphaFold DB with an ESMFold fallback for targets without a precomputed entry. Deterministic outputs, suitable for downstream pocket detection. Single-sequence prediction is the RNA default; an MSA-driven path is available for targets with diverse-tail evolutionary signal.

02 · Conformational ensemble generation

Sample a five-frame conformational ensemble around the predicted structure to capture the kind of motion that drives pocket formation. ProDy ANM normal-mode sampling, applied the same way to RNA and protein — deterministic, low-frequency-mode-driven, chosen over short MD because force-field equilibration drift demoted real binding-site clusters in our pilots.
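
To make the idea concrete, here is a minimal sketch of deterministic, low-frequency-mode ensemble generation with an anisotropic network model. It is a plain NumPy stand-in, not ProDy's API and not our production code. The 15 Å cutoff, the unit spring constant, the choice of two modes and the 1 Å displacement amplitude are all assumptions made for illustration.

```python
import numpy as np


def anm_hessian(coords: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0) -> np.ndarray:
    """Build the 3N x 3N anisotropic-network-model Hessian for N nodes."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue  # nodes farther apart than the cutoff share no spring
            block = -gamma * np.outer(d, d) / dist2
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            hessian[si, sj] = block
            hessian[sj, si] = block
            hessian[si, si] -= block
            hessian[sj, sj] -= block
    return hessian


def five_frame_ensemble(coords, n_modes: int = 2, rmsd: float = 1.0, cutoff: float = 15.0):
    """Return the input frame plus +/- displacements along the softest modes."""
    coords = np.asarray(coords, dtype=float)
    _, vecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    frames = [coords.copy()]  # frame 0 is the input structure
    for k in range(6, 6 + n_modes):  # indices 0-5 are rigid-body zero modes
        mode = vecs[:, k].reshape(-1, 3)  # unit-norm displacement pattern
        step = rmsd * np.sqrt(len(coords)) * mode  # scaled to the target RMSD
        frames += [coords + step, coords - step]
    return frames


# Toy input: 20 nodes on a helix (for example, one node per nucleotide).
helix = np.array([[5 * np.cos(0.6 * i), 5 * np.sin(0.6 * i), 1.5 * i] for i in range(20)])
frames = five_frame_ensemble(helix)
```

With two modes and a plus and minus displacement along each, the sketch yields five frames. Because both signs are always included, the set of frames does not depend on the arbitrary sign of each eigenvector.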

03 · Cavity detection and pocket ranking

Detect cavities across the ensemble with fpocket and rank them with our cross-frame geometric scoring function (persistence × binding-residue stability). RNA uses RNA-tuned parameters (consistent with the published fpocketR approach, Veenbaas et al. 2025) — fpocket’s own druggability score is protein-trained and unreliable on RNA, so we don’t use it there. On protein targets fpocket runs closer to native and its druggability score is legitimate metadata. The cross-frame ranker on the RNA benchmark lifts rank-1 recovery from 0/7 strict (single-frame detection) to 3/7 strict, and from 2/7 near to 6/7 near.
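
A persistence × binding-residue stability score can be sketched as below. The exact definitions are assumptions for illustration: persistence is taken as the fraction of ensemble frames in which a cluster is detected, and stability as the share of the cluster's residues that appear in every one of those frames. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class PocketCluster:
    name: str
    # One residue set per frame in which the cluster was detected.
    residues_per_frame: List[Set[int]]


def persistence(cluster: PocketCluster, n_frames: int) -> float:
    """Fraction of ensemble frames in which the cluster appears (assumed definition)."""
    return len(cluster.residues_per_frame) / n_frames


def residue_stability(cluster: PocketCluster) -> float:
    """Residues seen in every occurrence divided by residues seen in any (assumed definition)."""
    sets = cluster.residues_per_frame
    union = set().union(*sets)
    if not union:
        return 0.0
    core = set.intersection(*sets)
    return len(core) / len(union)


def rank_clusters(clusters: List[PocketCluster], n_frames: int = 5,
                  top_k: int = 3) -> List[Tuple[float, str]]:
    """Score each cluster and return the top_k as (score, name), best first."""
    scored = [(persistence(c, n_frames) * residue_stability(c), c.name) for c in clusters]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by name for determinism
    return scored[:top_k]


stable = PocketCluster("A", [{1, 2, 3, 4}] * 5)         # seen in all 5 frames, same residues
transient = PocketCluster("B", [{10, 11}, {10, 12}])    # seen in 2 frames, residues drift
ranked = rank_clusters([transient, stable])
```

In this toy case cluster A scores 1.0 and cluster B scores 0.4 × 1/3, so A is ranked first.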

04 · Pre-pilot MSA tractability screening

Before running the full RNA pipeline we estimate whether your target’s evolutionary profile carries the diversity needed for MSA-driven prediction to help. Empirical screen — at least one homolog at <77% identity, or a non-trivial fraction in the 70-80% identity band. RNA-specific; calibrated on the RNA benchmark.
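
The screening rule can be written as a short predicate. This is a sketch under stated assumptions: identities are given as percentages against the query, and "non-trivial fraction" is set to 10% purely as a placeholder, since the calibrated value is not stated here. The function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def msa_likely_to_help(identities_pct: Sequence[float],
                       low_identity: float = 77.0,
                       band: Tuple[float, float] = (70.0, 80.0),
                       min_band_fraction: float = 0.10) -> bool:
    """Return True if the homolog set looks diverse enough for MSA mode.

    Passes when at least one homolog is below low_identity, or when the share
    of homologs inside the identity band reaches min_band_fraction
    (a placeholder value, not the calibrated one).
    """
    if not identities_pct:
        return False
    has_diverse_homolog = any(x < low_identity for x in identities_pct)
    in_band = sum(band[0] <= x <= band[1] for x in identities_pct)
    return has_diverse_homolog or in_band / len(identities_pct) >= min_band_fraction


near_identical = msa_likely_to_help([99.0, 98.0, 97.0])      # no diversity
one_diverse = msa_likely_to_help([99.0, 95.0, 76.0])         # one homolog below 77%
band_heavy = msa_likely_to_help([79.0, 79.0] + [95.0] * 8)   # 20% of homologs in the band
```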

05 · Integrated evidence layer

Each ranked pocket is cross-referenced against real external evidence, not just geometry: PLIP protein-ligand interaction fingerprints on representative co-crystal structures (RNA support required setting PLIP’s DNARECEPTOR flag, which it does not enable by default); Foldseek structural neighbor search (protein only — Foldseek has no RNA support, honestly reported as not applicable rather than skipped silently); ClinVar + UniProt functional annotation overlap per pocket (protein); and six rules-based confidence flags with no learned calibration.
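
As an illustration of what a rules-based flag with no learned calibration looks like, here is a sketch of the pLDDT rule stated in the page introduction (pockets in regions with pLDDT < 0.7 are flagged as lower-confidence). How per-residue pLDDT is aggregated over a pocket is an assumption here (a simple mean), and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable


def low_confidence_flag(pocket_residues: Iterable[int],
                        plddt: Dict[int, float],
                        threshold: float = 0.7) -> bool:
    """Flag a pocket whose mean per-residue pLDDT (0-1 scale) is below threshold.

    Assumption: the pocket-level value is the mean over its residues.
    """
    scores = [plddt[r] for r in pocket_residues if r in plddt]
    if not scores:
        return True  # no confidence data: flag rather than rank silently
    return mean(scores) < threshold


plddt_by_residue = {1: 0.92, 2: 0.88, 3: 0.55, 4: 0.60, 5: 0.58}
confident_pocket = low_confidence_flag([1, 2], plddt_by_residue)    # mean 0.90
uncertain_pocket = low_confidence_flag([3, 4, 5], plddt_by_residue)  # mean below 0.7
```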

06 · Customer-facing PDF reports + downloadable bundle

Each run produces a downloadable bundle: ensemble PDB, JSON pocket data, residue lists, residues.csv, and a branded PDF report with full per-cluster metadata. Designed to drop into existing medicinal-chemistry workflows — we hand you the geometry; you bring the chemistry.

// How it works

From sequence to ranked shortlist

A single deterministic pipeline. Input a sequence (or a PDB upload); pick up a top-3 ranked shortlist of candidate pockets with full geometric metadata.

Predict and ensemble

AI structure prediction generates the 3D tertiary structure from sequence. A five-frame conformational ensemble is sampled around the prediction. MSA-driven prediction is available where the pre-pilot screen indicates it would help.

Detect and cluster

Cavities are detected on each frame and clustered across the ensemble at 4 Å. Persistent cavities — those that survive the conformational sampling — are kept; transient or single-frame artefacts are filtered out.
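
A minimal sketch of the detect-and-cluster step, assuming each cavity is represented by its centroid, that clustering is single-linkage at the 4 Å cutoff, and that "persistent" means present in at least two frames. All three are illustrative assumptions, and the function names are hypothetical.

```python
from math import dist
from typing import List, Tuple

# (frame index, pocket centroid in Å)
Pocket = Tuple[int, Tuple[float, float, float]]


def cluster_pockets(pockets: List[Pocket], cutoff: float = 4.0) -> List[List[Pocket]]:
    """Single-linkage clustering of pocket centroids with a union-find structure."""
    parent = list(range(len(pockets)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(pockets)):
        for j in range(i + 1, len(pockets)):
            if dist(pockets[i][1], pockets[j][1]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i, pocket in enumerate(pockets):
        groups.setdefault(find(i), []).append(pocket)
    return list(groups.values())


def persistent_clusters(pockets: List[Pocket], min_frames: int = 2,
                        cutoff: float = 4.0) -> List[List[Pocket]]:
    """Keep only clusters that are seen in at least min_frames distinct frames."""
    kept = []
    for members in cluster_pockets(pockets, cutoff):
        frames_seen = {frame for frame, _ in members}
        if len(frames_seen) >= min_frames:
            kept.append(members)
    return kept


detected = [
    (0, (0.0, 0.0, 0.0)),
    (1, (1.0, 0.0, 0.0)),
    (2, (2.0, 0.0, 0.0)),
    (3, (30.0, 0.0, 0.0)),  # single-frame artefact far from the others
]
kept = persistent_clusters(detected)
```

Here the three nearby centroids merge into one cluster spanning three frames, and the isolated single-frame cavity is filtered out.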

Rank and return

Persistent cavities are ranked by our RNA-applicable scoring function. You receive a top-3 shortlist with residue lists, geometric metadata, the ensemble PDB and a branded PDF report.

// Why work with RNAfold

Honest scope, transparent methodology

The differentiators that matter once the science is right.

[scope]

Explicit scope, before you rely on it

Cleft-shaped binding pockets are in scope. Groove binders, surface interfaces, H-type pseudoknots and G-quadruplexes are flagged out-of-scope upfront by the tractability screen. We tell you whether the pipeline should help on your specific target before you rely on it, not after the fact.

See scope & screen →

[findings]

The finding that qualifies our own assumption

In a 78-pair apo/holo survey, binding-site RMSD exceeded global RMSD in 75% of pairs: the binding site tends to be the part of the molecule that moves most on ligand binding. That limits how much any apo-structure pocket prediction, including ours, can be trusted. Written up as a technical finding, not marketing.

Read the findings →

[docs]

Transparent methodology

Full pipeline, third-party attribution and licences, validation methodology — all on /methodology. Every run gets the same disclosure in the PDF report. We don’t hide what we integrate; we name it where it belongs.

Read methodology →

[bench]

Reproducible benchmark

Seven cleft-binder targets, locked methodology, deterministic re-runs. The numbers below are what the pipeline actually produces, including the one target (3DIL) that we recover at neither threshold. No survivorship; no cherry-picking.

See benchmark →

// v0.2.0 benchmark

What the pipeline actually recovers

Seven cleft-binder targets with deposited co-crystal structures: six riboswitches plus one group I intron. As-shipped configuration: single-sequence prediction by default; MSA mode where the pre-pilot screen indicates. Numbers are exact, deterministic, and reproducible.

Target · Family · Length · Arm · pLDDT · RMSD (Å) · Rank-1 result · Top-cluster overlap
2GDI · TPP RF00059 · 78 nt · single-seq · 0.73 · 2.4 · Strict @1 · 71%
4GXY · B12 RF00174 · 161 nt · MSA · 0.75 · 4.3 · Near @1 · 35%
2GIS · SAM-I RF00162 · 94 nt · single-seq · 0.82 · 1.3 · Near @1 · 38%
5C45 · FMN RF00050 · 54 nt · single-seq · 0.75 · 10.2 · Near @1 · 40%
3DIL (out-of-scope fold class) · Group I intron · 174 nt · MSA · 0.73 · 11.6 · Neither · 0%
2HOJ · TPP (thi-box) · 83 nt · single-seq · 0.75 · 14.0 · Strict @1 · 53%
4LVV · THF RF01831 · 89 nt · MSA · 0.82 · 2.5 · Strict @1 · 50%

Strict @1 = at least one cluster in the rank-1 position with ≥ 50% binding-site residue overlap. Near @1 = ≥ 30%. RMSD = backbone C3′ RMSD vs experimental chain. Top-cluster overlap is shown after the result.

3DIL (group I intron) is a known out-of-scope fold class for v0.2 — we surface it in the table rather than hide it. Reproducer + per-cell records described on the methodology page.

// protein validation

What the pipeline actually recovers on protein targets

Seven real drug discovery targets, verified against ligand-contact ground truth pulled directly from PDB structures — not tested against memory or reputation. Rank-1 recovery on four targets (CDK2 canonical ATP site; KRAS G12C switch-II/sotorasib pocket; MDM2 p53-binding interface; carbonic anhydrase II catalytic zinc site); top-3 recovery — but rank-2, not rank-1 — on two targets (FKBP12 rapamycin-binding pocket; ADRB2 orthosteric site). The correct pocket is present in every report; the ranker is a shortlisting aid, not a definitive top-1 predictor.

Report the top-3, never “the top.” That framing isn’t hedging — it’s the direct, load-bearing implication of the FKBP12 and ADRB2 results in the table below, where the correct pocket is real and present, just not ranked first.

Target · Class · Length · Structure provider · Top-1 rank · Ground-truth overlap
CDK2 · Kinase (ATP site) · 298 aa · AlphaFold DB · Rank 1 · 22/22 (100%)
KRAS G12C · GTPase (switch-II pocket) · 189 aa · AlphaFold DB · Rank 1 · 6/21 (29%)
MDM2 · PPI (p53-binding interface) · 491 aa · AlphaFold DB · Rank 1 · 8/22 (36%)
Carbonic anhydrase II · Metalloenzyme (catalytic zinc site) · 260 aa · AlphaFold DB · Rank 1 · 8/11 (73%)
FKBP12 · Immunophilin (rapamycin pocket) · 108 aa · AlphaFold DB · Rank 2 · 2/14 (14%) · Note: correct pocket at rank 2, not rank 1
ADRB2 · GPCR (orthosteric site) · 413 aa · AlphaFold DB · Rank 2 · 20/20 (100%) · Note: correct pocket at rank 2, not rank 1
TEM-1 β-lactamase · Enzyme (allosteric, ensemble-only) · 286 aa · AlphaFold DB · Rank 1 · qualitative · Note: lands on a real, independently-published allosteric site (Ambler 99–114), not the specific Bowman-lab cryptic pocket this target was originally chosen to test against

Ground-truth overlap = pipeline pocket residues that match real ligand-contact residues (distance cutoff of 4.5–6.0 Å) from a deposited PDB co-crystal structure, computed directly against the structure file, with a numbering crosswalk applied where the target’s literature convention (e.g. Ambler, Kabat) differs from pipeline-sequential numbering.
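
The overlap calculation can be sketched in a few lines. This is an illustration, not the production reproducer: atoms are passed in as plain tuples rather than parsed from a PDB file, the function names are hypothetical, the default cutoff of 4.5 Å is one end of the range quoted above, and the denominator is assumed to be the number of ligand-contact residues.

```python
from math import dist
from typing import Dict, Iterable, Optional, Set, Tuple

Xyz = Tuple[float, float, float]


def contact_residues(receptor_atoms: Iterable[Tuple[int, Xyz]],
                     ligand_atoms: Iterable[Xyz],
                     cutoff: float = 4.5) -> Set[int]:
    """Residues with at least one atom within cutoff Å of any ligand atom."""
    ligand = list(ligand_atoms)
    return {res for res, xyz in receptor_atoms
            if any(dist(xyz, lig) <= cutoff for lig in ligand)}


def ground_truth_overlap(pocket_residues: Iterable[int],
                         contacts: Set[int],
                         crosswalk: Optional[Dict[int, int]] = None) -> Tuple[int, int]:
    """Return (matched, total) after mapping literature numbering to pipeline numbering.

    crosswalk maps a literature residue number to the pipeline-sequential one;
    residues missing from the map are assumed to share numbering.
    """
    crosswalk = crosswalk or {}
    mapped = {crosswalk.get(r, r) for r in contacts}
    matched = len(set(pocket_residues) & mapped)
    return matched, len(mapped)


receptor = [(10, (0.0, 0.0, 0.0)), (11, (3.0, 0.0, 0.0)), (12, (20.0, 0.0, 0.0))]
ligand = [(1.0, 0.0, 0.0)]
contacts = contact_residues(receptor, ligand)            # residues 10 and 11
same_numbering = ground_truth_overlap({10, 50}, contacts)
renumbered = ground_truth_overlap({110, 111}, contacts, crosswalk={10: 110, 11: 111})
```

A result such as `(22, 22)` would correspond to the "22/22 (100%)" style used in the table.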

All seven targets use AlphaFold DB as the structure provider — the ESMFold fallback exists for targets without a precomputed AlphaFold DB entry but has not yet been exercised on a real target. Structure providers, pocket detectors, and confidence flags are stated per-target in every report, never assumed.

Six rules-based confidence flags (no learned calibration) and per-pocket evidence — PLIP interaction fingerprints, Foldseek structural neighbors, ClinVar + UniProt functional overlap — run on every target and are shown in full in the per-target reports on /demos. The RNA methodology write-up on /methodology predates the protein pipeline and does not yet cover it.

// Worked examples

See the pipeline output

Three live worked examples spanning the v0.2 outcome classes — strict@1 (2GDI, single-seq), strict@1 via the opt-in MSA path (4LVV) and near@1 with global RMSD honestly reported (5C45).

78 nt · single-seq · live

TPP riboswitch (2GDI)

Live worked example. The pipeline recovers the TPP binding-site cluster at rank 1 with 71% binding-site residue overlap. Top-3 with full per-cluster metadata + interactive 3D viewer.

live demo · rank-1 strict · overlap 71%

Open worked example

89 nt · MSA · live

THF riboswitch (4LVV)

Live worked example. The pre-pilot screen flags 4LVV’s diverse-tail homologs; MSA mode lifts rank-1 recovery from neither (19% overlap, single-seq) to strict (50% overlap, MSA).

msa opt-in · rank-1 strict · overlap 50%

Open worked example

54 nt · single-seq · live

FMN riboswitch (5C45)

Live worked example. Smallest target in the benchmark (54 nt). Backbone RMSD is 10.2 Å, but the rank-1 cluster still picks up 40% of the FMN binding-site residues: near@1, not strict. The case for reporting both global and local quality metrics.

near @1 · overlap 40% · small RNA

Open worked example