Zero-shot Protein Interaction Retrieval
P2PSigLip
Interaction Search Platform
Dual-encoder retrieval for screening, dataset analytics, and all-vs-all exports.
Contact Us
Zhang Zhao Laboratory · China Agricultural University
Address: Room 332, Horticulture Building, China Agricultural University, Beijing, China
Prof. Zhao Zhang
Email: [email protected]
Site maintainer: JiCheng Tang (M.Sc. student)
Email: [email protected]
Model code: GitHub
FAQs
Quick answers to common questions about screening, dataset browsing, exports, and limits.
What is P2PSigLip optimized for, and what are its limitations?
P2PSigLip is optimized for fast, scalable interaction retrieval in proteome-scale settings.
Like any model, predictions can be affected by sequence redundancy, incomplete annotations, and domain-specific biases.
We recommend using results as ranked hypotheses and validating with orthogonal evidence or experiments.
How should I interpret the score (0–1) and select a threshold?
The score is a confidence-like model output: higher values generally indicate a stronger predicted interaction.
Thresholds are not universal; choose based on your tolerance for false positives.
A practical workflow is to start at 0.60, then adjust after inspecting the score curve and score distribution
and weighing how many candidates you can validate downstream.
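As a minimal sketch of that workflow, assuming you have exported scores as a plain list of floats and know how many candidates you can validate, the helper below starts at 0.60 and raises the cutoff only when too many candidates pass. The function name, `scores`, and `capacity` are hypothetical and not part of the site's interface.

```python
def pick_threshold(scores, capacity, start=0.60):
    """Return a cutoff >= start that keeps about `capacity` candidates.

    Hypothetical helper: `scores` are model outputs in [0, 1] and
    `capacity` is how many candidates you can validate downstream.
    Tied scores at the cutoff may let slightly more than `capacity` pass.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    # Candidates that clear the starting cutoff, best first.
    passing = sorted((s for s in scores if s >= start), reverse=True)
    if len(passing) <= capacity:
        return start  # the starting cutoff already fits the budget
    # Otherwise use the score of the last candidate that fits.
    return passing[capacity - 1]


example_scores = [0.95, 0.91, 0.83, 0.72, 0.64, 0.41]
print(pick_threshold(example_scores, capacity=3))
print(pick_threshold(example_scores, capacity=10))
```

This only encodes the validation-capacity part of the decision; the score curve and distribution still deserve a visual check before committing to a cutoff.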
How do I design a screening workflow for wet-lab validation?
Prioritize candidates that are consistently high-scoring across related targets, supported by known biology (co-expression, localization),
and robust to reasonable threshold changes. Consider filtering by domain knowledge and using the network view to avoid isolated artifacts.
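One way to approximate "consistently high-scoring" and "robust to threshold changes" is sketched below. It assumes scores have been collected into a nested dictionary (target ID to candidate ID to score); that layout, the function name, and the `margin` parameter are illustrative assumptions rather than anything the site exports in this form.

```python
def consistent_candidates(scores_by_target, threshold=0.60, margin=0.05):
    """Keep candidates whose lowest score across all targets clears
    threshold + margin, so they survive a small upward threshold shift.

    `scores_by_target` maps target ID -> {candidate ID: score}; this
    structure is a hypothetical convenience, not a site export format.
    """
    if not scores_by_target:
        return []
    per_target = list(scores_by_target.values())
    # Only candidates scored against every related target are comparable.
    shared = set(per_target[0]).intersection(*per_target[1:])
    floor = threshold + margin
    kept = []
    for cand in shared:
        worst = min(scores[cand] for scores in per_target)
        if worst >= floor:
            kept.append((cand, worst))
    # Strongest worst-case score first.
    return sorted(kept, key=lambda item: item[1], reverse=True)


related = {
    "T1": {"A": 0.92, "B": 0.81, "C": 0.62},
    "T2": {"A": 0.88, "B": 0.58, "C": 0.70},
}
print(consistent_candidates(related))
```

Biological support such as co-expression or localization is not captured here and would be applied as a separate filter.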
What’s the difference between “Target vs Library” and “Set A vs Set B”?
Target vs Library ranks a library against one or more query proteins.
Set A vs Set B computes all pairwise scores between two sets for batch interaction discovery.
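The difference in output shape can be sketched as follows. The `score` argument stands in for the model and is a hypothetical callable taking two protein IDs; the function names are illustrative and do not describe how the platform is implemented.

```python
from itertools import product


def target_vs_library(targets, library, score):
    """For each target, rank every library protein by descending score."""
    return {
        t: sorted(((p, score(t, p)) for p in library),
                  key=lambda item: item[1], reverse=True)
        for t in targets
    }


def set_a_vs_set_b(set_a, set_b, score):
    """Score every (a, b) pair; the result has |A| x |B| entries."""
    return {(a, b): score(a, b) for a, b in product(set_a, set_b)}


# Toy lookup standing in for the model (made-up numbers).
toy = {("T1", "L1"): 0.3, ("T1", "L2"): 0.9, ("T1", "L3"): 0.6}


def toy_score(a, b):
    return toy.get((a, b), 0.0)


print(target_vs_library(["T1"], ["L1", "L2", "L3"], toy_score))
print(len(set_a_vs_set_b(["T1", "T2"], ["L1", "L2", "L3"], toy_score)))
```

The first mode returns one ranked list per target, while the second returns a flat table whose size grows with the product of the two set sizes.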
How do I build and export an interaction network (and when should I hide self-loops)?
Use Dataset Browser → Interaction network. Always select the correct dataset (species),
paste your protein IDs, and set a threshold. Hiding self-loops typically improves readability when you focus on inter-protein edges.
Export PNG for slides and manuscripts.
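If you prefer to rebuild the same edge list offline from exported scores, the filtering logic might look like the sketch below. It assumes each record is a `(protein_a, protein_b, score)` tuple; that layout and the function name are assumptions, not the site's export schema.

```python
def build_edges(scored_pairs, threshold, hide_self_loops=True):
    """Keep pairs scoring at or above `threshold`, optionally dropping
    self-loops (edges where both endpoints are the same protein).

    `scored_pairs` is an iterable of (protein_a, protein_b, score) tuples,
    a hypothetical layout chosen for illustration.
    """
    edges = []
    for a, b, s in scored_pairs:
        if s < threshold:
            continue  # below the chosen cutoff
        if hide_self_loops and a == b:
            continue  # self-loop, hidden for readability
        edges.append((a, b, s))
    return edges


pairs = [("P1", "P2", 0.90), ("P1", "P1", 0.95), ("P2", "P3", 0.40)]
print(build_edges(pairs, threshold=0.60))
print(build_edges(pairs, threshold=0.60, hide_self_loops=False))
```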
How are all-vs-all exports organized (Top200 vs Top1000)?
Top200 is a compact per-target shortlist for fast browsing and downstream filtering.
Top1000 is provided as a single ZIP containing multiple parquet parts for large-scale analysis.
Use parquet to keep types stable and speed up loading in Python/R.
Data privacy: do you store my uploaded sequences?
Uploaded files are processed to run the job and stored only as needed for task execution and short-term history.
If you have sensitive data or need a dedicated run, contact us for a private workflow and data handling options.
Why did my task history disappear, and how can I keep it stable?
The site uses guest-mode identifiers in your browser. Clearing cookies will remove access to your task history.
For stability, avoid clearing cookies and export results you care about (CSV/ZIP/PNG).
Limits
Limits protect shared compute resources and keep the web experience responsive. For larger requests, use the all-vs-all service.
Upload
Max upload per file: -
Target vs Library
Library sequences: - · Target IDs: -
Set A vs Set B
|A|: - · |B|: - · |A|×|B|: -
Task history
Stored per browser: latest - tasks. Guest mode: clearing cookies removes history.
Need a larger job?
Use All-vs-all Service to get an estimate, or email
- for scheduling and pricing.