

August 23, 2007:
Version 2.3.1 of the IE plug-in fixes a problem where PMIDs were not loading. The problem was caused by changes to PubMed's URLs.

July 17, 2007:
All interactions from IntAct are now integrated into CBioC.

June 23, 2006:
See "Annotate While You Read" in Science Magazine's NetWatch.

June 1, 2006:
New feature: display facts from multiple PMIDs by entering a comma-separated list in the search box.

In Brief

CBioC (Beta) combines automated extraction with community collaboration for data curation.

Once installed, loads when you visit PubMed.

Extracts interactions from PubMed abstracts.

Lets you vote on and modify extracted data.

Also shows data from BIND, DIP, MINT, GRID, IntAct.



CBioC: Collaborative Bio Curation

The volume of biomedical literature is huge and grows every day. From 1994 to 2004, close to 3 million biomedical articles were published by US and European researchers alone. That is over 800 new articles per day, on top of the approximately 15 million abstracts already in PubMed, and a myriad of individual new facts to survey for information relevant to a particular research question.
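The per-day figure is a simple average. Treating 1994 to 2004 as a ten-year span and rounding the article count to 3 million (both are assumptions; counting both end years as an eleven-year span gives roughly 750 per day instead), it works out as:

```latex
\frac{3\,000\,000 \text{ articles}}{10 \text{ years} \times 365 \text{ days/year}}
  = \frac{3\,000\,000}{3\,650} \approx 822 \text{ articles/day}
```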

Two approaches are currently used to extract and combine facts from biomedical publications. The first, hiring human curators, is expensive and therefore does not scale up; it is also prone to bias. The second, automated information extraction, achieves a recall and precision of only around 60%.
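To make the 60% figure concrete: precision is the fraction of extracted facts that are correct, and recall is the fraction of true facts that get extracted. The sketch below uses hypothetical counts (not data from CBioC) for a collection containing 100 real interactions, where the extractor reports 100 interactions and 60 of them are correct. At that level, 40 reported facts are wrong and 40 real ones are missed, which is the gap community curation is meant to close.

```python
def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) for a set of extracted facts."""
    # Precision: of everything the extractor reported, how much is correct?
    precision = true_positives / (true_positives + false_positives)
    # Recall: of everything that is actually in the text, how much was found?
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical example: 60 correct reports, 40 wrong reports, 40 missed facts.
p, r = precision_recall(true_positives=60, false_positives=40, false_negatives=40)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=60% recall=60%
```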

We present here a new approach to the problem based on mass collaboration, in which the community of researchers who write and read biomedical texts can contribute to the curation process and set the pace at which it is done.

Overview of our Approach

Automated text extraction is used as a starting point to bootstrap the database, but it is then up to biologists to improve upon the extracted data, "ironing out" inconsistencies through subsequent edits on a massive scale.
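As an illustration of how community votes could be layered on top of automatically extracted facts, here is a minimal sketch. The `ExtractedFact` model, the one-vote-per-user rule, and ranking by net score are all hypothetical choices made for this example; they are not a description of CBioC's actual data model or scoring.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedFact:
    """One automatically extracted fact (hypothetical model, for illustration)."""
    subject: str
    relation: str
    obj: str
    pmid: str
    votes: dict[str, int] = field(default_factory=dict)  # user id -> +1 or -1

    def vote(self, user: str, value: int) -> None:
        """Record a vote; a later vote by the same user replaces the earlier one."""
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        self.votes[user] = value

    @property
    def score(self) -> int:
        """Net community score: upvotes minus downvotes."""
        return sum(self.votes.values())


def ranked(facts: list[ExtractedFact]) -> list[ExtractedFact]:
    """Order facts so the best-supported ones come first."""
    return sorted(facts, key=lambda f: f.score, reverse=True)


# Usage with placeholder protein names and PMIDs.
a = ExtractedFact("ProteinA", "binds", "ProteinB", pmid="0000001")
b = ExtractedFact("ProteinC", "binds", "ProteinD", pmid="0000001")
a.vote("alice", 1)
a.vote("bob", 1)
b.vote("alice", -1)
print([f.subject for f in ranked([a, b])])  # ['ProteinA', 'ProteinC']
```

Keeping one vote per user (a dictionary keyed by user) means a curator can change their mind without inflating the score.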

CBioC runs as a web browser extension, so it can be used unobtrusively during the regular course of research in PubMed. It can also be accessed directly, without installing a plug-in.

Statistics for CBioC

Abstracts
Total processed: 1,804,300
With interactions: 53%

Interactions extracted from abstracts
Total protein/protein: 1,274,799
Total gene/disease: 376,425
Total gene/bio-process: 287,414

Integrated data
BIND interactions: 114,684
GRID interactions: 58,366
MINT interactions: 51,721
DIP interactions: 52,068
IntAct interactions: 93,148
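The integrated-database figures and the share of abstracts with interactions can be turned into absolute totals with a short script. Note that the database sum counts records: the same interaction may be reported by more than one database, so it is an upper bound on distinct interactions rather than a deduplicated count. The abstract count is approximate because 53% is a rounded percentage.

```python
# Interaction counts integrated from external databases, as listed above.
integrated = {
    "BIND": 114_684,
    "GRID": 58_366,
    "MINT": 51_721,
    "DIP": 52_068,
    "IntAct": 93_148,
}
total_records = sum(integrated.values())
print(f"{total_records:,}")  # 369,987

# Abstracts with at least one extracted interaction (53% is a rounded figure).
processed = 1_804_300
with_interactions = round(processed * 0.53)
print(f"{with_interactions:,}")  # 956,279 (approximate)
```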