One of the major challenges of the COVID-19 pandemic is managing the rapidly expanding body of scientific literature published in journals, by health-related organizations, and on preprint servers. A new information retrieval (IR) research challenge aims to identify the best methods for retrieving scientific literature during this and future rapidly evolving pandemics. Researchers from the Department of Medical Informatics & Clinical Epidemiology (DMICE) are among the organizers of the challenge, and OHSU medical students overseen by DMICE are annotating the output of participating systems for relevance to the challenge topics.
The challenge, called TREC-COVID, aims to develop and evaluate methods that optimize search engines for the rapidly growing number of scientific papers about COVID-19 and related topics. It is organized by a group of IR researchers from the Allen Institute for Artificial Intelligence (AI2), the National Institute of Standards and Technology (NIST), the National Library of Medicine (NLM), Oregon Health and Science University (OHSU), and the University of Texas Health Science Center at Houston (UTHealth). A press release and an official Web site for the project have been posted, and DMICE Chair William Hersh, MD, also maintains a page about the project.
TREC-COVID applies well-known IR evaluation methods from the NIST Text Retrieval Conference (TREC), an annual challenge that evaluates retrieval methods using data from news sources, Web sites, social media, and biomedical publications. An IR challenge evaluation typically has three components: a collection of documents or other content, a set of topics based on real-world information needs, and relevance assessments that determine which documents are relevant to each topic. Participating research teams submit runs, which are the ranked results their own search systems return for each topic over the collection, and the relevance judgments are then used to calculate metrics derived from recall and precision.
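To make the idea of scoring a run against relevance judgments more concrete, here is a minimal Python sketch of how one topic from one run might be scored. It assumes binary relevance (a document is either relevant or not), a ranked list with no duplicate documents, and hypothetical document IDs and function names. Official TREC evaluations compute these and many other measures with the trec_eval tool rather than with code like this.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that were judged relevant."""
    if k <= 0:
        return 0.0
    top = ranked[:k]
    # Divide by k (not len(top)), so a short run is penalized, as in TREC-style P@k.
    return sum(1 for doc in top if doc in relevant) / k


def recall(ranked, relevant):
    """Fraction of all judged-relevant documents that the run retrieved."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


# Hypothetical relevance judgments (topic -> relevant doc IDs) and one team's run.
qrels = {"1": {"doc3", "doc7", "doc9"}}
run = {"1": ["doc3", "doc1", "doc7", "doc4", "doc5"]}

for topic, ranked in run.items():
    rel = qrels[topic]
    print(f"topic {topic}: P@5={precision_at_k(ranked, rel, 5):.3f} "
          f"recall={recall(ranked, rel):.3f} AP={average_precision(ranked, rel):.3f}")
```

For this toy run the script prints P@5=0.400, recall=0.667, and AP=0.556: two of the five retrieved documents are relevant, and one of the three relevant documents (doc9) was never retrieved. In a real evaluation, per-topic scores like these would be averaged across all topics to compare runs.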
The document collection for TREC-COVID comes from AI2, which created the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles about COVID-19 and other coronaviruses. CORD-19 is updated weekly, although a fixed version will be used for each round of TREC-COVID. It includes not only articles published in journals but also those posted on preprint servers such as bioRxiv and medRxiv. Both a preprint about the dataset and an article describing it mention OHSU.