OHSU Informatics Researchers Team with Mayo Clinic and UTHealth on New Grant to Advance Patient Cohort Discovery

Many clinical research studies fail to reach their patient enrollment goals, leaving them without enough participants to meet study objectives or achieve adequate statistical power. One way to increase enrollment is to identify individuals who might be candidates for such studies by processing the data in their electronic health record (EHR). This research problem has driven the work of two Department of Medical Informatics & Clinical Epidemiology (DMICE) faculty – Steven Bedrick, PhD, Associate Professor, and William Hersh, MD, Professor – who have been collaborating with colleagues from Mayo Clinic (Hongfang Liu, PhD) and the University of Texas Health Science Center at Houston (UTHealth; Kirk Roberts, PhD).

These researchers were recently awarded a 5-year, $3 million grant from the National Library of Medicine (NLM) to develop and evaluate new methods to identify patient cohorts for clinical research studies based on patient data in the EHR. The new grant builds on their previous work and adds a new dimension: making their methods more generalizable across institutions by requiring that the data conform to a common data model. While actual patient data will not leave the premises of the participating institutions, each will maintain its own data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model so that algorithms can be developed and trained in a more generalizable manner.

Once the foundational systems and OMOP-formatted data are in place, each site will run a common set of information queries and evaluate the output of its systems internally. Different methods, including those applying machine learning, will be applied across the sites and compared for their efficacy. Other institutions will be able to implement, train, and use these models locally.

This work started a decade ago, when Dr. Hersh and others used a small set of de-identified records in the Text Retrieval Conference (TREC) challenge evaluation sponsored by the National Institute of Standards and Technology (NIST).(1) This led to the initial NLM grant, which increased the size of the collection and resulted in publications from Mayo Clinic(2) and Oregon Health & Science University.(3)

Drs. Roberts and Hersh continue to lead other TREC challenge evaluations, such as this year’s TREC 2021 Clinical Trials Track, which uses patient data for a clinical trials search task. In this effort, the search topics are (synthetic) patient descriptions and the retrieval corpus is a large set of clinical trial descriptions from ClinicalTrials.gov.


1. Voorhees EM, Hersh W. Overview of the TREC 2012 Medical Records Track. In: The Twenty-First Text REtrieval Conference (TREC 2012) Proceedings. 2012.
2. Wang Y, Wen A, Liu S, Hersh W, Bedrick S, Liu H. Test collections for electronic health record-based clinical information retrieval. JAMIA Open. 2019 Oct 1;2(3):360–8.
3. Chamberlin SR, Bedrick SD, Cohen AM, Wang Y, Wen A, Liu S, et al. Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task. JAMIA Open. 2020 Oct;3(3):395–404.