The 2020 track focuses on an important use case in precision medicine for clinical decision support: providing useful precision medicine-related evidence to clinicians treating cancer patients. The 2020 track builds on the prior tracks (2017, 2018, 2019), with a particular focus on identifying high-quality evidence for a specific cancer treatment.
As with the 2017-2019 tracks, we will be using synthetic cases created with the help of precision oncologists at the University of Texas MD Anderson Cancer Center. Each case will describe the patient's disease (type of cancer), the relevant genetic variants (which genes are mutated), and the proposed treatment. The cases are semi-structured and require minimal natural language processing.
Participants in the track will be challenged to retrieve biomedical articles, in the form of article abstracts (largely from MEDLINE/PubMed), specifically focusing on articles that provide strong evidence for or against the treatment in the specific population (unlike in prior years, there will be no clinical trials task).
New this year is the addition of treatments to the topics, while the demographics field has been removed. Previously, the goal was to retrieve treatments themselves (at least in the literature articles task). Now, the focus is instead on identifying critical evidence for or against the given treatment in the specific population (type of cancer and genetic mutation(s)). This means that strong evidence about the treatment, whether positive or negative, should be ranked above weaker evidence. There are often many treatments for a particular type of cancer and set of genetic mutations, so a useful clinical decision support tool will help oncologists narrow the treatment decision to the option most likely to help the patient. This is why strong negative evidence is important: it helps eliminate a treatment so that a more efficacious one can be chosen instead. The idea is to provide oncologists with the evidence that best helps them decide between competing alternatives.
From a historical perspective, this is similar to classic "PICO" questions: the problem/population (P) is the cancer and its mutations; the intervention (I) is the treatment; the comparison (C) is an alternative treatment, which would come from a different search query; and the outcome (O) is generally survival for cancer patients, though other outcomes (e.g., quality of life) are possible. We will provide guidelines on what constitutes evidence quality in the coming weeks.
Date | Note |
---|---|
May 2020 | Document collection available for download. |
May 2020 | Applications for participation in TREC 2020 due. |
June 2020 | Topics available for download. |
August 27, 2020 | Submission deadline. |
October/November 2020 | Relevance judgments and individual evaluation scores released. |
November 18–20, 2020 | TREC 2020 conference at NIST in Gaithersburg, MD, USA (maybe). |
 | Patient1 | Patient2 |
---|---|---|
Disease: | melanoma | melanoma |
Variant: | BRAF (V600E) | BRAF (V600E) |
Treatment: | Dabrafenib | Cobimetinib |
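To make the topic structure concrete, here is a minimal sketch (in Python) of loading topics into a simple record type. It is illustrative only: it assumes the topics are distributed as an XML file with one element per topic carrying disease, variant, and treatment fields; the actual file format and tag names are not specified here, so adjust the loader to match the released topics file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Topic:
    """One synthetic patient case: disease, genetic variant, proposed treatment."""
    number: int
    disease: str      # e.g., "melanoma"
    variant: str      # e.g., "BRAF (V600E)"
    treatment: str    # e.g., "Dabrafenib"

def load_topics(path: str) -> list[Topic]:
    # Assumes a layout like <topics><topic number="1"><disease>...</disease>
    # <variant>...</variant><treatment>...</treatment></topic></topics>;
    # the tag names here are placeholders, not the official schema.
    root = ET.parse(path).getroot()
    return [
        Topic(
            number=int(t.attrib["number"]),
            disease=(t.findtext("disease") or "").strip(),
            variant=(t.findtext("variant") or "").strip(),
            treatment=(t.findtext("treatment") or "").strip(),
        )
        for t in root.findall("topic")
    ]
```

Because the cases are semi-structured, a small loader like this is typically all the processing the topics themselves require.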
Additionally, the 2017-2019 topics might be useful.
The evaluation will follow standard TREC evaluation procedures for ad hoc retrieval tasks. Participants may submit a maximum of five automatic or manual runs, each consisting of a ranked list of up to one thousand PMIDs for MEDLINE abstracts. The highest-ranked results for each topic will be pooled and judged by physicians trained in medical informatics.
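For intuition about how the judgment pools are formed, the following is a minimal sketch of depth pooling across submitted runs; the pool depth of 100 is a placeholder, since the actual depth is chosen by the organizers based on assessment resources.

```python
def build_pools(runs: dict[str, dict[int, list[str]]], depth: int = 100) -> dict[int, set[str]]:
    """Union of the top-`depth` document IDs per topic across all submitted runs.

    `runs` maps run name -> topic number -> ranked list of document IDs.
    The default depth of 100 is a placeholder, not the official pool depth.
    """
    pools: dict[int, set[str]] = {}
    for ranked_lists in runs.values():
        for topic, doc_ids in ranked_lists.items():
            pools.setdefault(topic, set()).update(doc_ids[:depth])
    return pools
```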
Assessors will be instructed to judge abstracts according to each of the three topic dimensions (disease, gene, treatment). Each of these corresponds to 3-4 categories (e.g., a disease can be an "exact", "more general", "more specific", or "not disease" match). Please read the Relevance Guidelines for more details.
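As a rough illustration, a per-abstract judgment can be thought of as one category per dimension. The disease categories below are taken from the example above; the gene and treatment category names are defined in the Relevance Guidelines, so the comments here are only placeholders.

```python
from dataclasses import dataclass

# Disease categories from the example above; the gene and treatment dimensions
# have their own 3-4 categories, defined in the Relevance Guidelines.
DISEASE_CATEGORIES = ("exact", "more general", "more specific", "not disease")

@dataclass
class AbstractJudgment:
    """Per-dimension assessment of one retrieved abstract for one topic."""
    topic: int
    pmid: str
    disease: str    # one of DISEASE_CATEGORIES
    gene: str       # a gene-dimension category from the Relevance Guidelines
    treatment: str  # a treatment-dimension category from the Relevance Guidelines
```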
Scientific Abstracts: The goal of retrieving scientific abstracts is to identify evidence for the treatment in the population, where stronger evidence should be ranked over weaker evidence. In medicine, strong evidence can be either for or against the use of that treatment in the population.
As in past evaluations of medically-oriented TREC tracks, we are fortunate to have the assessment conducted by the Department of Medical Informatics of the Oregon Health and Science University (OHSU). We are extremely grateful for their participation.
The format for run submissions follows the standard trec_eval format. Each line of the submission file should follow the form:

    TOPIC_NO 0 ID RANK SCORE RUN_NAME
where TOPIC_NO is the topic number (1–30), 0 is a required but ignored constant, ID is the PMID of the retrieved document, RANK is the rank (1–1000) of the retrieved document, SCORE is a floating point value representing the similarity score of the document, and RUN_NAME is an identifier for the run. The RUN_NAME is limited to 12 alphanumeric characters (no punctuation).
The file is assumed to be sorted numerically by TOPIC_NO, and SCORE is assumed to be greater for documents that should be retrieved first. For example, the following would be a valid line of a run submission file:

    1 0 28348404 1 0.9999 my-run
The above line indicates that the run named "my-run" retrieves document 28348404 at rank 1, with a score of 0.9999, for topic number 1.
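As a convenience, the sketch below writes a run file in the format described above and checks the stated constraints. It is illustrative only (the function name and argument layout are ours), not an official validation tool.

```python
def write_run(results: dict[int, list[tuple[str, float]]], run_name: str, path: str) -> None:
    """Write a run file in the trec_eval format described above.

    `results` maps each topic number to a list of (PMID, score) pairs.
    """
    # The guidelines limit RUN_NAME to 12 alphanumeric characters; only the
    # length is checked here.
    if not run_name or len(run_name) > 12:
        raise ValueError("RUN_NAME must be 1-12 characters")
    with open(path, "w") as out:
        for topic in sorted(results):                # sorted numerically by TOPIC_NO
            if not 1 <= topic <= 30:
                raise ValueError(f"topic {topic} is outside 1-30")
            # Higher scores come first; keep at most 1000 documents per topic.
            ranked = sorted(results[topic], key=lambda pair: pair[1], reverse=True)[:1000]
            for rank, (pmid, score) in enumerate(ranked, start=1):
                out.write(f"{topic} 0 {pmid} {rank} {score} {run_name}\n")
```

For example, `write_run({1: [("28348404", 0.9999)]}, "myrun1", "run.txt")` produces a single line in the same form as the example above.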