2015 TREC Clinical Decision Support Track

2015 Clinical Decision Support Track

Similar to the 2014 track, the focus of the 2015 Clinical Decision Support Track will be the retrieval of biomedical articles relevant for answering generic clinical questions about medical records.

We will be using short case reports, such as those published in biomedical articles, as idealized representations of actual medical records. A case report typically describes a challenging medical case, and it is often organized as a well-formed narrative summarizing the portions of a patient's medical record that are pertinent to the case.

Participants of the track will be challenged with retrieving for a given case report full-text biomedical articles that answer questions related to several types of clinical information needs. Each topic will consist of a case report and one of three generic clinical question types, such as "What is the patient's diagnosis?" Retrieved articles will be judged relevant if they provide information of the specified type that is pertinent to the given case. The evaluation of submissions will follow standard TREC evaluation procedures.

Tentative Schedule

Date	Note
February, 2015	Document collection available for download.
March, 2015	Applications for participation in TREC 2015 due.
April, 2015	Topics available for download.
28 July, 2015	Task A submission deadline.
30 July, 2015	Task B submission deadline.
October, 2015	Relevance judgments and individual evaluation scores released.
November 17–20, 2015	TREC 2015 conference at NIST in Gaithersburg, MD, USA.

Task Description

Documents

The target document collection for the track is the Open Access Subset of PubMed Central (PMC). PMC is an online digital database of freely available full-text biomedical literature. Because documents are constantly being added to PMC, to ensure the consistency of the collection, for the 2014 task we obtained a snapshot of the open access subset on January 21, 2014, which contained a total of 733,138 articles. For continuity with last year's track, we will be using the same snapshot.

While out of date (over 18 months old by the submission deadline), this allows for better utilization of the 2014 topics and judgements. The full text of each article in the open access subset is represented as an NXML file (XML encoded using the NLM Journal Archiving and Interchange Tag Library), and images and other supplemental materials are also available.

Document Identifiers

Each article in the collection is identified by a unique number (PMCID) that will be used for run submissions. The PMCID is specified by the <article-id> element within each article's NXML file. Please note that although each article is represented by multiple identifiers (e.g., PubMed, PMC, Publisher, etc.), we are only concerned with PMCIDs for this task. The various identifier types are specified using the pub-id-type attribute of the <article-id> element. Valid values of pub-id-type that indicate a PMCID include pmc and pmcid.

For example, the PMCID of article 3148967 may be specified in the article's NXML file as follows.

<article-id pub-id-type="pmc">3148967</article-id>

To make processing the documents easier, we have also renamed each article NXML according to the article's PMCID. For example, the document for article 3148967 is named 3148967.nxml.

Obtaining the Collection

The document collection may be obtained in one of two ways. For participants who are only interested in indexing the text of the articles in the collection (most participants), we have prepared 4 bundles containing all 733,138 articles in the January 21, 2014 snapshot, which can be downloaded from the links below.

Each of the 4 files listed above is around 2–3 GB in size. The article NXMLs in each archive are split into multiple directories to allow for easy directory listings. Please note that the directory structure was created merely as a convenience and is not meant to convey any information about the articles.

Participants wishing to utilize additional media other than text, such as the images and videos included in the articles, can download the full document bundles directly from the PMC Open Access FTP Service. However, be aware that the size of the full collection is around 2 TB with the additional media and takes several days to completely download.

We have prepared a simple python script for participants wishing to obtain the full collection. The script downloads only the articles present in the January 21, 2014 snapshot and can be obtained from the links below.

The script should work with most recent versions of python and has been tested with versions 2.6, 2.7, and 3.3 on Linux, OS X, and Windows. Please let the organizers know if you encounter any trouble using it. Participants can use the above script to download the entire collection, including images and videos, by executing the following shell command.

$ python download.py -i file_list.txt.gz -o TREC-CDS

file_list.txt.gz is a compressed list of the article archives included in the January 21, 2014 snapshot and TREC-CDS is the local directory where the collection will be downloaded. For additional usage information, please enter the following.

$ python download.py --help

Interested participants are free to devise their own method for obtaining the full collection. However, please note that the articles listed in file_list.txt.gz constitute the definitive collection for the track. Because articles are added to the PMC Open Access Subset every day, you will still want to use file_list.txt.gz in order to restrict the downloaded files to those present in the January 21, 2014 snapshot.

Downloading the additional media associated with the full-text articles is entirely optional. None of the topics will require this information. However, we are providing this option for participants who may be interested in analyzing the medical images included in many of the articles as part of their retrieval strategies.

Alternatively, for those who do not wish to parse the NXML, a recent text file snapshot of the Open Access Subset has been made available from PMC. Note, since this snapshot was taken at a different time, their text file document collection likely has many more articles than the official PMC snapshot for the task. Additionally, it is likely missing a few documents from the official collection. Any article that is not in the official PMC snaphot will not be evaluated, so use with care. We only mention this alternate collection because last year several participants were concerned with the difficulties of parsing the PMC NXML. The organizers have never used this alternate collection, and thus cannot vouch for it in any way. Update: If you intend to use the text-only files, we have made a list of the valid PMCIDs (pmcid.list.txt), a mapping file between the latest text-only snapshot and the valid PMCIDs (pmcid.map.1.txt, pmcid.map.2.txt), as well as a map of the articles that appear to be missing (pmcid.missing.map.txt). Again, you use these at your own risk. We believe that given the latest snapshot and proper use of the above files, the vast majority of the valid articles will be available, yet we cannot guarantee this.

Duplicate PMCIDs

We have been made aware of the existence of duplicated documents in the collection. While the duplicates will likely not impact retrieval results, we are not going to consider them when conducting the relevance assessment. Participating groups have provided lists of the files that will not be judged, as they are duplicates of other files remaining in the collection. The lists of files can be obtained from the link below.

Topics

The topics for the track are medical case narratives created by expert topic developers that will serve as idealized representations of actual medical records. The case narratives describe information such as a patient's medical history, the patient's current symptoms, tests performed by a physician to diagnose the patient's condition, the patient's eventual diagnosis, and finally, the steps taken by a physician to treat the patient.

There are many clinically relevant questions that can be asked of a given case narrative. In order to simulate the actual information needs of physicians, the topics are annotated according to the three most common generic clinical question types (Ely et al., 2000) shown in the table below. Participants will be tasked with retrieving biomedical articles useful for answering generic questions of the specified type about each case report.

Type	Generic Clinical Question	Number of Topics
Diagnosis	What is the patient's diagnosis?	10
Test	What tests should the patient receive?	10
Treatment	How should the patient be treated?	10

For example, for a case report labeled "diagnosis" participants should retrieve PMC articles a physician would find useful for determining the diagnosis of the patient described in the report. Similarly, for a case report labeled "treatment," participants should retrieve articles that suggest to a physician the best treatment plan for the condition exhibited by the patient described in the report. Finally, for "test" case reports participants should retrieve articles that suggest relevant interventions that a physician might undertake in diagnosing the patient.

In addition to annotating the topics according to the type of clinical information required, we are also providing two versions of the case narratives. The topic "descriptions" contain a complete account of the patients' visits, including details such as their vital statistics, drug dosages, etc., whereas the topic "summaries" are simplified versions of the narratives that contain less irrelevant information. A topic's description and its summary are functionally equivalent: the set of relevant documents is identical for each version. However, we are providing the summary versions for participants who are not interested in or equipped for processing the detailed descriptions.

In order to make the results of the track more meaningful, we require that participants use only all topic descriptions or only all topic summaries for any given run submission. Participants are, of course, free to submit multiple runs so that they can experiment with the different representations. Participants will be required to indicate on the run submission form which version of the topics they used.

The table below shows examples of the kind of case-based topics we will be using for the track. The PMCIDs listed in the last column are relevant for the given cases because they can assist a physician in determining the patient's diagnosis or treatment.

No.	Type	Summary	Relevant Articles
1.	Diagnosis	Description: A 26-year-old obese woman with a history of bipolar disorder complains that her recent struggles with her weight and eating have caused her to feel depressed. She states that she has recently had difficulty sleeping and feels excessively anxious and agitated. She also states that she has had thoughts of suicide. She often finds herself fidgety and unable to sit still for extended periods of time. Her family tells her that she is increasingly irritable. Her current medications include lithium carbonate and zolpidem. Summary: 26-year-old obese woman with bipolar disorder, on zolpidem and lithium, with recent difficulty sleeping, agitation, suicidal ideation, and irritability.	1087494 1434505 2031887
2.	Treatment	Description: A 21-year-old female is evaluated for progressive arthralgias and malaise. On examination she is found to have alopecia, a rash mainly distributed on the bridge of her nose and her cheeks, a delicate non-palpable purpura on her calves, and swelling and tenderness of her wrists and ankles. Her lab shows normocytic anemia, thrombocytopenia, a 4/4 positive ANA and anti-dsDNA. Her urine is positive for protein and RBC casts. Summary: 21-year-old female with progressive arthralgias, fatigue, and butterfly-shaped facial rash. Labs are significant for positive ANA and anti-double-stranded DNA, as well as proteinuria and RBC casts.	1065341 1459118 1526641

Additionally, based on participant feedback, we will conduct two rounds of evaluation (Tasks A & B). Teams may participate in either or both tasks. Task A will be identical to the 2014 track. In Task B, participants will be provided with a diagnosis field for the treatment and test topics. This field will be free text much like the description and summary. For instance, the diagnosis of the treatment topic above is "lupus". The diagnosis is not guaranteed to be stated in the description or summary for treatment and test cases, which is consistent with how physicians write cases in practice. Presumably, providing the diagnosis may improve retrieval systems by (a) providing additional relevant information if the diagnosis is not stated in the case, or (b) emphasizing a key piece of information in the case if the diagnosis is stated. Note that in some cases more than one diagnosis is possible. Further, these will only be made available after the deadline for Task A, ensuring participants in the first task can only utilize the description or summary.

Obtaining the Topics

The topics for Tasks A and B are provided below.

Topic numbers are specified using the number attribute of each <topic> element and topic types (i.e., diagnosis, test, and treatment) are specified with the type attribute. Topic descriptions are given in <description> elements and topic summaries are given in <summary> elements. Below is an example of the format.

<topics> <topic number="1" type="test"> <description>Description of topic 1</description> <summary>Summary of topic 1</summary> <diagnosis>Diagnosis or List of Diagnoses</diagnosis> </topic> ... </topics>

Sample Topics

Participants are free to use last year's topics and judgements in designing their systems:

Note these topics do not have a <diagnosis> field, as that is a new feature of this year's track.

Additionally, the 2013 ImageCLEF medical task utilized similar cases:

case-based-topics-imageclef-2013.xml

Please take extreme caution when using the ImageCLEF topics. They are provided here only for reference. We have reformatted them somewhat to match this track's topic format, but differences remain. In particular, the ImageCLEF topics only contain the shorter <summary> tags, and all the topics should be considered to be of type diagnosis.

Evaluation

The evaluation will follow standard TREC evaluation procedures for ad hoc retrieval tasks. Participants may submit a maximum of five automatic or manual runs, each consisting of a ranked list of up to one thousand PMCIDs. The highest ranked articles for each topic will be pooled and judged by medical librarians and physicians trained in medical informatics. Assessors will be instructed to judge articles as either "definitely relevant" for answering questions of the specified type about the given case report, "definitely not relevant," or "potentially relevant." The latter judgement may be used if an article is not immediately informative on its own, but the assessor believes it may be relevant in the context of a broader literature review. Because we plan to use a graded relevance scale, the performance of the retrieval submissions will be measured using normalized discounted cumulative gain (NDCG).

As in past evaluations of medically-oriented TREC tracks, we are fortunate to have the assessment conducted by the Department of Medical Informatics of the Oregon Health and Science University (OHSU). We are extremely grateful for their participation.

Submission Instructions

The submission deadline is currently projected to be 28 July 2015 (Task A) and 30 July 2015 (Task B).

Submission File Format

The format for run submssions is standard trec_eval format. Each line of the submission file should follow the form:

TOPIC_NO Q0 PMCID RANK SCORE RUN_NAME

where TOPIC_NO is the topic number (1–30), 0 is a required but ignored constant, PMCID is the PubMed Central identifier of the retrieved document, RANK is the rank (1–1000) of the retrieved document, SCORE is a floating point value reprenting the similarity score of the document, and RUN_NAME is an identifier for the run. The RUN_NAME is limited to 12 alphanumeric characters (no punctuation). The file is assumed to be sorted numerically by TOPIC_NO, and SCORE is assumed to be greater for docments that should be retrieved first. For example, the following would be a valid line of a run submission file:

1 Q0 3148967 1 0.9999 my-run

The above line indicates that the run named "my-run" retrieves for topic number 1 document 3148967 at rank 1 with a score of 0.9999.

TREC Precision Medicine / Clinical Decision Support Track