The vast majority of clinical trials fail to meet their patient recruitment goals. NIH has estimated that 80% of clinical trials miss their recruitment timeline and, more critically, many (or most) never recruit the minimum number of patients needed to power the study as originally designed. Inefficient patient recruitment is thus one of the major barriers to medical research, delaying some trials and forcing others to terminate entirely.
Historically, clinical trial recruitment was driven by trial coordinators (e.g., through direct contact with clinical specialists or by searching the electronic health record for eligible patients), but it has recently become increasingly common for patients to search for trials themselves (often in consultation with their clinician). The 2023 TREC Clinical Trials track simulates this scenario. Instead of using synthetic patient cases as in the 2021 and 2022 tracks, the 2023 track uses a simulated "questionnaire" that the patient or their clinician would fill out in order to identify eligible clinical trials. The track will use several high-level disorder questionnaire templates (e.g., glaucoma, COPD, anxiety), with each template having 5-12 fields customized to that disorder (e.g., the type 2 diabetes template has fields for HbA1c, glucose, BMI, insulin, etc.). For each template, there will be several topics representing synthetic patients with that condition but different values for the fields.
Participants in the track will be challenged to retrieve clinical trials from ClinicalTrials.gov, a required registry for clinical trials in the United States. Clinical trial descriptions can be quite long, but the core aspects of a trial description are its inclusion/exclusion criteria. These criteria are not so all-inclusive that the rest of the trial description can be ignored, but they are central to defining trial eligibility. The evaluation will further be broken down into eligible, excludes, and not relevant so that retrieval methods can distinguish between patients who do not have sufficient information to qualify for a trial (not relevant) and those who are explicitly excluded (excludes).
Date | Note |
---|---|
10 May 2023 | Document collection available for download (not the same as the 2021-2022 collection) |
11 May 2023 | Draft Topic Template available |
31 May 2023 | Applications for participation in TREC 2023 due (contact organizers thereafter) |
mid June 2023 | Topics available for download along with final Topic Templates |
28 August 2023 | Submission deadline |
October 2023 | Relevance judgments and individual evaluation scores released |
November 15–17, 2023 | TREC 2023 conference at NIST in Gaithersburg, MD, USA (may be held virtually) |
Clinical Trials: A May 8, 2023 snapshot of ClinicalTrials.gov will be used as the corpus. You can download those files below, grouped into batch files by trial ID.
The files are formatted using the ClinicalTrials.gov XML schema.
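For participants new to the collection, the sketch below shows one way to pull the eligibility-related fields out of a single trial record using only Python's standard library. The element names (id_info/nct_id, brief_title, eligibility/criteria/textblock, etc.) are assumed from the legacy ClinicalTrials.gov XML schema and should be verified against the downloaded files; the file name in the usage comment is only illustrative.

```python
# Minimal sketch of extracting the fields most relevant to eligibility matching
# from one trial record. Element names follow the legacy ClinicalTrials.gov XML
# schema (verify against the downloaded files).
import xml.etree.ElementTree as ET

def parse_trial(path: str) -> dict:
    root = ET.parse(path).getroot()

    def text(xpath: str) -> str:
        node = root.find(xpath)
        return node.text.strip() if node is not None and node.text else ""

    return {
        "nct_id": text("id_info/nct_id"),
        "title": text("brief_title"),
        "summary": text("brief_summary/textblock"),
        "criteria": text("eligibility/criteria/textblock"),
        "gender": text("eligibility/gender"),
        "minimum_age": text("eligibility/minimum_age"),
        "maximum_age": text("eligibility/maximum_age"),
    }

# Example usage (file name is illustrative):
# trial = parse_trial("NCT00760162.xml")
# print(trial["nct_id"], trial["criteria"][:200])
```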
The topics for the track consist of synthetic patient descriptions based on questionnaire templates. We currently have draft topic templates for eight different disorders. In the actual topics, none of the fields is required (i.e., any field may be left blank) and there is no guaranteed format for the provided responses (i.e., each field is natural language, not structured data). In essence, this simulates a patient or clinician filling out the questionnaire, leaving out any information that they do not have available. Obviously, the more information provided about a patient and the more consistent its format, the better the patient can be matched to trials. In a clinical environment, however, it is reasonable and expected that not all possible information will be available, nor is it always worth the time and cost to acquire. See the example topics below for how a template is instantiated as a patient-specific topic.
< Glaucoma Template > | Patient1 | Patient2 |
---|---|---|
diagnosis | POAG | uveitic glaucoma |
intraocular pressure | 19 mmHg | 22 mmHg |
visual field | | advanced damage |
visual acuity | 20/80 | 20/200 |
prior cataract surgery | no | no |
prior LASIK surgery | no | no |
comorbid ocular diseases | | uveitis |
< COVID-19 Template > | Patient3 | Patient4 |
---|---|---|
diagnosis | PCR-confirmed | never |
symptoms | fever, cough, headache, fatigue | |
hospitalization | yes | no |
ventilation | no | no |
vaccination status | unvaccinated | fully vaccinated |
oxygen saturation | 92% | |
comorbid respiratory diseases | | asthma |
In the topics file, each topic corresponds to one patient, with one field element per template field (fields without information are left empty), for example:

    <topics task="2023 TREC Clinical Trials">
      <topic number="-1" template="glaucoma">
        <field name="diagnosis">POAG</field>
        <field name="intraocular pressure">19 mmHg</field>
        <field name="visual field"></field>
        <field name="visual acuity">20/80</field>
        <field name="prior cataract surgery">no</field>
        <field name="prior LASIK surgery">no</field>
        <field name="comorbid ocular diseases"></field>
      </topic>
    </topics>
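As a rough illustration (not an official tool), a topics file in this shape can be read into one dictionary per patient with a few lines of Python; the element and attribute names follow the example above, and the file name in the usage comment is a placeholder.

```python
# Sketch of loading the topics XML into per-patient dictionaries keyed by
# field name. Empty fields are kept as empty strings, mirroring the optional
# questionnaire fields described above.
import xml.etree.ElementTree as ET

def load_topics(path: str) -> list[dict]:
    topics = []
    for topic in ET.parse(path).getroot().iter("topic"):
        fields = {f.get("name"): (f.text or "").strip() for f in topic.iter("field")}
        topics.append({
            "number": topic.get("number"),
            "template": topic.get("template"),
            "fields": fields,
        })
    return topics

# Example usage (file name is illustrative):
# for t in load_topics("topics2023.xml"):
#     query = " ".join(f"{k}: {v}" for k, v in t["fields"].items() if v)
#     print(t["number"], t["template"], query)
```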
The 2021 and 2022 topics had a very different structure (free-text narratives along the lines of a case report or electronic health record note), but they may nonetheless be useful to participants.
The evaluation will follow standard TREC evaluation procedures for ad hoc retrieval tasks. Participants may submit a maximum of five automatic or manual runs, each consisting of a ranked list of up to one thousand IDs (NCT IDs provided by ClinicalTrials.gov). The highest ranked results for each topic will be pooled and judged by physicians trained in medical informatics. Assessors will be instructed to judge trials as either eligible (patient meets inclusion criteria and exclusion criteria do not apply), excluded (patient meets inclusion criteria, but is excluded on the grounds of the trial's exclusion criteria), or not relevant. Because we plan to use a graded relevance scale, the performance of the retrieval submissions will be measured using normalized discounted cumulative gain (NDCG).
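For intuition about the metric, the sketch below computes NDCG@k for a single topic. The grade mapping (eligible = 2, excluded = 1, not relevant = 0) and the example judgments are assumptions for illustration only; official scores will be computed from the released relevance judgments with standard TREC tooling.

```python
# Illustrative NDCG@k for one topic. The grade mapping (eligible=2, excluded=1,
# not relevant=0) and the sample judgments below are assumptions for
# illustration, not the official scoring setup.
import math

def ndcg_at_k(ranked_ids: list[str], grades: dict[str, int], k: int = 10) -> float:
    dcg = sum(
        (2 ** grades.get(doc_id, 0) - 1) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
    )
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum((2 ** g - 1) / math.log2(r + 1) for r, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Example: one topic with assumed grades (NCT IDs are placeholders).
grades = {"NCT00760162": 2, "NCT01234567": 1}   # eligible, excluded
run = ["NCT00760162", "NCT09999999", "NCT01234567"]
print(round(ndcg_at_k(run, grades, k=10), 4))
```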
The tentative submission deadline is 28 August 2023 (follow the mailing list for updates).
The format for run submissions follows the standard trec_eval format. Each line of the submission file should follow the form:

    TOPIC_NO Q0 ID RANK SCORE RUN_NAME

where TOPIC_NO is the topic number (1–30), Q0 is a required but ignored constant, ID is the identifier of the retrieved document (an NCT ID from ClinicalTrials.gov), RANK is the rank (1–1000) of the retrieved document, SCORE is a floating point value representing the confidence score of the document, and RUN_NAME is an identifier for the run. The RUN_NAME is limited to 12 alphanumeric characters (no punctuation).
The file is assumed to be sorted numerically by TOPIC_NO, and SCORE is assumed to be greater for documents that should be retrieved first. For example, the following would be a valid line of a run submission file:

    1 Q0 NCT00760162 1 0.9999 my-run
The above line indicates that the run named "my-run" retrieves document NCT00760162 for topic number 1 at rank 1 with a score of 0.9999.
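Putting the format together, the following sketch writes a ranked result list in the six-column form described above; the result data, run name, and file name are all placeholders for whatever retrieval method a participant uses.

```python
# Sketch of writing a run file in the trec_eval format described above.
# `results` maps each topic number to (nct_id, score) pairs produced by some
# retrieval method; names and values here are placeholders.
def write_run(results: dict[int, list[tuple[str, float]]], run_name: str, path: str) -> None:
    with open(path, "w") as out:
        for topic_no in sorted(results):                      # sorted numerically by TOPIC_NO
            ranked = sorted(results[topic_no], key=lambda x: x[1], reverse=True)[:1000]
            for rank, (nct_id, score) in enumerate(ranked, start=1):
                out.write(f"{topic_no} Q0 {nct_id} {rank} {score:.4f} {run_name}\n")

# Example usage with made-up scores:
# write_run({1: [("NCT00760162", 0.9999), ("NCT01234567", 0.42)]}, "myrun", "myrun.txt")
```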