The TB Portals Program actively collects international TB patient case data, including clinical, imaging, and bacterial genomic information, from both drug-sensitive and drug-resistant cases.

These de-identified data are linked across domains as cases in our database and are publicly available for viewing and analysis. They can be explored through any of the TB Portals tools.
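As a rough illustration of what linking data across domains into cases can look like, the sketch below groups genomic and imaging records under the clinical record that shares their case identifier. The `case_id` key, the other field names, and the sample values are hypothetical and chosen only for illustration; they are not the TB Portals schema or export format.

```python
# Hypothetical records; field names and values are illustrative only,
# not the actual TB Portals schema.
clinical = [
    {"case_id": "c1", "sex": "Female", "outcome": "Cured"},
    {"case_id": "c2", "sex": "Male", "outcome": "Still on treatment"},
]
genomic = [
    {"case_id": "c1", "lineage": "Lineage 2"},
]
imaging = [
    {"case_id": "c1", "modality": "CXR"},
    {"case_id": "c1", "modality": "CT"},
    {"case_id": "c2", "modality": "CXR"},
]


def link_cases(clinical, genomic, imaging):
    """Group genomic and imaging records under their clinical case."""
    cases = {
        row["case_id"]: {"clinical": row, "genomic": [], "imaging": []}
        for row in clinical
    }
    for domain, rows in (("genomic", genomic), ("imaging", imaging)):
        for row in rows:
            case = cases.get(row["case_id"])
            if case is not None:  # ignore records with no matching case
                case[domain].append(row)
    return cases


linked = link_cases(clinical, genomic, imaging)
```

In this sketch a case may have several imaging records and no genomic record at all, which mirrors the point that not every case carries data from every domain.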

Types of Data

Clinical Data

Patient case records include de-identified clinical, social, and behavioral data. Examples of clinical information available:

Sex
Registration Date
Age of Onset
Case Definition
Diagnosis Code
DST Profile
Localization
Comorbidity
Outcome
Treatment Start/End Dates
Regimen Drugs
Regimen Start/End Dates

Genomics

Mycobacterium tuberculosis (Mtb) genomic information is linked to its respective patient case. Genomic sequences collected by the TB Portals Program are deposited in the NCBI Sequence Read Archive (SRA). Examples of genomic information available:

Whole genome sequences
Spoligotype
Lineage
Drug-resistance conferring mutations

Images

Chest X-ray and CT images in DICOM and JPG format, as well as radiologist and AI-generated annotation data, are linked with TB patient cases. Examples of imaging information available:

Radiologist report
Affect Level
Affect Pleura
Affected Segments
Dissemination
Limfoadenopatia
Lung Capacity Decrease
Lung Cavity Size
Plevritis
Cardiomegaly
Consolidation
Fibrosis
Hilar Lymphadenopathy
Nodule
Pleural Effusion
Qure TB diagnosis prediction

Sources and Data Collection

The TB Portals contain de-identified data from tuberculosis (TB) patient cases contributed by multiple institutions working in different clinical and research contexts. The data have been collected as part of routine practice in TB clinics, research studies, and clinical trials. Some originate from historical records, while others are being collected prospectively. No single data collection protocol is uniformly enforced. Therefore, TB Portals data are structured as, and should be regarded as, a natural history study, not an epidemiological study.

TB Portals Consortium cases

Many of the cases in the TB Portals have been contributed by the TB Portals Consortium. These cases are chosen based on the scientific interest of the contributing Consortium member, usually with a heavy selection focus on multidrug-resistant and extensively drug-resistant TB. In these cases, information collection begins at the time of first diagnosis and covers both prospective and available retrospective data. Imaging is conducted at regular intervals during the course of treatment. Genomic samples from select cases are likewise collected at regular intervals. Standardization of these data is enforced using a central data entry form, and all of the clinical data from TB Portals Consortium cases have been curated and validated as accurate by physicians at the time of data entry.

External cases

The TB Portals also contain patient cases that have been submitted externally, and application programming interfaces (APIs) were created to enforce data standardization. As with data contributed by the TB Portals Consortium, case selection follows each study’s or data contributor’s specific protocol.

Standards

All data within the TB Portals follow the HL7 Fast Healthcare Interoperability Resources (HL7 FHIR) standard. We use a uniform data dictionary with generally accepted medical terminology and data field values. This central data dictionary enables users to explore the data and conduct analyses across all data sources using TB Portals data descriptors and fields. While this standardization reconciles different naming conventions, variability in case selection and data collection methods remains a feature of the TB Portals dataset.
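To make the role of a central data dictionary concrete, the following minimal sketch maps contributor-specific field names onto a single set of field names and checks values against an allowed list. The alias table, the allowed values, and the `normalize_record` function are assumptions made up for this example. They do not reproduce the TB Portals data dictionary, and HL7 FHIR resources are not modeled here.

```python
# Hypothetical alias table: contributor field name -> central field name.
FIELD_ALIASES = {
    "gender": "sex",
    "sex": "sex",
    "treatment_outcome": "outcome",
    "outcome": "outcome",
}

# Hypothetical allowed values for one central field.
ALLOWED_VALUES = {"sex": {"Male", "Female"}}


def normalize_record(record):
    """Rename fields to central names and validate restricted values."""
    normalized = {}
    for name, value in record.items():
        central = FIELD_ALIASES.get(name.lower())
        if central is None:
            raise KeyError(f"unmapped field: {name}")
        allowed = ALLOWED_VALUES.get(central)
        if allowed is not None and value not in allowed:
            raise ValueError(f"unexpected value for {central}: {value!r}")
        normalized[central] = value
    return normalized


# Two contributors using different naming conventions for the same data.
site_a = normalize_record({"Gender": "Female", "treatment_outcome": "Cured"})
site_b = normalize_record({"sex": "Female", "outcome": "Cured"})
```

Both records end up with identical field names, so they can be analyzed together. As noted above, this kind of renaming does not remove differences in how cases were selected or how the data were collected.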