About the Data
Where does the Data Come from?
N3C does not recruit participants; instead, it receives de-identified, harmonized EHR data from more than 230 sites under 83 Data Transfer Agreements. These sites provide data on individuals who were tested for COVID-19 or who exhibited related symptoms.
N3C Cohort Definition (Phenotype)
N3C identifies patients and controls by establishing a common COVID-19 phenotype that will define the data pull for the limited data set.
The latest COVID-19 phenotype documentation identifies lab-confirmed, suspected, and possible cases of COVID-19 and matches them to controls based on demographic factors in a 1:2 ratio (cases:control).
For more detailed information, you can view the latest phenotype on the GitHub Wiki.
The Phenotype Explorer is a tool within the N3C enclave designed to help researchers interactively explore and visualize COVID-19 phenotype definitions and criteria.
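The 1:2 case-to-control matching described above can be illustrated with a minimal sketch. It assumes exact matching on two hypothetical demographic fields (`age_group` and `sex`) and a simple greedy assignment; the actual N3C matching criteria and algorithm are defined in the phenotype documentation and may differ.

```python
from collections import defaultdict

def match_controls(cases, candidates, keys=("age_group", "sex"), ratio=2):
    """Greedy exact matching: give each case up to `ratio` unused controls.

    `keys` are hypothetical demographic fields, not the official N3C criteria.
    """
    # Group candidate controls by their demographic profile.
    pool = defaultdict(list)
    for person in candidates:
        pool[tuple(person[k] for k in keys)].append(person["person_id"])

    matches = {}
    for case in cases:
        bucket = pool[tuple(case[k] for k in keys)]
        # Pop from the front so each control is assigned at most once.
        matches[case["person_id"]] = [
            bucket.pop(0) for _ in range(min(ratio, len(bucket)))
        ]
    return matches

cases = [
    {"person_id": 1, "age_group": "40-49", "sex": "F"},
    {"person_id": 2, "age_group": "60-69", "sex": "M"},
]
candidates = [
    {"person_id": 101, "age_group": "40-49", "sex": "F"},
    {"person_id": 102, "age_group": "40-49", "sex": "F"},
    {"person_id": 103, "age_group": "40-49", "sex": "F"},
    {"person_id": 104, "age_group": "60-69", "sex": "M"},
]
matches = match_controls(cases, candidates)
print(matches)
```

In this toy input, the second case receives only one control because only one candidate shares its profile, which shows why a nominal 1:2 ratio is not always reached for every case.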
Data Format
N3C ingests data in formats such as PCORnet, ACT, OMOP, and TriNetX and harmonizes them into the OMOP Common Data Model (CDM) version 5.3.1 for comprehensive analytics. The OMOP CDM standardizes how clinical EHR data are organized, which allows data from different sources to be integrated. For more details on data harmonization, visit here. Learn more about OMOP Vocabulary here (1) (2).
The central table in the OMOP vocabulary system is the concept table.
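As a rough illustration of how the concept table is used, the sketch below joins a clinical table to concept to turn numeric concept IDs into readable names. It uses an in-memory SQLite database as a stand-in; the enclave has its own tooling. The column names follow the OMOP CDM, but only a subset of columns is shown, and the concept IDs, names, and patient rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Subset of OMOP columns; all rows below are invented for illustration.
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT,
    domain_id     TEXT,
    vocabulary_id TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER,
    condition_concept_id    INTEGER,
    condition_start_date    TEXT
);
INSERT INTO concept VALUES (1001, 'Example condition A', 'Condition', 'SNOMED');
INSERT INTO concept VALUES (1002, 'Example condition B', 'Condition', 'SNOMED');
INSERT INTO condition_occurrence VALUES (1, 42, 1001, '2021-01-15');
INSERT INTO condition_occurrence VALUES (2, 42, 1002, '2021-03-02');
""")

# Resolve each condition's concept ID to its name through the concept table.
rows = conn.execute("""
    SELECT co.person_id, c.concept_name, co.condition_start_date
    FROM condition_occurrence AS co
    JOIN concept AS c ON c.concept_id = co.condition_concept_id
    ORDER BY co.condition_start_date
""").fetchall()

for row in rows:
    print(row)
```

The same join pattern applies to other OMOP clinical tables, each of which stores a concept ID column that refers back to concept.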
Available Data
The data contain real-world data from patients who were tested for COVID-19 or whose symptoms are consistent with COVID-19. They also contain data from individuals infected with pathogens such as SARS 1, MERS, and H1N1, which can support comparative studies.
Data is focused only on retrospective electronic health record data. Specific variables available may vary depending on the contributing institutions.
The Data Dictionary catalogues available data in N3C based on OMOP Common Data Model Specifications.
List of Available Data
Demographic Information:
Age
Gender
Race/Ethnicity
Geographic location
Social determinants of Health
Clinical Diagnoses and Conditions:
COVID-19 diagnosis (e.g., PCR test results, ICD-10 codes)
Comorbidities (e.g., diabetes, hypertension)
Other medical conditions (e.g., cardiovascular diseases, respiratory diseases)
Laboratory Results:
Blood tests (e.g., complete blood count, metabolic panel)
Biomarkers (e.g., inflammatory markers, D-dimer)
Viral load measurements
Vital Signs and Physiological Measurements:
Blood pressure
Heart rate
Respiratory rate
Body temperature
Medication and Treatment Data:
Prescription medications
Dosages and frequencies
Treatment protocols for COVID-19 and other conditions
Procedures and Interventions:
Surgeries
Medical procedures (e.g., intubation, ventilation)
Therapeutic interventions (e.g., oxygen therapy, antiviral treatment)
Clinical Outcomes:
Admission, Transfer, Discharge
Hospitalizations
Intensive care unit (ICU) admission
Mortality
Long COVID Clinic Visits
Longitudinal Data:
Time-stamped records of clinical events and measurements, allowing for longitudinal analyses and outcome assessments over time.
Clinician Free-Text Notes
Natural Language Processing (NLP) is applied to clinical notes to derive concepts.
CMS (Centers for Medicare & Medicaid Services)
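The longitudinal analyses mentioned in the list above rely on time-stamped records. A minimal sketch of one common step, expressing each event as a day offset from an index date, is shown below. The function name `days_from_index` and the event labels and dates are hypothetical and are not drawn from N3C data.

```python
from datetime import date

def days_from_index(events, index_date):
    """Return (offset_in_days, label) pairs, sorted by time since index_date."""
    return sorted(((d - index_date).days, label) for d, label in events)

# Invented events for one patient, deliberately out of order.
events = [
    (date(2021, 1, 20), "hospital admission"),
    (date(2021, 1, 15), "positive PCR test"),
    (date(2021, 2, 1), "discharge"),
]
timeline = days_from_index(events, index_date=date(2021, 1, 15))
print(timeline)
# [(0, 'positive PCR test'), (5, 'hospital admission'), (17, 'discharge')]
```

Working in offsets rather than calendar dates keeps the relative timing of events intact while avoiding a dependence on absolute dates.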
Data Levels
Three levels of data are available for analysis. You will request access to a data level for each project.
Level | Data Description | Eligible Users | Access Requirements | Appropriate Projects (e.g.) |
---|---|---|---|---|
Level 3 Limited Data Set (LDS) | Patient data that retain the following protected health information ... Zip codes are truncated to the first 3 digits, and ... | Researchers from U.S.-based institutions | | Example: Studies considering absolute timing, such as determining if a patient's primary COVID-19 infection occurred during the Delta wave. |
Level 2 De-identified Data Set | Patient data from the LDS with the following changes: ... | Researchers from U.S.-based and foreign institutions | | Example: An analysis of comorbidity patterns in COVID-19 patients that examines general trends and demographic factors. |
Synthetic Data Set (available in Education Enclave) | Data computationally derived from the LDS that statistically resemble Level 3 patient information but are not actual patient data. | Citizen scientists and researchers from U.S.-based and foreign institutions | | Not available for research; used for data science training in the Education Enclave. |
PPRL Data (Privacy-Preserving Record Linkage Data Set) | Restricted external datasets that have been linked to N3C data using Privacy-Preserving Record Linkage. Example: RECOVER Data Guide; CMS; Mortality Evidence | Researchers from U.S.-based institutions | Special procedures for gaining access to PPRL data as part of a Level 3 access request. | Example: Analyze the impact of COVID-19 on healthcare utilization among Medicaid and Medicare patients using CMS Medicaid data alongside EHR data. |
External Data Sets. Can be imported into ... | Publicly available data (e.g., U.S. Census and regional data) for use alongside EHR data. Example: ... | | No special requirements. Users can request ingestion of additional external datasets here. | Example: Analyze the correlation between socio-economic factors from U.S. Census data and COVID-19 health outcomes using de-identified EHR data. |
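The ZIP code truncation mentioned in the table above can be illustrated with a small helper. This is a sketch only: the function name `truncate_zip` is hypothetical, the sample ZIP codes are arbitrary, and N3C's actual de-identification pipeline is not described here.

```python
def truncate_zip(zip_code: str) -> str:
    """Keep only the first 3 digits of a ZIP or ZIP+4 code."""
    # Drop any non-digit characters (e.g., the hyphen in ZIP+4) before slicing.
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    return digits[:3]

zip3_plain = truncate_zip("27514")
zip3_plus4 = truncate_zip("27514-1234")
print(zip3_plain, zip3_plus4)
```

Truncation of this kind trades geographic precision for privacy, so analyses that need finer location detail would depend on a data level that retains it.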
Dashboards
Visit the N3C Dashboard, which provides detailed visualizations and insights into COVID-19 patient data, including demographics, mortality, comorbidities, medication usage, and regional distribution. It also features tools for exploring institutional collaborations, data contributions, and publications resulting from N3C research. Some helpful dashboards include:
Related Enclave Applications
Data Catalog
Browse the most commonly accessed data tables, such as Level 1 and Level 2 data, notional data for learning, and commonly used external data sets.
Once your project workspace is created, the dataset from the Data Catalog that you requested in your Data Use Request (DUR) will be linked to the workspace for dataset creation and analysis.
→ Request External Datasets
Publicly available datasets (PADs) can be recommended for ingestion into the N3C Data Enclave. To ensure security and privacy, N3C and NCATS have established a formal policy for incorporating external data. After submitting a Request for Use and Access form, datasets will be evaluated, and if approved, made accessible to researchers.
More Resources: