Learn more about the Data
About the Data
The All of Us Research Program offers a structured, tiered-data access model to accommodate various levels of data sensitivity and user requirements. This model is designed to promote inclusivity and transparency while safeguarding participant privacy and ensuring data security. For more details, please read the Precision Medicine Initiative Data Security Policy Principles and Framework and the Privacy and Trust Principles.
Read more about the All of Us Program and how data was collected.
Data Sources
All of Us integrates data from various sources, including surveys, electronic health records (EHRs), biosamples, physical measurements, and wearables such as Fitbit. Read more about the data included in All of Us data sets here.
View the data roadmap for more information on data that has been included and data that is planned for inclusion.
Data Curation Process
The All of Us Research Program uses the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) to standardize EHR data for all researchers. Read more about the OMOP CDM and about the Data Curation Process (1) (2).
Here are some tutorials to understand OHDSI Standardized Vocabularies.
Data Dictionary
The All of Us Data Dictionary details participant data, privacy modifications, and specifies whether fields are standard OMOP or custom to All of Us. It also outlines data cleaning methods, lists custom concept IDs, and tracks changes via versioning.
Explore the Registered Tier Data Dictionary and Controlled Tier Data Dictionary.
For a searchable database of available concepts with metadata, please visit the OHDSI Athena tool at Athena.
OMOP
The OMOP Common Data Model (CDM) is essential for harmonizing and standardizing health data collected from diverse sources. In the All of Us Research Hub, the OMOP model provides a structured format for integrating data across different healthcare systems: raw data (e.g., from electronic health records (EHRs), surveys, wearables, and genomics) are converted into a standardized format with a common vocabulary system, which allows consistent querying and analysis across diverse datasets.
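As a rough illustration of what "consistent querying" means, the sketch below builds a tiny in-memory database with three OMOP-style tables (person, condition_occurrence, concept) and counts participants per standardized condition. The table and column names follow the OMOP CDM, but the rows, the concept IDs (1001, 1002), and the helper function names are made up for this example. This is not how data are queried in the Researcher Workbench itself.

```python
import sqlite3


def build_toy_cdm() -> sqlite3.Connection:
    """Create an in-memory database with OMOP-style tables and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
        CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_source_value TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, 1960), (2, 1975), (3, 1988)],
    )
    # Concept IDs here are invented placeholders, not real vocabulary IDs.
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?)",
        [(1001, "Type 2 diabetes mellitus"), (1002, "Essential hypertension")],
    )
    # Different source codes from different sites map to one standard concept.
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1001, "E11.9"),   # ICD-10-CM style source code
            (2, 2, 1001, "250.00"),  # ICD-9-CM style source code
            (3, 3, 1002, "I10"),
            (4, 1, 1002, "401.9"),
        ],
    )
    return conn


def count_persons_by_condition(conn: sqlite3.Connection) -> list:
    """Count distinct participants per standardized condition name."""
    query = """
        SELECT c.concept_name, COUNT(DISTINCT co.person_id) AS n_persons
        FROM condition_occurrence AS co
        JOIN concept AS c ON c.concept_id = co.condition_concept_id
        GROUP BY c.concept_name
        ORDER BY c.concept_name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, n_persons in count_persons_by_condition(build_toy_cdm()):
        print(name, n_persons)
```

Because both source codes for diabetes point to the same concept row, a single query on the standardized concept covers records that were originally coded in different systems.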
Learn more about how the data is structured using the OMOP CDM:
Curated Data Sets
All of Us Research Program data in its final format, after harmonization and refinement, are referred to as a curated dataset.
Restriction on Identifiable Data: Directly identifiable information, such as names, contact details, participant IDs, IP addresses, and raw medical records with potential identifiers, is not released in the Curated Data Repository (CDR).
Data Encryption and Anonymization: All participant data is encrypted, and obvious identifiers are removed from research data. Identifiable information like names and addresses is kept separate from health information.
Independent Security Reviews: External reviewers assess and test the program's security measures regularly, ensuring they are effective against current threats.
There are three tiers of data access: Public (no login required); Registered (login required); and Controlled (additional approval required). Learn more about selecting the right data tier for your project. More information about privacy differences between data tiers.
Public Tier
Data Included:
What is it? Anonymized, aggregate-level data that poses negligible risk to the privacy of research participants.
Where to access? Accessible without logging in to the All of Us Research Hub. These data are available to everyone through Data Snapshots and the Data Browser, an interactive tool on the Research Hub.
Note: Counts may differ between Data Snapshots and the Data Browser because of lag time in the curation process.
Registered Tier
Data Included:
What is it? Includes participant-level data with transformations to protect privacy.
Date Transformation: All dates are shifted backward, consistently for each participant, by a random number of days between 1 and 365.
How to access? Available to approved researchers with a login to the secure Researcher Workbench.
More information: Explore the Registered Tier Data Dictionary.
Fields Removed:
Fields Generalized:
Controlled Tier
Data Included:
What is it? Contains all Registered Tier data plus data elements that may not directly identify individual participants but could increase re-identification risk when combined with other data.
How to access? Access is granted to researchers who meet additional requirements beyond a login to the secure Researcher Workbench.
More information: Explore the Controlled Tier Data Dictionary.
Fields Included:
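The Registered Tier date transformation can be pictured with a short sketch. It assumes only what is stated above: each participant gets one random backward offset of 1 to 365 days, applied to all of that participant's dates. The class name, method names, and seed are hypothetical, and the program's actual implementation may differ.

```python
import random
from datetime import date, timedelta


class DateShifter:
    """Illustrative per-participant date shifting (hypothetical helper)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._offsets = {}  # person_id -> number of days to shift backward

    def offset_for(self, person_id: int) -> int:
        # Draw the offset once per participant and reuse it afterwards,
        # so every date for that participant moves by the same amount.
        if person_id not in self._offsets:
            self._offsets[person_id] = self._rng.randint(1, 365)
        return self._offsets[person_id]

    def shift(self, person_id: int, original: date) -> date:
        return original - timedelta(days=self.offset_for(person_id))


if __name__ == "__main__":
    shifter = DateShifter(seed=0)
    visit_1 = shifter.shift(1, date(2020, 3, 1))
    visit_2 = shifter.shift(1, date(2020, 3, 10))
    # The interval between one participant's events is unchanged.
    print((visit_2 - visit_1).days)
```

One practical consequence of this design: intervals between events for the same participant are preserved, while calendar dates cannot be compared directly across participants, since each participant may have a different offset.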
Access Data in the Research Hub.
Create an account. Data is made available on the secure All of Us Research Hub, where researcher activity is monitored. Authorization for access to the registered and controlled data tiers is user-based rather than project-based.
Complete Mandatory Researcher Registration and Training: Researchers must register with the program, complete ethics training, and agree to a responsible data use code of conduct before accessing data.
Read and Agree to the Data User Code of Conduct.
Obtain a Data Passport: Upon account creation, authorized users will receive a “data passport”, a prerequisite for accessing the registered and controlled data tiers and for creating workspaces for research projects.
Create a project workspace for each unique research project.
Submit project descriptions for each project workspace created, which are made public and searchable to support auditing, public engagement, and compliance with privacy and transparency principles.
Notes about USC IRB Approval
The Researcher Workbench employs a data passport model, through which authorized users do not need IRB review to begin a research project. Most authorized users will not be conducting human subjects research with All of Us data for two reasons:
(1) The research will not directly involve participants, only their data.
(2) The data available in the Researcher Workbench has been carefully checked and altered to remove identifying information while preserving its scientific utility. Nevertheless, we encourage anyone using All of Us data to apply the ethical principles of research with human participants to their work.
However, please note that USC IRB review is required before initiating a USC-affiliated research project. This follows from a condition set by the Human Research Protection Program at USC, which applies when a Data Use Agreement (DUA) is needed to access the data set (Not Human Subjects Research Worksheet, p. 6).
Please pursue the following with USC IRB based on the listed conditions:
NHSR* self-determination
Indicated for non-research purposes using NHSR data
(1) no intent for research; or intent for conference presentation dependent on reach of conference, AND
(2) Using the following NHSR Data:
All of Us tier 1, tier 2
N3C tier 1, tier 2
Requires NO ACTION on part of researcher.
NOTE: N3C Code of Conduct indicates that data can only be used for research purposes and should be publicly disseminated in some form
*NHSR: Not Human Subjects Research
NHSR Determination
Indicated for non-human subjects research
(1) Any intent for research on NHSR; or intent for conference presentation dependent on reach of conference, AND
(2) Using the following NHSR Data:
All of Us tier 1, tier 2
N3C tier 1, tier 2
ACTION: iStar item 1.1
NOTE: A journal may request proof of an NHSR determination when reviewing a submission. In this circumstance, a researcher without an NHSR Determination cannot retroactively request one.
IRB Review, exempt category 4
Indicated for Secondary Research uses of Identifiable Private Information or Identifiable Biospecimens.
(1) Having any Research or Public dissemination intent, AND
(2) Conducting human subjects research using the following data:
All of Us – tier 3 (genomic data) (identifiable biospecimen)
N3C – tier 3 limited data (zip code, treatment dates, identifiable private information, no genomic data)
ACTION:
(1) Submit an Exempt Review in section 5.1 of iStar.
(2) Use Social Behavioral/Secondary Research protocol template
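For quick orientation, the three pathways above can be summarized as a small lookup, sketched here for All of Us data only. The function name and the boolean research_intent parameter are hypothetical simplifications. In particular, the "conference presentation dependent on reach of conference" condition is a judgment call that this sketch does not model, and combinations not listed above return None rather than a guess.

```python
from typing import Optional


def usc_irb_pathway(tier: int, research_intent: bool) -> Optional[str]:
    """Map an All of Us tier and research intent to the pathway listed above.

    Simplified illustration: conference presentations and N3C data are not
    modeled, and unlisted combinations return None.
    """
    if tier in (1, 2):
        if research_intent:
            return "NHSR Determination: iStar item 1.1"
        return "NHSR self-determination: no action required"
    if tier == 3 and research_intent:
        return "IRB Review, exempt category 4: Exempt Review in iStar section 5.1"
    return None  # not covered by the conditions listed above


if __name__ == "__main__":
    print(usc_irb_pathway(2, research_intent=True))
    print(usc_irb_pathway(3, research_intent=True))
```

Returning None for unlisted cases keeps the sketch from implying a determination that the conditions above do not state.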