Tools for Code and Analysis

The applications you use in the Enclave will depend on your tasks and prior experience. Refer to the following recommendations.

Figure: Tools are optimized for different analysis and source types

Analytic and Operations Applications

In the Enclave, the left sidebar contains “Applications”. From here you can access all applications related to analytics and code operations. Here are some relevant applications for the average N3C researcher:

  • Code Workbook

  • Code Repository

  • Contour

  • Fusion

  • Notebook

  • Reports

Figure: Find all Applications in the Enclave

Manual Data Entry

Fusion

A spreadsheet-like tool for data analysis, limited to importing up to 2,000 rows of an Enclave table.

Use for:

  • Manual data entry into smaller datasets, such as curating lists of concept sets.

  • Keeping track of developed concept sets and easily inputting them into Logic Liaison Templates.

  • Creating datasets based on your spreadsheets. You can either sync a whole sheet to a dataset or select a table range to be synced. After the data is successfully synced to a dataset in Foundry, it can be imported into any other Enclave application.


Code Based Analysis and Visualization

The Code Workbook and Code Repository provide tools for discovering, exploring, and analyzing clinical data. Researchers can request additional packages via the N3C Support Desk.

R & Python: Fully supported, with pre-installed packages such as tidyverse (R) and pandas and scikit-learn (Python). The Code Workbook offers a graphical interface for data analysis and workflow management. See the Foundry documentation for details.

Apache Spark: Spark SQL handles and queries structured data, supporting filtering, joining, and aggregating of large datasets. It can be used natively or through R (SparkR) and Python (PySpark). See the Foundry documentation for details.
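As a rough, self-contained illustration of that filter/join/aggregate pattern, the PySpark sketch below counts patients by birth decade for a single condition concept. The table shapes mimic OMOP's person and condition_occurrence tables, but the rows and the concept ID are made up for the example; this is not a prescribed N3C query.

```python
# Standalone PySpark sketch of filtering, joining, and aggregating.
# Table shapes follow OMOP conventions; the rows and the concept ID
# used in the filter are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("spark_pattern_demo").getOrCreate()

person = spark.createDataFrame(
    [(1, 1952), (2, 1988), (3, 2001)],
    ["person_id", "year_of_birth"],
)
condition_occurrence = spark.createDataFrame(
    [(1, 11111), (2, 11111), (2, 22222), (3, 22222)],
    ["person_id", "condition_concept_id"],
)

# Filter to one concept, join in demographics, aggregate by birth decade.
target = condition_occurrence.filter(F.col("condition_concept_id") == 11111)
summary = (
    target.join(person, on="person_id", how="inner")
    .withColumn("birth_decade", (F.col("year_of_birth") / 10).cast("int") * 10)
    .groupBy("birth_decade")
    .agg(F.countDistinct("person_id").alias("n_patients"))
)
summary.show()
```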

Code Workbook:

Prepare original analytic datasets from raw OMOP tables. Multiple transformations can be strung together to create an analysis pipeline using SQL, R, Python, or a mix of these.

Use for:

Workbooks allow you to import and transform datasets using available code templates for various purposes (a sketch of the transform pattern follows this list):

  • Cleaning and joining raw data from external sources to produce curated datasets.

  • Analyzing processed data to derive useful insights.

  • Training and applying models for predictive analysis, e.g., investigating the results of a clinical trial by testing different significance thresholds.

  • Creating parameterized visualizations for reports to share with others.

  • One-time capture of data that is then used in another analytical application.
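
Below is a minimal sketch of that pattern, assuming hypothetical dataset and column names: in a Code Workbook, each Python transform is a function whose arguments are upstream datasets (received as Spark DataFrames) and whose return value becomes a new dataset that the next transform can consume.

```python
# Sketch of chained Code Workbook transforms; dataset and column names
# are hypothetical stand-ins for OMOP-style inputs.
from pyspark.sql import functions as F

def cleaned_conditions(condition_occurrence):
    # Step 1: drop rows missing fields that downstream steps rely on.
    return condition_occurrence.dropna(
        subset=["person_id", "condition_concept_id", "condition_start_date"]
    )

def patient_condition_counts(cleaned_conditions):
    # Step 2: aggregate the cleaned data into one row per patient.
    return (
        cleaned_conditions
        .groupBy("person_id")
        .agg(F.countDistinct("condition_concept_id").alias("n_distinct_conditions"))
    )
```

Each function corresponds to one node in the workbook's graph, which is how SQL, R, and Python steps can be mixed along the same pipeline.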


Code Repository:

Like the Code Workbook, the Code Repository can be used to prepare original analytic datasets from raw OMOP tables, for example for the team's statistician. The repository is best used, however, to share code across multiple Code Workbooks or projects, or to develop a robust production pipeline (a sketch follows the list below).

Use for:

  • A daily pipeline at high data scale which requires incremental compute.

  • A high-visibility pipeline with strict governance requiring the ability to revert to previous versions of historical code, or to gate code changes on successful unit tests.
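
For illustration, here is a minimal sketch of a repository-style transform written against Foundry's transforms API; the dataset paths are placeholders, not real N3C paths.

```python
# Sketch of a Code Repository transform. Because repository code is
# version-controlled, a transform like this can be code-reviewed,
# gated on unit tests, and reverted to earlier versions.
from transforms.api import transform_df, Input, Output
from pyspark.sql import functions as F

@transform_df(
    Output("/My_Project/derived/patient_condition_counts"),   # placeholder path
    conditions=Input("/My_Project/cleaned/condition_occurrence"),  # placeholder path
)
def compute(conditions):
    # One output row per patient, counting distinct condition concepts.
    return (
        conditions
        .groupBy("person_id")
        .agg(F.countDistinct("condition_concept_id").alias("n_distinct_conditions"))
    )
```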

NOTE: There is no restriction on downloading code that you developed in the Enclave, as long as your code does not contain patient data (raw or derived) embedded in it (e.g., in code comments).

 



Use Logic Liaison Code Templates in your Code Workbook

Logic Liaison code templates accelerate N3C analysis by providing commonly used variables and methods to quickly add custom elements. To find these templates, enter “Logic Liaison Templates” into the N3C Knowledge Store search field.

These code templates can be added to your Project Workspace or used directly in your Code Workbooks. There are two types of templates:

→ Logic Liaison Facts Templates

are used to generate the base fact tables. Other fact templates utilize the day-level and person-level datasets of the base fact templates to efficiently generate additional derived variables.

→ Logic Liaison Quality Control (QC) Templates

are used to assess available data, missing or sparse data by site, and overall data quality in the Enclave.
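
As a loose illustration of how a template's output might be consumed downstream, the sketch below filters a person-level fact table inside a Code Workbook transform. The input name and indicator columns are hypothetical; the actual variables depend on which templates you configure.

```python
# Hypothetical downstream use of a Logic Liaison person-level fact table.
from pyspark.sql import functions as F

def covid_diabetes_cohort(person_level_fact_table):
    # Both indicator columns are invented examples of derived variables;
    # real templates define their own column names.
    return (
        person_level_fact_table
        .filter((F.col("COVID_positive_indicator") == 1)
                & (F.col("DIABETES_indicator") == 1))
        .select("person_id")
    )
```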





Code-Free Analysis and Dashboards

Contour

Contour offers a point-and-click, programming-free interface for developing data analysis pipelines on tables at scale. These analyses can generate data summaries and visualizations, which can be integrated into interactive, dynamically updating dashboards.

Use for:

  • Filtering, merging, and modifying datasets through a user-friendly graphical interface. Useful for initial data filtering and preprocessing.

  • Organizing complex analyses into analytical paths.

  • Creating interactive dashboards to share findings.

  • Producing basic visualizations like histograms and heat maps.

  • Leveraging the Contour expression language for more advanced transformations and aggregations.

  • Handling Apache Spark DataFrame operations automatically, resulting in tables for further analysis.

  • Saving analysis results as a new dataset for use in other Foundry tools; indicated for one-time capture of data that is then used in another analytical application.


Report Findings

Notepad

Notepad is a tool within the Enclave used to consolidate various research artifacts, such as summary datasets, statistical analyses, and visualizations, into a single coherent document. It allows users to embed formatted tables, charts, and images from multiple sources, add titles and captions, create sections, and provide narrative structure using Markdown, all through a point-and-click interface.

It is used to report results for secure team dissemination within the Enclave environment.

The main difference between Notepad and a Contour Dashboard is that Notepad provides a static report with figures that cannot be dynamically changed by the reader.



Other Applications

Data Lineage (Monocle)

The Data Lineage tool (Monocle) provides details on dataset schemas, build dates, and the code that generated them, facilitating build scheduling and verification of data curation methods. The application allows you to:

  • find datasets

  • visualize data pipelines in real time

  • assess the origins and relationships of datasets through an intuitive, color-coded interface
