Research Data Assets & Services

Questions? Contact Academic Research Services (ARS)

Overview

The Research Data Assets and Services program is dedicated to developing enterprise data assets and informatics tools that provide UCSF researchers with easy access to the full breadth and depth of UCSF clinical data. Our primary goal is to support data-driven research that translates to improved healthcare outcomes.

The program includes a dedicated team of informatics experts, data scientists, and software/data engineers who are continually developing, extending, and supporting the use of UCSF clinical and biomedical data to engage in premier research. This work includes establishing data warehouses and datamarts that conform to different structural and semantic standards, both for multi-study use and for participation in clinical research networks (CRNs). It also includes maintaining the data team's relationships with individual UCSF researchers, collaborating institutions, and CRNs.

Key Focus Areas

At the heart of our work are the DeID Data Assets, developed in collaboration with UCSF Bakar Computational Health Sciences Institute, offered as self-serve data from UCSF Information Commons. The DeID Data Assets comprise fully linkable, de-identified, self-service, multi-modal data for more than 4 million patients available on various platforms – SQL Server, Information Commons AWS, and Wynton HPC – accessible via programming or no-code applications. Our team fosters a vibrant community of research users by conducting weekly office hours and tutorials, building knowledge bases, and hosting and moderating digital forums for the sharing of user-to-user knowledge.

In addition to developing and maintaining the DeID Data Assets, our team creates and maintains various representations of UCSF’s electronic health record (EHR) data in the form of common data models (CDMs), such as the PCORnet CDM and OHDSI’s OMOP CDM. These data assets enable UCSF to lead and participate in multi-institutional initiatives, including the UC Health Data Warehouse and Data Discovery Platform, REACHnet CDRN, the All of Us Research Program, BRIDGE2AI/CHoRUS, and more.

Self-serve Data and UCSF Information Commons

Researchers have two avenues for accessing clinical data for observational research: data steward services and self-serve data assets.

The data steward provides high-touch service to deliver a highly specified, analysis-ready dataset based on the needs of a given study. If the data required includes PHI, this process involves IRB approval, and the data steward will deliver the minimum necessary data within the scope of the IRB.

In contrast to data steward services, our program’s self-serve data assets provide access to broad and deep, multi-modal, de-identified data sources without the need for IRB approval or an intermediary to extract the data. Our team provides the data assets in two formats: one close to the Epic Caboodle data warehouse (consistent with the source system our clinical data originates in), and one standardized and harmonized according to the OMOP Common Data Model (which supports external collaborations for larger-scale research). The benefits of using these self-serve data assets include direct access at any time and the ability to reuse the data across multiple projects.

The data assets we build are designed to be linkable across multiple data types, at the patient, population, cellular, genomic, and molecular levels, to enable answering complex research questions. The integrated research data assets built by our team and our research IT partners, together with the tools for easier access and analysis, the computational environments, and the services we provide to support the research community, are known as UCSF Information Commons.

Research Data Assets

  • De-Identified Clinical Data Warehouse (UCSF Health), which includes administrative and clinical data for over 4M patients, including:
    • Structured EHR Data, de-identified version of Epic’s CDW*
    • Structured Dentistry Data*
    • De-Identified Clinical Notes**
    • Genomic Testing Data for UCSF 500 and Foundation Medicine patients**
  • De-Identified OMOP (UCSF Health)
    • Structured EHR data in standardized format*
  • De-Identified Radiology Images (UCSF Health, select cohorts)***
  • Coming Soon:
    • De-identified Historical clinical notes from STOR (UCSF Legacy EHR)
    • De-identified Cancer Registry Data
    • De-identified SFDPH/ZSFG Structured EHR Data
  • UCSF Operational OMOP for UC HDW
  • Quality Improvement Program (QIP) OMOP
  • All of Us Research Program OMOP
  • American Society of Hematology Research Collaboration (ASH-RC) OMOP
  • COVID Datamart
  • PCORnet CDM
    • The PCORnet Blood Pressure Control Laboratory
    • Researching COVID to Enhance Recovery (RECOVER)
    • Adult Congenital Heart Disease (ACHD) Research Program
    • COVID-19 Citizen Science Research Program

* this asset is developed & supported by the ARS Research Data Assets team

** this asset is developed & supported in partnership with BCHSI

*** this asset is available through our partnership with the CI2, BCHSI, and CTSI as part of the Information Commons

User-friendly Data Exploration Tools

  • PatientExploreR, a web-based application that provides the ability to search UCSF de-identified clinical data and explore characteristics of groups of patients with an easy-to-use cohort selection interface. The patient cohorts generated by the PatientExploreR search can be visualized and filtered using the tool's interactive plotting features. Users can view and download data for individual patients, or export detailed data for a cohort to further analyze with other tools.
  • EMERSE, an open-source, user-friendly interface to search de-identified clinical notes
  • ATLAS, "an open source software tool for researchers to conduct scientific analyses on standardized observational data converted to the OMOP Common Data Model V5" - https://github.com/OHDSI/Atlas
    Access UCSF's DeID OMOP ATLAS

Services

We provide support to the UCSF research community through the following initiatives:

  • Development and support of the community knowledge base around data assets and best practices for using them in research
  • User group office hours and tutorials for user training and knowledge sharing
  • Guided user onboarding process
  • Consultations and collaborations with informatics and data practitioners on our team

Links to Key Resources