A clinical data retrieval and management training program for researchers

View the Project on GitHub galterdatalab/crdm-training


Enterprise Data Warehouses (EDWs) play an increasingly important role on academic medical campuses, housing clinical and other enterprise-wide data and making it available for research and strategic purposes in the learning health system. Here, we will create an end-to-end training program that introduces clinician researchers to clinical database architecture and clinical coding standards. The program will teach them how to translate their research questions into queries that extract the right data, and how to do so in a way that supports transparency and reproducibility while respecting guidelines for proper data sharing. This work will build on longstanding partnerships with the leadership and data analysts of the Northwestern Medicine Enterprise Data Warehouse (NMEDW). Together, we will promote better communication and collaboration between data analysts and clinical researchers so that they become stronger partners in research projects. To promote reuse of research reports and database queries within Northwestern’s research community, we will provide workflows for preserving them and making them discoverable in InvenioRDM, a next-generation research data management (RDM) system. Our ultimate goal is to strengthen support for our local research community in using clinical research data from the NMEDW, and to draw on this experience to develop a blueprint of best-practice workflows for clinical research data education and training that libraries at other institutions could adopt.


In 2018, the Northwestern Medicine Enterprise Data Warehouse (NMEDW) received over 600 research data requests, up nearly 50% from the previous year and more than 200% from 2014. This sharp increase in demand for clinical research data highlights the need for a more comprehensive training program for system users. At the same time, resource constraints have made it difficult to offer clinician researchers training that closes the communication gap between them and the data analysts who help them extract their research population of interest from the database. We believe that proper training can help these researchers convey their domain knowledge more effectively to their analyst partners, and vice versa. This improved communication would not only increase efficiency but also help avoid miscommunication that causes delays and can even result in incomplete or incorrect inclusion criteria for research cohorts. In addition, current research reports, which include the project title and description, inclusion and exclusion criteria, SQL queries, and data plots, are not stored in ways that support reuse and preservation. While the analysts themselves have access to past reports, researchers have no way to search for and explore previous projects that could help them work more efficiently and advance their own research. These improvements would benefit several groups: the NMEDW, through more efficient workflows and less one-on-one analyst time; the researcher patrons, who will have an easier research start-up process and a foundation of training and reference studies on which to build their work; the university, which will save time and research funding that could be applied elsewhere; and the larger community, since we will make project findings and resources available to others who seek to support this critical need on their campuses.


  1. To create an end-to-end Clinical Data Retrieval and Management Program for researchers that teaches them how clinical data is collected, stored, and retrieved, how to identify their research population of interest, how to create practical data retrieval workflows for their clinical research projects, and best practices for ensuring that the research reports for these projects are reproducible and reusable.

  2. To promote improved communication and collaboration between data analysts and clinical researchers to make them better partners in research projects.

  3. To enhance reusability of clinical reports and database queries by creating workflows and training for preserving them in our next-generation research data management system and making them discoverable to Northwestern’s research community.

  4. To use our work and experience to provide a template for clinical research data education and training for other institutions.

Funding Acknowledgement

Developed resources reported on this website are supported by the National Library of Medicine (NLM), National Institutes of Health (NIH) under cooperative agreement number 1UG4LM012346. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
