A clinical data analysis project is not unlike typical data analysis projects or software projects. Therefore, the conventional wisdom and tricks for managing a successful project are also applicable here. At the same time, clinical projects also have unique traits, such as high standards for planning, development, validation, and delivery under strict time constraints.
Although many factors determine if a project can execute efficiently, we believe a few aspects are critical for long-term success, especially when managing clinical data analysis projects at scale.
As a general principle, all the team members involved in a project should take basic training on project management and understand how to work as a development team. Fitzpatrick and Collins-Sussman (2012) provides some valuable tips on this topic. As always, setting a clear goal and following a system development lifecycle (SDLC) is essential.
Having a clean architecture design for your code improves the project’s robustness and flexibility for future changes. For example, we should understand how to separate business logic from other layers; know what should be created as reusable components and what should be written as one-off analysis scripts; write low coupling, high cohesion code, and so on. Martin, Grenning, and Brown (2018) offers some helpful insights on this topic.
Knowing what you can do is essential. Create a core capabilities list for your team.
Sometimes, it is also critical to understand what not to do. For example, the hidden cost of integrating with external systems or involving other programming languages can be prohibitively high. Remember, a simple, robust solution is almost always preferable to a complex solution that requires high maintenance and constant attention.
For A&R deliverables in clinical project development, a clearly defined process or system development lifecycle (SDLC) is crucial to ensure regulatory compliance.
SDLC for the A&R deliverables can be defined in four stages.
- Planning: a planning stage to define the scope of a project.
- Development: a development stage to implement target deliverables.
- Validation: a validation stage to verify target deliverables.
- Operation: an operation stage to deliver work to stakeholders.
Importantly, we should not consider SDLC as a linear process. For example, if the study team identifies a new requirement in a development or validation stage, the team should return to the planning stage to discuss and align the scope. An agile project management approach is suitable and recommended for an A&R clinical development project. The goal is to embrace an iterative approach that continuously improves target deliverables based on frequent stakeholder feedback.
There are many good tools to implement agile project management strategy, for example:
The planning stage is important in the SDLC lifecycle as the requirements for all A&R deliverables are gathered and documented.
In the planning stage, a project leader should identify all the deliverables, e.g., a list of tables, listings, and figures (TLFs). For each TLFs, the team should prepare the necessary specifications:
- mock-up tables
- validation level (e.g., independent review or double programming)
The project leader should also align work assignments with team members. The purpose is to answer the question of “who is doing what?”
To enable reproducibility, the project leader should also review the startup file
.Rprofile discussed in Section 10.2) and define:
- R version
- Repository of R packages with a snapshot date
- Project package library path
After project initiation, modifying
.Rprofile will be a risk of reproducibility
and should be handled carefully if necessary.
After a project is initiated, the study team starts to develop TLFs based on pre-defined mock-up tables assigned to each team member.
The analysis code and relevant description can be saved in R Markdown files
The use of R Markdown allows developers to assemble narrative text, code, and its
comments in one place to simplify documentation.
It would be helpful to create a template and define a name convention for all TLFs deliverables.
For example, we can use the
tlf_ prefix in the filename to indicate that the R Markdown file is for delivering TLFs.
Multiple TLFs with similar designs can be included in one R Markdown file.
For example, in the
esubdemo project, we have six R Markdown files to create TLFs.
If there are any project-specific R functions that need to be developed,
the R functions can be placed in the
R/ folder as discussed in Section 10.1.
Validation is a crucial stage to ensure the deliverables are accurate and consistent. After the development stage is completed, the project team needs to validate the deliverables, including R Markdown files for TLFs deliverables and project-specific R functions. The level of validation is determined at the define stage.
In an R package development, the validation or testing is completed under the
The testthat R package can be used to streamline the validation process.
More details of the
testthat package for R package validation can be found in Chapter 12 of the R package book.
It is recommended to have a name convention to indicate the type of validation.
For example, we can use
to classify the validation type.
It is recommended to follow the same organization
for files in testthat folder as
R/ folder and
Every single file in the
R/ folder and
vignettes/ folder should have a testing file saved in
tests/testthat/ folder to validate the content.
For example, in
esubdemo project, we can have a list of testing files below.
tests/testthat ├── test-independent-test-tlf-01-disposition.R ├── test-independent-test-tlf-02-population.R ├── test-independent-test-tlf-03-baseline.R ├── test-independent-test-tlf-04-efficacy.R ├── test-independent-test-tlf-05-ae-summary.R ├── test-independent-test-tlf-06-ae-spec.R └── test-independent-test-fmt.R
To validate the content of a table, we can save the last datasets ready for table generation as a
A validator can reproduce the TLF and compare it with the original result saved in the
A test is passed when the results match.
Customers can directly review the formatting of the TLFs by comparing them with the mock-up.
To validate a figure, we can use the snapshot testing strategy.
After the validator completes the testing of project-specific functions and R Markdown files,
the process to execute and report testing results is the same for a standard R package.
devtools::test() function automatically executes all testing cases and summarizes the testing results in a report.
After completing the validation, the validator updates the status in a validation tracker. The project lead reviews the tracking sheet to make sure all required activities in the SDLC are completed, and the tracking sheet has been filled correctly. The deliverables are ready for customer review after all the validation steps are completed. Any changes to the output requested by customers are documented.
After completion of development and required validation of all A&R deliverables,
the project lead runs compliance checks for a project-specific R package similar to other R packages.
devtools::check() is a convenient way to run compliance checks or
R CMD check.
R CMD check is an automated check of the contents in the R package
for frequently encountered issues before submission to CRAN.
Since the project-specific R package is not submitted to CRAN, some checks can be customized and skipped in
The project lead should work with the study team to ensure
all reported errors, warnings, and notes by
devtools::check() are fixed.
The project lead can also use the R package
pkgdown to build a complete website for a project-specific R package.
pkgdown website is a convenient way to run all analyses in batch and
integrate outputs in a website, which comprehensively covers
project-specific R functions, TLF generation programs, outputs and validation tracking information, etc.
For example, in the
esubdemo project, we created the
pkgdown website at https://elong0527.github.io/esubdemo/.
Many of the tasks in SDLC can be completed automatically.
An organization can leverage CI/CD workflow to automatically enable those tasks,
such as running testing cases and creating a
For example, in the
esubdemo project, we set up
for it. This can be done by using