R for Clinical Study Reports and Submission

January 2022

Disclaimer

All opinions expressed are those of the presenter and not Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA.
Some slides need to be scrolled down to see the full content.

Clarification from FDA

FDA: Statistical Software Clarifying Statement

“FDA does not require use of any specific software for statistical analyses, and statistical software is not explicitly discussed in Title 21 of the Code of Federal Regulations [e.g., in 21CFR part 11]. However, the software package(s) used for statistical analyses should be fully documented in the submission, including version and build identification.”

Motivation

As an organization, we need to ensure compliance and reduce the risk of using R and R packages in regulatory deliverables.

R is widely used in clinical trial study design.
- Following Lachin and Foulkes (1986)
- gsDesign: an R package for group sequential design under proportional hazards.
- simtrial, gsDesign2, and gsdmvn: experimental R packages for group sequential design under non-proportional hazards
- Bookdown: https://keaven.github.io/gsd-deming/
R is flexible for handling novel missing data approaches following estimand framework in ICH E9 (R1)
- Recurrent event data: Gao et al. (2017) Diao et al. (2020)
- Robustness: Liu et al. (2021)
R is used in Bayesian statistics.
- stan and network meta-analysis for drug reimbursement analysis
R is widely used in visualization
- SafetyGraphics
- forestly: Interactive forest plot for DMC safety monitoring in clinical trials

Background

Clinical study report is a key deliverable for clinical trials to regulatory agencies. (e.g., FDA, CFDA)
- ICH E3 Structure and Content of Clinical Study Reports
We try to fill in gaps to streamline workflow using R for clinical trial development:
- To develop, validate, and deliver analysis results.
- To submit analysis results to regulatory agencies in eCTD format.
Focus on table, listing, figure (TLFs) delivered in RTF/Microsoft Word format
- In the pharmaceutical industry, RTF/Microsoft Word play a central role in preparing clinical study reports.
- Different organization can have different table standards

R Consortium pilot submission

Challenges and assumptions

Challenges

How to submit internal developed (proprietary) R packages?
How to follow ICH and/or FDA guidances in preparing eCTD package?
How to enhance reproducibility from FDA reviewer’s perspective?

Assumptions

Focus on analysis and reporting given available ADaM datasets.

Deliverables

Open source pilot eCTD submission package to FDA
- Proprietary R package
- R scripts for analysis
- Analysis data reviewer guide (ADRG)
- etc.
Development: https://github.com/RConsortium/submissions-pilot1
eCTD package: https://github.com/RConsortium/submissions-pilot1-to-fda
R consortium announcement

FDA response

“Using R version 4.1.1, FDA was able to run the submitted code and confirm the applicant’s tables and the submitted figure in report-tlf pdf file.”
“Using FDA developed code, a statistical analyst was able to independently generate tables using the submitted data.”
Original FDA response

Future work

Address minor issues reported by FDA reviewers.
Pilot R Shiny app submission.
Pilot submission with advanced analysis methods (e.g., study design, missing data, Bayesian, etc).

Reproducibility

Reproducibility spectrum

Requirement from FDA

FDA Study Data Technical Conformance Guide:

“Sponsors should provide the software programs used to create all ADaM datasets and generate tables and figures associated with primary and secondary efficacy analyses. Furthermore, sponsors should submit software programs used to generate additional information included in Section 14 CLINICAL STUDIES of the Prescribing Information (PI)26 if applicable. The specific software utilized should be specified in the ADRG. The main purpose of requesting the submission of these programs is to understand the process by which the variables for the respective analyses were created and to confirm the analysis algorithms. Sponsors should submit software programs in ASCII text format; however, executable file extensions should not be used.”

Recommendations

Although FDA did not expect submitted R code is executable, sponsor shall enhance reproducibility.

Fixed R version: e.g., R 4.1.0
Fixed R package snapshot date: e.g., 2021-08-31
- Posit Package Manager
- CRAN Time Machine by MRAN
Flexibility of input and output path:
- Define path as parameter.
eCTD deliverables dry-run in Windows environment
Provide steps to follow in ADRG appendix

Implementation

Philosophy

We share the same philosophy described in Section 1.1 of the R Packages book and quote here.

“Anything that can be automated, should be automated.”
“Do as little as possible by hand. Do as much as possible with functions.”
“The goal is to spend your time thinking about what you want to do rather than thinking about the minutiae of package structure.”

Tools for reporting and submission

Tools:

r2rtf: create production-ready tables and figures in RTF format.
pkglite: represent and exchange R package source code as text files.
cleanslate (under internal validation): create portable R environments.

Bookdown: https://r4csr.org/

r2rtf: design

r2rtf is designed to:

Generate highly customized tables
Limit package dependency
Target regulatory deliverable
Support pipes (%>%)

r2rtf: minimal example

r2rtf is designed to be pipe-friendly (%>%)

head(iris) %>%
  rtf_body() %>%           # Step 1 Add table attributes
  rtf_encode() %>%         # Step 2 Convert attributes to RTF encode
  write_rtf("minimal.rtf") # Step 3 Write to a .rtf file

r2rtf: function illustration

pkglite: compact package representations

pkglite reimagines the way to represent R packages.

A tool for packing and restoring R packages as plaintext assets that are easy to store, transfer, and review
A grammar for specifying the file packing scope that is functional, precise, and extendable
A standard for exchanging the packed asset that is unambiguous, human-friendly, and machine-readable

library("pkglite")

"/path/to/pkg/" %>%
  collate(file_ectd(), file_auto("inst/")) %>%
  pack()

pack(
  "/path/to/pkg1/" %>% collate(file_ectd()),
  "/path/to/pkg2/" %>% collate(file_ectd()),
  output = "/path/to/pkglite.txt"
)

"/path/to/pkglite.txt" %>% unpack(output = "/path/to/output/", install = TRUE)

Website: https://merck.github.io/pkglite/

cleanslate: portable R environments

Create a project folder with specific context (.Rproj, .Rprofile, .Renviron)
Install a specific version of R into the project folder
Install a specific version of Rtools into the project folder
(without administrator privileges)

library("cleanslate")

"portable-project/" %>%
  use_project(repo = "https://url/snapshot/2021-11-20/") %>%
  use_rprofile() %>%
  use_renviron() %>%
  use_r_version(version = "4.1.1") %>%
  use_rtools(version = "rtools40")

Summary of workflow as user stories

Within a regulatory R environment:

As a statistician, I use tidyverse, r2rtf, and internal tools to define the mock-up table, listing and figure (TLFs) for statistical analysis of a clinical trial.
- More than 100 TLFs for efficacy and safety of a drug or vaccine.
As a programmer, I use tidyverse, r2rtf, and internal tools to develop and/or validate analysis results based on mock-up TLFs.
As a statistican/programmer, I use pkglite and internal tools to prepare proprietary R packages and analysis R scripts for eCTD submission package.
As an internal/external reviewer, I use cleanslate to re-construct a portable environment (if required) to reproduce analysis results.

More details: https://r4csr.org/

Folder structure

We recommended to use R package structure to organize standard tools, analysis projects, and Shiny apps.

Consistency: everyone works on the same folder structure.
Reproducibility: analysis can be executed and reproduced by different team members months/years later.
Automation: automatically check the integration of a project.
Compliance: reduce compliance issues.

More details: https://r4csr.org/project-folder.html

Example: https://github.com/elong0527/esubdemo

Project management

Setting up for success
- Work as a team
- Design clean code architecture
- Set capability boundaries
- Contribute to the community
The Software Development Life Cycle
- Planning: define the scope of a project
- Development: implement target deliverables
- Validation: verify target deliverables
- Operation: deliver work to stakeholders

More details: https://r4csr.org/project-management.html

Cross-industry collaborations

R Validation Hub: https://www.pharmar.org/
- Focus on designing a framework that assesses the quality of an R package
- Presentations: https://www.pharmar.org/present/
R-based submission pilots to FDA:
- https://rconsortium.github.io/submissions-wg/
- Focus on improving practices of R-based clinical trial regulatory submission.
R/Pharma: https://rinpharma.com/
- Annual conference focus on the use of R in the development of pharmaceuticals.
R Consortium R Adoption Seminar Series
- https://www.r-consortium.org/webinars

Future directions

Enhance compliance, reproducibility, traceability, and automation
- Automation of analysis, documentation, review, and testing
- Linkage among data, TLFs, and final reports
- R package qualification
Enable advanced study design and statistical methods
Introduce interactive visualization and reporting (with/without backend server)

Thank you

References

Diao, Guoqing, Guanghan F Liu, Donglin Zeng, Yilong Zhang, Gregory Golm, Joseph F Heyse, and Joseph G Ibrahim. 2020. “Efficient Multiple Imputation for Sensitivity Analysis of Recurrent Events Data with Informative Censoring.” Statistics in Biopharmaceutical Research, 1–9.

Gao, Fei, Guanghan F Liu, Donglin Zeng, Lei Xu, Bridget Lin, Guoqing Diao, Gregory Golm, Joseph F Heyse, and Joseph G Ibrahim. 2017. “Control-Based Imputation for Sensitivity Analyses in Informative Censoring for Recurrent Event Data.” Pharmaceutical Statistics 16 (6): 424–32.

Lachin, John M, and Mary A Foulkes. 1986. “Evaluation of Sample Size and Power for Analyses of Survival with Allowance for Nonuniform Patient Entry, Losses to Follow-up, Noncompliance, and Stratification.” Biometrics, 507–19.

Liu, Siyi, Shu Yang, Yilong Zhang, et al. 2021. “Multiply Robust Estimators in Longitudinal Studies with Missing Data Under Control-Based Imputation.” arXiv Preprint arXiv:2112.06000.