January 2022

Disclaimer

  • All opinions expressed are those of the presenter and not Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA.

Clarification from FDA

“FDA does not require use of any specific software for statistical analyses, and statistical software is not explicitly discussed in Title 21 of the Code of Federal Regulations [e.g., in 21CFR part 11]. However, the software package(s) used for statistical analyses should be fully documented in the submission, including version and build identification.”
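One way to capture the version and build identification mentioned above is to record it programmatically when the analysis runs. The following is a minimal sketch using base R only; the helper name `software_record` and the choice of packages are illustrative assumptions, not part of the FDA statement or of any submission template.

```r
# Hypothetical helper: tabulate the R version and the versions of the
# packages used in an analysis, for inclusion in submission documentation.
software_record <- function(packages) {
  pkg_versions <- vapply(
    packages,
    function(p) as.character(utils::packageVersion(p)),
    character(1),
    USE.NAMES = FALSE
  )
  data.frame(
    component = c("R", packages),
    version = c(as.character(getRversion()), pkg_versions),
    stringsAsFactors = FALSE
  )
}

# "stats" and "utils" stand in for the packages a real analysis would use.
record <- software_record(c("stats", "utils"))
print(record)

# Build identification: full version string, platform, and source revision.
cat("Build:   ", R.version.string, "\n")
cat("Platform:", R.version$platform, "\n")
cat("Revision:", R.version$`svn rev`, "\n")
```

In practice the resulting table could be pasted into the ADRG or saved alongside the analysis outputs.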

Motivation

As an organization, we need to ensure compliance and reduce the risk of using R and R packages in regulatory deliverables.

Background

  • A clinical study report is a key deliverable from clinical trials to regulatory agencies (e.g., FDA, CFDA).
  • We aim to fill gaps and streamline the R workflow for clinical trial development:
    • To develop, validate, and deliver analysis results.
    • To submit analysis results to regulatory agencies in eCTD format.
  • Focus on tables, listings, and figures (TLFs) delivered in RTF/Microsoft Word format
    • In the pharmaceutical industry, RTF/Microsoft Word play a central role in preparing clinical study reports.
    • Different organizations can have different table standards.

R Consortium pilot submission

Challenges and assumptions

Challenges

  • How to submit internally developed (proprietary) R packages?
  • How to follow ICH and/or FDA guidances in preparing an eCTD package?
  • How to enhance reproducibility from an FDA reviewer’s perspective?

Assumptions

  • Focus on analysis and reporting given available ADaM datasets.

Deliverables

FDA response

  • “Using R version 4.1.1, FDA was able to run the submitted code and confirm the applicant’s tables and the submitted figure in report-tlf pdf file.”
  • “Using FDA developed code, a statistical analyst was able to independently generate tables using the submitted data.”
  • Original FDA response

Future work

  • Address minor issues reported by FDA reviewers.
  • Pilot R Shiny app submission.
  • Pilot submission with advanced analysis methods (e.g., study design, missing data, Bayesian methods).

Reproducibility

Reproducibility spectrum

Requirement from FDA

“Sponsors should provide the software programs used to create all ADaM datasets and generate tables and figures associated with primary and secondary efficacy analyses. Furthermore, sponsors should submit software programs used to generate additional information included in Section 14 CLINICAL STUDIES of the Prescribing Information (PI) if applicable. The specific software utilized should be specified in the ADRG. The main purpose of requesting the submission of these programs is to understand the process by which the variables for the respective analyses were created and to confirm the analysis algorithms. Sponsors should submit software programs in ASCII text format; however, executable file extensions should not be used.”
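The ASCII requirement quoted above can be checked mechanically before a package is assembled. The following is a minimal sketch in base R; the helper name `is_ascii_file` and the demo files are hypothetical, and the check covers only the byte content of a file, not its extension.

```r
# Hypothetical helper: TRUE if every byte in the file is 7-bit ASCII.
is_ascii_file <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  all(as.integer(bytes) <= 127L)
}

# Demo with two temporary files written byte by byte, so the result
# does not depend on the locale of the current R session.
ascii_file <- tempfile(fileext = ".txt")
other_file <- tempfile(fileext = ".txt")
writeBin(charToRaw("x <- 1 + 1\n"), ascii_file)
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), other_file)  # contains byte 0xE9

ascii_ok <- is_ascii_file(ascii_file)
other_ok <- is_ascii_file(other_file)
print(c(ascii_file = ascii_ok, other_file = other_ok))
```

A check like this could be run over every program file in the submission folder as part of a dry run.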

Recommendations

Although FDA does not expect submitted R code to be executable, sponsors should still enhance reproducibility.

  • Fixed R version: e.g., R 4.1.0
  • Fixed R package snapshot date: e.g., 2021-08-31
  • Flexibility of input and output path:
    • Define path as parameter.
  • eCTD deliverables dry-run in Windows environment
  • Provide steps to follow in ADRG appendix
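The first three recommendations can be sketched in a few lines of base R. Everything concrete here is an assumption made for illustration: the snapshot URL is a placeholder, the toy `adsl.csv` file and its `TRT01P` and `AGE` columns stand in for real ADaM data, and the function names are hypothetical.

```r
# All concrete values below are placeholders for illustration.
expected_r_version <- "4.1.0"
snapshot_date <- "2021-08-31"
# Hypothetical snapshot repository URL; substitute your organization's mirror.
snapshot_repo <- paste0("https://example.org/cran-snapshot/", snapshot_date)

# Report whether the running R version matches the fixed version.
r_version_matches <- function(expected) {
  actual <- as.character(getRversion())
  if (!identical(actual, expected)) {
    message("Running R ", actual, ", but the analysis was fixed at R ", expected)
    return(FALSE)
  }
  TRUE
}

# Paths are parameters, so a reviewer can point the same code at any folder.
run_analysis <- function(path_input, path_output) {
  adsl <- read.csv(file.path(path_input, "adsl.csv"))
  age_by_arm <- aggregate(AGE ~ TRT01P, data = adsl, FUN = mean)
  dir.create(path_output, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(path_output, "age-by-arm.csv")
  write.csv(age_by_arm, out_file, row.names = FALSE)
  out_file
}

# Toy demonstration in a temporary folder.
path_input <- file.path(tempdir(), "adam")
path_output <- file.path(tempdir(), "output")
dir.create(path_input, showWarnings = FALSE)
write.csv(
  data.frame(
    TRT01P = c("Placebo", "Placebo", "Drug", "Drug"),
    AGE = c(60, 70, 50, 64)
  ),
  file.path(path_input, "adsl.csv"),
  row.names = FALSE
)

old_options <- options(repos = c(CRAN = snapshot_repo))  # pin the package source
version_ok <- r_version_matches(expected_r_version)
result_file <- run_analysis(path_input, path_output)
options(old_options)  # restore the previous repository setting
```

Keeping the version, snapshot date, and paths at the top of the script makes them easy to list as reviewer steps in the ADRG appendix.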

Implementation

Philosophy

We share the philosophy described in Section 1.1 of the R Packages book, quoted here:

  • “Anything that can be automated, should be automated.”
  • “Do as little as possible by hand. Do as much as possible with functions.”
  • “The goal is to spend your time thinking about what you want to do rather than thinking about the minutiae of package structure.”

Tools for reporting and submission

Tools:

  • r2rtf: create production-ready tables and figures in RTF format.
  • pkglite: represent and exchange R package source code as text files.
  • cleanslate (under internal validation): create portable R environments.

Bookdown: https://r4csr.org/

r2rtf: design

r2rtf is designed to:

  • Generate highly customized tables
  • Limit package dependency
  • Target regulatory deliverable
  • Support pipes (%>%)

r2rtf: minimal example

r2rtf is designed to be pipe-friendly (%>%)

library(magrittr) # Provides the %>% pipe
library(r2rtf)

head(iris) %>%
  rtf_body() %>%           # Step 1 Add table attributes
  rtf_encode() %>%         # Step 2 Convert attributes to RTF encoding
  write_rtf("minimal.rtf") # Step 3 Write to a .rtf file
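A slightly fuller pipeline adds a title, column headers, and a footnote, which is closer to what a production table needs. This is a sketch rather than a recommended layout: the table text is made up, the output goes to a temporary folder, and base R's native pipe (`|>`, R 4.1 or later) is used so that r2rtf is the only package required.

```r
library(r2rtf)

# Write to a temporary folder so the example leaves no files behind.
out_file <- file.path(tempdir(), "iris-example.rtf")

head(iris) |>
  # Title shown above the table (illustrative text).
  rtf_title("Illustrative Table: First Rows of the iris Data") |>
  # One header cell per column, separated by "|".
  rtf_colheader(
    "Sepal Length | Sepal Width | Petal Length | Petal Width | Species"
  ) |>
  # Center the four numeric columns and left-align the species column.
  rtf_body(text_justification = c("c", "c", "c", "c", "l")) |>
  # Footnote shown below the table (illustrative text).
  rtf_footnote("Illustrative footnote; not from a clinical study.") |>
  rtf_encode() |>
  write_rtf(out_file)
```

The resulting file can be opened in Microsoft Word to inspect the layout.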

r2rtf: function illustration

pkglite: compact package representations

pkglite reimagines the way to represent R packages.

  • A tool for packing and restoring R packages as plaintext assets that are easy to store, transfer, and review
  • A grammar for specifying the file packing scope that is functional, precise, and extendable
  • A standard for exchanging the packed asset that is unambiguous, human-friendly, and machine-readable
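
library("pkglite")

# Pack one package: the eCTD file specification plus files under inst/
"/path/to/pkg/" %>%
  collate(file_ectd(), file_auto("inst/")) %>%
  pack()

# Pack several packages into a single text file
pack(
  "/path/to/pkg1/" %>% collate(file_ectd()),
  "/path/to/pkg2/" %>% collate(file_ectd()),
  output = "/path/to/pkglite.txt"
)

# Restore the packages from the text file and install them
"/path/to/pkglite.txt" %>% unpack(output = "/path/to/output/", install = TRUE)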

Website: https://merck.github.io/pkglite/

cleanslate: portable R environments

  • Create a project folder with specific context (.Rproj, .Rprofile, .Renviron)
  • Install a specific version of R into the project folder
  • Install a specific version of Rtools into the project folder
  • (without administrator privileges)

library("cleanslate")

"portable-project/" %>%
  use_project(repo = "https://url/snapshot/2021-11-20/") %>%
  use_rprofile() %>%
  use_renviron() %>%
  use_r_version(version = "4.1.1") %>%
  use_rtools(version = "rtools40")

Summary of workflow as user stories

Within a regulatory R environment:

  • As a statistician, I use tidyverse, r2rtf, and internal tools to define mock-up tables, listings, and figures (TLFs) for the statistical analysis of a clinical trial.

    • More than 100 TLFs for efficacy and safety of a drug or vaccine.
  • As a programmer, I use tidyverse, r2rtf, and internal tools to develop and/or validate analysis results based on mock-up TLFs.

  • As a statistician/programmer, I use pkglite and internal tools to prepare proprietary R packages and analysis R scripts for the eCTD submission package.

  • As an internal/external reviewer, I use cleanslate to reconstruct a portable environment (if required) to reproduce analysis results.

More details: https://r4csr.org/

Folder structure

We recommend using the R package structure to organize standard tools, analysis projects, and Shiny apps.

  • Consistency: everyone works on the same folder structure.
  • Reproducibility: analysis can be executed and reproduced by different team members months/years later.
  • Automation: automatically check the integration of a project.
  • Compliance: reduce compliance issues.

More details: https://r4csr.org/project-folder.html

Example: https://github.com/elong0527/esubdemo
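As a rough illustration of what organizing an analysis project as an R package can look like, the sketch below creates a bare skeleton in a temporary folder using base R. The folder names follow common R package conventions and are assumptions here; the project name `demoanalysis` and the helper `create_project_skeleton` are hypothetical. See the links above for the layout actually recommended.

```r
# Hypothetical helper: create a minimal package-style project skeleton.
create_project_skeleton <- function(root, name) {
  project <- file.path(root, name)
  # Conventional package folders: functions, analysis documents, tests, extras.
  for (folder in c("R", "vignettes", "tests", "inst")) {
    dir.create(file.path(project, folder), recursive = TRUE, showWarnings = FALSE)
  }
  # A minimal DESCRIPTION file holding the project metadata.
  writeLines(
    c(
      paste("Package:", name),
      "Title: Illustrative Analysis Project",
      "Version: 0.0.1"
    ),
    file.path(project, "DESCRIPTION")
  )
  project
}

project <- create_project_skeleton(tempdir(), "demoanalysis")
print(list.files(project))
```

Because every project shares the same layout, checks and documentation can be automated with the same tooling across studies.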

Project management

  • Setting up for success
    • Work as a team
    • Design clean code architecture
    • Set capability boundaries
    • Contribute to the community
  • The Software Development Life Cycle
    • Planning: define the scope of a project
    • Development: implement target deliverables
    • Validation: verify target deliverables
    • Operation: deliver work to stakeholders

More details: https://r4csr.org/project-management.html

Cross-industry collaborations

Future directions

  • Enhance compliance, reproducibility, traceability, and automation
    • Automation of analysis, documentation, review, and testing
    • Linkage among data, TLFs, and final reports
    • R package qualification
  • Enable advanced study design and statistical methods
  • Introduce interactive visualization and reporting (with/without backend server)

Thank you
