# 3  Analysis population

Following ICH E3 guidance, we need to summarize the number of participants included in each efficacy analysis in Section 11.1, Data Sets Analysed.

```r
library(haven) # Read SAS data
library(dplyr) # Manipulate data
library(tidyr) # Manipulate data
library(r2rtf) # Reporting in RTF format
```

In this chapter, we illustrate how to create a summary table for the analysis population for a study.

The first step is to read relevant datasets into R. For the analysis population table, all the required information is saved in the ADSL dataset. We can use the `haven` package to read the dataset.

```r
adsl <- read_sas("data-adam/adsl.sas7bdat")
```

We illustrate how to prepare a report dataset for a simplified analysis population table using the variables below:

• USUBJID: unique subject identifier
• ITTFL: intent-to-treat population flag
• EFFFL: efficacy population flag
• SAFFL: safety population flag

```r
adsl %>%
  select(USUBJID, ITTFL, EFFFL, SAFFL) %>%
  head(4)
#> # A tibble: 4 × 4
#>   USUBJID     ITTFL EFFFL SAFFL
#>   <chr>       <chr> <chr> <chr>
#> 1 01-701-1015 Y     Y     Y
#> 2 01-701-1023 Y     Y     Y
#> 3 01-701-1028 Y     Y     Y
#> 4 01-701-1033 Y     Y     Y
```
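
Before tabulating, it can be useful to confirm that each flag only takes the values `Y` and `N`. The sketch below does this on a small hypothetical data frame (`toy_adsl`, with made-up subject IDs and flag values) rather than the real ADSL, so it runs without the SAS file.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for ADSL; IDs and flag values are made up
toy_adsl <- tibble(
  USUBJID = c("S-001", "S-002", "S-003", "S-004"),
  ITTFL   = c("Y", "Y", "Y", "Y"),
  EFFFL   = c("Y", "N", "Y", "Y"),
  SAFFL   = c("Y", "Y", "Y", "N")
)

# One row per flag/value combination with its frequency
flag_freq <- toy_adsl %>%
  pivot_longer(
    c(ITTFL, EFFFL, SAFFL),
    names_to = "flag",
    values_to = "value"
  ) %>%
  count(flag, value)

# TRUE when every observed flag value is either "Y" or "N"
flags_ok <- all(flag_freq$value %in% c("Y", "N"))
flags_ok
```

The same pipeline could be pointed at `adsl` itself; an unexpected value such as a missing flag would then show up as an extra row in `flag_freq`.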

## 3.1 Helper functions

Before we write the analysis code, let’s discuss the possibility of reusing R code by writing helper functions.

As discussed in R for Data Science, “You should consider writing a function whenever you’ve copied and pasted a block of code more than twice”.

In Chapter 2, a few steps are repeated:

• Format the percentages using the `formatC()` function.
• Calculate the numbers and percentages by treatment arm.

We create two ad-hoc functions and use them to create the tables in the rest of this book.

To format numbers and percentages, we create a function called `fmt_num()`. It is a very simple function wrapping `formatC()`.

```r
fmt_num <- function(x, digits, width = digits + 4) {
  formatC(
    x,
    digits = digits,
    format = "f",
    width = width
  )
}
```

The main reason to create the `fmt_num()` function is to enhance the readability of the analysis code.

For example, compare the `formatC()` call used in Chapter 2 to format percentages with the equivalent `fmt_num()` call.

```r
formatC(n / n() * 100,
  digits = 1, format = "f", width = 5
)
```

```r
fmt_num(n / n() * 100, digits = 1)
```
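
The two calls above use `n()`, which is only available inside dplyr verbs. To check the padding behavior on its own, here is a minimal sketch that applies `fmt_num()` to a few literal values (the numbers are chosen for illustration only).

```r
# Same definition as above, repeated so the sketch runs on its own
fmt_num <- function(x, digits, width = digits + 4) {
  formatC(x, digits = digits, format = "f", width = width)
}

pct  <- fmt_num(8.139, digits = 1) # default width is 1 + 4 = 5
cnt  <- fmt_num(79, digits = 0)    # default width is 0 + 4 = 4
full <- fmt_num(100, digits = 1)   # "100.0" exactly fills the width of 5

# Combine a count and a percentage into one "n (%)" cell
paste0(cnt, " (", pct, ")")
```

Because the default width grows with `digits`, values formatted with the same `digits` are padded to the same width, which keeps the "n (%)" cells aligned.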

To calculate the numbers and percentages of participants by group, we provide a simple (but not robust) wrapper function, `count_by()`, using the dplyr and tidyr packages.

The function can be enhanced in multiple ways, but here we only focus on simplicity and readability. More details about writing R functions can be found in the STAT 545 course.

```r
count_by <- function(data, # Input data set
                     grp, # Group variable
                     var, # Analysis variable
                     var_label = var, # Analysis variable label
                     id = "USUBJID") { # Subject ID variable
  data <- data %>% rename(grp = !!grp, var = !!var, id = !!id)

  left_join(
    count(data, grp, var),
    count(data, grp, name = "tot"),
    by = "grp"
  ) %>%
    mutate(
      pct = fmt_num(100 * n / tot, digits = 1),
      n = fmt_num(n, digits = 0),
      npct = paste0(n, " (", pct, ")")
    ) %>%
    pivot_wider(
      id_cols = var,
      names_from = grp,
      values_from = c(n, pct, npct),
      values_fill = list(n = "0", pct = fmt_num(0, digits = 0))
    ) %>%
    mutate(var_label = var_label)
}
```

By using the `count_by()` function, we can simplify the analysis code as below.

```r
count_by(adsl, "TRT01PN", "EFFFL") %>%
  select(-ends_with(c("_54", "_81")))
#> # A tibble: 2 × 5
#>   var   n_0    pct_0   npct_0         var_label
#>   <chr> <chr>  <chr>   <chr>          <chr>
#> 1 N     "   7" "  8.1" "   7 (  8.1)" EFFFL
#> 2 Y     "  79" " 91.9" "  79 ( 91.9)" EFFFL
```
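
To see the full shape of what `count_by()` returns, including the columns for the other arms that `select()` dropped above, here is a sketch on a hypothetical six-row data frame (`toy`, with made-up IDs). The two helper definitions are repeated from this section so the block runs on its own. It also illustrates one way in which the function is not robust: `values_fill` covers `n` and `pct` but not `npct`, so an arm with no participants in a category ends up with `NA` in its `npct_` column.

```r
library(dplyr)
library(tidyr)

# Definitions repeated from this section so the sketch runs on its own
fmt_num <- function(x, digits, width = digits + 4) {
  formatC(x, digits = digits, format = "f", width = width)
}

count_by <- function(data, grp, var, var_label = var, id = "USUBJID") {
  data <- data %>% rename(grp = !!grp, var = !!var, id = !!id)

  left_join(
    count(data, grp, var),
    count(data, grp, name = "tot"),
    by = "grp"
  ) %>%
    mutate(
      pct = fmt_num(100 * n / tot, digits = 1),
      n = fmt_num(n, digits = 0),
      npct = paste0(n, " (", pct, ")")
    ) %>%
    pivot_wider(
      id_cols = var,
      names_from = grp,
      values_from = c(n, pct, npct),
      values_fill = list(n = "0", pct = fmt_num(0, digits = 0))
    ) %>%
    mutate(var_label = var_label)
}

# Hypothetical data: two arms (0 and 54), three participants each;
# nobody in arm 54 has EFFFL == "N"
toy <- tibble(
  USUBJID = sprintf("S-%03d", 1:6),
  TRT01PN = c(0, 0, 0, 54, 54, 54),
  EFFFL   = c("Y", "Y", "N", "Y", "Y", "Y")
)

res <- count_by(toy, "TRT01PN", "EFFFL")

names(res)                        # one n_, pct_, npct_ column per arm
res$npct_54[res$var == "N"]       # NA: no fill value is defined for npct
```

In the analysis population table this gap does not matter, because only the `Y` rows are kept and every arm has participants in each population.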

## 3.2 Analysis code

With the helper function `count_by()`, we can easily prepare a report dataset as follows.

```r
# Derive a randomization flag
adsl <- adsl %>% mutate(RANDFL = "Y")

pop <- count_by(adsl, "TRT01PN", "RANDFL",
  var_label = "Participants in Population"
) %>%
  select(var_label, starts_with("n_"))
```
```r
pop1 <- bind_rows(
  count_by(adsl, "TRT01PN", "ITTFL",
    var_label = "Participants included in ITT population"
  ),
  count_by(adsl, "TRT01PN", "EFFFL",
    var_label = "Participants included in efficacy population"
  ),
  count_by(adsl, "TRT01PN", "SAFFL",
    var_label = "Participants included in safety population"
  )
) %>%
  filter(var == "Y") %>%
  select(var_label, starts_with("npct_"))
```

Now we combine the individual rows into one table for reporting purposes. `tbl_pop` is used as input for r2rtf to create the final report.

```r
names(pop) <- gsub("n_", "npct_", names(pop))
tbl_pop <- bind_rows(pop, pop1)

tbl_pop %>% select(var_label, npct_0)
#> # A tibble: 4 × 2
#>   var_label                                    npct_0
#>   <chr>                                        <chr>
#> 1 Participants in Population                   "  86"
#> 2 Participants included in ITT population      "  86 (100.0)"
#> 3 Participants included in efficacy population "  79 ( 91.9)"
#> 4 Participants included in safety population   "  86 (100.0)"
```
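
The renaming step matters because `bind_rows()` matches columns by name. The sketch below shows this on hypothetical one-arm versions of `pop` and `pop1` (values copied from the output above). It also anchors the pattern with `^`, which is an assumption rather than what the chapter uses, so that only names starting with `n_` are renamed.

```r
library(dplyr)

# Hypothetical one-arm versions of `pop` and `pop1`
pop <- tibble(
  var_label = "Participants in Population",
  n_0 = "  86"
)
pop1 <- tibble(
  var_label = "Participants included in efficacy population",
  npct_0 = "  79 ( 91.9)"
)

# Without renaming, n_0 and npct_0 stay separate columns padded with NA
unaligned <- bind_rows(pop, pop1)

# Assumption: "^n_" instead of "n_" limits the match to the start of a name
names(pop) <- gsub("^n_", "npct_", names(pop))
aligned <- bind_rows(pop, pop1)

names(unaligned)
names(aligned)
```

With the column names seen in this chapter, `"n_"` and `"^n_"` give the same result; the anchored form only makes a difference if another column name happens to contain `n_` somewhere in the middle.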

We define the format of the output using the code below.

```r
rel_width <- c(2, rep(1, 3))
colheader <- " | Placebo | Xanomeline Low Dose | Xanomeline High Dose"
tbl_pop %>%
  # Table title
  rtf_title(
    "Summary of Analysis Sets",
    "(All Participants Randomized)"
  ) %>%
  # First row of column header
  rtf_colheader(colheader,
    col_rel_width = rel_width
  ) %>%
  # Second row of column header
  rtf_colheader(" | n (%) | n (%) | n (%)",
    border_top = "",
    col_rel_width = rel_width
  ) %>%
  # Table body
  rtf_body(
    col_rel_width = rel_width,
    text_justification = c("l", rep("c", 3))
  ) %>%
  # Encoding RTF syntax
  rtf_encode() %>%
  # Save to a file
  write_rtf("tlf/tbl_pop.rtf")
```

The procedure to generate an analysis population table can be summarized as follows:

• Step 1: Read data (i.e., `adsl`) into R.
• Step 2: Bind the counts/percentages of the ITT population, the efficacy population, and the safety population by row using the `count_by()` function.
• Step 3: Format the output from Step 2 using r2rtf.