4 Baseline characteristics

Following ICH E3 guidance, we need to summarize critical demographic and baseline characteristics of the participants in Section 11.2, Demographic and Other Baseline Characteristics.

In this chapter, we illustrate how to create a simplified baseline characteristics table for a study.

There are many R packages that can efficiently summarize baseline information. The table1 R package is one of them.

library(table1)
library(r2rtf)
library(haven)
library(dplyr)
library(tidyr)
library(stringr)
library(tools)

As in previous chapters, we first read the adsl dataset that contains all the required information for the baseline characteristics table.

adsl <- read_sas("data-adam/adsl.sas7bdat")

For simplicity, we only analyze SEX, AGE, and RACE in this example using the table1 R package. More details of the table1 R package can be found in the package vignettes.

The table1 R package directly creates an HTML report.

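# Recode SEX with descriptive labels and convert RACE to title case
ana <- adsl %>%
  mutate(
    SEX = factor(SEX, c("F", "M"), c("Female", "Male")),
    RACE = toTitleCase(tolower(RACE))
  )

# Summarize SEX, AGE, and RACE by planned treatment (TRT01P)
tbl <- table1(~ SEX + AGE + RACE | TRT01P, data = ana)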
tbl

	Placebo (N=86)	Xanomeline High Dose (N=84)	Xanomeline Low Dose (N=84)	Overall (N=254)
SEX
Female	53 (61.6%)	40 (47.6%)	50 (59.5%)	143 (56.3%)
Male	33 (38.4%)	44 (52.4%)	34 (40.5%)	111 (43.7%)
Age
Mean (SD)	75.2 (8.59)	74.4 (7.89)	75.7 (8.29)	75.1 (8.25)
Median [Min, Max]	76.0 [52.0, 89.0]	76.0 [56.0, 88.0]	77.5 [51.0, 88.0]	77.0 [51.0, 89.0]
RACE
Black or African American	8 (9.3%)	9 (10.7%)	6 (7.1%)	23 (9.1%)
White	78 (90.7%)	74 (88.1%)	78 (92.9%)	230 (90.6%)
American Indian or Alaska Native	0 (0%)	1 (1.2%)	0 (0%)	1 (0.4%)
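
The row headers above come from the variable names or from labels stored on the variables. As a minimal sketch, assuming a small made-up data frame (`toy`) in place of adsl and hypothetical label text, the table1 `label()` helper can be used to set the row headers explicitly:

```r
library(table1)

# Hypothetical toy data standing in for adsl (values are made up)
toy <- data.frame(
  TRT01P = rep(c("Placebo", "Active"), each = 4),
  SEX = factor(
    c("F", "M", "F", "F", "M", "M", "F", "M"),
    c("F", "M"), c("Female", "Male")
  ),
  AGE = c(70, 65, 81, 77, 59, 72, 68, 74)
)

# Attach display labels; table1() uses them as row headers
label(toy$SEX) <- "Sex"
label(toy$AGE) <- "Age (years)"

tbl_toy <- table1(~ SEX + AGE | TRT01P, data = toy)

# Convert to a data frame to inspect the row headers as text
df_toy <- as.data.frame(tbl_toy)
df_toy[[1]]
```

Setting labels on the analysis data keeps the variable names in the formula unchanged while giving the report reader-friendly headers.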

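The code below converts the output into a data frame that contains only ASCII characters, as recommended by regulatory agencies. It does so by replacing each non-breaking space (Unicode code point 160) with a regular space. tbl_base is used as input for r2rtf to create the final report.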

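# Convert the table1 object to a tibble and replace non-breaking
# spaces (code point 160) in every cell with regular spaces
tbl_base <- tbl %>%
  as.data.frame() %>%
  as_tibble() %>%
  mutate(across(
    everything(),
    ~ str_replace_all(.x, intToUtf8(160), " ")
  ))

# Apply the same replacement to the column names
names(tbl_base) <- str_replace_all(names(tbl_base), intToUtf8(160), " ")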
tbl_base
#> # A tibble: 11 × 5
#>   ` `        Placebo      `Xanomeline High Dose` `Xanomeline Low Dose` Overall  
#>   <chr>      <chr>        <chr>                  <chr>                 <chr>    
#> 1 ""         "(N=86)"     "(N=84)"               "(N=84)"              "(N=254)"
#> 2 "SEX"      ""           ""                     ""                    ""       
#> 3 "  Female" "53 (61.6%)" "40 (47.6%)"           "50 (59.5%)"          "143 (56…
#> 4 "  Male"   "33 (38.4%)" "44 (52.4%)"           "34 (40.5%)"          "111 (43…
#> # ℹ 7 more rows
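
To confirm that the replacement leaves only ASCII characters, a quick check can help. The sketch below uses base R only; `is_ascii()` is a hypothetical helper written for this illustration, and the `cells` vector is made-up input that mimics the indented cells produced by table1.

```r
# Hypothetical helper: TRUE if every string in x uses only ASCII code points
is_ascii <- function(x) {
  all(vapply(x, function(s) all(utf8ToInt(s) < 128L), logical(1)))
}

nbsp <- intToUtf8(160) # non-breaking space

# Made-up cells; the second one is indented with non-breaking spaces
cells <- c("SEX", paste0(nbsp, nbsp, "Female"), "53 (61.6%)")

before <- is_ascii(cells)

# Replace non-breaking spaces with regular spaces, as in the chapter
clean <- gsub(nbsp, " ", cells, fixed = TRUE)

after <- is_ascii(clean)
c(before = before, after = after)
```

Running a check like this on every column of tbl_base before calling r2rtf is one way to catch other non-ASCII characters that the replacement above does not cover.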

Next, we define the format of the output, highlighting items that were not discussed in previous chapters.

text_indent_first and text_indent_left control the indentation of text. They are helpful when you need to control how a long phrase wraps within a cell; “American Indian or Alaska Native” in the table provides an example.

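# First header row: column names (treatment groups)
colheader1 <- paste(names(tbl_base), collapse = "|")
# Second header row: the "(N=xx)" counts stored in the first data row
colheader2 <- paste(tbl_base[1, ], collapse = "|")
# Relative column widths: a wider first column for the row labels
rel_width <- c(2.5, rep(1, 4))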

tbl_base[-1, ] %>%
  rtf_title(
    "Baseline Characteristics of Participants",
    "(All Participants Randomized)"
  ) %>%
  rtf_colheader(colheader1,
    col_rel_width = rel_width
  ) %>%
  rtf_colheader(colheader2,
    border_top = "",
    col_rel_width = rel_width
  ) %>%
  rtf_body(
    col_rel_width = rel_width,
    text_justification = c("l", rep("c", 4)),
    text_indent_first = -240,
    text_indent_left = 180
  ) %>%
  rtf_encode() %>%
  write_rtf("tlf/tlf_base.rtf")
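
To experiment with the two indent arguments in isolation, a smaller table can be useful. The following is a minimal sketch, assuming a made-up two-column data frame (`toy`), a temporary output file, and the native pipe (R 4.1 or later). A negative first-line indent combined with a positive left indent should give a hanging indent, so that wrapped lines of a long label start further right than its first line.

```r
library(r2rtf)

# Made-up rows; the long label is expected to wrap in a narrow column
toy <- data.frame(
  label = c("RACE", "  American Indian or Alaska Native", "  White"),
  value = c("", "1 (1.2%)", "74 (88.1%)")
)

# Write to a temporary file so the sketch does not touch the tlf/ folder
out <- file.path(tempdir(), "indent_demo.rtf")

toy |>
  rtf_colheader("Characteristic|n (%)", col_rel_width = c(1, 1)) |>
  rtf_body(
    col_rel_width = c(1, 1),
    text_justification = c("l", "c"),
    text_indent_first = -240,
    text_indent_left = 180
  ) |>
  rtf_encode() |>
  write_rtf(out)
```

Opening the file and then changing the two values one at a time is a simple way to see how each argument moves the text.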

In conclusion, the procedure to generate a demographic and baseline characteristics table can be summarized as follows:

Step 1: Read the data set.
Step 2: Use table1::table1() to get the baseline characteristics table.
Step 3: Transfer the output from Step 2 into a data frame that only contains ASCII characters.
Step 4: Define the format of the RTF table by using the R package r2rtf.
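
The steps above can be collected into a single function. This is only a sketch under stated assumptions: `baseline_rtf()` is a hypothetical wrapper, not part of table1 or r2rtf; the toy data stands in for the data set read in Step 1; base R `gsub()` replaces the stringr call; and the native pipe requires R 4.1 or later.

```r
library(table1)
library(r2rtf)

# Hypothetical wrapper around Steps 2 to 4
baseline_rtf <- function(data, formula, file) {
  # Step 2: build the summary table
  tbl <- table1(formula, data = data)

  # Step 3: convert to a data frame and replace non-breaking spaces
  nbsp <- intToUtf8(160)
  df <- as.data.frame(tbl)
  df[] <- lapply(df, function(x) gsub(nbsp, " ", x, fixed = TRUE))
  names(df) <- gsub(nbsp, " ", names(df), fixed = TRUE)

  # Step 4: define the RTF layout and write the file
  n_trt <- ncol(df) - 1
  rel_width <- c(2.5, rep(1, n_trt))
  df[-1, ] |>
    rtf_colheader(paste(names(df), collapse = "|"),
      col_rel_width = rel_width
    ) |>
    rtf_colheader(paste(df[1, ], collapse = "|"),
      border_top = "",
      col_rel_width = rel_width
    ) |>
    rtf_body(
      col_rel_width = rel_width,
      text_justification = c("l", rep("c", n_trt))
    ) |>
    rtf_encode() |>
    write_rtf(file)

  invisible(df)
}

# Step 1 stand-in: made-up data instead of reading adsl
toy <- data.frame(
  TRT01P = rep(c("Placebo", "Active"), each = 4),
  AGE = c(70, 65, 81, 77, 59, 72, 68, 74)
)
out <- tempfile(fileext = ".rtf")
res <- baseline_rtf(toy, ~ AGE | TRT01P, out)
```

Returning the cleaned data frame invisibly makes it possible to inspect the table content without reopening the RTF file.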