Create ADSL

Introduction

This guide shows how four pharmaverse packages, together with a few tidyverse packages, can be used to create an ADaM dataset such as ADSL end to end, using the CDISC pilot SDTM data as input.

The four packages, with a brief description of their purpose, are:

  • {metacore}: provides a harmonized metadata/specifications object.
  • {metatools}: uses the provided metadata to build/enhance and check the dataset.
  • {admiral}: provides the ADaM derivations.
  • {xportr}: delivers the SAS transport file (XPT) and eSub checks.

It is important to understand {metacore} objects, which are described in detail on the {metacore} package site, because they are fundamental to using {metatools} and {xportr}. Each company may need to build a specification reader to create these objects from its own standard specification templates.

Load Data and Required pharmaverse Packages

The first step is to load our pharmaverse packages and input data.

options(repos = c(
  pharmaverse = 'https://pharmaverse.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))

library(metacore)
library(metatools)
library(admiral.test)
library(admiral)
library(xportr)
library(dplyr)
library(tidyr)
library(lubridate)
library(stringr)

# Read in input SDTM data 
data("admiral_dm")
data("admiral_ex")

Next we need to load the specification file in the form of a {metacore} object.

# Read in metacore object 
load(metacore_example("pilot_ADaM.rda"))
metacore <- metacore %>% 
   select_dataset("ADSL")

Here is an example of the variable-level metadata held in a {metacore} object:

metacore$ds_vars
## # A tibble: 49 × 7
##    dataset variable key_seq order keep  core  supp_flag
##    <chr>   <chr>      <int> <int> <lgl> <chr> <lgl>    
##  1 ADSL    STUDYID       NA     1 FALSE <NA>  NA       
##  2 ADSL    USUBJID        1     2 FALSE <NA>  NA       
##  3 ADSL    SUBJID        NA     3 FALSE <NA>  NA       
##  4 ADSL    SITEID        NA     4 FALSE <NA>  NA       
##  5 ADSL    SITEGR1       NA     5 FALSE <NA>  NA       
##  6 ADSL    ARM           NA     6 FALSE <NA>  NA       
##  7 ADSL    TRT01P        NA     7 FALSE <NA>  NA       
##  8 ADSL    TRT01PN       NA     8 FALSE <NA>  NA       
##  9 ADSL    TRT01A        NA     9 FALSE <NA>  NA       
## 10 ADSL    TRT01AN       NA    10 FALSE <NA>  NA       
## # … with 39 more rows
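
To make the layout of this table more concrete, the sketch below rebuilds a few of the rows shown above as a plain data frame and reads off the sort key and the expected column order. The object `ds_vars_demo` is a hypothetical, hand-built stand-in and not something returned by {metacore}.

```r
# Hypothetical stand-in for the first rows of metacore$ds_vars (built by hand)
ds_vars_demo <- data.frame(
  dataset  = "ADSL",
  variable = c("STUDYID", "USUBJID", "SUBJID", "SITEID"),
  key_seq  = c(NA, 1L, NA, NA),
  order    = 1:4,
  stringsAsFactors = FALSE
)

# Variables that act as sort keys have a non-missing key_seq
key_vars <- ds_vars_demo$variable[!is.na(ds_vars_demo$key_seq)]

# Column order expected by the specification
spec_order <- ds_vars_demo$variable[order(ds_vars_demo$order)]
```

In this toy table only USUBJID carries a `key_seq`, so it is the single sort key.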

Start Building Derivations

The first derivation step is to pull through all the columns that come directly from the SDTM datasets. You may already know which datasets you need, but if you don’t, you can call metatools::build_from_derived() with an empty list and the error message will tell you which datasets to supply.

build_from_derived(metacore, list(), predecessor_only = FALSE)
## Error in build_from_derived(metacore, list(), predecessor_only = FALSE): Not all datasets provided. Please pass the following dataset(s):
## DM

In this case all the columns come from DM, so that is the only dataset we pass into metatools::build_from_derived(). The resulting dataset combines all the columns, and any column whose name differs between SDTM and ADaM is renamed.

adsl_preds <- build_from_derived(metacore, 
                                 ds_list = list("dm" = admiral_dm), 
                                 predecessor_only = FALSE, keep = TRUE)
head(adsl_preds, n=10)
## # A tibble: 10 × 14
##    STUDYID      USUBJID SUBJID SITEID ARM     AGE AGEU  RACE  SEX   ETHNIC DTHFL
##    <chr>        <chr>   <chr>  <chr>  <chr> <dbl> <chr> <chr> <chr> <chr>  <chr>
##  1 CDISCPILOT01 01-701… 1015   701    Plac…    63 YEARS WHITE F     HISPA… ""   
##  2 CDISCPILOT01 01-701… 1023   701    Plac…    64 YEARS WHITE M     HISPA… ""   
##  3 CDISCPILOT01 01-701… 1028   701    Xano…    71 YEARS WHITE M     NOT H… ""   
##  4 CDISCPILOT01 01-701… 1033   701    Xano…    74 YEARS WHITE M     NOT H… ""   
##  5 CDISCPILOT01 01-701… 1034   701    Xano…    77 YEARS WHITE F     NOT H… ""   
##  6 CDISCPILOT01 01-701… 1047   701    Plac…    85 YEARS WHITE F     NOT H… ""   
##  7 CDISCPILOT01 01-701… 1057   701    Scre…    59 YEARS WHITE F     HISPA… ""   
##  8 CDISCPILOT01 01-701… 1097   701    Xano…    68 YEARS WHITE M     NOT H… ""   
##  9 CDISCPILOT01 01-701… 1111   701    Xano…    81 YEARS WHITE F     NOT H… ""   
## 10 CDISCPILOT01 01-701… 1115   701    Xano…    84 YEARS WHITE M     NOT H… ""   
## # … with 3 more variables: RFSTDTC <chr>, RFENDTC <chr>, TRT01P <chr>
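
Conceptually, this step selects source columns and gives them their ADaM names. The base R sketch below mimics that idea on a toy DM data frame. The data and the `predecessors` mapping are made up for illustration (including the assumption that TRT01P is taken from ARM), and this is not how {metatools} is implemented.

```r
# Toy DM data (made up for illustration)
dm_demo <- data.frame(
  USUBJID = c("01-001", "01-002"),
  ARM     = c("Placebo", "Xanomeline High Dose"),
  AGE     = c(63, 71),
  COUNTRY = c("USA", "USA"),
  stringsAsFactors = FALSE
)

# Hypothetical predecessor mapping: ADaM name = SDTM source column
predecessors <- c(USUBJID = "USUBJID", ARM = "ARM", TRT01P = "ARM", AGE = "AGE")

# Pull each source column and store it under its ADaM name;
# columns without a mapping (here COUNTRY) are not carried over
adsl_demo <- as.data.frame(
  lapply(predecessors, function(src) dm_demo[[src]]),
  stringsAsFactors = FALSE
)
```

The same source column can feed more than one ADaM variable, as ARM does here.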

Now that we have the base dataset, we can start to create some variables, beginning with the subgroups defined by controlled terminology, in this case AGEGR1. The {metacore} object holds all the metadata needed to make ADSL. Part of that metadata is the controlled terminology, which can help automate the creation of subgroups. We can look into the {metacore} object and see the controlled terminology for AGEGR1.

get_control_term(metacore, variable = AGEGR1)
## # A tibble: 3 × 2
##   code  decode
##   <chr> <chr> 
## 1 <65   <65   
## 2 65-80 65-80 
## 3 >80   >80

Because this controlled terminology is written in a fairly standard format, we can automate the creation of AGEGR1. The function metatools::create_cat_var() takes a {metacore} object, a reference variable (in this case AGE, because that is the continuous variable AGEGR1 is created from), and the name of the sub-grouped variable. It takes the controlled terminology of the sub-grouped variable and groups the values of the reference variable accordingly.
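
The grouping itself amounts to binning a continuous variable. A minimal base R sketch of that idea, using the three AGEGR1 categories shown above and made-up ages, could look as follows. It hard-codes the boundaries, whereas {metatools} reads them from the controlled terminology, and numbering the groups by their position in the codelist is an assumption made here for illustration.

```r
age <- c(63, 64, 71, 85, 65, 80)  # made-up ages

# Categories as listed in the controlled terminology
agegr1_levels <- c("<65", "65-80", ">80")

# Assign each age to its category (65 and 80 both fall into "65-80")
agegr1 <- ifelse(age < 65, "<65",
          ifelse(age <= 80, "65-80", ">80"))

# Assumed numeric version: position of the category in the codelist
agegr1n <- match(agegr1, agegr1_levels)
```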

Using a similar philosophy we can create the numeric version of RACE using the controlled terminology stored in the {metacore} object with the metatools::create_var_from_codelist() function.

adsl_ct <- adsl_preds %>% 
   create_cat_var(metacore, ref_var = AGE, 
                  grp_var = AGEGR1, num_grp_var = AGEGR1N) %>% 
   create_var_from_codelist(metacore = metacore, 
                            input_var = RACE, 
                            out_var = RACEN) %>% 
   # Remove screen failures from ARM and TRT01P to match the define and FDA guidance
   mutate(ARM = if_else(ARM == "Screen Failure", NA_character_, ARM),
          TRT01P = if_else(TRT01P == "Screen Failure", NA_character_, TRT01P)
   )

head(adsl_ct, n=10)
## # A tibble: 10 × 17
##    STUDYID      USUBJID SUBJID SITEID ARM     AGE AGEU  RACE  SEX   ETHNIC DTHFL
##    <chr>        <chr>   <chr>  <chr>  <chr> <dbl> <chr> <chr> <chr> <chr>  <chr>
##  1 CDISCPILOT01 01-701… 1015   701    Plac…    63 YEARS WHITE F     HISPA… ""   
##  2 CDISCPILOT01 01-701… 1023   701    Plac…    64 YEARS WHITE M     HISPA… ""   
##  3 CDISCPILOT01 01-701… 1028   701    Xano…    71 YEARS WHITE M     NOT H… ""   
##  4 CDISCPILOT01 01-701… 1033   701    Xano…    74 YEARS WHITE M     NOT H… ""   
##  5 CDISCPILOT01 01-701… 1034   701    Xano…    77 YEARS WHITE F     NOT H… ""   
##  6 CDISCPILOT01 01-701… 1047   701    Plac…    85 YEARS WHITE F     NOT H… ""   
##  7 CDISCPILOT01 01-701… 1057   701    <NA>     59 YEARS WHITE F     HISPA… ""   
##  8 CDISCPILOT01 01-701… 1097   701    Xano…    68 YEARS WHITE M     NOT H… ""   
##  9 CDISCPILOT01 01-701… 1111   701    Xano…    81 YEARS WHITE F     NOT H… ""   
## 10 CDISCPILOT01 01-701… 1115   701    Xano…    84 YEARS WHITE M     NOT H… ""   
## # … with 6 more variables: RFSTDTC <chr>, RFENDTC <chr>, TRT01P <chr>,
## #   AGEGR1 <chr>, AGEGR1N <dbl>, RACEN <dbl>
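
Creating a variable from a codelist is essentially a lookup from one column of the codelist to the other. The sketch below shows the idea in base R. The numeric codes in `race_codelist` are invented for illustration and are not taken from the pilot specification.

```r
# Hypothetical codelist: numeric code paired with the character decode
race_codelist <- data.frame(
  code   = c(1, 2, 3),
  decode = c("WHITE", "BLACK OR AFRICAN AMERICAN",
             "AMERICAN INDIAN OR ALASKA NATIVE"),
  stringsAsFactors = FALSE
)

race <- c("WHITE", "WHITE", "BLACK OR AFRICAN AMERICAN", NA)  # made-up values

# Look up the code for each value; values not in the codelist become NA
racen <- race_codelist$code[match(race, race_codelist$decode)]
```

A value that is missing or absent from the codelist yields `NA`, which is worth checking for before the dataset is finalized.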

Now that we have handled what can easily be done with controlled terminology, it is time to start deriving some variables. In practice you could work directly from the {admiral} template and vignette, but for this end-to-end ADaM vignette we show a few exposure derivations taken from there. We derive the start and end of treatment, the treatment duration, and the safety population flag.

adsl_raw <- adsl_ct %>%
  derive_vars_merged_dtm(
    dataset_add = admiral_ex,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & nchar(EXSTDTC) >= 10,
    new_vars_prefix = "TRTS",
    dtc = EXSTDTC,
    order = vars(TRTSDTM, EXSEQ),
    mode = "first",
    by_vars = vars(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged_dtm(
    dataset_add = admiral_ex,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & nchar(EXENDTC) >= 10,
    new_vars_prefix = "TRTE",
    dtc = EXENDTC,
    time_imputation = "last",
    order = vars(TRTEDTM, EXSEQ),
    mode = "last",
    by_vars = vars(STUDYID, USUBJID)
  ) %>%
   derive_vars_dtm_to_dt(source_vars = vars(TRTSDTM, TRTEDTM)) %>%  #Convert Datetime variables to date 
   derive_var_trtdurd() %>% 
   derive_var_merged_exist_flag(
     dataset_add = admiral_ex,
     by_vars = vars(STUDYID, USUBJID),
     new_var = SAFFL,
     condition = (EXDOSE > 0 | (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO")))
   ) %>% 
   drop_unspec_vars(metacore) # This will drop any columns that aren't specified in the metacore object

head(adsl_raw, n=10)
## # A tibble: 10 × 21
##    STUDYID      USUBJID SUBJID SITEID ARM     AGE AGEU  RACE  SEX   ETHNIC DTHFL
##    <chr>        <chr>   <chr>  <chr>  <chr> <dbl> <chr> <chr> <chr> <chr>  <chr>
##  1 CDISCPILOT01 01-701… 1015   701    Plac…    63 YEARS WHITE F     HISPA… ""   
##  2 CDISCPILOT01 01-701… 1023   701    Plac…    64 YEARS WHITE M     HISPA… ""   
##  3 CDISCPILOT01 01-701… 1028   701    Xano…    71 YEARS WHITE M     NOT H… ""   
##  4 CDISCPILOT01 01-701… 1033   701    Xano…    74 YEARS WHITE M     NOT H… ""   
##  5 CDISCPILOT01 01-701… 1034   701    Xano…    77 YEARS WHITE F     NOT H… ""   
##  6 CDISCPILOT01 01-701… 1047   701    Plac…    85 YEARS WHITE F     NOT H… ""   
##  7 CDISCPILOT01 01-701… 1057   701    <NA>     59 YEARS WHITE F     HISPA… ""   
##  8 CDISCPILOT01 01-701… 1097   701    Xano…    68 YEARS WHITE M     NOT H… ""   
##  9 CDISCPILOT01 01-701… 1111   701    Xano…    81 YEARS WHITE F     NOT H… ""   
## 10 CDISCPILOT01 01-701… 1115   701    Xano…    84 YEARS WHITE M     NOT H… ""   
## # … with 10 more variables: RFSTDTC <chr>, RFENDTC <chr>, TRT01P <chr>,
## #   AGEGR1 <chr>, AGEGR1N <dbl>, RACEN <dbl>, TRTSDT <date>, TRTEDT <date>,
## #   TRTDURD <dbl>, SAFFL <chr>
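
The logic behind these derivations can be sketched in base R on toy data: take the first start date and last end date among the valid exposure records of each subject, compute the duration as end minus start plus one day, and flag subjects with at least one valid dose. The data below are made up, complete dates are assumed, and the sketch leaves out the date-time imputation that {admiral} handles.

```r
# Toy exposure data (made up for illustration)
ex_demo <- data.frame(
  USUBJID = c("01", "01", "02", "03"),
  EXTRT   = c("XANOMELINE", "XANOMELINE", "PLACEBO", "XANOMELINE"),
  EXDOSE  = c(54, 81, 0, 0),
  EXSTDTC = c("2014-01-02", "2014-01-16", "2014-02-01", "2014-03-01"),
  EXENDTC = c("2014-01-15", "2014-01-30", "2014-02-10", "2014-03-05"),
  stringsAsFactors = FALSE
)

# Keep records with an actual dose, or a zero dose of placebo
valid <- ex_demo$EXDOSE > 0 |
  (ex_demo$EXDOSE == 0 & grepl("PLACEBO", ex_demo$EXTRT))
ex_valid <- ex_demo[valid, ]

# Complete ISO 8601 dates sort correctly as text, so min/max work on strings
first_start <- vapply(split(ex_valid$EXSTDTC, ex_valid$USUBJID), min, character(1))
last_end    <- vapply(split(ex_valid$EXENDTC, ex_valid$USUBJID), max, character(1))

# One row per subject; subject "03" has no valid exposure record
adsl_demo <- data.frame(USUBJID = c("01", "02", "03"), stringsAsFactors = FALSE)
adsl_demo$TRTSDT <- as.Date(unname(first_start[adsl_demo$USUBJID]))
adsl_demo$TRTEDT <- as.Date(unname(last_end[adsl_demo$USUBJID]))

# Treatment duration in days: end date minus start date plus one
adsl_demo$TRTDURD <- as.numeric(adsl_demo$TRTEDT - adsl_demo$TRTSDT, units = "days") + 1

# Safety flag: "Y" if the subject has any valid exposure record, otherwise missing
adsl_demo$SAFFL <- ifelse(adsl_demo$USUBJID %in% ex_valid$USUBJID, "Y", NA_character_)
```

Subject "03" only has a zero dose of a non-placebo treatment, so it gets no treatment dates, no duration, and no safety flag.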

Apply Metadata to Create an eSub XPT and Perform Associated Checks

Now that all the variables are defined, we can run some checks before applying the necessary formatting. The first four functions, which perform checks and sorting/ordering, come from {metatools}, whereas the remaining ones, which apply attributes in preparation for the XPT file, come from {xportr}. At the end you could add a call to xportr::xportr_write() to produce the XPT file.

adsl_raw %>% 
   check_variables(metacore) %>% # Check all variables specified are present and no more
   check_ct_data(metacore, na_acceptable = TRUE) %>% # Checks all variables with CT only contain values within the CT
   order_cols(metacore) %>% # Orders the columns according to the spec
   sort_by_key(metacore) %>% # Sorts the rows by the sort keys 
   xportr_type(metacore) %>% # Coerce variable type to match spec
   xportr_length(metacore) %>% # Assigns SAS length from the variable-level metadata
   xportr_label(metacore) %>% # Assigns variable label from metacore specifications 
   xportr_df_label(metacore) # Assigns dataset label from metacore specifications
## # A tibble: 306 × 49
##    STUDYID USUBJID SUBJID SITEID SITEGR1   ARM TRT01P TRT01PN TRT01A TRT01AN
##      <dbl>   <dbl>  <dbl>  <dbl>   <dbl> <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
##  1      NA      NA   1015    701      NA    NA     NA      NA     NA      NA
##  2      NA      NA   1023    701      NA    NA     NA      NA     NA      NA
##  3      NA      NA   1028    701      NA    NA     NA      NA     NA      NA
##  4      NA      NA   1033    701      NA    NA     NA      NA     NA      NA
##  5      NA      NA   1034    701      NA    NA     NA      NA     NA      NA
##  6      NA      NA   1047    701      NA    NA     NA      NA     NA      NA
##  7      NA      NA   1057    701      NA    NA     NA      NA     NA      NA
##  8      NA      NA   1097    701      NA    NA     NA      NA     NA      NA
##  9      NA      NA   1111    701      NA    NA     NA      NA     NA      NA
## 10      NA      NA   1115    701      NA    NA     NA      NA     NA      NA
## # … with 296 more rows, and 39 more variables: TRTSDT <dbl>, TRTEDT <dbl>,
## #   TRTDURD <dbl>, AVGDD <dbl>, CUMDOSE <dbl>, AGE <dbl>, AGEGR1 <dbl>,
## #   AGEGR1N <dbl>, AGEU <dbl>, RACE <dbl>, RACEN <dbl>, SEX <dbl>,
## #   ETHNIC <dbl>, SAFFL <dbl>, ITTFL <dbl>, EFFFL <dbl>, COMP8FL <dbl>,
## #   COMP16FL <dbl>, COMP24FL <dbl>, DISCONFL <dbl>, DSRAEFL <dbl>, DTHFL <dbl>,
## #   BMIBL <dbl>, BMIBLGR1 <dbl>, HEIGHTBL <dbl>, WEIGHTBL <dbl>, EDUCLVL <dbl>,
## #   DISONSDT <dbl>, DURDIS <dbl>, DURDSGR1 <dbl>, VISIT1DT <dbl>, …
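
The checking, ordering, and sorting steps in this pipeline boil down to comparing the dataset against the specification. A rough base R sketch of that idea on toy data follows. Both `spec_demo` and `adsl_demo` are hypothetical, and the real {metatools} functions work from the {metacore} object and report problems in more detail.

```r
# Hypothetical variable-level specification
spec_demo <- data.frame(
  variable = c("STUDYID", "USUBJID", "AGE"),
  order    = 1:3,
  key_seq  = c(NA, 1L, NA),
  stringsAsFactors = FALSE
)

# Toy dataset with columns and rows deliberately out of order
adsl_demo <- data.frame(
  AGE     = c(71, 63),
  USUBJID = c("01-002", "01-001"),
  STUDYID = "CDISCPILOT01",
  stringsAsFactors = FALSE
)

# Check that every specified variable is present and nothing extra exists
missing_vars <- setdiff(spec_demo$variable, names(adsl_demo))
extra_vars   <- setdiff(names(adsl_demo), spec_demo$variable)
stopifnot(length(missing_vars) == 0, length(extra_vars) == 0)

# Order the columns as in the specification
adsl_demo <- adsl_demo[, spec_demo$variable[order(spec_demo$order)]]

# Sort the rows by the key variables, in key_seq order
key_vars  <- spec_demo$variable[order(spec_demo$key_seq, na.last = NA)]
adsl_demo <- adsl_demo[do.call(order, unname(as.list(adsl_demo[key_vars]))), ]
```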