This guide shows how four pharmaverse packages, along with some from the tidyverse, can be used to create an ADaM dataset such as ADSL end-to-end, using the pilot CDISC SDTM data as input.
The four packages, with a brief description of their purpose, are as follows:
{metacore}: provides a harmonized metadata/specifications object.
{metatools}: uses the provided metadata to build/enhance and check the dataset.
{admiral}: provides the ADaM derivations.
{xportr}: delivers the SAS transport file (XPT) and eSub checks.
It is important to understand {metacore} objects by reading through the {metacore} package site, as these are fundamental to being able to use {metatools} and {xportr}. Each company may need to build a specification reader to create these objects from their source standard specification templates.
The first step is to load our pharmaverse packages and input data.
options(repos = c(
pharmaverse = 'https://pharmaverse.r-universe.dev',
CRAN = 'https://cloud.r-project.org'))
library(metacore)
library(metatools)
library(admiral.test)
library(admiral)
library(xportr)
library(dplyr)
library(tidyr)
library(lubridate)
library(stringr)
# Read in input SDTM data
data("admiral_dm")
data("admiral_ex")
Next we need to load the specification file in the form of a {metacore} object.
# Read in metacore object
load(metacore_example("pilot_ADaM.rda"))
metacore <- metacore %>%
select_dataset("ADSL")
Here is an example of how a {metacore} object looks, showing variable-level metadata:
metacore$ds_vars
## # A tibble: 49 × 7
## dataset variable key_seq order keep core supp_flag
## <chr> <chr> <int> <int> <lgl> <chr> <lgl>
## 1 ADSL STUDYID NA 1 FALSE <NA> NA
## 2 ADSL USUBJID 1 2 FALSE <NA> NA
## 3 ADSL SUBJID NA 3 FALSE <NA> NA
## 4 ADSL SITEID NA 4 FALSE <NA> NA
## 5 ADSL SITEGR1 NA 5 FALSE <NA> NA
## 6 ADSL ARM NA 6 FALSE <NA> NA
## 7 ADSL TRT01P NA 7 FALSE <NA> NA
## 8 ADSL TRT01PN NA 8 FALSE <NA> NA
## 9 ADSL TRT01A NA 9 FALSE <NA> NA
## 10 ADSL TRT01AN NA 10 FALSE <NA> NA
## # … with 39 more rows
The first derivation step is to pull through all the columns that come directly from the SDTM datasets. You might already know which datasets you are going to pull from, but if you don’t, you can call metatools::build_from_derived() with just an empty list and the error will tell you which datasets you need to supply.
build_from_derived(metacore, list(), predecessor_only = FALSE)
## Error in build_from_derived(metacore, list(), predecessor_only = FALSE): Not all datasets provided. Please pass the following dataset(s):
## DM
In this case all the columns come from DM, so that is the only dataset we will pass into metatools::build_from_derived(). The resulting dataset has all the columns combined, and any columns that needed renaming between SDTM and ADaM are renamed.
adsl_preds <- build_from_derived(metacore,
ds_list = list("dm" = admiral_dm),
predecessor_only = FALSE, keep = TRUE)
head(adsl_preds, n=10)
## # A tibble: 10 × 14
## STUDYID USUBJID SUBJID SITEID ARM AGE AGEU RACE SEX ETHNIC DTHFL
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 CDISCPILOT01 01-701… 1015 701 Plac… 63 YEARS WHITE F HISPA… ""
## 2 CDISCPILOT01 01-701… 1023 701 Plac… 64 YEARS WHITE M HISPA… ""
## 3 CDISCPILOT01 01-701… 1028 701 Xano… 71 YEARS WHITE M NOT H… ""
## 4 CDISCPILOT01 01-701… 1033 701 Xano… 74 YEARS WHITE M NOT H… ""
## 5 CDISCPILOT01 01-701… 1034 701 Xano… 77 YEARS WHITE F NOT H… ""
## 6 CDISCPILOT01 01-701… 1047 701 Plac… 85 YEARS WHITE F NOT H… ""
## 7 CDISCPILOT01 01-701… 1057 701 Scre… 59 YEARS WHITE F HISPA… ""
## 8 CDISCPILOT01 01-701… 1097 701 Xano… 68 YEARS WHITE M NOT H… ""
## 9 CDISCPILOT01 01-701… 1111 701 Xano… 81 YEARS WHITE F NOT H… ""
## 10 CDISCPILOT01 01-701… 1115 701 Xano… 84 YEARS WHITE M NOT H… ""
## # … with 3 more variables: RFSTDTC <chr>, RFENDTC <chr>, TRT01P <chr>
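To make the idea of "pulling through" predecessor columns more concrete, here is a minimal base R sketch. It is not how {metatools} is implemented. The `spec` data frame, its `origin` column, the toy `dm` data and the `pull_predecessors()` helper are all assumptions made up for illustration. It only shows the general pattern: read the source dataset and column from the specification, fail early if a dataset is missing, then select and rename.

```r
# Hypothetical, simplified spec: one row per ADaM variable with its SDTM origin.
spec <- data.frame(
  variable = c("STUDYID", "USUBJID", "AGE", "ARM", "TRT01P"),
  origin   = c("DM.STUDYID", "DM.USUBJID", "DM.AGE", "DM.ARM", "DM.ARM"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the DM domain (made-up values, not the pilot data).
dm <- data.frame(
  STUDYID = c("S1", "S1"),
  USUBJID = c("S1-001", "S1-002"),
  AGE     = c(63, 71),
  ARM     = c("Placebo", "Active"),
  COUNTRY = c("USA", "USA"),
  stringsAsFactors = FALSE
)

# Split an origin such as "DM.ARM" into source dataset and source column.
parts <- strsplit(spec$origin, ".", fixed = TRUE)
spec$src_ds  <- vapply(parts, `[`, character(1), 1)
spec$src_var <- vapply(parts, `[`, character(1), 2)

pull_predecessors <- function(spec, ds_list) {
  needed  <- unique(tolower(spec$src_ds))
  missing <- setdiff(needed, names(ds_list))
  if (length(missing) > 0) {
    stop("Not all datasets provided: ", paste(toupper(missing), collapse = ", "))
  }
  # This sketch handles a single source dataset only.
  src <- ds_list[[needed[1]]]
  out <- src[, spec$src_var, drop = FALSE]  # select (a column may be used twice)
  names(out) <- spec$variable               # rename to the ADaM variable names
  out
}

adsl_sketch <- pull_predecessors(spec, list(dm = dm))
```

In this toy version `ARM` is carried over unchanged and also copied into `TRT01P`, while `COUNTRY` is dropped because the hypothetical spec does not mention it.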
Now that we have the base dataset, we can start to create some variables. We can start with creating the subgroups using the controlled terminology, in this case AGEGR1. The metacore object holds all the metadata needed to make ADSL. Part of that metadata is the controlled terminology, which can help automate the creation of subgroups. We can look into the {metacore} object and see the controlled terminology for AGEGR1.
get_control_term(metacore, variable = AGEGR1)
## # A tibble: 3 × 2
## code decode
## <chr> <chr>
## 1 <65 <65
## 2 65-80 65-80
## 3 >80 >80
Because this controlled terminology is written in a fairly standard format, we can automate the creation of AGEGR1. The function metatools::create_cat_var() takes in a {metacore} object, a reference variable (in this case AGE, because that is the continuous variable AGEGR1 is created from), and the name of the sub-grouped variable. It takes the controlled terminology from the sub-grouped variable and groups the reference variable accordingly.
Using a similar philosophy, we can create the numeric version of RACE using the controlled terminology stored in the {metacore} object with the metatools::create_var_from_codelist() function.
adsl_ct <- adsl_preds %>%
create_cat_var(metacore, ref_var = AGE,
grp_var = AGEGR1, num_grp_var = AGEGR1N) %>%
create_var_from_codelist(metacore = metacore,
input_var = RACE,
out_var = RACEN) %>%
#Removing screen failures from ARM and TRT01P to match the define and FDA guidance
mutate(ARM = if_else(ARM == "Screen Failure", NA_character_, ARM),
TRT01P = if_else(TRT01P == "Screen Failure", NA_character_, TRT01P)
)
head(adsl_ct, n=10)
## # A tibble: 10 × 17
## STUDYID USUBJID SUBJID SITEID ARM AGE AGEU RACE SEX ETHNIC DTHFL
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 CDISCPILOT01 01-701… 1015 701 Plac… 63 YEARS WHITE F HISPA… ""
## 2 CDISCPILOT01 01-701… 1023 701 Plac… 64 YEARS WHITE M HISPA… ""
## 3 CDISCPILOT01 01-701… 1028 701 Xano… 71 YEARS WHITE M NOT H… ""
## 4 CDISCPILOT01 01-701… 1033 701 Xano… 74 YEARS WHITE M NOT H… ""
## 5 CDISCPILOT01 01-701… 1034 701 Xano… 77 YEARS WHITE F NOT H… ""
## 6 CDISCPILOT01 01-701… 1047 701 Plac… 85 YEARS WHITE F NOT H… ""
## 7 CDISCPILOT01 01-701… 1057 701 <NA> 59 YEARS WHITE F HISPA… ""
## 8 CDISCPILOT01 01-701… 1097 701 Xano… 68 YEARS WHITE M NOT H… ""
## 9 CDISCPILOT01 01-701… 1111 701 Xano… 81 YEARS WHITE F NOT H… ""
## 10 CDISCPILOT01 01-701… 1115 701 Xano… 84 YEARS WHITE M NOT H… ""
## # … with 6 more variables: RFSTDTC <chr>, RFENDTC <chr>, TRT01P <chr>,
## # AGEGR1 <chr>, AGEGR1N <dbl>, RACEN <dbl>
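The grouping step can be illustrated in plain base R. The sketch below is an assumption about the general technique, not the {metatools} implementation. It parses the three AGEGR1 labels shown earlier into inclusive bounds (assuming integer ages and only the label shapes `<N`, `A-B` and `>N`), assigns each age to a label, and uses the label's position in the codelist as the numeric group. The helper names and the RACE codes are hypothetical.

```r
# Controlled terminology for AGEGR1, copied from the get_control_term() output.
ct_labels <- c("<65", "65-80", ">80")

# Turn one label into inclusive lower/upper bounds (integer ages assumed).
parse_label <- function(label) {
  if (startsWith(label, "<")) {
    c(lower = -Inf, upper = as.numeric(sub("<", "", label, fixed = TRUE)) - 1)
  } else if (startsWith(label, ">")) {
    c(lower = as.numeric(sub(">", "", label, fixed = TRUE)) + 1, upper = Inf)
  } else {
    bounds <- as.numeric(strsplit(label, "-", fixed = TRUE)[[1]])
    c(lower = bounds[1], upper = bounds[2])
  }
}

# Assign each value of the reference variable to the label whose range contains it.
assign_group <- function(age, labels) {
  bounds <- t(vapply(labels, parse_label, c(lower = 0, upper = 0)))
  out <- rep(NA_character_, length(age))
  for (i in seq_along(labels)) {
    hit <- !is.na(age) & age >= bounds[i, "lower"] & age <= bounds[i, "upper"]
    out[hit] <- labels[i]
  }
  out
}

age     <- c(63, 64, 71, 85, 80, 65, NA)
agegr1  <- assign_group(age, ct_labels)
agegr1n <- match(agegr1, ct_labels)  # position in the codelist as numeric group

# Codelist lookup for a numeric variable; these codes are made up for illustration.
race_codelist <- data.frame(
  code   = c(1, 2),
  decode = c("WHITE", "BLACK OR AFRICAN AMERICAN"),
  stringsAsFactors = FALSE
)
race  <- c("WHITE", "BLACK OR AFRICAN AMERICAN", "WHITE")
racen <- race_codelist$code[match(race, race_codelist$decode)]
```

Keeping the bounds in the labels themselves means a change to the specification (for example a new age band) flows into the derivation without editing code, which is the main appeal of the metadata-driven approach.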
Now that we have sorted out what we can easily do with controlled terminology, it is time to start deriving some variables.
In practice you could refer directly to the {admiral} template and vignette, but for the purpose of this end-to-end ADaM vignette we will share a few exposure derivations from there.
We derive the start and end of treatment, the treatment duration, and the safety population flag.
adsl_raw <- adsl_ct %>%
derive_vars_merged_dtm(
dataset_add = admiral_ex,
filter_add = (EXDOSE > 0 |
(EXDOSE == 0 &
str_detect(EXTRT, "PLACEBO"))) & nchar(EXSTDTC) >= 10,
new_vars_prefix = "TRTS",
dtc = EXSTDTC,
order = vars(TRTSDTM, EXSEQ),
mode = "first",
by_vars = vars(STUDYID, USUBJID)
) %>%
derive_vars_merged_dtm(
dataset_add = admiral_ex,
filter_add = (EXDOSE > 0 |
(EXDOSE == 0 &
str_detect(EXTRT, "PLACEBO"))) & nchar(EXENDTC) >= 10,
new_vars_prefix = "TRTE",
dtc = EXENDTC,
time_imputation = "last",
order = vars(TRTEDTM, EXSEQ),
mode = "last",
by_vars = vars(STUDYID, USUBJID)
) %>%
derive_vars_dtm_to_dt(source_vars = vars(TRTSDTM, TRTEDTM)) %>% #Convert Datetime variables to date
derive_var_trtdurd() %>%
derive_var_merged_exist_flag(
dataset_add = admiral_ex,
by_vars = vars(STUDYID, USUBJID),
new_var = SAFFL,
condition = (EXDOSE > 0 | (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO")))
) %>%
drop_unspec_vars(metacore) #This will drop any columns that aren't specified in the metacore object
head(adsl_raw, n=10)
## # A tibble: 10 × 21
## STUDYID USUBJID SUBJID SITEID ARM AGE AGEU RACE SEX ETHNIC DTHFL
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 CDISCPILOT01 01-701… 1015 701 Plac… 63 YEARS WHITE F HISPA… ""
## 2 CDISCPILOT01 01-701… 1023 701 Plac… 64 YEARS WHITE M HISPA… ""
## 3 CDISCPILOT01 01-701… 1028 701 Xano… 71 YEARS WHITE M NOT H… ""
## 4 CDISCPILOT01 01-701… 1033 701 Xano… 74 YEARS WHITE M NOT H… ""
## 5 CDISCPILOT01 01-701… 1034 701 Xano… 77 YEARS WHITE F NOT H… ""
## 6 CDISCPILOT01 01-701… 1047 701 Plac… 85 YEARS WHITE F NOT H… ""
## 7 CDISCPILOT01 01-701… 1057 701 <NA> 59 YEARS WHITE F HISPA… ""
## 8 CDISCPILOT01 01-701… 1097 701 Xano… 68 YEARS WHITE M NOT H… ""
## 9 CDISCPILOT01 01-701… 1111 701 Xano… 81 YEARS WHITE F NOT H… ""
## 10 CDISCPILOT01 01-701… 1115 701 Xano… 84 YEARS WHITE M NOT H… ""
## # … with 10 more variables: RFSTDTC <chr>, RFENDTC <chr>, TRT01P <chr>,
## # AGEGR1 <chr>, AGEGR1N <dbl>, RACEN <dbl>, TRTSDT <date>, TRTEDT <date>,
## # TRTDURD <dbl>, SAFFL <chr>
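For readers who want to see the logic behind these exposure derivations without the {admiral} helpers, the following base R sketch reproduces the idea on made-up data. It is a simplification under stated assumptions: dates only (no time imputation), a single combined completeness filter for start and end dates, and a duration of end date minus start date plus one day. The toy `ex` and `adsl` data frames are hypothetical.

```r
# Toy exposure records (made-up values); dates are ISO 8601 strings.
ex <- data.frame(
  USUBJID = c("001", "001", "002", "002", "003"),
  EXTRT   = c("PLACEBO", "PLACEBO", "XANOMELINE", "XANOMELINE", "XANOMELINE"),
  EXDOSE  = c(0, 0, 54, 81, 0),
  EXSTDTC = c("2014-01-02", "2014-01-17", "2013-07-19", "2013-08-02", "2014-03-01"),
  EXENDTC = c("2014-01-16", "2014-07-02", "2013-08-01", "2014-01-14", "2014-03-10"),
  stringsAsFactors = FALSE
)
adsl <- data.frame(USUBJID = c("001", "002", "003"), stringsAsFactors = FALSE)

# Valid dose: a positive dose, or a zero dose of placebo.
dosed <- ex$EXDOSE > 0 | (ex$EXDOSE == 0 & grepl("PLACEBO", ex$EXTRT))
# Simplification: require both dates to be complete (at least YYYY-MM-DD).
complete <- nchar(ex$EXSTDTC) >= 10 & nchar(ex$EXENDTC) >= 10
ex_valid <- ex[dosed & complete, ]

# ISO 8601 date strings sort chronologically, so min/max on the text is safe.
by_subj     <- split(ex_valid, ex_valid$USUBJID)
first_start <- vapply(by_subj, function(d) min(substr(d$EXSTDTC, 1, 10)), character(1))
last_end    <- vapply(by_subj, function(d) max(substr(d$EXENDTC, 1, 10)), character(1))

adsl_dates <- data.frame(
  USUBJID = names(by_subj),
  TRTSDT  = as.Date(unname(first_start)),
  TRTEDT  = as.Date(unname(last_end)),
  stringsAsFactors = FALSE
)
# Treatment duration in days, counting both the first and the last day.
adsl_dates$TRTDURD <- as.numeric(adsl_dates$TRTEDT - adsl_dates$TRTSDT) + 1

# Left join onto ADSL, then flag subjects with at least one valid dose.
adsl_sketch <- merge(adsl, adsl_dates, by = "USUBJID", all.x = TRUE)
adsl_sketch$SAFFL <- ifelse(adsl_sketch$USUBJID %in% ex$USUBJID[dosed], "Y", NA_character_)
```

Subject 003 only has a zero dose of active treatment, so it gets no treatment dates and no safety flag in this sketch.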
Now that we have all the variables defined, we can run some checks before applying the necessary formatting.
The top four functions, which perform checks and sorting/ordering, come from {metatools}, whereas the others, which apply attributes to prepare for XPT, come from {xportr}. At the end you could add a call to xportr::xportr_write() to produce the XPT file.
adsl_raw %>%
check_variables(metacore) %>% # Check all variables specified are present and no more
check_ct_data(metacore, na_acceptable = TRUE) %>% # Checks all variables with CT only contain values within the CT
order_cols(metacore) %>% # Orders the columns according to the spec
sort_by_key(metacore) %>% # Sorts the rows by the sort keys
xportr_type(metacore) %>% # Coerce variable type to match spec
xportr_length(metacore) %>% # Assigns SAS length from a variable level metadata
xportr_label(metacore) %>% # Assigns variable label from metacore specifications
xportr_df_label(metacore) # Assigns dataset label from metacore specifications
## # A tibble: 306 × 49
## STUDYID USUBJID SUBJID SITEID SITEGR1 ARM TRT01P TRT01PN TRT01A TRT01AN
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NA NA 1015 701 NA NA NA NA NA NA
## 2 NA NA 1023 701 NA NA NA NA NA NA
## 3 NA NA 1028 701 NA NA NA NA NA NA
## 4 NA NA 1033 701 NA NA NA NA NA NA
## 5 NA NA 1034 701 NA NA NA NA NA NA
## 6 NA NA 1047 701 NA NA NA NA NA NA
## 7 NA NA 1057 701 NA NA NA NA NA NA
## 8 NA NA 1097 701 NA NA NA NA NA NA
## 9 NA NA 1111 701 NA NA NA NA NA NA
## 10 NA NA 1115 701 NA NA NA NA NA NA
## # … with 296 more rows, and 39 more variables: TRTSDT <dbl>, TRTEDT <dbl>,
## # TRTDURD <dbl>, AVGDD <dbl>, CUMDOSE <dbl>, AGE <dbl>, AGEGR1 <dbl>,
## # AGEGR1N <dbl>, AGEU <dbl>, RACE <dbl>, RACEN <dbl>, SEX <dbl>,
## # ETHNIC <dbl>, SAFFL <dbl>, ITTFL <dbl>, EFFFL <dbl>, COMP8FL <dbl>,
## # COMP16FL <dbl>, COMP24FL <dbl>, DISCONFL <dbl>, DSRAEFL <dbl>, DTHFL <dbl>,
## # BMIBL <dbl>, BMIBLGR1 <dbl>, HEIGHTBL <dbl>, WEIGHTBL <dbl>, EDUCLVL <dbl>,
## # DISONSDT <dbl>, DURDIS <dbl>, DURDSGR1 <dbl>, VISIT1DT <dbl>, …
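The checking and ordering steps follow a simple pattern that can be sketched in base R. The code below is an illustration only. The `_sketch` helpers, the toy spec and the SAFFL codelist values are assumptions, not the {metatools} functions themselves. Each check either stops with an error or returns its input unchanged, which is what allows such functions to be chained in a pipeline.

```r
# Hypothetical spec: expected variables (in order) and controlled terminology.
spec_vars <- c("STUDYID", "USUBJID", "AGEGR1", "SAFFL")
ct <- list(
  AGEGR1 = c("<65", "65-80", ">80"),
  SAFFL  = c("Y", "N")  # assumed codelist, for illustration only
)

# Toy dataset with columns deliberately out of order (made-up values).
adsl <- data.frame(
  SAFFL   = c("Y", NA, "Y"),
  USUBJID = c("S1-003", "S1-001", "S1-002"),
  STUDYID = "S1",
  AGEGR1  = c("65-80", "<65", ">80"),
  stringsAsFactors = FALSE
)

# All specified variables must be present, and no others.
check_variables_sketch <- function(data, spec_vars) {
  missing <- setdiff(spec_vars, names(data))
  extra   <- setdiff(names(data), spec_vars)
  if (length(missing) > 0 || length(extra) > 0) {
    stop("Missing: ", paste(missing, collapse = ", "),
         "; not in spec: ", paste(extra, collapse = ", "))
  }
  data
}

# Variables with controlled terminology may only contain listed values.
check_ct_sketch <- function(data, ct, na_acceptable = TRUE) {
  for (var in names(ct)) {
    values <- data[[var]]
    bad <- !(values %in% ct[[var]])
    if (na_acceptable) bad <- bad & !is.na(values)
    if (any(bad)) {
      stop(var, " contains values outside the controlled terminology: ",
           paste(unique(values[bad]), collapse = ", "))
    }
  }
  data
}

checked <- check_ct_sketch(check_variables_sketch(adsl, spec_vars), ct)
# Order columns as in the spec and sort rows by the key variable.
final <- checked[order(checked$USUBJID), spec_vars]
```

With `na_acceptable = FALSE` the missing SAFFL value in the toy data would be reported as a violation, which mirrors the intent of the `na_acceptable = TRUE` argument used in the pipeline above.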