Apply IQR-Based Top and Bottom Coding to LIS/LWS Variables
apply_iqr_top_bottom_coding.Rd
This function performs top and/or bottom coding on a specified variable across a list of LIS/LWS datasets.
It applies an interquartile range (IQR)-based rule to the log()
transformation of the variable.
Optionally, weights can be supplied for the percentile computation, and the coding can be one-sided (top or bottom only) or two-sided.
Usage
apply_iqr_top_bottom_coding(
data_list,
var_name,
wgt_name = NULL,
times = 3,
one_sided = NULL,
type = c("type_4", "type_2")
)
Arguments
- data_list
A named list of data frames, from LIS or LWS microdata.
- var_name
Character string. Name of the variable to code (e.g., "dhi").
- wgt_name
Optional character string. Name of the weight variable to use in computing weighted percentiles.
- times
Numeric. The IQR multiplier for determining bounds (default is 3).
- one_sided
Character. Set to "top" or "bottom" for one-sided coding, or NULL (the default) for two-sided coding.
- type
Character. Type of quantile estimator to use: "type_4" (the default) or "type_2".
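For orientation, the two estimator names resemble the type argument of base R's stats::quantile(), where type 4 interpolates linearly on the empirical CDF and type 2 averages at discontinuities. The sketch below only illustrates that difference on an unweighted toy vector. That "type_4" and "type_2" map onto these definitions, and how weights enter the computation, is an assumption here rather than something this page states.

```r
# Toy vector; not LIS/LWS data.
x <- c(1, 2, 3, 4)

# First quartile under the two base-R definitions (unweighted).
q_type4 <- stats::quantile(x, probs = 0.25, type = 4, names = FALSE)
q_type2 <- stats::quantile(x, probs = 0.25, type = 2, names = FALSE)

# The two estimators can disagree on small samples.
print(c(type_4 = q_type4, type_2 = q_type2))
```

On large samples the two definitions usually give very similar quartiles, so the choice matters mostly for small or heavily tied data.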
Value
A list of data frames with the same structure as data_list, where var_name has been adjusted by bounding extreme values according to an IQR rule.
Details
The function:
- Transforms the variable to log() scale (logarithmic transformation).
- Replaces invalid log-values (e.g. from log(0) or negatives) with zero.
- Computes the IQR (interquartile range) on the log-transformed variable using weighted percentiles.
- Caps values beyond \([Q1 - times * IQR, Q3 + times * IQR]\) on the original scale using exp().
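The steps above can be sketched in base R as follows. This is a minimal, unweighted illustration and not the package's implementation: iqr_code_sketch() is a hypothetical helper, it uses stats::quantile() instead of weighted percentiles, it assumes the input has no missing values, and it compares values against the bounds on the original scale, which may differ from how the function treats values whose log was reset to zero.

```r
# Hypothetical helper: unweighted sketch of IQR-based coding on the log scale.
iqr_code_sketch <- function(x, times = 3, one_sided = NULL) {
  # Log-transform; log(0) gives -Inf and log(negative) gives NaN.
  log_x <- suppressWarnings(log(x))
  # Replace invalid log-values with zero.
  log_x[!is.finite(log_x)] <- 0

  # Quartiles and IQR of the log-transformed variable (unweighted, type 4).
  q <- stats::quantile(log_x, probs = c(0.25, 0.75), type = 4, names = FALSE)
  iqr <- q[2] - q[1]

  # Bounds converted back to the original scale with exp().
  lower <- exp(q[1] - times * iqr)
  upper <- exp(q[2] + times * iqr)

  out <- x
  # NULL means two-sided: both caps are applied.
  if (is.null(one_sided) || one_sided == "top") {
    out[which(x > upper)] <- upper
  }
  if (is.null(one_sided) || one_sided == "bottom") {
    out[which(x < lower)] <- lower
  }
  out
}

# Toy incomes with one zero and one extreme value; not LIS/LWS data.
incomes <- c(0, 12000, 18000, 25000, 31000, 40000, 52000, 5e7)
coded <- iqr_code_sketch(incomes, times = 3)
print(coded)
```

Working on the log scale makes the bounds multiplicative on the original scale, which suits right-skewed variables such as income.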
Regarding LWS datasets:
- Datasets with multiple imputations (via inum) are detected automatically.
- Top and bottom coding is applied within each imputation group in such datasets.
- Datasets without inum, or with only a single imputation, are processed normally.
A warning is issued if the variable level (e.g. household vs individual) seems inconsistent with the dataset structure.
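The within-imputation behaviour can be pictured with the sketch below. It is an illustration under stated assumptions, not the package's code: cap_log_iqr() and code_within_imputations() are hypothetical helpers, the bounds are unweighted, and the toy data frame only mimics an LWS file by carrying an inum column.

```r
# Hypothetical helper: unweighted two-sided cap using log-scale IQR bounds.
cap_log_iqr <- function(v, times = 3) {
  lv <- suppressWarnings(log(v))
  lv[!is.finite(lv)] <- 0
  q <- stats::quantile(lv, probs = c(0.25, 0.75), type = 4, names = FALSE)
  iqr <- q[2] - q[1]
  pmin(pmax(v, exp(q[1] - times * iqr)), exp(q[2] + times * iqr))
}

# Hypothetical helper: code within each imputation group when inum varies,
# otherwise code the whole column at once.
code_within_imputations <- function(df, var_name, times = 3) {
  has_imputations <- "inum" %in% names(df) && length(unique(df$inum)) > 1
  if (has_imputations) {
    df[[var_name]] <- stats::ave(
      df[[var_name]], df$inum,
      FUN = function(v) cap_log_iqr(v, times = times)
    )
  } else {
    df[[var_name]] <- cap_log_iqr(df[[var_name]], times = times)
  }
  df
}

# Toy data with two imputation groups; the second is the first times 10,
# apart from its extreme value.
lws_toy <- data.frame(
  inum = rep(1:2, each = 5),
  dnw  = c(10, 20, 30, 40, 4000, 100, 200, 300, 400, 900000)
)
coded <- code_within_imputations(lws_toy, "dnw")
print(coded)
```

Because the bounds are computed separately per group, the extreme value in each imputation is capped relative to that imputation's own distribution rather than the pooled one.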
Examples
if (FALSE) { # \dontrun{
library(dplyr) # provides %>%, mutate() and filter() used below
# Import data, ideally at the level of the variable of interest.
data_hhd <- lissyrtools::lissyuse("au", vars = c("dhi"), from = 2016)
# Default case, where top and bottom coding is performed simultaneously
data_hhd[1] %>%
  purrr::map(~ .x[!is.na(.x$dhi), ]) %>%
  purrr::map(~ .x %>% mutate(new_wgt = nhhmem * hwgt)) %>%
  apply_iqr_top_bottom_coding("dhi", "hwgt", times = 3) %>%
  run_weighted_mean("dhi", "new_wgt")
# Example using the arguments `one_sided` ("top" or "bottom") and `type`
data_hhd[1] %>%
  purrr::map(~ .x[!is.na(.x$dhi), ]) %>%
  purrr::map(~ .x %>% mutate(new_wgt = nhhmem * hwgt)) %>%
  apply_iqr_top_bottom_coding("dhi", "hwgt", one_sided = "top", type = "type_2") %>%
  run_weighted_mean("dhi", "new_wgt")
# Load individual-level datasets by selecting individual-level variables, if the target variable is at the individual level (e.g., "pilabour")
data_ind <- lissyrtools::lissyuse("au", vars = c("pilabour", "emp"), from = 2016)
data_ind[1] %>%
  purrr::map(~ .x[!is.na(.x$pilabour), ] %>% filter(emp == 1)) %>%
  apply_iqr_top_bottom_coding("pilabour", "ppopwgt") %>%
  run_weighted_percentiles("pilabour", "ppopwgt", probs = seq(0.1, 0.9, 0.1))
} # }