Compute (weighted) counts or percentages from a list of data frames
run_weighted_count.Rd
This function calculates (weighted) category counts or percentages for a given categorical variable across a list of data frames (e.g., by country or year). Optionally, results can be grouped by another categorical variable.
Usage
run_weighted_count(
data_list,
var_name,
wgt_name = NULL,
na.rm = FALSE,
by = NULL,
percent = FALSE
)
Arguments
- data_list
A named list of data frames (e.g., across countries or years).
- var_name
A string specifying the name of the categorical variable for which counts or percentages are to be computed. This must be listed in lissyrtools::lis_categorical_variables or lissyrtools::lws_wealth_categorical_variables.
- wgt_name
(Optional) A string specifying the name of the weight variable to apply. If NULL, unweighted counts are used.
- na.rm
Logical; if TRUE, observations with missing values in var_name are removed before computing counts or percentages.
- by
(Optional) A string naming a second categorical variable for disaggregation. Results will then be split by this variable. Must also be listed in the allowed categorical variables.
- percent
Logical; if TRUE, the function returns weighted (or unweighted) percentages. If FALSE, it returns simple category counts.
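To make the interaction between wgt_name, na.rm and percent more concrete, here is a minimal base R sketch of a weighted percentage for a single vector. The helper `weighted_share()` is hypothetical and is not part of lissyrtools; the toy values are invented, and the package's internal implementation may well differ.

```r
# Hypothetical helper (not the lissyrtools implementation):
# percentage share of each category, optionally weighted.
weighted_share <- function(x, w = NULL, na.rm = FALSE) {
  if (is.null(w)) w <- rep(1, length(x))  # unweighted: every row counts once
  if (na.rm) {
    keep <- !is.na(x)                     # drop rows with a missing category
    x <- x[keep]
    w <- w[keep]
  }
  # Remaining missing values are kept as their own "<NA>" category
  x <- ifelse(is.na(x), "<NA>", as.character(x))
  totals <- vapply(split(w, x), sum, numeric(1))
  100 * totals / sum(totals)
}

# Invented toy data
educ    <- c("low", "medium", "high", "medium", NA)
hpopwgt <- c(1, 2, 1, 2, 4)

unweighted <- weighted_share(educ)                        # NA kept as a category
weighted   <- weighted_share(educ, hpopwgt, na.rm = TRUE) # NA dropped, weights applied
```

In this sketch, leaving na.rm at FALSE makes the share of missing values visible as a separate category, while na.rm = TRUE rescales the remaining categories to sum to 100.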
Value
A list of named vectors:
If by is not specified, returns a named vector of counts or percentages per dataset.
If by is specified, returns a nested list, where the outer list is indexed by dataset and the inner list by category of the by variable.
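The two return shapes can be pictured with mock objects. The sketch below only imitates the structure described above: the dataset names, category labels and numbers are all invented, and it assumes that each inner element of the by result is a named vector.

```r
# Mock of the shape without `by`: one named vector per dataset (invented values)
res_flat <- list(
  de18 = c(low = 10, medium = 25, high = 15),
  es18 = c(low = 20, medium = 20, high = 10)
)

# Mock of the shape with `by`: dataset -> by-category -> named vector (assumed)
res_by <- list(
  de18 = list(
    employed     = c(low = 4, medium = 18, high = 13),
    not_employed = c(low = 6, medium = 7,  high = 2)
  )
)

# Pick one value from each shape
flat_value <- res_flat[["de18"]]["high"]
by_value   <- res_by[["de18"]][["employed"]]["high"]

# Stack the flat result into a dataset-by-category matrix
# (works here because every vector has the same categories)
m <- do.call(rbind, res_flat)
```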
Examples
if (FALSE) { # \dontrun{
data <- lissyrtools::lissyuse(data = c("de", "es", "uk"), vars = c("dhi", "region_c", "area_c", "educ", "emp"), from = 2016)
run_weighted_count(
data[names(data)[stringr::str_sub(names(data),3,4) == "18"]],
var_name ="educ",
by = "emp",
percent = FALSE,
na.rm = TRUE
)
# Set `percent = TRUE` to output percentages (weighted or unweighted).
run_weighted_count(
data[names(data)[stringr::str_sub(names(data),3,4) == "18"]],
var_name ="region_c",
percent = TRUE,
na.rm = FALSE
)
# With `na.rm = TRUE`, missing values are dropped before computing percentages; keep `na.rm = FALSE` to check the share of missing values.
run_weighted_count(
data[names(data)[stringr::str_sub(names(data),3,4) == "18"]],
var_name ="region_c",
percent = TRUE,
na.rm = TRUE
)
# When `percent = FALSE` and `wgt_name` is specified, the weight is ignored and unweighted counts are returned.
run_weighted_count(
data[names(data)[stringr::str_sub(names(data),3,4) == "18"]],
var_name ="region_c",
wgt_name = "hpopwgt",
percent = FALSE,
na.rm = TRUE
)
# Datasets in which the `var_name` variable consists entirely of NAs are not considered.
run_weighted_count(
data[names(data)[stringr::str_sub(names(data),3,4) == "18"]],
var_name ="area_c",
percent = FALSE,
na.rm = TRUE
)
# The same logic is applied with the `by` argument.
run_weighted_count(
data["uk15"],
"educ",
na.rm = TRUE,
by = "area_c"
)
} # }
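The examples above select the 2018 datasets by matching characters 3 and 4 of the list names. A dependency-free variant can be sketched with base R's `substr()`. The list below is an invented stand-in for the output of lissyuse(), and the all-NA filter only illustrates the idea of skipping such datasets; it is not the package's own code.

```r
# Toy stand-in for the list returned by lissyuse(); names follow the
# country + two-digit-year pattern used in the examples (values invented).
data <- list(
  de16 = data.frame(educ = c("low", "high"),    area_c = c(1, 2)),
  de18 = data.frame(educ = c("medium", "high"), area_c = c(1, 1)),
  uk18 = data.frame(educ = c("low", NA),        area_c = c(NA, NA))
)

# Keep only datasets whose name has "18" in positions 3-4
data_2018 <- data[substr(names(data), 3, 4) == "18"]

# Illustration: skip datasets where `area_c` contains nothing but NA
usable <- Filter(function(df) !all(is.na(df[["area_c"]])), data_2018)

names(data_2018)
names(usable)
```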