Transform Structured Data Lists into a Tidy Data Frame for Plotting
structure_to_plot.Rd
This function takes a nested list of data in one of three supported structures and transforms it into a tidy data.frame suitable for visualization in LISSY.
These structures are typically produced by the weighted summary functions in lissyrtools, such as run_weighted_mean(), run_weighted_percentiles(), and run_weighted_count().
Arguments
- data_list
A named list containing the data to be transformed. The structure and naming conventions of this list determine how the data is processed.
1st structure: List with country names as keys and named vectors of year-values. Expected when names(data_list) are country codes matching get_countries_lis() or get_countries_lws().
2nd structure: List with ccyy abbreviations as keys, and named numeric/integer vectors, where names represent categories. Typically output from functions grouped by a categorical variable, e.g. "educ".
3rd structure: List with ccyy abbreviations as keys, each containing sublists of named vectors. Represents more complex summaries with multiple grouping variables, percentiles, or shares.
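As a rough illustration of the three shapes described above, the lists below are built by hand with invented numbers. Real inputs would come from the run_weighted_*() functions; the exact key spelling, the labels inside each vector, and the nesting order in the 3rd structure are assumptions made for this sketch.

```r
# Hand-built stand-ins for the three supported shapes (all values are invented).

# 1st structure: one entry per country, each a named vector of year-values.
s1 <- list(
  es = c("2016" = 100, "2017" = 110),
  de = c("2016" = 200, "2017" = 210)
)

# 2nd structure: ccyy keys, each a named vector whose names are categories.
s2 <- list(
  es16 = c(low = 10, medium = 20, high = 30),
  de16 = c(low = 15, medium = 25, high = 35)
)

# 3rd structure: ccyy keys, each a sublist of named vectors
# (here: one vector of percentiles per category, an assumed layout).
s3 <- list(
  es16 = list(
    low  = c("25%" = 5, "50%" = 10, "75%" = 15),
    high = c("25%" = 8, "50%" = 16, "75%" = 24)
  )
)

# The 3rd structure is the only one whose elements are themselves lists.
is.list(s2[["es16"]])
is.list(s3[["es16"]])
```

The last two calls show one way to tell the flat structures apart from the nested one: only in the 3rd structure is each top-level element itself a list.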
Value
A tidy data.frame with the following columns (depending on input structure):
cname: Country name.
year: Year.
dname: Dataset country-year code in the ccyy form.
category: (2nd and 3rd structure) Grouping variable categories.
share/percentile/by_var: (3rd structure) Variable describing the vector names.
value: Numeric values from the input list.
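To make the long layout concrete, here is a minimal base-R sketch that flattens a hand-built 2nd-structure list into dname, category, and value columns. The helper flatten_second_structure() is hypothetical and is not how lissyrtools implements the transformation. It also leaves out cname and year, which the real function derives from the ccyy code.

```r
# Invented example input in the 2nd structure (ccyy keys, category-named vectors).
means_by_educ <- list(
  es16 = c(low = 10, medium = 20, high = 30),
  de16 = c(low = 15, medium = 25, high = 35)
)

# Hypothetical helper, not part of lissyrtools: one row per (dataset, category).
flatten_second_structure <- function(x) {
  rows <- lapply(names(x), function(key) {
    data.frame(
      dname    = key,                 # ccyy code, recycled for every category
      category = names(x[[key]]),     # vector names become a column
      value    = unname(x[[key]]),    # numeric values from the input list
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

tidy <- flatten_second_structure(means_by_educ)
tidy
```

Each named vector contributes one row per category, which is the layout ggplot2 expects when mapping category or dname to an aesthetic.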
Details
Depending on the summary statistics computed beforehand, the function renames one of the columns in the 3rd structure based on pattern matching:
If the values contain both "%" and "-", the column is renamed to share.
If the values contain "%" but no "-", the column is renamed to percentile.
Otherwise, the column is renamed to by_var.
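The renaming rule above can be sketched as a small decision function. The helper guess_column_name() is hypothetical and only mirrors the rule as documented; the package's actual matching logic, and the exact label formats it sees (such as "0-20%" for shares), are assumptions here.

```r
# Hypothetical helper mirroring the documented rule; not the package's own code.
guess_column_name <- function(labels) {
  has_percent <- grepl("%", labels, fixed = TRUE)
  has_dash    <- grepl("-", labels, fixed = TRUE)
  if (any(has_percent & has_dash)) {
    "share"        # labels such as "0-20%" contain both "%" and "-"
  } else if (any(has_percent)) {
    "percentile"   # labels such as "50%" contain "%" only
  } else {
    "by_var"       # anything else is treated as a grouping variable
  }
}

guess_column_name(c("0-20%", "20-40%"))    # "share"
guess_column_name(c("25%", "50%", "75%"))  # "percentile"
guess_column_name(c("rural", "urban"))     # "by_var"
```

The order of the checks matters: testing for "%" together with "-" first keeps share labels from being classified as percentiles.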
Examples
if (FALSE) { # \dontrun{
library(ggplot2)
library(lissyrtools)
library(RColorBrewer)
library(ggthemes)
library(purrr)
library(forcats)
library(dplyr) # provides filter() and mutate() used in the last example
data <- lissyrtools::lissyuse(data = c("es", "de"), vars = c("dhi", "educ", "pi11", "rural"), from = 2016)
# Example usage for 1st structure
weighted_means <- run_weighted_mean(data, "pi11")
df1 <- structure_to_plot(weighted_means)
# Example usage for 2nd structure
weighted_means_educ <- run_weighted_mean(data, "pi11", by = "educ")
df2 <- structure_to_plot(weighted_means_educ)
# Example usage for 3rd structure
weighted_percentiles <- run_weighted_percentiles(data, "pi11", by = "educ")
df3 <- structure_to_plot(weighted_percentiles)
# Example usage for 3rd structure, this time with shares
weighted_shares <- run_weighted_percentiles(data, "pi11", by = "educ", share = TRUE)
df4 <- structure_to_plot(weighted_shares)
# Chart example: Plotting education group shares over years by country
run_weighted_count(data, "educ", percent = TRUE, na.rm = TRUE) %>%
  structure_to_plot() %>%
  ggplot(aes(x = year, y = value, color = cname, group = interaction(cname, category))) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 0.6) +
  scale_color_stata() +
  labs(
    x = "Year",
    y = "Share of Education Group (%)",
    color = "Country"
  ) +
  expand_limits(y = 0) +
  facet_grid(~fct_relevel(category, 'low', 'medium', 'high')) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))
# Another example: plotting weighted mean of 'dhi' over years by country
data %>%
  map(~ .x %>% filter(relation == 1000) %>% mutate(new_wgt = nhhmem * hwgt)) %>%
  apply_ppp_adjustment("dhi", "lis", "lisppp") %>%
  run_weighted_mean("dhi", "new_wgt") %>%
  structure_to_plot() %>%
  ggplot(aes(x = year, y = value, color = cname, group = cname)) +
  geom_point() +
  geom_line() +
  labs(
    title = "dhi trend",
    caption = "Source: Luxembourg Income Study"
  ) +
  scale_color_stata() +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))
} # }