update and clean up
fsolt committed May 20, 2021
1 parent d884b45 commit c942abf
Showing 11 changed files with 23 additions and 34 deletions.
44 changes: 23 additions & 21 deletions vignette/R_swiid.Rmd
@@ -40,32 +40,32 @@ header-includes:

```{r 01-preamble, include=FALSE}
library(knitr)
opts_chunk$set(concordance=TRUE, cache=TRUE, warning=FALSE, message=FALSE)
opts_chunk$set(concordance=TRUE, warning=FALSE, message=FALSE)
```

The Standardized World Income Inequality Database (SWIID) takes a Bayesian approach to standardizing observations collected from the [OECD Income Distribution Database](http://www.oecd.org/social/inequality.htm), the [Socio-Economic Database for Latin America and the Caribbean generated by CEDLAS and the World Bank](http://sedlac.econo.unlp.edu.ar/eng/), [Eurostat](http://epp.eurostat.ec.europa.eu), the [World Bank's PovcalNet](http://iresearch.worldbank.org/PovcalNet/index.htm), the \href{http://interwp.cepal.org/sisgen/ConsultaIntegrada.asp?idIndicador=250\&idioma=e}{UN Economic Commission for Latin America and the Caribbean}, national statistical offices around the world, and many other sources. [Luxembourg Income Study](http://www.lisdatacenter.org) data serves as the standard.

As described in @Solt2020, the SWIID maximizes the comparability of available income inequality data for the broadest possible sample of countries and years. But incomparability remains, and it is sometimes substantial. This remaining incomparability is reflected in the standard errors of the SWIID estimates, making it often crucial to take this uncertainty into account when making comparisons across countries or over time [@Solt2009, 238; @Solt2016_nv, 14; @Solt2020, 1196]. It was once the case that incorporating the standard errors into an analysis required considerable effort. It is now straightforward.

In version 9.0 of the SWIID, the inequality estimates and their associated uncertainty are represented by 100 draws from the posterior distribution: for any given observation, the differences across these imputations capture the uncertainty in the estimate. The `swiid9_0.zip` includes the file `swiid9_0.rda`, which is pre-formatted to facilitate taking this uncertainty into account. The following sections describe how to subset the data, merge in additional variables, and do analyses.
In version 9.1 of the SWIID, the inequality estimates and their associated uncertainty are represented by 100 draws from the posterior distribution: for any given observation, the differences across these imputations capture the uncertainty in the estimate. The `swiid9_1.zip` includes the file `swiid9_1.rda`, which is pre-formatted to facilitate taking this uncertainty into account. The following sections describe how to subset the data, merge in additional variables, and do analyses.

# Getting Started
Loading the file `swiid9_0.rda` adds `swiid` to the Global Environment, a list of 100 dataframes. Each of these dataframes includes, in addition to country and year identifiers, the following four variables:
Loading the file `swiid9_1.rda` adds `swiid`, a list of 100 dataframes, to the global environment. Each of these dataframes includes, in addition to country and year identifiers, the following four variables:
\begin{itemize}
\item \verb+gini_disp+: Estimate of Gini index of inequality in equivalized (square root scale) household disposable (post-tax, post-transfer) income, using \href{http://www.lisdatacenter.org}{Luxembourg Income Study} data as the standard.
\item \verb+gini_mkt+: Estimate of Gini index of inequality in equivalized (square root scale) household market (pre-tax, pre-transfer) income, using \href{http://www.lisdatacenter.org}{Luxembourg Income Study} data as the standard.
\item \verb+abs_red+: Estimated absolute redistribution, the number of Gini-index points market-income inequality is reduced due to taxes and transfers: the difference between the \verb+gini_mkt+ and \verb+gini_disp+.
\item \verb+rel_red+: Estimated relative redistribution, the percentage reduction in market-income inequality due to taxes and transfers: the difference between the \verb+gini_mkt+ and \verb+gini_disp+, divided by \verb+gini_mkt+, multiplied by 100.
\end{itemize}
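
As a quick check on these definitions, the two redistribution measures can be recomputed by hand. The sketch below uses made-up Gini values (not SWIID estimates) in a small hypothetical data frame called `toy`; only the four variable names come from the list above.

```r
# Hypothetical Gini values for illustration only; these are NOT SWIID estimates
toy <- data.frame(gini_mkt  = c(50, 40),
                  gini_disp = c(40, 36))

# Absolute redistribution: Gini-index points by which taxes and
# transfers reduce market-income inequality
toy$abs_red <- toy$gini_mkt - toy$gini_disp

# Relative redistribution: that same reduction expressed as a
# percentage of market-income inequality
toy$rel_red <- (toy$gini_mkt - toy$gini_disp) / toy$gini_mkt * 100

toy
```

In this toy case the first row has an absolute redistribution of 10 Gini points and a relative redistribution of 20 percent.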

The variation in these variables across the 100 dataframes captures the uncertainty in the SWIID estimates. As described below, this format facilitates taking this uncertainty into account when conducting analyses. It does not, however, lend itself easily to tasks such plotting. The mean-plus-standard-error summary format is much better suited to such purposes; for this reason, the object `swiid_summary` is also included in the `swiid9_0.rda`.
The variation in these variables across the 100 dataframes captures the uncertainty in the SWIID estimates. As described below, this format facilitates taking this uncertainty into account when conducting analyses. It does not, however, lend itself easily to tasks such as plotting. The mean-plus-standard-error summary format is much better suited to such purposes; for this reason, the object `swiid_summary` is also included in `swiid9_1.rda`.

```{r plot, fig.height=3}
library(tidyverse)
library(here)
# Load the SWIID
load("swiid9_0.rda")
load("swiid9_1.rda")
# Plot SWIID gini_disp estimates for the United States
swiid_summary %>%
@@ -103,33 +103,35 @@ swiid_lac <- swiid %>%
\noindent The result is a new list of 100 data frames, each containing the SWIID data only for the countries of Latin America and the Caribbean.

# Merging
Merging additional data into the SWIID also needs to be done carefully. Suppose we wanted to do a (simplified) replication of Solt, Habel, and Grant's (-@Solt2011a) analysis of [World Values Survey](http://worldvaluessurvey.org) data on religiosity. As our measure of religiosity, we will use the WVS item on respondents' self-report of the importance of God to their lives, which is measured on a ten-point scale. Given secularization theory, we will need to control for GDP per capita, which we will calculate from information from the [Penn World Tables](www.ggdc.net/pwt) [@Feenstra2015]. Below we first load the PWT dataset and use it to generate a dataset of GDP per capita (in thousands of dollars). Then we load the WVS data, generate our variables of interest, and merge in our PWT data. Finally, we use `purrr::map()` to merge these data into each of the 100 SWIID dataframes.
Merging additional data into the SWIID also needs to be done carefully. Suppose we wanted to do a (simplified) replication of Solt, Habel, and Grant's -@Solt2011a analysis of [World Values Survey](http://worldvaluessurvey.org) data on religiosity. As our measure of religiosity, we will use the WVS item on respondents' self-report of the importance of God to their lives, which is measured on a ten-point scale. Given secularization theory, we will need to control for GDP per capita, which we will calculate from information from the [Penn World Tables](www.ggdc.net/pwt) [@Feenstra2015]. Below we first load the PWT dataset and use it to generate a dataset of GDP per capita (in thousands of dollars). Then we load the WVS data, generate our variables of interest, and merge in our PWT data. Finally, we use `purrr::map()` to merge these data into each of the 100 SWIID dataframes.
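
The pattern of applying one join to every element of a list can be seen in miniature first. This is only a sketch under stated assumptions: `toy_swiid`, `toy_extra`, and `toy_merged` are hypothetical names, all values are made up, and the direction and type of join used in the vignette's own chunk may differ.

```r
library(dplyr)
library(purrr)

# Two tiny stand-ins for the 100 SWIID dataframes (made-up values)
toy_swiid <- list(
  tibble(country = c("A", "B"), year = 2000, gini_disp = c(30, 45)),
  tibble(country = c("A", "B"), year = 2000, gini_disp = c(31, 44))
)

# Stand-in for the additional data to merge in (made-up values)
toy_extra <- tibble(country = c("A", "B"), year = 2000, gdppc = c(20, 5))

# Apply the same join to every dataframe in the list
toy_merged <- toy_swiid %>%
  map(~ left_join(.x, toy_extra, by = c("country", "year")))

length(toy_merged)      # still one dataframe per draw
names(toy_merged[[1]])  # original columns plus gdppc
```

Because the join is applied inside `map()`, the result remains a list with one dataframe per posterior draw, which is what the later analysis steps expect.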

```{r 03.5-prep, include=FALSE, cache=FALSE}
```{r 03.5-prep, include=FALSE}
library(tidyverse)
library(countrycode)
```

```{r 04-merge, cache=FALSE}
```{r 04-merge}
library(haven)
# Get GDP per capita data from the Penn World
# Tables, Version 9.1 (Feenstra et al. 2015)
# download.file("https://www.rug.nl/ggdc/docs/pwt91.dta",
# "pwt91.dta")
# Tables, Version 10.0 (Feenstra et al. 2015)
# download.file("https://www.rug.nl/ggdc/docs/pwt100.dta",
# "pwt100.dta")
pwt91_gdppc <- read_dta("pwt91.dta") %>%
transmute(country = country,
pwt100_gdppc <- read_dta("pwt100.dta") %>%
transmute(country = countrycode(country,
origin = "country.name",
destination = "country.name"),
year = year,
gdppc = rgdpe/pop/1000) %>%
filter(!is.na(gdppc))
# Get World Values Survey 7-wave data (from
# http://www.worldvaluessurvey.org/WVSDocumentationWVL.jsp),
# generate variables of interest, and merge in the PWT data
wvs <- read_dta("WVS_Longitudinal_1981_2016_stata_v20180912.dta",
wvs <- read_dta("WVS_TimeSeries_stata_v1_6.dta",
encoding = "latin1") %>%
transmute(country = countrycode(S003,
transmute(country = countrycode(S003,
origin = "iso3n",
destination = "country.name"),
year = as.numeric(S020),
@@ -138,7 +140,7 @@ wvs <- read_dta("WVS_Longitudinal_1981_2016_stata_v20180912.dta",
educ = ifelse(X025>0, X025, NA),
age = ifelse(X003>0, X003, NA)) %>%
filter(complete.cases(.)) %>%
left_join(pwt91_gdppc, by = c("country", "year"))
left_join(pwt100_gdppc, by = c("country", "year"))
# Merge the WVS (and PWT) data into all 100 SWIID dataframes
wvs_swiid <- swiid %>%
@@ -148,10 +150,10 @@ wvs_swiid <- swiid %>%
# Analyzing
Once the desired subset has been extracted and any additional variables created or merged in, we may proceed to analysis. Again, we use `purrr::map()`, this time to estimate our model on each of the 100 different dataframes. Continuing with our example, we estimate a three-level linear mixed-effects model of individual responses nested in country-years nested in countries using `lme4::lmer()` [@Bates2015].

```{r 05-estimate, cache=FALSE}
```{r 05-estimate}
library(lme4)
# Estimate model on first five dataframes
# Estimate the model on all 100 dataframes
m1 <- wvs_swiid %>% map(~ lmer(religiosity ~ gini_disp + gdppc +
age + educ + male +
(1 | country/year),
@@ -160,7 +162,7 @@ m1 <- wvs_swiid %>% map(~ lmer(religiosity ~ gini_disp + gdppc +

We could now use Rubin's (\citeyear{Rubin1987}) rules to calculate the mean and standard error for each fixed-effect estimate across these 100 sets of results (these rules are implemented in the package `mitools` [@Lumley2014] and others), but instead we will use simulation. Using `sim` from the `arm` package [@Gelman2015], we generate the distribution of fixed-effects estimates in the results both for each dataframe and across the 100 dataframes. Then we will use these distributions to calculate estimates and standard errors for each fixed effect, putting them in a tidy data frame like that achieved by using `broom::tidy()` [@Robinson2016].
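
For reference, Rubin's rules themselves are short enough to write out. The following is a minimal sketch in base R, not the `mitools` implementation: `pool_rubin` is a hypothetical helper, and the inputs are made-up numbers. The pooled estimate is the mean of the per-dataset estimates, and the pooled variance is the average within-dataset variance plus the between-dataset variance inflated by a factor of 1 + 1/m, where m is the number of datasets.

```r
# Hypothetical helper (not part of mitools): pool one coefficient
# across m imputed datasets using Rubin's rules
pool_rubin <- function(est, se) {
  m       <- length(est)
  q_bar   <- mean(est)    # pooled point estimate
  within  <- mean(se^2)   # average within-imputation variance
  between <- var(est)     # between-imputation variance (m - 1 denominator)
  total   <- within + (1 + 1 / m) * between
  c(estimate = q_bar, std.error = sqrt(total))
}

# Made-up estimates and standard errors from three imputations
res <- pool_rubin(est = c(1, 2, 3), se = c(0.5, 0.5, 0.5))
res
```

In the running example, `est` and `se` would be a single coefficient's estimates and standard errors collected from the 100 fitted models. The simulation approach that follows reaches the same goal by a different route.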

```{r 06-sim, cache=FALSE}
```{r 06-sim}
# Simulate distribution of estimates
m1_sims <- m1 %>%
map(. %>% arm::sim(n.sims = 100)) %>%
@@ -179,7 +181,7 @@ m1_tidy <- tibble(term = names(m1_sims),

Having results in tidy form makes it easy to work with them further. Here we use the `dotwhisker` package [@Solt2015c] to present a dot-and-whisker plot of the results. The dots represent the estimated change on the ten-point religiosity scale for a change of two standard deviations in each independent variable; the whiskers represent the 95\% confidence intervals of these estimates.

```{r 07-results, fig.height=3.5, cache=FALSE}
```{r 07-results, fig.height=3.5}
library(dotwhisker)
# Plot the results
@@ -205,5 +207,5 @@ m1_tidy %>%
Please cite the SWIID by referring to its article of record and including the version number and date of release:

\begin{hangparas}{.25in}{1}
Solt, Frederick. 2020. "Measuring Income Inequality Across Countries and Over Time: The Standardized World Income Inequality Database." \emph{Social Science Quarterly}. SWIID Version 9.0, October 2020.
Solt, Frederick. 2020. ``Measuring Income Inequality Across Countries and Over Time: The Standardized World Income Inequality Database.'' \emph{Social Science Quarterly}. SWIID Version 9.1, May 2021.
\end{hangparas}
Binary file modified vignette/R_swiid.pdf
Binary file not shown.
13 changes: 0 additions & 13 deletions vignette/R_swiid_cache/latex/__packages

This file was deleted.

Binary file modified vignette/R_swiid_files/figure-latex/07-results-1.pdf
Binary file not shown.
Binary file modified vignette/R_swiid_files/figure-latex/plot-1.pdf
Binary file not shown.
