update and clean up

fsolt · May 20, 2021 · c942abf
1 parent d884b45
Showing 11 changed files with 23 additions and 34 deletions.
diff --git a/vignette/R_swiid.Rmd b/vignette/R_swiid.Rmd
@@ -40,32 +40,32 @@ header-includes:
 
 ```{r 01-preamble, include=FALSE}
 library(knitr)
-opts_chunk$set(concordance=TRUE, cache=TRUE, warning=FALSE, message=FALSE)
+opts_chunk$set(concordance=TRUE, warning=FALSE, message=FALSE)
 ```
 
 The Standardized World Income Inequality Database (SWIID) takes a Bayesian approach to standardizing observations collected from the [OECD Income Distribution Database](http://www.oecd.org/social/inequality.htm), the [Socio-Economic Database for Latin America and the Caribbean generated by CEDLAS and the World Bank](http://sedlac.econo.unlp.edu.ar/eng/), [Eurostat](http://epp.eurostat.ec.europa.eu), the [World Bank's PovcalNet](http://iresearch.worldbank.org/PovcalNet/index.htm), the \href{http://interwp.cepal.org/sisgen/ConsultaIntegrada.asp?idIndicador=250\&idioma=e}{UN Economic Commission for Latin America and the Caribbean}, national statistical offices around the world, and many other sources.  [Luxembourg Income Study](http://www.lisdatacenter.org) data serves as the standard.  
 
 As described in @Solt2020, the SWIID maximizes the comparability of available income inequality data for the broadest possible sample of countries and years.  But incomparability remains, and it is sometimes substantial.  This remaining incomparability is reflected in the standard errors of the SWIID estimates, making it often crucial to take this uncertainty into account when making comparisons across countries or over time [@Solt2009, 238; @Solt2016_nv, 14; @Solt2020, 1196].  It was once the case that incorporating the standard errors into an analysis required considerable effort.  It is now straightforward.
 
-In version 9.0 of the SWIID, the inequality estimates and their associated uncertainty are represented by 100 draws from the posterior distribution: for any given observation, the differences across these imputations capture the uncertainty in the estimate.  The `swiid9_0.zip` includes the file `swiid9_0.rda`, which is pre-formatted to facilitate taking this uncertainty into account.  The following sections describe how to subset the data, merge in additional variables, and do analyses.
+In version 9.1 of the SWIID, the inequality estimates and their associated uncertainty are represented by 100 draws from the posterior distribution: for any given observation, the differences across these imputations capture the uncertainty in the estimate.  The `swiid9_1.zip` includes the file `swiid9_1.rda`, which is pre-formatted to facilitate taking this uncertainty into account.  The following sections describe how to subset the data, merge in additional variables, and do analyses.
 
 # Getting Started
-Loading the file `swiid9_0.rda` adds `swiid` to the Global Environment, a list of 100 dataframes.  Each of these dataframes includes, in addition to country and year identifiers, the following four variables:
+Loading the file `swiid9_1.rda` adds `swiid`, a list of 100 dataframes, to the global environment.  Each of these dataframes includes, in addition to country and year identifiers, the following four variables:
 \begin{itemize}
 	\item \verb+gini_disp+: Estimate of Gini index of inequality in equivalized (square root scale) household disposable (post-tax, post-transfer) income, using \href{http://www.lisdatacenter.org}{Luxembourg Income Study} data as the standard.
 	\item \verb+gini_mkt+: Estimate of Gini index of inequality in equivalized (square root scale) household market (pre-tax, pre-transfer) income, using \href{http://www.lisdatacenter.org}{Luxembourg Income Study} data as the standard.	
 	\item \verb+abs_red+: Estimated absolute redistribution, the number of Gini-index points market-income inequality is reduced due to taxes and transfers: the difference between the \verb+gini_mkt+ and \verb+gini_disp+.
 	\item \verb+rel_red+: Estimated relative redistribution, the percentage reduction in market-income inequality due to taxes and transfers: the difference between the \verb+gini_mkt+ and \verb+gini_disp+, divided by \verb+gini_mkt+, multiplied by 100.
 \end{itemize}
 
-The variation in these variables across the 100 dataframes captures the uncertainty in the SWIID estimates.  As described below, this format facilitates taking this uncertainty into account when conducting analyses.  It does not, however, lend itself easily to tasks such as plotting.  The mean-plus-standard-error summary format is much better suited to such purposes; for this reason, the object `swiid_summary` is also included in the `swiid9_0.rda`.
+The variation in these variables across the 100 dataframes captures the uncertainty in the SWIID estimates.  As described below, this format facilitates taking this uncertainty into account when conducting analyses.  It does not, however, lend itself easily to tasks such as plotting.  The mean-plus-standard-error summary format is much better suited to such purposes; for this reason, the object `swiid_summary` is also included in the `swiid9_1.rda`.
 
 ```{r plot, fig.height=3}
 library(tidyverse)
 library(here)
 
 # Load the SWIID
-load("swiid9_0.rda")
+load("swiid9_1.rda")
 
 # Plot SWIID gini_disp estimates for the United States
 swiid_summary %>% 
@@ -103,33 +103,35 @@ swiid_lac <- swiid %>%
 \noindent The result is a new list of 100 data frames, each containing the SWIID data only for the countries of Latin America and the Caribbean.
 
 # Merging
-Merging additional data into the SWIID also needs to be done carefully.  Suppose we wanted to do a (simplified) replication of Solt, Habel, and Grant's (-@Solt2011a) analysis of [World Values Survey](http://worldvaluessurvey.org) data on religiosity.  As our measure of religiosity, we will use the WVS item on respondents' self-report of the importance of God to their lives, which is measured on a ten-point scale.  Given secularization theory, we will need to control for GDP per capita, which we will calculate from information from the [Penn World Tables](www.ggdc.net/pwt) [@Feenstra2015].  Below we first load the PWT dataset and use it to generate a dataset of GDP per capita (in thousands of dollars).  Then we load the WVS data, generate our variables of interest, and merge in our PWT data.  Finally, we use `purrr::map()` to merge these data into each of the 100 SWIID dataframes.
+Merging additional data into the SWIID also needs to be done carefully.  Suppose we wanted to do a (simplified) replication of Solt, Habel, and Grant's -@Solt2011a analysis of [World Values Survey](http://worldvaluessurvey.org) data on religiosity.  As our measure of religiosity, we will use the WVS item on respondents' self-report of the importance of God to their lives, which is measured on a ten-point scale.  Given secularization theory, we will need to control for GDP per capita, which we will calculate from information from the [Penn World Tables](www.ggdc.net/pwt) [@Feenstra2015].  Below we first load the PWT dataset and use it to generate a dataset of GDP per capita (in thousands of dollars).  Then we load the WVS data, generate our variables of interest, and merge in our PWT data.  Finally, we use `purrr::map()` to merge these data into each of the 100 SWIID dataframes.
 
-```{r 03.5-prep, include=FALSE, cache=FALSE}
+```{r 03.5-prep, include=FALSE}
 library(tidyverse)
 library(countrycode)
 ```
 
-```{r 04-merge, cache=FALSE}
+```{r 04-merge}
 library(haven)
 
 # Get GDP per capita data from the Penn World 
-# Tables, Version 9.1 (Feenstra et al. 2015)
-# download.file("https://www.rug.nl/ggdc/docs/pwt91.dta",
-#               "pwt91.dta")
+# Tables, Version 10.0 (Feenstra et al. 2015)
+# download.file("https://www.rug.nl/ggdc/docs/pwt100.dta",
+#               "pwt100.dta")
 
-pwt91_gdppc <- read_dta("pwt91.dta") %>% 
-    transmute(country = country,
+pwt100_gdppc <- read_dta("pwt100.dta") %>% 
+    transmute(country = countrycode(country,
+                                    origin = "country.name",
+                                    destination = "country.name"),
               year = year,
               gdppc = rgdpe/pop/1000) %>% 
     filter(!is.na(gdppc)) 
 
 # Get World Values Survey 7-wave data (from
 # http://www.worldvaluessurvey.org/WVSDocumentationWVL.jsp),
 # generate variables of interest, and merge in the PWT data
-wvs <- read_dta("WVS_Longitudinal_1981_2016_stata_v20180912.dta",
+wvs <- read_dta("WVS_TimeSeries_stata_v1_6.dta",
                 encoding = "latin1") %>% 
-    transmute(country = countrycode(S003, 
+    transmute(country = countrycode(S003,
                                     origin = "iso3n",
                                     destination = "country.name"),
               year = as.numeric(S020),
@@ -138,7 +140,7 @@ wvs <- read_dta("WVS_Longitudinal_1981_2016_stata_v20180912.dta",
               educ = ifelse(X025>0, X025, NA),
               age = ifelse(X003>0, X003, NA)) %>% 
     filter(complete.cases(.)) %>% 
-    left_join(pwt91_gdppc, by = c("country", "year"))
+    left_join(pwt100_gdppc, by = c("country", "year"))
 
 # Merge the WVS (and PWT) data into all 100 SWIID dataframes
 wvs_swiid <- swiid %>%
@@ -148,10 +150,10 @@ wvs_swiid <- swiid %>%
 # Analyzing
 Once the desired subset has been extracted and any additional variables created or merged in, we may proceed to analysis.  Again, we use `purrr::map()`, this time to estimate our model on each of the 100 different dataframes. Continuing with our example, we estimate a three-level linear mixed-effects model of individual responses nested in country-years nested in countries using `lme4::lmer()` [@Bates2015].
 
-```{r 05-estimate, cache=FALSE}
+```{r 05-estimate}
 library(lme4)
 
-# Estimate model on first five dataframes
+# Estimate the model on all 100 dataframes
 m1 <- wvs_swiid %>% map(~ lmer(religiosity ~ gini_disp + gdppc +
                                  age + educ + male +
                                  (1 | country/year),
@@ -160,7 +162,7 @@ m1 <- wvs_swiid %>% map(~ lmer(religiosity ~ gini_disp + gdppc +
 
 We could now use Rubin's (\citeyear{Rubin1987}) rules to calculate the mean and standard error for each fixed-effect estimate across these 100 sets of results (these rules are implemented in the package `mitools` [@Lumley2014] and others), but instead we will use simulation.  Using `sim` from the `arm` package [@Gelman2015], we generate the distribution of fixed-effects estimates in the results both for each dataframe and across the 100 dataframes.  Then we will use these distributions to calculate estimates and standard errors for each fixed effect, putting them in a tidy data frame like that achieved by using `broom::tidy()` [@Robinson2016].
 
-```{r 06-sim, cache=FALSE}
+```{r 06-sim}
 # Simulate distribution of estimates
 m1_sims <- m1 %>% 
     map(. %>% arm::sim(n.sims = 100)) %>%
@@ -179,7 +181,7 @@ m1_tidy <- tibble(term = names(m1_sims),
 
 Having results in tidy form makes it easy to work with them further.  Here we use the `dotwhisker` package [@Solt2015c] to present a dot-and-whisker plot of the results.  The dots represent the estimated change on the ten-point religiosity scale for a change of two standard deviations in each independent variable; the whiskers represent the 95\% confidence intervals of these estimates.
 
-```{r 07-results, fig.height=3.5, cache=FALSE}
+```{r 07-results, fig.height=3.5}
 library(dotwhisker)
 
 # Plot the results                      
@@ -205,5 +207,5 @@ m1_tidy %>%
 Please cite to the SWIID by referring to its article of record and including the version number and date of release:
 
 \begin{hangparas}{.25in}{1}
-Solt, Frederick. 2020. "Measuring Income Inequality Across Countries and Over Time: The Standardized World Income Inequality Database." \emph{Social Science Quarterly}.  SWIID Version 9.0, October 2020.
+Solt, Frederick. 2020. ``Measuring Income Inequality Across Countries and Over Time: The Standardized World Income Inequality Database.'' \emph{Social Science Quarterly}.  SWIID Version 9.1, May 2021.
 \end{hangparas}
diff --git a/vignette/R_swiid.pdf b/vignette/R_swiid.pdf
diff --git a/vignette/R_swiid_cache/latex/3-subset_94b396c825679bff56cf4c56b6b88022.RData b/vignette/R_swiid_cache/latex/3-subset_94b396c825679bff56cf4c56b6b88022.RData
diff --git a/vignette/R_swiid_cache/latex/3-subset_94b396c825679bff56cf4c56b6b88022.rdb b/vignette/R_swiid_cache/latex/3-subset_94b396c825679bff56cf4c56b6b88022.rdb
diff --git a/vignette/R_swiid_cache/latex/3-subset_94b396c825679bff56cf4c56b6b88022.rdx b/vignette/R_swiid_cache/latex/3-subset_94b396c825679bff56cf4c56b6b88022.rdx
diff --git a/vignette/R_swiid_cache/latex/__packages b/vignette/R_swiid_cache/latex/__packages
diff --git a/vignette/R_swiid_cache/latex/plot_2a71ef39260d85d7188d02c39865286d.RData b/vignette/R_swiid_cache/latex/plot_2a71ef39260d85d7188d02c39865286d.RData
diff --git a/vignette/R_swiid_cache/latex/plot_2a71ef39260d85d7188d02c39865286d.rdb b/vignette/R_swiid_cache/latex/plot_2a71ef39260d85d7188d02c39865286d.rdb
diff --git a/vignette/R_swiid_cache/latex/plot_2a71ef39260d85d7188d02c39865286d.rdx b/vignette/R_swiid_cache/latex/plot_2a71ef39260d85d7188d02c39865286d.rdx
diff --git a/vignette/R_swiid_files/figure-latex/07-results-1.pdf b/vignette/R_swiid_files/figure-latex/07-results-1.pdf
diff --git a/vignette/R_swiid_files/figure-latex/plot-1.pdf b/vignette/R_swiid_files/figure-latex/plot-1.pdf