Exploring Poverty with Principal Component Analysis

Project Summary

This Data Expedition introduces students to statistical analysis in the field of international development. Students construct an index of wealth/poverty based on asset holdings using four datasets collected under the umbrella of the Living Standards Measurement Study (LSMS) project at the World Bank. We selected countries to represent different continents with comparable and recent survey data: Bulgaria (2007), Tajikistan (2009), Tanzania (2010-2011), and Panama (2008).

First, we construct an index of wealth based on household assets in each country using Principal Component Analysis (PCA). Once the index is constructed, students use regression analysis to identify the main drivers of wealth/poverty in the different countries. The explanatory variables include health, education, age, relationship to the household head, and sex.
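As a rough illustration of the index construction, the sketch below computes the first principal component of a handful of binary asset indicators and uses it to score households. It is a minimal sketch, not the course's own code: the data are simulated, the asset names and thresholds are made up, and the first component is found by power iteration on the correlation matrix so that only the Python standard library is needed.

```python
import random
import statistics

# Hypothetical asset names; the real column names in the course data may differ.
ASSETS = ["refrigerator", "tv", "bike", "computer", "car"]
# Made-up ownership thresholds: a higher threshold means a rarer asset.
THRESHOLDS = [-0.5, 0.0, 0.3, 1.0, 1.5]

rng = random.Random(42)

# Simulate 500 households: a latent wealth level drives ownership of each asset.
wealth = [rng.gauss(0, 1) for _ in range(500)]
households = [
    [1 if w + rng.gauss(0, 0.5) > t else 0 for t in THRESHOLDS]
    for w in wealth
]


def standardize(rows):
    """Center each column at 0 and scale it to unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_component(z, iterations=500):
    """Loadings and variance of the first principal component (power iteration)."""
    n, p = len(z), len(z[0])
    # Correlation matrix of the standardized columns.
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Variance captured by the component (the leading eigenvalue).
    variance = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance


z = standardize(households)
loadings, variance = first_component(z)

# Each household's index is the weighted sum of its standardized assets.
scores = [sum(x * l for x, l in zip(row, loadings)) for row in z]

for name, loading in zip(ASSETS, loadings):
    print(f"{name:>12}: {loading:.3f}")
print(f"share of variance explained: {variance / len(ASSETS):.2f}")
```

In this simulated setting every asset loads positively on the first component, so a higher score corresponds to owning more (and rarer) assets. With real survey data the loadings should be inspected before the component is read as a wealth index.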

Themes and Categories

Year: 2018

Graduate Students: Claire Le Barbenchon (claire.lebarbenchon@duke.edu) and Federico Ferrari (federico.ferrari@duke.edu)

Faculty: Mine Çetinkaya-Rundel

Course: STA 112FS: Data Science

Guiding Questions

This project seeks to help students answer some key questions in international development: How can we get measures of poverty that do not involve income? What are some of the demographic and human capital variables that can help explain wealth levels?

In developing countries, income may not be an ideal measure of well-being. In agricultural societies, income is tied to the harvest and is thus sensitive to the season in which the survey is administered. Furthermore, income often does not capture in-kind revenue and is a particularly poor measure for households that practice subsistence agriculture. For this reason, development experts often use poverty indices based on household assets to better understand levels of well-being. Household asset ownership does not vary seasonally and is not tied to payment type, making it a more stable and often preferred measure of poverty.

Conceptual and Statistical Questions:

  • Why we might want to create a poverty or wealth index
  • How to use Principal Component Analysis (PCA) to construct indices
  • How to interpret PCA results and compare across countries
  • Using regression to understand the factors that affect wealth across countries
  • Using and interpreting interaction terms
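To make the last two items concrete, here is a minimal sketch of a regression with an interaction term, again on simulated data. The variable names, the "true" coefficients, and the two-country setup are assumptions made for illustration only; the model is fit by solving the normal equations directly so that no external library is required.

```python
import random

rng = random.Random(0)

# Made-up "true" coefficients used only to simulate data for this sketch.
TRUE = {
    "intercept": -1.0,
    "education": 0.10,
    "country_b": 0.5,
    "education_x_country_b": 0.15,
}

rows, index = [], []
for i in range(400):
    country_b = i % 2               # 0 = reference country, 1 = country B
    education = rng.uniform(0, 12)  # years of schooling
    x = [1.0, education, country_b, education * country_b]
    rows.append(x)
    # Simulated wealth index: linear predictor plus random noise.
    index.append(sum(b * v for b, v in zip(TRUE.values(), x)) + rng.gauss(0, 0.3))


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * w for u, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


coef = dict(zip(TRUE, ols(rows, index)))

# The education slope differs by country because of the interaction term.
slope_reference = coef["education"]
slope_country_b = coef["education"] + coef["education_x_country_b"]

for name, value in coef.items():
    print(f"{name:>24}: {value:.3f}")
print(f"education slope, reference country: {slope_reference:.3f}")
print(f"education slope, country B:         {slope_country_b:.3f}")
```

The coefficient on education is the slope in the reference country only. The interaction coefficient is the difference in slopes between the two countries, so the slope in country B is the sum of the two.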

The plots below show the distribution of wealth in each of the four countries. They can be used to answer descriptive questions about inequality.

Figure 1: Distribution of first component
Figure 2: Boxplot of first component

Data

All the data sets used in this project were collected under the umbrella of the Living Standards Measurement Study (LSMS) project at the World Bank. Since the 1980s, this large initiative has collected survey data from developing countries, using comparable survey instruments, to better understand health, education, poverty, employment, and other indicators of well-being around the world.

While the data are public use, individuals must register to download them at http://microdata.worldbank.org/index.php/catalog/central

We created a simplified data set for instructional purposes, randomly selecting 2,500 observations per country so that sample sizes are uniform across countries.
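The subsampling step can be sketched as follows. This is an illustrative sketch only: the record layout, the function name sample_per_group, the population sizes, and the seed are assumptions, not the procedure actually used to build the course file.

```python
import random
from collections import Counter, defaultdict


def sample_per_group(records, key, n, seed=0):
    """Draw n records without replacement from each group defined by `key`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    sample = []
    for name in sorted(groups):
        sample.extend(rng.sample(groups[name], n))
    return sample


# Made-up survey sizes, used only to show that unequal inputs become equal samples.
sizes = {"Bulgaria": 3000, "Panama": 4000, "Tajikistan": 5000, "Tanzania": 6000}
records = [
    {"country": country, "person_id": i}
    for country, size in sizes.items()
    for i in range(size)
]

sample = sample_per_group(records, key="country", n=2500)
counts = Counter(record["country"] for record in sample)
print(counts)
```

Fixing the seed makes the draw reproducible, which matters when the same instructional file needs to be regenerated later.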

household_dummy.csv

The dataset contains 2,500 observations for each of the 4 countries and 20 variables: ID variables for the individual and the household; a country variable; demographic variables (age, gender, marital status, relationship to household head); education; a health proxy variable (hospitalization in the past 12 months); water access; and ten household assets (refrigerator, TV, bike, motorbike, computer, car, video, stereo, stove, sewing machine). The full dimensions are thus 10,000 x 20, including the country variable.

Bibliography

The World Bank, Living Standards Measurement Study LSMS (2007). Bulgaria Multitopic Household Survey 2007 [BGR_2007_MTHS_v01_M]. Retrieved from http://microdata.worldbank.org/index.php/catalog/2273/study-description

The World Bank, Living Standards Measurement Study - Integrated Surveys on Agriculture (2010-2011). Tanzania - National Panel Survey 2010-2011, Wave 2 [TZA_2010_NPS-R2_v01_M]. Retrieved from http://microdata.worldbank.org/index.php/catalog/1050

The World Bank, Living Standards Measurement Study LSMS (2008). Panama - Encuesta de Niveles de Vida 2008 [PAN_2008_ENV_v01_M]. Retrieved from http://microdata.worldbank.org/index.php/catalog/70

Tajikistan Statistical Agency, Living Standards Measurement Study LSMS (2009). Tajikistan - Living Standards Survey 2009 [TJK_2009_TLSS_v01_M]. Retrieved from http://microdata.worldbank.org/index.php/catalog/73
