Data topic

Data topic sign-up sheet

Fork the course website, edit this page accordingly, and create a pull request.


Useful data inventories:

Awesome Public Datasets | Google Dataset Search | Please add more!


Week 3: Andrew Messamore’s Data Topic.

Dataset Introduction

This is the City of Chicago’s open data API. I hope to use it to examine how civic organizations affect housing quality and other important social issues.

  • Kassen, Maxat. “A promising phenomenon of open data: A case study of the Chicago open data project.” Government Information Quarterly 30, no. 4 (2013): 508-513.
  • This article presents a case study of the open data project in the Chicago area. The main purpose of the research is to explore the empowering potential of the open data phenomenon at the local level as a platform for promoting civic engagement projects, and to provide a framework for future research and hypothesis testing. Today the main challenge in realizing any e-government project is the traditional top-down administrative mechanism by which such projects are carried out, with practically no input from members of civil society. In this respect, the author argues that the open data concept, realized at the local level, may provide a real platform for promoting proactive civic engagement. By harnessing the collective wisdom of local communities, including their knowledge and visions of local challenges, governments could respond to citizens’ needs in a more productive and cost-efficient manner. Open data-driven projects focused on visualizing environmental issues, mapping utility management, evaluating political lobbying, social benefits, closing the digital divide, etc. are only some examples of such prospects. These projects are perhaps harbingers of a new political reality in which interactions among citizens at the local level will play a more important role than communication between civil society and government, owing to the empowering potential of the open data concept.

Week 5: Harshal Zalke’s Data Topic

Dataset Introduction: IMF Data

I will be using the IMF database to gather data on loans provided to different developing countries and the conditions attached to each loan.

  • Dreher, Axel, Jan-Egbert Sturm, and James Raymond Vreeland. “Global horse trading: IMF loans for votes in the United Nations Security Council.” European Economic Review 53, no. 7 (2009): 742-757.
  • This paper studies the relationship between IMF loans approved for developing countries and those countries’ membership status in the United Nations Security Council. The authors propose that temporary members of the United Nations Security Council are more likely to have their loan requests approved. To test this, they use panel data for 197 countries over the period from 1951 to 2004. They count the number of loans approved for each country in these years, as well as the number of conditions attached to each loan. The study finds that temporary members of the UNSC are not only more likely to have their loan requests approved but also receive loans with fewer conditions attached. The authors conclude that IMF loans are a mechanism by which the fund’s shareholders win favor with voting members of the United Nations.

Week 5 - Oct 7: Alex Meed’s Data Topic.

Dataset background

The Federal Election Commission publishes data on campaign committees for Congress and the presidency online. Using this information, I can analyze contributions made to various campaigns for public office. I can also attempt to correlate this data with other datasets to determine the impact of campaign contributions on votes in Congress, electoral results, and other pertinent outcomes.

  • Gimpel, James G., and James H. Glenn. “Racial Proximity and Campaign Contributing.” Electoral Studies 57 (February 2019): 79-87.
  • Gimpel and Glenn use FEC campaign data to analyze whether potential donors to federal campaigns are more active in areas where black and white residents live in close proximity. The authors use data from the American Community Survey to estimate racial population proportions by ZIP code. They then link these estimates to the donor ZIP codes listed in the FEC data from 2004 to 2014. The authors find that areas in the American South with high levels of mixed settlement produce high levels of campaign contributions.
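The core data step in a study like Gimpel and Glenn’s is to aggregate donations to the ZIP-code level and then relate them to a ZIP-level demographic measure. The sketch below shows that join-and-correlate step under stated assumptions: contributions arrive as hypothetical (ZIP, amount) pairs, some ZIPs may be in nine-digit ZIP+4 form, and `mixing_by_zip` is a hypothetical per-ZIP racial mixing score supplied by the analyst. It uses a plain Pearson correlation, not the authors’ actual proximity measure or regression model.

```python
from collections import defaultdict
from math import sqrt


def total_by_zip(contributions):
    """Sum contribution amounts per 5-digit ZIP code.

    Assumption: each record is a (zip_code, amount) pair, and ZIP+4 codes
    are truncated to their first five digits.
    """
    totals = defaultdict(float)
    for zip_code, amount in contributions:
        totals[zip_code[:5]] += amount
    return dict(totals)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlate_with_mixing(contributions, mixing_by_zip):
    """Correlate per-ZIP contribution totals with a hypothetical mixing score.

    Only ZIP codes present in both inputs are used.
    """
    totals = total_by_zip(contributions)
    shared = sorted(set(totals) & set(mixing_by_zip))
    xs = [mixing_by_zip[z] for z in shared]
    ys = [totals[z] for z in shared]
    return pearson(xs, ys)


# Toy, made-up inputs for illustration only.
contributions = [("786011234", 100.0), ("78601", 50.0), ("78602", 20.0), ("78603", 300.0)]
mixing_by_zip = {"78601": 0.4, "78602": 0.1, "78603": 0.9}
r = correlate_with_mixing(contributions, mixing_by_zip)
```

A simple correlation like this only describes association; controlling for income, population, and regional effects would require a regression model on the same ZIP-level table.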

Week 6: Xu Meiying’s Data Topic.

Dataset Introduction

Alibaba is an internet-based e-commerce website that covers business-to-business online marketplaces, retail and payment platforms, a shopping search engine and data-centric cloud computing services. Alibaba requires that manufacturers provide detailed information about the company, including location, employee size, registered capital and a link to the company’s website. Alibaba allows manufacturers to subscribe as Gold Suppliers, a premium membership that provides promotional opportunities to maximise the exposure and return on investment of the suppliers. To qualify for a Gold Supplier membership, a supplier must complete an authentication and verification process by a reputable third-party security service provider appointed by Alibaba. Once approved, Gold Supplier members are authorised to display the Gold Supplier icon to demonstrate their authenticity. In the study below, limiting the search to Gold Suppliers ensured that the manufacturing companies identified actually existed and excluded fake companies from the search results.

  • T., Jiang, N., Grana, R., Ling, P. M., & Glantz, S. A. (2016). “A content analysis of electronic cigarette manufacturer websites in China”. Tobacco control, 25(2), 188-194.
  • The paper used Alibaba to study the websites of electronic cigarette (e-cigarette) manufacturers in China and describe how they market their products. From March to April 2013, researchers used two search keywords, ‘electronic cigarette’ (Dian Zi Xiang Yan in Chinese) and ‘manufacturer’ (Sheng Chan Chang Jia in Chinese), to search for e-cigarette manufacturers in China on Alibaba. A total of 18 websites of 12 e-cigarette manufacturers in China were analysed using a coding guide that included 14 marketing claims. The coding guide consisted of seven sections with 90 total items: (1) basic information about the site, (2) regulatory language, (3) contact information, (4) products, (5) claims, (6) messaging and (7) promotion.
  • The paper finds that health-related benefits were claimed most frequently (89%), followed by claims of no secondhand smoke (SHS) exposure (78%) and utility for smoking cessation (67%). A wide variety of flavours, celebrity endorsements and e-cigarettes specifically for women were presented. None of the websites had an age restriction on access or referred to government regulation or lawsuits. Instructions on how to use e-cigarettes appeared on 17% of the websites.
  • The paper concludes that better regulation of e-cigarette marketing messages on manufacturers’ websites is needed in China. The frequent claims of health benefits and smoking cessation, and the strategies appealing to youth and women, especially those targeting women, are concerning. Regulators should prohibit marketing claims of health benefits, no SHS exposure and value for smoking cessation in China until health-related, quality and safety issues have been adequately addressed. To prevent e-cigarettes from serving as a path to nicotine addiction, messages targeting youth and women should be prohibited.

Week 9 - Nov 4: T Oladimeji’s Data Topic

Dataset background

This is the U.S. Securities and Exchange Commission’s data portal. My goal is to use it to study how CEO beliefs predict firm actions and performance.

  • Koch‐Bayram, Irmela F., and Georg Wernicke. “Drilled to obey? Ex‐military CEOs and financial misconduct.” Strategic Management Journal 39.11 (2018): 2943-2964.
  • Abstract: We examine the influence of CEOs’ military background on financial misconduct using two distinctive datasets. First, we make use of accounting and auditing enforcement releases (AAER) issued by the U.S. Securities and Exchange Commission (SEC), which contain intentional and substantial cases of financial fraud. Second, we use a dataset of “lucky grants,” which provide a measure of the likelihood of grant dates of CEOs’ stock options having been manipulated. Results for both datasets indicate that CEOs who served in the military are less inclined to be involved in fraudulent financial reporting and to backdate stock options. In addition, we find that these relationships are moderated by board oversight (CEO duality and independent directors on the board).

Week 11 - Nov 11: Ethan Tenison’s Data Topic

Dataset background

The Atlas of Economic Complexity is a data visualization tool (and dataset) that allows people to explore global trade flows across markets, track these dynamics over time and discover new growth opportunities for every country. The Atlas places the industrial capabilities and know-how of a country at the heart of its growth prospects, where the diversity and complexity of existing capabilities heavily influence how growth happens.

I hope to use this dataset to analyze the impact of the trade war initiated by the United States.

  • Hartmann, Dominik, et al. “Linking economic complexity, institutions, and income inequality.” World Development 93 (2017): 75-93.
  • Summary: A country’s mix of products predicts its subsequent pattern of diversification and economic growth. But does this product mix also predict income inequality? Here we combine methods from econometrics, network science, and economic complexity to show that countries exporting complex products—as measured by the Economic Complexity Index—have lower levels of income inequality than countries exporting simpler products. Using multivariate regression analysis, we show that economic complexity is a significant and negative predictor of income inequality and that this relationship is robust to controlling for aggregate measures of income, institutions, export concentration, and human capital. Moreover, we introduce a measure that associates a product to a level of income inequality equal to the average GINI of the countries exporting that product (weighted by the share the product represents in that country’s export basket). We use this measure together with the network of related products—or product space—to illustrate how the development of new products is associated with changes in income inequality. These findings show that economic complexity captures information about an economy’s level of development that is relevant to the ways an economy generates and distributes its income. Moreover, these findings suggest that a country’s productive structure may limit its range of income inequality. Finally, we make our results available through an online resource that allows for its users to visualize the structural transformation of over 150 countries and their associated changes in income inequality during 1963–2008.
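The product-level inequality measure in the summary can be read as a weighted average: for each product, take the Gini coefficients of the countries exporting it, weighting each country by the share that product represents in its export basket. The sketch below implements only that plain reading of the summary. The published measure may define “exporting” more strictly (for example, by a comparative-advantage threshold) and normalize the weights differently, so treat this as an illustration, not the authors’ exact formula. The input layout (nested dictionaries of export values) is an assumption.

```python
from collections import defaultdict


def product_gini(exports, gini):
    """Share-weighted average Gini of the countries exporting each product.

    exports: dict mapping country -> {product: export value} (assumed layout)
    gini:    dict mapping country -> Gini coefficient
    Returns a dict mapping product -> weighted average Gini.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for country, basket in exports.items():
        total = sum(basket.values())
        if total == 0 or country not in gini:
            continue  # skip countries with no exports or no Gini data
        for product, value in basket.items():
            share = value / total  # product's share of this country's exports
            weighted_sum[product] += share * gini[country]
            weight_total[product] += share
    return {
        product: weighted_sum[product] / weight_total[product]
        for product in weighted_sum
        if weight_total[product] > 0
    }


# Toy, made-up data: country A exports mostly cars, B exports only textiles.
exports = {"A": {"cars": 80.0, "textiles": 20.0}, "B": {"textiles": 100.0}}
gini = {"A": 0.30, "B": 0.50}
pgi = product_gini(exports, gini)
```

Here cars inherit A’s Gini (0.30), while textiles land between A and B, pulled toward B because textiles make up all of B’s exports but only a fifth of A’s.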

Week 12 - Nov 18: Lucas Sepulveda’s Data Topic.

The National Center for Charitable Statistics derives data from information that tax-exempt nonprofit organizations file with the IRS, resulting in the most comprehensive standardized data on tax-exempt organizations. The data is intended for researchers and policy-makers to use as a springboard for more in-depth survey or case-study research on nonprofits.

  • Bielefeld, W. (2000). Metropolitan Nonprofit Sectors: Findings from NCCS Data. Nonprofit and Voluntary Sector Quarterly, 29(2), 297–314
  • Data from the National Center for Charitable Statistics (NCCS) and other secondary sources was used to examine the nonprofit sectors of nine metropolitan regions. The results indicate that nonprofit sectors vary widely in terms of the numbers of organizations in them and the proportions of different types of providers. Moreover, the findings showed complex and intriguing relationships between nonprofit sectors and political culture, generosity, wealth, poverty, and heterogeneity. Traditionalistic sites had sectors with the opposite characteristics. The sectors in individualistic sites lay between these two patterns. Wealthier sites had larger, better-supported and secure sectors. Sites with higher poverty had less well supported sectors with smaller human service components. The most and least heterogeneous sites had the largest and smallest nonprofit sectors respectively. These findings bolster confidence in the use of NCCS data.

Week 13 - Nov 25: Ryan Anderson’s Data Topic.

Dataset Introduction

This is the U.S. Bureau of Labor Statistics website, which contains myriad datasets on national employment. I hope to use this data to present on the impact of automation on the workforce, including but not limited to shifting skill/education requirements, sector-level trends and projections, and wage inequality.

  • David H. Autor, 2019. “Work of the Past, Work of the Future,” AEA Papers and Proceedings, vol 109, pages 1-32.
  • Labor markets in U.S. cities today are vastly more educated and skill-intensive than they were five decades ago. Yet, urban non-college workers perform substantially less skilled work than decades earlier. This deskilling reflects the joint effects of automation and international trade, which have eliminated the bulk of non-college production, administrative support, and clerical jobs, yielding a disproportionate polarization of urban labor markets. The unwinding of the urban non-college occupational skill gradient has, I argue, abetted a secular fall in real non-college wages by: (1) shunting non-college workers out of specialized middle-skill occupations into low-wage occupations that require only generic skills; (2) diminishing the set of non-college workers that hold middle-skill jobs in high-wage cities; and (3) attenuating, to a startling degree, the steep urban wage premium for non-college workers that prevailed in earlier decades. Changes in the nature of work—many of which are technological in origin—have been more disruptive and less beneficial for non-college than college workers.

Week 15 - Dec 9: Mychal Warner’s Data Topic

Dataset Background

The World Inequality Database is an open database that contains data on the historical evolution of income and wealth distribution. The WID contains datasets on income and wealth inequality both within and between countries.

  • Marina Gindelsky. “Modeling and Forecasting Income Inequality in the United States.” Bureau of Economic Analysis. https://www.bea.gov/research/papers/2018/ August 201
  • Abstract: Recently, an idea has emerged that “the rich are getting richer and the poor are getting poorer”. Using tax data from Piketty, Saez, and Zucman (2017) (updated in the World Wealth & Income Database) and internal microdata from the Current Population Survey (1975-2015), this paper models inequality and performs pseudo-out-of-sample (2012-2015) and true out-of-sample (2016-2018) forecasts for 5 income inequality measures. The lowest forecast errors from the best models are found for distributional metrics, as compared to top income shares. While macroeconomic indicators, human capital, and labor force metrics often enhance models, measures of skill-biased technological change are not found to be robust predictors of inequality trends. Naive approaches often outperform more complex models.
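A pseudo-out-of-sample evaluation, as in the abstract, fits a model on the early part of a series, forecasts a held-out final window, and scores the forecasts against the values that actually occurred. The sketch below compares a naive “last value carries forward” forecast with a simple linear-trend forecast using root mean squared error. Both forecasters and the toy series are illustrative assumptions; they are not the models used in the paper.

```python
from math import sqrt


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon


def trend_forecast(history, horizon):
    """Extrapolate an ordinary least-squares straight line fitted to history."""
    n = len(history)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + h) for h in range(horizon)]


def rmse(predicted, actual):
    """Root mean squared forecast error."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def pseudo_out_of_sample(series, holdout, forecaster):
    """Fit on all but the last `holdout` points, then score the holdout window."""
    train, test = series[:-holdout], series[-holdout:]
    return rmse(forecaster(train, holdout), test)


# Toy series with a perfectly linear trend (made-up numbers).
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
naive_error = pseudo_out_of_sample(series, 2, naive_forecast)
trend_error = pseudo_out_of_sample(series, 2, trend_forecast)
```

On this artificially linear series the trend model wins; the paper’s point is that on real inequality data, such naive baselines are often hard to beat.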

Week 14 - Dec 2: H-1B VISA Data.

Dataset background

This is H-1B visa application data from 2006 to 2018, which includes applicants’ employers, wage rates, job titles, employers’ addresses, and more. I hope to identify features of H-1B visa applications, such as applicants’ average wage rates and their employers’ locations. I also hope to find out which jobs and employers have the most applicants, and whether H-1B visa applicants are, on average, paid less than native employees in the same job positions.

  • Watts, Julie R. “The H-1B visa: Free market solutions for business and labor.” Population Research and Policy Review 20 (2001): 143–156.
  • The debate on the H-1B visa often centers on the claim that the H-1B visa program opened the US information technology labor market to temporary, skilled immigrant labor. However, the author argues that this debate often obscures the fundamental flaws of the H-1B visa. These flaws privilege the IT industry at the expense of H-1B holders and domestic IT workers, and should be remedied to ensure that both business and labor abide by free-market principles. Employers’ ability to hire globally must be balanced by workers’ right to seek out better opportunities. Workers with H-1B visas are bound to a specific employer for 6 years, which restricts the ability of H-1B holders to compete in an open market and enables employers to pay salaries often below market value. This restriction also jeopardizes domestic IT workers by enabling IT companies to replace domestic workers with cheaper, temporary immigrant labor.
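The research questions for this dataset (which employers file the most applications, what average wages look like by job title, and whether H-1B wages fall below those of comparable native workers) reduce to a few group-by operations. The sketch below assumes hypothetical record fields (`employer`, `job_title`, `wage`) rather than the actual column names in the disclosure files, assumes all wages have already been converted to annual amounts, and takes native-worker wages from a separate benchmark source passed in as a dictionary.

```python
from collections import Counter, defaultdict


def summarize(applications):
    """Count applications per employer and average the wage per job title.

    Each application is a dict with hypothetical keys 'employer',
    'job_title', and 'wage' (assumed to be annualized already).
    """
    counts_by_employer = Counter(app["employer"] for app in applications)
    wages_by_title = defaultdict(list)
    for app in applications:
        wages_by_title[app["job_title"]].append(app["wage"])
    avg_wage_by_title = {
        title: sum(wages) / len(wages) for title, wages in wages_by_title.items()
    }
    return counts_by_employer, avg_wage_by_title


def wage_gaps(avg_wage_by_title, benchmark_by_title):
    """H-1B average minus benchmark wage for titles present in both inputs.

    A negative gap means H-1B applicants earn less than the benchmark.
    """
    return {
        title: avg_wage_by_title[title] - benchmark_by_title[title]
        for title in avg_wage_by_title
        if title in benchmark_by_title
    }


# Toy, made-up records for illustration only.
applications = [
    {"employer": "Acme", "job_title": "Software Engineer", "wage": 90000.0},
    {"employer": "Acme", "job_title": "Software Engineer", "wage": 110000.0},
    {"employer": "Beta", "job_title": "Data Analyst", "wage": 70000.0},
]
counts, averages = summarize(applications)
gaps = wage_gaps(averages, {"Software Engineer": 105000.0})
```

Matching on job title alone is a simplification; a fuller comparison would also hold location and experience level fixed, since both affect wages.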