Course Schedule

Understanding “Data”

Data Acquisition and Preprocessing

Research Design: Integrate Data Science with Social Science Research


Week 0 Pre-course


Week 1 1/27: Course introduction: why this course?

Before class

  • Readings:
    • Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature News, 533(7604), 452. doi:10.1038/533452a.
    • Briney, K. (2015). The Data Problem. In Data management for researchers: Organize, maintain and share your data for research success. Research Skills Series (Exeter, England). HOLLIS number: 014921191. Exeter, UK: Pelagic Publishing.
    • Cioffi-Revilla, C. (2017). Introduction to Computational Social Science. Texts in Computer Science. doi:10.1007/978-3-319-50131-4.
    • Gentzkow, M., & Shapiro, J. M. (2014d). Introduction. In Code and data for the social sciences: A practitioner’s guide.
    • Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., . . . Alstyne, M. V. (2009). Computational Social Science. Science, 323(5915), 721–723. doi:10.1126/science.1167742.

In class

  • Discussion and lecture on readings.
  • Course review: Syllabus, assignments, final project.

After class


Week 2 2/3: Data management and research life cycle

Before class

  • Readings:
    • Briney, K. (2015). Planning for Data Management. In Data management for researchers: Organize, maintain and share your data for research success. Research Skills Series (Exeter, England). HOLLIS number: 014921191. Exeter, UK: Pelagic Publishing.
    • Briney, K. (2015). The Data Lifecycle. In Data management for researchers: Organize, maintain and share your data for research success. Research Skills Series (Exeter, England). HOLLIS number: 014921191. Exeter, UK: Pelagic Publishing.
    • Ruane, J. M. (2016). Designing Ideas: What Do We Want to Know and How Can We Get There? In Introducing Social Research Methods: Essentials for Getting the Edge (pp. 67–92). Chichester, West Sussex, UK; Hoboken, NJ: John Wiley & Sons Inc.

In class

After class


Week 3 2/10: Data types and data visualization and interaction

Before class

  • Readings:
    • Gleicher, M., Albers, D., Walker, R., Jusufi, I., Hansen, C. D., & Roberts, J. C. (2011). Visual comparison for information visualization. Information Visualization, 10(4), 289–309. doi:10.1177/1473871611416549.
    • Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings 1996 IEEE Symposium on Visual Languages (pp. 336–343). doi:10.1109/VL.1996.545307.

In class

  • Discussion and lecture on readings.

After class


Week 4 2/17: Data structure, relational database/data dictionary, and file input/output (I/O)

Before class

  • Readings:
    • Gentzkow, M., & Shapiro, J. M. (2014e). Keys. In Code and data for the social sciences: A practitioner’s guide.
    • Swaroop C. H. (2013). Data Structures. In A Byte of Python.
    • Wickham, H. (2014). Tidy data. The Journal of Statistical Software, 59(10). http://www.jstatsoft.org/v59/i10/
    • Normalization of Database

In class

  • Discussion and lecture on readings.
  • Group practice: Form to Table (IRS 990 Form)

After class


Week 5 2/24: Text and relation as data

Before class

  • Required Readings:
    • Grimmer, J., & Stewart, B. M. (2013). Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3), 267–297. doi:10.1093/pan/mps028.
    • Provan, K. G., Veazie, M. A., Staten, L. K., & Teufel-Shone, N. I. (2005). The use of network analysis to strengthen community partnerships. Public Administration Review, 65(5), 603–613.
    • Borgatti, S. P., & Foster, P. C. (2003). The Network Paradigm in Organizational Research: A Review and Typology. Journal of Management, 29(6), 991–1013. doi:10.1016/S0149-2063(03)00087-4.

In class

  • Discussion and lecture on readings.
  • Group practice: collect network data.

After class


Week 6 3/2: Documenting data and version control

Before class

  • Readings:
    • Briney, K. (2015b). Documentation. In Data management for researchers: Organize, maintain and share your data for research success. Research Skills Series (Exeter, England). HOLLIS number: 014921191. Exeter, UK: Pelagic Publishing.
    • Broman, K. W., & Woo, K. H. (2017). Data organization in spreadsheets (tech. rep. No. e3183v1). PeerJ Inc. doi:10.7287/peerj.preprints.3183v1.
    • Gentzkow, M., & Shapiro, J. M. (2014f). Version Control. In Code and data for the social sciences: A practitioner’s guide.

In class

  • Discussion and lecture on readings.

After class


Week 7 3/9: Acquiring data: open data and open-source intelligence

Before class

  • Readings:
    • Briney, K. (2015c). Managing sensitive data. In Data management for researchers: Organize, maintain and share your data for research success. Research Skills Series (Exeter, England). HOLLIS number: 014921191. Exeter, UK: Pelagic Publishing.
    • Rasche, A., Morsing, M., & Wetter, E. (2019). Assessing the Legitimacy of “Open” and “Closed” Data Partnerships for Sustainable Development. Business & Society, 0007650319825876. doi:10.1177/0007650319825876.
    • Williams, H. J., & Blum, I. (2018). Defining Second Generation Open Source Intelligence (OSINT) for the Defense Enterprise. RAND Corporation.

In class

  • Discussion and lecture on readings.

After class


Week 8 3/23: Cleaning data: data preprocessing and organizing

Before class

  • Readings:
    • Briney, K. (2015d). Organization. In Data management for researchers: Organize, maintain and share your data for research success. Research Skills Series (Exeter, England).
    • Gentzkow, M., & Shapiro, J. M. (2014). Directories. In Code and data for the social sciences: A practitioner’s guide.
    • Miksa, T., Simms, S., Mietchen, D., & Jones, S. (2019). Ten principles for machine-actionable data management plans. PLOS Computational Biology, 15(3), e1006750. doi:10.1371/journal.pcbi.1006750.

In class

After class


Week 9 3/30: Sharing data: data policy and research ethics

Before class

  • Readings:
    • Menczer, F. (2008). Legal and ethical considerations in crawling/mining online social network data.
    • Ruane, J. M. (2016). Ethics: It’s the Right Thing To Do. In Introducing Social Research Methods: Essentials for Getting the Edge (pp. 45–66). Chichester, West Sussex, UK; Hoboken, NJ: John Wiley & Sons Inc.
    • Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., . . . Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. doi:10.1038/sdata.2016.18.

In class

After class


Week 10 4/6: Final project workday (no class)


Week 11 4/13: Research design and research life cycle

Before class

  • Readings:
    • Briney, K. (2015). Data reuse and restarting the data lifecycle. In Data management for researchers: Organize, maintain and share your data for research success. Research Skills Series (Exeter, England).
    • Creswell, J. W. (2014). The Selection of a Research Approach. In Research design: Qualitative, quantitative, and mixed methods approaches (4th ed.). Thousand Oaks: SAGE Publications.
    • Gerring, J. (1999). What Makes a Concept Good? A Criterial Framework for Understanding Concept Formation in the Social Sciences. Polity, 31(3), 357–393. doi:10.2307/3235246.
    • Ruane, J. M. (2016d). Measure by Measure: Developing Measures: Making the Abstract Concrete. In Introducing Social Research Methods: Essentials for Getting the Edge (pp. 93–116). Chichester, West Sussex, UK; Hoboken, NJ: John Wiley & Sons Inc.

In class

  • Discussion and lecture on readings.

After class


Week 12 4/20: Validation

Before class

  • Readings:
    • Ruane, J. M. (2016a). All That Glitters Is Not Gold: Assessing the Validity and Reliability of Measures. In Introducing Social Research Methods: Essentials for Getting the Edge (pp. 117–138). Chichester, West Sussex, UK; Hoboken, NJ: John Wiley & Sons Inc.
    • Wallace, J. L. (2016). Juking the Stats? Authoritarian Information Problems in China. British Journal of Political Science, 46(1), 11–29. doi:10.1017/S0007123414000106.
    • Zhu, L. (2013). Panel Data Analysis in Public Administration: Substantive and Statistical Considerations. Journal of Public Administration Research and Theory, 23(2), 395–428. doi:10.1093/jopart/mus064.

In class

  • Discussion and lecture on readings.
  • Hands-on practice.

After class


Week 13 4/27: Replication, standardization, and automation of research workflow

Before class

  • Readings:
    • Gentzkow, M., & Shapiro, J. M. (2014a). Appendix: Code Style. In Code and data for the social sciences: A practitioner’s guide.
    • King, G. (1995). Replication, Replication. PS: Political Science & Politics, 28(3), 444–452. doi:10.2307/420301.
    • Wilson, G., Aruliah, D. A., Brown, C. T., Hong, N. P. C., Davis, M., Guy, R. T., . . . Wilson, P. (2014). Best Practices for Scientific Computing. PLOS Biology, 12(1), e1001745. doi:10.1371/journal.pbio.1001745.
    • Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T. K. (2017). Good enough practices in scientific computing. PLOS Computational Biology, 13(6), e1005510. doi:10.1371/journal.pcbi.1005510.
    • Gentzkow, M., & Shapiro, J. M. (2014). Automation. In Code and data for the social sciences: A practitioner’s guide.

In class

  • Discussion and lecture on readings.
  • Hands-on practice.

After class


Week 14 5/4: From empirical study to theory building. Final project presentation

Before class

  • Readings:
    • Creswell, J. W. (2014b). The Use of Theory. In Research design: Qualitative, quantitative, and mixed methods approaches (4th ed.). Thousand Oaks: SAGE Publications.
    • Sutton, R. I., & Staw, B. M. (1995). What Theory is Not. Administrative Science Quarterly, 40(3), 371–384. doi:10.2307/2393788.

In class

  • Discussion and lecture on readings.

After class