Digital Data 2019, Week 12

Week 12: Knock Me Down Nine Times But I GitHub Ten

As a tech optimist, Clay Shirky believes in the transformative power of many technologies, including GitHub, a platform that allows programmers to write code collaboratively. By assigning a unique identifier to each edit a programmer makes, GitHub allows users to easily resolve conflicts and to revert to any previous state. This means that access to edit the code doesn't need to be so tightly restricted, and dozens, hundreds, or even more people can actively work on the same code at the same time. Shirky argues that this same platform could be used to democratize the law, if only more lawyers knew about and understood the nature of GitHub as an open-source programming platform (Shirky 2012). However, this viewpoint ignores two key realities. The first is that lawmakers probably would not agree that radical and total transparency is something government should strive for. Sometimes the opacity of governing is maintained for selfish reasons, such as hiding the influence of lobbyists, PACs, and special interests. Other times, it persists because governing requires nuanced conversations and compromises that wouldn't be popular with a global audience. To achieve governmental transparency, which I agree is a positive goal, we must do more than simply educate lawmakers about GitHub; we must have nuanced discussions about where and when transparency is desirable. We must also have conversations about GitHub as a platform, which brings me to the second reality Shirky ignores: the radical transparency GitHub might provide does not make it immune to the biases present elsewhere in society. As an organization, GitHub has been accused of rank sexism by at least one high-profile former employee (Wilhelm and Tsotsis 2014); as a platform, GitHub hosts code containing blatantly racist and sexist language (Horn 2013). Platforms are not neutral; they contain and reproduce many of our societal biases.
If GitHub (or any platform) is to be trumpeted as transformative, we must reckon with its possible negative transformations as well as its positive ones.

Figure: Most commits (i.e., changes) to GitHub repositories are made by a small proportion of users (Kalliamvakou et al. 2014:95).

A recent study (Kalliamvakou et al. 2014) suggests that we need similarly nuanced conversations about using GitHub as a platform for research. While the public-by-default nature of GitHub repositories provides a large quantity of data, the quality of that data is questionable. Among other issues, a majority of these repositories are for personal use, and many are used not for software development but for other purposes, such as collaborative writing and hosting websites (Kalliamvakou et al. 2014:94). This is probably due in part to GitHub's freemium model: organizations, professional programmers, and anyone else with privacy or intellectual property concerns are likely using paid accounts to unlock private repositories, an option that, when the study was conducted, was not available to free accounts. While GitHub's primary purpose is collaborative programming, we must also consider it as a space in which influence is gained and exerted (Goggins and Petakovic 2014). While GitHub makes it easy to measure certain kinds of contributions (the number of pull requests made, the number of replies a user posts in discussion areas), it can also be used to marginalize other kinds of contributions, and therefore other kinds of contributors, that are just as necessary to a collaborative project. Earlier this week, the Event Horizon Telescope collaboration released the first image of a black hole, and some tried to discredit Katherine Bouman's importance to the project by using GitHub-provided metrics to "prove" that most of the work had been done by one of her (white male) colleagues (Lou and Ahmed 2019). This is not to say GitHub is useless in research; in fact, it's excellent for some applications, like openly storing a text corpus so that research based on those data can be replicated (Alschner, Seiermann, and Skougarevskiy 2018). As sociologists, however, we must be willing and able to ask not just whether tools and platforms meet our technical standards, but whether our use of them meets our ethical and moral standards as well.

Alschner, Wolfgang, Julia Seiermann, and Dmitriy Skougarevskiy. 2018. "Text of Trade Agreements (ToTA)—A Structured Corpus for the Text‐as‐Data Analysis of Preferential Trade Agreements." Journal of Empirical Legal Studies 15(3):648–66.

Goggins, Sean and Eva Petakovic. 2014. "Connecting Theory to Social Technology Platforms: A Framework for Measuring Influence in Context." American Behavioral Scientist 58(10):1376–92.

Horn, Leslie. 2013. “There Is Blatant Racist and Sexist Language Hiding in Open Source Code.” Gizmodo. Retrieved February 17, 2019 (

Kalliamvakou, Eirini, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. 2014. “The Promises and Perils of Mining GitHub.” Pp. 92–101 in Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014. New York, NY, USA: ACM.

Lou, Michelle and Saeed Ahmed. 2019. “To Undermine Katherine Bouman’s Role in the Black Hole Photo, Trolls Held up a White Man as the Real Hero — until He Fought Back.” CNN, April 12.

Shirky, Clay. 2012. How the Internet Will (One Day) Transform Government.

Wilhelm, Alex and Alexia Tsotsis. 2014. “Julie Ann Horvath Describes Sexism And Intimidation Behind Her GitHub Exit.” TechCrunch, March 15.