Understanding How Stories Evolve in Fanfiction Communities via Data Science

Summer 2024

Project Background

Fanfiction refers to written works created and shared by fans of an existing work, such as a movie, television show, or book series. These existing works give their fans a common point of interest that can springboard into explorations of any genre or topic. Thus, fanfiction communities provide a forum for collective storytelling where often marginalized individuals can engage in cultural commentary, experiment with narrative forms, express and process emotions, and explore aspects of identity or belief, both by creating fanworks and by interacting with them. One archive alone hosts over 11 million works of fanfiction and enables writers to categorize these stories and readers to search for and interact with these texts through comments and other simple indicators of engagement. We examine the data surrounding these works, such as tags, view counts, and posting dates.

This project investigates what we can learn from the data surrounding fanfiction works. We will apply known analysis techniques to questions like: Can we map how interest in a particular topic has surged and receded with current events? Can we observe the growth of interest in a particular setting or trope? Can we determine how events such as the pandemic influenced the genres people consumed? Did people seek and produce works focused on comfort? Did they incorporate an analogous event into their fictional worlds? Do these responses vary with source material or country of origin? Through this process, we hope to learn more about how fanfiction data can help us understand how people, stories, and society interact today.
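As a rough illustration of the first question, the sketch below computes, for each month, the fraction of posted works that carry a given tag. It is only a minimal example under stated assumptions: the records, the tag names, and the helper `monthly_tag_share` are all hypothetical and invented for illustration, and they do not reflect the project's actual data or methods.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (posting date, set of tags). Invented for illustration only.
works = [
    (date(2020, 2, 10), {"Angst"}),
    (date(2020, 4, 3), {"Fluff", "Hurt/Comfort"}),
    (date(2020, 4, 18), {"Hurt/Comfort"}),
    (date(2020, 5, 1), {"Angst", "Quarantine"}),
    (date(2020, 5, 20), {"Hurt/Comfort", "Quarantine"}),
]


def monthly_tag_share(works, tag):
    """Return {(year, month): fraction of works posted that month carrying `tag`}."""
    totals = Counter()  # all works posted per month
    tagged = Counter()  # works per month that carry the tag
    for posted, tags in works:
        key = (posted.year, posted.month)
        totals[key] += 1
        if tag in tags:
            tagged[key] += 1
    # Only months with at least one posted work appear, so no division by zero.
    return {key: tagged[key] / totals[key] for key in sorted(totals)}


shares = monthly_tag_share(works, "Hurt/Comfort")
for (year, month), share in shares.items():
    print(f"{year}-{month:02d}: {share:.2f}")
```

Using a share rather than a raw count keeps the trend from being dominated by overall growth in the number of works posted; whether that normalization suits a given question is a judgment the analysis would need to revisit.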

Student Role

With the guidance of the mentor team, the student will help (1) generate questions to ask of the data, (2) perform exploratory data analysis towards answering those questions or generating new ones, and (3) report on findings within the data and on the efficacy of the methods. We expect the data analysis to include running known statistical analyses, using well-established natural language processing techniques, and generating visualizations of the data. We will also encourage and aid the student in developing and studying research questions of their own interests.

No prior familiarity with fanfiction, skill in narrative analysis, or technical experience is assumed. We will adapt the project to the student's experience, but any student will acquire new knowledge and learn new technologies with the guidance of the mentoring team. We expect the student will become familiar with how fanfiction is archived, categorized, and accessed on a global scale and will learn data science approaches for exploratory data analysis, scripting/programming to perform the data science (e.g., Python, R), data and document management (GitHub, Google Drive), and report-writing software (Overleaf).

Content note: The fanfiction works analyzed in this project may include sexually explicit material, violence, or traumatic situations. The student will *not* be expected to read such works. However, they will likely be exposed to keywords describing this potentially disturbing content and will need to discuss the analysis of these keywords with the research team.

Student Learning Outcomes and Benefits

Through this project, the student will gain experience in data science techniques for exploratory data analysis; computational approaches to literary analysis; digital narrative and archival processes and their emerging forms and genres; developing research questions; communicating and presenting research findings; and working in a collaborative, interdisciplinary research team. Additionally, they will learn about ethics in data collection and curation, both from the mentoring team and through the CITI training program on human subjects data in research.

In support of the data science aspects of this project, we expect the student will build and exercise skills in computer programming/scripting, data management, statistical techniques for exploratory data analysis, and visualization techniques for exploratory data analysis.

The student will be included in any research outputs (e.g., research papers) that are based in part on their work. For projects substantially based on their work, we will encourage and aid the student in leading a paper submission, including beyond the timeframe of their SPUR project. We will encourage the student to participate in all SPUR program activities such as the Summer Symposium.

We expect these experiences and skills will help students in applications to graduate programs, internships, and data-related careers. Furthermore, outcomes of this project, such as data science artifacts and reports, will be publicly available in open repositories online, allowing students to share their results and incorporate them into their curricula vitae, resumes, and related documents.

Anne Jamison


The student will be mentored by an interdisciplinary team, including Professor Anne Jamison, an expert on fanfiction; Professor Kate Isaacs, an expert on data visualization; Professor Marina Kogan, an expert on social computing; and John Gordon, program manager of the SCI-HUM Initiative. We value the different perspectives of our colleagues and encourage the student to take advantage of them as well as contribute their own.

The student will have reserved time twice weekly with the mentoring team to discuss project activities, receive feedback, and get advice on both the project and research and career options in general. The student is also invited to meet informally with any mentor between scheduled meetings. They will be seated among research students at the Scientific Computing and Imaging Institute (SCI Institute), close to the majority of the mentoring team.

We encourage students to take ownership of the project and propose their own ideas and technologies. The mentoring team will suggest possible technologies and related resources for study. Together with the student, we will collaboratively and iteratively guide project work.

The student will further be included in regular group meetings with the project mentors' research groups and the research groups' asynchronous chat platforms (e.g., Slack). These avenues provide an opportunity to practice communicating research and to receive diverse feedback. We will also encourage the student to participate in informal and non-research activities during the summer as we believe that students gain a lot through the relationships they develop with peers.