
An NSF-funded data visualization tool that makes it easier for people to explore large volumes of social media data to study the emotions, thoughts, behaviors, and health of people during a pandemic or related disaster.

Award Abstract #2318438

A Data Visualization Tool For the Covid-19 Online Prevalence of Emotions in Institutions Database

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Goal of the Grant

The purpose of the grant is three-fold: to develop a tool to broaden research participation in social media analysis, test blended theories, and inform public health policies and interventions.

  • Our main project goal was to develop a Data Analytics Visualization Tool that broadens research participation in social media analysis through a sample drawdown feature that requires no programming skills. Without the tool, users would need programming skills to search the database and draw samples for social, behavioral, and economic science research inquiries.
  • This tool thus removes barriers to analyzing large social media datasets and provides access to free content analysis software, which is especially useful because not all institutions provide such software free of cost.
  • This tool provides a comprehensive framework that allows users to blend an array of social media research methods, such as topic modeling, content analysis, sentiment analysis, and more.
  • User guides and associated tutorials are developed as part of this project to improve the usability of the tool.
  • The tool will undergo internal and external evaluation to improve its effectiveness.
  • A further goal is to test a blended theory approach that combines topic modeling and content analysis procedures in four phases. Phase 1 involves harvesting social media data and compiling a corpus (e.g., COPE-ID). Phase 2 entails using data science techniques (e.g., topic modeling) to compress the corpus on dimensions or topics of relevance. Phase 3 involves a sample drawdown of the most relevant content for further inquiry. Phase 4 entails performing qualitative analysis on a subset of the data.
  • Using the four-phased framework, this tool has the potential to draw together scientists from across fields of study to examine multiple methods in application to user-generated big data from social media.
  • Being able to access large pandemic data has the potential to allow scientists to generate new theories of human behavior, an area that has become increasingly important across various phases of the pandemic. 
  • Additionally, this tool has the potential to reach public administration and emergency management scholars given that research questions pertaining to the adoption of public health interventions, emotions expressed toward COVID-19, and attitudes toward various social institutions during the COVID-19 pandemic, can be addressed with the COPE-ID data and the Data Analytics Visualization Tool.
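
As a rough illustration of what a Phase 3 sample drawdown automates, the following Python sketch filters a small set of posts by keyword and draws a reproducible random sample. It is a minimal sketch under stated assumptions, not the tool's implementation: the function name draw_sample, the post fields ("id", "text"), and the example posts are all hypothetical.

```python
import random
import re


def draw_sample(posts, keywords, sample_size, seed=0):
    """Return up to sample_size posts whose text mentions any keyword.

    Matching is case-insensitive and on whole words. The 'text' field
    name is a hypothetical assumption, not the COPE-ID schema.
    """
    if not keywords:
        return []
    pattern = re.compile(
        "|".join(r"\b" + re.escape(k) + r"\b" for k in keywords),
        re.IGNORECASE,
    )
    matches = [p for p in posts if pattern.search(p["text"])]
    if len(matches) <= sample_size:
        return matches
    # A fixed seed keeps the drawn sample reproducible across runs.
    return random.Random(seed).sample(matches, sample_size)


# Hypothetical example posts (not real COPE-ID data).
posts = [
    {"id": 1, "text": "Wearing a mask at the grocery store today"},
    {"id": 2, "text": "Vaccine appointment booked!"},
    {"id": 3, "text": "Nice weather this weekend"},
    {"id": 4, "text": "Lockdown extended for two more weeks"},
]

sample = draw_sample(posts, ["mask", "vaccine"], sample_size=2)
print([p["id"] for p in sample])
```

In practice, the keywords would come from the topics surfaced in Phase 2, and the sampled posts would be passed on to qualitative coding in Phase 4.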

Goal of the Tool

The purpose of the Data Visualization tool itself is multifaceted. First, we aim to broaden research participation by providing a tool that assists with data inquiries and sample drawdown, thus simplifying digital content analysis processes. Second, we aim to blend computation and social science theoretical frameworks to generate valuable and intuitive visualizations from the data. Third, we enhance access to both data and analytic resources, enabling the public health community to test emerging theories of pandemic-related human behavior and strengthen future pandemic response efforts.  

  • Beyond sample drawdown, this tool also provides users with topic modeling, content analysis, sentiment analysis, and visualizations (e.g., word clouds, bar charts, and interactive intertopic distance maps). It also calculates inter-rater reliability in real time for the principal investigator of a project, allowing teams to more easily revise codebook content, which is also integrated into the system. Thus, this tool provides a comprehensive toolkit for analyzing social media data.
  • A methodological goal is to test the four-phased framework’s capabilities for integrating computational and qualitative text analyses (Andreotta et al., 2019). Specifically, this tool includes a body of data collected on the emotions, cognitions, and behavioral responses to the COVID-19 pandemic (COPE-ID; Phase 1) and creates a way of visualizing topics included in the database by offering topic modeling and visualization techniques (Phase 2). From there, users will have a better idea of which keywords or key phrases to include in their sample drawdown (Phase 3) for further qualitative inquiry through content analysis that is embedded in the tool (Phase 4).
  • By eliminating the need for programming to draw samples from larger data files, this tool works toward broadening participation in COVID-19 pandemic-related research, where the behavioral component is less examined. Data included in this set can be used to answer questions about the emotions, cognitions, and behavioral responses to the COVID-19 pandemic as discussed across 10 social media platforms. Being able to access large pandemic data without barriers has the potential to allow scientists to generate new theories on human behavior, an area that has become increasingly important across various phases of the pandemic.
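
To make the inter-rater reliability calculation mentioned above concrete, the sketch below computes Cohen's kappa for two coders who each assigned one code per post. This is an assumption for illustration only: the source does not state which reliability statistic the tool uses, and the function name and example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each coder's
    marginal label frequencies.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same six posts.
coder_a = ["fear", "fear", "anger", "joy", "fear", "anger"]
coder_b = ["fear", "anger", "anger", "joy", "fear", "anger"]

print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.739
```

Here the coders agree on five of six posts (observed agreement 5/6), chance agreement is 13/36, and kappa is therefore 17/23, or about 0.739. Recomputing this value whenever a coder updates a label is one simple way a reliability figure could be kept current while a team revises its codebook.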

Goal of the Grant

The purpose of an NSF infrastructure grant is to support the development, enhancement, and sustainability of research infrastructure that enables cutting-edge scientific discovery, collaboration, and development across disciplines.

The purpose of this grant specifically was to empower the broader research community by developing a tool that (1) makes big data more accessible to all, (2) answers the call to bridge the gaps between social science research and computer science, and (3) allows researchers to engage with critical questions around the COVID-19 pandemic through multifaceted research lenses to improve future policies and interventions related to a pandemic or disaster.

This NSF-funded infrastructure project addresses longstanding barriers to accessing complex, user-generated data by creating a tool that opens social media datasets, like the COPE-ID corpus, to a broader spectrum of researchers. Traditionally, social, behavioral, and economic (SBE) scientists without programming expertise have been excluded from such work due to the technical demands of data acquisition via APIs and data processing pipelines. The DataViz tool eliminates the need for coding or computational fluency, enabling qualitative and mixed-methods researchers to engage with big data efficiently and meaningfully. By removing the manual, time-consuming barriers to data retrieval, the tool makes social media data more equitable and usable, especially for those who lack access to technical collaborators or resources. In doing so, it advances NSF’s mission to expand open-access data and promote broader participation, supporting cross-disciplinary innovation and empowering more inclusive research across a wide array of academic domains.

This project responds directly to Andreotta et al.’s (2019) call to integrate computational methods with social science in social media analyses: making sense of textual data requires classifying content and capturing its contextual nuances, which machines cannot do. Following Andreotta et al.’s (2019) proposed framework, DataViz supports a four-phased methodological approach designed to break down disciplinary silos and advance collaborative, mixed-methods research. This approach involves utilizing a large-scale social media dataset (i.e., COPE-ID); applying data science techniques, such as topic modeling (e.g., LDA, NMF, BERTopic), to organize the social media posts into meaningful thematic structures; drawing down the most relevant topic clusters for closer scrutiny; and lastly creating a space where researchers can apply qualitative frameworks to interpret nuanced patterns within the data. This hybrid framework allows for scalable exploration of complex human behavior while preserving the interpretive depth of SBE research. In doing so, the tool enables transformative interdisciplinary research and bridges persistent gaps between computer science and the social sciences, underscoring the infrastructure grant program’s goal of fostering collaboration between fields that traditionally operate in silos.

The development of the DataViz tool, supported by NSF’s Infrastructure program, equips researchers to explore the complex societal dimensions of pandemics and disasters through interdisciplinary, data-informed approaches. By leveraging the large-scale COVID-19 dataset COPE-ID, the tool has the capability to support new theories of human behavior, which is critical as institutions continue to navigate the long-term impacts of COVID-19. For example, scholars across various disciplines, including communication, public health, psychology, political science, emergency management, and others, can investigate questions related to health messaging, institutional trust, and public sentiment. DataViz also enables the analysis of online behavioral patterns, providing insights into how communities respond to risk and public health interventions. This analysis can help identify how different populations adopt or resist health measures, with variations across time, space, and institutional contexts. Ultimately, the DataViz tool fosters researchers’ abilities to provide insights that support more transparent communication strategies, evidence-based decision-making, and equitable policy development for future public health crises and disaster resilience efforts.

Primary Investigators

Studio Portrait of Megan Stubbs-Richardson (photo © Mississippi State University)

Megan Stubbs-Richardson, Ph.D.

Principal Investigator

Sujan Ranjan Re Anreddy, Ph.D.

Co-Principal Investigator

Studio Portrait of Terri Hernandez (photo © Mississippi State University)

Terri Hernandez, Ph.D.

Co-Principal Investigator

Research Team
Affiliations

Mississippi State University (www.msstate.edu) is a comprehensive, doctoral-degree-granting institution and is the largest research university in the state. Known from its beginnings as “The People’s University,” this land-grant institution is accredited by the Commission on Colleges of the Southern Association of Colleges and Schools to award baccalaureate, master, specialist, and doctoral degrees. MSU has the only college of veterinary medicine and college of architecture, art, and design in the state. Mississippi State is designated by the Carnegie Foundation for the Advancement of Teaching as “a high research activity university.” The Carnegie Foundation has also recognized Mississippi State with its Community Engagement Classification.

The recently released NSF Higher Education Research and Development Survey for Fiscal Year 2015 places Mississippi State at 94th overall among public and private institutions based on total research and development expenditures. Nationally, MSU is ranked 59th in non-medical school R&D expenditures. In addition, MSU remains a top 10 school in the U.S. for agricultural sciences, as well as a top 50 university in Mechanical, Aeronautical/Astronautical, Electrical Engineering, and Computer Science. It also achieved top 20 status in social sciences according to the NSF.

Founded in 1950, the Social Science Research Center (SSRC) (www.ssrc.msstate.edu) is an interdisciplinary research center whose annual research portfolio normally ranges between $10 million and $12 million. Each year, research faculty lead approximately 40 extramurally funded research projects that address social, health, safety, and security issues. Research scientists develop their own research agendas, locate funding for their research, and develop research strategies, collaborations, tools, and personnel to carry out their research goals through the support of SSRC’s administrative infrastructure, which assists in the financial and personnel aspects of preparing, submitting, and administering research grants and contracts.

SSRC maintains its own Information Technology and Communications Services (ITCS). ITCS includes 20 physical servers, 38 virtualized servers (cloud and local), 12 network attached storage (NAS) devices, over 200 workstations (Windows, macOS, and Linux), and upwards of 70 printers. The NAS devices provide the SSRC with over 150 terabytes of distributed redundant storage configured in RAID 5, RAID 6, or a similar disk array configuration. This setup protects against data loss in the event of hardware failures and allows rapid expansion. SSRC has also constructed its own on-premises computing cluster, which comprises 15 nodes orchestrated using MaaS (Metal-As-A-Service), Juju (for service orchestration), and OpenStack to allow for flexible on-the-fly creation of virtual servers.

The mission of the Data Science for the Social Science Laboratory (DS3) (www.ds3.ssrc.msstate.edu) is to apply data science techniques and methods to social science research across a variety of big data sources, including open source data such as social media content and images. The laboratory has a tentative partnership with the Computer Science Department at MSU. Its in-house experts consist of six scientists from the following five fields of study: Sociology, Psychology, Criminology, Computer Science, and Public Administration. The laboratory collects social media data from a variety of platforms and conducts content, spatial, network, and machine learning analyses of social media text and images along with the associated metadata. We will utilize the skillsets and expertise of this laboratory as needed to assist in conducting the work proposed in the present proposal.