
Research Products

For our most recent and ongoing COPE-ID and other data science projects, please visit our DS3 research lab's website.

Grant APA Citation

Grant APA Citation: Stubbs-Richardson, M., Anreddy, S., & Porter, B. (2020). RAPID: Analyses of emotions expressed in social media and forums during the COVID-19 pandemic (Award No. 2031246). National Science Foundation.

Website APA Citation

Website APA Citation: Stubbs-Richardson, M., Anreddy, S., & Porter, B. (2022, June 29). COVID-19 online prevalence of emotions in institutions database. COPE-ID.

Database APA Citation

Database APA Citation: Stubbs-Richardson, M., Anreddy, S., & Porter, B. (2022, June 29). COVID-19 online prevalence of emotions in institutions database. Data Science for the Social Sciences Laboratory in the Social Science Research Center at Mississippi State University.

Manuscripts Under Review for Journal Publication

Masked Up but Feeling Down: The Association Between Mental Health and Regulations During the COVID Pandemic

By: Mary Margaret Mitchell, Ben Porter, Anastasia Elder, & Megan Stubbs-Richardson

State and local governments enacted regulations to protect the public from COVID-19 at different times throughout the pandemic. Previous research has shown the efficacy of these interventions at preventing infection, but little research has examined the association between these regulations and mental health. The current study used publicly available data from a variety of sources to examine whether aggregate mental health was associated with these regulations.

Specifically, we examined mask mandates, bar closures, stay-at-home orders, and gathering bans in relation to depression, anxiety, and feelings of isolation. Generally, these regulations were associated with increases in the prevalence of these feelings. However, certain regulations were associated with decreased prevalence of mental health concerns: mandatory stay-at-home orders were associated with lower prevalence of depression and feelings of isolation, and limited gathering bans were associated with lower feelings of isolation. The political leaning of a county was not clearly related to better or worse mental health functioning in the presence of government regulation. However, the epoch of the pandemic did affect these associations. Gathering bans were associated with the worst mental health symptoms at the beginning of the pandemic, whereas mask mandates had the greatest association during the Delta surge. These results highlight some of the health-related ramifications of public health regulation and underscore the complexity of the impact that the COVID-19 pandemic had on individuals.

Mitchell, M., Porter, B., Elder, A., & Stubbs-Richardson, M. (Under Review). Masked up but feeling down: The association between mental health and regulations during the COVID pandemic. Journal of Health Psychology.

Conference Presentations

Evaluation of Machine Learning Algorithms Used for Classifying COVID-19 Misinformation Across Social Media Platforms

By: Maxwell Perkins, Undergraduate Research Assistant under supervision of Dr. Sujan Anreddy

Distrust in the scientific community during the COVID-19 pandemic compounded urgent public health concerns by fostering the spread of misinformation on social media platforms. Uncertainty surrounding the appropriate remedies for the virus led many to promote misinformation that deepened distrust in scientists and further obscured appropriate procedures. To combat this misinformation, some platforms implemented moderation procedures, which included flagging or removing posts that could contain misinformation. With the goal of combating the spread of misinformation across multiple platforms, this study analyzes machine learning algorithms for predicting COVID-19 related misinformation in text-based social media posts. Multinomial Naive Bayes (MNB), Support Vector Machines (SVM), and Multinomial Logistic Regression (MLR) are some of the Natural Language Processing (NLP) text-classification algorithms implemented and compared for identifying misinformation.

To train and test the prospective models, data related to COVID-19 was collected from Parler, Reddit, Tumblr, Twitter, and YouTube comments. Algorithm performance is evaluated by comparing the expected performance percentages from the sample with the resulting predictions on the population data. So far, algorithm performance measures indicate that a Support Vector Machine algorithm performs better than the other algorithms compared at identifying posts with misinformation across the given social media platforms. With further tuning for certain platforms, an SVM algorithm has the potential to be a credible filter for consumption of COVID-19 related social media information.
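The abstract does not name the performance measures used, so the following is only a minimal sketch of how a misinformation classifier's predictions might be scored against hand-labeled posts. The function name `binary_metrics`, the choice of accuracy, precision, recall, and F1, and the toy labels are all assumptions made for illustration and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Score predictions against true labels for one 'positive' class.

    Hypothetical helper: here the positive class stands for
    'post contains misinformation'.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (invented): 1 = misinformation, 0 = not misinformation.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Computing the same measures separately for each platform would be one way to see where a model needs the platform-specific tuning mentioned above.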


Perkins, M. (2022). Evaluation of machine learning algorithms used for classifying COVID-19 misinformation across social media platforms. Presented at the Mississippi State University Undergraduate Symposium in Starkville, MS. Poster Presentation. 

The Association Between Mental Health and Preventive Regulation Over COVID-19

By: Mary Margaret Mitchell & Ben Porter

COVID-19 is a worldwide pandemic that has prompted governments around the world to take wide-ranging action to slow its spread. In the United States, mask mandates, gathering bans, bar closures, and stay-at-home orders were government regulations that differed by location. Because these regulations required people to change their behaviors, they could have caused a rise or a decrease in anxiety and depression throughout the pandemic.

Combining the CDC’s publicly available data, which track county-level changes in mask mandates, gathering bans, bar closures, and stay-at-home orders, with publicly available COVIDcast data, which track county-level changes in COVID-19 cases, symptoms, exposures, and mental health effects, allows us to examine whether there is a correlation between government regulation and anxiety and depression levels. Each outcome variable represents the proportion of individuals in a particular county endorsing depression or anxiety, respectively. Preventive measures were obtained from CDC datasets, which contain the relevant orders for each county in the US on each day of the pandemic. Cross-classified mixed models were used to evaluate depression and anxiety separately, with each observation nested within county and day.

Increased levels of preventive measures were generally associated with increased reports of depression and anxiety, except for complete stay-at-home orders being associated with lower levels of depression. These results show that there is an association between preventive measures and prevalence of mental health issues across the pandemic. However, the current study is not able to determine whether a causal link exists between these variables. Future research should investigate the reason for this association. This research can be useful to officials for planning public health preventive measures and preventing unforeseen impacts on mental health.


Mitchell, M., & Porter, B. (2022). The association between mental health and preventive regulation over COVID-19. Presented at the Mississippi State University Undergraduate Symposium in Starkville, MS. Poster Presentation. 


Mitchell, M., & Porter, B. (2022). The association between mental health and preventive regulation over COVID-19. Presented at the Mississippi Academy of Sciences in Biloxi, MS. Oral Presentation.

Topic Modeling of Millions of Tweets using Latent Dirichlet Allocation (LDA) Algorithm

By: Nishan Karki, Undergraduate Research Assistant under supervision of Dr. Sujan Anreddy

Topic modeling is a statistical technique for discovering the abstract topics that occur in a collection of documents. It helps to organize, summarize, and better understand the major themes embedded within a text-based source. The purpose of this project is to apply the Latent Dirichlet Allocation (LDA) algorithm for topic modeling and extract different topics found within 4.4 million tweets related to COVID-19.

The raw tweets contain numerous emojis, hyperlinks, digits, arbitrary spacing, punctuation, and non-ASCII characters, so the data must be cleaned and pre-processed before the model can be fit. After testing the model with different values for the parameter that sets the number of topics (N) to be produced, N = 20 topics gave the best and most relevant results when compared to N = 5, 10, 15, or 25. In this way, with the help of Natural Language Processing (NLP) and the LDA model, the desired number of topics was generated from millions of tweets related to COVID-19.
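The cleaning step described above can be sketched with Python's standard library alone. This is an illustrative sketch rather than the project's actual pipeline: the function name `clean_tweet`, the regular expressions, and the order of the steps are assumptions.

```python
import re
import string

# Patterns for the kinds of noise listed in the abstract (assumed definitions).
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # hyperlinks
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")     # emojis and other non-ASCII
DIGIT_RE = re.compile(r"\d+")                   # digits
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text: str) -> str:
    """Return a lowercase, space-separated version of a raw tweet."""
    text = URL_RE.sub(" ", text)        # remove links first, while "://" is intact
    text = NON_ASCII_RE.sub(" ", text)  # drop emojis and non-ASCII characters
    text = DIGIT_RE.sub(" ", text)      # drop digits (so "COVID19" becomes "COVID")
    text = text.translate(PUNCT_TABLE)  # replace punctuation, including "#" and "@"
    return " ".join(text.lower().split())  # collapse arbitrary spacing


raw = "Stay  safe!! \U0001F637 Wash hands 20 secs https://example.com/x #COVID19"
print(clean_tweet(raw))
```

Hyperlinks are removed before punctuation so that the URL pattern still matches; stripping punctuation first would break links into stray word fragments.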


Karki, N. (2021). Topic modeling of millions of tweets using latent Dirichlet allocation (LDA) algorithm. Presented at the Mississippi State University Undergraduate Symposium in Starkville, MS. Recorded Virtual Presentation. 


Using Sentiment Analysis Techniques to Discover Emotions Conveyed on Twitter and Reddit

By: Taylor Ray, Graduate Research Assistant

With user bases of at least 340 million and 430 million users, respectively, Twitter and Reddit are undoubtedly two of the most popular social media sites. Interestingly enough, however, these platforms are characteristically quite different. These differences range from the way in which posts are organized and shared to the overall structure and length of those posts. Because social media hosts people’s ideologies, opinions, general thoughts, and interpretations of current events, it is interesting to consider how the overall sentiment on these platforms differs.

Fortunately, advances in computer science and natural language processing enable us to capture the sentiment of texts, even for groupings of texts that span multiple domains and possess various compositions. This research employs a popular supervised method for text classification, Multinomial Naive Bayes, to classify the sentiment polarity (i.e., “positive”, “neutral”, and “negative”) of 900,000 Reddit and Twitter posts related to the COVID-19 pandemic. Results from this approach not only reveal the sentiment polarity distributions of the two platforms, but also show that the classifier achieves higher accuracy when its training data contain content from at least one of the sources it is later tested on.
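To illustrate the general technique, here is a minimal from-scratch sketch of a Multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the implementation used in this research: the class `ToyMultinomialNB`, the whitespace tokenization, and the six training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class ToyMultinomialNB:
    """Bag-of-words Multinomial Naive Bayes (hypothetical teaching example)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; 1.0 is Laplace smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(docs)
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for d in docs for w in d.split()}
        self.total_words = {c: sum(self.word_counts[c].values())
                            for c in self.classes}
        return self

    def _log_scores(self, doc):
        scores = {}
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods of each known word
            score = math.log(self.class_counts[c] / self.n_docs)
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            for w in doc.split():
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict_proba(self, doc):
        scores = self._log_scores(doc)
        top = max(scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exp.values())
        return {c: e / total for c, e in exp.items()}

    def predict(self, doc):
        scores = self._log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy training data, already lowercased and stripped of punctuation.
train_docs = [
    "vaccine rollout is great news",
    "so grateful for the nurses",
    "lockdown is awful and lonely",
    "terrible news about cases today",
    "cases reported in the county today",
    "the briefing is at noon",
]
train_labels = ["positive", "positive", "negative",
                "negative", "neutral", "neutral"]

model = ToyMultinomialNB().fit(train_docs, train_labels)
print(model.predict("awful lonely lockdown"))
print(model.predict_proba("grateful for great news"))
```

Because the model only knows words it saw during training, a classifier trained on one platform can miss vocabulary typical of another, which is consistent with the finding above that including content from the test source in the training data improves accuracy.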


Ray, T. (2021). Using sentiment analysis techniques to discover emotions conveyed on Twitter and Reddit. Presented at the Mississippi State University Undergraduate Symposium in Starkville, MS. Awarded First Place. Recorded Virtual Presentation.

An Examination of COVID-19 Scams and Misinformation on Social Media and Forums

By: Georgiana Swan*, Taylor Ray, Megan Stubbs-Richardson, Ben Porter, Shelby Gilbreath, Mary Margaret Mitchell, Sujan Anreddy, J. Edward Swan II

Misinformation and scams are a major problem on social media platforms, eroding trust in our democracy and in professionals in every field. The purpose of our team’s study is to analyze the prevalence of scams, misinformation, and counter-misinformation related to the COVID-19 pandemic and the COVID-19 vaccine rollout across six social media and forum platforms. Our team aimed to examine the vaccine rollout because of the increased amount of misinformation surrounding this topic. To study these phenomena, the computer science team collected posts from YouTube, Twitter, Parler, Tumblr, Reddit, and 4chan from December 1st through January 31st. 

Based on our coding, the team’s graduate research assistant ran a Multinomial Naive Bayes classifier on the remaining data. This type of machine learning answers the question, “What is the probability that the selected post fits into each category?” Our findings show that our coders had the highest agreement on the variable “relevance” and the lowest on “countering vaccine misinformation”. However, because of the machine learning software used, the coders were not able to follow website links. Our team suggests that future researchers follow links for more context. This study contributes to the misinformation literature because it combines qualitative and quantitative data and examines multiple platforms at once.


Swan, G., Ray, T., Stubbs-Richardson, M., Porter, B., Gilbreath, S., Mitchell, M., Anreddy, S., & Swan II, J. E. (2021). An examination of COVID-19 scams and misinformation on social media and forums. Presented at the Mississippi State University Undergraduate Symposium in Starkville, MS. Recorded Virtual Presentation. 


Using vaderSentiment to Intuitively Predict the Sentiment of Social Media Posts

By: Taylor Ray and Sujan Anreddy

During any global event, the medium through which individuals are getting updated news and information about that event is incredibly important, especially in a situation like the current COVID-19 pandemic, where failure to obtain information that is accurate and reliable can have serious effects on one’s overall health and potentially even their life. In our digitally-driven world, social media plays a tremendous role as a media outlet for younger and older groups alike. Ideally, social media is a powerful tool when a person can easily distinguish factual information from (mis)information, but this can be difficult to do when social media platforms and the authors of posts are not making this obvious and/or the social media user is very impressionable.

Ray, T. & Anreddy, S. (2021, Jan. 14). Using vaderSentiment to intuitively predict the sentiment of social media posts. Blog, DS3.

A Six-Month Overview of the Geolocated Twittersphere Surrounding the COVID-19 Pandemic

By: Megan Stubbs-Richardson and Viswadeep Lebakula

Individuals across the globe have taken to social media to cope and express concerns or dismay over the coronavirus pandemic. The coronavirus has caused detrimental, life-threatening circumstances and deaths globally. It has disrupted lives, the economy, the government, family institutions, and the like from the start of the pandemic, and its detrimental consequences continue into the present. But have you ever wondered… what exactly are the most pressing topics surrounding this pandemic? And how are they discussed on social media? In this blog, we address these questions by collecting geolocated Twitter data from Jan. to June 2020 using approximately 245 keywords, which included both words and trending hashtags that were identified by the DS3 team during the six months of data collection.

Stubbs-Richardson, M. & Lebakula, V. (2020, July 29). A six-month overview of the geolocated Twittersphere surrounding the COVID-19 pandemic. Blog, DS3.