Automating Sentiment Analysis & Topic Modeling with PySpark & SparkNLP on Twitter Big Data
- Research Questions:
While working on this project, I specifically oriented my work around the following questions, keeping two electoral candidates in mind (Mr. Donald J. Trump & Mr. Joe Biden):
Is there a marginal difference in the ‘Twitter-holdings’ (user base) of the two candidates? (answered by: general Twitter analysis in sub-topic 1.2)
What ‘Sentiments’ are the candidates trying to drive through their daily tweets? (answered by: sentiment analysis of tweets using SparkNLP in sub-topic 2.2)
How are ‘Users Behaving’ towards these candidates on a day-to-day basis (studied through user-level tweet sentiments)? (answered by: sentiment analysis of tweets using SparkNLP in sub-topic 2.2)
What are the ‘Most Discussed Topics’ (top 3) of these candidates on Twitter? (answered by: LDA topic modeling using SparkNLP in sub-topic 2.3)
Is there a way through which I could ‘Automate’ my analysis at a daily frequency? (answered by: local deployment of the models using Windows cmd & connecting the data to PostgreSQL, a live open-source RDBMS, in sub-topic 3)

- “Elections nowadays aren’t the same, social-media have changed them a lot!” — Oh yes, me.
Over the past decade, the use of political social-media accounts (mainly on Twitter) has skyrocketed. Many political leaders (& sometimes their families) now use Twitter as a preeminent mode of communication with their citizens. However, this has led to some interesting problems. Not only American elections but also the recent elections in India, the world’s largest democracy, have been accused of being ‘Biased’ due to social-media influence (check out this article by ‘The Washington Post’ to see what I mean here). The bias, mainly in the form of polarized ‘public sentiments’, was injected by distorting the fragile fabric of social media.
Thinking of American politics & Twitter, chances are President Donald J. Trump comes to mind. Ever since 2015, when Trump launched his political campaign, he has been infamous for his so-called negative, derogatory & somewhat provocative tweets. Give him a 280-character limit, and he’ll turn it into a package covering the whole spectrum of emotions, sentiments, facts, and opinions (check out this article by ‘The New York Times’ to get a sense of the sheer volume of tweets he plays with). Even Vox (the well-known American news and opinion website) confirms in one of its articles that Trump tweets a lot, & the quantity is really out there.
All of the above facts, combined with my advanced-analytics knowledge, made me think: can I develop a Live App that keeps track of the social-media behavior of the candidates running in the 2020 United States Presidential election?