Exploratory Data Analysis on COVID-19 Tweets

Published in Analytics Vidhya · 4 min read · Nov 22, 2020

During the pandemic, social media platforms were flooded with posts about COVID-19. In this article, I present an Exploratory Data Analysis (EDA) of the Kaggle covid19-tweets dataset.

For this EDA, I used the Kaggle dataset covid19_tweets.csv.

What is EDA?

EDA stands for Exploratory Data Analysis. It refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.

In simple words, EDA means trying to understand what a dataset contains before performing any statistical action on it. Getting to know the data well before drawing conclusions from it is what EDA is about.

Now let us perform EDA on the covid19_tweets dataset…

First, import the required libraries and load the dataset.
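As a minimal sketch of this step, the snippet below loads a CSV with pandas. The helper name load_tweets and the tiny in-memory CSV are illustrative assumptions so that the example runs without the Kaggle file; the column names are assumed to match the dataset.

```python
import io

import pandas as pd


def load_tweets(source):
    """Read the tweets CSV from a file path or a file-like object."""
    return pd.read_csv(source)


# Tiny in-memory stand-in for the Kaggle file (hypothetical rows,
# assumed column names) so the sketch is self-contained.
sample_csv = io.StringIO(
    "user_name,user_location,text\n"
    "alice,India,Stay safe #covid19\n"
    "bob,,Wear a mask\n"
)
df = load_tweets(sample_csv)
print(df)
```

With the real data, the call would be load_tweets("covid19_tweets.csv"), assuming the file sits in the working directory.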

Using the .head() and .tail() methods, we can view the first 5 and last 5 rows of the dataset, respectively.
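A short illustration of these two methods on a toy DataFrame (the data here is made up; only .head() and .tail() are the point):

```python
import pandas as pd

# Toy DataFrame with 8 rows standing in for the tweets data.
df = pd.DataFrame({
    "user_name": [f"user{i}" for i in range(8)],
    "user_followers": list(range(8)),
})

first_rows = df.head()  # first 5 rows by default
last_rows = df.tail()   # last 5 rows by default

print(first_rows)
print(last_rows)
```

Both methods accept a row count, for example df.head(10), when 5 rows are not enough.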

To get a statistical summary of the DataFrame's numeric columns, we use the pandas .describe() method. To get the total number of rows and columns of the dataset, we use the .shape attribute, which returns a (rows, columns) tuple.
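The following sketch shows both calls on a small made-up DataFrame. Note that .describe() summarizes only the numeric columns by default, and that .shape is accessed without parentheses.

```python
import pandas as pd

# Toy data: one text column and one numeric column (values are invented).
df = pd.DataFrame({
    "user_name": ["alice", "bob", "carol", "dave"],
    "user_followers": [10, 20, 30, 40],
})

summary = df.describe()    # count, mean, std, min, quartiles, max
n_rows, n_cols = df.shape  # an attribute, not a method call

print(summary)
print(f"{n_rows} rows, {n_cols} columns")
```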

We can use different types of plots, such as bar plots, histograms, heatmaps, scatter plots, box plots, and many more, to visualize the dataset.

For this dataset, I plotted the number of missing values and the number of unique values in each column.
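One possible way to compute and plot these counts is sketched below. The helper names missing_and_unique and plot_counts are hypothetical, the toy rows are invented, and this is not necessarily how the original plots were produced.

```python
import pandas as pd


def missing_and_unique(df):
    """Return per-column counts of missing and distinct values."""
    return pd.DataFrame({
        "missing_values": df.isna().sum(),
        "unique_values": df.nunique(),  # NaN is not counted as a value
    })


def plot_counts(counts, title):
    """Draw one bar per column; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    ax = counts.plot.bar(title=title)
    ax.set_ylabel("count")
    plt.tight_layout()
    plt.show()


# Toy stand-in for the tweets DataFrame (column names are assumed).
df = pd.DataFrame({
    "user_name": ["alice", "bob", "alice", "carol"],
    "user_location": ["India", None, "India", "USA"],
    "hashtags": [None, None, "['covid19']", None],
})
summary = missing_and_unique(df)
print(summary)
```

Calling plot_counts(summary["missing_values"], "Missing values per column") would then draw the bar plot, provided matplotlib is installed.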

Next, we visualize the users who post the most tweets related to COVID-19.
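Finding the most frequent users comes down to counting how often each name appears. A minimal sketch, assuming the user column is called user_name and using invented rows:

```python
import pandas as pd


def top_values(df, column, n=10):
    """Return the n most frequent values in a column, most frequent first."""
    return df[column].value_counts().head(n)


# Invented example rows; "user_name" is an assumed column name.
df = pd.DataFrame({
    "user_name": ["newsbot", "alice", "newsbot", "bob", "newsbot", "alice"],
})
top_users = top_values(df, "user_name", n=2)
print(top_users)
```

The resulting Series can be drawn as a bar plot with top_users.plot.bar(), which requires matplotlib.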

Then we visualize the locations from which COVID-19 tweets are posted most frequently.
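User locations are free text and often missing, so a little cleaning before counting helps. The sketch below is one way to do it, assuming the column is called user_location; the helper name top_locations and the sample rows are hypothetical.

```python
import pandas as pd


def top_locations(df, n=10, column="user_location"):
    """Count tweets per location after dropping blanks and stray whitespace."""
    cleaned = df[column].dropna().str.strip()
    cleaned = cleaned[cleaned != ""]
    return cleaned.value_counts().head(n)


# Invented rows with the kinds of noise free-text locations tend to have.
df = pd.DataFrame({
    "user_location": ["India", " India", None, "USA", "India ", "USA", ""],
})
top = top_locations(df, n=2)
print(top)
```

This only trims whitespace. Variants such as "New Delhi, India" and "India" would still be counted as separate locations.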

To understand what people are actually tweeting about, we need to know the most common words used in these tweets. To do so, we create a word cloud of the 50 most frequent words. We build the word clouds by the location from which the tweets were posted, one word cloud per location. As an example, I generated word clouds of the most frequent words in tweets from India.

Similarly, we can create a word cloud for each location mentioned in the dataset.
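A per-location word cloud can be sketched in two steps: count the most frequent words in the tweets from one location, then render those frequencies. The code below is an illustration under assumptions, not the code from the repository: the column names user_location and text, the helper names, the short stop-word list, and the sample tweets are all made up. Rendering relies on the third-party wordcloud package and matplotlib, which are imported only when draw_word_cloud is called.

```python
import re
from collections import Counter

import pandas as pd

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "for", "on"}


def top_words(texts, n=50):
    """Return the n most common lowercase words across an iterable of tweets."""
    counts = Counter()
    for text in texts:
        text = re.sub(r"https?://\S+", " ", str(text).lower())  # drop links
        words = re.findall(r"[a-z0-9']+", text)
        counts.update(w for w in words if w not in STOP_WORDS)
    return dict(counts.most_common(n))


def top_words_for_location(df, location, n=50):
    """Top words for tweets whose user_location equals the given location."""
    tweets = df.loc[df["user_location"] == location, "text"]
    return top_words(tweets, n)


def draw_word_cloud(frequencies):
    """Render word frequencies as a word cloud image."""
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(background_color="white")
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# Invented tweets standing in for the real dataset.
df = pd.DataFrame({
    "user_location": ["India", "India", "USA"],
    "text": [
        "Lockdown extended in the city",
        "Lockdown and masks https://t.co/xyz",
        "Vaccine trial news",
    ],
})
india_words = top_words_for_location(df, "India")
print(india_words)
```

Calling draw_word_cloud(india_words) would display the cloud for India, and looping over df["user_location"].dropna().unique() would repeat the process for every location.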

If you want to look at the complete code of this EDA on COVID-19 tweets, refer to the GitHub repository https://github.com/ShilpiParikh/EDA-on-COVID-19-tweets.git.

Thus, by using EDA techniques, we can get to know the dataset thoroughly, extract meaningful information from it, and find out whether it contains any flaws.

References

Word Clouds - COVID 19 Tweets - JMA (Kaggle notebook, www.kaggle.com)

COVID19 Tweets: tweets with the hashtag #covid19 (www.kaggle.com)

🦠COVID-19: Sentiment Analysis & Social Networks (Kaggle notebook, www.kaggle.com)

What is Exploratory Data Analysis? (towardsdatascience.com)


Written by Shilpi Parikh