In this part we run a small analysis to find out:
- Which tweets are the most important/popular
- Whether older material covered more important concepts than recent tweets
Load the necessary libraries
%load_ext autoreload
%autoreload 2
%matplotlib inline
import pandas as pd
pd.set_option("display.width", 150)
import requests
import re
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pathlib import Path
Load the CSV
df = pd.read_csv("chrisalbon_mlflashcards.csv")
df['text'] = df['text'].fillna('')
df.tail()
id likes replies retweets text timestamp url img_url
236 94607825069... 19 1 0 Bayes Error 2017-12-27T... https://twi... https://pbs...
237 94575108497... 47 3 13 Occams Razor 2017-12-26T... https://twi... https://pbs...
238 94571793723... 8 0 1 K-Fold Cros... 2017-12-26T... https://twi... https://pbs...
239 94538342129... 18 1 1 Extrema 2017-12-25T... https://twi... https://pbs...
240 94536381783... 34 1 7 Softmax Act... 2017-12-25T... https://twi... https://pbs...
Check for missing values
df.count(axis=0)
id 241
likes 241
replies 241
retweets 241
text 241
timestamp 241
url 241
img_url 237
dtype: int64
We see that a few img_url values are missing: 237 of the 241 rows have one, so 4 are empty. Let's check which rows they are.
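One way to pull out the rows with a missing image URL is a boolean mask built with `isna()`. The snippet below is only a minimal sketch: it uses a small made-up DataFrame (`toy`, with an `example.com` URL and invented values) in place of the real data, and the variable names `missing_mask`, `n_missing`, and `missing_rows` are illustrative.

```python
import pandas as pd

# Toy stand-in for the real DataFrame; values are made up for illustration.
toy = pd.DataFrame({
    "text": ["Bayes Error", "Occams Razor", "Extrema"],
    "likes": [19, 47, 18],
    "img_url": ["https://example.com/a.png", None, "https://example.com/c.png"],
})

# Boolean mask that is True wherever img_url is missing (NaN/None).
missing_mask = toy["img_url"].isna()

# Number of rows without an image URL.
n_missing = int(missing_mask.sum())

# The rows themselves, so we can see which tweets lack an image.
missing_rows = toy[missing_mask]

print(n_missing)
print(missing_rows[["text", "likes"]])
```

On the real data, the same mask would be built from `df["img_url"]` instead of `toy["img_url"]`.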