Build a Twitter Analytics App

Contents:
- The First Step: Design Your Solution
- Writing the Backend Twitter Server
  - Writing the Code in Small Parts: Part 1, The Basic App
  - Part 2: Adding a Counter to Exit
  - Part 3: Adding Language and Retweet Count
- Adding the Data to a Database
- Testing: What and How to Test
- Displaying our Data using the Flask Webserver
  - Adding templates to our Flask app
  - Displaying our Tweets in the Flask Web Server
- Future Work and Improvements
So far, we were just printing our data. But if we want to pass it to the webserver, we need to store it somewhere.
Before we go ahead, I hope you downloaded the Sqlite Browser, as I recommended earlier. It'll help you check that you are writing the values to the database correctly.
While you could have the Flask server gather the data in real time, that is bad design practice, because the two parts would then be tightly coupled. Coupling is a technical term that means changing one part of the code affects the other. That is bad: a bug in one part will cause failures in the other, which makes the whole thing very hard to debug.
We want to keep the parts of our codebase independent, so each part can be tested and debugged separately. That's why the code that reads live tweets is separate from the code that displays them in a web browser.
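To make that concrete, here is a tiny sketch of the idea (not the project's actual code): the two programs never import each other, and the SQLite file is the only thing they share.

# Sketch of the decoupling: the collector and the web server never
# import each other; the SQLite file is the only shared artifact.
import sqlite3

DB_PATH = "../twit_data.db"

def collector_writes(conn, language, top_language):
    # the streaming script (this section) only ever INSERTs
    conn.execute("INSERT INTO lang_data VALUES (?, ?, DATETIME('now'))",
                 (language, top_language))
    conn.commit()

def webserver_reads():
    # the Flask app (a later section) only ever SELECTs
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute("SELECT * FROM lang_data").fetchall()
    conn.close()
    return rows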
Okay, let's get started. The first thing we need to do is create the database. The code is taken from here, though I will only go over a few relevant parts (enough for you to understand what's going on):
import sqlite3

db = "../twit_data.db"

conn = sqlite3.connect(db)
c = conn.cursor()
We open our database and connect to it. We are using Sqlite3, which automatically creates the database if it doesn’t exist.
cmd = "CREATE TABLE lang_data (language TEXT, top_language TEXT, datetime TEXT)"
c.execute(cmd)

conn.commit()
conn.close()
We’re just going to look at one table, to see how it works.
We are using raw SQL in the example above to create a table called lang_data that contains 3 fields. language and top_language are what we get from Twitter. We are also storing the date-time, so that we can do things like "get all tweets in the last hour" (a sketch of that kind of query is just below).
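As a quick illustration of that last point, SQLite's date functions let you filter on the datetime column. This is only a sketch of the kind of query you could run later, once some rows have been written; it is not part of the table creation file:

# Sketch: "give me everything from the last hour" using the datetime column.
import sqlite3

conn = sqlite3.connect("../twit_data.db")
c = conn.cursor()

recent = c.execute(
    "SELECT * FROM lang_data WHERE datetime > DATETIME('now', '-1 hour')"
).fetchall()
print(recent)

conn.close()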
If you run the table creation file, it will create the database for you.
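One thing to keep in mind: later in this section we also insert into a twit_data table and a trend_data table, so the full creation file needs those too. Here is a minimal sketch; the field names are my own guesses, only the number of columns is fixed by the INSERT statements further down:

# Sketch: the other two tables the backend writes to later.
# Field names are assumptions; the column counts match the
# INSERT statements used further down.
import sqlite3

conn = sqlite3.connect("../twit_data.db")
c = conn.cursor()

c.execute("CREATE TABLE twit_data (tweet TEXT, datetime TEXT)")
c.execute("CREATE TABLE trend_data "
          "(trend TEXT, tweet1 TEXT, tweet2 TEXT, tweet3 TEXT, datetime TEXT)")

conn.commit()
conn.close()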
Writing to the database
Okay, let’s see how we will write to our database. Code is here.
We create a connection to our database in the main code:
if __name__ == "__main__":
    num_tweets_to_grab = 1000
    retweet_count = 10000

    try:
        conn = sqlite3.connect(db)
        twit = TwitterMain(num_tweets_to_grab, retweet_count, conn)
    except Exception as e:
        print(e.__doc__)
    finally:
        conn.close()
As you can see, we are passing this database connection to the TwitterMain class, but opening and closing it from the main code. This is important: if the class code crashes, we'll still be able to close the database safely. That's what the finally block above does.
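As an aside, Python's standard library can express the same "always close it" guarantee a bit more compactly. This is only an alternative sketch; the original code sticks with try/finally:

# Alternative sketch: contextlib.closing() closes the connection when
# the with-block exits, even if TwitterMain raises an exception.
import sqlite3
from contextlib import closing

if __name__ == "__main__":
    num_tweets_to_grab = 1000
    retweet_count = 10000

    with closing(sqlite3.connect(db)) as conn:
        twit = TwitterMain(num_tweets_to_grab, retweet_count, conn)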
How do we write to the database?
In the get_streaming_data() function, we get the results from the stats class and write them to the database. Note that this happens after all the Tweets have been gathered (so the real-time nature of the script is not compromised):
lang, top_lang, top_tweets = self.stats.get_stats()

self.c.execute("INSERT INTO lang_data VALUES (?, ?, DATETIME('now'))",
               (str(list(Counter(lang).items())),
                str(list(Counter(top_lang).items()))))
We add the lang and top_lang to the database, along with the current time. Note that I’m doing something weird with the languages:
str(list(Counter(lang).items()))
This is because we can't store Python objects directly in a database. So I convert the languages to a Counter, turn that into a list of (language, count) pairs, and then into a string. This is how it looks in the actual database:
If you look at the top_language field above, you’ll see it’s a string that contains a Python list. That means I can read it easily.
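When you later want that field back as a Python object (for example in the Flask app), the string can be parsed again. Here is a small sketch using ast.literal_eval; this is not something the original code does, just one safe way to read the value back:

# Sketch: round-tripping the stored string back into Python objects.
import ast
from collections import Counter

stored = str(list(Counter(["en", "en", "ja"]).items()))  # what gets written, e.g. "[('en', 2), ('ja', 1)]"
pairs = ast.literal_eval(stored)                         # back to a list of (language, count) tuples
print(dict(pairs))                                       # {'en': 2, 'ja': 1}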
These were the top languages. The actual top tweets are returned as a list. We’ll write each one to the database:
for t in top_tweets:
    self.c.execute("INSERT INTO twit_data VALUES (?, DATETIME('now'))", (t,))

self.conn.commit()
We loop over the tweets, writing each one along with the current time, and then commit the changes to the database.
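For what it's worth, the same writes could also be done in a single call with executemany(); this is just a variation on the loop above, not what the original code does:

# Sketch: the same inserts done in one executemany() call.
self.c.executemany("INSERT INTO twit_data VALUES (?, DATETIME('now'))",
                   [(t,) for t in top_tweets])
self.conn.commit()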
And this is how we write the Trends:
def get_trends(self):
    trends = self.api.trends_place(1)

    trend_data = []
    for trend in trends[0]["trends"]:
        trend_tweets = []
        trend_tweets.append(trend['name'])

        tt = tweepy.Cursor(self.api.search, q=trend['name']).items(3)
        for t in tt:
            trend_tweets.append(self.get_tweet_html(t.id))

        trend_data.append(tuple(trend_tweets))

    self.c.executemany("INSERT INTO trend_data VALUES (?, ?, ?, ?, DATETIME('now'))",
                       trend_data)
    self.conn.commit()
This time, we store a trend along with 3 example Tweets for each trend.
So we build up our list in this way: The first entry is the trend, and the next three are the example tweets of the trend.
We then convert this list to a tuple. That gives us one row per trend, and executemany() writes all of the rows to the database in one go (the shape of a row is illustrated below).
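To make the shape concrete, each entry in trend_data ends up being a 4-tuple: the trend name plus the embedded HTML of its three example tweets (the fifth column, the datetime, is filled in by SQLite). The values here are made up purely for illustration:

# Hypothetical example of one entry in trend_data:
example_row = ("#SomeTrend",
               "<blockquote>tweet 1 html</blockquote>",
               "<blockquote>tweet 2 html</blockquote>",
               "<blockquote>tweet 3 html</blockquote>")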
Have a look at the complete code, run it a few times, and make sure you understand what is happening.
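If you'd rather not open the Sqlite Browser every time, a few throwaway queries are enough to confirm the data landed. A minimal sketch, assuming the table layout described above:

# Sketch: quick sanity checks after a run of the backend.
import sqlite3

conn = sqlite3.connect("../twit_data.db")
c = conn.cursor()

print(c.execute("SELECT COUNT(*) FROM lang_data").fetchone())
print(c.execute("SELECT COUNT(*) FROM twit_data").fetchone())
print(c.execute("SELECT * FROM trend_data ORDER BY datetime DESC LIMIT 1").fetchone())

conn.close()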
This concludes our backend code.
But you may have noticed there is a big thing missing: Tests. We’ll fix that in the next section. But first, some theory on testing.