Build a Twitter Analytics App
So, our code is starting to get messy, and we need to start organising it. There are two main tasks:
1. Our functions are spread all over the place. We will put them in a single class, which will make them easy to manage and, more importantly, easy to test.
2. At the moment, we are just printing the statistics. We need a way to pass them to the main function (so we can store them, manipulate them, etc.).
Remember, global variables are evil. They cause unintended consequences, are hard to track, and make maintenance a nightmare. So how will we pass our data from the Twitter streaming API back to our main function? Remember the streaming API is real time, and we don’t want to do any processing (like writing to a database) in it.
Our solution will be to create another class to store the statistics. This will make the data easy to access from multiple places, but still easy to test. This class will be light: it stores the data as Python objects in memory, so reading from it and writing to it won’t take much time (as it would with a database).
Let’s look at each of the above in detail.
A Twitter Class
First, let’s create a Twitter class to encapsulate the multiple functions we have. I’m going to simplify the code so we only have two functions, one to read the streaming API and the other to read the trends, as that’s what our original use case required.
Let’s go over our class step by step. The code is here:
```python
class TwitterMain():
    def __init__(self, num_tweets_to_grab, retweet_count):
        self.auth = tweepy.OAuthHandler(cons_tok, cons_sec)
        self.auth.set_access_token(app_tok, app_sec)
        self.api = tweepy.API(self.auth)
        self.num_tweets_to_grab = num_tweets_to_grab
        self.retweet_count = retweet_count
```
We set up the authorisation in the initialisation function. We also set up the other variables we need, like num_tweets_to_grab.
After that, we bring in our existing functions:
```python
    def get_streaming_data(self):
        twitter_stream = Stream(self.auth, twitter_listener(num_tweets_to_grab=self.num_tweets_to_grab, retweet_count = self.retweet_count))
        try:
            twitter_stream.sample()
        except Exception as e:
            print(e.__doc__)

    def get_trends(self):
        trends = self.api.trends_place(1)
        trend_data = []

        for trend in trends[0]["trends"]:
            #print(trend['name'])
            trend_tweets = []
            trend_tweets.append(trend['name'])
            tt = tweepy.Cursor(self.api.search, q = trend['name']).items(3)

            for t in tt:
                trend_tweets.append(t.text)
                #print(tweet_html)

            trend_data.append(tuple(trend_tweets))

        print(trend_data)
```
A Statistics class
As discussed, we need a lightweight statistics class. The code for it is actually quite simple. It is taken from here:
```python
class stats():

    def __init__(self):
        self.lang = []
        self.top_lang = []
        self.top_tweets = []

    def add_lang(self, lang):
        self.lang.append(lang)

    def add_top_lang(self, top_lang):
        self.top_lang.append(top_lang)

    def add_top_tweets(self, tweet_html):
        self.top_tweets.append(tweet_html)

    def get_stats(self):
        return self.lang, self.top_lang, self.top_tweets
```
As we can see, all we are doing is storing the data in Python lists. So why use this method?
Since this is now an object, we can use it from multiple places. We will create it once, and pass the object around. That means, unlike global variables, if we need to change the code, it only needs to be done once.
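To make the idea concrete, here is a minimal sketch of one stats object shared by two components. `FakeListener` and `Reporter` are hypothetical stand-ins (they are not part of the app's code) for the streaming listener and the main class, and the `stats` class is a trimmed copy that keeps only the language list.

```python
class stats():
    """Trimmed copy of the stats class (language list only)."""

    def __init__(self):
        self.lang = []

    def add_lang(self, lang):
        self.lang.append(lang)

    def get_stats(self):
        return self.lang


class FakeListener():
    """Hypothetical stand-in for the streaming listener: it only writes."""

    def __init__(self, stats):
        self.stats = stats

    def on_language(self, lang):
        self.stats.add_lang(lang)


class Reporter():
    """Hypothetical stand-in for the main class: it only reads."""

    def __init__(self, stats):
        self.stats = stats

    def languages_seen(self):
        return self.stats.get_stats()


shared = stats()                  # created once
listener = FakeListener(shared)   # the writer gets the same object
reporter = Reporter(shared)       # and so does the reader

listener.on_language("English")
listener.on_language("Spanish")

print(reporter.languages_seen())  # ['English', 'Spanish']
```

Because both sides hold a reference to the same object, nothing has to be declared global, and a test can build its own `stats()` and hand it to whichever piece it wants to check.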
Let’s see how the class is used. It will be created in the TwitterMain class’s __init__() function. Only the new code is shown:
```python
class TwitterMain():
    def __init__(self, num_tweets_to_grab, retweet_count):

        self.stats = stats()
```
We are going to pass this object to the twitter_listener class, which actually grabs the streaming data:
```python
class twitter_listener(StreamListener):

    def __init__(self, num_tweets_to_grab, stats, get_tweet_html, retweet_count=10000):

        self.stats = stats
```
If you remember, originally, we were printing the language data. We will now pass the data to our stats object:
Instead of:
```python
print(langs[json_data["lang"]])
```
we’ll do:
```python
self.stats.add_lang(langs[json_data["lang"]])
```
This will store the language in the lang list in our stats class. We will do the same for the other values we want to store.
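As a rough sketch of what that looks like for a single tweet, the function below parses one raw JSON message and records its language. `record_tweet`, the tiny `langs` table and the sample JSON strings are made up for illustration; the real listener receives its data from tweepy and does more work than this.

```python
import json

# Hypothetical lookup table from language code to name; the app's own
# `langs` dictionary is assumed to be much larger.
langs = {"en": "English", "es": "Spanish"}


class stats():
    """Trimmed copy of the stats class (language list only)."""

    def __init__(self):
        self.lang = []

    def add_lang(self, lang):
        self.lang.append(lang)


def record_tweet(raw_data, stats):
    """Parse one raw JSON tweet and store its language on the stats object."""
    json_data = json.loads(raw_data)
    code = json_data.get("lang")
    # Fall back to the raw code if the table has no name for it.
    stats.add_lang(langs.get(code, code))


my_stats = stats()
record_tweet('{"text": "hello", "lang": "en"}', my_stats)
record_tweet('{"text": "hola", "lang": "es"}', my_stats)

print(my_stats.lang)  # ['English', 'Spanish']
```

Note that the function only appends to a list. Anything slower, such as writing to a database, stays out of the streaming path.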
Then, back in the main code, once we have gathered our streaming data, we can read back what we have stored:
```python
lang, top_lang, top_tweets = self.stats.get_stats()
print(Counter(lang))
print(Counter(top_lang))
print(top_tweets)
```
The get_stats() function returns everything we have stored, which we now print.
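If you have not used `collections.Counter` before, here is a small standalone sketch of what those `Counter(...)` calls do. The language list is made-up sample data, not real output from the stream.

```python
from collections import Counter

# Made-up sample of what the lang list might hold after a short run.
lang = ["English", "Spanish", "English", "Japanese", "English", "Spanish"]

counts = Counter(lang)          # tallies how often each value appears

print(counts["English"])        # 3
print(counts.most_common(2))    # [('English', 3), ('Spanish', 2)]
```

`most_common(n)` is handy when we only care about the top few languages rather than the full tally.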
So what was the advantage of doing it this way?
- We have passed data from the *twitter_listener* to the *TwitterMain()* class without using global variables
- The method is quite quick, as we are just storing the objects in memory
- We can now read and write the stats in multiple places, which means one function could be updating them, while another could be reading them.
Make sure you look at the whole code to understand what is going on.
Next: Enough printing on the screen, we need to start saving the data in a database.