Build a Twitter Analytics App
2 The First Step: Design Your Solution
6 Writing the Backend Twitter Server
Writing the Code in Small Parts: Part 1, The Basic App
Part 2: Adding a Counter to Exit
Part 3: Adding Language and Retweet Count
7 Adding the Data to a Database
8 Testing: What and How to Test
9 Displaying our Data using the Flask Webserver
9.2 Adding templates to our Flask app
9.3 Displaying our Tweets in the Flask Web Server
10 Future Work and Improvements
Again, we are developing our app in parts.
Last time, we were facing the problem that only a few tweets got printed.
This took me a fair amount of time to debug. I suggest you spend at least five minutes on it before looking at my solution.
The problem was that the live Twitter stream is wild and unpredictable. Normally, when you search for tweets, you always get the same kind of JSON object (and I hope you took my advice and spent some time playing with the JSON, so that you know what to expect). With the stream, you also get deleted tweets (why?!) and tweets in strange formats. 80-90% of the time, the tweets are as you expect. We need to cope with the remaining corner cases. Here is our first attempt to fix it:
    class twitter_listener(StreamListener):

        def on_data(self, data):
            try:
                j = json.loads(data)
                print(j["text"])
                return True
            except:
                pass
I’m only attaching the changed code. This time, we have the code in a try-except block. If the code finds a json it doesn’t like, it simply ignores it and moves on.
Before I do the code review, can you find anything wrong with this approach?
Code Review
This is a very dangerous anti-pattern. We are brushing problems under the carpet. In addition to bad tweets, any bugs in our own code will also get hidden away.
But we do have a problem: the streaming feed is unpredictable. We don’t want our program to stop just because it found one bad tweet.
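To make the danger concrete, here is a small sketch that runs without Tweepy. The function name print_tweet_text and the misspelled key are made up for illustration; the point is only that a bare except swallows our own bug just as quietly as it swallows a bad tweet.

```python
import json


def print_tweet_text(data):
    """Mimic on_data() with a bare except (hypothetical helper, not from the app)."""
    try:
        j = json.loads(data)
        print(j["txt"])  # Bug on purpose: the key should be "text", so this raises KeyError
        return True
    except:  # The bare except hides the KeyError along with everything else
        pass


# A perfectly good tweet is dropped, and nothing tells us why.
result = print_tweet_text('{"text": "hello"}')
print(result)
```

The tweet text never appears and the function falls through to return None, with no traceback pointing at the typo.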
So what’s the solution? The best way is to log the bad tweets, and later on, see if we can write code to work around them.
However, we don’t have any logging facilities at the moment, so we will put this problem on hold, and come back to it later.
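As a rough preview of the idea (not necessarily the solution used later in this series), a sketch with Python’s standard logging module might look like the following. The helper name extract_text and the logger name are assumptions for illustration, and the example deliberately avoids Tweepy so it can be run on its own.

```python
import json
import logging

# Assumption: warnings go to stderr; a real app would probably log to a file.
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("twitter_listener")


def extract_text(data):
    """Return the tweet text, or None (after logging) if the payload is unusable."""
    try:
        j = json.loads(data)
        return j["text"]
    except (ValueError, KeyError, TypeError) as e:
        # ValueError: not valid JSON. KeyError: no "text" field (e.g. a delete notice).
        # TypeError: valid JSON that is not an object, such as a bare number.
        logger.warning("Skipping payload (%s): %r", type(e).__name__, data)
        return None


print(extract_text('{"text": "hello"}'))
print(extract_text('{"delete": {"status": {}}}'))
```

Catching only the exceptions we expect from bad payloads means a genuine bug elsewhere in the code (a NameError, say) would still surface instead of being hidden.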
See the code here.
Adding a counter
Another problem with our code is that it keeps running non-stop until we kill it. We need to add some sort of counter to make it stop.
You can’t keep the counter as a local variable inside the on_data() function, because the function is called afresh for each tweet and the variable would be reset every time.
But we can add it to the class during initialization, and so it will be available each time on_data() is called. Let’s do that:
    class twitter_listener(StreamListener):

        def __init__(self, num_tweets_to_grab):
            self.counter = 0
            self.num_tweets_to_grab = num_tweets_to_grab
The __init__ function is called when you create an instance of the class. We are passing in a variable num_tweets_to_grab, which is the number of tweets we’ll grab. There is also an internal counter, which starts at zero. How do you set the number of tweets to grab? In your main code, you do:
    twitter_stream = Stream(auth, twitter_listener(num_tweets_to_grab=10))
We pass in a value of 10 for num_tweets_to_grab when creating an instance of our class twitter_listener. Now, in the class itself, we do:
    def on_data(self, data):
        try:
            j = json.loads(data)
            print(j["text"])
            self.counter += 1
            if self.counter == self.num_tweets_to_grab:
                return False
        # The rest of the method (return True and the except block) is unchanged.
Once the counter reaches the limit we return False, which causes the stream to stop (this is how Tweepy works internally; if you return True, it keeps listening for new tweets). Let’s look at the whole code now (leaving out the imports). We have also included our code to search for tweets and trends here:
    class twitter_listener(StreamListener):

        def __init__(self, num_tweets_to_grab):
            self.counter = 0
            self.num_tweets_to_grab = num_tweets_to_grab

        def on_data(self, data):
            try:
                j = json.loads(data)
                print(j["text"])
                self.counter += 1
                if self.counter == self.num_tweets_to_grab:
                    return False
                return True
            except:
                # @TODO: Very dangerous, come back to this!
                pass

        def on_error(self, status):
            print(status)


    if __name__ == "__main__":

        auth = tweepy.OAuthHandler(cons_tok, cons_sec)
        auth.set_access_token(app_tok, app_sec)

        twitter_api = tweepy.API(auth)

        # Search stuff
        search_results = tweepy.Cursor(twitter_api.search, q="Python").items(5)
        for result in search_results:
            print(result.text)

        trends = twitter_api.trends_place(1)

        for trend in trends[0]["trends"]:
            print(trend['name'])

        twitter_stream = Stream(auth, twitter_listener(num_tweets_to_grab=10))
        try:
            twitter_stream.sample()
        except Exception as e:
            print(e.__doc__)
The code for this section is here. Make sure you understand what’s happening before moving on.
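If you want to check the stopping logic without a network connection or API keys, the sketch below imitates it in plain Python. CountingListener and fake_stream are hypothetical stand-ins, not Tweepy classes: the loop simply stops as soon as on_data() returns False, which is a simplified picture of what the real stream does.

```python
import json


class CountingListener:
    """Stand-in for twitter_listener with no Tweepy dependency (hypothetical name)."""

    def __init__(self, num_tweets_to_grab):
        self.counter = 0
        self.num_tweets_to_grab = num_tweets_to_grab
        self.texts = []

    def on_data(self, data):
        j = json.loads(data)
        self.texts.append(j["text"])
        self.counter += 1
        if self.counter == self.num_tweets_to_grab:
            return False  # Ask the caller to stop feeding us data
        return True


def fake_stream(listener, payloads):
    """Feed payloads to the listener until it returns False."""
    for data in payloads:
        if listener.on_data(data) is False:
            break


# 100 fake tweets are available, but we only ask for 3.
payloads = [json.dumps({"text": "tweet %d" % i}) for i in range(100)]
listener = CountingListener(num_tweets_to_grab=3)
fake_stream(listener, payloads)
print(listener.counter, listener.texts)
```

Because the counter lives on the instance rather than inside on_data(), it survives between calls, and the loop stops after exactly three tweets.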
Next Part: We will now start looking at the languages of our tweets, plus the top tweets.