Build a Twitter Analytics App
2 The First Step: Design Your Solution
6 Writing the Backend Twitter Server
Writing the Code in Small Parts: Part 1, The Basic App
Part 2: Adding a Counter to Exit
Part 3: Adding Language and Retweet Count
7 Adding the Data to a Database
8 Testing: What and How to Test
9 Displaying our Data using the Flask Webserver
9.2 Adding templates to our Flask app
9.3 Displaying our Tweets in the Flask Web Server
10 Future Work and Improvements
Now, we are going to add the language to the scripts.
In our Twitter json object, the language is returned as a short code (usually 2 characters) under the lang json key. Twitter documents this as a BCP 47 language identifier; in practice, most values are two-letter ISO 639-1 codes. I’ve created a small Python dictionary that converts from the code to the language name. Here is a brief look at the dictionary:
    langs = {'ar': 'Arabic', 'bg': 'Bulgarian', 'ca': 'Catalan', 'cs': 'Czech',
             'da': 'Danish', 'de': 'German', 'el': 'Greek', 'en': 'English',
             ... snipped ...
Now we just need to store the languages. First, we add a new variable to the init code:
    self.languages = []
So that the new init function will look like:
    def __init__(self, num_tweets_to_grab):
        self.counter = 0
        self.num_tweets_to_grab = num_tweets_to_grab
        self.languages = []
In the on_data() function, we store the languages per tweet:
    self.languages.append(langs[json_data["lang"]])
We are using our Python dictionary to convert the code to an English word, and storing that.
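One thing to watch: langs[...] raises a KeyError if Twitter sends a code that is not in our dictionary (for example und, which Twitter uses when it cannot detect the language). Here is a minimal sketch of a more forgiving lookup. The helper name language_name() and the cut-down langs dictionary are made up for this example; they are not part of Tweepy or of the code above.

```python
# Cut-down copy of the langs dictionary, just for this example.
langs = {'en': 'English', 'es': 'Spanish', 'ja': 'Japanese'}


def language_name(json_data, default='Unknown'):
    """Return the language name for a tweet, or `default` if we can't tell.

    language_name() is a hypothetical helper, not part of Tweepy.
    """
    code = json_data.get('lang')      # None if the key is missing
    return langs.get(code, default)   # fall back instead of raising KeyError


print(language_name({'lang': 'es'}))    # Spanish
print(language_name({'lang': 'und'}))   # Unknown
print(language_name({}))                # Unknown
```

Whether you want to count unknown languages or skip those tweets entirely is a design choice; the sketch simply avoids the exception.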
Once we have the languages being appended, we can print the status when we exit (we could also do it live).
    if self.counter == self.num_tweets_to_grab:
        print(self.languages)
        print(Counter(self.languages))
        return False
So we print all the languages we found. Counter (from Python’s collections module) counts how many times each item appears in the list, so you will get an output like this:
    ['English', 'English', 'Spanish', 'Spanish', 'Spanish', 'English', 'Japanese', 'Filipino', 'Japanese', 'Hindi']
    Counter({'Spanish': 3, 'English': 3, 'Japanese': 2, 'Hindi': 1, 'Filipino': 1})
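If you have not used Counter before, here is a small standalone sketch using the same list of languages as the sample output. It only uses the standard library; the variable names are just for illustration.

```python
from collections import Counter

languages = ['English', 'English', 'Spanish', 'Spanish', 'Spanish',
             'English', 'Japanese', 'Filipino', 'Japanese', 'Hindi']

counts = Counter(languages)

# A Counter behaves like a dictionary of item -> count.
print(counts['Spanish'])   # 3
print(counts['Korean'])    # 0, items that never appeared count as zero

# most_common(n) returns the n highest counts as (item, count) pairs.
for language, count in counts.most_common(3):
    print(language, count)
```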
The code for this section is here.
Adding the top tweets
Top tweets are defined (by me) as tweets that have 10,000 or more retweets. If this surprises you, I have seen 50,000-60,000 retweets, usually of things like what some celebrity had for breakfast. We will now only print these top tweets, and store their language.
In init(), we add a new field:
    def __init__(self, num_tweets_to_grab, retweet_count=10000):
        self.top_languages = []
We also add a new parameter, retweet_count, which defaults to 10000 and is stored as self.retweet_count. Then in the main code, we get the retweet count by parsing the json (by now, you should be an expert in the Twitter json!)
    retweet_count = json_data["retweeted_status"]["retweet_count"]
And we check whether it is at least 10,000 (the value of self.retweet_count). If so, we print the tweet, its retweet count and language. We also save the language.
    if retweet_count >= self.retweet_count:
        print(json_data["text"], retweet_count, langs[json_data["lang"]])
        self.top_languages.append(langs[json_data["lang"]])
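Note that only retweets carry a retweeted_status key, so the lookup above raises a KeyError for an original tweet. As a sketch, the same check can be written so that non-retweets simply count as zero. The helper names get_retweet_count() and is_top_tweet(), and the hand-written sample dictionaries, are made up for illustration.

```python
def get_retweet_count(json_data):
    """Return the retweet count of the original tweet, or 0 if this
    tweet is not a retweet (no "retweeted_status" key)."""
    original = json_data.get("retweeted_status") or {}
    return original.get("retweet_count", 0)


def is_top_tweet(json_data, threshold=10000):
    """True if the tweet has at least `threshold` retweets."""
    return get_retweet_count(json_data) >= threshold


# Hand-written sample data, much smaller than a real tweet.
retweet = {"text": "RT @someone: hello", "lang": "en",
           "retweeted_status": {"retweet_count": 52000}}
plain_tweet = {"text": "hello", "lang": "en"}

print(get_retweet_count(retweet), is_top_tweet(retweet))          # 52000 True
print(get_retweet_count(plain_tweet), is_top_tweet(plain_tweet))  # 0 False
```

Small functions like these take a plain dictionary and return a value, which also makes them easy to test without a live Twitter connection.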
This will also be printed at exit time. Let’s look at the whole code now:
    import tweepy
    from tweepy.streaming import StreamListener
    from tweepy import Stream
    from local_config import *
    import pdb
    import json
    from collections import Counter

    langs = {'ar': 'Arabic', 'bg': 'Bulgarian', 'ca': 'Catalan', 'cs': 'Czech',
             'da': 'Danish', 'de': 'German', 'el': 'Greek', 'en': 'English',
             'es': 'Spanish', 'et': 'Estonian', 'fa': 'Persian', 'fi': 'Finnish',
             'fr': 'French', 'hi': 'Hindi', 'hr': 'Croatian', 'hu': 'Hungarian',
             'id': 'Indonesian', 'is': 'Icelandic', 'it': 'Italian', 'iw': 'Hebrew',
             'ja': 'Japanese', 'ko': 'Korean', 'lt': 'Lithuanian', 'lv': 'Latvian',
             'ms': 'Malay', 'nl': 'Dutch', 'no': 'Norwegian', 'pl': 'Polish',
             'pt': 'Portuguese', 'ro': 'Romanian', 'ru': 'Russian', 'sk': 'Slovak',
             'sl': 'Slovenian', 'sr': 'Serbian', 'sv': 'Swedish', 'th': 'Thai',
             'tl': 'Filipino', 'tr': 'Turkish', 'uk': 'Ukrainian', 'ur': 'Urdu',
             'vi': 'Vietnamese', 'zh_CN': 'Chinese (simplified)',
             'zh_TW': 'Chinese (traditional)'}


    class twitter_listener(StreamListener):

        def __init__(self, num_tweets_to_grab, retweet_count=10000):
            self.counter = 0
            self.num_tweets_to_grab = num_tweets_to_grab
            self.retweet_count = retweet_count
            self.languages = []
            self.top_languages = []

        def on_data(self, data):
            try:
                json_data = json.loads(data)
                self.languages.append(langs[json_data["lang"]])

                self.counter += 1
                retweet_count = json_data["retweeted_status"]["retweet_count"]

                if retweet_count >= self.retweet_count:
                    print(json_data["text"], retweet_count, langs[json_data["lang"]])
                    self.top_languages.append(langs[json_data["lang"]])

                if self.counter >= self.num_tweets_to_grab:
                    print(self.languages)
                    print(self.top_languages)
                    print(Counter(self.languages))
                    print(Counter(self.top_languages))
                    return False

                return True
            except:
                # @TODO: Very dangerous, come back to this!
                pass

        def on_error(self, status):
            print(status)


    if __name__ == "__main__":
        auth = tweepy.OAuthHandler(cons_tok, cons_sec)
        auth.set_access_token(app_tok, app_sec)

        twitter_api = tweepy.API(auth)

        # Search stuff
        search_results = tweepy.Cursor(twitter_api.search, q="Python").items(5)
        for result in search_results:
            print(result.text)

        trends = twitter_api.trends_place(1)

        for trend in trends[0]["trends"]:
            print(trend['name'])

        twitter_stream = Stream(auth, twitter_listener(num_tweets_to_grab=100))
        try:
            twitter_stream.sample()
        except Exception as e:
            print(e.__doc__)
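The bare except: above hides every problem, including the KeyError raised whenever a tweet is not a retweet or its language is not in langs. One possible way to tame it, sketched here without Tweepy so it can be run on its own, is to catch only the errors we expect and report them. The function handle_tweet() and the sample strings are hypothetical, not part of the code above.

```python
import json

langs = {'en': 'English', 'es': 'Spanish'}   # cut-down copy for the example


def handle_tweet(data):
    """Parse one raw tweet string; return (language, retweet_count) or None."""
    try:
        json_data = json.loads(data)
        language = langs[json_data["lang"]]
        retweet_count = json_data["retweeted_status"]["retweet_count"]
        return language, retweet_count
    except KeyError as e:
        # A key we relied on was missing, e.g. the tweet is not a retweet.
        print("Skipping tweet, missing key:", e)
    except ValueError as e:
        # json.loads() raises a ValueError subclass on malformed input.
        print("Skipping tweet, bad JSON:", e)
    return None


print(handle_tweet('{"lang": "en", "retweeted_status": {"retweet_count": 12}}'))
print(handle_tweet('{"lang": "en"}'))
print(handle_tweet('not json'))
```

Any other kind of exception would now surface instead of being silently swallowed, which is usually what you want while developing.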
Run the code a few times and make sure you are comfortable with it (here it is on GitHub). We are now nearing our challenge time.
Code Review
The code has gotten long and messy. Though we are printing the languages, top tweets etc., we are not doing anything else with them. We can’t really return the data, as Tweepy only lets on_data() return True/False. Besides, we shouldn’t be modifying Tweepy core functionality.
We have code spread all over the place. Some is in the class, some is just being run in the main section. This will make it hard to test (remember our principle: Good code is easily testable).
Challenge Time
Before starting the challenge, go back to our original code, which was just a few lines. Look at the whole thing step by step, seeing how our code increased in complexity. This is important; don’t skip it.
Your challenge next is to:
1. Figure out how to manage the code.
2. How do you return the data we have collected in our class to our main code (so that it can be stored in a database)? Hint: global variables are Bad.
3. We will cover testing later, but have a think about it.
For now, at the minimum, do a quick pencil-and-paper design of how you would fix the code. If you get time, fix it.
Next Part: We start organising our code