Part 3: Adding Language and Retweet Count

Build a Twitter Analytics App

Now we are going to add the tweet language to our script.

In our Twitter JSON object, the language is returned as a two-character code under the lang key. This is typically an ISO 639-1 code (Twitter documents the field as a BCP 47 language identifier, and uses und when it cannot detect a language). I’ve created a small Python dictionary that converts the code to the language name. Here is a brief look at the dictionary:
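As a rough idea of what such a lookup table could look like, here is a minimal sketch. The name `langs`, the helper `language_name()`, and the handful of entries shown are assumptions for illustration; the dictionary used in this tutorial is longer and may be named differently.

```python
# Hypothetical excerpt of a code-to-name lookup table.
# The name `langs` and the selection of entries are illustrative only.
langs = {
    "ar": "Arabic",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ja": "Japanese",
    "pt": "Portuguese",
    "und": "Undetermined",  # Twitter's code for "could not detect a language"
}


def language_name(code):
    """Return the English name for a language code, or the code itself if unknown."""
    return langs.get(code, code)
```

Falling back to the raw code means an unexpected code never raises a KeyError in the middle of a stream.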

Now we just need to store the languages. First, we add a new variable to the init code:

So that the new init function will look like:
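A minimal sketch of what that could look like, assuming the counter fields from Part 2 are called `counter` and `num_tweets_to_grab` and the new list is called `languages` (all three names are guesses). The class here is a plain stand-in so that it runs on its own; in the real app it would presumably subclass Tweepy's stream listener.

```python
class TweetListener:
    """Stand-in for the listener class (not wired to Tweepy here)."""

    def __init__(self, num_tweets_to_grab):
        self.counter = 0                              # tweets seen so far (Part 2)
        self.num_tweets_to_grab = num_tweets_to_grab  # when to stop (Part 2)
        self.languages = []                           # new: one language name per tweet
```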

In the on_data() function, we store the languages per tweet:

We are using our Python dictionary to convert the code to an English word, and storing that.
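The step above might be sketched roughly as follows. This is not the tutorial's exact code: the class is a plain stand-in rather than a Tweepy subclass, and the names `langs`, `languages`, `counter` and `num_tweets_to_grab` are assumptions. It relies only on the fact that on_data() receives the raw JSON string and signals "keep going" or "stop" through its return value.

```python
import json

# Hypothetical lookup table; the tutorial's dictionary has many more entries.
langs = {"en": "English", "es": "Spanish", "ja": "Japanese"}


class TweetListener:
    """Stand-in listener that records the language of each tweet."""

    def __init__(self, num_tweets_to_grab):
        self.counter = 0
        self.num_tweets_to_grab = num_tweets_to_grab
        self.languages = []

    def on_data(self, data):
        """Handle one raw JSON message; return False to stop, True to continue."""
        tweet = json.loads(data)
        code = tweet.get("lang")
        if code is not None:
            # Convert the code to an English name, keeping the raw code if unknown.
            self.languages.append(langs.get(code, code))
        self.counter += 1
        return self.counter < self.num_tweets_to_grab
```

Using `tweet.get("lang")` rather than `tweet["lang"]` keeps the listener from crashing on stream messages that are not tweets and so carry no language.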

Once the languages are being appended, we can print a summary when we exit (we could also print it live).

So we print all the languages we found. Counter(), from Python's collections module, counts how many times each item appears in the list, so that you will get an output like this:
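The tutorial's actual output depends on whatever tweets were streaming at the time, so the sketch below uses a made-up list of language names purely to show the shape of what Counter prints.

```python
from collections import Counter

# Made-up sample of collected language names, for illustration only.
languages = ["English", "Spanish", "English", "Japanese", "English", "Spanish"]

language_counts = Counter(languages)
print(language_counts)
# Counter({'English': 3, 'Spanish': 2, 'Japanese': 1})

# most_common(n) returns the n most frequent items as (item, count) pairs.
print(language_counts.most_common(2))
# [('English', 3), ('Spanish', 2)]
```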

The code for this section is here.

Adding the top tweets

Top tweets are defined (by me) as tweets that have more than 10,000 retweets. If that sounds high, I have seen tweets with 50,000 to 60,000 retweets, usually of things like what some celebrity had for breakfast. We will now only print these top tweets, and store their language.

In init(), we add a new field:

We also add a new parameter, retweet_count, initialized to 10000. Then, in the main code, we get the retweet count by parsing the JSON (by now, you should be an expert in the Twitter JSON!).

We then check whether it is greater than 10,000. If so, we print the tweet, its retweet count, and its language. We also save the language.
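One way the parsing could look is sketched below. The helper name `get_retweet_count()` is hypothetical, and which field the tutorial's code reads is an assumption: for a retweet, the original tweet is nested under the `retweeted_status` key and carries its own `retweet_count`, so the sketch prefers that and falls back to the top-level count.

```python
import json


def get_retweet_count(data):
    """Return the retweet count for one raw JSON message, or 0 if unavailable."""
    tweet = json.loads(data)
    # Assumption: for a retweet, the original tweet nested under
    # "retweeted_status" holds the count we care about.
    original = tweet.get("retweeted_status") or tweet
    return original.get("retweet_count", 0)
```

The `or tweet` fallback also covers messages where `retweeted_status` is present but null.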

This will also be printed at exit time. Let’s look at the whole code now:
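The original listing is not reproduced here, so what follows is only a sketch of how the pieces described above could fit together. Apart from retweet_count, on_data() and Counter(), every name is an assumption, the class is a plain stand-in rather than a Tweepy subclass, and the lookup table is a tiny hypothetical excerpt.

```python
import json
from collections import Counter

langs = {"en": "English", "es": "Spanish", "ja": "Japanese"}  # hypothetical excerpt


class TweetListener:
    """Stand-in listener; not wired to Tweepy so that it runs on its own."""

    def __init__(self, num_tweets_to_grab, retweet_count=10000):
        self.counter = 0
        self.num_tweets_to_grab = num_tweets_to_grab
        self.retweet_count = retweet_count  # threshold for a "top tweet"
        self.languages = []                 # language of every tweet seen
        self.top_languages = []             # language of top tweets only

    def on_data(self, data):
        tweet = json.loads(data)
        code = tweet.get("lang") or "und"
        language = langs.get(code, code)
        self.languages.append(language)

        # Assumption: a retweet nests the original under "retweeted_status".
        original = tweet.get("retweeted_status") or tweet
        retweets = original.get("retweet_count", 0)
        if retweets > self.retweet_count:
            print(tweet.get("text"), retweets, language)
            self.top_languages.append(language)

        self.counter += 1
        if self.counter >= self.num_tweets_to_grab:
            # Print the summaries at exit time.
            print(Counter(self.languages))
            print(Counter(self.top_languages))
            return False  # signals that we want to stop
        return True
```

Note that all the results stay trapped inside the listener and are only ever printed.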

Run the code a few times and make sure you are comfortable with it (and here it is on GitHub). We are now nearing our challenge time.

Code Review

The code has gotten long and messy. Though we are printing the languages, top tweets, etc., we are not doing anything else with them. We can’t really return them, as Tweepy only allows us to return True/False. Besides, we shouldn’t be modifying Tweepy core functionality.

We have code spread all over the place. Some is in the class, some is just being run in the main section. This will make it hard to test (remember our principle: Good code is easily testable).

Challenge Time

Before starting the challenge, go back to our original code, which was just a few lines. Go through the whole thing step by step, seeing how our code increased in complexity. This is important; don’t skip it.

Your next challenge is to:

1 Figure out how to manage the code.

2 Work out how to return the data we have collected in our class to our main code (so that it can be stored in a database). Hint: global variables are Bad.

3 Think about how you would test the code. We will cover testing later, but have a think about it now.

For now, at a minimum, do a quick pencil-and-paper design of how you would fix the code. If you have time, fix it.

Next Part: We start organising our code
