Build a Twitter Analytics App
2 The First Step: Design Your Solution
6 Writing the Backend Twitter Server
Writing the Code in Small Parts: Part 1, The Basic App
Part 2: Adding a Counter to Exit
Part 3: Adding Language and Retweet Count
7 Adding the Data to a Database
8 Testing: What and How to Test
9 Displaying our Data using the Flask Webserver
9.2 Adding templates to our Flask app
9.3 Displaying our Tweets in the Flask Web Server
10 Future Work and Improvements
There are several ways to design your code. For a time UML was the fad. The thinking was, you would just draw fancy UML diagrams, and some tool would convert your UML to code. And we would all turn into software architects. No more messy programming.
Yeah, that didn’t work out.
I personally find UML very bureaucratic / bulky / complicated for what I want to do. So I stick to simple flowcharts. The goal is to share your design with others, not to become obsessed with which tool to use.
In a “formal” context, i.e., when you are being paid, you would do the design in a proper Word document. I’ll just do it here, since my goal is to show you what will be expected of you.
High Level Design
This is my high level view. There are two main components:
1. The Twitter Server
This will be some Python scripts that will talk to the Twitter API, and read data. The data will then be analysed and the results stored in a database.
2. The Web app
This is a simple Flask server that will read the database and pass the results to the HTML. The front end scripting library (Google Charts) will then take these results and graph them.
Originally, I didn’t want to use a database. I was planning to use task queues. This is an advanced concept; queues are like threads that process data in the background. So my Flask server would trigger some functions via the queues, and these would talk to the Twitter API and return the data in real time.
Like I said in the design section, I looked at my hardware limitations. I wanted to host my code on Pythonanywhere.com, and they don’t support queues (or threads). I could have used another provider, but task queues are a new topic to me as well, and they would have added extra time to the project. So I decided to drop the idea. These are the sort of decisions you need to make as early as possible. When you are stuck, choose the simplest option that will allow you to finish the project.
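To give a rough feel for the idea I dropped, here is a minimal sketch of background processing with a queue. It uses only the standard library (`queue` and `threading`), not a real task queue service, and `fetch_tweets` is a made-up stand-in for a slow Twitter API call, so treat it as an illustration rather than what the project would have used.

```python
import queue
import threading


def fetch_tweets(term):
    """Hypothetical stand-in for a slow call to the Twitter API."""
    return [f"tweet about {term} #{i}" for i in range(3)]


def worker(tasks, results):
    """Pull search terms off the queue and process them in the background."""
    while True:
        term = tasks.get()
        if term is None:  # sentinel value: no more work
            tasks.task_done()
            break
        results[term] = fetch_tweets(term)
        tasks.task_done()


tasks = queue.Queue()
results = {}

background = threading.Thread(target=worker, args=(tasks, results), daemon=True)
background.start()

# The web server would only do this part: enqueue the work and return at once.
for term in ("python", "flask"):
    tasks.put(term)
tasks.put(None)

tasks.join()  # wait here only so the example can print the results
print(results)
```

The point is that the code that adds work to the queue never waits for the slow call. That is also why it needs a host that allows background threads or workers.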
Okay, let’s look at my two components in a bit more detail:
The Flask Server
This is fairly simple. All we are doing is reading from the database and updating an HTML file, so that Google Charts can draw graphs. If you have never used Flask before, you may not understand what’s going on. Don’t worry, we’ll cover this later.
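As a taste of the "read from the database" half, here is a small sketch that pulls counts out of SQLite and shapes them as a header row plus data rows, which is the kind of layout a charting library can take. The table name `language_counts`, its columns, and the sample numbers are all invented for this example, and the Flask part is left out so the code runs with the standard library alone.

```python
import json
import sqlite3


def read_language_counts(conn):
    """Return a header row followed by one row per language, largest first."""
    rows = conn.execute(
        "SELECT language, tweet_count FROM language_counts "
        "ORDER BY tweet_count DESC, language"
    ).fetchall()
    return [["Language", "Tweets"]] + [list(row) for row in rows]


# Build a throwaway in-memory database so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE language_counts (language TEXT, tweet_count INTEGER)")
conn.executemany(
    "INSERT INTO language_counts VALUES (?, ?)",
    [("en", 120), ("es", 45), ("ja", 30)],
)

chart_data = read_language_counts(conn)
# A Flask view would hand this JSON string to the HTML template.
chart_json = json.dumps(chart_data)
print(chart_json)
```

In the real app, a Flask route would call something like `read_language_counts` and pass the result to the template, where the charting script picks it up.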
Twitter Server
This is slightly more complicated. In truth, the diagram above could have been broken down a bit more. The perform analysis section could have its own diagrams, but from a programming point of view, the code isn’t that complex, so I will stick to this.
The idea is, we get the Twitter Trends, then we read the streaming data. The next step is to perform an analysis. This will include things like finding the top tweets, finding the language of tweets, etc. We will keep looping till we read 2000 tweets, and then write the results to the database.
Why 2000? Because Twitter limits how much data you can stream, and this limit isn’t well defined. I just tried different values. Sometimes I could get 10,000 tweets, but other times I would be temporarily blocked. If you abuse the Twitter API too much, you can be blocked for life. So I am sticking to 2000 per hour, which looks safe.
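The loop described above can be sketched roughly like this. Everything Twitter-specific is faked: `fake_stream` is a hypothetical stand-in for the streaming API, the tweet fields (`text`, `lang`, `retweet_count`) and the `language_counts` table are assumptions made for the example, and the "analysis" is just a language tally plus the most retweeted tweet.

```python
import sqlite3
from collections import Counter
from itertools import cycle

MAX_TWEETS = 2000  # the per-run limit discussed above


def fake_stream():
    """Hypothetical stand-in for the streaming API: yields tweets forever."""
    samples = [
        {"text": "Learning Python", "lang": "en", "retweet_count": 5},
        {"text": "Hola mundo", "lang": "es", "retweet_count": 12},
        {"text": "Flask is fun", "lang": "en", "retweet_count": 1},
    ]
    return cycle(samples)


def analyse(stream, limit):
    """Read tweets until the limit is hit, tallying languages and the top tweet."""
    languages = Counter()
    top_tweet = None
    for count, tweet in enumerate(stream, start=1):
        languages[tweet["lang"]] += 1
        if top_tweet is None or tweet["retweet_count"] > top_tweet["retweet_count"]:
            top_tweet = tweet
        if count >= limit:
            break
    return languages, top_tweet


def save(conn, languages):
    """Write the language tallies to the database in one go."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS language_counts (language TEXT, tweet_count INTEGER)"
    )
    conn.executemany("INSERT INTO language_counts VALUES (?, ?)", languages.items())
    conn.commit()


languages, top_tweet = analyse(fake_stream(), MAX_TWEETS)
conn = sqlite3.connect(":memory:")
save(conn, languages)
print(dict(languages), top_tweet["text"])
```

Note that the counter lives inside the loop, so the program stops itself once it has read enough tweets instead of relying on Twitter to cut it off.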
A Sequence Diagram
Even though I’ve mentioned I’m not a fan of UML, there is one aspect of it that is very useful: Sequence diagrams.
The diagrams above show how the blocks link together, but they do not show how data flows through the system. That’s what sequence diagrams do. Here’s mine. Sorry it’s hand drawn, but I couldn’t find any good tools:
And that’s it. The design is fairly simple, because the original problem is fairly simple, being a learning exercise.
At this stage, you’d go for a design review, to check that everyone else on the team understands your design and that there are no problems with it. If you are working with a legacy program, chances are you might break something else if you aren’t careful, which is why a design review is needed.
Your Challenge
Your challenge for this week is to write code that will:
1. Search Twitter for a term, say Python.
2. Find the top trends on Twitter.
3. Print 100 tweets from the Twitter streaming data.
You can access the Twitter API directly, using REST/JSON objects, or through a library. I recommend the second, as it will save you a lot of JSON parsing and URL building. For Python, I found Tweepy to be the best documented library.
For now, just find any library that can talk to the Twitter API, and copy and paste code that can do the three things above. Don’t worry about making the solution fancy or even clean. In fact, make it as simple as possible. In the next section, we will look at some software engineering principles, as well as my code to read Twitter data.
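To show the kind of work a library saves you, here is a rough sketch of the "direct" route using only the standard library. The endpoint URL and the shape of the response are assumptions for illustration (check Twitter's current API documentation), the sample response is made up, and a real request would also need authentication, which is left out here.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for illustration; check Twitter's current API docs before use.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(term, count=10):
    """Create the request URL by hand, which a library would do for you."""
    return SEARCH_URL + "?" + urlencode({"q": term, "count": count})


def extract_texts(raw_json):
    """Dig the tweet texts out of a JSON response (assumed shape)."""
    payload = json.loads(raw_json)
    return [status["text"] for status in payload.get("statuses", [])]


# Made-up sample response so the example runs without network access or API keys.
sample_response = '{"statuses": [{"text": "Python is great"}, {"text": "I love Python"}]}'

print(build_search_url("Python"))
print(extract_texts(sample_response))
```

Multiply this by every endpoint and every field you care about, and you can see why a library is the easier choice.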
Next: We look at some principles of writing code you can be proud of.