Build a Reddit Bot Series
Part 1: Read posts from reddit
Part 4: Marvin the Depressed Bot
Introduction
So we are going to build a simple Reddit Bot that will do two things:
- It will monitor a particular subreddit for new posts, and when someone posts “I love Python”, it will reply “Me too!”.
- It will also monitor all comments to recent posts, and if it finds one that says “I hate Python”, it will post a link to /r/learnpython and ask the commenter to ask a question there.
Prerequisite knowledge
Only a basic knowledge of Python is required, as building bots is fairly easy.
Part 1
In part one, we will see how we can read data from Reddit using the Reddit API. The source code is available at Github:
Edit: Based on comment by reader Farid:
Reddit has updated its website to a new look. If you come across a link above that does not work, then you will have to change the url.
Here is an example:
Above we have the link http://www.reddit.com/dev/api
Yet, it says not found. If we change the link to http://old.reddit.com/dev/api then the link should work.
In short, if a reddit link does not work change the “www” to “old”, so the link looks like “old.reddit.com”
Software bot
A software bot is a program that can interact with websites autonomously. They can be as simple or as complex as you want them to be.
The bot runs in the background and monitors a website. When it sees a change (like a post on Reddit), it can reply to it, upvote, or do any other task it was programmed to.
Monitoring websites
There are many ways to monitor websites. You can use web scraping tools like urllib or Beautifulsoup, or anything similar. There is a slight problem with this, though. Bots can make thousands of requests a second, and this can overload servers. So most big websites ban bots. Ignore this at your own risk. I have been banned from Google for hours, and had my Gmail locked till I entered a dozen captchas, my mobile number and the name of my first cat.
If you want to do this properly, stick to any rules the website has.
Reddit API
Reddit provides an API, and unlike some websites, it’s actually quite easy to use. It’s based on REST and json, so in theory doesn’t require any fancy setup.
The important thing is to follow the rules they set. Two of the most important ones are:
- You can’t make more than 1 request every 2 seconds (or 30 a minute)
- You must not lie about your user agent
Read the rest here.
The user agent is what identifies your browser. Libraries like Python’s urllib are severely restricted by Reddit to prevent abuse. Reddit recommends you use your own special user agent, and that’s what we’ll do.
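The first rule above (one request every two seconds) can be sketched as a small helper that sleeps between calls. This is only an illustration of the idea: the class name `Throttle` and its parameters are made up for this example, and Praw (introduced below) does its own rate limiting, so you would only need something like this when calling the API by hand.

```python
import time


class Throttle:
    """Block so that calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be checked without waiting
        self._sleep = sleep
        self._last_call = None   # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the interval, then record the call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Call `wait()` right before every request. The first call returns immediately; later calls pause only if the previous request was less than two seconds ago.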
Using the API
The API is quite easy to use, like I said. You make a REST request, and this can be done via urllib.request (urllib2 in Python 2), as long as you set the user agent properly. This is how you would do it. I have put two links below. Open both in a new tab:
http://www.reddit.com/r/learnPython/
http://www.reddit.com/r/learnPython/hot/.json
The first is how a human would see it. The second is how your code sees it. As you can see, getting the json is fairly easy.
The problem with this approach is that you still have to make sure you rate limit your requests. You also have to parse the json yourself. Json is easy to parse in Python, as it’s essentially a Python dictionary, but if you actually look at the json, there is a lot of data.
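To give a feel for the "parse the json yourself" part, here is a rough sketch that sets a custom user agent and pulls titles and scores out of a listing. The user agent string and the function names are placeholders I made up, and the sample data is hand-written in the same nested shape the .json link returns (a `data` dictionary holding a `children` list), so nothing here touches the network unless you call `fetch_listing` yourself.

```python
import json
import urllib.request

# Hypothetical user agent string; pick one that describes your own bot.
USER_AGENT = "example-reader 0.1"


def build_request(url, user_agent=USER_AGENT):
    """Create a request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_listing(url):
    """Download and decode a listing. This makes a real network call."""
    with urllib.request.urlopen(build_request(url)) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_posts(listing):
    """Return (title, score) pairs from a decoded listing dictionary."""
    children = listing["data"]["children"]
    return [(child["data"]["title"], child["data"]["score"]) for child in children]


# A tiny hand-written sample in the same shape, so no network is needed here.
sample = json.loads("""
{"kind": "Listing",
 "data": {"children": [
   {"kind": "t3", "data": {"title": "I love Python", "score": 12}},
   {"kind": "t3", "data": {"title": "Help with loops", "score": 3}}
 ]}}
""")

for title, score in extract_posts(sample):
    print(title, score)
```

Even for two fields you have to know how deep the data is nested, and you would still have to add the rate limiting yourself.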
Introducing Praw
[Update Dec 2016: Reddit and Praw now force you to use OAuth. I’ve updated the article to use that]
Praw is a library that fixes many of these problems for you. It limits how many requests you can make, and makes it easy to extract the json. Install it by:
    pip install praw
You need to do some setup first.
Create Reddit App
Go to: https://www.reddit.com/prefs/apps/
And select Create App:
Give it a name and select script as the app type (Reddit only allows password login for script apps). You have to choose a redirect uri (for some stupid reason, stupid because I’m building a bot, not a webapp, but whatever). I chose http://127.0.0.1
You will now get a client_id (red box below) and secret (blue box below). Note them down, but keep them secret.
Now, you need to update your praw.ini file to remember these settings. Otherwise, you’ll have to put them in your script and that’s dangerous (as others might see them).
This page describes how to change praw.ini files: https://praw.readthedocs.io/en/v4.0.0/getting_started/configuration/prawini.html
You will find the file in your Python install folder, under Lib\Site-Packages\praw\praw.ini
Update: As Bryce points out in the comments:
I don’t recommend modifying the package-level praw.ini as those changes will be overwritten every time the package is updated. Instead praw.ini should be placed in the directory that the program is run from (often the same directory as the file).
Other options are specified here: https://praw.readthedocs.io/en/latest/getting_started/configuration/prawini.html#praw-ini-files
I recommend following Bryce’s advice.
Add the values we noted down:
client_id and client_secret are what you wrote down. Username and password are your account details (and optional if you only want read only access).
There is a new field: user_agent.
Remember I said the Reddit rules say you have to have a specific user agent? I’m choosing the name PyEng Bot. The number at the end is the version. This is recommended, because once your code is out there, people might abuse it. If someone spams Reddit with your code, Reddit will ban that user agent.
In that case, you just move the version up. Not ideal, but you have to accept that your code may be misused by spammers.
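For reference, here is roughly what the bot1 section looks like, checked with Python's own configparser so you can confirm the layout before handing it to Praw. Every value is a placeholder, and the 0.1 version number on the user agent is just an example I picked for this sketch.

```python
import configparser

# Placeholder values only; substitute the ones from your own Reddit app.
PRAW_INI = """
[bot1]
client_id=YOUR_CLIENT_ID
client_secret=YOUR_CLIENT_SECRET
username=YOUR_REDDIT_USERNAME
password=YOUR_REDDIT_PASSWORD
user_agent=PyEng Bot 0.1
"""

parser = configparser.ConfigParser()
parser.read_string(PRAW_INI)

# The section name is what gets passed to praw.Reddit('bot1') later on.
settings = dict(parser["bot1"])
print(sorted(settings))
print(settings["user_agent"])
```

In a real setup the text between the triple quotes goes into a praw.ini file rather than into your script, which is the whole point of keeping the secrets out of the code.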
Let’s go over the code now. Download it at Github.
    import praw
We import praw.
    reddit = praw.Reddit('bot1')

    subreddit = reddit.subreddit("learnpython")
We create a Reddit instance using the values we saved under bot1.
Then we get the subreddit learnpython.
Now, if you look on the subreddit, you can see that there is a hot tab. This does not indicate the temperature there is high or that there are racy swimsuit models. It means the most popular posts. That’s what we are going to read now. The function to do so is hot().
    for submission in subreddit.hot(limit=5):
We get the top 5 hot submissions. At this stage, you can do this to see which functions are available (you can do that at any stage, or look at Praw’s documentation).
Seeing a snipped list:
    dir(submission)
    ['approve', 'approved_by', 'author', 'domain', 'downs', 'downvote', 'edit', 'edited', 'saved', 'score', 'secure_media', 'secure_media_embed', 'selftext', 'selftext_html', 'title', 'ups', 'upvote', 'url', 'user_reports', 'visited', 'vote']
I’ll point out a few important ones. title is the title, as it appears on Reddit’s main page. selftext is the optional text you can put on posts; most posts don’t have it. learnpython is unusual in that most posts do have text (usually the poster asking their question), which is why I’ve chosen it. score is the net score, upvotes minus downvotes (both of which are also available).
These are the three we will print:
    for submission in subreddit.hot(limit=5):
        print("Title: ", submission.title)
        print("Text: ", submission.selftext)
        print("Score: ", submission.score)
        print("---------------------------------\n")
That’s it. Run the script, and open Reddit in a browser at the same time. Check that you are getting the right results.
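If you want to check the printing logic without hitting Reddit at all, one option is to move the formatting into its own function and feed it a stand-in object. This is just a sketch: `format_submission` is a name I made up, and the fake post below only mimics the three attributes we use.

```python
from types import SimpleNamespace


def format_submission(submission):
    """Build the text printed for one post from its title, selftext and score."""
    return (
        f"Title: {submission.title}\n"
        f"Text: {submission.selftext}\n"
        f"Score: {submission.score}\n"
        "---------------------------------\n"
    )


# Stand-in object with made-up values; a real Praw submission exposes the
# same three attribute names.
fake_post = SimpleNamespace(title="I love Python", selftext="", score=42)
print(format_submission(fake_post))
```

Inside the real loop you would then write print(format_submission(submission)) instead of the four separate print calls.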
Next time
Next time we will look at how to send a reply to a post on Reddit.
How do I install praw on a Windows platform? I have Python27 installed, but no pip command…
From the Answer-Your-Own-Question Dept.
Instructions here: https://github.com/BurntSushi/nfldb/wiki/Python-&-pip-Windows-installation
I recommend people install Anaconda Python- it comes with 90% of what you need. https://www.continuum.io/downloads
You have to install it from the command line, not the python interpreter.
Change your directory to the Python Scripts directory using “cd c:\Python27\Scripts” then run “pip install praw”.
Running the script returns the following error:
File “bot_read.py”, line 11
print “Title: “, submission.title
^
SyntaxError: Missing parenthesis in call to ‘print’
try: print(“Title”)
You need to use Python 3 syntax, which is print(“Title”) as Shantnu indicated.
Thank you for this amazing post. I am enjoying learning Python because of such build yourself tutorials. Watching your screen show the output and not an error is an amazing confidence booster. Keep up the awesome work.
Thank you for the kind words 🙂
Hey man. This bot has helped me massively. I am so close to completing a project I have been working on. I have been trying to find out how to extract the comments from submissions and this has helped.
However when I edited the code I got this error:
posts_replied_to.append(submission.id)
AttributeError: ‘filter’ object has no attribute ‘append’
I can’t see why it is suddenly not working, when it worked before
you’re welcome.
The error most likely means posts_replied_to has not been initialised correctly. Put a break point right before that line and check what it’s been set to.
Thanks I read somewhere that it was a python 2 vs python 3 problem with the use of the term “append”
I solved it with this code instead:
with open(“posts_replied_to.txt”, “a”) as myfile:
myfile.write(submission.id + “\n”)
Thanks again
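For anyone else hitting the ‘filter’ object error: in Python 3, filter() returns a lazy iterator rather than a list (as it did in Python 2), and an iterator has no append(). A minimal sketch of the fix, assuming the IDs were read from a text file and cleaned up with filter(); the variable name comes from the traceback above and the sample IDs are made up:

```python
# Made-up file contents: one post ID per line, with a trailing newline.
raw = "abc123\ndef456\n"

posts_replied_to = raw.split("\n")

# In Python 3, filter() returns an iterator with no append() method,
# so wrap it in list() to get a real list back.
posts_replied_to = list(filter(None, posts_replied_to))

posts_replied_to.append("ghi789")
print(posts_replied_to)
```

Appending straight to the file, as in the comment above, also works; wrapping in list() just keeps the in-memory list usable too.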
Hello there, and thank-you for this awesome tutorial! Sadly though, when I run this script, it returns the following:
Traceback (most recent call last):
File “untitled0.py”, line 6, in
r = praw.Reddit(user_agent = user_agent)
File “/anaconda/lib/python3.5/site-packages/praw/reddit.py”, line 114, in __init__
raise ClientException(required_message.format(attribute))
praw.exceptions.ClientException: Required configuration setting ‘client_id’ missing.
This setting can be provided in a praw.ini file, as a keyword argument to the Reddit class constructor, or as an environment variable.
Any help would be greatly appreciated. Thank-you again!
Jeff, it seems praw have updated their library (to keep up with Reddit’s changes to OAuth).
I’ll have to update the article. Thanks for letting me know, I’ll get back to you soon.
Okay, I’ve updated the script. It should work now.
Thanks for the post! Everything worked for me except that apparently reddit.get_subreddit() has been deprecated in favor of reddit.subreddit()
hi. first of all, this is a great tutorial. but i’m having a problem. when i try to install praw i get this error: File “”, line 1 pip install praw SyntaxError: invalid syntax. i’ve been stuck on this for a while now and would really appreciate your help.
How are you installing it? Post the exact command and error message
the command is pip install praw and the error says File””, line 1 pip install praw ^ SyntaxError: invalid syntax
I am trying to install it through the python command line.
there should be a stdin in arrow brackets (like this: <) inside the parenthises
when I use python script.py, I get:
prawcore.exceptions.OAuthException: unauthorized_client error processing request (Only script apps may use password auth)
Did you create the Reddit app correctly? That error says Reddit doesn’t like your login.
Hey guys! Make sure the app (On reddit side) is a script app rather than a web app. This is what did it for me.
Great tip! Thanks.
Sent you a link to Gitter chat. Keep in mind I may not be there as the same time as you, but I will reply to messages.
Great tutorial! One suggestion, however, pertains to this line:
> You will find the file in your Python install folder, under Lib\Site-Packages\praw\praw.ini
I don’t recommend modifying the package-level praw.ini as those changes will be overwritten every time the package is updated. Instead praw.ini should be placed in the directory that the program is run from (often the same directory as the file).
Other options are specified here: https://praw.readthedocs.io/en/latest/getting_started/configuration/prawini.html#praw-ini-files
One other comment is that for PRAW4 the following line:
subreddit = r.get_subreddit(“learnpython”)
should now be:
subreddit = r.subreddit(“learnpython”)
Thanks! I will update the blog
Hi, for whatever reason I’m getting an SSL error when I try to run the script. I’ve currently got Python 2.7.13.
Here’s the error:
File “/home//.local/lib/python2.7/site-packages/prawcore/requestor.py”, line 48, in request
raise RequestException(exc, args, kwargs)
prawcore.exceptions.RequestException: error with request Can’t connect to HTTPS URL because the SSL module is not available.
I’ve tried to reinstall the SSL module using “pip install ssl”, but that doesn’t work because, according to the error message, it’s “already built in”.
I’ve been googling for hours but to no avail, would really appreciate some help with this. No one else seems to be in my situation…
Great tutorial otherwise!
-Hugh
Do you mind trying this with Python 3, which is what I used? I recommend Anaconda Python: https://www.continuum.io/downloads
PS: Google is lying, when you search for that error message you get unrelated results, which is why you were struggling.
I am getting error –> NoSectionError: No section: ‘bot1’
I have updated my praw.ini file as follows:
[bot1]
client_id: 6LuNIgq******Q
client_secret: xtbrrx********DGiv4GxFE
username: **********
password: **********
user_agent: python_bot 0.1
I got it solved…it was because I was running code in other directory…
But now I am getting following “ClientException: Required configuration setting ‘user_agent’ missing.”
You have set the user agent. Is there a typo, or are you not setting the user agent correctly? What value are you using?
I noticed you have…
reddit = praw.Reddit(‘bot1’)
subreddit = r.subreddit(“learnpython”)
…in the code example. Should be…
r = praw.Reddit(‘bot1’)
subreddit = r.subreddit(“learnpython”)
…or…
reddit = praw.Reddit(‘bot1’)
subreddit = reddit.subreddit(“learnpython”)
Thanks for pointing that out! I had updated the code, but didnt update the article (at least, not properly).
cheers!
Hello I have installed praw successfully in my C:/Python folder and updated the praw.ini file to include the bot info. When i run my script (located in a different dir, my cygwin dir ) I get the following error. Any ideas?
Traceback (most recent call last):
File “first_script.py”, line 2, in
import praw
ImportError: No module named praw
P.S. THANK YOU for this article, it is super helpful, I look forward to your reply
Is Python on the path? How did you install Python? I recommend Anaconda Python, as it comes with a lot of libraries, and also adds itself to the path for you.
Hey you must forgive me, I am very new to these environment variables. I can execute python files from my python directory but not in my cygwin directory, does that mean I need to add my home cygwin directory to the path? any clarification or a point in the right direction would be super helpful. I am sure I am overlooking something small.
Another P.S lol I am able to run my hello world python program from that directory. I went ahead and added the cygwin directory to the path as well, I am still experiencing the same issue.
Do i need to install python in the cygdrive/c/ folder?
Sorry for all of the replies/spam. It looks like my problem is rooted with cygwin, I have successfully ran it from the IDLE shell.
Hi, I’ve used your code but I keep getting a long error message that ends in “prawcore.exceptions.OAuthException: invalid_grant error processing request”. Apparently it’s something to do with the praw.ini file but I can’t make it work. Do you recognise the error and how can I fix it?
It usually happens if you didn’t copy/paste the app keys / password correctly. Double check them.
I feel stupid – I had the bot’s username rather than my Reddit account’s. I put that in and it worked fine, thank you
You say to put the praw.ini file in the program’s folder and not to use the file in the package directory. If I put the file in the new folder, don’t I have to point the script to use the new .ini file?
Nevermind, I found out that it checks for the praw.ini file in the script’s folder first and only uses the one in the package directory if it did not find it anywhere else.
Thanks for replying with the answer. I was wondering the same thing.
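Since several comments here run into “No section: ‘bot1’”, here is a small diagnostic sketch you can run from the same directory you launch your bot from. It only looks in one directory (the current one by default), which is the location Bryce’s comment describes; Praw’s documentation lists other places it can read settings from, and this helper does not check those. The function name `find_site` is made up for this example.

```python
import configparser
import os
import tempfile


def find_site(site_name, directory="."):
    """Report whether directory holds a praw.ini with a [site_name] section."""
    path = os.path.join(directory, "praw.ini")
    if not os.path.isfile(path):
        return "no praw.ini in " + os.path.abspath(directory)
    parser = configparser.ConfigParser()
    parser.read(path)
    if not parser.has_section(site_name):
        return "praw.ini found, but it has no [" + site_name + "] section"
    return "ok"


# Demonstration in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    print(find_site("bot1", tmp))
    with open(os.path.join(tmp, "praw.ini"), "w") as handle:
        handle.write("[bot1]\nuser_agent=PyEng Bot 0.1\n")
    print(find_site("bot1", tmp))
    print(find_site("bot2", tmp))
```

In your own project, call find_site("bot1") with no directory argument from the folder where you run the script.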
what username and password should i input in the praw.ini file?
Hey, great tutorial. I’ve already run the code once and it worked perfectly. But, I ran it again and now I’m getting the error of
Traceback (most recent call last):
File “C:/Users/HSI/PycharmProjects/untitled2/Reddit Bot.py”, line 21, in
if submission.id not in posts_replied_to:
NameError: name ‘posts_replied_to’ is not defined
what do I do?
Look at the other comments, someone has seen this problem before.
$ python deebuggers.py
Traceback (most recent call last):
File “deebuggers.py”, line 4, in
reddit = praw.Reddit(‘bot1’)
File “C:\Python27\lib\site-packages\praw\reddit.py”, line 129, in __init__
self.config = Config(config_section, **config_settings)
File “C:\Python27\lib\site-packages\praw\config.py”, line 66, in __init__
self.custom = dict(Config.CONFIG.items(site_name), **settings)
File “C:\Python27\lib\ConfigParser.py”, line 347, in items
raise NoSectionError(section)
ConfigParser.NoSectionError: No section: ‘bot1’
You provided the name of a praw.ini configuration which does not exist.
For help with creating a Reddit instance, visit
https://praw.readthedocs.io/en/latest/code_overview/reddit_instance.html
For help on configuring PRAW, visit
https://praw.readthedocs.io/en/latest/getting_started/configuration.html
CAN ANYONE TELL ME WHAT’S WRONG WITH THIS?
Your error message tells you exactly what’s wrong. Read it carefully.
Thanks for this tutorial. Your plain English explanation of both the python code AND the reddit API are top notch, man.
[my bot is “ARGbot” in the “I love python” posts]
Cool, thanks!
when i run the script, it opens terminal and then closes immediately, is this supposed to happen?
How r u running the script? Run it from the command line:
python script.py
The part2 worked fine on pythonforengineers
but when I changed to another subreddit it didn’t work
ie: subreddit = reddit.subreddit(‘SEO’) # Also tried OnlineTrafficTeam but didn’t work !
Can you please help ?
Thanks
I removed the condition
# If we haven’t replied to this post before
if submission.id not in posts_replied_to:
But it didn’t seem to work either!
You have to ask permission from other sub Reddit! Many ban bots; it’s like spamming someone.
Reddit has updated its website to a new look. If you come across a link above that does not work, then you will have to change the url.
Here is an example:
Above we have the link http://www.reddit.com/dev/api
Yet, it says not found. If we change the link to http://old.reddit.com/dev/api then the link should work.
In short, if a reddit link does not work change the “www” to “old”, so the link looks like “old.reddit.com”
That’s a great point, thanks!
Updated the article with this.
Your comments about the praw.ini ended a nightmare day for me, thank you.
Have a look at my reddit image grabber with GUI full source if anyone wants it.
https://stevepython.wordpress.com/2018/08/17/trouble-with-my-exes