When I first began experimenting with sentiment analysis and web scraping, I wanted to try something useful and practical, so I chose to examine Reddit discussions. In this article, I'll explain how I configured a Python script to retrieve posts from Reddit and evaluate their sentiment.

Sentiment analysis is a natural language processing (NLP) technique used to determine the emotional tone behind a piece of text. We'll use it to classify text as positive, negative, or neutral, making it useful for analyzing textual content like customer reviews, social media comments, and even Reddit discussions.

For this project, I used VADER (Valence Aware Dictionary and sEntiment Reasoner). It is a sentiment analysis tool specifically designed for social media content. Unlike traditional machine learning models, VADER doesn’t require training data—it uses a pre-built lexicon of words and their associated sentiment scores. It’s great for handling informal language, emojis, and even sarcasm to some extent. Of course, you have the option of training your own model, but that comes with its own set of challenges which might be better suited for another article.
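To make the lexicon idea concrete, here's a toy sketch of lexicon-based scoring. This is a simplified illustration I wrote for this article, not VADER itself — VADER's real lexicon has thousands of entries and also handles negation, intensifiers, punctuation, and emojis:

```python
# Toy lexicon mapping words to sentiment scores (illustrative values only)
LEXICON = {"love": 3.2, "great": 3.1, "meh": -0.9, "terrible": -2.1, "hate": -2.7}

def toy_score(text):
    """Average the lexicon scores of the words that appear in the text."""
    words = text.lower().split()
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

print(toy_score("I love this, it is great"))   # positive score
print(toy_score("terrible movie, I hate it"))  # negative score
```

The takeaway is that no model training happens: the sentiment comes straight from a dictionary lookup, which is why VADER works out of the box.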

Configure your environment

Open your terminal (or Command Prompt), create a new folder for your project, and navigate to it.

mkdir reddit-sentiment && cd reddit-sentiment

Create a virtual environment (we’ll call it venv).

python -m venv venv

Activate the virtual environment. On Windows:

venv\Scripts\activate

On macOS or Linux:

source venv/bin/activate

Install the required dependencies

For this project, we'll need the following packages:

  • PRAW - Python Reddit API Wrapper - this will tap into the Reddit API and pull the comments from a given Reddit thread.
  • NLTK - Natural Language Toolkit - As mentioned earlier, we'll use the VADER sentiment analysis capabilities to assess and classify each comment.
  • python-dotenv (optional, but recommended) - Loads the environment variables from a .env file. You do not necessarily need to do this, but it is good practice to not hard-code important credentials into your code.

The command below will install all three at once.

pip install praw nltk python-dotenv

Get Reddit API Credentials

To access Reddit data using PRAW, you'll need a Reddit app - you can create one from your account's app preferences page at reddit.com/prefs/apps. Log into your Reddit account, navigate to that page, and create a new "script" type application, filling in the required details (app name, description, about URL, redirect URL - for the redirect URL, a placeholder such as http://localhost:8080 works, since this script won't use it). Once you're done, you'll receive your Client ID and Client Secret.

Create a .env file in your project's root directory with the following entries, replacing the values with the details of the app you just created (read-only access like ours doesn't require your Reddit username or password):

REDDIT_CLIENT_ID="your_client_id"
REDDIT_CLIENT_SECRET="your_client_secret"
REDDIT_USER_AGENT="your_app_name"

Data Capture

Create a new file called scraper.py (or any name you'd like to give it) and add the following code:

import praw
import nltk
import os
from nltk.sentiment import SentimentIntensityAnalyzer
from dotenv import load_dotenv

# Load .env file
load_dotenv()

# Access environment variables
CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET")
USER_AGENT = os.getenv("REDDIT_USER_AGENT")

# Download VADER lexicon
nltk.download("vader_lexicon")

# Set up Reddit API credentials
reddit = praw.Reddit(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    user_agent=USER_AGENT,
)

# Initialize sentiment analyzer
sia = SentimentIntensityAnalyzer()

The first few lines import the required libraries. We then load the environment variables, download the VADER lexicon, and initialize PRAW with the imported credentials.

In the next section, we'll set the URL of the Reddit thread we want to analyze and look at the sentiment of the thread's title and selftext (the body of the original post):

# Define the Reddit post URL
post_url = "https://www.reddit.com/r/sub/comments/id/title/"

# Extract post ID from URL
post_id = post_url.split("/")[-3]

# Get the post
submission = reddit.submission(id=post_id)

# Analyze post title and text
post_text = submission.title + " " + submission.selftext
sentiment = sia.polarity_scores(post_text)
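The post ID extraction relies on the URL's structure, so it's worth seeing exactly which segment `split("/")` picks out. A quick check with a made-up URL (abc123 is a placeholder ID here):

```python
url = "https://www.reddit.com/r/learnpython/comments/abc123/some_title/"
parts = url.split("/")
# With the trailing slash, the ID sits at index -3;
# without the trailing slash it would be at index -2 instead.
print(parts[-3])  # abc123
```

If you'd rather not depend on the URL shape at all, PRAW can also resolve the URL for you via `reddit.submission(url=post_url)`.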

Next, let's get the sentiment for each comment in the thread. You can adjust the 50 value in the for loop to determine the maximum number of comments from the thread that will be included in the analysis:

# Analyze comments and collect compound scores
submission.comments.replace_more(limit=0)
compound_scores = [sentiment["compound"]]  # start with the post's own compound score
for comment in submission.comments.list()[:50]:  # Analyze first 50 comments
    comment_sentiment = sia.polarity_scores(comment.body)
    print(f"\nComment: {comment.body}")
    print("Sentiment:", comment_sentiment)
    compound_scores.append(comment_sentiment["compound"])

# Get an overall sentiment score by averaging the compound scores of the post and comments
overall_sentiment = {
    "compound": sum(compound_scores) / len(compound_scores)
}

Finally, you can print the overall results along with a simple statement:

# Print sentiment analysis
print("\n\nTitle and selftext (body) sentiment (OP) score: ", sentiment)

# Average compound score across the post and all analyzed comments
print("Overall average sentiment: ", overall_sentiment)

# Summarize overall sentiment
if overall_sentiment["compound"] > 0.05:
    print("The overall sentiment of the post and comments is positive.")
elif overall_sentiment["compound"] < -0.05:
    print("The overall sentiment of the post and comments is negative.")
else:
    print("The overall sentiment of the post and comments is neutral.")

The 0.05 and -0.05 cutoffs are the thresholds recommended by VADER's authors for classifying a compound score as positive, negative, or neutral.
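If you expect to reuse this classification elsewhere, the thresholds can be wrapped in a small helper (the function and parameter names here are my own, not part of VADER):

```python
def classify(compound, pos_threshold=0.05, neg_threshold=-0.05):
    """Map a VADER compound score to a sentiment label."""
    if compound > pos_threshold:
        return "positive"
    if compound < neg_threshold:
        return "negative"
    return "neutral"

print(classify(0.42))   # positive
print(classify(-0.3))   # negative
print(classify(0.0))    # neutral
```

Exposing the thresholds as parameters also makes it easy to experiment with stricter cutoffs later.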

Conclusion

This is a fairly simple implementation of sentiment analysis. I'll explore taking it even further by:

  • Modifying the script to look at more than just one thread - perhaps using it to analyze the sentiment of an entire subreddit.
  • Accounting for changes in sentiment over time when analyzing a given topic (based on keywords or content) or subreddit. This would require capturing and retaining large amounts of data.
  • Expanding this implementation beyond Reddit comments. Twitter (X) posts, YouTube comments, Amazon reviews, and more could all be analyzed by platform, or analyzed collectively to see how sentiment compares across platforms.

Here's the full code:

import praw
import nltk
import os
from nltk.sentiment import SentimentIntensityAnalyzer
from dotenv import load_dotenv

# Load .env file
load_dotenv()

# Access environment variables
CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET")
USER_AGENT = os.getenv("REDDIT_USER_AGENT")

# Download VADER lexicon
nltk.download("vader_lexicon")

# Set up Reddit API credentials
reddit = praw.Reddit(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    user_agent=USER_AGENT,
)

# Initialize sentiment analyzer
sia = SentimentIntensityAnalyzer()

# Define the Reddit post URL
post_url = "https://www.reddit.com/r/sub/comments/id/title/"

# Extract post ID from URL
post_id = post_url.split("/")[-3]

# Get the post
submission = reddit.submission(id=post_id)

# Analyze post title and text
post_text = submission.title + " " + submission.selftext
sentiment = sia.polarity_scores(post_text)

# Analyze comments and collect compound scores
submission.comments.replace_more(limit=0)
compound_scores = [sentiment["compound"]]  # start with the post's own compound score
for comment in submission.comments.list()[:50]:  # Analyze first 50 comments
    comment_sentiment = sia.polarity_scores(comment.body)
    print(f"\nComment: {comment.body}")
    print("Sentiment:", comment_sentiment)
    compound_scores.append(comment_sentiment["compound"])

# Get an overall sentiment score by averaging the compound scores of the post and comments
overall_sentiment = {
    "compound": sum(compound_scores) / len(compound_scores)
}

# Print sentiment analysis
print("\n\nTitle and selftext (body) sentiment (OP) score: ", sentiment)

# Average compound score across the post and all analyzed comments
print("Overall average sentiment: ", overall_sentiment)

# Summarize overall sentiment
if overall_sentiment["compound"] > 0.05:
    print("The overall sentiment of the post and comments is positive.")
elif overall_sentiment["compound"] < -0.05:
    print("The overall sentiment of the post and comments is negative.")
else:
    print("The overall sentiment of the post and comments is neutral.")