This project implements a sentiment analysis model that classifies tweets by sentiment using a Random Forest classifier. The model leverages Natural Language Processing (NLP) techniques to preprocess text data and extract meaningful features for classification.
To set up the project, clone the repository and install the required packages:

```bash
git clone https://github.com/akshargrover/tweet-sentiment-analysis
cd tweet-sentiment-analysis
pip install -r requirements.txt
```

To run the sentiment analysis model, open the notebook in Jupyter and run all cells:

```bash
jupyter notebook twitter-sentiment-analysis.ipynb
```

This will train the model on the training dataset and evaluate it on the test dataset.
The dataset used for training and testing consists of tweets labeled by sentiment. The data is split into training and test sets to evaluate the model's performance.
- Training Data: Contains tweets used to train the model, located in the `twitter_training.csv` file.
- Test Data: Contains tweets used to evaluate the model's accuracy, located in the `twitter_validation.csv` file.
- The dataset includes a variety of tweets collected from Twitter, ensuring a diverse representation of sentiments.
- Each tweet is labeled as positive, negative, neutral, or irrelevant, allowing the model to learn from all four classes.
- The data is preprocessed to remove noise and irrelevant information, enhancing the model's performance.
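A quick way to inspect the data is to load it with pandas. The snippet below is a minimal sketch; the column names (`id`, `entity`, `sentiment`, `text`) and the header-less CSV layout are assumptions about the file format, illustrated here with inline sample rows rather than the real file:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample rows mirroring the assumed layout of
# twitter_training.csv: id, entity/topic, sentiment label, tweet text,
# with no header row.
sample = StringIO(
    "101,AcmeGame,Positive,I love this game\n"
    "102,AcmeGame,Negative,This update ruined everything\n"
)
columns = ["id", "entity", "sentiment", "text"]  # assumed column names
df = pd.read_csv(sample, names=columns)

# Class distribution is worth checking before training, since the model
# uses class_weight='balanced' to compensate for imbalance.
print(df["sentiment"].value_counts())
```

To load the real file, replace `sample` with the path `twitter_training.csv`.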
The model is built using the following steps:
- Data Preprocessing:
  - Text normalization (lowercasing, removing punctuation, etc.)
  - Tokenization and lemmatization using spaCy
  - Vectorization using TF-IDF to convert text into numerical features
- Model Selection:
  - A Random Forest classifier is used for its robustness and ability to handle high-dimensional data.
- Hyperparameters:
  - `n_estimators`: 200 (number of trees in the forest)
  - `max_depth`: 15 (maximum depth of the trees)
  - `class_weight`: 'balanced' (to handle class imbalance)
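The steps above can be sketched as a scikit-learn pipeline. This is an illustrative sketch, not the project's exact code: it uses a tiny hard-coded corpus in place of the real tweets, and the spaCy lemmatization step is omitted for brevity (the TF-IDF vectorizer's default lowercasing tokenizer stands in for it). The Random Forest hyperparameters match those listed above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Toy corpus standing in for the real tweet data (illustrative only).
texts = [
    "i love this game",
    "great update so much fun",
    "terrible update hate it",
    "worst patch ever",
    "love the new map",
    "hate the constant lag",
]
labels = ["Positive", "Positive", "Negative", "Negative", "Positive", "Negative"]

# TF-IDF features feeding a Random Forest with the hyperparameters
# listed in this README; class_weight='balanced' reweights classes
# inversely to their frequency.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("rf", RandomForestClassifier(
        n_estimators=200,
        max_depth=15,
        class_weight="balanced",
        random_state=42,
    )),
])
clf.fit(texts, labels)
print(clf.predict(["love the update"]))
```

A `Pipeline` keeps vectorization and classification in one object, so the same TF-IDF vocabulary fitted on the training tweets is reused at prediction time.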
The model's performance is evaluated using accuracy, precision, recall, and F1-score metrics. The results are printed in the console after running the model.
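The metrics named above can be printed with scikit-learn's built-in helpers. The labels below are hypothetical stand-ins for real model output, used only to show the shape of the report:

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true and predicted labels (illustrative only).
y_true = ["Positive", "Negative", "Negative", "Positive"]
y_pred = ["Positive", "Negative", "Positive", "Positive"]

# classification_report prints per-class precision, recall, and F1-score.
print("accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```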
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License - see the LICENSE file for details.