Where tweets meet technology! 🚀
We’re here to decode the world of Twitter, one tweet at a time! By harnessing real-time streaming, big data processing, and intuitive visualization, we bring Twitter analytics to life. Whether it’s hashtags, sentiment, or trends, we’ve got it all mapped—literally. 🗺️✨
We’ve introduced an AI-powered assistant you can chat with directly! 🤖🗣️
- Ask it how the website works
- Ask about tweets on the map between your current location and another, or between any two locations
- It's like having a smart guide by your side—right inside the platform!
Our TweetStream Analytics Pipeline has a touch of magic and a dash of science:
- 💬 Real-time Tweet Ingestion: Capturing the pulse of the world as it happens.
- 🔍 Intelligent Processing: Extracting meaning, hashtags, and sentiment.
- 🗂️ Smart Storage: Keeping data searchable and ready for exploration.
- 🌟 Stunning Visualizations: Beautiful maps, trends, and sentiment scores at your fingertips.
| 📂 Repository | 📝 Description |
|---|---|
| ElasticSearchManager | Indexes tweets for lightning-fast queries and scalable storage. |
| StreamProducer | Streams tweets live from Twitter or simulates them for testing. |
| StreamConsumer | Enriches tweets with hashtag extraction and sentiment analysis for actionable insights. |
| Website | A stunning web app for interactive maps, trends, and sentiment gauges. |
| APIServices | RESTful APIs powering the frontend with meaningful data. |
| DocsAndReadme | Your one-stop shop for documentation, diagrams, and task tracking. |
The Twitter Stream Processing Pipeline is designed to handle real-time streaming tweets, process and transform the data for efficient querying, and visualize results in an interactive web application. This pipeline integrates multiple components, leveraging scalable tools and technologies like Apache Kafka, Elasticsearch, Apache Spark, and React to ensure seamless ingestion, processing, storage, and visualization of Twitter data.
- Collects a continuous stream of tweets using the Twitter API or a simulated tweet generator.
- Maintains the incoming tweet stream in an Apache Kafka topic for intermediate storage and scalability.
- Prepares tweet data for efficient searching over text, time, and space.
- Extracts hashtags from tweet text and stores them as a nested array.
- Uses Spark NLP or TextBlob to classify tweet sentiments as "Positive," "Negative," or "Neutral."
- Stores processed tweets and their metadata in an Elasticsearch database.
- Designed with a schema to support querying by text, time, hashtags, and geo-coordinates.
- Provides an input field where users can search for tweets by keyword.
- Visualizes results with:
- Map View: Displays tweets containing the keyword on an interactive map using Leaflet, based on geo-coordinates (longitude/latitude).
- Trend Diagram: Shows the temporal distribution of tweets (hourly and daily aggregation).
- Sentiment Gauge: Reflects the average sentiment score for tweets over a specified period of time.
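The hashtag-extraction step described above can be sketched as a small pure function. This is an illustrative regex-based version, not the project's actual Spark job; the nested `{"text": ...}` output shape follows the Elasticsearch mapping used in the setup section.

```python
import re
from typing import Dict, List

# Illustrative hashtag extractor; the real pipeline performs this
# transformation inside a Spark job, but the core logic is the same.
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(text: str) -> List[Dict[str, str]]:
    """Return hashtags as a nested array of {'text': ...} objects,
    matching the `hashtags` field of the Elasticsearch index."""
    return [{"text": tag} for tag in HASHTAG_RE.findall(text)]
```

For example, `extract_hashtags("Loving the #sunset in #Paris")` yields `[{"text": "sunset"}, {"text": "Paris"}]`.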
- Reads geolocated tweet data from JSON files using Apache Spark.
- Sends data to a Kafka topic (`tweets`) in batches (default size: 1000 records).
- Simulates real-time streaming with a configurable delay (default: 30 seconds).
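The producer's batch-and-delay behaviour can be sketched as below. This is a simplified stand-in: the actual producer reads JSON with Apache Spark and publishes via a Kafka client, both of which are replaced here by a generic `send` callable.

```python
import time
from typing import Callable, Iterable, Iterator, List

def batched(records: Iterable[dict], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Group records into fixed-size batches (the last batch may be smaller)."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def stream_to_kafka(records: Iterable[dict],
                    send: Callable[[List[dict]], None],
                    batch_size: int = 1000,
                    delay_seconds: float = 30) -> None:
    """Push each batch through `send` (a placeholder for the Kafka
    producer call), then sleep to simulate real-time arrival."""
    for batch in batched(records, batch_size):
        send(batch)
        time.sleep(delay_seconds)
```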
- Consumes tweet data from the Kafka `tweets` topic.
- Filters geolocated tweets and extracts relevant fields (text, hashtags, time, coordinates).
- Sends filtered data to the Kafka topic `processedTweets`.
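The filtering step can be sketched as a pure function over one tweet record. The field names assume a Twitter-API-style payload shape (`coordinates`, `entities.hashtags`, `created_at`) and are an assumption here, not a confirmed detail of the project's consumer.

```python
from typing import Optional

def filter_geolocated(tweet: dict) -> Optional[dict]:
    """Keep only tweets that carry coordinates, projecting the fields
    the pipeline forwards to the `processedTweets` topic. Returns None
    for tweets without geolocation so they can be dropped upstream."""
    coords = tweet.get("coordinates")
    if not coords or not coords.get("coordinates"):
        return None
    return {
        "text": tweet.get("text", ""),
        "hashtags": tweet.get("entities", {}).get("hashtags", []),
        "created_at": tweet.get("created_at"),
        "coordinates": coords["coordinates"],  # [longitude, latitude]
    }
```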
- Consumes data from the Kafka `processedTweets` topic.
- Performs sentiment analysis using Spark NLP.
- Classifies tweets as "Positive," "Negative," or "Neutral."
- Sends enriched data to Elasticsearch for visualization.
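The three-way classification can be sketched as a polarity threshold over a score in [-1, 1] (TextBlob-style polarity). The deployed consumer uses Spark NLP, and the ±0.1 neutral band here is illustrative, not the project's actual cutoff.

```python
def classify_sentiment(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to one of the three labels
    stored in Elasticsearch. The neutral band is an assumption."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"
```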
- Stores and indexes processed tweet data for querying and analytics.
- Configures Elasticsearch index with fields like text, hashtags, sentiment, coordinates, and timestamp.
- Built with FastAPI to query Elasticsearch for tweet data and sentiment insights.
- Features include:
- Sentiment scores and geo-based trends.
- Hashtag ranking and temporal analysis.
- Custom date range filtering.
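One way the API layer might translate a keyword search with an optional date range into an Elasticsearch query body (the field names follow the index mapping in the setup section; the function itself is a hypothetical helper, not code from APIServices):

```python
from typing import Optional

def build_tweet_query(keyword: str,
                      start: Optional[str] = None,
                      end: Optional[str] = None) -> dict:
    """Build an Elasticsearch bool query matching `keyword` in the tweet
    text, optionally restricted to a created_at date range."""
    must = [{"match": {"text": keyword}}]
    if start or end:
        date_range = {}
        if start:
            date_range["gte"] = start
        if end:
            date_range["lte"] = end
        must.append({"range": {"created_at": date_range}})
    return {"query": {"bool": {"must": must}}}
```

The resulting dict can be passed directly as the body of an Elasticsearch search request.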
- React.js-based frontend for visualizing tweet data.
- Features include:
- Latest tweets and sentiment distribution.
- Trending hashtags and topics.
- Sentiment by location (interactive maps).
- Daily/hourly sentiment trends.
To run the system, ensure the following tools are installed:
- Apache Kafka
- Apache Spark
- Elasticsearch (v7.x or higher)
- Python 3.8+
- React.js
- Docker (optional, for Elasticsearch setup)
Start a Kafka server and create the required topics: `tweets` and `processedTweets`.
Use Apache Spark to read tweet data and send it to the Kafka `tweets` topic.
- Deploy the Filter Consumer to process tweets and forward them to `processedTweets`.
- Deploy the Sentiment Analysis Consumer to classify sentiments and store results in Elasticsearch.
Use Docker for a quick setup:
```shell
docker run -p 9200:9200 -d docker.elastic.co/elasticsearch/elasticsearch:7.9.3
```

Create an Elasticsearch index for storing tweet data:

```shell
curl -X PUT "localhost:9200/tweet" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "created_at": {"type": "date", "format": "EEE MMM dd HH:mm:ss Z yyyy"},
      "text": {"type": "text", "analyzer": "standard"},
      "hashtags": {"type": "nested", "properties": {"text": {"type": "keyword"}}},
      "coordinates": {"type": "geo_point"},
      "sentiment": {"type": "float"}
    }
  }
}
'
```

Start the FastAPI application to query tweet data from Elasticsearch.
Deploy the React.js frontend to visualize tweets, sentiments, and trends.


