These APIs store Twitter streaming data and retrieve it based on applied filters. There are three APIs: API 1 triggers the Twitter stream, API 2 filters/searches the stored tweets, and API 3 exports the filtered data as CSV.
Technologies used:
- Python/Flask framework
- MongoDB (hosted on mLab)
- Twitter Streaming API
Contents:
- Installation Instructions
- API 1 - API to trigger Twitter Stream
- API 2 - API to filter/search stored tweets
- API 3 - API to export filtered data in CSV
Installation instructions:
- Clone the project: `git clone https://github.com/gauravkulkarni96/twitter-streaming-filter-api.git`
- cd to the project folder: `cd twitter-streaming-filter-api`
- Create a virtual environment: `virtualenv venv`
- Activate the virtual environment: `source venv/bin/activate`
- Install the requirements: `pip install -r requirements.txt`
- Run the server: `python run.py`
API 1 triggers Twitter streaming and stores a curated version of the data returned by the Twitter Streaming API. Streaming runs according to the given parameters.
API - `http://127.0.0.1:5000/stream/<keyword>?[parameters]`
(methods supported - GET, POST)
Here `<keyword>` is any keyword for which streaming should be performed, and `[parameters]` are as follows:
| Parameter | Action |
|---|---|
| count | streaming runs until the given number of tweets has been received |
| time | streaming runs for the given time (in seconds) |
Examples:
http://127.0.0.1:5000/stream/modi?count=5 (runs till 5 tweets are fetched)
http://127.0.0.1:5000/stream/modi?time=10 (runs for 10 seconds)
http://127.0.0.1:5000/stream/modi?count=5&time=10 (stops streaming at whichever comes first: 5 tweets or 10 seconds)
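
The example URLs above can also be assembled in code. The following is a minimal sketch, not part of the project: `build_stream_url` is a hypothetical helper name, and the check for missing parameters is an assumption based on the "No Parameters Passed" failure shown in the response examples.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:5000"  # local development server used in the examples


def build_stream_url(keyword, count=None, time=None):
    """Return the /stream URL for a keyword with optional count/time limits."""
    params = {}
    if count is not None:
        params["count"] = count
    if time is not None:
        params["time"] = time
    if not params:
        # Assumption: the API rejects calls without count or time,
        # as suggested by the "No Parameters Passed" failure example.
        raise ValueError("pass count, time, or both")
    return f"{BASE_URL}/stream/{quote(keyword, safe='')}?{urlencode(params)}"


print(build_stream_url("modi", count=5, time=10))
# http://127.0.0.1:5000/stream/modi?count=5&time=10
```

The resulting URL can then be requested with any HTTP client, using GET or POST.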
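The response contains the following parameters:
| Parameter | Meaning |
|---|---|
| code | 0 (successful) / 1 (failed) |
| message | "Successful" on success, otherwise an error message describing why the request failed |
| status | success/failed |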
Examples:
- Successful response
{
"code": "0",
"message": "Successful",
"status": "success"
}
- Failed Response
{
"code": "1",
"message": "No Parameters Passed",
"status": "failed"
}
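
A client could check these responses as sketched below. `stream_succeeded` is a hypothetical helper, not part of the project; it relies only on the `code` and `message` fields shown in the examples above.

```python
import json


def stream_succeeded(body):
    """Return (ok, message) for a JSON response body from the stream API."""
    data = json.loads(body)
    # "code" is a string in the examples ("0" or "1"), so compare as text.
    ok = str(data.get("code")) == "0"
    return ok, data.get("message", "")


failed = '{"code": "1", "message": "No Parameters Passed", "status": "failed"}'
print(stream_succeeded(failed))
# (False, 'No Parameters Passed')
```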
API 2 fetches the data stored by API 1 based on the filters and search keywords provided, and sorts the results as requested.
API - `http://127.0.0.1:5000/search?[filters][sort][page]`
(methods supported - GET, POST)
The API call is made up of the following elements: filters, sort, and page.
Filters follow the format `<filter>=<value>`, where `<filter>` is one or more of the filters listed below and `<value>` must be in the specified format.
The following filters can be applied:
| Filter | Meaning | Value format (see the tables below) | Example |
|---|---|---|---|
| hashtag | filter tweets by hashtags in tweet (case insensitive) | `<hashtag>` | hashtag=AbKiBaarModiSarkar |
| keyword | filter tweets by the keyword that was used in API 1 for streaming | `<keyword>` | keyword=modi |
| name | filter tweets by name/screen_name of users (case insensitive) | `<textFilterType>-<filterValue>` | name=co-gaurav |
| location | location of the user posting the tweet | `<location>` | location=delhi |
| text | filter tweets by content (case insensitive) | `<textFilterType>-<filterValue>` | text=sw-gaurav |
| type | filter tweets as retweets/quote/original tweets | original/retweet/quote | type=retweet |
| mention | filter tweets by user mentions (case insensitive) | `<textFilterType>-<filterValue>` | mention=em-gauravkul96 |
| followers | number of followers of the user | `<numFilterType><filterValue>` | followers=lt100 |
| rtcount (mostly 0 in streaming) | retweet count of tweet | `<numFilterType><filterValue>` | rtcount=gt100 |
| favcount (mostly 0 in streaming) | favourite count of tweet | `<numFilterType><filterValue>` | favcount=lt100 |
| lang | language of tweet | any specific language in BCP 47 format | lang=en |
| datestart | tweets posted on or after a specific date | dd-mm-yyyy | datestart=10-01-2018 |
| dateend | tweets posted on or before a specific date | dd-mm-yyyy | dateend=28-02-2018 |
In the format `<textFilterType>-<filterValue>`, `<filterValue>` can be any string and `<textFilterType>` can be one of:
| textFilterType | Meaning |
|---|---|
| sw | starts with |
| ew | ends with |
| co | contains |
| em | exact match |
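
To make the meaning of the four text filter types concrete, here is a small sketch that applies a filter value such as `co-gaurav` to a candidate string. It only illustrates the intended matching behaviour; `matches_text` is a hypothetical name, and the project's actual implementation (which queries MongoDB) may differ.

```python
def matches_text(filter_value, candidate):
    """Apply a '<textFilterType>-<filterValue>' filter to a candidate string."""
    # Split on the first hyphen only, so the value itself may contain hyphens.
    filter_type, _, needle = filter_value.partition("-")
    needle, candidate = needle.lower(), candidate.lower()  # case insensitive
    checks = {
        "sw": candidate.startswith,          # starts with
        "ew": candidate.endswith,            # ends with
        "co": lambda s: s in candidate,      # contains
        "em": lambda s: s == candidate,      # exact match
    }
    if filter_type not in checks:
        raise ValueError(f"unknown textFilterType: {filter_type!r}")
    return checks[filter_type](needle)


print(matches_text("co-gaurav", "Gaurav Kulkarni"))  # True
print(matches_text("sw-gaurav", "Hello Gaurav"))     # False
```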
In the format `<numFilterType><filterValue>`, `<filterValue>` can be any number and `<numFilterType>` can be one of:
| numFilterType | Meaning |
|---|---|
| gt | greater than |
| lt | less than |
| eq | equal to |
| ge | greater than or equal to |
| le | less than or equal to |
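
The numeric filter types map naturally onto comparison operators. The sketch below shows one way to interpret a value such as `lt100`; `matches_number` is a hypothetical helper, and it assumes the filter value is a non-negative integer. It is not the project's own code.

```python
import operator
import re

# numFilterType -> comparison, following the table above
NUM_OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "eq": operator.eq,
    "ge": operator.ge,
    "le": operator.le,
}


def matches_number(filter_value, candidate):
    """Apply a '<numFilterType><filterValue>' filter such as 'lt100'."""
    match = re.fullmatch(r"(gt|lt|eq|ge|le)(\d+)", filter_value)
    if match is None:
        raise ValueError(f"invalid numeric filter: {filter_value!r}")
    filter_type, number = match.group(1), int(match.group(2))
    return NUM_OPERATORS[filter_type](candidate, number)


print(matches_number("lt100", 42))   # True: 42 < 100
print(matches_number("gt100", 100))  # False: 100 is not greater than 100
```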
By default, results are sorted by tweet date in descending order. Other sort orders can be requested by passing the sort parameter in the format `<sortField>-<order>`,
where `<order>` can be
| order | Meaning |
|---|---|
| asc | Ascending order |
| dsc | Descending order |
and `<sortField>` can be
| sortField | Meaning | Example |
|---|---|---|
| name | sort by name | sort=name-asc |
| sname | sort by screen name | sort=sname-dsc |
| text | sort by tweet text | sort=text-asc |
| fav | sort by favourites count | sort=fav-asc |
| ret | sort by retweet count | sort=ret-dsc |
| followers | sort by follower count of user | sort=followers-asc |
| date | sort by date | sort=date-asc |
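
A `sort` value can be validated before it is sent. The following sketch splits a value such as `sname-dsc` into its field and order, falling back to the documented default of date descending. `parse_sort` is a hypothetical helper and not the project's own parser.

```python
SORT_FIELDS = {"name", "sname", "text", "fav", "ret", "followers", "date"}
ORDERS = {"asc", "dsc"}


def parse_sort(value="date-dsc"):
    """Split '<sortField>-<order>' into a (field, order) tuple."""
    field, _, order = value.rpartition("-")
    if field not in SORT_FIELDS or order not in ORDERS:
        raise ValueError(f"invalid sort value: {value!r}")
    return field, order


print(parse_sort("sname-dsc"))  # ('sname', 'dsc')
print(parse_sort())             # ('date', 'dsc'), the documented default
```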
The API is paginated and returns 10 results per call. The page number can be specified in the API call as page=[pageNo], for example page=5. If the page number is not specified, page 1 is returned.
Examples
http://127.0.0.1:5000/search?favcount=lt1000&lang=en&datestart=10-01-2018&sort=date-asc
http://127.0.0.1:5000/search?name=co-gaurav&datestart=10-01-2018&dateend=15-01-2018&sort=text-asc&page=2
http://127.0.0.1:5000/search?rtcount=gt100
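
Search URLs like the ones above can be built from a dictionary of filters. This is a minimal sketch with a hypothetical helper name, `build_search_url`; it also converts `date` objects to the dd-mm-yyyy format expected by `datestart` and `dateend`.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:5000"  # local development server used in the examples


def build_search_url(filters, sort=None, page=None):
    """Return the /search URL for the given filters, sort, and page."""
    params = {}
    for name, value in filters.items():
        if isinstance(value, date):
            # datestart/dateend expect dd-mm-yyyy
            value = value.strftime("%d-%m-%Y")
        params[name] = value
    if sort is not None:
        params["sort"] = sort
    if page is not None:
        params["page"] = page
    return f"{BASE_URL}/search?{urlencode(params)}"


print(build_search_url(
    {"favcount": "lt1000", "lang": "en", "datestart": date(2018, 1, 10)},
    sort="date-asc",
))
# http://127.0.0.1:5000/search?favcount=lt1000&lang=en&datestart=10-01-2018&sort=date-asc
```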
The response contains the following parameters:
| Parameter | Meaning |
|---|---|
| page | current page number |
| next_page | next page number (1 if the current page is the last page) |
| last_page | boolean: true if the current page is the last page, otherwise false |
| result | list of tweet objects that match the given filters |
| result_count | total number of matching results |
Examples
{
"next_page": 1,
"last_page": true,
"result": [{"lang": "en", "_id": "5a83f5063fe5103329f1f788", "text": "RT @LalitKModi: Thank you #RichardMadley \ud83d\ude4f\ud83c\udffb most appreciative of your kind words https://t.co/erkxF1q46i", "created_at": "2018-02-14 08:36:16+00:00", "hashtags": ["RichardMadley"], "retweet_count": 0, "user_mentions": ["LalitKModi"], "is_quote_status": false, "user": {"screen_name": "LaraeGalang3", "location": null, "_id": "5a83f5053fe5103329f1f786", "id": 963690891935473666, "name": "Larae Galang"}, "id": 963693359742312448, "favorite_count": 0, "is_retweet": true}],
"page": 1,
"result_count": 1
}
{
"next_page": 3,
"last_page": false,
"result": [...],
"page": 2,
"result_count": 43
}
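
Because `next_page` wraps back to 1 on the last page, a client that walks through all pages should stop on `last_page` rather than on `next_page`. The sketch below shows such a loop. `fetch_page` is a hypothetical callable that takes a page number and returns the decoded JSON response; here it is replaced by a small in-memory stand-in so the example runs without a server.

```python
def iter_results(fetch_page):
    """Yield every tweet by following next_page until last_page is true."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["result"]
        if body["last_page"]:
            break  # next_page is 1 here, so do not follow it
        page = body["next_page"]


# Stand-in data shaped like the responses above (real pages hold up to 10 tweets).
fake_pages = {
    1: {"page": 1, "next_page": 2, "last_page": False,
        "result": [{"id": 1}, {"id": 2}], "result_count": 3},
    2: {"page": 2, "next_page": 1, "last_page": True,
        "result": [{"id": 3}], "result_count": 3},
}

print([tweet["id"] for tweet in iter_results(fake_pages.__getitem__)])
# [1, 2, 3]
```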
API 3 returns the data matching the given filters in CSV format.
API - `http://127.0.0.1:5000/getcsv?[filters][sort]`
(methods supported - GET, POST)
`[filters]` and `[sort]` are the same parameters as defined for API 2. There is no `[page]` parameter because all matching data is returned.
Examples
http://127.0.0.1:5000/getcsv?hashtag=richardmadley
http://127.0.0.1:5000/getcsv?favcount=lt1000&lang=en&datestart=10-01-2018&sort=date-asc
If the request is sent from a browser, the API downloads a CSV file containing the data that matches the filters. If the request is sent by another program or application, such as Postman, the API returns the data in CSV format.
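
When the CSV is consumed by a program, Python's standard `csv` module can turn it into dictionaries keyed by the header row. This is a sketch only: the column names in the sample below are hypothetical, since the actual columns are defined by the header row the API returns.

```python
import csv
import io


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical sample; real column names come from the API's header row.
sample = 'id,lang,text\r\n963693359742312448,en,"Thank you, all"\r\n'
rows = parse_csv(sample)
print(rows[0]["lang"], rows[0]["text"])
# en Thank you, all
```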