Why is this project required?
Our first year design class was approached by NTP with a question:
Can you design an application to crowdsource tornado sightings around Canada?
My team developed the following need statement, which our project would have to satisfy to be useful: NTP needs a way to efficiently compile relevant social media posts in order to collect more data points for storms.
Final Design Documentation
Since I was the team member with technical experience, I was tasked with implementing our design in code.
Item | Price | Description/Usage |
---|---|---|
Mapbox API Key | Free (50,000 map loads) | Used for the front-end map |
Heroku Hosting | Free (Student Plan) | Hosting service that can easily integrate with existing version control. Used to host the back end for the application |
Vercel Hosting | Free | Hosting for Next.js applications. Used to host the front-end of the application |
MongoDB Database | Free (depending on the hosting solution) | We use MongoDB for its NoSQL structure, which lets us store the same kind of complex objects Twitter uses in its backend. We are hosting the database on a group member's computer; without that, it would cost an estimated $7/month. |
Auth0 | Free (not going to exceed usage) | Auth0 provides an easy-to-use authentication service, which we leveraged to give NTP user accounts for managing access to the application |
The backend of the website is written in Python. Within the Python code is a Twitter scraper known as `snscrape`. This library supports multiple social media platforms, but we chose to use only its Twitter package. It lets us target specific locations in our queries, which gives it an edge over other libraries such as Tweepy. To schedule the `snscrape` queries, we need to run each query on its own thread so they operate asynchronously. We chose `BackgroundScheduler` (from the APScheduler library), which makes it easy to add and remove jobs dynamically while the application is running. In these threads, each query sends a request to Twitter at an interval equal to the frequency the user defined for that query.

The resulting Tweets then need to be stored before being sent to the user. We are currently using MongoDB, which has a native connection library for Python. We use it to connect to the remote database and add the Tweets with an update operation, which ensures that no two Tweets with the same ID end up in the database.

Once we have found the Tweets, we run them through the algorithm we developed to determine their relevance to the selected keywords and their level of interaction, essentially measuring how much each Tweet is trending. The resulting `relatabilityScore` is what we use to determine how the Tweet is displayed to NTP: the higher the score, the more relevant and important the Tweet is to NTP. In the Python implementation, each Tweet is broken up into different metrics, analyzed, and saved as a Tweet object. This code is seen below, and a sketch of how the scraping, scheduling, and storage fit around it follows the listing:
```python
import math

import snscrape.modules.twitter as sntwitter

# `Tweet` is the data model class defined elsewhere in the backend.

def solveAlgo(query, tweets):
    # Initialize empty list of tweets
    tweetList = []
    for tweet in tweets:
        # Calculate the total media attached to the post
        mediaCount = 0
        media = []
        if tweet['media'] is not None:
            for m in tweet['media']:
                if type(m) == sntwitter.Photo:
                    mediaCount += 1
                    media.append({
                        'type': 'photo',
                        'url': m.fullUrl
                    })
                elif type(m) == sntwitter.Video:
                    mediaCount += 1
                    for videoType in m.variants:
                        if videoType.contentType != 'application/x-mpegURL':
                            media.append({
                                'type': 'video',
                                'url': videoType.url,
                                'contentType': videoType.contentType
                            })
        likes = tweet['likes']
        retweets = tweet['retweets']
        replies = tweet['replies']
        # Calculate the interaction score of the tweet
        interactionScore = 0
        if (likes + retweets + replies) != 0:
            interactionScore = (likes**2 + retweets**2 + replies) / math.sqrt(likes**2 + retweets**2 + replies**2)
        # Calculate the keyword count of the tweet
        keywordCount = 0
        for k in query.keywords:
            keywordCount += tweet['content'].lower().count(k.replace('(', '').replace(')', '').lower())
        # Calculate the relatability score of the tweet
        relatabilityScore = ((mediaCount) + (interactionScore)) * keywordCount
        # Default to query location if tweet location is not available
        location = {
            'type': 'Point',
            'coordinates': [float(query.location.split(',')[0]), float(query.location.split(',')[1])]
        }
        if tweet['coordinates'] is not None:
            location = {
                'type': 'Point',
                'coordinates': [tweet['coordinates'].longitude, tweet['coordinates'].latitude]
            }
        # Create a new tweet object
        _tweet = Tweet(
            tweet['id'],
            query.id,
            likes,
            retweets,
            replies,
            tweet['date'],
            location,
            tweet['content'],
            media,
            keywordCount,
            interactionScore,
            relatabilityScore
        )
        tweetList.append(_tweet)
    return tweetList
```
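To show how the pieces described above could fit around `solveAlgo`, here is a minimal sketch: APScheduler's `BackgroundScheduler` runs one job per query at the user-defined interval, `snscrape` pulls recent Tweets for that query, and PyMongo upserts the scored results by Tweet ID so duplicates are never stored twice. The connection string, collection names, the fields assumed on `query` (such as `frequency`), and the search-string format are illustrative assumptions rather than the exact production code.

```python
from apscheduler.schedulers.background import BackgroundScheduler
from pymongo import MongoClient
import snscrape.modules.twitter as sntwitter

# Illustrative connection string and database name.
db = MongoClient('mongodb://localhost:27017')['tornado_app']

scheduler = BackgroundScheduler()
scheduler.start()

def runQuery(query):
    # Hypothetical search string: the query's keywords restricted to a radius
    # around its location (the operator format here is illustrative).
    search = f"({' OR '.join(query.keywords)}) geocode:{query.location},50km"
    raw = []
    for i, t in enumerate(sntwitter.TwitterSearchScraper(search).get_items()):
        if i >= 100:  # cap each run so one busy query cannot hog its thread
            break
        raw.append({
            'id': t.id, 'content': t.content, 'date': t.date,
            'likes': t.likeCount, 'retweets': t.retweetCount,
            'replies': t.replyCount, 'media': t.media,
            'coordinates': t.coordinates,
        })
    # Score the batch with solveAlgo (shown above), then upsert by Tweet ID so
    # a Tweet that was already stored is updated rather than inserted again.
    for scored in solveAlgo(query, raw):
        db.tweets.update_one(
            {'_id': scored.id},          # assumes the Tweet class exposes its ID as `.id`
            {'$set': vars(scored)},      # assumes Tweet fields are plain instance attributes
            upsert=True)

def scheduleQuery(query):
    # One recurring job per query; jobs can be added or removed at runtime.
    scheduler.add_job(runQuery, 'interval', minutes=query.frequency,
                      args=[query], id=str(query.id))

def removeQuery(query):
    scheduler.remove_job(str(query.id))
```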
This brings us to the connection between the front end and the back end. We use the HTTP protocol to transfer JSON between the two. For communication, we chose the Flask library to create an HTTP server for the Python backend, exposing multiple endpoints that a client can use to read data and to send new data to the database. Whenever the client needs to retrieve data, it sends a GET request, and whenever new data needs to be sent to the backend, it sends a POST request.
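For illustration, here is a minimal Flask sketch of that pattern, with one GET endpoint for reading stored Tweets and one POST endpoint for submitting a new query; the route names, payload fields, and collection names are assumptions for the example rather than the project's actual API.

```python
from flask import Flask, jsonify, request
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient('mongodb://localhost:27017')['tornado_app']  # illustrative connection

@app.route('/tweets/<query_id>', methods=['GET'])
def get_tweets(query_id):
    # GET: the client retrieves the stored, scored Tweets for one query as JSON.
    tweets = list(db.tweets.find({'queryId': query_id}, {'_id': 0}))
    return jsonify(tweets)

@app.route('/queries', methods=['POST'])
def create_query():
    # POST: the client submits a new query (keywords, location, frequency) as JSON.
    body = request.get_json()
    db.queries.insert_one(dict(body))
    return jsonify({'status': 'created'}), 201

if __name__ == '__main__':
    app.run(port=5000)
```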
Improvements
If we had more time, we would first improve the accuracy of our algorithm with a machine learning model trained on previous storms validated by NTP. Our current method works, but it was developed using roughly 10 tornadoes reported by NTP. With the Tweets we have already stored, we could build a model that uses image data along with sentiment analysis to understand the context in which somebody is tweeting, so that more unique and relevant content is surfaced. As a team, we also brainstormed a companion app for training the algorithm, in which NTP would sift through the Tweets we have found, Tinder-style, and mark whether each one is useful for detection; in doing so, NTP would effectively be training the neural network described in the previous improvement. We would also develop a mobile-friendly version of the application. With the current app we aimed for simplicity, but supporting another screen size would roughly double the layout code, since a separate grid layout would be needed for smaller, vertical screens. With more time, this would have been possible to implement.
What I Learned
I learned that drawing on multiple ideas within a team, along with good communication, leads to a better final product.
In the end, my team won the Client Choice Award, meaning that against the 7 other teams, ours created the best solution to the client's problem. NTP is now using this project in their current tornado identification process. As tornadoes become more prevalent in Canada, I am very proud that my contribution may help predict tornadoes in the future and potentially save lives.
Source Code
The source code for the project and more detailed documentation can be found on my GitHub. The backend can be found here & the frontend can be found here.