Projects

Data Science


NLP Sentiment Analysis

Social networks and media platforms struggle to facilitate open conversations effectively. As a result, many such communities prevent users from expressing their opinions, giving feedback, and so on. In open forums, on the other hand, the possibility of being verbally abused and harassed by strangers discourages people from engaging with and learning from one another’s perspectives.

Goal: Develop a set of NLP models that classify text as toxic, severe-toxic, threatening, insulting, hateful, and/or obscene.

CAUTION!!! You may find the content of this dataset offensive!

GitHub: https://github.com/iamontheinet/datascience/tree/master/NLP_Sentiment_Analysis

Shelter Animal Outcomes

Every year, approximately 7.6 million companion animals end up in US shelters. Many are given up as unwanted by their owners, while others are picked up after getting lost or are taken out of cruelty situations. Many of these animals find forever families to take them home, but just as many are not so lucky: 2.7 million dogs and cats are euthanized in the US every year.

Examine features like breed, color, sex, age, etc. to see whether and how they relate to shelter animal adoption outcomes.
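As a rough illustration of relating one feature to outcomes, here is a minimal sketch that computes the share of each outcome per feature value. The records, field names (`sex`, `outcome`), and the `outcome_rates` helper are made up for this example; the real dataset and the analysis in the repository may look quite different.

```python
from collections import Counter, defaultdict

# Hypothetical records for illustration only; not taken from the real dataset.
animals = [
    {"sex": "Female", "outcome": "Adoption"},
    {"sex": "Female", "outcome": "Adoption"},
    {"sex": "Female", "outcome": "Transfer"},
    {"sex": "Male", "outcome": "Adoption"},
    {"sex": "Male", "outcome": "Euthanasia"},
    {"sex": "Male", "outcome": "Transfer"},
]


def outcome_rates(records, feature, outcome_key="outcome"):
    """Return {feature value: {outcome: share of records with that value}}."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[feature]][row[outcome_key]] += 1
    return {
        value: {outcome: n / sum(c.values()) for outcome, n in c.items()}
        for value, c in counts.items()
    }


rates = outcome_rates(animals, "sex")
for value, shares in rates.items():
    print(value, shares)
```

The same helper could be pointed at any other categorical field (breed, color, an age bucket) to compare outcome shares side by side.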

GitHub: https://github.com/iamontheinet/datascience/tree/master/shelter_outcome

Kickstarter Campaigns Analysis

Use features like location, campaign length, pledge goal, project categories, etc. to predict campaign outcomes and answer questions like:

* What’s the best length of time to run a campaign?

* What’s the ideal pledge goal?

* What type of projects would be most successful at getting funded?

* Is there an ideal month/day/time to launch a campaign?
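A first pass at questions like these could start by deriving campaign length and launch timing from dates, then comparing success rates per group. The sketch below does that on a handful of invented campaigns; the field names (`launched`, `deadline`, `state`) and both helper functions are assumptions for illustration, not the schema or code used in the repository.

```python
from datetime import date

# Invented campaigns for illustration only.
campaigns = [
    {"launched": date(2016, 1, 4), "deadline": date(2016, 2, 3), "state": "successful"},
    {"launched": date(2016, 1, 10), "deadline": date(2016, 3, 10), "state": "failed"},
    {"launched": date(2016, 3, 1), "deadline": date(2016, 3, 31), "state": "successful"},
    {"launched": date(2016, 3, 15), "deadline": date(2016, 4, 14), "state": "failed"},
]


def campaign_features(campaign):
    """Derive length and launch-timing features from the raw dates."""
    return {
        "duration_days": (campaign["deadline"] - campaign["launched"]).days,
        "launch_month": campaign["launched"].month,
        "launch_weekday": campaign["launched"].weekday(),  # 0 = Monday
        "funded": campaign["state"] == "successful",
    }


def success_rate_by(rows, key):
    """Return {value of `key`: fraction of rows with that value that were funded}."""
    totals, wins = {}, {}
    for row in rows:
        value = row[key]
        totals[value] = totals.get(value, 0) + 1
        wins[value] = wins.get(value, 0) + int(row["funded"])
    return {value: wins[value] / totals[value] for value in totals}


rows = [campaign_features(c) for c in campaigns]
print(success_rate_by(rows, "duration_days"))
print(success_rate_by(rows, "launch_month"))
```

Raw success rates like these only describe the data; a predictive model would still need to account for confounders such as category and goal size.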

GitHub: https://github.com/iamontheinet/datascience/tree/master/kickstarter_campaigns

Other Projects

Amazon Alexa meets Big Data Platform

Imagine reliably asking Amazon Alexa, Amazon Echo Dot, Google Home, or a chatbot to run analytics queries against a big data platform. For example, “What were the top three revenue generating products last week?” or better yet “Start my Spark cluster” — all without firing up your computer, scrolling through a report, looking through spreadsheet columns, or asking an analyst or data admin. Big Data at the tip of the tongue–pun intended.

The concept of conversing with a computer is very interesting and has been around for a while (think Star Trek’s “LCARS” and HAL from “2001: A Space Odyssey”). While we might be a long way off from those realities, recent advancements in natural language and AI technologies from Amazon, Google, Microsoft, IBM, and others have brought us closer. We can expect a lot of new, creative services to be built in the near future.

Meet Qulexa. Although it’s just a sample application, to my mind Qulexa is a preview of where we’re headed when it comes to accessing insights using AI, ML, and Big Data technologies in ways never possible before.

GitHub: https://github.com/iamontheinet/qulexa

REST API for storing key-value pairs for IoT devices

A Node.js application that runs on Tessel and uploads data (light and sound levels, temperature, and humidity) using HTTP POST via a REST API. After building the Tessel application, I created a single-page application that retrieves the data using HTTP GET via the REST API and visualizes it. So, like a lot of things that get developed, hackaway.io was born out of necessity.
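To make the POST and GET flow concrete, here is a minimal sketch in Python (the actual device code is Node.js). The endpoint URL, the `device` query parameter, and the JSON field names are all hypothetical, since the real hackaway.io routes are not described here. The requests are only built, not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint for illustration; not the real hackaway.io route.
BASE_URL = "http://localhost:3000/api/readings"


def build_upload(reading):
    """Build (but do not send) an HTTP POST carrying one reading as JSON."""
    body = json.dumps(reading).encode("utf-8")
    return urllib.request.Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_fetch(device_id):
    """Build (but do not send) the HTTP GET a visualizing page might issue."""
    query = urllib.parse.urlencode({"device": device_id})
    return urllib.request.Request(f"{BASE_URL}?{query}", method="GET")


reading = {
    "device": "tessel-1",  # assumed identifier
    "light": 0.42,
    "sound": 0.07,
    "temperature": 22.5,
    "humidity": 48.0,
}
post = build_upload(reading)
get = build_fetch("tessel-1")
print(post.get_method(), post.full_url)
print(get.get_method(), get.full_url)
# To actually send a request: urllib.request.urlopen(post)
```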

While hackaway.io is not a full-blown “product” by any means and it is not intended for production, it sure is dead simple to use in your hacks and POCs.

GitHub: https://github.com/iamontheinet/hackaway.io

Concurrency in web apps using Aerospike NoSQL DB

Concurrency control is one of the main aspects of multiplayer games, where all the checks, conditional writes, and game state updates must be made as fast as possible and with minimal client/server calls in order to keep the game fair and square. This is especially critical in turn-based games, where careless implementation (such as putting code that alters the game state in the client) can allow concurrency-related race conditions to creep in.

In this application, concurrency control is achieved by putting conditional writes and game state updates on the server using User-Defined Functions (UDFs). UDFs are a powerful feature of Aerospike DB that can be used to extend the capability of the database engine in terms of both functionality and performance.
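The idea of a server-side conditional write can be sketched without a database. The example below is an in-memory Python stand-in, not Aerospike code and not the UDF from the repository: the `GameStore` class and its `place` method are hypothetical, and a lock plays the role of the atomic record update. The check and the write happen in one step, so two simultaneous moves cannot both succeed.

```python
import threading


class GameStore:
    """In-memory stand-in for the server; the lock mimics an atomic record update."""

    def __init__(self):
        self._lock = threading.Lock()
        self.board = [None] * 9
        self.turn = "X"

    def place(self, player, square):
        """Validate and apply a move in one atomic step; return (ok, reason)."""
        with self._lock:
            if player != self.turn:
                return False, "not your turn"
            if not 0 <= square < 9:
                return False, "invalid square"
            if self.board[square] is not None:
                return False, "square taken"
            # All checks passed: write the move and hand over the turn.
            self.board[square] = player
            self.turn = "O" if player == "X" else "X"
            return True, "ok"


game = GameStore()
results = []

# Two clients submit the same move for player X at the same time.
threads = [
    threading.Thread(target=lambda: results.append(game.place("X", 4)))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

If the same checks ran in the client and the write were a separate call, both requests could pass validation before either write landed, which is the race the server-side approach avoids.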

GitHub: https://github.com/iamontheinet/concurrency-tictactoe-app