SnapChecker

SnapChat Data Lost OH NO

Project Description

New Year's Eve was a scary moment for many as we entered 2014. Shortly after the Target data breach, in which tens of millions of credit card records were compromised, news started trickling out about a data breach at SnapChat. Because a long-known security vulnerability had gone unaddressed, a group of power users was able to write a script that queried the SnapChat servers for phone numbers and downloaded the accompanying account details. This led to a data set covering approximately 4.6 million users being posted publicly for download.

As soon as I located the data, I parsed it into a usable format and structured the collection so that it could run on a shared web server without creating an unnecessarily high load. The site had one purpose: looking up a specific phone number or username and reporting whether it was compromised. It also estimated the probability that an "unaffected" phone number had been leaked but not included in the data set, a statistic based on the geographic density of the leaked phone numbers' area codes. Finally, the site informed users about the possible ramifications of the leak.
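The exact formula behind the area-code statistic is not recorded here, so the following is only a rough sketch of one way such a score could be computed. It assumes 10-digit North American phone numbers and scores each area code relative to the most heavily represented one in the leak; the function names (area_code, density_scores, exposure_score) are hypothetical and the result is a relative density score, not a calibrated probability.

```python
from collections import Counter


def area_code(phone: str) -> str:
    """Return the 3-digit area code of a North American phone number."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    # Drop a leading country code "1" if one is present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"expected a 10-digit number, got {phone!r}")
    return digits[:3]


def density_scores(leaked_numbers):
    """Map each area code to its leak count divided by the largest count."""
    counts = Counter(area_code(number) for number in leaked_numbers)
    if not counts:
        return {}
    peak = max(counts.values())
    return {code: count / peak for code, count in counts.items()}


def exposure_score(phone: str, scores) -> float:
    """Score a number by how densely its area code appears in the leak."""
    # Area codes that never appear in the leak score 0.0.
    return scores.get(area_code(phone), 0.0)
```

A number whose area code appears often in the leak gets a score near 1.0, while a number from an area code that is absent from the leak gets 0.0. Turning this into a true probability would need additional inputs, such as the estimated number of SnapChat users per area code, which are assumptions beyond what is described above.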

To view the project, you can visit this link and take a look around the website.

Although the site may look basic and visually lacking, keep in mind that it was thrown together in less than an hour to give users one of the first services that put the leaked data to productive use. Loading the data set took far longer and proved difficult to manage, as 4.6 million database entries is far from insignificant. To make matters worse, all of the data was housed in one massive INSERT statement, which made it impractical to load into a database as-is, especially without transactions to protect data integrity. Parsing the information first addressed all of these issues and demonstrated that I am more than capable of handling exceptionally large data sets.
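As an illustration of the loading problem described above, here is a minimal sketch of splitting one oversized INSERT statement into small batches, each committed in its own transaction. It is not the original implementation: it uses SQLite for portability, and the table name (leaked), the three-column layout (phone, username, region), and the quoting style of the dump are all assumptions made for the example.

```python
import re
import sqlite3
from itertools import islice

# Matches one value tuple such as ('3035550101', 'alice', 'Denver').
# Assumes single-quoted values with no embedded quotes (hypothetical format).
ROW_PATTERN = re.compile(
    r"\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def iter_rows(sql_text):
    """Yield (phone, username, region) tuples found in the dump text."""
    for match in ROW_PATTERN.finditer(sql_text):
        yield match.groups()


def load_in_batches(conn, rows, batch_size=10_000):
    """Insert rows in fixed-size batches and return how many were processed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leaked ("
        "phone TEXT PRIMARY KEY, username TEXT, region TEXT)"
    )
    # Index usernames so lookups by either field avoid a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_leaked_username ON leaked (username)"
    )
    rows = iter(rows)
    processed = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        # The connection context manager commits on success and rolls back
        # on an exception, so a failed batch never leaves partial data.
        with conn:
            conn.executemany(
                "INSERT OR IGNORE INTO leaked VALUES (?, ?, ?)", batch
            )
        processed += len(batch)
    return processed
```

Small batches keep each transaction short, which matters on a shared server, and the primary key on the phone column makes a single lookup cheap. For a dump too large to hold in memory, the text would need to be read and matched in chunks rather than passed to iter_rows as one string.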