A multipart project using Apache Hadoop to run Map Reduce jobs on a csv collection of scores of the universities across the UK, to cluster and find most used words. Additionally a regression classifier written in python to encode this into a sparse matrix and build a multi-dimensional classifier to predict future scores