Hacking Netflix Prize: Calculating 316 million movie correla...
KNN is one of the most popular collaborative filtering (CF) algorithms. Central to KNN is how to define the neighborhood relationship, or the distance between two objects. For this purpose, Pearson correlation is a pretty good measure and is used frequently (see my previous post for my basic KNN approach). The Netflix dataset has 17,770 movies, so we need to calculate 17,770 × 17,770, or about 316 million, Pearson correlations. Not a trivial task. In this post, I'll describe the tricks I used to optimize my Pearson correlation calculation, which cut my running time from 2.5 hours to less than 2 minutes. It won't help your RMSE directly, but it may help indirectly by letting you explore the KNN parameter space faster. And although I used Pearson correlation, the methods described in this post can be applied to many other neighborhood measures too.
http://dmnewbie.blogspot.com/2009/06/calculating-316-million...
tags: knn ml
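The post's own optimization tricks are not reproduced in this excerpt, so the following is only a minimal Python sketch of one common way to compute all pairwise movie correlations. It is not the author's method. It assumes the ratings fit in memory as a hypothetical {user_id: {movie_id: rating}} dict, and for each movie pair it accumulates the six running sums needed by the single-pass formula r = (n·Σxy - Σx·Σy) / sqrt((n·Σx² - (Σx)²)(n·Σy² - (Σy)²)), taken over the users who rated both movies. Because the correlation is symmetric and the diagonal is trivially 1, only 17770·17769/2, or about 158 million distinct pairs, actually need computing, so the sketch stores one triangle. The function name `pearson_all_pairs` and the toy data are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson_all_pairs(ratings_by_user):
    """Pearson correlation for every pair of movies that share at least one rater.

    ratings_by_user: hypothetical layout {user_id: {movie_id: rating}}.
    Returns {(m1, m2): r} with m1 < m2 (the matrix is symmetric, so only one
    triangle is stored). Pairs with zero variance, e.g. a single common
    rater, get 0.0.
    """
    # Per-pair running sums: [n, sum_x, sum_y, sum_xx, sum_yy, sum_xy]
    stats = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])

    # One pass over users: each pair of movies a user rated is updated once.
    # Sorting by movie ID guarantees m1 < m2 in every emitted pair.
    for movies in ratings_by_user.values():
        for (m1, x), (m2, y) in combinations(sorted(movies.items()), 2):
            s = stats[(m1, m2)]
            s[0] += 1
            s[1] += x
            s[2] += y
            s[3] += x * x
            s[4] += y * y
            s[5] += x * y

    corr = {}
    for pair, (n, sx, sy, sxx, syy, sxy) in stats.items():
        num = n * sxy - sx * sy
        # Clamp tiny negative values that floating-point rounding could produce.
        var_x = max(0.0, n * sxx - sx * sx)
        var_y = max(0.0, n * syy - sy * sy)
        den = sqrt(var_x * var_y)
        corr[pair] = num / den if den > 0 else 0.0
    return corr


# Toy usage with made-up ratings (not Netflix data).
ratings = {
    "u1": {1: 5, 2: 4, 3: 1},
    "u2": {1: 3, 2: 2, 3: 4},
    "u3": {1: 4, 2: 3},
}
corr = pearson_all_pairs(ratings)
print(corr[(1, 2)])  # → 1.0 (movie 2's ratings are movie 1's minus one)
print(corr[(1, 3)])  # → -1.0
```

Looping over every pair of movies each user rated costs time quadratic in that user's rating count, and a dict of Python lists will not scale to the full dataset; a dense array indexed by movie ID is the more realistic layout there. The one-pass sum formula can lose precision with large floating-point values, but it is exact for small integer ratings such as Netflix's 1 to 5 stars.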
Python nearest neighbors binary classifier | This Number Cru...