Netflix prize random whole numbers

General, Netflix October 11th, 2006

Discussing last night’s efforts with Andrew on the way in this morning, we were hashing out why the random algorithms always generated the same RMS values, and the fact that predictions can have decimal values while recommendations are whole integers. I realized I never tested random whole numbers against the recommendations, so I did a quick test. For whole values 1 to 5 on the probe set, I got a RMSE of 1.929. That’s better than my 0.00-5.00 attempt (2.176) but not as good as my 1.00-5.00 attempt (1.748). And that’s about what I expected.

I might need to do some poking around ACM for papers on predictive algorithms. Would Bayes help me here?

Other interesting data I have to play with is the date the recommendations were made, the title and year of the movie, and the general overall numbers for how many recommendations were made for a certain movie. I’m sure I can pull something probative out of that.

Netflix Prize night one

General, Netflix October 11th, 2006

I broke down and registered for the Netflix Prize so I could see their datasets. Here’s the results from day one of testing. If you get bored by posts where I geek out, you best browse away now. Also, if you’re thinking about tinkering with the Netflix Prize stuff, and don’t want any discovery/spoilers, you should probably skip this too. (Not that I would tell you my winning solution, but this is the start of me fleshing out a search for an algorithm. *cue the Indiana Jones music*)

The dataset is kinda fun. It’s 650MB compressed, and just over 2GB decompressed. There are 17,700 movies with anonomyzed rating data associated with it. Some movies have just a few dozen ratings, others have tens of thousands. The file formats are all text. Ratings go from 1-5 on an integer scale, (but predictions can be floats.)

The general premise of the contest is to predict ratings for movies. They “test” this against what people actually rated the movie. The difference between your prediction and the actual rating is evaluated. The root mean square of these errors (RMSE) is the metric. Their current system has a RMSE of .9514. To win a “progress prize,” an algorithm has to get down to .9419. To win the $1M, you have to do better than .8563.

They only allow one submission a week, so they give you a small subset of data (probe set) similar to what they eval you on (1.4M ratings to predict, the qualifying set is 2.8M) that you can compare against the training set to act as solutions for self-evaluation.

My first task was getting that probe set collected. A small perl script took the data out of the training set file and created one 1.4M line text file with the movie/user/rating information together on each line.

Then, it was time to test an algorithm. We talked about randomness at wings tonight, so my first stab (as a base case) was to randomly select a 0.00-5.00 value for each entry in the probe set and calculate the RMSE from there. It was 2.17647. That’s not good, but it gives lots of room to improve. I realized that users cannot select below one, so I really want rand()*4+1 to give 1.00-5.00. That yielded 1.748. Progress! The odd thing is, a 1.4M sample set negated any benefit from the randomness in the algorithm. Each time I ran it, I got almost exactly the same RMS value. I ran it several times. I guess I need to learn more about RMS, as it’s telling me that on average any random value I can come up with is going to be 1.748 away from my 1.00-5.00 target. That’s interesting; I expected more swing because of the randomness, but I guess not with 1.4M samples.

My next algoritm reduced the error distance in the RMS equation — I predicted everyone would rate every movie a 3. It’s in the middle between 1 and 5, so it seemed logical. My program generated a delta between each real rating and the number three, and my RMSE shrunk to 1.31335.

The next algorithm (you didn’t expect me to give up there, did you?) involved averages. Given the 100M or so ratings in the training set, what’s the average movie rating? Obviously, that would be closer than the median for the training set, so it should see some improvement. A small script to mine the movie averages, and then assess a global average over all movies in the training set, told me the global average was 3.61. Sounds reasonable. That value as a constant prediction yielded a RMSE of 1.12923; my best so far.

But since I have the per-movie averages, I’ll do better to recommend the individual average for that movie instead of the global average. This is significant because it’s the first time one of my algorithms looks at the movie data to determine a recommendation. It’s probably the simplest one I can do, but it’s a start. Those averages ranged from 1.29 to 4.72, and yielded a RMSE of 1.05195. If I can drop another .10 in a future iteration, I’ll not only be as good as Cinematch, I’ll also be on the Leaderboard.

I have ideas on how to improve upon these numbers, both entirely from data mined within the training set and from outside information, but from a little bit of work tonight, at least I have a starting point. And a new project. If you’re interested in helping me out, let me know. I’ll work on my homework tomorrow. ;)