Hunch is one of my favorite startups to watch. Here's why:
- They're building a real-time recommendation engine on a massive scale. Real-time recommendations are hard.
- They're getting people to populate a huge database of facts about themselves for Hunch and have fun doing it.
tl;dr - Interesting CS, interesting engineering problems, great product design, and a straightforward business model.
1. Real-time recommendations
Recommendations involve large-scale graph and matrix processing over datasets hundreds of times the size of RAM. Existing systems like Hadoop do it offline in batch mode. At Hunch, it has to be done in soft real-time, meaning milliseconds. In those milliseconds you need to fetch lots of different small pieces of data (without the luxury of knowing ahead of time what you're going to need: bye bye partitioning) and then process them to get an answer.
Caching, the usual technique for speeding things up, is a bitch to do right in Hunch's scenario. The data has way too many dimensions to attempt to pre-compute the answers, and cache invalidation, coherency, and data replication are all complicated by the soft real-time requirement (changes have to propagate fast).
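To get a feel for why pre-computing the answers is hopeless, here is a back-of-envelope sketch. It assumes a deliberately simplified model that is not Hunch's actual data layout: every question has a fixed number of choices and may also be left unanswered, and one cached answer is needed per distinct profile per topic. The function names and the sizes are made up for illustration.

```python
def distinct_profiles(num_questions: int, choices_per_question: int) -> int:
    """Count possible answer combinations in a simplified model (an assumption)."""
    # Each question is either skipped or answered with one of its choices.
    return (choices_per_question + 1) ** num_questions


def cache_entries_needed(num_questions: int, choices_per_question: int,
                         num_topics: int) -> int:
    """Entries required to pre-compute one answer per profile per topic."""
    return distinct_profiles(num_questions, choices_per_question) * num_topics


if __name__ == "__main__":
    # Hypothetical sizes: 20 two-choice questions, 1,000 topics.
    print(cache_entries_needed(20, 2, 1000))
```

Even this toy setup of 20 two-choice questions yields 3^20, roughly 3.5 billion, distinct profiles before multiplying by the number of topics.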
Parallelization is not as useful as it might seem at first either: matrix operations require high data locality and don't lend themselves well to distribution in a real-time processing scenario (exchanging chunks of data between machines is expensive). Another aspect of Hunch that makes this whole thing even harder is irregularity: you could have a huge spike in the number of concurrent users on the site, and you can't just throw a dozen additional machines into the cluster to deal with it, like you can with something like video transcoding or crawling webpages.
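A rough way to see why shipping data between machines hurts in a millisecond budget is to estimate the transfer time directly. The sketch below uses a simple latency-plus-bandwidth model; the chunk size, link speed, and round-trip time are assumed values for illustration, not measurements from Hunch.

```python
def transfer_ms(chunk_bytes: int, bandwidth_bits_per_s: int,
                round_trip_ms: float) -> float:
    """Estimate the time to move one chunk between machines, in milliseconds."""
    # Serialization time on the wire plus one network round trip.
    wire_ms = chunk_bytes * 8 * 1000 / bandwidth_bits_per_s
    return wire_ms + round_trip_ms


if __name__ == "__main__":
    # Assumed: a 10 MB matrix chunk, a 1 Gbit/s link, a 0.5 ms round trip.
    print(transfer_ms(10_000_000, 1_000_000_000, 0.5))
```

Under those assumptions a single 10 MB chunk takes about 80 ms to move, which already exceeds a budget measured in milliseconds before any computation happens.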
2. Great product design
Hunch's engineers have done a great job overcoming the engineering challenges. But even the best recommendation engine is useless without data (forgetting for a second that without data it would've been impossible to build "the best recommendation engine" in the first place).
More data beats good algorithms. A recommendation system as ambitious as Hunch's needs lots and lots of data. Where do you get it?
Let's backtrack a little. The #1 thing that Hunch needs to work: lots and lots of taste profiles. What's a taste profile? A set of answers to questions like "Do you prefer French Bulldogs or Corgis?", "Do you live in a city or in the country?", and "Do you drink your coffee black or with milk?", and the larger this set, the better. So the #2 thing you need is taste questionnaires, and again, the more of them the better.
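As a minimal sketch of the idea, a taste profile can be modeled as a mapping from question to answer, and two profiles can be compared by how often they agree on the questions both people answered. The `TasteProfile` type, the `agreement` function, and the sample data are hypothetical; this is a naive overlap measure, not the method Hunch uses.

```python
from typing import Dict

# Hypothetical representation: question id -> chosen answer.
TasteProfile = Dict[str, str]


def agreement(a: TasteProfile, b: TasteProfile) -> float:
    """Fraction of shared questions on which two profiles give the same answer."""
    shared = a.keys() & b.keys()
    if not shared:
        # Nothing in common to compare, so report no agreement.
        return 0.0
    matches = sum(1 for question in shared if a[question] == b[question])
    return matches / len(shared)


if __name__ == "__main__":
    alice = {"dog": "corgi", "home": "city", "coffee": "black"}
    bob = {"dog": "corgi", "home": "country"}
    print(agreement(alice, bob))
```

The more questions two people have both answered, the more the score is worth, which is one way to read "the larger this set, the better".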
This is a very big and very difficult problem to solve, and Hunch's product designers are doing a great job. Basically, they got everyone to design and fill in a huge database for them, and to have fun doing it. To use the familiar format:
- Build a real-time recommendation engine.
- ????
- Profit!
The product designers at Hunch are working out the (????) successfully.