Recommender systems use two main types of approaches: collaborative filtering and content-based filtering. Today, I'll introduce the collaborative filtering approach.
Collaborative filtering methods collect and analyze a large amount of information about users' behaviors, activities, or preferences, and predict what users will like based on their similarity to other users.
The underlying assumption is simple: if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue x than the opinion on x of a randomly chosen person. The picture above gives a simple illustration of collaborative filtering.
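The idea above can be sketched in a few lines of Python. This is a minimal user-based example, not a production recommender: it assumes a tiny hypothetical ratings dictionary (users A, B, C and items x1 to x4), uses Pearson correlation over co-rated items as the similarity measure, and predicts a rating as the user's own mean plus the similarity-weighted deviations of other users. These are common choices, but other similarity measures and weighting schemes exist.

```python
import math

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "A": {"x1": 5, "x2": 1, "x3": 4},
    "B": {"x1": 5, "x2": 1, "x3": 4, "x4": 5},
    "C": {"x1": 1, "x2": 5, "x3": 2, "x4": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(u, v):
    """Pearson correlation computed over the items both users rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0  # not enough overlap to measure agreement
    mu = mean(u[i] for i in common)
    mv = mean(v[i] for i in common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((v[i] - mv) ** 2 for i in common)
    )
    return num / den if den > 0 else 0.0

def predict(ratings, user, item):
    """User-based CF: the user's mean rating plus the similarity-weighted
    deviations (from their own means) of other users who rated the item."""
    base = mean(ratings[user].values())
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = pearson(ratings[user], theirs)
        num += sim * (theirs[item] - mean(theirs.values()))
        den += abs(sim)
    return base + num / den if den > 0 else None  # None: nobody to learn from

print(round(predict(ratings, "A", "x4"), 2))  # → 4.58
```

A agrees with B and disagrees with C on x1 to x3, so the prediction for x4 moves toward B's high rating. Pearson correlation is used here because, unlike plain cosine similarity on raw (all-positive) ratings, it can be negative, so a user who consistently disagrees pushes the prediction the other way.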
Besides Netflix and Amazon, which were mentioned in the previous posts, many other websites use collaborative filtering (CF) in their recommender systems, such as Last.fm, a music radio website, and social networking sites such as Facebook, MySpace, and LinkedIn, which recommend new connections via CF.
Websites that want to use a CF system have to be aware of several problems:
- Data sparsity and the cold start problem: a brand-new product needs to be rated by a substantial number of users before it can be recommended. A system using the content-based approach does not have this limitation, as we'll see in the next post.
- Scalability: the system needs a lot of computing power when the website has a large number of products and customers.
- Synonyms: identical or very similar items that have different names are not recognized as the same item by a CF system.
- Grey sheep: users whose likes and dislikes do not consistently agree or disagree with any group of other users. CF may fail to recommend suitable items to them.
- Shilling attack: people may give their own items lots of good ratings and give their competitors' items bad ratings.
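The data sparsity and cold start problems listed above show up even in a tiny example. The Python sketch below assumes a hypothetical ratings dictionary, a hypothetical catalog containing one just-added item, and a hypothetical `MIN_RATERS` threshold standing in for "a substantial number of users". It measures how much of the user-item matrix is actually filled and flags items that CF does not yet have enough ratings to learn from.

```python
# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"book": 5, "film": 3},
    "bob": {"book": 4, "film": 2, "song": 5},
    "carol": {"film": 4, "song": 1},
}
# Hypothetical catalog; "new_gadget" was just added and nobody has rated it.
catalog = ["book", "film", "song", "new_gadget"]
MIN_RATERS = 2  # hypothetical cut-off for "enough ratings to recommend"

def density(ratings, catalog):
    """Fraction of the user x item matrix that actually holds a rating."""
    filled = sum(len(r) for r in ratings.values())
    return filled / (len(ratings) * len(catalog))

def raters(ratings, item):
    """Users who have rated the item; CF can only learn about it from them."""
    return [user for user, r in ratings.items() if item in r]

print(f"matrix density: {density(ratings, catalog):.2f}")
for item in catalog:
    n = len(raters(ratings, item))
    status = "ok" if n >= MIN_RATERS else "cold start"
    print(f"{item}: {n} rater(s), {status}")
```

Even here more than 40% of the matrix is empty, and real catalogs are usually far sparser. The new item stays in the cold-start state until enough users rate it, which is exactly the gap a content-based approach can cover.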
To avoid these problems, some companies prefer the content-based approach, and some use a "hybrid" system! We'll look at these approaches in the next post.