According to Francesco, the author of the Recommender Systems Handbook, content-based filtering analyzes a set of documents or descriptions of items previously rated by a user, then builds a profile (a model of the user's interests) from the features of those rated items. Using the profile, the recommender system can select the suggestions that fit the user.
Looking deeper, the recommendation process has three steps:
- Content analyzer: its main responsibility is to represent the content of items. It extracts information or specific features from each item using feature extraction techniques.
- Profile learner: this step collects data representative of the user's preferences, generalizes it, and constructs the user profile.
- Filtering component: this step matches the features of the user profile against the features of the items, and the system then recommends the items that fit the user.
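The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the handbook's method: the item descriptions, the ratings, and the function names (`analyze`, `learn_profile`, `recommend`) are all made up for the example, features are plain word counts, and matching uses cosine similarity.

```python
import math
from collections import Counter

def analyze(description):
    """Content analyzer: represent an item as a bag-of-words count vector."""
    return Counter(description.lower().split())

def learn_profile(rated_items):
    """Profile learner: sum the item vectors, weighted by the user's rating."""
    profile = Counter()
    for features, rating in rated_items:
        for term, count in features.items():
            profile[term] += rating * count
    return profile

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(profile, candidates, top_n=2):
    """Filtering component: rank candidate items by similarity to the profile."""
    scored = [(name, cosine(profile, feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical catalog of items the user has not seen yet.
catalog = {
    "space_opera": "space adventure aliens starship",
    "courtroom_drama": "lawyer trial drama verdict",
    "alien_invasion": "aliens invasion space battle",
}
features = {name: analyze(text) for name, text in catalog.items()}

# Hypothetical rating history: (item description, rating on a 1-5 scale).
rated = [
    (analyze("space aliens exploration"), 5),
    (analyze("romance drama wedding"), 1),
]
profile = learn_profile(rated)
ranking = recommend(profile, features)
print(ranking)
```

In a real system the word counts would usually be replaced by a stronger representation such as TF-IDF weights, but the division of work between the three components stays the same.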
Here is a flow chart of the process, taken from the Recommender Systems Handbook.
It shows how the system processes the information source and the user profiles so that it can recommend items.
Compared to collaborative filtering, content-based filtering has some advantages and drawbacks that we should understand.
Advantages
- User independence: collaborative filtering needs other users' ratings to find similar users before it can make a suggestion. A content-based method only has to analyze the items and the active user's own profile.
- Transparency: a collaborative method recommends an item because some unknown users share your taste, whereas a content-based method can tell you which features of the item led to the recommendation.
- No cold start for new items: unlike collaborative filtering, a content-based system can suggest new items before they have been rated by a substantial number of users.
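The transparency and new-item points can be illustrated with a short sketch. It assumes the user profile and the item are both stored as feature-to-weight dictionaries; the `explain` function and all the feature names and weights are hypothetical, chosen only for the example.

```python
def explain(profile, item_features, top_n=3):
    """Return the features that contribute most to the profile-item match."""
    contributions = {
        feature: weight * item_features[feature]
        for feature, weight in profile.items()
        if feature in item_features
    }
    ranked = sorted(contributions.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

# Hypothetical profile learned from the user's own ratings only.
profile = {"sci-fi": 0.9, "space": 0.7, "romance": 0.1}

# A brand-new item that nobody has rated yet; only its features are known.
new_item = {"sci-fi": 1.0, "space": 1.0, "comedy": 1.0}

reasons = explain(profile, new_item)
print(reasons)
```

The item can be scored even though it has no ratings at all, and the returned features ("sci-fi" and "space" here) are what the system could show the user as the reason for the suggestion.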
Disadvantages
- Limited content analysis: if the content does not contain enough information to discriminate between items precisely, the recommendations will not be precise either.
- Over-specialization: a content-based method offers only a limited degree of novelty, since it has to match the features of the profile against those of the items. A perfectly tuned content-based filter may never suggest anything surprising.
- New user: when there is not enough information to build a solid profile for a user, the system cannot provide reliable recommendations.
Collaborative filtering and content-based filtering each have their own merits and drawbacks. That is why most websites have started to use hybrid systems, which combine the advantages of the two methods to give their customers easier and more valuable recommendations.
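One simple way to combine the two methods is a weighted blend of their scores. The sketch below is just one possible hybridization strategy, not a description of what any particular website does; the `hybrid_score` function, the `alpha` weight, and the example scores are assumptions made for illustration.

```python
def hybrid_score(content_score, collaborative_score, alpha=0.5):
    """Blend two scores; alpha is the weight given to the content-based score.

    When no collaborative score exists (for example, a new item with no
    ratings), fall back to the content-based score alone.
    """
    if collaborative_score is None:
        return content_score
    return alpha * content_score + (1 - alpha) * collaborative_score

# Hypothetical scores in [0, 1] for two items.
established_item = hybrid_score(content_score=0.8, collaborative_score=0.4)
unrated_item = hybrid_score(content_score=0.8, collaborative_score=None)
print(established_item, unrated_item)
```

The fallback branch is where the hybrid borrows the content-based advantage: an unrated item still gets a usable score, while items with rating data also benefit from other users' tastes.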