The goal of the YouTube video recommendation system is plain: to provide personalized, high-quality video recommendations to its users. How YouTube accomplishes this goal is anything but. Unsurprisingly, the folks at Google have developed elegant solutions to this problem, as explained in the paper ‘The YouTube Video Recommendation System.’ What follows is an abbreviated and simplified account of that paper for non-technical readers.
Several challenges get in the way of providing personalized video recommendations. First, the amount of video uploaded to YouTube is staggering. Second, much of this video has poor metadata, such as incomplete or irrelevant titles and descriptions. Third, the signals available to the YouTube recommender for measuring user interest are much vaguer than those available to other recommender systems, such as Amazon’s. For example, purchasing a product is a clearer indicator of interest than watching a video. Finally, YouTube video recommendations must be fresh: many YouTube videos have a short shelf life, and older videos will often be of little interest to users.
The YouTube recommendation system draws its data from two main sources. The first is content data, such as video metadata (titles and descriptions, for example). The second is user activity data, which is divided into explicit activity, such as ratings and favorites, and implicit activity, such as view time.
Determining Related Videos and Recommendation Candidates
Before it generates recommendation candidates, the system determines, for any given seed video, a set of related videos that a user is likely to watch next. To do so, it uses a method called co-visitation counting, a form of association rule mining: it counts how often each pair of videos is watched within the same session and turns that count into a relatedness score for the pair. The system then combines these related-video associations with a user’s activity on the site. The videos the user has watched, favorited, or otherwise engaged with form what the paper calls a seed set. From this seed set, the system traces paths of related videos outwards to generate candidate recommendations. Think of the seed set as the center of a web and the candidates as points on that web extending outwards from the center. The closer to the center a point is, the more related it is to the seed set; the farther out, the less related.
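For readers who want to see the idea in code, the steps above can be sketched roughly as follows. This is a minimal illustration, not YouTube’s implementation: the function names, the sample sessions, the choice to keep only the top two related videos, and the `max_distance` parameter are all assumptions made for the example. The score divides each pair’s co-visitation count by the product of the two videos’ view counts, which is one simple way to stop very popular videos from dominating.

```python
from collections import Counter
from itertools import combinations


def relatedness_scores(sessions):
    """Score each co-watched pair: co-visit count / (views_a * views_b)."""
    views = Counter()      # number of sessions in which each video appears
    co_visits = Counter()  # number of sessions in which each pair appears
    for session in sessions:
        unique = sorted(set(session))
        views.update(unique)
        co_visits.update(combinations(unique, 2))
    return {
        (a, b): count / (views[a] * views[b])
        for (a, b), count in co_visits.items()
    }


def top_related(scores, n=2):
    """For each video, keep its n highest-scoring related videos."""
    related = {}
    for (a, b), score in scores.items():
        related.setdefault(a, []).append((score, b))
        related.setdefault(b, []).append((score, a))
    return {
        video: [v for _, v in sorted(pairs, key=lambda p: (-p[0], p[1]))[:n]]
        for video, pairs in related.items()
    }


def candidates(seed_set, related, max_distance=2):
    """Walk outwards from the seed set, one hop of related videos at a time."""
    frontier = set(seed_set)
    reached = set(seed_set)
    for _ in range(max_distance):
        frontier = {r for v in frontier for r in related.get(v, [])} - reached
        reached |= frontier
    return reached - set(seed_set)  # the user has already seen the seeds


# Hypothetical viewing sessions, each a list of video IDs.
sessions = [["a", "b", "c"], ["a", "b"], ["b", "d"], ["c", "e"]]
related = top_related(relatedness_scores(sessions))
print(sorted(candidates({"a"}, related, max_distance=1)))  # ['b', 'c']
print(sorted(candidates({"a"}, related, max_distance=2)))  # ['b', 'c', 'd', 'e']
```

Raising `max_distance` reaches farther out on the web: the result is a broader but less closely related set of candidates.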
Ranking Recommendation Candidates
Once a set of candidate recommendations has been generated, the candidates are ranked according to various signals, which can be organized into three groups: 1) video quality, 2) user specificity, and 3) diversification. Video quality signals include metrics such as view count, video ratings, comments, favorites, and sharing activity. User specificity signals are used to boost videos that closely match a user’s unique preferences. Properties of the seed video in the user’s history, such as view count and time of watching, are used to generate these user specificity signals. To increase diversity, candidate videos that are too similar to one another are removed and replaced with more varied content. The logic is that a user has multiple interests and corresponding viewing preferences; a set of recommendations that all closely resemble a single seed video will therefore not accurately reflect the user’s overall tastes.
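A rough sketch of the ranking and diversification steps might look like the following. Everything here is an assumption made for illustration: the `Candidate` fields, the equal weights, the use of a weighted sum to blend the signals, and the use of a video’s channel as a stand-in for "too similar". It is meant only to show the shape of the idea, not how YouTube actually weighs its signals.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    channel: str       # hypothetical proxy for "similar content"
    quality: float     # e.g. derived from views, ratings, shares (0 to 1)
    user_match: float  # how well the video fits this user's history (0 to 1)


def rank(candidates, quality_weight=0.5, user_weight=0.5):
    """Order candidates by a weighted blend of quality and user-specific signals."""
    return sorted(
        candidates,
        key=lambda c: quality_weight * c.quality + user_weight * c.user_match,
        reverse=True,
    )


def diversify(ranked, max_per_channel=1, limit=3):
    """Walk down the ranked list, skipping videos from over-represented channels."""
    picked = []
    per_channel = {}
    for c in ranked:
        if per_channel.get(c.channel, 0) >= max_per_channel:
            continue  # too similar to something already picked
        picked.append(c)
        per_channel[c.channel] = per_channel.get(c.channel, 0) + 1
        if len(picked) == limit:
            break
    return picked


# Hypothetical candidates with made-up scores.
pool = [
    Candidate("A", "cooking", 0.9, 0.8),
    Candidate("B", "cooking", 0.8, 0.8),
    Candidate("C", "music", 0.6, 0.7),
    Candidate("D", "sports", 0.5, 0.4),
    Candidate("E", "cooking", 0.7, 0.7),
]
print([c.video_id for c in diversify(rank(pool))])  # ['A', 'C', 'D']
```

Without the diversification step, the top three would all be cooking videos (A, B, and E). With it, lower-scoring but more varied videos take the second and third slots.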
The YouTube recommender system has measurably improved user engagement. At the time of the paper’s publication, recommended videos accounted for approximately 60% of video clicks on the homepage. Furthermore, over a 21-day period, the click-through rate (CTR) for recommended videos was 207% of the average CTR for Most Viewed videos, or roughly twice as high.
Room for Growth
While the YouTube recommender system has performed well, it seems there is room for improvement. Indeed, other recommender systems, such as the Trouvus engine, have achieved greater results than those documented in the YouTube paper. Furthermore, it should be noted that content providers that host their content on YouTube will not necessarily have their own content recommended to users by the YouTube system. It makes sense, therefore, for such content providers, assuming they have their own digital properties on which they can place their videos, to look into acquiring their own recommender system.