Monthly Archives: November 2015

The Likeability index: a method for classifying likeability of YouTube videos

I have had a YouTube channel for about 5 years.  I have a few hundred subscribers, and a few dozen of them regularly watch my videos and comment.  I think I have had around 30,000 views a year.  All in all, my contribution is a mere speck on the sandy beaches of YouTube content.

What makes a YouTube video popular?

Partly for narcissistic reasons, and partly out of curiosity, I want to know what it is that people like and don’t like on YouTube.  I find it fascinating that a video like “Charlie bit my finger” gets thumbs down, for example.  What could possibly motivate someone to dislike this video?  Are they just internet trolls?  Are they child haters?  Did they accidentally click the wrong thumb sign?


The problem is that while views, likes and dislikes are all important in determining what people think about a YouTube video, it is hard to interpret them together to get an overall sense of likeability.  What is needed is an index that combines this information into a single, sensible metric. Sounds like a job for someone with some free time on a Sunday afternoon!

A likeability index!

To start, I had to sample videos.  I know of no way to select a true random sample of YouTube videos, so instead I created a list of random words (found here) that I then used as search terms on YouTube, and then picked the video at the top of the results page associated with each search term. For each of these videos (N=125) I collected date of upload, number of visits, number of likes and number of dislikes.  Four of these videos had no information on likes and dislikes, so I dropped them.  This left me with 121 videos.

Then I came up with an index for ranking the likeability as follows.  First, I calculated the total views, likes and dislikes per day based on the date I extracted the data and the date the video was uploaded.  I refer to these as TPD, LPD and DPD, respectively.  Then I calculated two ratios, LPD/TPD and DPD/TPD,


with the following reasoning.  I assume that if a video is ‘perfectly’ liked, then every visit would correspond with a like, and TPD would equal LPD.  Similarly, if a video is ‘perfectly’ disliked, then every visit would correspond with a dislike (TPD would equal DPD).  To combine these two metrics into a single index I could just subtract one from the other, but this would suggest that likes and dislikes are of equal importance when judging the relative likeability of videos.  As it turns out, the ratio of likes to dislikes on the sample of videos I took is about 20 to 1.  I take this to mean that dislikes matter considerably less than likes when it comes to the judgement of a YouTube video’s overall likeability.  So I combine these two ratios into a single index


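that can be easily modified with a different constant if so desired.  It’s worth noting (I realized this after the original post) that the formula can be simplified algebraically to remove the day term: views, likes and dislikes are all divided by the same number of days, so the days cancel and the ratios can be computed directly from the raw totals.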
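As a minimal sketch in Python, the calculation might look like the following. The exact form of the index is an assumption here: likes per view minus dislikes per view scaled by a weight of 1/20, taken from the roughly 20 to 1 ratio of likes to dislikes described above. The function names and the example counts are hypothetical. The second function computes the same thing from per-day rates, to show that the day term cancels.

```python
def likeability_index(views, likes, dislikes, dislike_weight=1 / 20):
    """Likes per view minus down-weighted dislikes per view.

    dislike_weight is an assumption (1/20, from the roughly 20:1 ratio of
    likes to dislikes in the sample); pass another value to change it.
    """
    if views <= 0:
        raise ValueError("views must be positive")
    return likes / views - dislike_weight * dislikes / views


def likeability_index_per_day(views, likes, dislikes, days, dislike_weight=1 / 20):
    """The same index computed from the per-day rates TPD, LPD and DPD."""
    tpd, lpd, dpd = views / days, likes / days, dislikes / days
    return lpd / tpd - dislike_weight * dpd / tpd


# Hypothetical counts, for illustration only.
simple = likeability_index(16000, 40, 8)
per_day = likeability_index_per_day(16000, 40, 8, days=1500)
print(simple, per_day)
```

Both calls give the same value up to floating-point rounding, which is the simplification in code form: the number of days since upload does not affect the index.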

There are important shortcomings worth noting.  This likeability index does not measure popularity, which is probably more important from a marketing standpoint.  Furthermore, it is probably not constant over time; when a video is posted, subscribers to the channel will likely give it a short-term boost in likeability index, since they tend to have a more positive disposition towards the channel than the average viewer.  This figure shows the relationship between the likeability index and the number of days since upload in my sample of videos from YouTube.


As you can see, the longer a video is on YouTube, the lower its likeability rating.  I suspect this has to do with the flow and ebb of popular culture; interest in a video starts out with enthusiasm, which gradually gives way to apathy over time.  So to make a proper comparison of likeability, one should probably compare videos uploaded at a similar time–say, the same year.
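One way to make that same-year comparison is to group the sampled videos by upload year and take the median index within each year. The sketch below assumes each video is stored as an (upload year, index) pair; the function name and the values are made up for illustration and are not the sample described above.

```python
from collections import defaultdict
from statistics import median


def median_index_by_year(videos):
    """Group (upload_year, index) pairs by year and return the median per year."""
    by_year = defaultdict(list)
    for year, index in videos:
        by_year[year].append(index)
    return {year: median(values) for year, values in sorted(by_year.items())}


# Made-up values for illustration; not the sample described above.
sample = [(2011, 0.002), (2011, 0.004), (2011, 0.003), (2015, 0.006), (2015, 0.010)]
medians = median_index_by_year(sample)
print(medians)
```

A video's index can then be compared against the median for its own upload year rather than against the median of the whole sample.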

Here is a graph of my most popular video (about 16,000 views), with the likeability index on the Y-axis, and date on the x-axis.

[Figure: YouTube likeability index]

As you can see, the likeability index for this video stabilizes at around 0.00225, which I would interpret as the ‘natural’ or true likeability of that video.  The early variations reflect clusters of likes I probably got from subscribers in the past.  How does this compare with the likeability of videos in my sample?  Well, the median likeability score in my sample is about 0.005, which means that my video seems to have a poor likeability score.  However, the median likeability index for the year my video was uploaded is 0.003, which puts me in more middling territory.  Thank goodness!

My conclusions

There is no obvious benchmark for evaluating this likeability index; however, I think it is a nice tool for systematically assessing the likeability of a video on YouTube.


Provinces 2.0

Canadian provinces have economic, historical and cultural characteristics that make them distinct from one another, but it is pretty natural to ask whether or not the administrative boundaries of provinces match the economic and social landscape of the country.  Do the provincial boundaries divide the country into regions of shared interests, or do they split up more natural groupings of people into artificial geographic areas?

One way to explore this question is by mapping out census data on things like employment, language, population density, etc., and then visually comparing how these attributes are distributed across the country to determine whether or not they align with existing provincial boundaries.  Here is a map of census divisions portraying the distribution of median income based on the 2006 Canadian census:


From this map we can observe where median income is higher (dark blue) or lower (light blue), and the apparent regional patterns–for example, lower in eastern Canada, and higher in southern Ontario and western Canada.  It is clear from the map, however, that the provinces are themselves internally heterogeneous; northern Saskatchewan and Manitoba have lower median income than the southern urban regions, for example. Perhaps northern regions could be sensibly ‘grouped’ into their own province if we thought median income was an important defining attribute of provincial identity, or if it empowered traditionally underrepresented groups in a way that traditional provincial boundaries did not.

An alternative approach is to use a computer algorithm to re-draw the provincial boundaries based on the attributes of interest.  Here I use a political districting algorithm developed in my lab to redraw the provincial boundaries in a way that makes them as internally similar as possible with respect to median income.  I also ensured that the provinces are roughly equal in population size (between 2.5 and 5 million in each).  The result is a map of the country with new provincial boundaries (‘Provinces 2.0’).
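The districting algorithm itself is not described here, so the following is not that algorithm. It is only a sketch of how one candidate set of boundaries could be scored against the two criteria mentioned: internal similarity in median income and population bounds. The function name, the data layout and the example numbers are all hypothetical, and the sketch ignores geographic contiguity, which a real districting method would need to enforce.

```python
from collections import defaultdict


def partition_score(divisions, assignment, min_pop, max_pop):
    """Score a candidate grouping of census divisions into provinces.

    divisions:  {division_id: (population, median_income)}
    assignment: {division_id: province_label}
    Returns (within_ss, feasible): the population-weighted within-province
    sum of squared income deviations (lower means more homogeneous), and
    whether every province falls inside the population bounds.
    """
    members = defaultdict(list)
    for div_id, province in assignment.items():
        members[province].append(divisions[div_id])

    within_ss = 0.0
    feasible = True
    for rows in members.values():
        total_pop = sum(pop for pop, _ in rows)
        mean_income = sum(pop * inc for pop, inc in rows) / total_pop
        within_ss += sum(pop * (inc - mean_income) ** 2 for pop, inc in rows)
        if not (min_pop <= total_pop <= max_pop):
            feasible = False
    return within_ss, feasible


# Hypothetical divisions: (population, median income).
divisions = {
    "a": (2_000_000, 30_000),
    "b": (1_000_000, 30_000),
    "c": (3_000_000, 50_000),
    "d": (1_000_000, 50_000),
}
by_income = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
mixed = {"a": "X", "c": "X", "b": "Y", "d": "Y"}

homogeneous = partition_score(divisions, by_income, 2_500_000, 5_000_000)
heterogeneous = partition_score(divisions, mixed, 2_500_000, 5_000_000)
print(homogeneous, heterogeneous)
```

A search procedure would try many assignments and keep the feasible one with the lowest score. In this toy example, grouping by income scores zero and satisfies the bounds, while the mixed grouping scores higher and leaves one province under the 2.5 million minimum.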


Here’s one of just southern Ontario:


In practice, defining provincial boundaries based on similarities of income makes little sense.  Indeed, it may make more economic sense to create provinces that are economically heterogeneous so that wealthier regions can contribute provincial tax revenue to poorer regions.  Furthermore, these alternative boundaries can be based on different attributes (e.g., language, ethnic background, unemployment, economic growth, housing stock, voting behaviour) or some combination of attributes that prove important.  It could also be interesting to observe how these maps may change over time.

Expect some more experiments with these maps in the near future…