🌟 [Post of The Day] How Facebook encodes your videos - Facebook Engineering
The sheer volume of video content on Facebook means it has to find ways to encode that video efficiently, without consuming a ton of computing power and resources.
While more advanced codecs like VP9 provide better compression performance over older codecs, like H264, they also consume more computing power.
There needs to be a way to prioritize which videos should be encoded using more advanced codecs.
Facebook deals with the high demand for encoding high-quality video content by combining a benefit-cost model with a machine learning (ML) model, which together prioritize advanced encoding for highly watched videos.
How we used to encode video on Facebook
Facebook has a specialized encoding compute pool and dispatcher. It accepts encoding job requests that have a priority value attached.
The priority value is derived from a number of factors, including whether a video is a licensed music video, whether the video is for a product, and how many friends or followers the video's owner has.
As new video codecs became available, the set of rules that needed to be maintained and tweaked kept expanding, since each codec comes with different computing requirements, visual quality, and compression performance trade-offs.
The challenge is to support content creators of all sizes, not just those with the largest audiences, while also acknowledging the reality that having a large audience also likely means more views and longer watch times.
Enter the Benefit-Cost model
What’s changed, however, is how we calculate the priority of encoding jobs after a video is published.
The Benefit-Cost model grew out of a few fundamental observations:
A video consumes computing resources only the first time it is encoded.
A relatively small percentage (roughly one-third) of all videos on Facebook generate the majority of overall watch time.
Facebook’s data centers have limited amounts of energy to power compute resources.
We get the most bang for our buck, so to speak, in terms of maximizing everyone's video experience within the available power constraints, by applying more compute-intensive "recipes" and advanced codecs to videos that are watched the most.
Definitions for benefit, cost, and priority:
Benefit = (relative compression efficiency of the encoding family at fixed quality) * (effective predicted watch time)
Cost = normalized compute cost of the missing encodings in the family
Priority = Benefit / Cost
Relative compression efficiency of the encoding family at fixed quality:
“Encoding family” refers to the set of encoding files that can be delivered together.
For example, one family might contain H264 360p, 480p, 720p, and 1080p lanes, while another contains VP9 360p, 480p, 720p, and 1080p lanes.
Minutes of Video at High Quality per GB datapack (MVHQ): Given 1 GB of data, how many minutes of high-quality video can we stream?
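The MVHQ metric can be sketched as a simple conversion; the formula below is an illustration of the metric's intent (minutes streamable from 1 GB at a family's high-quality bitrate), not Facebook's exact definition, and the bitrate figures are hypothetical.

```python
def mvhq(avg_bitrate_mbps):
    """Minutes of Video at High Quality per GB of data, given the average
    bitrate (in Mbit/s) an encoding family needs to deliver high quality."""
    megabits_per_gb = 8_000                      # 1 GB (decimal) = 8,000 megabits
    seconds_streamable = megabits_per_gb / avg_bitrate_mbps
    return seconds_streamable / 60

# A family that holds high quality at 1.5 Mbps streams more minutes per GB
# than one that needs 2 Mbps, so it scores a higher relative efficiency.
print(round(mvhq(2.0), 1))   # 66.7
print(round(mvhq(1.5), 1))   # 88.9
```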
Effective predicted watch time:
a sophisticated ML model that predicts how long a video is going to be watched in the near future across all of its audience.
Watch time is also adjusted for device capability: about 20 percent of video consumption happens on devices that cannot play videos encoded with VP9.
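As a minimal sketch of that adjustment (the function name and numbers are hypothetical): only the share of watch time that a codec can actually reach counts toward its benefit.

```python
def effective_watch_time(predicted_minutes, deliverable_fraction):
    """Discount predicted watch time by the fraction of consumption that
    happens on devices able to play this encoding family."""
    return predicted_minutes * deliverable_fraction

# If ~20% of consumption can't play VP9, a VP9 family only gets credit
# for the remaining 80% of the predicted watch time.
vp9_effective = effective_watch_time(10_000, 0.8)   # 8000.0 minutes
```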
Normalized compute cost of the missing encodings in the family: the amount of logical computing cycles we need to make the encoding family deliverable
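Putting the three definitions together, the prioritization can be sketched as below; all numbers and parameter names are illustrative, not Facebook's actual values.

```python
def priority(compression_efficiency, predicted_watch_time, compute_cost):
    """Priority = Benefit / Cost, where Benefit is the family's relative
    compression efficiency at fixed quality times its effective predicted
    watch time, and Cost is the normalized compute cost of the missing
    encodings in the family."""
    benefit = compression_efficiency * predicted_watch_time
    return benefit / compute_cost

# A heavily watched video justifies an expensive advanced encode...
hot = priority(compression_efficiency=1.3, predicted_watch_time=50_000, compute_cost=8.0)
# ...while a rarely watched one falls to the back of the queue.
cold = priority(compression_efficiency=1.3, predicted_watch_time=40, compute_cost=8.0)
assert hot > cold
```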
Predicting watch time with ML
several built-in challenges:
Watch time has high variance and follows a heavily skewed, long-tailed distribution.
The best indicator of next-hour watch time is its previous watch time trajectory.
This creates two technical challenges:
Newly uploaded videos don’t have a watch time trajectory.
Popular videos have a tendency to dominate training data.
Watch time patterns vary by video type.
Improvements in ML metrics do not necessarily correlate directly to product improvements.
Building the ML model for video encoding
To solve these challenges, we decided to train our model using watch time event data, and to build two models: one for handling upload-time requests and the other for view-time requests.
The view-time model uses the three sets of features mentioned above.
The upload-time model looks at the performance of other videos a content creator has uploaded and substitutes this for past watch time trajectories.
Once a video is on Facebook long enough to have some past trajectories available, we switch it to use the view-time model.
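The routing between the two models can be sketched as follows; the function names, feature names, and the toy stand-in models are hypothetical, not Facebook's implementation.

```python
def predict_watch_time(video, upload_model, view_model):
    """Use the view-time model once a past watch-time trajectory exists;
    otherwise fall back to the upload-time model."""
    if video.get("watch_time_trajectory"):
        return view_model(video)
    return upload_model(video)

# Toy stand-ins for the two trained models: the upload-time model
# substitutes the creator's historical performance for a trajectory.
def upload_model(video):
    return 10.0 * video.get("creator_avg_watch_time", 1.0)

def view_model(video):
    traj = video["watch_time_trajectory"]
    return sum(traj) / len(traj)

fresh = {"creator_avg_watch_time": 2.0}            # just uploaded
seasoned = {"watch_time_trajectory": [4.0, 6.0]}   # has a trajectory
```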
The impact of the new video encoding model
Doing this has shifted a large portion of watch time to advanced encodings, resulting in less buffering without requiring additional computing resources.
The model automatically assigns each job a priority that maximizes overall benefit throughput.
Overall, this makes it easier for us to continue to invest in newer and more advanced codecs to give people on Facebook the best-quality video experience.
high_derivative, commenting: this will not change much on an immediate basis since the nano-particle delivery system + its manufacturing chain are quite complicated and not easily scalable.
meepmorp, corroborating: Basically, even if you could make the necessary mRNA particles properly, it's useless without the delivery mechanism. Only a few companies even produce the necessary lipid products needed to deliver the vaccine, and they (and their suppliers) are already doing everything they can. The IP concerns had basically zero effect on the availability of vaccines.
Google introduced the Sitemap standard in 2005 to allow webmasters to eliminate the confusion by just providing a list of all their pages.
Most websites now provide sitemap files instead of relying on the general crawl.
Google gave up at some point trying to work out which of two similar pages is the original.
Instead there is now a piece of metadata which you add to let Google know which page is the "canonical" version.
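The canonical hint is plain metadata, readable without any ML. A minimal sketch using Python's stdlib HTML parser (the example page and URL are made up):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Pull the rel="canonical" href out of a page's <head>."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")

page = '<html><head><link rel="canonical" href="https://example.com/article"></head></html>'
finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)  # https://example.com/article
```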
Google also gave up trying to divine who the author of a page is. Now that Google+ has been abandoned, they instead read metadata from Facebook's OpenGraph specification.
For other data they parse JSON-LD metadata tags, "microformats" and probably much more.
How does Google deduce the product data for an item from the product description page?
The answer is that they simply don't - they require sellers to provide that information in a structured format, ready for them to consume.
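That structured format is typically a JSON-LD block using the schema.org Product vocabulary, which a crawler can consume directly with an ordinary JSON parser. A minimal sketch (the product and prices are made up):

```python
import json

# The content of a <script type="application/ld+json"> tag a seller
# embeds in the product page -- structured data, not prose to be mined.
jsonld = """
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Trail Running Shoe",
  "offers": {"@type": "Offer", "price": "79.99", "priceCurrency": "USD"}
}
"""

product = json.loads(jsonld)
print(product["name"], product["offers"]["price"])
```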
Google does, of course, still do text analysis.
But the breakthrough was not better natural language processing; it was a metadata trick: using backlinks as a proxy for notability.
It was a huge step forward,
but PageRank is not about understanding what is on the page.
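A toy version of PageRank makes the point: the ranking comes entirely from the link graph, never from page content. This is a simplified power iteration with the standard 0.85 damping factor, on a made-up three-page graph.

```python
def pagerank(links, iters=50, d=0.85):
    """links: {page: [pages it links to]}. Returns a rank per page."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page keeps a (1 - d) baseline; the rest flows along links.
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
ranks = pagerank(graph)
# "b" has the most incoming links, so it ranks highest --
# without the algorithm reading a single word of any page.
```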
So many searches are now resolved by the "sidebar" and "zero click results" that traffic to Wikipedia has materially fallen.
Perhaps the best measure of this problem is how often I have to append the search terms "reddit" or "site:reddit.com" to a query.
The accumulated knowledge of human civilisation is still mostly in books.
Humanity wrote books for thousands of years and has only written web pages for a few decades.
When you search, you are really just searching the sum total of things that people have put, and managed to keep, on the web since about 1995.
Metadata tends to displace Artificial Intelligence
Manually attached metadata trumps machine learning in many fields once they mature
When your elected government snoops on you, they famously prefer the metadata (who you emailed, phoned, or chatted with) to the content of the messages themselves.
Self-driving cars use the current GPS co-ordinates to look up manually entered data on speed limits.
Another example is using neural networks to detect fraudulent credit card transactions.
The neural nets worked very well, but not well enough to avoid being a nuisance.
American Express now uses the combination of a cardholder-provided whitelist of merchants and text message codes.
A general pattern seems to be that Artificial Intelligence is used when first doing some new thing.
Then, once the value of doing that thing is established, society will find a way to provide the necessary data in a machine readable format, obviating (and improving on) the AI models.
The virtues of metadata
It's open and there for anyone to read.
Having to plead for access to or pay for metadata usually ends up empowering monopolies or creating needless data middlemen
The vices of the AI myth
There is a Google mythos that they have some godlike power to algorithmically understand web pages, that metadata is somehow ancillary, and that search engines will work it all out on their own.
This discourages webmasters from bothering with the basic things that will help people discover their pages.
But "machine readable" strictly dominates machine learning.
An ounce of markup saves a pound of tensorflow.
Larry Page and Sergey Brin were originally pretty negative about search engines that sold ads.
Appendix A in their original paper says:
we expect that advertising-funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers and that
we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm