John Foliot on the new video formats

W3C Invited Expert John Foliot explains the difference between TTML and WebVTT, and which one you should be using

This article first appeared in issue 224 of .net magazine – the world's best-selling magazine for web designers and developers.

John Foliot works at Stanford University, where he runs the Stanford Online Accessibility Program, and he is an Invited Expert to the W3C. Here he explains what you need to know about the two new competing formats for online video ...

.net: What is TTML?

JF: TTML (Timed Text Markup Language) is a W3C Standard finalised in 2010, but was many years in the making. An XML-based language, it contains a rich set of features and capability of expression. Significant contributions to TTML were made by representatives of Samsung, Microsoft, Apple, the BBC, WGBH/NCAM and RealNetworks. Flash-based players from companies such as Adobe and Longtail Video (the JW FLV Player) provide native support for TTML files today.

Most recently, the Society of Motion Picture and Television Engineers (SMPTE) created a superset of TTML called SMPTE Timed Text, which adds capabilities to address legacy metadata requirements from its constituents.

.net: And what about WebVTT?

JF: WebVTT (Web Video Text Tracks) is a time-stamp format intended for marking up external text track resources. It’s copyrighted by Apple Computer, Inc, Mozilla Foundation, and Opera Software ASA, although liberally licensed.

The principal author is Ian Hickson, and it’s heavily based on the SRT time-stamp format, which emerged from the fan-sub community (primarily in Europe) who were using it to create and share unauthorised subtitles to movie torrents and similar content.

Legend has it that the original SRT proposal emerged in a mailing-list email, and even today there’s no formal SRT specification available. The original SRT format is quite minimal and lacks any means of adding styling or sectioning, which WebVTT addresses.
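To make the SRT lineage concrete, here is a minimal sketch of turning a plain SRT file into a WebVTT one. It assumes the simplest case only: the input uses no SRT-specific markup, so the job reduces to adding the "WEBVTT" header and swapping the comma before the milliseconds for a full stop. The function name srt_to_webvtt and the sample captions are illustrative and do not come from either format's documentation.

```python
import re

# SRT writes timestamps as HH:MM:SS,mmm; WebVTT writes them as HH:MM:SS.mmm.
_SRT_TIMESTAMP = re.compile(r"(\d{2,}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert plain SRT text into a minimal WebVTT document (hypothetical helper)."""
    body = srt_text.replace("\r\n", "\n").strip()
    converted = []
    for line in body.split("\n"):
        # Only rewrite timing lines, so commas in the caption text are left alone.
        if "-->" in line:
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        converted.append(line)
    # A WebVTT file starts with the "WEBVTT" signature followed by a blank line.
    # The numeric SRT counters are kept; WebVTT treats them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


sample_srt = """1
00:00:01,000 --> 00:00:04,000
Hello, and welcome.

2
00:00:04,500 --> 00:00:06,250
Let's talk about captions.
"""

webvtt = srt_to_webvtt(sample_srt)
print(webvtt)
```

Real-world SRT files often carry informal extras such as font tags, which this sketch passes through untouched.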

.net: So what’s the difference between the two formats?

JF: Outside of syntax, both formats provide essentially the same type of information to a parsing engine in terms of timing and style/output considerations. The difference is primarily in origin: one was created by many of the commercial players interested in web video, while the other emerged “from the streets” in an ad hoc fashion.
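As a rough illustration of that point, the sketch below reads one caption from a TTML fragment and the same caption from a WebVTT fragment, and ends up with identical (start, end, text) data. Both samples and both helper functions are made up for this example, and the parsing is deliberately naive: it ignores styling, regions and most of what either specification allows.

```python
import xml.etree.ElementTree as ET

TTML_NS = "{http://www.w3.org/ns/ttml}"  # the TTML namespace

ttml_sample = """
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:04.000">Hello, and welcome.</p>
    </div>
  </body>
</tt>
"""

webvtt_sample = """WEBVTT

00:00:01.000 --> 00:00:04.000
Hello, and welcome.
"""


def cues_from_ttml(text):
    """Collect (start, end, text) from each <p> element (hypothetical helper)."""
    root = ET.fromstring(text.strip())
    return [
        (p.get("begin"), p.get("end"), "".join(p.itertext()).strip())
        for p in root.iter(TTML_NS + "p")
    ]


def cues_from_webvtt(text):
    """Collect (start, end, text) from each cue block (hypothetical helper)."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if "-->" in line:
                start, rest = line.split("-->", 1)
                end = rest.split()[0]  # drop any cue settings after the end time
                cues.append((start.strip(), end, "\n".join(lines[i + 1:])))
                break
    return cues


ttml_cues = cues_from_ttml(ttml_sample)
webvtt_cues = cues_from_webvtt(webvtt_sample)
print(ttml_cues == webvtt_cues)
```

The XML version needs a namespace-aware parser while the WebVTT version gets by with string splitting, which hints at why browser engineers see the two formats so differently.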

Many of the browser engineers are unhappy with TTML, however, because of its XML base (which adds a bit of complexity), its overall size and some requirements that don’t apply to web browsers today. On the flipside, there’s concern over the way that WebVTT has emerged, some (minor) holes in the current spec and the fact that it’s currently not covered by the W3C patent policy.

Acknowledging this, a W3C community group is working toward bringing WebVTT into the W3C. The aim is to satisfy the concerns of commercial interests, to review possible legislative requirements emerging in various countries (such as work from the FCC in the US), and to provide a consensus-based review of the spec to ensure that it meets all requirements from a technical and policy perspective.

.net: And so which one should I be using?

JF: That’s a tricky question, and the answer is not cut-and-dried. Until WebVTT becomes a W3C Recommendation, many large commercial content providers (who, remember, have already invested in TTML/SMPTE-TT and its entire production tool-chain) will remain hesitant to use WebVTT.

Also, if you’re stuck with supporting legacy browsers for some time, then you already face falling back to a Flash-based player for those users, and many of those players already support TTML/DFXP files. There are also a few HTML5 polyfills that provide TTML support to the <video> element.

On the other hand, the majority of the browsers have signalled that they’re very reluctant to support TTML natively, and have so far focused their efforts on WebVTT support. While none have categorically stated that they won’t support TTML, they haven’t yet begun to look at supporting it. Today, then, you need to evaluate a number of factors and make your choice based on them. No matter which format you choose, however, you’ll need to assist the browser in providing user-agent support at this time.

Whichever format ultimately wins, the real winners will be the millions of deaf and hard-of-hearing users who use the web daily. So regardless of which time-stamp format you choose, start captioning your videos today!

For more on video see The Future of HTML5 video and Silvia Pfeiffer on “a new type of web”
