The developer’s guide to the HTML5 APIs

Rich Clark, one of the HTML5 Doctors, gets under the hood of the APIs that form the bulk of the HTML5 spec and tells us about their purpose and progress.

Whilst we see, read and hear a lot about the new semantic elements in HTML5 we arguably hear far less about the application programming interfaces (APIs) that make up a large part of the specification itself.

As I'm sure you're aware that there are two versions of the HTML5 specification, one published by the W3C and another by the WHATWG. The living HTML specification maintained by the WHATWG contains additional APIs to those in the W3C HTML5 spec (although generally they are also maintained by the W3C but in separate specifications).

Alongside those in the specification are a number of related APIs that form part of the standards stack and are often grouped under the "HTML5" umbrella term. In some cases the APIs have been around and implemented for a while, but they've never been documented; something which HTML5 has set out to change.

In this article we're not going to look at code but instead we'll focus on describing the APIs, their purpose and progress. We'll then point you in the right direction to find out more.

We'll start by looking at the APIs in the W3C HTML5 spec.

01. Media API

The media API is part of the media element which includes two of HTML5's poster children, the video and audio elements. The elements themselves are simple to implement but what's less well known are the JavaScript methods available within the associated API. There are a number of methods including play() and pause() as well as load() and canPlayType(). Many of the methods are shared between both media types with a subset of additional properties (eg poster) specifically related to the video element. Combined with additional events and attributes the API allows us to, amongst other things, create custom controls.

To find out more, take a look at the following articles.

02. Text Track API

The text track API leads on nicely from the media API. It is designed to allow us to interact with text tracks (subtitles or captions for example) for the audio and video elements.

You can return the number of text tracks and their length associated with a media element, the kind of text track (subtitles, captions, descriptions, chapters and metadata), language, readyState, mode and label.

This API will have far more support when browsers begin to implement native subtitling, using WebVTT for example. In the meantime, to get up to speed, look at these resources:

03. Drag and Drop

The drag and drop API has been the topic of much debate. Originally created by Microsoft in version 5 of Internet Explorer, it is now supported by Firefox, Safari and Chrome. So what does it do?

Well, as the name suggests, it brings native drag and drop support to the browser. By adding a draggable attribute set to true, the user has the ability to move any element. You then add some event handlers on a target drop zone to tell the browser where the element can be dropped.

The API's real muscles are flexed when you start to think outside of the browser. Using drag and drop, a user could drag an image from the desktop into the browser or you could create an icon that gets loaded with content when dragged out of the browser by the user to a new application target.

Drag and Drop is covered in depth in the below articles.

04. Offline Web Applications/Application Cache

With the blurring of native apps (mobile and desktop) and web apps comes the inevitable task of wanting to take our applications offline. The Offline Web Applications specification details how to do just that using application caching.

Application caching is carried out by creating a simple manifest file which lists the files that are required for the application to work offline. Authors can then ensure their sites function offline. The manifest causes the user’s browser to keep a copy of the files for use offline later. When a user views the document/application without network access, the browser switches to use the local copies instead. So in theory, you should be able to finish writing that important email or playing the web version of Angry Birds while you're on the underground/subway.

With relatively strong browser support, particularly in the mobile arena (Firefox, Safari, Chrome, Opera, iPhone, and Android), it's something you can start using right now. For further reading, I suggest:

05. User Interaction

Like offline, user interaction is part of the primary HTML5 specification. It's worth mentioning here because some of its features, such as the contenteditable attribute, are extremely useful when you're creating web applications. contenteditable has been around in internet Explorer since version 5.5 and works in all five major browsers. Setting the attribute to true indicates that the element is editable. Authors could then, for example, combine this with local storage to track changes to documents.

For more information, take a look at the current spec but note that some sections have been moved to the HTML Editing APIs work in progress.

06. History

A browser's back button is the most heavily used piece of its chrome. Ajax-y web applications break it at their peril. Using HTML5's History API, developers have a lot more control over the history state of a user's browser session.

The pre-HTML5 History API allowed us to send users forward or back, and check the length of the history. What HTML5 brings to the party are ways to add and remove entries in the user's history, hold data to restore a page state and update the URL without refreshing the page. The scripting is fairly straightforward and will help us build complex applications that don't refresh the page from which we can continue to share URLs as we've always done.

For more detail on the History API:

07. MIME type and protocol handler registration

This API allows sites to register themselves as handlers for certain schemes. By using the registerProtocolHandler method, an example use case could be:

an online telephone messaging service could register itself as a handler of the sms: scheme, so that if the user clicks on such a link, he is given the opportunity to use that Web site (W3C HTML Spec)

Certain schemes are whitelisted such as sms, tel and irc. In addition there is a registerContentHandler method that allows sites to register as handlers for content with a certain mime type.

The spec is the best place to get started when learning about MIME type and protocol handler registration.

08. APIs in the WHATWG specification

So far we've looked at specs that exist in both the W3C and WHATWG versions of HTML5. We'll now very briefly introduce a few more APIs that are documented within WHATWG's living standard HTML spec but have been broken out into smaller, more manageable specifications by the W3C. The purpose and the majority of the content is the same in both versions.

09. The "HTML5" buzzword APIs

If I were to list out all the other APIs that are closely related to HTML5, I'd be here for a while. Another time perhaps. A few of those often incorrectly described as HTML5 are Geolocation, Indexed DB, Selectors, and the Filesystem API.

Mike Smith from the W3C has compiled a comprehensive list of all aspects of the web platform and browser technologies which is well worth bookmarking.

10. Demos and browser Support

We've hinted at browser support throughout this article but as support is constantly evolving the best place to keep up-to-date (besides testing of course!) is caniuse.

If you find that something isn't yet supported by browsers, don't despair. It's likely that there's a polyfill to help you mimic the native behaviour. Start your search with this list of HTML5 Cross Browser Polyfills.

For demos from which you can view source for most of the APIs we've discussed, look no further than Remy Sharp's HTML5 Demos.


We've merely scratched the surface of each of these detailed, useful, powerful APIs. In order to find out more and get under the skin of each, go and throw yourself knee deep in code. You'll be surprised at what you'll find while researching and experimenting. As for those APIs that aren't quite fully baked yet, hopefully the article has whetted your appetite for what will be coming to a browser near you soon.

Thanks to Oli Studholme, Remy Sharp, Mike Smith and Ian Devlin for their feedback and input to this article.

Rich Clark is head of interactive at KMP Digitata, a digital agency based in Manchester. He's the founder and curator of HTML5 Gallery and editor and author of HTML5 Doctor.

Liked this? Read these!