Whilst we see, read and hear a lot about the new semantic elements in HTML5, we arguably hear far less about the application programming interfaces (APIs) that make up a large part of the specification itself.
As I'm sure you're aware, there are two versions of the HTML5 specification: one published by the W3C and another by the WHATWG. The living HTML specification maintained by the WHATWG contains additional APIs beyond those in the W3C HTML5 spec (although these are generally also maintained by the W3C, in separate specifications).
Alongside those in the specification are a number of related APIs that form part of the standards stack and are often grouped under the "HTML5" umbrella term. In some cases the APIs have been around and implemented for a while but have never been documented, something which HTML5 has set out to change.
In this article we'll focus on describing the APIs, their purpose and progress rather than on working through code in depth. We'll then point you in the right direction to find out more.
We'll start by looking at the APIs in the W3C HTML5 spec.
01. Media API
To find out more, take a look at the following articles.
- Media Elements, W3C
- Everything you need to know about HTML5 video and audio, dev.opera, Simon Pieters
- HTML5 audio and video: what you must know, NetTuts (a chapter from Introducing HTML5), Bruce Lawson and Remy Sharp
02. Text Track API
The text track API leads on nicely from the media API. It is designed to allow us to interact with text tracks (subtitles or captions for example) for the audio and video elements.
You can retrieve the number of text tracks associated with a media element and, for each track, its kind (subtitles, captions, descriptions, chapters or metadata), language, readyState, mode and label.
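As a rough illustration of reading those properties, here is a minimal TypeScript sketch. The `TrackLike` and `TrackListLike` interfaces and both function names are our own stand-ins (assumptions for this example) that mirror only a subset of the browser's `TextTrack` and `TextTrackList` objects. `readyState` is left out, and the string values for `mode` are those used by current browsers.

```typescript
// Structural stand-ins for the browser's TextTrack and TextTrackList,
// limited to the properties this sketch reads or writes.
interface TrackLike {
  kind: string;     // "subtitles", "captions", "descriptions", "chapters" or "metadata"
  label: string;
  language: string;
  mode: string;     // "disabled", "hidden" or "showing" in current browsers
}

interface TrackListLike {
  length: number;
  [index: number]: TrackLike;
}

// Build one human-readable line per track.
function describeTracks(tracks: TrackListLike): string[] {
  const lines: string[] = [];
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (!track) continue;
    lines.push(i + ": " + track.kind + " [" + track.language + "] " + track.label + " (" + track.mode + ")");
  }
  return lines;
}

// Show the first track in the requested language and disable the others.
// Returns false if no track matched.
function showLanguage(tracks: TrackListLike, language: string): boolean {
  let shown = false;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (!track) continue;
    if (!shown && track.language === language) {
      track.mode = "showing";
      shown = true;
    } else {
      track.mode = "disabled";
    }
  }
  return shown;
}
```

In a browser that implements the API, the list to pass in would be the `textTracks` property of a video or audio element.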
This API will have far more support when browsers begin to implement native subtitling, using WebVTT for example. In the meantime, to get up to speed, look at these resources:
- Text Track API, W3C
- Web Media Text Tracks Community Group
- Media Multiple Text Tracks API Wiki, W3C
- The YouTube Caption API, Speech Recognition, and WebVTT Captions for HTML5, Google I/O 2011, Naomi Black, Cynthia Boedihardjo, and Jeffrey Posnick
- Captionator.js Polyfill
- WebVTT and video subtitles, Ian Devlin
- Video subtitling and WebVTT, HTML5 Doctor, Tom Leadbetter
03. Drag and Drop
The drag and drop API has been the topic of much debate. Originally created by Microsoft in version 5 of Internet Explorer, it is now supported by Firefox, Safari and Chrome. So what does it do?
Well, as the name suggests, it brings native drag and drop support to the browser. Setting the draggable attribute to true on an element lets the user move it. You then add event handlers on a target drop zone to tell the browser where the element can be dropped.
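A minimal sketch of those event handlers might look like the following. The `DragEventLike` and `DataTransferLike` interfaces and the handler names are assumptions made for this example, covering only the parts of the real `DragEvent` that the sketch touches. The element ids in the wiring comment are hypothetical.

```typescript
// Only the parts of DragEvent and DataTransfer that this sketch uses.
interface DataTransferLike {
  setData(format: string, data: string): void;
  getData(format: string): string;
}

interface DragEventLike {
  preventDefault(): void;
  dataTransfer: DataTransferLike | null;
}

// dragstart: record which element is being dragged.
function makeDragStartHandler(elementId: string): (event: DragEventLike) => void {
  return (event) => {
    if (event.dataTransfer) {
      event.dataTransfer.setData("text/plain", elementId);
    }
  };
}

// dragover: cancelling the default action marks the target as a valid drop zone.
function allowDrop(event: DragEventLike): void {
  event.preventDefault();
}

// drop: read the id back and hand it to the caller.
function makeDropHandler(onDropped: (elementId: string) => void): (event: DragEventLike) => void {
  return (event) => {
    event.preventDefault();
    const id = event.dataTransfer ? event.dataTransfer.getData("text/plain") : "";
    if (id) onDropped(id);
  };
}

// Browser wiring (not executed here), assuming <div id="card" draggable="true">
// and <div id="zone"> exist in the page:
//   card.addEventListener("dragstart", makeDragStartHandler("card"));
//   zone.addEventListener("dragover", allowDrop);
//   zone.addEventListener("drop", makeDropHandler(function (id) { /* move the element */ }));
```

The important detail is the dragover handler: without cancelling that event, browsers treat the target as a place where nothing can be dropped.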
The API's real muscles are flexed when you start to think outside of the browser. Using drag and drop, a user could drag an image from the desktop into the browser or you could create an icon that gets loaded with content when dragged out of the browser by the user to a new application target.
Drag and drop is covered in depth in the articles below.
- Drag and drop API, W3C
- Native Drag and Drop, HTML5 Doctor, Remy Sharp
- Drag and Drop, MDN
- The drag and drop API, HTML5 Laboratory, Ian Devlin
04. Offline Web Applications/Application Cache
With the blurring of native apps (mobile and desktop) and web apps comes the inevitable task of wanting to take our applications offline. The Offline Web Applications specification details how to do just that using application caching.
Application caching is carried out by creating a simple manifest file which lists the files that are required for the application to work offline. Authors can then ensure their sites function offline. The manifest causes the user’s browser to keep a copy of the files for use offline later. When a user views the document/application without network access, the browser switches to use the local copies instead. So in theory, you should be able to finish writing that important email or playing the web version of Angry Birds while you're on the underground/subway.
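To make the manifest format concrete, here is a small TypeScript sketch that generates one. The `buildManifest` function and its options are hypothetical helpers invented for this example; only the output format (a first line reading CACHE MANIFEST, # comments, and CACHE: and NETWORK: sections) comes from the Application Cache specification. Application Cache has since been deprecated, so treat this as an illustration of the format rather than a recommendation.

```typescript
interface ManifestOptions {
  version: string;     // written as a comment; changing it alters the file and triggers a re-download
  cache: string[];     // files to keep for offline use
  network?: string[];  // resources that always need the network ("*" means everything else)
}

// Assemble the text of a cache manifest.
function buildManifest(options: ManifestOptions): string {
  let lines: string[] = ["CACHE MANIFEST", "# version " + options.version, "", "CACHE:"];
  lines = lines.concat(options.cache);
  const network = options.network || [];
  if (network.length > 0) {
    lines = lines.concat(["", "NETWORK:"], network);
  }
  return lines.join("\n") + "\n";
}
```

The resulting file would be served with the text/cache-manifest MIME type and referenced from the page through the manifest attribute on the html element.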
With relatively strong browser support, particularly in the mobile arena (Firefox, Safari, Chrome, Opera, iPhone, and Android), it's something you can start using right now. For further reading, I suggest:
- Offline Web Applications, W3C
- Let's take this offline, Dive into HTML5, Mark Pilgrim
- Running your web applications offline with HTML5 AppCache, dev.opera, Shwetank Dixit
- Go offline with application cache, HTML5 Doctor, Mike Robinson
- Offline Browsing in HTML5 with ApplicationCache, Sitepoint, Malcolm Sheridan
- Get off(line), Web Directions, John Allsopp
05. User Interaction
Like offline, user interaction is part of the primary HTML5 specification. It's worth mentioning here because some of its features, such as the contenteditable attribute, are extremely useful when you're creating web applications. contenteditable has been around in Internet Explorer since version 5.5 and works in all five major browsers. Setting the attribute to true indicates that the element is editable. Authors could then, for example, combine this with local storage to track changes to documents.
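The idea of pairing contenteditable with local storage could be sketched along these lines. The interfaces, function names and the "draft" key are assumptions for this example. The interfaces mirror only the `getItem` and `setItem` methods of the browser's storage object and the `innerHTML` property of an element.

```typescript
// Only the storage methods and element property this sketch needs.
interface DraftStorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface EditableLike {
  innerHTML: string;
}

// Persist the current contents of an editable element.
function saveDraft(element: EditableLike, store: DraftStorageLike, key: string): void {
  store.setItem(key, element.innerHTML);
}

// Put a previously saved draft back. Returns false when nothing was stored.
function restoreDraft(element: EditableLike, store: DraftStorageLike, key: string): boolean {
  const saved = store.getItem(key);
  if (saved === null) return false;
  element.innerHTML = saved;
  return true;
}

// Browser wiring (not executed here), assuming <div id="editor" contenteditable="true">:
//   restoreDraft(editor, localStorage, "draft");
//   editor.addEventListener("input", function () { saveDraft(editor, localStorage, "draft"); });
```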
06. History
A browser's back button is the most heavily used piece of its chrome. Ajax-y web applications break it at their peril. Using HTML5's History API, developers have a lot more control over the history state of a user's browser session.
The pre-HTML5 History API allowed us to send users forward or back, and check the length of the history. What HTML5 brings to the party are ways to add and replace entries in the user's history, hold data to restore a page state and update the URL without refreshing the page. The scripting is fairly straightforward and will help us build complex applications that don't refresh the page, while still letting us share URLs as we've always done.
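Here is a minimal sketch of that pattern, built on the real `pushState` method and the `state` property of the popstate event. The `ViewState` shape, the function names and the URL scheme (one path segment per view) are assumptions chosen for this example.

```typescript
// The data we attach to each history entry (an assumption for this sketch).
interface ViewState {
  view: string;
}

// Only the part of the browser's history object that this sketch calls.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string): void;
}

// Add a history entry for the new view, update the URL and draw the view,
// all without a page refresh.
function navigateTo(hist: HistoryLike, view: string, render: (view: string) => void): void {
  const state: ViewState = { view: view };
  hist.pushState(state, "", "/" + encodeURIComponent(view));
  render(view);
}

// popstate fires when the user presses back or forward. event.state holds the
// data passed to pushState, or null for an entry we did not create.
function handlePopState(
  event: { state: unknown },
  render: (view: string) => void,
  fallbackView: string
): void {
  const state = event.state as ViewState | null;
  render(state && typeof state.view === "string" ? state.view : fallbackView);
}

// Browser wiring (not executed here):
//   window.addEventListener("popstate", function (e) { handlePopState(e, draw, "home"); });
//   navigateTo(history, "inbox", draw);
```

Because every view gets its own URL, the server should also be able to respond to those URLs directly so that shared links keep working.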
For more detail on the History API:
- History API, W3C
- Manipulating History for Fun & Profit, Dive into HTML5, Mark Pilgrim
- Introducing the HTML5 History API, dev.opera, Mike Taylor & Chris Mills
- Manipulating the browser history, MDN
07. MIME type and protocol handler registration
This API allows sites to register themselves as handlers for certain schemes, using the registerProtocolHandler method. The spec gives an example use case:
an online telephone messaging service could register itself as a handler of the sms: scheme, so that if the user clicks on such a link, he is given the opportunity to use that Web site (W3C HTML Spec)
Certain schemes are whitelisted, such as sms, tel and irc. In addition, there is a registerContentHandler method that allows sites to register as handlers for content with a certain MIME type.
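As a hedged sketch of the sms: example, the helper below wraps a call to `registerProtocolHandler`. The `NavigatorLike` interface, the `registerSmsHandler` function and the example URL are assumptions for illustration. What comes from the spec is that the handler URL must contain a %s placeholder, which the browser replaces with the address of the link the user clicked.

```typescript
// Only the method this sketch calls. In a browser this would be navigator.
interface NavigatorLike {
  registerProtocolHandler(scheme: string, url: string): void;
}

// Register a site as a handler for sms: links. Returns false, without calling
// the browser, if the URL lacks the required %s placeholder.
function registerSmsHandler(nav: NavigatorLike, handlerUrl: string): boolean {
  if (handlerUrl.indexOf("%s") === -1) {
    return false;
  }
  nav.registerProtocolHandler("sms", handlerUrl);
  return true;
}

// Browser usage (not executed here), with a hypothetical URL on the site's own origin:
//   registerSmsHandler(navigator, "https://messages.example.com/compose?to=%s");
```

Browsers generally ask the user for permission before the registration takes effect.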
The spec is the best place to get started when learning about MIME type and protocol handler registration.
08. APIs in the WHATWG specification
So far we've looked at specs that exist in both the W3C and WHATWG versions of HTML5. We'll now very briefly introduce a few more APIs that are documented within WHATWG's living standard HTML spec but have been broken out into smaller, more manageable specifications by the W3C. The purpose and the majority of the content is the same in both versions.
- Canvas 2D Context: allows us to draw natively in the browser. Using canvas without the 2D Context API we wouldn't be able to draw. It's our brushes, palette and paint all rolled into one. The API is extensive, and pretty much all canvas articles introduce some of its methods and events, of which there are too many to detail here. See WHATWG Canvas Element, 2D Context and W3C HTML Canvas 2D Context Spec
- Cross document and channel messaging: cross document messaging defines a way for documents to communicate with one another regardless of their source domain without enabling cross-site attacks. In a similar vein, channel messaging lets independent pieces of code communicate directly. See WHATWG HTML, Cross document messaging, WHATWG HTML Cross channel messaging and W3C HTML5 Web Messaging spec
- Microdata: adds an additional layer of semantics to your documents, which search engines, browsers and more can use to extract information and provide an enhanced browsing experience. See WHATWG HTML, Microdata and W3C Microdata spec
- Web Storage: a spec for storing client-side data (key/value pairs), similar to cookies. See WHATWG HTML, Web Storage and W3C Web Storage spec
- Web Sockets: allows pages to use the WebSocket protocol to send two-way messages between a browser and server. See WHATWG Web Sockets and W3C Web Socket API
- Server sent events: allows for push notifications to be sent from a server to a browser in the form of DOM events. See WHATWG HTML, Server-sent events and W3C Server-Sent Events
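To give a flavour of one of these, here is a minimal sketch of the receiving side of cross document messaging. The `MessageLike` interface, the `makeMessageHandler` name and the example origin are assumptions for this sketch. The `origin` and `data` properties are those of the real message event, and checking `origin` before trusting `data` is what keeps the communication safe across domains.

```typescript
// Only the parts of the browser's message event that this sketch reads.
interface MessageLike {
  origin: string;
  data: unknown;
}

// Build a handler that ignores messages from any origin other than the trusted one.
function makeMessageHandler(
  trustedOrigin: string,
  onData: (data: unknown) => void
): (event: MessageLike) => void {
  return (event) => {
    if (event.origin !== trustedOrigin) return;
    onData(event.data);
  };
}

// Browser wiring (not executed here), with a hypothetical widget origin:
//   window.addEventListener("message", makeMessageHandler("https://widgets.example.com", handle));
// and, in the sending document:
//   otherWindow.postMessage({ type: "hello" }, "https://app.example.com");
```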
09. The "HTML5" buzzword APIs
If I were to list out all the other APIs that are closely related to HTML5, I'd be here for a while. Another time perhaps. A few of those often incorrectly described as HTML5 are Geolocation, Indexed DB, Selectors, and the Filesystem API.
10. Demos and browser Support
We've hinted at browser support throughout this article, but as support is constantly evolving, the best place to keep up to date (besides testing, of course!) is caniuse.
If you find that something isn't yet supported by browsers, don't despair. It's likely that there's a polyfill to help you mimic the native behaviour. Start your search with this list of HTML5 Cross Browser Polyfills.
We've merely scratched the surface of each of these detailed, useful, powerful APIs. In order to find out more and get under the skin of each, go and throw yourself knee deep in code. You'll be surprised at what you'll find while researching and experimenting. As for those APIs that aren't quite fully baked yet, hopefully the article has whetted your appetite for what will be coming to a browser near you soon.
Thanks to Oli Studholme, Remy Sharp, Mike Smith and Ian Devlin for their feedback and input to this article.