Designer/developer Mat “Wilto” Marquis takes a look at the img tag, explains why a largest-size-fits-all approach to images is untenable, and discusses the solutions proposed so far in this ongoing saga.
I have a complicated relationship with the img tag.
I should say up front that the img tag will always hold a special place in my heart. I mean, we've had a lot of good times together. The gifs – oh, the animated gifs! How we would laugh, watching loops of cats falling off household furniture.
Things change, though. I've changed; the web has changed. More importantly, users' browsing contexts have changed, and radically. People are accessing the web on a staggering range of devices: phones with more computing power than the average space shuttle, ultra-high-res tablets and enormous televisions. As a result, many of us have changed as designers and developers, both in our philosophical approach to the web and in the techniques we've developed in response. Responsive web design gives us an incredibly powerful set of tools for handling uncertainty, allowing us to take huge strides toward delivering content to users in a way that seems bespoke for any browsing context. Thanks in no small part to responsive web design, we've begun to eschew the static-width comfort zones we've been trying to impose on the web for years – 1024px, 960px, 320px and so on – and we've started to embrace the flexibility that has been inherent to the web all along.
The img tag hasn't changed much along with us, though. Fortunately for us, and for our newfound appreciation of flexible layouts, images have been flexible all along. By removing the height and width attributes from an img tag and setting a max-width of 100% in our CSS, we're saying "prevent this image from ever overflowing its parent container." As our layout resizes, so do our images.
Here's where things get complicated, though. Because max-width only ever scales an image down, never up past its natural size, using flexible images means ensuring that the media in play is large enough to fill whatever size we can reasonably expect on the largest possible display. In doing so, we've introduced a tremendous expense for users who may see no real benefit: a 2,000-pixel-wide image means a massive, wasteful request on a 300-pixel-wide display.
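To put rough numbers on that waste, here's a small TypeScript sketch. It assumes the image keeps its aspect ratio, so pixel count grows with the square of the width, and it treats pixels as a proxy for cost, which is only approximate: compressed file size doesn't scale linearly with pixel count. The function names are illustrative and not taken from any library.

```typescript
// Device pixels a slot of `displayWidth` CSS pixels can actually show.
// devicePixelRatio is 1 on standard screens and 2 on many "Retina" screens.
function devicePixelsNeeded(displayWidth: number, devicePixelRatio = 1): number {
  return displayWidth * devicePixelRatio;
}

// How many times more pixels the source image carries than the slot can show,
// assuming the aspect ratio is preserved (area scales with width squared).
// Never less than 1: a source smaller than the slot has no surplus pixels.
function pixelOverhead(sourceWidth: number, displayWidth: number, devicePixelRatio = 1): number {
  const ratio = sourceWidth / devicePixelsNeeded(displayWidth, devicePixelRatio);
  return Math.max(1, ratio * ratio);
}

// Fraction of the downloaded pixels that are scaled away and never seen.
function wastedFraction(sourceWidth: number, displayWidth: number, devicePixelRatio = 1): number {
  return 1 - 1 / pixelOverhead(sourceWidth, displayWidth, devicePixelRatio);
}

const overhead = pixelOverhead(2000, 300);
const wasted = wastedFraction(2000, 300);
console.log(`${overhead.toFixed(1)}x the pixels, ${(wasted * 100).toFixed(1)}% unseen`);
```

For a 2,000px image on a 300px display, that works out to roughly 44 times the pixels the screen can show, with nearly 98% of them scaled away. On a 2x display the same slot needs 600 device pixels, and the overhead drops to about 11 times.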
Now, one could argue that this isn't a show-stopping issue; that we have plenty of bandwidth to go around, even factoring in the advent of "Retina" images. That argument, of course, assumes that the worst connection anyone faces is, on average, 3G or 4G, on an advanced and feature-laden device. However, millions of people worldwide access the internet on mobile connections alone, paying for every kilobyte they consume, using devices only slightly better than feature phones. We need to be mindful that serving massive images indiscriminately means serving them to these users at an additional cost, not just in bandwidth but in actual money, so that some users can enjoy a slightly improved experience. Speaking for myself, at least, I'd sooner put "best viewed in Internet Explorer 6" in the footer of a website than "best viewed in the first world." The latter is on a very short list of things I find more offensive than the former.
The Web [...] should be accessible from any kind of hardware that can connect to the Internet: stationary or mobile, small screen or large.
Tim Berners-Lee, Long Live the Web: A Call for Continued Open Web Standards and Neutrality
While we at Filament Group were working on the BostonGlobe.com project, we recognised right away that a largest-size-fits-all approach to images would be completely untenable. In response to this, Scott Jehl developed a means of introducing breakpoints, of sorts, to an img tag.
If you've been following the ongoing "responsive images" saga, you know that this approach has since fallen apart, as have many similar solutions. Several browsers have implemented a particularly aggressive flavour of image prefetching, allowing user agents (UAs) to request assets before a document's body has been parsed. It's hard to get too upset about pages that load more quickly, granted, but it does mean that we can't communicate the client's screen size to the server in time to affect that initial request. In fairness, the cookie method we'd settled on was based on the non-specced behaviour of major browsers at the time, and solutions based on assumed behaviour rarely prove to be the most future-friendly.
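To make that race concrete, here is a minimal sketch of the server side of a cookie-based approach, written in TypeScript. It is an illustration under assumptions, not Filament Group's actual code: the cookie name `screenWidth`, the 800px breakpoint and the file names are all hypothetical. The idea is that a script in the page writes the screen width into a cookie, and the server reads it on each image request to decide which file to send.

```typescript
// Hypothetical cookie name and breakpoint, chosen for illustration only.
const WIDTH_COOKIE = "screenWidth";
const BREAKPOINT = 800;

// Pull the screen width out of a raw Cookie request header, if present.
function readScreenWidth(cookieHeader: string | undefined): number | undefined {
  if (!cookieHeader) return undefined;
  for (const pair of cookieHeader.split(";")) {
    const [name, value] = pair.trim().split("=");
    if (name === WIDTH_COOKIE && value !== undefined) {
      const width = Number.parseInt(value, 10);
      return Number.isNaN(width) ? undefined : width;
    }
  }
  return undefined;
}

// Decide which file to send. If the image was requested before the page's
// script could set the cookie (as with aggressive prefetching), there is no
// width to read, so the server can only fall back to its default.
function chooseImage(cookieHeader: string | undefined): string {
  const width = readScreenWidth(cookieHeader);
  if (width === undefined) return "mobile.jpg"; // the safe, small default
  return width >= BREAKPOINT ? "high-res.jpg" : "mobile.jpg";
}
```

Any visitor whose image request outruns the cookie-setting script gets `mobile.jpg` regardless of screen size, which is the failure that prefetching exposed.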
Faced with a total return to the drawing board, we began by breaking our objectives down to a few key concepts:
- All users – under any and all circumstances – should receive a representative image.
- Eliminate the redundant request on larger displays – or, at the very least, mitigate that request as best we can.
- From an art direction standpoint, simply scaling an image may not always be the best approach. Being able to specify alternate image sources rather than resizing the largest image on the server would allow for smaller images to be more appropriately cropped and zoomed to suit the image's subject.
During our newly redoubled efforts we reached out to similarly stumped members of the developer community who were documenting their efforts publicly. What followed was a flurry of brainstorming and late-night Git commits not suitable for public consumption – sordid tales of <noscript>, dynamically injected <base> tags and document.write. We wove the modern developer's equivalent of campfire horror stories, plumbing the darkest depths of frontend coding in search of a solution. It was not pretty and, more importantly, none of the techniques we attempted worked well.
As we grew increasingly frustrated, it became clear that we would need to loop in members of browsers' developer relations teams to gain some outside perspective. Mike Taylor of Opera and Paul Irish of Chrome joined our ongoing talks, and though their input shed some light on why our countless "responsive images" approaches had failed, our newly gained clarity didn't bring us any closer to a real solution.
As developers joined the conversation – and with the continued help of several browser representatives – it became more and more clear that the issues we faced might only be solved well with a new element as our starting point.
The idea is to use the video tag's markup pattern as inspiration: it's specced to allow media queries in the media attribute of its source elements, and any browser that doesn't recognise the tag reliably displays the markup inside it instead. As with the video and audio tags, this solution shouldn't require any custom scripting, HTTP/SPDY headers or server-side technologies to reliably deliver content tailored to the end user's context. Through media attributes we would not only be able to reduce wasteful image requests for users with smaller displays, but also tailor our images' resolutions for users with high-res displays or for print.
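As a rough illustration of how a media attribute lets a single markup pattern serve different contexts, here is a toy evaluator in TypeScript. It handles only a tiny, hypothetical subset of media query syntax (`print`, `screen`, `min-width`, `max-width` and `min-resolution` in dppx, joined with `and`) against a made-up `Context` object. Real browsers evaluate full media queries, and expose that to scripts through `window.matchMedia`; this is not how they implement it.

```typescript
// A hypothetical description of the browsing context.
interface Context {
  viewportWidth: number; // CSS pixels
  pixelRatio: number; // device pixels per CSS pixel
  mediaType: "screen" | "print";
}

// Evaluates a tiny, illustrative subset of media query syntax.
// Any condition it doesn't understand is treated as not matching.
function matchesQuery(query: string, ctx: Context): boolean {
  return query.split(/\s+and\s+/).every((raw) => {
    const part = raw.trim();
    if (part === "print" || part === "screen") return ctx.mediaType === part;
    const m = /^\((min-width|max-width|min-resolution):\s*([\d.]+)(px|dppx)\)$/.exec(part);
    if (!m) return false;
    const [, feature, num, unit] = m;
    const value = Number(num);
    switch (feature) {
      case "min-width":
        return unit === "px" && ctx.viewportWidth >= value;
      case "max-width":
        return unit === "px" && ctx.viewportWidth <= value;
      case "min-resolution":
        return unit === "dppx" && ctx.pixelRatio >= value;
      default:
        return false;
    }
  });
}

// A narrow, high-density phone screen.
const phone: Context = { viewportWidth: 320, pixelRatio: 2, mediaType: "screen" };
console.log(matchesQuery("(min-width: 800px)", phone));
console.log(matchesQuery("(min-resolution: 2dppx)", phone));
```

With queries like these on separate source elements, the same phone could skip a wide-screen image yet still receive a sharper one for its dense display, while a `print` query could point printers at yet another file.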
We've gone through countless iterations on the officially proposed markup pattern before reaching its current incarnation:
<picture alt="Alt text should accurately describe the image represented by all sources, though cropping and zooming may differ.">
  <source src="mobile.jpg" /> <!-- Matches by default. -->
  <source src="high-res.jpg" media="(min-width: 800px)" /> <!-- Overrides the previous source at 800px and wider, before any assets are fetched, resulting in a single request. -->
  <img src="mobile.jpg" /> <!-- Fallback content, in the event the <picture> tag is unsupported. -->
</picture>
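The comments in that markup describe a simple resolution rule: a source with no media attribute matches by default, each later matching source overrides the ones before it, and the choice is settled before anything is fetched. Here is a minimal TypeScript sketch of that rule. The `PictureSource` shape and `selectSource` name are hypothetical, and the media query test is passed in as a function so the logic can be exercised without a browser.

```typescript
// Minimal stand-in for a <source> element in the proposed pattern.
interface PictureSource {
  src: string;
  media?: string; // omitted means "matches by default"
}

// Returns the src the proposed pattern would load. Every source whose media
// query matches (or that has no media attribute) overrides the earlier ones,
// so the last match wins. Falls back to fallbackSrc when nothing matches.
function selectSource(
  sources: readonly PictureSource[],
  matches: (query: string) => boolean,
  fallbackSrc: string,
): string {
  let chosen = fallbackSrc;
  for (const source of sources) {
    if (source.media === undefined || matches(source.media)) {
      chosen = source.src;
    }
  }
  return chosen;
}

const sources: PictureSource[] = [
  { src: "mobile.jpg" },
  { src: "high-res.jpg", media: "(min-width: 800px)" },
];
const wideScreen = (query: string) => query === "(min-width: 800px)";
console.log(selectSource(sources, wideScreen, "mobile.jpg"));
```

In a browser, the matcher could be `(query) => window.matchMedia(query).matches`. Because the winning source is picked once, up front, only that one file needs to be requested.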
Unfortunately, despite all the public-facing discussion of the <picture> element, we haven't heard much interest voiced by the WHATWG. We understand, of course, that most people deeply involved in the standards process labour under a massive backlog of issues and suggestions. We can't expect an immediate response to every request for a new element coming from a group of frustrated developers – no matter how many rooftops that request is shouted from. It's difficult not to be discouraged by this silence, however – especially given the growing concern about serving screen-appropriate images, following the advent of Retina-screened iPads and the rapid adoption of responsive web design.
Despite the relative quiet from the WHATWG, we've reached an exciting point in our efforts – the point where our proposed element feels ready for a sample implementation.
As a result of working closely with members of the Opera and Chrome developer relations teams throughout this process, the <picture> markup pattern has been presented to both teams by way of what I'm half-jokingly calling a "pseudospec". This is not a specification intended to be codified word-for-word, but rather a single place to find key implementation details without leafing through page after page of prose in the Community Group. I've posted it to a public GitHub repo for the time being, but would prefer to keep discussion centralised in the Community Group. In fact, based on the details and requirements outlined in that pseudospec, Scott Jehl has already written a polyfill to bring <picture>'s functionality to browsers that wouldn't otherwise support it. Although the polyfill works today, the element it emulates is still purely theoretical, so it should never be used on a production site.
The picture pattern is far from an absolute ideal, of course. Finding a means of detecting bandwidth is an absolutely critical step in delivering assets in an optimised way, and no small amount of work is being done in that department. It's critical that we have a way to take advantage of new techniques for detecting client-side capabilities as they become available to us, and the picture element would give us a solid foundation to build upon. Should this element become a standard, it would give us a means of handling disparate assets that represent a single subject, a baseline method of delivering image sources in a far less wasteful way, and a responsible fallback pattern for older browsers. Because it follows the video tag's source pattern, future methods of tailoring asset delivery could apply to a huge range of assets. The img tag is ill-suited to these ends, given all the retrofitting that would be required. Further, we could never say with total certainty that modifying a tag that has existed since time immemorial wouldn't introduce issues in older browsers.
The picture element is no replacement for img. There will always be times when a simple img element is the best tool for the job, while the picture element would be reserved for larger "hero" images and high-resolution photographs. It's important that we have this sort of flexibility, and I like to think that flexibility is something the img tag has always understood. It's nothing personal, of course, img tag. And no matter what happens next, we'll always have cat gifs.
Words: Mat Marquis
Mat “Wilto” Marquis is a designer-slash-developer working at Filament Group in Boston. Mat is a member of the jQuery Mobile team, technical editor at A List Apart, and an active member of the open source community. He's probably flipping out about something on Twitter as we speak.