How to create an ebook (part 3)

In part one and part two of this series, we looked at e-publishing options and techniques for new works. But what about the scores of out-of-print books that deserve a new life in the ebook space?

I'm neither a librarian nor an archivist, by trade or penchant. But like many readers of this site, I am a great lover of older books. And even with all the good work being done by so many people, institutions, and organizations like Project Gutenberg and Google Books, it's taking years to bring many of them into the 21st century.

Since I could shoot the book flat, I took advantage of that to shoot a spread at a time, cutting my photography work in half. This meant I'd need to split the spreads in post – no big deal. But it also meant I was capturing each page at half the resolution I'd get by shooting one page at a time. I tested it, though, and with 18 megapixels (3456x5184) there was enough resolution to clearly capture the text character shapes for optical character recognition (OCR) later.
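The resolution trade-off is easy to check with a little arithmetic. The sketch below is only an illustration: it assumes the long edge of the frame (5184 px) spans the whole spread with no wasted margin, and it uses a hypothetical page width of 6 inches, which is not a measurement from this shoot.

```python
# Rough resolution check for shooting spreads instead of single pages.
# PAGE_WIDTH_IN is an assumed value for illustration only.

SENSOR_LONG_PX = 5184   # long edge of the frame, assumed to span the spread
SENSOR_SHORT_PX = 3456  # short edge of the frame
PAGE_WIDTH_IN = 6.0     # hypothetical width of one page, in inches


def megapixels(width_px, height_px):
    """Total pixel count expressed in millions of pixels."""
    return width_px * height_px / 1_000_000


def pixels_per_inch(pixels_across, inches_across):
    """Linear resolution across a page of the given physical width."""
    return pixels_across / inches_across


full_frame_mp = megapixels(SENSOR_SHORT_PX, SENSOR_LONG_PX)
per_page_mp = full_frame_mp / 2          # each page gets half the frame
page_width_px = SENSOR_LONG_PX / 2       # half the long edge per page
page_ppi = pixels_per_inch(page_width_px, PAGE_WIDTH_IN)

print(f"Full frame: {full_frame_mp:.1f} MP")   # 17.9 MP
print(f"Per page:   {per_page_mp:.1f} MP")
print(f"Per page:   {page_ppi:.0f} ppi")       # 432 ppi
```

OCR tools are commonly said to work best at roughly 300 ppi or more, so under these assumptions a split spread still clears that bar. Plug in your own page size before trusting the number.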

Holding the rig steady: The cardboard rig had a back and side ridge to keep the book stationary (see above). But I also needed to tape the cardboard rig to the floor to keep it in place.
Strobes and sharp focus: A strobe or flash (NOT on camera) has the advantage of crisp light that "freezes" the shot and reduces the chance of blur. It's also a more powerful light, allowing you to use a smaller aperture on the lens, which increases depth of field and reduces focus issues. Any blurring and you can forget about successful OCR later. Make sure the camera is pointing straight down and centered, and test, test, test with sample shots in Photoshop.
Keeping the camera steady: Continuous lighting is a fine alternative, but you need to be extra-careful of sharpness. Even tripping the camera with your finger can create enough movement to soften the image. Use a remote trigger, or in a pinch, a delayed firing option. Lock the mirror on SLRs.
The well-placed lights: Whichever light source you use, make sure the lights are not reflected in the Plexiglas. Two lights placed at 45-degree angles give the most even light, but one light can be used if you're careful. Make sure to set an exposure that does not overexpose the white paper, which would make OCR harder.
Save as JPEGs: Going straight to a high-quality JPEG will save you conversion work later on. It will also save you a lot of disk space. I can't see much benefit in shooting raw for this.
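One way to sanity-check exposure on a test shot is to count how many pixels are at or near pure white. This is a minimal sketch, not a calibrated tool: the 250 threshold and the 50% limit are assumptions picked for illustration, and the pixel values are expected to be 0-255 grayscale numbers read out by whatever image library you prefer.

```python
def clipped_fraction(gray_pixels, threshold=250):
    """Return the share of pixels at or above `threshold` (0-255 scale)."""
    pixels = list(gray_pixels)
    if not pixels:
        raise ValueError("no pixels supplied")
    blown = sum(1 for value in pixels if value >= threshold)
    return blown / len(pixels)


def paper_is_overexposed(gray_pixels, threshold=250, max_fraction=0.5):
    """Flag a shot when more than `max_fraction` of it is near pure white.

    Both defaults are assumed starting points, not established limits.
    """
    return clipped_fraction(gray_pixels, threshold) > max_fraction


# Toy data: 90% paper, 10% ink.
good_shot = [235] * 90 + [40] * 10    # paper is bright but not clipped
blown_shot = [255] * 90 + [40] * 10   # paper has hit pure white

print(paper_is_overexposed(good_shot))   # False
print(paper_is_overexposed(blown_shot))  # True
```

If the check trips, back off the exposure or the light power and shoot the test page again.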

Go straight to PDF: If you took care in your photography, shot one page at a time (or don't mind spreads), and are happy to assemble your book as a PDF, then you are in great shape. You can simply use Adobe Acrobat or another PDF utility to assemble all the pages into one file. Fini!
If you need to split the spreads, there are a few ways to do this. One of the easiest I've found is the free software Briss. A more robust option, also open source, is Scan Tailor, which does many other things as well, including deskewing and adding or removing borders. Another option is to use Acrobat. Here's a tutorial for that.
OCR for live text: PDFs are great, but chances are you will want an actual reflowing ebook. This requires optical character recognition. See more on OCR options below.
Correct for tonality: Even the best OCR packages need clean source images. Your images should not be too dark, and the text characters should have a solid dark tone. Corrections can be made in Photoshop to one page. When it's the best it can be, create a Photoshop Action and apply to all the pages.
Remove the extraneous: You won't want your OCR to start including any header and footer text that's on the pages, like titles and page numbers. So crop the pages to remove all of that prior to doing the OCR.
Export from OCR and start the ebook: The final step is OCR, after which you will bring the captured text into a word processing application like MS Word or OpenOffice for cleanup. When done, you can start the process of actually creating your reflowable ePub ebook. See part one and part two for more on that process.
Taking care of art: While there is some software that tries to automate parts of working with images, you may find it easier to just handle a small number of images manually, depending on your needs.
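If you would rather script the page preparation than click through it, the geometry and tonal steps above boil down to a few small calculations. The sketch below is one possible approach, not what Briss, Scan Tailor, or a Photoshop Action does internally. All function names and the example pixel values are hypothetical. The boxes use (left, top, right, bottom) ordering, which is the same ordering Pillow's `Image.crop` accepts.

```python
def split_spread(width, height, gutter=0):
    """Return (left_page_box, right_page_box) for a two-page spread.

    Boxes are (left, top, right, bottom) in pixels. `gutter` is the total
    number of pixels to discard around the center line.
    """
    mid = width // 2
    half_gutter = gutter // 2
    left_box = (0, 0, mid - half_gutter, height)
    right_box = (mid + half_gutter, 0, width, height)
    return left_box, right_box


def trim_header_footer(box, header_px, footer_px):
    """Shrink a box to drop running heads and page numbers before OCR."""
    left, top, right, bottom = box
    new_top = top + header_px
    new_bottom = bottom - footer_px
    if new_top >= new_bottom:
        raise ValueError("trim amounts leave no page area")
    return (left, new_top, right, new_bottom)


def stretch_levels(value, black_point, white_point):
    """Map one 0-255 gray value so black_point -> 0 and white_point -> 255."""
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scaled = (value - black_point) * 255 / (white_point - black_point)
    return max(0, min(255, round(scaled)))


# Example with an assumed 5184x3456 spread and made-up trim values.
left_page, right_page = split_spread(5184, 3456, gutter=40)
text_area = trim_header_footer(left_page, header_px=150, footer_px=200)

print(left_page)                    # (0, 0, 2572, 3456)
print(right_page)                   # (2612, 0, 5184, 3456)
print(text_area)                    # (0, 150, 2572, 3256)
print(stretch_levels(60, 40, 230))  # 27
```

Work out the trim and level values on one representative page, then reuse them for the whole book, just as you would with a Photoshop Action.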

The Generalist: Adobe Acrobat. Both Standard and Pro versions offer OCR tools. Acrobat is the Swiss Army knife of PDF tools and does a great many things. While its Standard version is competitively priced, the Pro version is pricey. The good news is you probably already have it as part of your Creative Cloud subscription.
The Specialized Commercial Package – ABBYY FineReader. FineReader is a competitively priced OCR favourite among many doing book conversions. It consistently gets very good reviews, and is said to be easy to "train" for better results.
Open Source and Institutional Favorite – Tesseract OCR/FreeOCR. Tesseract was originally developed at HP, was released as open source in 2005, and was subsequently sponsored by Google. It is billed as "the most accurate open source OCR engine available", and is available for Windows, Mac and Linux. A GUI front-end called FreeOCR is available for use on Windows.
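If you go the Tesseract route, it can also be driven from a script rather than a GUI. The sketch below assumes the `tesseract` command-line program is installed and on your PATH, and uses its basic `tesseract IMAGE OUTPUTBASE -l LANG` form. The function names and file names are hypothetical. The small cleanup helper is a naive first pass at rejoining words that were hyphenated across line breaks, and it will also join genuine hyphenated compounds that happen to fall at a line end.

```python
import re
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for a basic Tesseract run.

    Tesseract writes its plain-text result to `<output_base>.txt`.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract on one page image and return the output file name."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    return f"{output_base}.txt"


def rejoin_hyphenated(text):
    """Join words split by a hyphen at a line break (naive approach)."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


# Hypothetical file names, shown for illustration only.
print(build_tesseract_command("page_001.png", "page_001"))
print(rejoin_hyphenated("optical character recog-\nnition"))
```

You would call `run_ocr` once per page image, then run the cleanup on each text file before moving into your word processor for proofreading.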
