The Wellcome Player, the Universal Viewer and IIIF

Digirati have been working with the Wellcome Library, The British Library and the National Library of Wales to further develop the viewing technologies introduced in the Wellcome Player.

Please see IIIF and the Universal Viewer before implementing the data model described on these pages.

Digitising a book

Letters, old books, brochures… how can I digitise them and put them online?

This example is a book produced in 1941 showing the effects of The Blitz on British cities. The foreword, by J B Priestley, can be read as an appeal to American readers, six months before Pearl Harbor brought the US into the War.

The Player supports embedding, so you can include this on your own site.


Small-scale digitisation


An organisation like the Wellcome Library has a production-line approach to digitising and then dynamically serving digitised works (e.g., here and here) but using the same Player, anyone can put a digitised book online, even on a completely static site. This walkthrough requires the absolute bare minimum of server-side resources, apart from a fair amount of disk space. It also reveals the parts of the process that are really going to need automation on the server to make it possible on a large scale. If you have one or two books, or a lot of patience and time, you could adopt the method described here. But if you are a library, even a very small one, you’re going to need something else beyond this – something on the server to connect your digitised assets to the web without some of the manual process described here.

Even for a single book, there’s no getting round the fact that you’re going to have to spend some time with a scanner. You need high quality images of the pages, and you can achieve very good results with a cheap domestic scanner. This article uses a Canon all-in-one device:

So there’s nothing for it but to put some music on and get scanning. It took quite a few hours to scan all 99 images in this book.

It will help you a great deal if you name your files sequentially, and also include in the file name the printed page number (if there is one). So the cover might be 001_cover.jpg (no page number), the page with printed page number 57 might be 063_57.jpg, and a page with the printed roman numeral iii might be 012_iii.jpg. This naming convention isn’t required by the Player; it just helps us keep track as we’re doing it all manually.
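A convention like this is easy to generate and check with a few lines of script. The following Python sketch is not part of the Player: the three-digit sequence prefix is inferred from the examples above, and the function names are made up for illustration.

```python
import re
from typing import NamedTuple

# Three-digit sequence number, underscore, label, ".jpg" (e.g. "063_57.jpg").
# The three-digit width is an assumption taken from the example names.
NAME_PATTERN = re.compile(r"(\d{3})_(.+)\.jpg", re.IGNORECASE)


class ScanName(NamedTuple):
    sequence: int  # position of the image in the book, starting at 1
    label: str     # printed page number, or a description such as "cover"


def scan_filename(sequence: int, label: str) -> str:
    """Build a file name such as '063_57.jpg' from its two parts."""
    return f"{sequence:03d}_{label}.jpg"


def parse_scan_filename(filename: str) -> ScanName:
    """Split a file name back into its sequence number and label."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a sequentially named scan: {filename!r}")
    return ScanName(int(match.group(1)), match.group(2))
```

Because the sequence number is zero-padded, sorting the file names alphabetically also puts the pages in reading order.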

I used the free Paint.NET (for Windows) to acquire the images from the scanner and then crop them. I also frequently needed to perform very minor rotations on the scanned images, as I found it very difficult to get the book (which is not in very good condition) lined up on the scanner. This particular book is a difficult challenge for a cheap domestic scanner. I scanned at 300 dpi; at anything less than that I was getting severe moiré pattern effects from the rather coarse halftone print process. Moiré effects are still present; better quality could have been achieved by scanning at 600 dpi and using more sophisticated filters. Armed with a higher-end scanner and Photoshop you can achieve much better results, but I wanted to use basic tools and equipment.

After we’ve scanned a few pages we may want to skip forward and create a book of just these few pages for the player, just to make sure it’s all going to work. You can come back and add the remaining pages later. In this walkthrough I’m going to assume that it is going to work, and scan all the way through to the end. I end up with a folder of 99 images, sensibly and sequentially named:

The Player uses Deep Zoom technology to show page images. The high resolution image that I scanned isn’t served up to the viewer directly; instead, the Player sees hundreds of small image tiles. This model is familiar from Google Maps – you don’t need to load the whole world at multiple levels of detail, your browser just loads the map tiles necessary to fill the current view at the current zoom level. There’s an excellent in-depth description of Deep Zoom in this pair of blog posts:

A single high resolution image can be cut into hundreds or thousands of separate small image tiles. These tiles are what your browser requests to display the image as you pan or zoom around.
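To get a feel for the numbers, the arithmetic behind an image pyramid can be sketched in a few lines of Python. This assumes the usual Deep Zoom layout, in which each level halves the one above it until the image fits in a single pixel, and a tile size of 256 pixels; the tile size is an assumption, and the tools mentioned in this walkthrough may use a different default.

```python
def pyramid_levels(width: int, height: int) -> int:
    """Number of levels, from a 1x1 image up to the full resolution."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (max(width, height) - 1).bit_length() + 1


def level_dimensions(width: int, height: int, level: int) -> tuple:
    """Pixel size of the image at a given level (0 is the smallest)."""
    scale = 2 ** (pyramid_levels(width, height) - 1 - level)
    # Integer ceiling division, so partial pixels round up.
    return (width + scale - 1) // scale, (height + scale - 1) // scale


def tile_count(width: int, height: int, tile_size: int = 256) -> int:
    """Total number of tiles across every level of the pyramid."""
    total = 0
    for level in range(pyramid_levels(width, height)):
        w, h = level_dimensions(width, height, level)
        columns = (w + tile_size - 1) // tile_size
        rows = (h + tile_size - 1) // tile_size
        total += columns * rows
    return total
```

For a page scanned to 2550 × 3300 pixels (a hypothetical size, roughly a letter-sized page at 300 dpi), `pyramid_levels` gives 13 and `tile_count` gives 190, so a whole book quickly becomes thousands of small files.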

If you are on Windows you can use the free tool Deep Zoom Composer to create the image pyramid folders. This is probably the easiest route for casual or non-technical users. On other platforms you can use Node.js Deep Zoom Tools, deepzoom.py or this Perl script.

We need to generate folder structures containing all the tile images for each page of the book. On a large-scale project these tiles would be served dynamically by an image server (IIPImage in the case of the Wellcome Library); we would not create the tiles in advance or store them on the web server. I want this example to work on an entirely static site, so I’m going to use Deep Zoom Composer to generate the folders for each image.

The first step is to create a new project and import all the images into Deep Zoom Composer:

Deep Zoom Composer is actually more sophisticated than we need for our current task. The next step, “Compose”, allows you to combine multiple images in a canvas. We don’t need to do this. Instead, we’re going to put only one image at a time onto the canvas, make it fit the full size of the canvas, export it as a deep zoom pyramid, then return to the Compose step and repeat for the next image, until we have separately exported all 99 of our images into 99 separate folders.

So, more music on and start exporting. Put one image onto the canvas:

Make it fit:

Export:

Then repeat. Return to the Compose screen, delete the image from the canvas, drop the next image onto the canvas, resize to fit, then go to Export again.

One annoyance is that Deep Zoom Composer will assume you want to export this with the same name as before – but you don’t, you want to rename it. It’s a new export for the new image. Unfortunately the image name is not displayed on this screen, so we just have to keep track of which image it is. Luckily the name we gave it is displayed when you select the image at the Compose step on the previous screen.

You also need to make thumbnails, one for each image. Again, follow the naming convention of the files to make it easier to manage. If you have a more sophisticated image editing package you can probably batch-process the creation of thumbnails. They should have a width of 90 pixels to fit the default player thumbnail view, with the height varying proportionally.

Here I am using the IrfanView Thumbnails utility to create the thumbnails via a batch process, resizing to a width of 90 pixels and leaving the height to sort itself out:

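If you would rather script the thumbnails than use a batch tool, the only calculation involved is the proportional height. A minimal Python sketch, in which the function name and the rounding rule are my own choices rather than anything the Player requires:

```python
from typing import Tuple


def thumbnail_size(width: int, height: int, target_width: int = 90) -> Tuple[int, int]:
    """Scale an image size to the target width, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Round to the nearest pixel, but never let the height collapse to zero.
    scaled_height = max(1, round(height * target_width / width))
    return target_width, scaled_height
```

The resulting size can then be passed to whatever resizing routine you have available, for example the `resize` method of an image opened with the Pillow library.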

We now have 99 folders, one for each image pyramid, and 99 thumbnails. Now we need to get the player running, and for that we need a package file that tells the player about these images and thumbnails. You can see the packages for the player examples given earlier:

The John Dee painting is only one image, so its package file is quite small. The package is the first thing the Player loads, and it provides the information the Player needs to make further requests for thumbnails, tiles and other resources.

We also need a web site to run this in. Assuming we’re starting from scratch, we can download a release build of the Player from Github. Download the latest release (e.g., as a zip file) and unpack it:

Alternatively you can clone the project and build the latest version of the Player.

You need to run this through a web server on your local machine to see it in action. Here I am using WebMatrix; I have opened the folder as a web site and I’m viewing index.html in the browser:

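WebMatrix is Windows-only, but any static web server will do. As one alternative (not the tool used in this walkthrough), Python's built-in `http.server` module can serve the unpacked folder. The function name and the default port below are arbitrary choices for illustration.

```python
import functools
import http.server


def make_static_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Create a local-only web server that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to the loopback address so the site is only visible on this machine.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_static_server("path/to/unpacked/player").serve_forever()` (the path is a placeholder) and then opening `http://127.0.0.1:8000/index.html` in the browser should show the same page; press Ctrl+C to stop the server.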

We need to provide our own package file, and point the Player at it. At the moment the package file needs to be hand edited. It's best to start from a sample and then customise it with search and replace operations in a text editor. This is the embed code that points the Player at the package file for the "Britain Under Fire" book:

<div class="wellcomePlayer" data-uri="http://player.digirati.co.uk/digitised/britainUnderFire.js" style="width:100%; height:700px; background-color: #000"></div>
<script src="http://player.digirati.co.uk/wellcomeplayer/js/embed.js"></script>

Open the package file to take a look:

http://player.digirati.co.uk/digitised/britainUnderFire.js

The table of contents is generated from the section structure, which is described in more detail on the data model page. The repetitive job of converting an existing package is made a lot easier with a good text editor, especially one that can do search and replace based on regular expressions. Regardless of how good your tools are, this is a long and tedious job. The main task is setting the properties of each asset (page image) in the assetSequence (the book). The dziUri needs to point to the xml file produced by Deep Zoom Composer, and the thumbnailPath needs to point to the thumbnail for the page image. You also need to add the pixel dimensions of the image.
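Some of this hand editing can be scripted. The xml file written for each page records the pixel dimensions in a `Size` element, so a short script can read them and print a fragment to paste into the package. The Python sketch below is only an illustration: `dziUri` and `thumbnailPath` are the property names mentioned above, but `width` and `height` are hypothetical names for the pixel dimensions, and the real package format is the one described on the data model page.

```python
import json
import xml.etree.ElementTree as ET
from typing import Tuple


def dzi_dimensions(dzi_path: str) -> Tuple[int, int]:
    """Read the full-size pixel dimensions from a Deep Zoom xml file."""
    root = ET.parse(dzi_path).getroot()
    for child in root:
        # Compare the tag name without its XML namespace prefix.
        if child.tag.rsplit("}", 1)[-1] == "Size":
            return int(child.get("Width")), int(child.get("Height"))
    raise ValueError(f"no Size element found in {dzi_path}")


def asset_entry(dzi_uri: str, thumbnail_path: str, dzi_path: str) -> dict:
    """Build the properties for one page image in the asset sequence."""
    width, height = dzi_dimensions(dzi_path)
    return {
        "dziUri": dzi_uri,
        "thumbnailPath": thumbnail_path,
        "width": width,    # hypothetical property name
        "height": height,  # hypothetical property name
    }


def asset_json(entry: dict) -> str:
    """Format one entry as indented JSON, ready to paste into the package."""
    return json.dumps(entry, indent=2)
```

Running `asset_entry` once per page, in file name order, produces the repetitive part of the package; the section structure still has to be written by hand.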

As you can see, it is possible to use the Player, static web hosting and various tools to create the images and package data file. But this approach is very laborious and time consuming. For a handful of works each of a few pages (e.g., a small collection of letters) the completely hand-turned approach is feasible. But even for a 100-page book like "Britain Under Fire", we have reached the limits of what's sensible to do without automation, without something running on the server to do lots of this for you. The Digital Delivery page suggests some next steps.