Sunday, 22 June 2014

What's online is < 1% of the primary sources worldwide

If Dan Snow thinks he can do history by searching primary source material from the comfort and convenience of his office, I fear he is not doing very sound or thorough history. Yes, digitisation is making a lot of material available without the stress and strain of visiting actual archives, but as an archivist, and as a historian who has just returned from a trip involving crossing Canada from Toronto to Victoria to look at archival collections which are unlikely to go online within the foreseeable future, I think I can state fairly categorically that any historian who thinks they won't ever need to go and get hands-on with the sources is limiting themselves in ways that are very bad for the process of history.

I refer back to my post in late April flagging up the problems of what is given priority in digitisation programmes, how complete it is, how much necessary context is there.

I might also remark that in many instances I have encountered, repositories holding important collections of personal and institutional papers don't necessarily even have a decent online catalogue so that one can ascertain what they've actually got.

Plus, maybe the research Mr Snow conducts does take him to glamorous places, but many archives are to be found in locations which are not only not particularly places one would choose to visit for any other reason, but not even particularly easy to get to. (Though I will say, Victoria BC in June is lovely, even if most of my days were passed in a semi-basement furiously making my way through handwritten and occasionally typed correspondence.)

What has possibly transformed research in this digital age (in my experience, on the basis of this recent research trip) is the ability to take digital photos of the documents for later perusal (I was fortunate to be working in special collections which permitted this). Having worked, in the past, in institutions where there was not even a routine procedure for obtaining photocopies and one was obliged to rely on the kindness and available time of the staff, or in other institutions where the charges were inordinately high, presumably to discourage the ordering of large amounts, being able to take away images from collections which I had a very short time to ingest has been a major boon.


  1. In many ways you're right. Most material isn't online. Most of it never will be. But I do have the feeling that you're coming at this from a direction I'd not like to see perpetuated. My historical research can't be done in the archives because I'm trying to answer different types of questions than you. They aren't more or less important than what you research. They're just different.

    I analyse sets of records which need to be in digital format to get the types of results I'm after. You're (presumably) looking for evidence gleaned through close reading that provides you with a deeper understanding of a particular theme.

    I do appreciate what you're trying to say here and that it relates specifically to comments made by one person. But I do worry that this perspective comes to reflect poorly on digital history generally (whilst we were minding our business doing what we love!). Our two types of research complement one another very well. I'm glad you do what you do. But suggesting I'm a bad historian because I don't do that too isn't productive for the profession. It marginalizes what I do without trying to understand it.

    The answer to a historical question isn't always in a box somewhere. Sometimes it's the connections between thousands of boxes that matter. And you don't always need to read the letters in the box to find out something new about the past.

    1. I got the impression - rightly or wrongly - from that article that he was very much talking about trad history, i.e. accessing digitised documents online with the rather misleading invocation of a former necessity to go to 'glamorous places' (!!) to consult archives - rather than the kind of exciting work that can be done with large digital datasets.

      I think there's a distinction here between 'digital history' as a productive methodology and a rather lazy assumption that somebody can access all the documents they need sitting at their own comfortable desk. Which, as I was trying to argue, is problematic when so little has been digitised and given the problems previously mentioned about issues of priorities, selection, decontextualisation.

  2. Also, access to travel funding may vary depending on where you are in the world. Online sources enable more people to become historians.

    1. They certainly do, but my note of caution here is being sounded about how relatively little is currently available online, and issues to do with the kinds of sources that have so far been digitised, favouring certain kinds of material and certain historical areas more than others. (Though even if one is interested in the First World War, there's still masses of stuff that isn't - yet - online.)

  3. May I suggest that at present many digitisation projects are following either grant money or paying interest groups, each of which provides the necessary income to help towards the running costs/profits. Therefore both Adam's comment and your own are correct that the majority of archival materials will never be digitised or even easily available; it would be almost an impossibility from so many directions.
    It matters not if a collection is digitised unless the content is accurately and fully indexed. Without care and thought being put into indexing then the digital content becomes invisible to a researcher, simply an extensive range of pretty, or often poor resolution, pictures. This is also the case with traditional archive collections – how many times have you looked through a box and found the content does not reflect, in any degree of accuracy, the catalogue entry? This is probably a reflection on the constraints of funding and value placed by depositories on the materials they hold not only in document collections but also in physical ‘archaeological’ or ‘natural history’ collections, but must be weighed against the changing styles of recording methodology.
    The challenge is to improve technology to the point where identification of content is semi-automated. This happens with increasing accuracy with OCR on typeface; can it happen with handwritten text? Difficult, I suspect, as you are dealing not only with changing letter formation but also with the individual scribes’ styles.
    As someone who has provided tens of thousands of digital images to a wide variety of collection holders, I can see first-hand how these are then presented for researchers’ consumption. The best are good, but at worst they are a waste of funders’ money. The reliance on basic metadata for searching is as much a hindrance as a help. One could propose that only embedded transcription of content would improve the situation, which brings the circle back to the costs of funding projects to their fullest with regard to making resources findable for all interest groups.
    I have often used a simple method to illustrate the difficulty faced: show a picture to an audience and ask them each to describe the content they would list if they were indexing the item. From each person you will get a completely different list, which primarily depends on their personal interests. The archivist is faced with this decision on almost every single item. If a researcher finds the item from their search string, the archivist’s work goes unnoticed; if the item is found by accident, the archivist will face criticism. It is a no-win situation in many respects.

    1. I commented upon the absolute primacy of good metadata in a previous post. However, I do think that researchers are (for a long time yet, and probably always!) going to have to delve, because indexing absolutely everything that might be of interest to future scholars is not only prohibitive in terms of resources, it's pretty much impossible. There is a worrying trend to suppose that if something doesn't come up on a Google-type search it's not there.

      Even if you have full-text searchability - as with the Wellcome Library's London's Pulse initiative with Medical Officer of Health Reports - users have to be aware of the kinds of terminology the people who compiled the reports would be using.

      I have also ranted in this blog about researchers who claimed to have 'discovered' something 'hidden in the archives' which was exactly where one would expect to find it, and not uncommonly visible in the catalogue.