In 2014, Gale became the first humanities primary source publisher to give customers access to the Optical Character Recognition (OCR) text that underpins all our resources, both through Text and Data Mining (TDM) drives and through single-document OCR download on the Gale Primary Sources platform.
In the intervening four years, Gale has worked closely with researchers, scholars and teachers worldwide to understand how they’re using this OCR data to advance scholarship, make discoveries and further research. In doing so, we have built up a clear picture of some of the key barriers to successfully taking on a Digital Humanities project, and of some of the challenges that customers have faced when text mining archival content, both from Gale and from others.
These challenges can broadly be summarised as:
- Access to relevant data in a format optimised for analysis
- Hosting, organising and sharing of large amounts of OCR and metadata
- Difficulty of using existing tools
What was the result? Gale Digital Scholar Lab – Gale’s brand-new cloud-based text and data mining environment, which combines familiar open-source tools with Gale’s unmatched digital archive collections in an integrated platform.
The Gale Digital Scholar Lab has been developed in conjunction with DH scholars and in partnership with the wider DH community, to address the three crucial challenges outlined above. By integrating an unmatched depth and breadth of digital primary source content with the most popular digital humanities tools, Gale Digital Scholar Lab provides a new lens through which to explore history, empowering researchers to generate innovative research and reach original conclusions.
At launch, Gale Digital Scholar Lab includes approximately 166 million pages of Gale’s unique primary source material, digitised from the world’s premier research libraries and optimised for analysis. The Gale Digital Scholar Lab allows quick and efficient creation of bespoke Content Sets, which can save researchers weeks, or even months, compared with traditional methods. As a cloud-hosted tool, it also removes the onus on libraries and faculties to host, manage and organise vast amounts of OCR data.
By integrating the most-requested open-source analysis tools in the Gale Digital Scholar Lab and providing simple options for customisation, Gale allows scholars of all experience levels to run powerful analyses and produce meaningful visualisations that can form the basis of a Digital Humanities project.