By Sarah L. Ketchley, Senior Digital Humanities Specialist
Recently, three Python Notebooks were added to Gale Digital Scholar Lab to offer additional flexibility in processing and analysing text data. Each of the Notebooks can be downloaded by a researcher, then used or adapted to suit individual needs. This blog post offers some considerations for those who are interested in incorporating Python-based workflows into their text analysis pipeline but aren’t quite sure where to start.
This post can also be read in conjunction with ‘Women’s History Month in Gale Digital Scholar Lab: Named Entity Recognition, Python Notebooks, and an Intrepid Female Diarist’, which offers some practical programming insights into a project using Named Entity Recognition.
What is Python?
In the realm of programming languages, Python stands out as a versatile and powerful tool that has gained immense popularity across various domains, from web development to data analysis and artificial intelligence. Its design philosophy emphasises code readability and simplicity, and its extensive library support makes it an ideal choice for beginners and seasoned developers alike. Python offers tools and resources to support a wide range of applications, including frameworks to support web development like Django or Flask, data science libraries like Pandas and NumPy, and machine learning and AI libraries like TensorFlow or PyTorch.
Getting Started with Python: A Humanistic Approach
As disciplinary experts in their own fields, many humanists have neither the time nor inclination to add learning to code to their slate of tasks. The interdisciplinary nature of many digital humanities projects means that dedicated programmers are often core members of the team. However, learning basic coding skills offers some benefits to scholars planning to use DH methodologies to work with their data.
Such skills could enable a humanist to test hypotheses, create initial research pipelines, or conduct preliminary analyses, all of which can be essential in establishing a proof of concept when applying for funding. Being conversant in code is also helpful in a team composed of scholars and researchers from different disciplinary backgrounds: the ability to communicate effectively across teams is one of the core components of a successful DH project.
Fortunately, there is a slate of useful resources available to guide the novice through the process of getting comfortable working with code, and many are free. For example, William Mattingly’s Python Humanities offers written and video instruction to introduce novices to Python. There is also a three-hour introductory course available on Mattingly’s YouTube channel.
The Programming Historian offers over thirty introductory lessons in using Python to carry out a variety of analysis tasks, suitable for beginners through to more advanced users. University libraries often offer workshops to faculty and students, so it is always worth following your institution’s training calendar to identify worthwhile opportunities.
Python Notebooks in Gale Digital Scholar Lab
There are now three Python Notebooks available in the Lab, in a format that enables all users, from beginners to the more experienced, to engage with the process of running a pre-written Python script, line by line, to process text data, run analyses, and display outcomes in CSV or graphic format. This executable code can be run using either Google Colab or Jupyter Notebooks. Each line of code, or code block, is annotated within the Notebook. Each annotation begins with a hash symbol (#), such as ‘#provide the name of the file you’re uploading’ in the figure below. Python ignores these comments when the code runs; they are there to provide additional descriptive context for what the associated line of code is doing.
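To give a feel for this annotated style, here is a minimal sketch written for this post rather than taken from the Lab’s Notebooks. The sample sentence and the output file name `word_counts.csv` are placeholders chosen purely for illustration.

```python
import csv
from collections import Counter

# provide the text you want to analyse (a placeholder sentence stands in here)
text = "The diarist sailed up the Nile and the diarist wrote every day"

# split the text into lower-case words
words = text.lower().split()

# count how many times each word appears
counts = Counter(words)

# write the counts to a CSV file, most frequent words first
with open("word_counts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["word", "count"])
    writer.writerows(counts.most_common())
```

Every line beginning with # is a comment for the human reader, so it can be reworded or extended freely without changing what the code does.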
Google Colab
Google Colab, short for Google Colaboratory, is an online platform that provides a Python development environment integrated with Google Drive. It allows users to write and execute Python code in a browser, eliminating the need for local installations or setups. Its cloud-based approach is user-friendly, and the interface will look familiar to those used to working with the Google suite of tools. The platform stores the Notebooks you’ve been working on directly to your Google Drive account, enabling easy sharing and access across devices.
After clicking ‘Get a Copy’ of the chosen Python Notebook, a zip folder will download to your local machine. You can then sign in to Google Drive and either search for ‘Colab’ or go directly to https://colab.research.google.com/.
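Most operating systems will unzip the download with a double-click, but the same step can be done in Python. The sketch below uses only the standard library; the function name `extract_notebooks` and the file names in the usage comment are hypothetical, and the actual contents of the Lab’s zip folder may differ.

```python
import zipfile
from pathlib import Path


def extract_notebooks(zip_path, dest_dir="."):
    """Extract every .ipynb file from a zip archive and return the new paths."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # keep only notebook files; skip anything else in the archive
            if name.lower().endswith(".ipynb"):
                extracted.append(Path(archive.extract(name, path=dest_dir)))
    return extracted


# Hypothetical usage: replace the names with those of your own download.
# notebooks = extract_notebooks("lab_notebook.zip", "notebooks")
```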
Click ‘File’ then ‘Upload Notebook’ and choose the .ipynb file you downloaded from the Lab. Once uploaded, the interface will look something like this:
You’ll see that each block of code includes executable text, and contextual commentary to explain what is happening in each section. You’ll begin at the top of the document and run each executable cell in turn, either by hovering over the [ ] area and clicking the play button that appears, or by selecting the cell and pressing Ctrl+Enter.
Running each block in this way gives insight into the order of processing: for example, the necessary libraries, which come pre-installed in Google Colab, are loaded before the relevant commands are executed. It also lets researchers adapt the code to dig into specific elements of interest within the analysis pipeline. This approach can help researchers interpret their analysis results effectively, since they have a measure of control over each stage of the process; it also helps develop a thoughtful and critical approach to ‘out of the box’ tools.
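The cell-by-cell pattern might look something like the sketch below. It is illustrative only and is not the Lab’s code: the sample text, the variable names, and the `min_length` setting are all assumptions. In a Notebook, each commented section would sit in its own cell.

```python
# Cell 1: load the libraries needed later (both are in Python's standard library)
import re
from collections import Counter

# Cell 2: load the text (a placeholder string stands in for an uploaded file)
raw_text = "Monday. Rode to the temple. Tuesday. Rode to the river and sketched."

# Cell 3: lower-case the text and pull out the words; inspect `tokens` before moving on
tokens = re.findall(r"[a-z]+", raw_text.lower())

# Cell 4: a setting a researcher might adjust, then re-run from this cell onwards
min_length = 4
kept = [t for t in tokens if len(t) >= min_length]

# Cell 5: the final step, which depends on everything above having been run
top_words = Counter(kept).most_common(3)
print(top_words)
```

Running Cell 5 before Cell 1 would fail because `Counter` has not yet been imported, which is why the cells are run from the top down. Changing `min_length` and re-running the last two cells shows how a single decision shapes the result.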
Each of the cells is fully editable within the platform, so researchers can tweak the code as necessary. They can add their own commentary too, using Markdown.
Jupyter Notebooks
Like Google Colab, Jupyter Notebooks (formerly known as IPython Notebooks) are interactive web-based environments for creating and sharing documents that contain live code, equations, visualisations, and explanatory text. They support various programming languages, and Python is one of the most widely used within the platform.
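Behind the scenes, a Notebook (.ipynb) file is stored as JSON, with a list of cells that are each marked as code or as Markdown text. The sketch below builds a tiny hand-written notebook and counts its cells by type; the helper name `summarise_cells` and the cell contents are hypothetical, and real Notebooks carry more metadata than is shown here.

```python
import json
from collections import Counter

# A minimal notebook written out by hand for illustration.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hello')"]},
    ],
})


def summarise_cells(notebook_text):
    """Count the cells in a notebook's JSON text, grouped by cell type."""
    notebook = json.loads(notebook_text)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


cell_counts = summarise_cells(notebook_json)
print(cell_counts)
```

To try this on a downloaded Notebook, read the file’s text with `open(...).read()` and pass it to the function in place of `notebook_json`.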
If you want to use Jupyter Notebooks locally, you can install the software via Python’s package manager, pip. Once it is installed, you can launch a local Notebook server using the command `jupyter notebook` in your terminal. This opens a web-based interface in your default browser, allowing you to create, edit, and run notebooks stored on your local machine.
The addition of Python Notebooks to the Lab’s slate of collateral designed to support research and pedagogy provides an invaluable bridge between work carried out inside the platform and work done externally, using other tools and methodologies. The wide range of open educational resources available for the beginner Pythonista provides accessible entry points for the novice.
If you enjoyed reading this blog post, check out others in the ‘Notes from our DH Correspondent’ series, which include: