ISSN 1938-4122
DHQ: Digital Humanities Quarterly
2021 15.4
Articles
[en] A Named Entity Recognition Model for Medieval Latin Charters
Pierre Chastang, UVSQ-Université Paris-Saclay; Sergio Torres Aguilar, UVSQ-Université Paris-Saclay; Xavier Tannier, Sorbonne Université
Abstract
[en]
Named entity recognition is an advantageous technique with an increasing presence in
digital humanities. In theory, automatic detection and recovery of named entities can provide new ways
of looking up unedited information in edited sources and can allow the parsing of a massive amount of
data in a short time for supporting historical hypotheses. In this paper, we detail the implementation
of a model for automatic named entity recognition in medieval Latin sources and we test its robustness
on different datasets. Different models were trained on a vast dataset of Burgundian diplomatic charters
from the 9th to 14th centuries and validated by using general and century ad hoc models tested on short
sets of Parisian, English, Italian and Spanish charters. We present the results of cross-validation in each
case and we discuss the implications of these results for the history of medieval place-names and personal names.
[en] Modernism and Gender at the Limits of Stylometry
Sean Weidman, Pennsylvania State University; Aaren Pastor, Pennsylvania State University
Abstract
[en]
Virginia Woolf writes in her novel Orlando that “it is clothes that wear us and
not we them; we may make them take the mould of arm or breast, but they mould our
hearts, our brains, our tongues to their liking.” Her observation remains vital
to the author’s longstanding, feminist critique of essentializing discourses,
but it also gives recourse to the ways our computational methods of studying
literature and its history sometimes “mould […] to their liking” their objects.
When studying style and gender, digital humanists have tended to use
computational methods to trace embedded, hidden linguistic structures, showing
how they can contain, conceal, and condition the gendered lives of social groups
and cultural milieus. In this essay, we present a stylometric case study — Woolf’s
Orlando — that reminds us why, when dealing with gender in modern literature,
computational critics must take particular care when addressing and generalizing
from modernism’s experimental styles. We outline the limitations of our own and
prior approaches to questions of gender and literary style, and we ask whether
flawed stylometric analyses can acquaint us with ways that modernism’s stylistic
innovations productively haunt the conclusions of digital literary
criticism.
[en] Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset
Benjamin Lee, The Library of Congress & The University of Washington
Abstract
[en]
The increasing roles of machine learning and artificial intelligence in the construction
of cultural heritage and humanities datasets necessitate critical examination of the
myriad biases introduced by machines, algorithms, and the humans who build and deploy
them. From image classification to optical character recognition, the effects of
decisions ostensibly made by machines compound through the digitization pipeline and
redouble in each step, mediating our interactions with digitally-rendered artifacts
through the search and discovery process. As a result, scholars within the digital
humanities community have begun advocating for the proper contextualization of cultural
heritage datasets within the socio-technical systems in which they are created and
utilized. One such approach to this contextualization is the data
archaeology, a form of humanistic excavation of a dataset that Paul Fyfe defines
as “recover[ing] and reconstitut[ing] media objects within their changing ecologies.”
Within critical data studies, this excavation of a dataset, including its
construction and mediation via machine learning, has proven to be a capacious approach.
However, the data archaeology has yet to be adopted as standard practice among cultural
heritage practitioners who produce such datasets with machine learning.
In this article, I present a data archaeology of the Library of Congress’s Newspaper Navigator dataset, which I created as part of the
Library of Congress’s Innovator in Residence program. The dataset
consists of visual content extracted from 16 million historic newspaper pages in the Chronicling America database using machine learning techniques. In
this case study, I examine the manifold ways in which a Chronicling
America newspaper page is transmuted and decontextualized during its journey
from a physical artifact to a series of probabilistic photographs, illustrations, maps,
comics, cartoons, headlines, and advertisements in the Newspaper
Navigator dataset. Accordingly, I draw from fields of scholarship
including media archaeology, critical data studies, science and technology studies, and
autoethnography throughout.
To excavate the Newspaper Navigator dataset, I consider
the digitization journeys of four different pages in Black newspapers included in Chronicling America, all of which reproduce the same photograph of
W.E.B. Du Bois in an article announcing the launch of The Crisis,
the official magazine of the NAACP. In tracing the newspaper pages’ journeys, I unpack
how each step in the Chronicling America and Newspaper Navigator pipelines, such as the imaging process and the construction
of training data, not only imprints bias on the resulting Newspaper
Navigator dataset but also propagates the bias through the pipeline via the
machine learning algorithms employed. Along the way, I investigate the limitations of
the Newspaper Navigator dataset and machine learning techniques
more generally as they relate to cultural heritage, with a particular focus on
marginalization and erasure via algorithmic bias, which implicitly rewrites the archive
itself.
In presenting this case study, I argue for the value of the data archaeology as a
mechanism for contextualizing and critically examining cultural heritage datasets within
the communities that create, release, and utilize them. I offer this autoethnographic
investigation of the Newspaper Navigator dataset in the hope that
it will be considered not only by users of this dataset in particular but also by
digital humanities practitioners and end users of cultural heritage datasets writ large.
[en] Classifying and Contextualizing Edits in Variants with Coleto: Three Versions of Andy Weir’s The Martian
Erik Ketzan, Trinity College Dublin; Christof Schöch, University of Trier
Abstract
[en] This paper introduces Coleto, an automatic collation tool for the comparison of variant
texts in English, German, or French, which separates edits from variant texts so that
textual changes can be classified and contextualized. Coleto’s proposed methodology for
the classification of edits in variants includes: major/minor expansion, major/minor
condensation, changes to numbers and whitespace, and common orthographic features. From
this classification schema, Coleto generates: an aligned table of edits in the variants,
visualizations of the frequency of classified edits, and a visualization of edit density
across the progression of the texts. As a sample use case, we present mixed-method
analyses of Andy Weir’s science fiction bestseller, The Martian,
aided by Coleto’s functions and generated outputs. Code available at: https://github.com/dh-trier/coleto
[en] Character Recognition Of Seventeenth-Century Spanish American Notary Records Using Deep Learning
Nouf Alrasheed, Department of Computer Science & Electrical Engineering, University of Missouri-Kansas City; Praveen Rao, Department of Health Management & Informatics, Department of Electrical Engineering & Computer Science, University of Missouri-Columbia; Viviana Grieco, Department of History, University of Missouri-Kansas City
Abstract
[en]
Handwritten character recognition is a challenging
pattern recognition problem due to the inconsistency of the handwritten scripts and
the lack of accurate labeled data. Historical documents written in cursive are even
more challenging as characters have unique and varying shapes. Frequently, words are
linked by lines and ornamental doodles. When historical documents are digitized, the
images contain various types of noise and degradation, which further complicates the
recognition of characters. In this paper, we present an empirical study of how well
state-of-the-art convolutional neural networks (CNNs) for image classification
perform for the task of recognizing handwritten characters in seventeenth-century
Spanish American notarial scripts. Professional historians, paleography experts and
trained labelers were involved in preparing the labeled dataset of Spanish
characters for training the CNNs. The
labeled dataset used in this experiment was created from the manuscripts written by
one of the multiple scribes that contributed to the collection of approximately
220,000 digitized images of notary records housed at the
Archivo General de la Nación Argentina (National
Archives). We removed the noise in these images by applying standard image
processing techniques. After training different CNNs, we computed the classification
accuracy for all the characters. We observed that ResNet-50 achieved a promising
accuracy of 97.08% compared to InceptionResnet-V2, Inception-V3, and VGG-16, which
achieved 96.66%, 96.33% and 70.91%, respectively.
[en] Finding Narratives in News Flows: The Temporal Dimension of News Stories
Blanca Calvo Figueras, University of Groningen; Tommaso Caselli, University of Groningen; Marcel Broersma, Centre for Media and Journalism Studies, University of Groningen
Abstract
[en]
Previous studies indicate that the capacity of media to influence the salience of
issues in the public realm is strongly dependent on specific attributes that
characterize these issues. In this work, we investigate two internal aspects of issue
types related to the attribute of duration. First, we address whether news stories
belonging to different issue types can be identified and represented using a set of
quantifiable temporal dimensions (i.e. lifespan, intensity, and burstiness). Second,
we conduct a qualitative analysis to investigate whether news stories of different
issue types have different narrative patterns, regardless of their specific topic. We
use a corpus of 50,385 political news articles in Spanish from 2018 as a case study,
and propose a novel system to aggregate the articles into stories. Our results show
that stories belonging to different issue types do have distinguishing behaviours,
especially along the intensity dimension. At the same time, the qualitative analysis
indicates a tendency to associate narrative patterns with issue types. This analysis
shows the potential of using news stories as research units to study framing
strategies.
[en] Innovation Through Collaboration in Humanities Research
Maria Bonn, School of Information Sciences, University of Illinois Urbana-Champaign; Harriett Green, Washington University in St. Louis; Angela Courtney, Indiana University Bloomington; Megan Senseney, University of Arizona
Abstract
[en]
“Humanities Collaboration and Research Practices: Exploring
Scholarship in the Global Midwest” was funded by a Humanities Without Walls
(HWW) Global Midwest award to explore the community of practice engaged in the HWW
Global Midwest initiative. Led by Harriett Green, then at the University of Illinois
at Urbana-Champaign, and Angela Courtney from Indiana University Bloomington, the
“Humanities Collaboration and Research Practices”
project (hereafter referred to as HCRP) examined the collaborative research practices
of HWW Global Midwest awardees to understand how humanities research happens at the
level of practice, process, and collaboration. The project team conducted
semi-structured interviews from fall 2015 through spring 2016 with twenty-eight
researchers who participated in projects funded by the first round of HWW Global
Midwest awards. Participants were asked about the aims of their collaborative
projects, the processes for developing their collaborations, the types of resources
used to support collaboration and project management (and whether additional
resources are required), the challenges they encountered, data sharing practices, and
how their research approaches and methodologies were influenced by engaging in
collaborative research. What emerges from these interviews is a rich portrait of the
ongoing evolution of collaborative humanities research and its social and
intellectual benefits, both actual and still potential, as well as indications of the
institutional and cross-institutional support and development needed to realize that
potential.
Reviews
[en] The Age Old Question: A Review of What is Digital History? by Hannu Salmi
Tracy L. Barnett, University of Georgia
Abstract
[en]
A book review of Hannu Salmi’s What is Digital History?
Salmi’s work offers a clear and concise introduction to the history of digital
humanities, notes the current state of the field, and addresses common criticisms faced
by digital humanists. In particular, this book would be of interest to those teaching
undergraduate or graduate courses in historical methodology, public history, or
digital history.
URL: http://www.digitalhumanities.org/dhq/vol/15/4/index.html
Comments: dhqinfo@digitalhumanities.org
Published by: The Alliance of Digital Humanities Organizations and The Association for Computers and the Humanities
Affiliated with: Digital Scholarship in the Humanities
DHQ has been made possible in part by the National Endowment for the Humanities.
Copyright © 2005 -
Unless otherwise noted, the DHQ web site and all DHQ published content are published under a Creative Commons Attribution-NoDerivatives 4.0 International License. Individual articles may carry a more permissive license, as described in the footer for the individual article, and in the article’s metadata.