Person-Centric Mining of Historical Newspaper Collections
Journal
Research and Advanced Technology for Digital Libraries
ISSN
0302-9743
1611-3349
Date Issued
2016
Author(s)
Editor(s)
Fuhr, N.
Kovacs, L.
Risse, T.
Nejdl, W.
DOI
10.1007/978-3-319-43997-6_25
Abstract
We present a text mining environment that supports entity-centric mining of terascale historical newspaper collections. Information about entities and their relations to each other is often crucial for historical research. However, most text mining tools provide only very basic support for dealing with entities, usually limited to entity tagging. Historians, on the other hand, are typically interested in the relations between entities and the contexts in which they are mentioned. In this paper, we focus on person entities. We provide an overview of the tool and describe how person-centric mining can be integrated into a general-purpose text mining environment. We also discuss our approach to automatically extracting person networks from newspaper archives, including a novel method for person name disambiguation that is particularly suited to the newspaper domain and obtains state-of-the-art disambiguation results.