Measuring Tool Bias & Improving Data Quality for Digital Humanities Research
2020 · Online · Thesis (Hochschulschrift)
Cultural heritage institutions increasingly make their collections digitally available. Consequently, users of digital archives need to familiarize themselves with a variety of new digital tools. This is particularly true for humanities scholars who include the results of their analyses in their publications. Judging whether insights derived from these analyses constitute a real trend, or whether a potential conclusion is merely an artifact of the tools used, can be difficult.

To correct errors in data, human input is in many cases still indispensable. Since experts are expensive, we conducted a study showing how crowdsourcing tasks can be designed to allow lay users to contribute information at expert level, increasing the number and quality of descriptions of collection items. However, to improve the quality of their data effectively, data custodians need to understand the (search) tasks their users perform and the level of trustworthiness users expect from the results. Through interviews with historians, we studied their use of digital archives and classified typical research tasks and their requirements for data quality.

Most archives provide, at best, very generic information about the data quality of their digitized collections. Humanities scholars, however, need to be able to assess how data quality and inherent tool bias influence their research tasks. Therefore, they need specific information on the data quality of the subcollection used and on the biases the provided tools may have introduced into the analyses. We studied whether access to a historic newspaper archive is biased, and which types of documents benefit from, or are disadvantaged by, the bias. Using real and simulated search queries and page view data of real users, we investigated how well typical retrievability studies reflect the users' experience.
We discovered large differences in the characteristics of the query sets and in the results for different parameter settings of the experiments. Within digital archives, OCR errors are ...
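The retrievability studies mentioned above can be sketched in a few lines. The sketch below follows the standard retrievability measure (a document's score counts how many queries retrieve it within a rank cutoff c), and reports the Gini coefficient over the scores as the usual summary of access bias. The toy corpus, the term-overlap search function, and the query set are illustrative stand-ins, not the archive or queries used in the thesis.

```python
from collections import Counter

def retrievability_scores(queries, run_search, cutoff=10):
    """Count, per document, the number of queries for which the
    document appears within the top-`cutoff` results."""
    scores = Counter()
    for q in queries:
        for doc_id in run_search(q)[:cutoff]:
            scores[doc_id] += 1
    return scores

def gini(values):
    """Gini coefficient of a list of non-negative scores
    (0 = perfectly equal access, 1 = maximally unequal)."""
    vals = sorted(values)
    n, total = len(vals), sum(vals)
    if total == 0:
        return 0.0
    cum = sum((i + 1) * v for i, v in enumerate(vals))
    return 2 * cum / (n * total) - (n + 1) / n

# Toy "archive": rank documents by term overlap with the query.
docs = {
    "d1": "amsterdam newspaper archive history",
    "d2": "newspaper ocr errors digitization",
    "d3": "weather report",
}

def run_search(query):
    terms = set(query.split())
    overlap = lambda d: len(terms & set(docs[d].split()))
    return [d for d in sorted(docs, key=lambda d: -overlap(d)) if overlap(d)]

queries = ["newspaper archive", "ocr errors", "newspaper"]
r = retrievability_scores(queries, run_search, cutoff=2)
bias = gini([r.get(d, 0) for d in docs])  # d3 is never retrieved
```

Varying `cutoff` and swapping real for simulated query sets is exactly the kind of parameter change whose effect on the resulting bias estimate the thesis examines.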
| Field | Value |
|---|---|
| Title | Measuring Tool Bias & Improving Data Quality for Digital Humanities Research |
| Author / Contributors | Traub, Myriam Christine; Afd. Interaction; Hardman, Lynda; Van Ossenbruggen, J.R. |
| Published | 2020 |
| Media type | Thesis (Hochschulschrift) |