OLAC Record
oai:lindat.mff.cuni.cz:11234/1-4615

Metadata
Title: A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Documents
Bibliographic Citation: http://hdl.handle.net/11234/1-4615
Creator: Novotný, Vít; Seidlová, Kristýna; Vrabcová, Tereza; Horák, Aleš
Date (W3CDTF): 2021-12-10T12:28:54Z
Date Available: 2021-12-10T12:28:54Z
Description: This is an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
Identifier (URI): http://hdl.handle.net/11234/1-4615
Language: German; Czech; Latin; English
Language (ISO639): deu; ces; lat; eng
Publisher: Masaryk University, Brno
Rights: Public Domain Dedication (CC Zero); http://creativecommons.org/publicdomain/zero/1.0/
Subject: ocr; optical character recognition; language identification; image super-resolution; sr; Medieval
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-4615
DateStamp:  2021-12-10
GetRecord:  OAI-PMH request for simple DC format
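
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a minimal sketch, such a request URL could be assembled as shown below. The base URL is a placeholder, because this record does not state the repository's OAI-PMH endpoint. The helper name get_record_url is hypothetical, and the metadata prefix "olac" is an assumption about how the repository labels the OLAC format ("oai_dc" is the prefix OAI-PMH defines for simple Dublin Core).

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the record does not give the
# repository's actual OAI-PMH base URL, so substitute the real one.
BASE_URL = "https://example.org/oai/request"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # the OaiIdentifier of the record
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


oai_id = "oai:lindat.mff.cuni.cz:11234/1-4615"

# "olac" is an assumed prefix for the OLAC format; "oai_dc" is simple DC.
olac_url = get_record_url(oai_id, "olac")
dc_url = get_record_url(oai_id, "oai_dc")

print(olac_url)
print(dc_url)
```

The identifier is percent-encoded because it contains ":" and "/" characters, which urlencode escapes in query values.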

Search Info

Citation: Novotný, Vít; Seidlová, Kristýna; Vrabcová, Tereza; Horák, Aleš. 2021. Masaryk University, Brno.
Terms: area_Europe country_CZ country_DE country_GB country_VA dcmi_Text iso639_ces iso639_deu iso639_eng iso639_lat olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-4615
Up-to-date as of: Thu Oct 5 0:43:08 EDT 2023