OLAC Record
oai:www.clarin.si:11356/1069

Metadata
Title:Spoken corpus Gos VideoLectures 1.0 (transcription)
Bibliographic Citation:http://hdl.handle.net/11356/1069
Creator:Verdonik, Darinka
Potočnik, Tomaž
Sepesy Maučec, Mirjam
Erjavec, Tomaž
Date (W3CDTF):2016-08-02T09:42:47Z
Date Available:2016-08-02T09:42:47Z
Description:Gos Videolectures is an add-on to the Gos reference speech corpus of Slovene (http://hdl.handle.net/11356/1040) and covers public academic speech. The Gos Videolectures recordings are a selection of public lectures available through the Videolectures.net web portal provided by the Jožef Stefan Institute; this first release covers 4.5 hours of speech. This resource contains only the transcriptions of the corpus; the audio recordings are available at the CLARIN.SI handle http://hdl.handle.net/11356/1070. All transcriptions for Gos Videolectures were made manually and carefully checked. Transcription mainly followed the guidelines of the Gos corpus (http://www.korpus-gos.net/Support/About). The transcriptions were made with the tool Transcriber 1.5.1 (http://trans.sourceforge.net/en/presentation.php), which can also be used to read transcriptions (.trs files) or export them to other formats. The transcriptions comprise the TRS files with tabular metadata, as well as their conversions to TEI and to the CWB vertical file format. Each recording has two TRS files, one with the phonetic and the other with the normalised transcription. The TEI and CWB encodings join these two transcriptions at the token level, and the normalised words are also automatically PoS-tagged and lemmatised. The corpus can be used for training continuous speech recognition for Slovene, for phonetic research, or for any other research on Slovene academic speech.
Identifier (URI):http://hdl.handle.net/11356/1069
Is Replaced By (URI):http://hdl.handle.net/11356/1158
Language:Slovenian
Language (ISO639):slv
Publisher:Faculty of Electrical Engineering and Computer Science, University of Maribor
Rights:Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
https://creativecommons.org/licenses/by-nc/4.0/
Subject:speech database
spoken corpus
academic speech
speech transcription
speech recognition
TEI
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1069
DateStamp:  2018-10-18
GetRecord:  OAI-PMH request for simple DC format
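
The GetRecord entries above are OAI-PMH requests for this record. A minimal sketch of how such a request URL is assembled, assuming the standard OAI-PMH query parameters (`verb`, `identifier`, `metadataPrefix`), the conventional prefixes `olac` (OLAC format) and `oai_dc` (simple Dublin Core), and a hypothetical placeholder endpoint, since this record does not state the repository's actual OAI-PMH base URL. The helper name `getrecord_url` is likewise illustrative.

```python
from urllib.parse import urlencode


def getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical placeholder: substitute the repository's real OAI-PMH endpoint.
BASE_URL = "https://example.org/oai/request"
# OAI identifier taken from this record.
RECORD_ID = "oai:www.clarin.si:11356/1069"

# "olac" asks for the OLAC format; "oai_dc" asks for simple Dublin Core.
print(getrecord_url(BASE_URL, RECORD_ID, "olac"))
print(getrecord_url(BASE_URL, RECORD_ID, "oai_dc"))
```

`urlencode` percent-encodes the colons and slash in the identifier, so it travels safely as a single query value.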

Search Info

Citation: Verdonik, Darinka; Potočnik, Tomaž; Sepesy Maučec, Mirjam; Erjavec, Tomaž. 2016. Faculty of Electrical Engineering and Computer Science, University of Maribor.
Terms: area_Europe country_SI dcmi_Text iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1069
Up-to-date as of: Tue Aug 20 10:27:02 EDT 2019