OLAC Record: BulTreeBank Tokenizer

OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-1240

Metadata

Title: BulTreeBank Tokenizer

Bibliographic Citation: http://hdl.handle.net/11372/LRT-1240

Contributor: Simov, Kiril

Creator: Simov, Kiril

Date (W3CDTF): 2014-07-30T21:33:43Z

Date Available: 2014-07-30T21:33:43Z

Description: The tokenizer covers all languages that use the Latin1, Latin2, Latin3 and Cyrillic tables of Unicode, and it can be extended to cover other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK, recognizes over 60 token categories, and can easily be adapted to new token categories.
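
The record does not include the CLaRK grammar itself. As a rough illustration of the cascaded idea only, the following Python sketch makes two passes. First, one regular expression splits text into letter runs, digit runs, whitespace and single punctuation characters. Second, each letter run is refined by the Unicode block of its characters (Latin up to U+017F, Cyrillic U+0400 to U+04FF). The function names and the category labels (`LATIN_WORD`, `CYRILLIC_WORD`, `MIXED_WORD`, `OTHER_WORD`, `NUMBER`, `PUNCT`) are hypothetical. This is not the BulTreeBank grammar and does not reproduce its 60+ categories.

```python
import re

# Stage 1 (hypothetical sketch, not the CLaRK grammar): one regular expression
# splits text into letter runs, digit runs, whitespace runs and single other characters.
TOKEN_RE = re.compile(r"(?P<word>[^\W\d_]+)|(?P<num>\d+)|(?P<space>\s+)|(?P<punct>.)")

# Hypothetical category labels for the non-word groups.
GROUP_CATEGORY = {"num": "NUMBER", "punct": "PUNCT"}


def script_of(ch: str) -> str:
    """Map a character to a coarse script label by Unicode block."""
    cp = ord(ch)
    if cp <= 0x017F:  # Basic Latin, Latin-1 Supplement, Latin Extended-A
        return "latin"
    if 0x0400 <= cp <= 0x04FF:  # Cyrillic block
        return "cyrillic"
    return "other"


def word_category(word: str) -> str:
    """Stage 2: refine a letter run by the scripts of its characters."""
    scripts = {script_of(ch) for ch in word}
    if scripts == {"latin"}:
        return "LATIN_WORD"
    if scripts == {"cyrillic"}:
        return "CYRILLIC_WORD"
    if "other" in scripts:
        return "OTHER_WORD"
    return "MIXED_WORD"  # Latin and Cyrillic letters in one run


def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (token, category) pairs, dropping whitespace."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        group, tok = m.lastgroup, m.group()
        if group == "space":
            continue
        category = word_category(tok) if group == "word" else GROUP_CATEGORY[group]
        tokens.append((tok, category))
    return tokens


if __name__ == "__main__":
    print(tokenize("София е 2014 г., Praha."))
```

A fuller cascade would add further passes over this output, for example to join an abbreviation such as "г." into one token or to recognize dates and numbers with separators.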

Identifier (URI): http://hdl.handle.net/11372/LRT-1240

Language: No linguistic content

Language (ISO639): zxx

Publisher: Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences

Type: toolService

Type (DCMI): Software

OLAC Info

Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:lindat.mff.cuni.cz:11372/LRT-1240

DateStamp: 2021-06-29

GetRecord: OAI-PMH request for simple DC format
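
The GetRecord entries refer to standard OAI-PMH requests. The short Python sketch below shows how such a request URL is formed from the record's OAI identifier. It assumes a placeholder base URL, because the repository's actual OAI-PMH endpoint is not given in this record. `olac` and `oai_dc` are the metadata prefixes conventionally used for the OLAC and simple Dublin Core formats; the function name `get_record_url` is hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the record does not state the repository's OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL with percent-encoded arguments."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    oai_id = "oai:lindat.mff.cuni.cz:11372/LRT-1240"
    print(get_record_url(oai_id, "olac"))    # OLAC format
    print(get_record_url(oai_id, "oai_dc"))  # simple Dublin Core format
```

The identifier's ":" and "/" characters are percent-encoded by `urlencode`, so the identifier can be passed safely as a single query argument.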

Search Info
Citation: Simov, Kiril. 2014. Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences.
Terms: dcmi_Software iso639_zxx

http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-1240
Up-to-date as of: Mon Jun 16 1:04:43 EDT 2025
