OLAC Record
oai:www.ldc.upenn.edu:LDC98T32

Metadata
Title:JURIS
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Canavan, Alexandra, and Paul Morgovsky. JURIS LDC98T32. Web Download. Philadelphia: Linguistic Data Consortium, 1998
Contributor:Canavan, Alexandra
Morgovsky, Paul
Date (W3CDTF):1998
Description:*Introduction* This publication is a release of the JURIS (Justice Department Retrieval and Inquiry System) data collection, made available to the Linguistic Data Consortium (LDC) by the U.S. Department of Justice. The texts range in date from the 1700s to the early 1990s.
*Data* There are 1,664 individual text files in the corpus, 1,011 on the first CD-ROM and 653 on the second. The original archive consisted of 219 files ranging from less than 1 MB to nearly 70 MB in size. To make the data more accessible for research use, we divided the larger files into pieces so that the average file size is about 2 MB when uncompressed (the largest uncompressed file is about 4.5 MB). Files were divided at document boundaries, so every file contains whole documents. There are a total of 694,667 document units in the corpus, and these can be categorized to some extent by content. The following is a partial list of categories and their descriptions drawn from the JURIS documentation contained in the corpus. The terminology and organization of categories are those used in the JURIS documentation:
* Case Law
* Executive Order
* Regulations
* Federal Register
* Statutory Law
* Administrative Law
* International Agreements
* Freedom of Information Act and related documents
* Indian Law
* Tax Law
* Brief
Many of the documents contain Social Security Numbers of the parties involved; these have been redacted to protect the privacy of those individuals. All valid Social Security Numbers have been replaced with the string XXX-XX-XXXX. In some documents, number strings are identified as Social Security Numbers but are in fact substitutions, such as the series 123-45-6789 or 987-65-4321. These ersatz numbers have been left unchanged. Some personal names have also been redacted and replaced with XXXXXXXX.
*Updates* There are no updates at this time.
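The redaction rule stated in the description above (Social Security Numbers replaced with XXX-XX-XXXX, the ersatz series 123-45-6789 and 987-65-4321 left unchanged) might be sketched as follows. This is an illustration only, not the procedure LDC actually used: the function name `redact_ssns`, the regular expression, and the assumption that every hyphenated 3-2-4 digit string other than the two ersatz series counts as a "valid" number are all hypothetical.

```python
import re

# The two placeholder series named in the description; these stay as they are.
ERSATZ = {"123-45-6789", "987-65-4321"}

# Assumption: an SSN is written as 3, 2, and 4 digits joined by hyphens,
# and is not embedded in a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def redact_ssns(text: str) -> str:
    """Replace SSN-shaped strings with XXX-XX-XXXX, keeping the ersatz series."""

    def _replace(match: re.Match) -> str:
        value = match.group(0)
        return value if value in ERSATZ else "XXX-XX-XXXX"

    return SSN_PATTERN.sub(_replace, text)
```

Under these assumptions, `redact_ssns("SSN 219-09-9999, sample 123-45-6789")` masks the first number and leaves the placeholder untouched.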
Extent:Corpus size: 880640 KB
Identifier:LDC98T32
https://catalog.ldc.upenn.edu/LDC98T32
ISBN: 1-58563-135-3
ISLRN: 380-519-899-609-3
DOI: 10.35111/4w95-zg53
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC98T32
Rights Holder:Portions © 1998 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC98T32
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
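The GetRecord entries refer to OAI-PMH requests for this record. A minimal sketch of how such a request URL could be assembled is shown below. The `verb`, `identifier`, and `metadataPrefix` parameters and the `oai_dc` prefix for simple Dublin Core come from the OAI-PMH protocol; the function name `get_record_url` and the base URL `https://example.org/oai` are placeholders, since the archive's actual endpoint is not given in this record.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Hypothetical endpoint; substitute the archive's real OAI-PMH base URL.
url = get_record_url("https://example.org/oai", "oai:www.ldc.upenn.edu:LDC98T32")
```

Passing a different `metadata_prefix` would request another format, provided the repository supports it.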

Search Info

Citation: Canavan, Alexandra; Morgovsky, Paul. 1998. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC98T32
Up-to-date as of: Mon Mar 25 7:20:05 EDT 2024