Language documentation common practices

NOTE: this structure is offered only as a suggested skeleton. Please modify as you see fit.

Introduction:
The fieldworker (and this includes documentary linguists) needs to learn the structure of the language and produce a grammatical description, dictionary, and texts usable by researchers and others (including interested members of the speech community). He or she is usually also singlehandedly responsible for producing theoretical and comparative work on the language. Not much will be said here about producing grammars (though see Dryer 2006) or the general scholarly work, other than to note that the whole set of responsibilities gives the fieldworker a considerably heavier workload than the average for linguists, so tools and standards that save time are especially needed. The tasks in need of standards and tools for sharing and access are: recording and/or digitizing; transcribing; creating texts; creating a lexicon; and archival storage and access of recordings, texts, and lexicon (and perhaps other materials).

Recommended Readings:
Dryer, Matthew S. 2006. Descriptive theories, explanatory theories, and basic linguistic theory. In Felix Ameka, Alan Dench and Nicholas Evans, eds., Catching Language: The Standing Challenge of Grammar Writing. Berlin: Mouton de Gruyter.

Software:
(See Existing standards... below.)

Existing standards, common practices, and/or best practices:
Recording: Information available from the DoBeS project of the Max Planck Institute for Psycholinguistics, Nijmegen; the Hans Rausing Endangered Languages Project, SOAS, London; and others.

Transcribing: Many field linguists use Transcriber, available from SourceForge.net.

Text work: The main tasks are inputting or importing transcribed material, morphological interlinearizing, syntactic annotation, parsing, lemmatization, concordance building, and preparation for corpus searching and archiving. For standards for some kinds of interlinearization and annotation, see the report of Working Group 1. To my knowledge, no widely used software tool currently qualifies as best practice, though there is a growing consensus about what such tools need to do (e.g. support multiple annotation tiers; link to recordings; enable non-interlinear annotation for discontinuous, non-compositional, multi-word, and non-linear categories and functions; enable printout of text segments with standard publishable interlinears; enable corpus searches of all kinds; enable other access). Very little is known, too, about what theory-neutral syntactic annotation should contain.
A list of tools dated 2004 and including some text tools is here. I am aware of these updates and additions: Kura; and a glossing tool under development by Thomas Mayer (presented at ALT8, July 2009). The DoBeS project has elaborate and fairly specialized tools for text and dictionary work.
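
To make the requirements listed above concrete, here is a minimal Python sketch of a text record with several annotation tiers (segmentation, gloss, free translation), a time link to a recording, a printable interlinear, and a simple morpheme concordance. It is an illustration only, not the data model of any existing tool: the class names, the field names, the reference format "text1.001", and the time offsets are all assumptions, and the English example is used purely as a placeholder.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Word:
    form: str             # surface form as transcribed
    morphemes: list[str]  # morpheme segmentation tier
    glosses: list[str]    # morpheme-by-morpheme gloss tier


@dataclass
class Utterance:
    ref: str              # hypothetical identifier for the text segment
    start: float          # offset into the recording, in seconds (assumed unit)
    end: float
    words: list[Word]
    translation: str      # free translation tier

    def interlinear(self) -> str:
        """Return aligned segmentation and gloss lines plus the translation."""
        segs = ["-".join(w.morphemes) for w in self.words]
        glss = ["-".join(w.glosses) for w in self.words]
        # Pad each column to the wider of its two cells so the tiers line up.
        widths = [max(len(s), len(g)) for s, g in zip(segs, glss)]
        line1 = "  ".join(s.ljust(n) for s, n in zip(segs, widths)).rstrip()
        line2 = "  ".join(g.ljust(n) for g, n in zip(glss, widths)).rstrip()
        return f"{line1}\n{line2}\n'{self.translation}'"


def build_concordance(utterances):
    """Map each morpheme to the (utterance ref, word form) pairs containing it."""
    index = defaultdict(list)
    for u in utterances:
        for w in u.words:
            for m in w.morphemes:
                index[m].append((u.ref, w.form))
    return dict(index)


# Placeholder data: an English sentence standing in for field data.
utt = Utterance(
    ref="text1.001",
    start=0.0,
    end=1.8,
    words=[
        Word("the", ["the"], ["DEF"]),
        Word("dogs", ["dog", "s"], ["dog", "PL"]),
        Word("barked", ["bark", "ed"], ["bark", "PST"]),
    ],
    translation="The dogs barked.",
)

print(utt.interlinear())
concordance = build_concordance([utt])
print(concordance["dog"])
```

A flat word-by-word structure like this one covers only the interlinear tiers. The discontinuous, multi-word, and non-linear categories mentioned above would need stand-off annotations that point at spans of words rather than at single words, which this sketch does not attempt.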

Dictionary compilation: The most common practice for compiling dictionaries of all kinds (descriptive, defining, etymological, etc.) seems to be the use of commercial database software to create a stand-alone database, one that does not, for example, link to a text corpus.
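
For contrast with the stand-alone database, the following Python sketch shows one way a lexicon could be linked to a text corpus: each entry records the text segments in which its lemma is attested, and lemmas found in the corpus but missing from the lexicon are reported. This is a hypothetical illustration rather than a description of any existing tool. The names `LexEntry` and `link_lexicon`, the corpus layout, and the reference identifiers are all assumptions, and the English forms are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class LexEntry:
    lemma: str
    gloss: str
    # References to corpus segments where the lemma is attested.
    attestations: list = field(default_factory=list)


# Hypothetical lemmatized corpus: segment ref -> (word form, lemma) pairs.
corpus = {
    "text1.001": [("the", "the"), ("dogs", "dog"), ("barked", "bark")],
    "text1.002": [("a", "a"), ("dog", "dog"), ("slept", "sleep")],
}

lexicon = {
    "dog": LexEntry("dog", "domestic canine"),
    "bark": LexEntry("bark", "make a dog's cry"),
}


def link_lexicon(lexicon, corpus):
    """Attach corpus references to entries; return lemmas with no entry yet."""
    missing = set()
    for ref, tokens in corpus.items():
        for _form, lemma in tokens:
            entry = lexicon.get(lemma)
            if entry is None:
                missing.add(lemma)
            elif ref not in entry.attestations:
                entry.attestations.append(ref)
    return sorted(missing)


missing = link_lexicon(lexicon, corpus)
print(lexicon["dog"].attestations)
print(missing)
```

The list of missing lemmas is the practical benefit of the link: it tells the compiler which words occur in texts but still lack a dictionary entry.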

Archival storage and access: Archives have their own standards for metadata and data formats. The DoBeS project has an extensive list of archives with links.

Needed standards and data-sharing resources:

Easy-to-use tools for text work are badly needed. Basic research, leading ultimately to standards, for theory-neutral syntactic annotation is needed.



Latest page update: made by Johanna.Nichols, Aug 28 2009, 8:50 PM EDT