Version User Scope of changes
Oct 9 2009, 2:57 PM EDT (current) AliciaBW 1 word added, 3 words deleted
Aug 31 2009, 8:49 AM EDT alexispalmer

As part of the Cyberling2009 workshop at Berkeley, this working group was charged with identifying and documenting existing and needed standards for the digital storage, retrieval, and search of linguistic data. Another concern was the potential for reuse of language data by parties other than the original creators of the data. We present the results of our working sessions as a set of wiki pages, as outlined below.

Members ---
Debbie Anderson, Eric Kansa, Pavel Mihaylov, Johanna Nichols, Alexis Palmer, Alicia Wassink

Process ---

While some standards for storage, retrieval, and search of linguistic data do exist, many subfields of linguistics talk more about "best practices" and "common practices" than about "standards". We discuss how these terms are used on our "Big Ideas" page. We were not tasked with discussing the related and important issue of annotation standards; for that discussion, please see Group 1: Annotation Standards.

Summary of Big Ideas regarding data sharing (storage, retrieval and search):
  • How do we define 'other standards'?
  • Standards vs. Best Practices
  • How do we encourage adoption of standards in linguistics?
  • Data sharing: the publication model (for more on this, see the white paper from Group 4)
  • Standards are great, now how do I use them?
  • How can I participate in the creation of ISO standards?

Issues addressed within the WG2 wiki pages:
  • Unicode character encoding standards for increasing stable display, readability, and sharing of data
  • Relational database storage
  • Wiki-based sharing of research
  • Metadata tags for increased transparency and usability of data
  • Version control
  • Web standards for sharing datasets
  • Machine reusability of data (under construction)

Results ---
  1. A set of examples/case studies demonstrating applications of standards for storage, retrieval, and search and their utility for linguistic research.
  2. A set of subfield-specific seed lists of common practices, requirements, conventions, etc. These lists serve two purposes. First, they should help individual linguists working in the subfield in question. Second, they can serve as reference material for linguists from other areas who might wish to annotate beyond their own research concerns.
  3. A seed list of existing standards for storage, retrieval, and search of linguistic data.
  4. A handful of recommendations regarding not-yet-existent but needed standards for linguistics cyberinfrastructure.
  5. Additional resources: relevant links, papers, etc.

Notes from working sessions