Group 2: Standards for Storage, Retrieval, and Search of Data

As part of the Cyberling2009 workshop at Berkeley, this working group was charged with identifying and documenting existing and needed standards for the digital storage, retrieval, and search of linguistic data. Another concern was the potential for reuse of language data by parties other than the original creators of the data. We present the results of our working sessions as a set of wiki pages, as outlined below.

Members ---
Debbie Anderson, Eric Kansa, Pavel Mihaylov, Johanna Nichols, Alexis Palmer, Alicia Wassink

Process ---

While some standards for the storage, retrieval, and search of linguistic data do exist, many subfields of linguistics talk more about "best practices" and "common practices" than about "standards". We discuss how these terms are used on our Big Ideas page. We were not tasked with discussing the related and important issue of annotation standards; for that, please see Group 1: Annotation Standards.

Summary of Big Ideas regarding data sharing (storage, retrieval and search):
  • How do we define 'other standards'?
  • Standards vs. Best Practices
  • How do we encourage adoption of standards in linguistics?
  • Data sharing: the publication model (for more on this, see the white paper from Group 4)
  • Standards are great, now how do I use them?
  • How can I participate in the creation of ISO standards?

Issues addressed within the WG2 wiki pages:
  • Unicode character encoding standards for more stable display, readability, and sharing of data
  • Relational database storage
  • Wiki-based sharing of research
  • Metadata tags for increased transparency and usability of data
  • Version control
  • Web standards for sharing datasets
  • Machine reusability of data (under construction)

Results ---
  1. A set of examples/case studies demonstrating applications of standards for storage, retrieval, and search and their utility for linguistic research.
  2. A set of subfield-specific seed lists of common practices, requirements, conventions, etc. The purpose of these lists is twofold. First, they should be helpful to individual linguists working in the subfield in question. Second, they can serve as reference material for linguists from other areas who might wish to annotate beyond their own research concerns.
  3. A seed list of existing standards for storage, retrieval, and search of linguistic data.
  4. A handful of recommendations regarding not-yet-existent but needed standards for linguistics cyberinfrastructure.
  5. Additional resources: relevant links, papers, etc.

Notes from working sessions



Latest page update: made by AliciaBW, Oct 9 2009, 2:57 PM EDT

Thread: standards for ethical treatment of human subjects
Started by mebeckman, Jul 14 2009, 6:18 PM EDT (1 reply; last post Jul 15 2009, 10:49 AM EDT by alexispalmer)
In Working Group 1, we've been pondering where/how to make sure that there is a page that addresses privacy and ethical treatment of human subjects, which seems to be an issue that cross-cuts the charge to Group 2 and Group 4 as well as touching on the charge to us. Could we figure out how best to make sure that this doesn't slip through the cracks between the groups?

Thread: Repository systems
Started by coffee001, Jul 12 2009, 6:55 AM EDT (0 replies)
Not sure if it makes sense in this context, but www.escidoc.org and www.fedora-commons.org might be relevant in relation to long-term archiving. Not really standards, but "infrastructure", though.

Attachment: jn_cyberling_reanalysis_trail.ppt (PowerPoint presentation, 1,356k)
posted by Johanna.Nichols, Jul 18 2009, 6:49 PM EDT
DRAFT example of analysis history