Version User Scope of changes
Aug 29 2009, 2:10 PM EDT (current) DeborahAnderson 29 words added, 9 words deleted
Aug 28 2009, 7:39 PM EDT DeborahAnderson 134 words added, 11 words deleted

On this page:
How do we define 'other standards'?
An aside...Standards vs. Best Practices
How do we encourage adoption of standards in linguistics?
Data sharing: the publication model
Standards are great, now how do I use them?
How can I participate in the creation of ISO standards?

How do we define 'other standards'?

For the purposes of this workshop, we take our domain of interest to be standards related to the sharing of language data within the linguistics community. The discussion is organized around a division of data sharing into four subtopics (a questionable division, because the four are deeply interrelated):

  1. Storage of digital data
  2. Retrieval and discoverability of digital data (i.e. discoverability at the document or resource level)
  3. Search of digital data (i.e. discoverability at the within-document level)
  4. Access and reusability of digital data


The table below displays some of the key areas of concern for each of the four topics we take to fall within our domain of interest.

STORAGE | RETRIEVAL | SEARCH | ACCESS/REUSE
METADATA | METADATA | METADATA | METADATA
DIGITIZATION: of both primary data & metadata | VERSIONING: tracking major changes/decisions made and motivations for same | SOURCE MATERIALS: linking to audio/video source | CITATION STANDARDS: how to cite datasets
FORMATS & STANDARDS: open access standards & formats | DIGITAL FINGERPRINTING | ** ANNOTATION CONVENTIONS: collection and dissemination of conventions used in existing data collections | PRIVACY/LEGAL ISSUES: related to user access, privilege assignment, copyright and data ownership
LEGACY DATA: providing within-subfield model for best practices data sharing | ADAPTIVE CODING: ability to adjust data coding scheme as knowledge evolves | CONSISTENCY: consistency and quality of data annotations | ** SUBFIELD-SPECIFIC USABILITY CONCERNS: specialized standards, metadata sets, ontologies, etc.
PUBLICATION | STABLE ADDRESSING OF RESOURCES | | REUSABILITY: repurposing of data for use in addressing new research questions by both humans and machines
ARCHIVING | WEB STANDARDS | |


** indicates action items listed in the table that may be implemented immediately or in the near term.



An aside...standards vs. best practices (a distinction with a difference?)

We weren’t certain that linguistics, as a field or academic culture, has a tradition of clearly differentiating standards from best practices. The field has no organizational body that sets standards. Subfields are autonomous and vary in the extent to which standards or best practices are discussed, named and adhered to. We nonetheless considered a few possible distinctions that might generally be made between the two, so that we can be as clear as possible about what we mean when we use the terms “standard” and “best practice” in these wiki pages:

STANDARDS:
  • Often, these are theory-neutral conventional systems for accomplishing some task (often related to analysis, description, or publication) in linguistics (e.g., the IPA system for phonetic transcription)
  • Named (so practitioners may name standards to which their practices adhere in published work, for example)
  • Official: handed down from a high-level organization charged with regulating usage, nomenclature, etc.; new standards explicitly supersede prior or existing ones
  • Use is subject to sanction or mandate
  • Developed over time via a process involving the deliberations of an organizational body of experts, after discussion and consensus
  • Follow from best practices, ranked and subjected to selection
  • links: discussions regarding standards
  • links: political issues

BEST PRACTICES:
  • Often, principled practices rather than mandated systems for accomplishing some analytical, descriptive, or publication-related task in linguistics
  • Recommended, but not strictly enforced
  • Generated in a bottom-up process by practitioners who wish to build consensus in practice and are often interested in motivating the need for a particular practice


How do we encourage adoption of standards in linguistics?

Data sharing: the publication model offers some possibilities for building incentives to adopt standards, such as acknowledging the use of annotated corpora and giving and receiving credit for the use of annotated and marked-up data (as a scholarly practice of value to the field). Working Group 5 explored ways that other disciplines are sharing data, so that we may learn from these examples.
Publication mechanisms for linguistic data collections are one possibility for encouraging adoption of standards.

  1. Receiving academic credit for publication of data would provide a needed incentive for the extra work required to ensure that standards are followed.
  2. Peer review will improve the quality of shared data.
  3. Publication and proper citation of data facilitate demonstrating the scholarly contribution made by providing the data.
  4. Publication of legacy data would provide a valuable training ground for young researchers as well as providing a model for preparation of data according to best practices.

Standards are great, now how can I use them?
Widespread use of standards and/or best practices just won't happen unless it is easy for people to:
  1. Locate information about standards and what they entail.
  2. Learn how to apply the standards to their own data.
Of course, a commitment to communication, collaboration, coordination, community building and open access to data is crucial for supporting the use of standards. Working Group 7 discussed this issue.

How can I participate in the creation of ISO standards?
ISO is home to a wide array of standards, and the process of standardization can appear to be opaque and daunting to the outsider. A short page devoted to how linguists can participate in ISO standards development is located at:
How to Get Involved in ISO Standards Development. (This page also includes a short section on how to participate in the development of the Unicode Standard.)

By actively participating in ISO standardization, linguists will have a vested interest in using the resulting standards and advocating their use. Involvement by linguists also helps ensure that standards are suited to current needs and have not become fossilized. Ideally, participants should get recognition from their host institution for work on standards development, a job that often requires many hours of time and (at times) considerable personal expense.

Future work:
WG2: Big ideas -- not yet written up
  1. simple but powerful tools
  2. privacy and ethics concerns must be considered