Version User Scope of changes
Jul 19 2009, 2:55 PM EDT EmilyMBender 26 words added
Jul 19 2009, 1:53 PM EDT EmilyMBender

The collaboration structure group will be charged with considering methods for enhancing collaboration on three levels. Level 1 involves the linkage of individual researchers to the overall agenda of a shared, open-access, collaborative cyberinfrastructure. Level 2 involves the collaborations that are needed between system developers and tool developers to assure maximum interoperability and open access. Level 3 involves the use of the shared cyberinfrastructure to support collaborations between linguists and researchers in other sciences. For each of these levels, we need to design lightweight methods for ensuring ongoing collaboration and coordination.

The specific agenda items for this group include:
1. Data level interoperability - roundtrip, transduction, funding for process
2. Tool level interoperability and methods for collaboration in tool development.
3. Issues arising from a commitment to open access.
4. Contribution - NSF, NIH, LSA, incentives
5. Characterization of linguistic digital data types and methods for linking to non-digital data.
6. An agenda for developing linkages to other sciences - the bigger picture of Extended Linguistics.
7. Lightweight administration within a framework of complex organizations: NIH, NSF, LSA, CLARIN, etc.

Following are further analyses of these seven agenda items:

1. Data level interoperability.
  • Level 1 compatibility is compatibility of annotation format, such as the formats provided by frameworks like Annotation Graphs (AG) or the Linguistic Annotation Framework's Graph Annotation Format (GrAF).
  • Level 2 compatibility is compatibility of data categories (content), wherein categories are the same conceptually and can be mapped to one another.
  • Level 3 compatibility is ...
2. Tool level interoperability and methods for collaboration in tool development.
  • The LRT, TalkBank, and E-MELD standards are largely similar
  • Media standardization issues for streaming serving
  • Roundtrips between tool formats: CHAT, Anvil, EAF, AG, EXMARaLDA, Wavesurfer, TEI, SALT
  • AG Tools approach
3. Issues arising from a commitment to open access. What data must be kept away from the public and what data can be made freely available? How can linguists work together to increase access to larger amounts of linguistically important data?
  • Sharing principles at talkbank.org/share
  • Legacy vs. forward protocols
  • Community, population constraints
4. Methods for promoting a higher level of data contribution and individual researcher "buy in".
  • Inducements: publication, easy tool linkage
  • Community: role of LSA
  • Obligations and standards: role of NIH, NSF, DARPA, IE
  • Leading role of the European Community
5. Characterization of linguistic digital data types and methods for linking to non-digital data.
Here, it is important to distinguish the emphasis on corpora and linked media from the many other types of digital data that are of interest to linguists. In the area of Linguistic Exploration, the fundamental objects may be word lists, sentence lists, or dictionaries. In Linguistic Anthropology, digitized records of objects are important. This eventually extends to Archaeology and even to information on human genetics. In the Learning Sciences, there is an emphasis on linking classroom video to individual student portfolios that may include letters, tests, art work and so on. For digital libraries, it is important to make clear where the hard copies actually reside. For many of these objects, identification can be made through the assignment of digital object identifiers (DOIs). However, this is largely unexplored territory for most linguists.

6. An agenda for developing linkages to other sciences - the bigger picture of Extended Linguistics. Here, the MacWhinney-Groves NSF report should be particularly helpful. This report will be available just before the start of the meeting.

7. Lightweight administration within a framework of complex organizations: NIH, NSF, LSA, CLARIN, etc. There is a perception that some work on the development of shared cyberinfrastructure has been top-heavy on committee work and reports without producing a significant amount of shared interoperable resources. Is there a way to build organizational structures that produce open-access products? Who should determine patterns of collaboration, or should these patterns "emerge" through specific, less-organized exchanges? But then how can these interactions be guided toward cooperation and interoperability? Perhaps an emphasis on standards for collaboration might be possible.
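
Agenda item 1 above separates compatibility of annotation format from compatibility of data categories, and lists roundtrips as a concern. The following is a minimal sketch in Python of what a roundtrip check might look like, assuming a toy representation in which each annotation is a labeled arc between two time offsets. The `Arc` class, the two tag sets, the mapping table, and the function names are all hypothetical illustrations; they are not taken from AG, GrAF, or any of the tools listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One annotation: a labeled span between two time offsets (seconds)."""
    start: float
    end: float
    tier: str
    label: str


# Hypothetical category mapping between two tag sets (data-category level).
A_TO_B = {"N": "NOUN", "V": "VERB", "DET": "DET"}
# Inverse mapping; only well-defined because A_TO_B is one-to-one.
B_TO_A = {b: a for a, b in A_TO_B.items()}


def convert(arcs, mapping):
    """Relabel arcs using `mapping`; raise if a category has no counterpart."""
    out = []
    for arc in arcs:
        if arc.label not in mapping:
            raise KeyError(f"no mapping for category {arc.label!r}")
        out.append(Arc(arc.start, arc.end, arc.tier, mapping[arc.label]))
    return out


def roundtrip_ok(arcs):
    """True if converting A -> B -> A reproduces the original arcs."""
    return convert(convert(arcs, A_TO_B), B_TO_A) == list(arcs)


sample = [
    Arc(0.0, 0.4, "pos", "DET"),
    Arc(0.4, 0.9, "pos", "N"),
    Arc(0.9, 1.5, "pos", "V"),
]
print(roundtrip_ok(sample))
```

In this sketch the format level is held fixed (both sides use the same arc structure), so any roundtrip failure comes from the category level. A many-to-one mapping, for example two source tags collapsing into one target tag, would make the inverse lossy and the check would fail, which is the kind of mismatch a category-level compatibility review would need to surface.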

Recommended Readings
  • SIGAnn website
  • ComNet Proposal (see "attachments" at bottom of this page)
  • SILT Proposal (attachment)
  • FLaReNet website
  • ISO committee for Language Resource Management webpage
  • Linguistic Annotation Framework and Graph Annotation Format descriptions (attachments)

Additional Links


Communication Problem

  • People building tools/standards need to be aware of each other
  • People potentially using tools/standards need to be able to find them
  • People not thinking of tools/standards need to become aware of the importance
  • Linguistics is just one field that studies language data, and even within linguistics the subfields are quite fragmented

Communication Vehicles

  • Wikis/blogs
    • On-going maintenance of information collections
    • Reasons for people to come back to the on-line communication site
  • Scholarly organizations
  • Funded collaborations
  • Workshops/tutorials
  • Reviewing guidelines/review feedback
    • Pushing funding agencies to require that proposals that use tools or create data include plans (and follow-through) for using standards and publishing data
    • Pushing funding agencies to require proposals for new tools/standards to appropriately cite and situate themselves within the existing tools/standards ecology
    • Conference/journal reviewing check for appropriate citations of data, tools, resources
  • Resource maps/eliciting metadata (cf. LREC 2010)

The 4 Cs


  • Collaboration (lots of bilateral collaboration alone isn't enough)
  • Coordination (coordinating of standards and tools, including technical side)
  • Communication (the people side)
  • Community Building

Action items


  • Draft recommendations to funding agencies regarding standards, data publication, etc.
  • Draft recommendations to journal editors and conference organizers regarding citing tools/resources and publishing data