Notes Working Session 2 (WG2)
Notes from Working Session 2, 10:00-12:00, Saturday 18 July
(scribe: Richard Wright)
The session began with a discussion of organizational issues, led by Mary.
We decided to break up into subgroups and address each sub-area in turn while working on the wiki page.
Stuart - Suggests that our next step is to discuss the morning's interim working group reports in the context of our subgroup.
Discussion of feedback and links to other presentations
On our main heading, "What is annotation?"
Stuart - We didn't get much feedback on "Pre-existing annotation standards" other than Marc's, which is a list of links with citations.
Richard - Are we going to expand on the current list of annotation standards?
Stuart/Mary - Why don't we copy group 2 and have each of us provide "case studies" of existing standards?
Based on the expertise of working group members, five areas were agreed upon and assigned:
1- Richard - IPA (community of practice: the IPA as a well-established, democratic community of scholars, etc.)
2- Grev - Leipzig glossing rules
3- Mary - ToBI
4- Chuck - FrameNet
5- Sarah - Sign language annotation - tools but no standards, and community empathy
In each case study, the following desiderata should be discussed:
Desiderata for Good Annotation Conventions
* Interoperability
o Can the annotation be validated and used in different tools or computational models? Need to separate logical structure from "presentation" format. Also need to think both:
+ horizontally (Is it possible to translate to/from other annotation schema?)
+ vertically (Is the standard useful for purposes other than those originally intended?)
* Extensibility/Adaptability
o Can the annotation schema be extended to other styles, other dialects, other languages, ...
o Is there a solid and suitably diverse core of users/maintainers to allow the standard to evolve and change in response to user feedback/new needs?
o Are there good standards/mechanisms for versioning? Is there meta-data recording which version was used in annotating a given corpus?
* Granularity
o Are there principled mechanisms for providing partial annotations ...
o ... and then later on extending (or revising) the annotations of the same corpus?
o ... and keeping track of which parts are in what state of annotation and verification/modification?
o Are there ways to gracefully make a more versus less specific annotation?
+ Chuck gave an example of NP:NP sequences such as $20/hour, which for syntactic parsing is just two NPs, but for interpretation one needs to know that this is a "rate phrase".
+ Mary gave an analogous example: knowing whether or not a word is accented in an English utterance is enough to use that information in interpreting which vowel it was, but to use the annotation to train an intonation synthesizer, one needs to know what accent type it was.
o Are there good ways to indicate degree of certainty?
* Usability
o Is there good (accessible and extensible) documentation?
o Is there a suitably diverse and continuous community for teaching (and testing ability of) new annotators / users?
o Are there good tools for annotating and using the annotations, and good community mechanisms for building/extending/sharing tools?
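As a rough illustration of how several of these desiderata (version meta-data, tracking which parts are in what state of annotation, degree of certainty, and more versus less specific labels) might be carried in a single record, here is a minimal Python sketch. All class names, field names, status values, and the "example-prosody" schema name are hypothetical; they are not taken from any existing standard discussed in the session.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    """Hypothetical workflow states for a file's annotation."""
    UNANNOTATED = "unannotated"
    PARTIAL = "partial"
    COMPLETE = "complete"
    VERIFIED = "verified"


@dataclass
class Annotation:
    start: float                   # start time in seconds
    end: float                     # end time in seconds
    label: str                     # coarse label, e.g. "accented"
    detail: Optional[str] = None   # optional finer label, e.g. an accent type
    annotator: str = ""            # who made this annotation
    confidence: float = 1.0        # annotator's stated certainty, 0.0 to 1.0


@dataclass
class AnnotatedFile:
    corpus_file: str
    schema_name: str
    schema_version: str            # which version of the conventions was used
    status: Status = Status.UNANNOTATED
    annotations: List[Annotation] = field(default_factory=list)

    def coarse_view(self) -> List[str]:
        """Coarse labels only, ignoring any finer detail."""
        return [a.label for a in self.annotations]

    def fine_view(self) -> List[str]:
        """Most specific label available for each annotation."""
        return [a.detail or a.label for a in self.annotations]

    def uncertain(self, threshold: float = 0.5) -> List[Annotation]:
        """Annotations whose stated confidence is below the threshold."""
        return [a for a in self.annotations if a.confidence < threshold]


# Example: a partially annotated file; the second accent has no accent type yet.
f = AnnotatedFile("utt001.wav", "example-prosody", "1.2", Status.PARTIAL)
f.annotations.append(Annotation(0.42, 0.61, "accented", "H*", "annotator_A", 0.9))
f.annotations.append(Annotation(1.10, 1.35, "accented", None, "annotator_A", 0.4))
print(f.coarse_view())  # ['accented', 'accented']
print(f.fine_view())    # ['H*', 'accented']
```

Keeping the coarse label and the optional finer label in separate fields lets a user who only needs "accented or not" ignore the detail, while a later pass can fill in accent types without rewriting the earlier annotation.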
Next 15 min: Mary - Let's go through the top page and clean it up:
- What does it take to be an annotation standard?
-Stuart: I don't like "system"
-Mary: Convention?
-Chuck: Desiderata for an annotation standard?
General agreement
-Mary: How can we promote best practices?
-Mary: Should we fold this into the previous sections?
-Stuart: I think it deserves its own section
-General discussion of Friday night's conversation about dissemination of data.
-Sarah: Some researchers insist on coauthorship for people who use their tools or corpora; is this a sustainable model? Can best practices be disseminated in a locked-down system?
-General agreement that the more barriers community members erect, the less likely it is that their contributions to best practices will be disseminated.
-Mary: Is there a mechanism in publicly available archives for protecting human subjects who want their data removed?
-Stuart: Version control worries... re Brian's comment from the previous night about version control of spoken corpora ... keeping track of the versioning.
-Richard: Is version control a best-practices issue rather than a standards issue?
-Stuart: so where do we put it?
-Mary: It seems like that goes hand in hand with extensibility... adding to existing annotation.
-Stuart: This is one difference between industry and linguistics: industry engineers set up versioning tools before they start working. Academia could learn from this. It's a way of knowing what you're talking about. It's essential.
-Sarah: It takes time and money to maintain versioning -
-Stuart: If it doesn't change then you don't need versioning -
-Richard: The assumption should be that it will change, so it should be built to accommodate change.
Mary: should we go on? 11:13
Chuck: are we going to talk about inter-annotator reliability?
Mary: Do we have inter-annotator reliability as part of a standard?
Chuck: Not all annotation lends itself to inter-annotator reliability...
Richard: The annotation should say whether it was done and how it was done, even if it doesn't seem relevant to the annotator at the time.
Mary: Taking an example from child language research, it's important to include the confidence intervals in the form of inter-transcriber reliability.
Richard: But not all fields lend themselves to reliability measures?
Stuart: But the annotation standard should allow multiple (and potentially disagreeing) annotations.
Mary: So this belongs both in best practices and in annotation standards.
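The session did not settle on a particular reliability measure. As one commonly used option (named here as an illustration, not as the group's choice), Cohen's kappa compares two annotators' labels on the same items and corrects raw percent agreement for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected if both annotators labelled independently at their own label rates. A minimal Python sketch follows; the example labels are invented. Note that computing it at all requires the annotation format to keep both annotators' (possibly disagreeing) labels, as Stuart suggested.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of items on which the two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    p_o = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: sum over labels of P(A picks label) * P(B picks label).
    p_e = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined, and this sketch simply reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Invented example: two annotators marking words as accented ("acc") or not.
annotator_1 = ["acc", "acc", "none", "none", "acc", "none"]
annotator_2 = ["acc", "none", "none", "none", "acc", "none"]
print(round(percent_agreement(annotator_1, annotator_2), 3))  # 0.833
print(round(cohens_kappa(annotator_1, annotator_2), 3))       # 0.667
```

Raw agreement here is 5/6, but because both annotators often choose "none", half of that agreement would be expected by chance, which is why kappa comes out lower.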
11:25
Chuck: At some point we need to talk about training...
Stuart: That's under best practices...
Sarah: When you get to granularity, especially fine-grained distinctions, do you need training outside your area?
Chuck: You need both granularity of analysis, as in interpreting the meaning of complex noun phrases, and granularity of knowledge (certainty).
11:34
Mary: I'll revise by folding "community of users" into "what are annotation standards for"
Chuck: Suggests saying "what are annotations" and "What are annotations for"
Chuck: I'd like to see annotations of annotations included: the output of a parse that was run on part-of-speech tags is a derived annotation.
Mary: I'd like to see the community expand its ideas about what an annotation is: hand-corrected formant tracks are a form of annotation, as is the NTT Kondo Amano dictionary, with judgements of the written and aural forms by working-class Japanese speakers, i.e., naive subject judgements.
...more examples of what might be considered annotation, and general discussion
11:52
Mary: Should professional organizations go under "resources"?
general agreement
Does the current structure follow the right order?
general agreement
Mary: I'll work on the table of contents over lunch.
Mary: What are the two groups that we want to talk to?
Stuart: Tools is one of them.
Mary: I'll talk to groups 2 and 4 about human-subjects concerns.
Chuck: Is that part of the standards?
Richard: At least some meta-data are part of the annotation standards.
Mary: For example, you may want a standard way of removing identifying information such as names without destroying other information such as prosodic and intonational information (like a hum).