Notes Working Session 1 (WG1)
Notes from Working Session 1 -- 2:30-5:00, Friday 17 July
(scribe Mary Beckman)
Some decisions that we made:
We need to coordinate with the tools group about tools for translation
between different annotation conventions.
We decided to work on the Cross-cutting issues/questions first as a group
and then break up into subgroups to finalize different pages.
We decided to set the last question (What existing resources are there for
helping to develop and maintain annotation standards?) to be a different page.
==> Mary did this
We also will make the list of standards be a different (set of) page(s).
We also will make references be on a different page.
==> Stuart did this
=========================================================================
Thoughts on cross-cutting issues/questions.
* What is "annotation" and what are annotation standards "for" ...
We need to make the point that annotation never should replace the data
that are being annotated. The annotations are *tags* or *pointers* into
the primary data.
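One way to picture the tags-as-pointers idea is a standoff layout, where the primary data stay untouched and each tag only records where it points. This is a minimal sketch, not something the group agreed on: the class name `Tag`, its fields, the labels, and the sample string are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical standoff tag: a label plus character offsets into the primary text."""
    start: int  # inclusive offset into the primary data
    end: int    # exclusive offset into the primary data
    label: str


def resolve(primary: str, tag: Tag) -> str:
    """Follow the pointer back into the primary data."""
    return primary[tag.start:tag.end]


# The primary data are never rewritten; tags live alongside them.
primary = "the cat sat"
tags = [Tag(4, 7, "NOUN"), Tag(8, 11, "VERB")]  # illustrative labels only
resolved = [(resolve(primary, t), t.label) for t in tags]
```

Deleting or revising a tag in this layout leaves `primary` exactly as it was, which is the property the point above asks for.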
Chuck: What are annotations for?
You have to have an idea of what an annotation is for, and how "right"
you have to be to get the purpose.
We need to emphasize the value of building in ways for evolution of a
standard, and versioning, and translation between versions.
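Translation between versions of a standard could be as simple as a published mapping table, as long as labels with no counterpart are reported instead of silently dropped. The sketch below assumes two made-up label inventories; `V1_TO_V2` and `translate` are hypothetical names, not part of any existing standard or tool.

```python
# Hypothetical mapping from version-1 labels to version-2 labels.
V1_TO_V2 = {"N": "NOUN", "V": "VERB", "ADJ": "ADJ"}


def translate(labels, mapping):
    """Translate labels from one version to another.

    Returns (translated, unmapped). Labels with no entry in the mapping are
    kept unchanged and also listed in `unmapped`, so gaps stay visible.
    """
    translated, unmapped = [], []
    for label in labels:
        if label in mapping:
            translated.append(mapping[label])
        else:
            translated.append(label)
            unmapped.append(label)
    return translated, unmapped


translated, unmapped = translate(["N", "V", "PRT"], V1_TO_V2)
```

The `unmapped` list is the part that feeds back into evolving the standard: it shows which categories the newer version has no place for.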
Grev asked about the ToBI framework: You mean the framework changes from
one language to another? Mary: It's comparable to "What is a word?"
* Are annotation and use of annotated data the same in different
language domains?
Seems clearly not ...
Issues that are faced in annotating text for discourse interpretation
can be very different from the issues that are important for syntax,
as pointed out by ...
Chuck: Someone annotating German would face the issue of figuring out
the trigger and then the scope of a stretch of quoted speech. Grev: Similar
issues for interpreting reference of anaphoric expressions (or of
unexpressed anaphoric elements).
* Are annotation and use of annotated data the same when construed in
reference to different applications/research questions?
Are there ways to encourage more flexible thinking about what annotation is,
so that e.g. corrected formant tracks for vowels become a kind of annotation
for vowels, and felicity judgments elicited from native speakers using
Mechanical Turk become a kind of annotation of transcriptions of sentences,
etc.?
Nancy Ide's point:
Can we clearly distinguish between the need for a specific annotation format,
that will enable us to represent all kinds of annotation information in ways
that enable comparison, merging, and use with lots of existing tools, and
annotation categories, which are the descriptive linguistic labels associated
with primary data? Does this distinction make our job easier?
We worked on understanding the last question (that Nancy Ide had added):
We understand "format" as being about things like whether the annotation
has a hierarchical (tiered) structure, so that decisions about well-formedness
of bracketing, and .... We just need to ask her.
* What does it take to be an annotation standard?
Chuck: What is an annotation standard? Can someone give me an example?
* How can we promote "best practices" for aspects of
developing/maintaining/using systems, ....
Grev: by refereeing, and not accepting journal submissions where the
presentation of examples doesn't follow any standard at all.
Stuart: So adoption of a standard as required by a journal and so on?
But then we get conflation of formatting for data analysis versus formatting
for presentation, no?
Richard: the standards and rules for using them to different degrees of
precision need to be published in clear documentation
Stuart: the "best is the enemy of the good" problem
Richard: so you need resources for educating people to see how they can
figure out how to provide the level of detail that is appropriate for
the use.
Stuart: part of that is about the standard being flexible and another
is about education.
Mary: need a way of gracefully and usefully marking uncertainty.
Stuart: a way of getting feedback to develop the standard to incorporate
new categories and so on ...
Richard/Stuart: the standard needs to be tool friendly
Stuart: e.g. Shoebox is a bad format in a lot of ways
That's different from the logical structure of the annotation standard.
e.g., better to develop the standard in a way that is friendly to
implement as XML markup, etc.
Mary: analogous "format" issue in phonetics is difference between tagging
points and tagging intervals.
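The point-versus-interval distinction can be made concrete with two tag types and a query that relates them. This is a sketch under assumptions: the class names, the half-open interval convention, and the labels ("vowel", "f0 peak") are chosen for illustration and do not come from any particular phonetic annotation standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointTag:
    """Hypothetical tag attached to a single instant (seconds)."""
    time: float
    label: str


@dataclass(frozen=True)
class IntervalTag:
    """Hypothetical tag attached to a stretch of time (seconds)."""
    start: float
    end: float
    label: str


def points_within(interval, points):
    """Return the point tags that fall inside the interval.

    The interval is treated as half-open, [start, end), which is an
    assumption of this sketch.
    """
    return [p for p in points if interval.start <= p.time < interval.end]


vowel = IntervalTag(0.0, 0.5, "vowel")
points = [PointTag(0.25, "f0 peak"), PointTag(0.5, "f0 peak")]
inside = points_within(vowel, points)
```

A format that only supports intervals would have to fake the point tags as zero-length or arbitrarily short stretches, which is the kind of "format" issue raised above.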
Chuck: different levels of dis-satisfaction with annotation ...
1) The annotation standard doesn't have a way of tagging something.
2) The annotator had made an analysis that gets it "wrong".
3) There is a way of extracting what the missing categories are
by some translation schema.
Gave the example of finding the "of" in [[details of construction...]]
Was able to change (2) to (3) by locating the annotator and quizzing
him about how he handled this. It turned out that he had transcribed it
as a pro-clitic on the following verb, and once Chuck understood this,
he could find a way to search for the construction.
==> Mary needs to ask Chuck to repeat, so as to fill in [[details]].
Sign language:
Sarah: the biggest problem has been getting any kind of buy-in from
the Deaf community.
Mary: maybe the point that we need to emphasize is that we are not talking
about developing a writing system that will replace the video record
with a derived set of data. Rather we are talking about providing tags
for searching the primary data.
The next generation of annotation schema:
Starting with promoting graceful/useful ways of indicating uncertainty.
Does this translate into a way of having layers of annotation with
tools for making inter-transcriber agreement, and possibly even, later
on, incorporation of naive judges?
==> This is another issue that we should bring up with the tools group
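As a starting point for that conversation, here is one possible reading of "layers plus agreement": each transcriber's labels form a layer, chance-corrected agreement between two layers is computed with Cohen's kappa, and disagreements are kept as explicitly uncertain items instead of being resolved by fiat. The function names, the "certain"/"uncertain" flags, and the H/L labels are assumptions of this sketch.

```python
from collections import Counter


def cohen_kappa(layer_a, layer_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(layer_a) != len(layer_b) or not layer_a:
        raise ValueError("layers must be non-empty and of equal length")
    n = len(layer_a)
    observed = sum(x == y for x, y in zip(layer_a, layer_b)) / n
    counts_a, counts_b = Counter(layer_a), Counter(layer_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both layers use one identical label throughout
    return (observed - expected) / (1 - expected)


def merge_layers(layer_a, layer_b):
    """Keep agreed labels; keep both candidates where the layers disagree."""
    return [
        (x, "certain") if x == y else (f"{x}|{y}", "uncertain")
        for x, y in zip(layer_a, layer_b)
    ]


layer_a = ["H", "L", "H", "L"]  # illustrative labels only
layer_b = ["H", "L", "L", "L"]
kappa = cohen_kappa(layer_a, layer_b)
merged = merge_layers(layer_a, layer_b)
```

Judgments from naive judges could later be added as further layers in the same way, though kappa as written here only compares two layers at a time.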
=========================================================================
Preparing for the first Interim Report:
Use the IPA and Leipzig Interlinear Glossing Rules as good examples to
show both how they instantiate/fulfill each of the desiderata and how
they fall short
Start with the two big questions that we're going to start the wiki with.
Then list the four desiderata with a slide on each one.
interoperability
Can the annotation be validated and used in different tools or
computational models?
==> Separate logical structure versus presentation.
horizontal (does this system mesh with others)
vertical (works for different purposes)
extensibility
Flexibly evolve and
granularity
More or less specific,
usability -- documentation and tools and community of teaching.