Notes from WS3

begin: 20090718, 15:03

Working Session III

• MB: recap from WS2
• Linked in interim report 1
• Edited main page: decisions about relevance and organization
• Distribution of tasks
• SR: interim report 2 (stop by 5:15 to discuss)
• SC: take notes WS3
• Decided to have coordination discussion with working group 3: tools
• MB: Instead, have our conversation via the wiki’s top-level page on “Privacy and Ethics”
• Should we begin WS3 by working on “tools”?
• MB: Add to the Tools Working Group page, or approach with a thread first?
• SR/RW: Less invasive to begin as a thread

• Tools and Annotation (posted as a thread to Work Group 3: Tools, on 20090718 at 15:46, by MB)
• Tools for making tools for checking well-formedness of annotations
• SR: libraries, toolkits
• con: requires software engineering knowledge
• pro: not standalone; lets anyone roll their own
• e.g. NLTK
• Automation tools
• RW: translation tools, for converting between differing standards so that users can exchange data
• importing, exporting formats
• simple, dedicated purposes
• e.g. Scott Farrar’s Praat -> Unicode transcription tool
• ELAN
• Pros and cons
• MB: [+] can now call Praat, toggle video/audio
• RW: [+] allows you to verify tags, [-] but not to align them with someone working on the audio
• SC: [+] is open source, allows developers to extend
• RW: This should be part of the Annotation’s Desiderata
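
As a rough sketch of what a library-style well-formedness check might look like, assuming a hypothetical tier of time-aligned intervals and a controlled tag set (the names Interval and check_tier are made up for illustration and belong to no existing toolkit such as NLTK, ELAN, or Praat):

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Interval:
    """Hypothetical time-aligned annotation: a label over [start, end) in seconds."""
    start: float
    end: float
    label: str


def check_tier(intervals: Iterable[Interval], allowed_labels: Set[str]) -> List[str]:
    """Return a list of problems found; an empty list means the tier looks well-formed."""
    problems: List[str] = []
    previous_end = None
    # Sort by time so that overlap can be checked against the latest end seen so far.
    for i, iv in enumerate(sorted(intervals, key=lambda x: (x.start, x.end))):
        if iv.start >= iv.end:
            problems.append(f"interval {i}: start {iv.start} is not before end {iv.end}")
        if previous_end is not None and iv.start < previous_end:
            problems.append(f"interval {i}: overlaps an earlier interval")
        if iv.label not in allowed_labels:
            problems.append(f"interval {i}: unknown label {iv.label!r}")
        previous_end = iv.end if previous_end is None else max(previous_end, iv.end)
    return problems


if __name__ == "__main__":
    # Assumed toy data: the second interval overlaps the first, the third uses an unlisted tag.
    tier = [Interval(0.0, 0.5, "N"), Interval(0.4, 0.9, "V"), Interval(1.0, 1.2, "XX")]
    for problem in check_tier(tier, {"N", "V"}):
        print(problem)
```

This illustrates the "con" noted above: even a small check like this assumes some programming knowledge, whereas a standalone tool would hide it.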

• <edit to Desiderata on main page>
• Interoperability: open standards should have open sources

• GC: Question: I’m reading this as someone who missed the last WS, but I feel that the “Annotation Standards” notes on the main page greatly underestimate what I do!
• all together: This is the kind of impression on a fresh reader that we want to avoid conveying.
• what is on there now, which is very poorly worded:

... What is annotation?
In defining annotation, it is important to distinguish two conceptions of what annotation is.

1. Transcription: annotation constitutes (more or less) primary data for an analysis -- e.g. when the data are written texts where there is no recording of the original audio or visual signal (e.g., transcriptions of speech or texts that originated as written “utterances”)
2. Tags: annotation provides search / entry points into analysis of primary (audio and/or video recording of the) data

* Is resistance, e.g., of Deaf community, to annotation that it is conceived of only as transcription that “reduces the language to writing”?
... "

Fold above together gracefully with the below.

* Can a “Standard” be defined without reference to a set of uses and/or a community of users? We decided no, and will talk about this in the next section.
o What are the relevant dimensions of different uses to consider? and ...
o How do we define a “community of users”? (see next topic)

* We can also talk about annotations that are derived annotations, such as taking the output of a parsing tool that was run on annotation POS tags and making that output a layer of annotations for the same database.
* We can also think of skilled formant “correction” as a kind of annotation.
* And responses from naive judges, elicited using Mechanical Turk or the like. ”

• MB: On the one hand, annotation is transcription just as the data is (#1 above). On the other hand, we need to make the distinction that annotation is not the same as a writing system, such as one developed thousands of years ago (#2).
• Why (#2) is necessary:
• First, there is resistance from native communities, such as the Deaf community, to anything that is immediately perceived as a writing system. The convention of tags aims to capture the visual, spatial signal.
• Second, prosodic signals, for example, exist only to add information to the primary data. So they require an implementation for tagging the primary signal
• Consensus:
• 2 components provide information
• 1st is the input (the raw, primary data)
• 2nd is the information that adds the analysis, the tags, the transcription (the annotation)
• So if we’re all in agreement about what annotation is, what is problematic with the wording above such that a first-pass reader would arrive at such off-base interpretations?
• SR: Is it that it looks like it’s saying transcription and tagging are one and the same, rather than significant aspects of annotation?
• GC: It’s that it looks like it undersells annotation. Many linguists already feel like they know what certain terms mean...
• MB: Let’s not use “transcription” without prefacing what kind of transcription we mean... Let’s not use transcription at all. Let’s go back to the three-way annotation that Chuck brought up yesterday.
• You can think of dissatisfaction with annotation as arising in multiple ways. Take the example “You had’ve seen it” instead of “You hadn’t have seen it”:
• 1 - leaving it out altogether
• 2 - annotating it with an analysis that is wrong given what you’re presumed to be looking at
• 3 - locating the annotator, making a query about it, looking at the standards, and having a way of extracting it
• MB: I think annotation is good, if there’s a built-in way of making queries, modifications, backtracking, in order to recover the primary data using the conventions around the annotation
• This way, the goal is still to stay true to the primary data, not to stay true to the idea of the “standard”
• And I think that having annotative tags helps with this
• CF: Take the example “whose” vs. “who’s”, where the orthographic convention should predict “who’s” as the correct form
• This suggests that the orthography (if we consider it an annotation) is not true to its own conventions
• MB: Let’s not try to make a distinction, then, but clean up our writing

• ... What is annotation?
• P1: ...“added conventionalized representation”... “to primary linguistic data”... Then... what is “linguistic data”?
• ex1: orthography
• ex2: segmental transcription
• ex3: tagging of named entities
• ex4: parsing
• RW: present these examples just as bullets for now?
• MB: as prose, pack it in
• SC: as prose, with clear examples of one source of primary data annotated with each example?
• RW: Bow et al.’s 2002 paper with many different examples of interlinear glosses
• SC: Does this flow of ideas mean we want to collapse “What is annotation?” with “What is annotation for?”
• P2: ...examples show the need for reference to a set of analyses and a community of analysts who would provide the annotation conventions

• GC: annotation may provide value
• 1 - that saves expended work, so that I can just proceed with the analysis I was going to make anyway
• 2 - that otherwise wouldn’t be available at all, because I have no access to the language otherwise (e.g. morphosyntactic analysis of some remote language).
• MB: I think this goes back to the purpose of the distinction I was trying to make before. It goes back to what it means to make a “transcription”
• A cogent example is when you supply, as best you can, an alphabetic transcription of a two-year-old’s speech. You have to make decisions at each step of the transcription
• For our purposes, we need to draw out what the annotator is doing: 2 kinds of information that are in conflict with each other
• 1 - Would this lingual obstruent be conceived as a stop for the adult phonemic target? (by the native speaker transcriber)
• 2 - What does the phonetician hear as closest to the adult phonetic space? (by the native speaker phonetician)
• SR: I think MB’s point is, “annotation is not what you think it is”, and I think GC’s point is “annotation adds value that wouldn’t be there otherwise”
• SR/SC: Annotation (standards) can really help with bringing about the culture change for valuing crosslinguistic/crosslingual domains of interest using understood platforms of convention. SR: different domains can talk to each other. SC: different languages within the same domains, even, which doesn’t yet obtain.
• MB: So this leads to an expanded section on “What are annotation standards?”
• SR: how about 2 sections:
• (1) What is annotation and what is it good for?
• (2) What are annotation standards and what are they good for?

• What to say for Interim Report 2:
• Invite others to add to bibliography
• Recruit others into the community
• Address some of M. Liberman’s issues
• Clean up front end
• Walk through new wiki sections




end: 20090718, 17:26


Latest page update: made by ashraGurnch, Jul 19 2009, 3:12 AM EDT