Group 3: Tools (existing and future "killer apps")

The Tools group is charged with identifying and documenting existing and needed tools which will be the face of the cyberinfrastructure for ordinary working linguists. These tools include both those used by data creators (e.g., linguists annotating data that they later share) and data consumers (e.g., linguists using the annotated data of others to create new kinds of data).


Existing Tools

  • TypeCraft: collaborative text annotation
  • WALS: The World Atlas of Language Structures Online
  • ODIN: Online Database of INterlinear glossed text
  • TextGrid: "TextGrid aims to create a community grid for the collaborative editing, annotation, analysis and publication of specialist texts. It thus forms a cornerstone in the emerging e-Humanities."
  • Natural Language Toolkit (NLTK): "Open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks, with distributions for Windows, Mac OSX and Linux."
  • eHumanities Desktop (project is in alpha development stage, no description available yet)
  • Roma, a TEI validation tool: "These pages will help you design your own TEI validator, as a DTD, RELAXNG or W3C Schema."
  • Chorus is a version control system designed to enable workflows appropriate for typical language development teams who are geographically distributed. Chorus is a Palaso Project.
  • e-Linguistics: building a cyberinfrastructure for linguistics (including a Python toolkit for data migration; documentation is still being posted)
  • Consistent Document Engineering Toolkit
  • Thai language specific tools

General purpose tools in use by linguists:
  • R Project for statistical computing and the linguistics packages in EMU.
  • Praat: doing phonetics by computer
  • Python
  • ANVIL: video annotation research tool

Needed Tools
  • A FOS aligner tool (or aligner development tool) at a grain finer than the intervals marked off fairly automatically in LDC's transcriber tool
  • ??

What's a killer application?

Google Maps may be a good example of a killer app:
  • it's killer in the way it brought mapping data to everyone.
  • it actually killed something, e.g. GML, or at least GML's hope for mass adoption.
  • it didn't piggyback on a standard, but set one: KML. And it turned out that creating XML files isn't that much of a problem if you want it badly enough.
But
  • can there be something like micro-killer-apps?
  • can there be something like scientific killer apps? doesn't "scientific" mean "too small to be killer"?
Following the Google Maps example, a killer app would help pull data out of the drawers. This might happen in two ways:
  1. make publishing data easier, or
  2. provide big enough incentives to submit to tedious publishing.

What could killer apps for linguistics look like?

  • search engines? or the semantic web (see this blog post for an idea of what this could mean)?
  • data visualization?
  • can "archiving" or "long-term preservation" be a killer app? (Does not sound like it, does it?)
  • is reproducible research enough of an incentive to publish data?
  • may the killer app be something social/political, like a new model for scientific recognition on the web? and if so, what can we do to bring it about? Foster skills?

Killer applications are applications that are used lots and lots

Therefore a good question might be: Who are the linguists interested in finding and/or producing reusable data?

  • computational linguists (yes)
  • corpus linguists (yes)
  • typologists: encyclopedic works like the SIL guides are already interesting and useful, and there is growing interest in sharing data and linguistic ontology
  • descriptive linguists (as above)
  • theoretical linguists: much theoretical work does not currently use reusable (or computationally accessible) data, with some exceptions [e.g. the LFG and HPSG communities].

But data for the computational linguist is probably not quite the same as data for the typologist (or the theoretical linguist).

Likewise, a killer app for a computational linguist is probably something very different from an application that a descriptive linguist engaged in fieldwork would care to call a useful tool. Theoretical linguists might be interested in searching for data along yet another set of dimensions. Finally, the generation of reusable resources, if considered important at all, must pay off academically to attract more than the occasional linguist. Perhaps we can conclude from this that we need a cluster of tools rather than one single application; together they might be a killer. :)

So following the definition above ("killer apps are apps that are used a lot"), we can probably assume that future killer apps will be on the web.

There seem to be two concerns here:
  1. what does 'data' look like for each field, and can we share specifications?
  2. what does an 'application' look like for linguists of various stripes?
These concerns need not pit "computational" linguists against other types of linguist.

What does 'data' look like for each field, and can we share specifications?

Like any science, linguistics is based on data, yet the form this data takes and the role it plays depend crucially on the way we conceive of language and the particular approach chosen to investigate its nature. Does that mean that there is no such thing as "the empirical base of our field"? Not necessarily; it only means that this base must consist of a multitude of different types of linguistic data. If so, free access to and reusability of this data might be a commodity that most of us find useful.
Let's assume we could agree on that point: what exactly does that mean for future linguistic tools? It all seems to come back to the same point, namely that we are chasing a ghost by looking for that one killer app; instead, what we most likely need are several different tools, able to cater to the multitude of needs that define the linguistic field as a whole.

Desirable Characteristics of Apps

  • No dead ends for data: While some apps (e.g. FileMaker) may be "killer" in how they help organize data, they also make reusing the data hard.
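
One way to picture "no dead ends" is a tool that can always write its data out in an open, documented format and read it back unchanged. The following is only a minimal Python sketch of that idea: the `GlossedExample` schema, its field names, and the sample sentence are hypothetical and not taken from any of the tools listed above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class GlossedExample:
    """One interlinear glossed example (hypothetical minimal schema)."""
    language: str        # language name or code
    words: list[str]     # object-language words
    glosses: list[str]   # one gloss per word
    translation: str     # free translation


def export_examples(examples: list[GlossedExample]) -> str:
    """Serialize examples to JSON text that other tools can read."""
    return json.dumps([asdict(e) for e in examples], ensure_ascii=False, indent=2)


def import_examples(text: str) -> list[GlossedExample]:
    """Rebuild examples from exported JSON, rejecting misaligned glosses."""
    examples = [GlossedExample(**record) for record in json.loads(text)]
    for e in examples:
        if len(e.words) != len(e.glosses):
            raise ValueError(f"word/gloss mismatch in: {e.translation!r}")
    return examples


# Hypothetical sample data, used only to show the round trip.
sample = [GlossedExample("German", ["Ich", "schlafe"], ["1SG", "sleep.1SG"], "I am sleeping")]
round_trip = import_examples(export_examples(sample))
```

If `round_trip` equals `sample`, nothing was lost on the way out and back in, which is the property a proprietary storage format tends to make hard to check.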



Latest page update: made by robert_forkel, Jul 19 2009, 5:21 PM EDT
Keyword tags: Thai language
Thread: some other things we'd like you to consider adding to your page ... (started by mebeckman, Jul 18 2009, 6:37 PM EDT; 1 reply; last post Jul 19 2009, 11:46 PM EDT by bill_byrne)
sections on needed tools and existing tools, from the working group on annotation standards:

Open source toolkits for making tools for checking well-formedness of annotations (e.g., CDET at http://www.icsi.berkeley.edu/~jan/projects/CDET/).
Open source tools for automating some aspects of some types of annotation, such as ASR-based alignment of segmental transcription.
Open source toolkits for building translations between annotations and importing from data in other formats (e.g., NLTK shoebox/toolbox module; Scott Farrar's tool for translating Praat IPA into Unicode IPA).
Tools for aligning time stamps for annotations of the audio built in, say, Praat, with the time stamps for annotations of the video built in, say, ELAN.
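
The last item might look roughly like the Python sketch below. It assumes the simplest possible case: the audio and video recordings differ only by a known constant offset, and the intervals have already been exported from each tool. The `Interval` class, the labels, and the 2.0 s offset are all hypothetical; nothing here uses Praat's or ELAN's own file formats or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float  # seconds
    end: float    # seconds
    label: str


def shift(intervals, offset):
    """Move intervals onto another timeline by adding a constant offset."""
    return [Interval(i.start + offset, i.end + offset, i.label) for i in intervals]


def overlaps(audio, video):
    """Pair every audio label with each video label it overlaps in time."""
    pairs = []
    for a in audio:
        for v in video:
            # Two intervals overlap if each one starts before the other ends.
            if a.start < v.end and v.start < a.end:
                pairs.append((a.label, v.label))
    return pairs


# Hypothetical data: the audio recording started 2.0 s after the video.
audio_tier = [Interval(0.0, 0.5, "ba"), Interval(0.5, 1.0, "da")]
video_tier = [Interval(2.0, 2.5, "lips closed"), Interval(2.5, 3.0, "jaw open")]
aligned = overlaps(shift(audio_tier, 2.0), video_tier)
```

A real tool would also have to estimate the offset (and possibly clock drift) rather than take it as given, which is where most of the work lies.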
Thread: our presentation? (started by JeremyKahn, Jul 18 2009, 12:11 PM EDT; 0 replies)
issues we might want to mention:
at top level, three kinds of tools needed: (a) data-collection/creation, (b) data distribution, and (c) data analysis
most linguistics communities tend to (at best) go from (a) to (c) directly

we suggested that the 'killer app' might be one that helps us with (b)
the goal there is better served if individual work with the distribution tool also helps you with (a) and (c)
thus the Flickr-for-linguists ideas
Thread: some applications to add? (started by mebeckman, Jul 12 2009, 7:48 PM EDT; 5 replies; last post Jul 14 2009, 3:18 AM EDT by robert_forkel)
Here are some things that I would think of as killer apps in my corner of language:
R (http://www.r-project.org/) and the sub-communities and packages related to linguistics research that it has enabled, such as Emu (http://emu.sourceforge.net/), as well as new approaches to teaching statistics and numerical reasoning to linguists (see, e.g., http://www.ling.uni-potsdam.de/~vasishth/SFLS.html and http://www.ualberta.ca/~baayen/#statistics)
WebExp (http://www.webexp.info/) and related applications facilitated by languages such as Java, Python, etc.
Praat (praat.org) for the way that it is free and very easy for even high school students to learn how to use

