Working Group 1: Annotation Standards
Mary Beckman (co-chair)
wiki username: mebeckman
website: http://ling.osu.edu/~mbeckman
email: mbeckman@ling.osu.edu
I am a linguist who has worked (among many other things) on developing a framework for building annotation conventions for prosodic categories in specific language varieties (see http://ling.osu.edu/~tobi) and on developing a database of child and adult productions of word forms that target lingual obstruents (see http://ling.osu.edu/~edwards). Some of my current funded work is described at http://www.ling.ohio-state.edu/~edwards/socialDynamics/NSFhighlight.html and I am interested (among other things) in developing the infrastructure for facilitating this kind of collaborative approach that incorporates corpus-based work, experimental work, and computational modeling.
Stuart Robinson (co-chair)
wiki username: stuartrobinson
website: https://www.zapata.org/stuart
email: stuart@zapata.org
Sarah Churng
wiki username: ashragurnch
website: http://students.washington.edu/ashra/
email: ashra@u.washington.edu
I am a graduate student at the University of Washington. As part of the Annotation Standards working group, my main goal at the Cyberling workshop is to foster discussion of how cyberlinguistic infrastructure can facilitate sign language transcription and annotation, through best-practice efforts, dissemination on the web, etc. I hope to connect these discussions with current research in deaf literacy, in which first language acquisition of a sign language is hypothesized to bootstrap spoken-language literacy.
Greville Corbett
wiki username:
website: http://www.surrey.ac.uk/LIS/SMG/gcorbett.htm
email:
I work at the University of Surrey, where I lead the Surrey Morphology Group. I have worked particularly on the typology of features, as in Gender (1991), Number (2000) and Agreement (2006), which makes me very positive towards the Leipzig Glossing Rules, while aware of what still needs to be done there. The SMG has produced several typological databases, freely available over the web (http://www.surrey.ac.uk/LIS/SMG/web_resources.htm), so I have a particular interest in how we ensure that the research and effort involved in setting up such resources is respected and acknowledged. I collaborated with Marina Chumakina, Dunstan Brown and Harley Quilliam on the Archi Dictionary (http://www.smg.surrey.ac.uk/archi/linguists/index.aspx), with all the issues of scripts, sound files, picture files, web access and so on.
Chuck Fillmore
wiki username: cjfillmore
website:
email: fillmore@icsi.berkeley.edu
A long-retired member of the Berkeley linguistics department. Associated since retirement with the "FrameNet" project, producing a kind of valency dictionary whose categories are based on knowledge of the semantic frames that underlie individual words' meanings (http://framenet.icsi.berkeley.edu). The analyses are based on annotations of sentences extracted from a large corpus of written English. Currently also trying to use the same methods to build a registry of "minority grammatical constructions" (the ones that ordinary parsers can't handle, or that familiar compositional principles can't interpret). I've been trying with some student colleagues to develop an abbreviated way to annotate phrases with respect to how they instantiate individual constructions. Each construct annotation has to be keyed to a full description of the construction, in the way that each lexical annotation has to be keyed to a full description of the relevant frame. Annotating all the words or all the constructions in individual sentences requires layered stand-off annotation sets.
I lacked the patience to complete the complicated interview that would have allowed me to insert a photo - I could have tried to find out what my zodiac sign is, but I don't have a good photo anyway: I'm tallish, red-haired, slow-moving, walk with a cane.
Richard Wright
wiki username: rawright@u.washington.edu
website: http://depts.washington.edu/phonlab/people/wright.html
email: rawright@u.washington.edu
I am an associate professor of linguistics and the Phonetics Lab director at the University of Washington. I specialize in phonetic research and field linguistics, primarily in the acoustic domain. My interest in the Cyberling workshop stems from my experience developing and working with corpora of spoken language (in part with Alicia Wassink), from working on voice-based human-machine interfaces ("The Vocal Joystick"), and from working with Steve Moran to develop a typological database of phonological inventories (PHOIBLE) that is tied to ontological models of phonological feature theories and expands on previous work such as UPSID and the Stanford Phonology Archive. I am particularly interested in methods for representing the sounds of languages in ways that are machine-readable and standardized across languages and across applications.
Working Group 2: Other Standards
Johanna Nichols (co-chair)
wiki username: Johanna.Nichols
website: http://linguistics.berkeley.edu/~ingush/, http://socrates.berkeley.edu/~jbn/
email: johanna@berkeley.edu
I am Professor emeritus in Slavic linguistics at UC Berkeley. I co-founded and co-direct (with Balthasar Bickel; see below) the Autotyp databases and research project (http://uni-leipzig.de/~autotyp/), which will probably have its complete genealogical classification on-line by the time the Cyberling workshop begins. I work on questions of phylogeny, detecting and demonstrating linguistic relatedness, deep linguistic prehistory, and typology, and I have a documentation project creating very large corpora of spoken Ingush and Chechen (East Caucasian). All of these projects require combining data from different fields (linguistics, archaeology, ethnography, human genetics). I am concerned with seeing standards and tools developed that will not require every documentary linguist to start from scratch electronically, and that will serve linguists and language users rather than vice versa. I have developed all-lower-ascii practical writing systems for Ingush and Chechen and a similar system for my East Caucasian etymological database and am distressed at what I perceive as increasing pressure to put corpora, syntactic examples, dictionary headwords, etc. into unreadable phonetic transcription just because of font availability. For my Ingush corpus I'm working out ways of interlinearizing and lemmatizing in languages where lemmas, inflectional categories, etc. are properties of clauses rather than words. I'm also concerned with the lack of instructions, documentation, labels, etc., etc. in languages readable by speakers of minority languages in Russia.
Alexis Palmer, Saarland University (CoLi) and the University of Texas at Austin (co-chair)
wiki username: alexispalmer
website: http://comp.ling.utexas.edu/apalmer
email: apalmer@coli.uni-sb.de
In May 2009 I began a new position as a postdoc in the M2CI Cluster of Excellence at Saarland University in Saarbrücken, Germany. There I am working with Caroline Sporleder in a research group on Computational Modelling of Discourse and Semantics. I'm also just finishing my PhD in computational linguistics at the University of Texas at Austin, under the supervision of Jason Baldridge and Katrin Erk. My thesis research, which is connected with the EARL project, concerns integrating automatic labeling and human annotation for more efficient production of interlinear glossed text (IGT). In general, I'm interested in the potential for applying techniques and methodologies from computational linguistics (CL) to research in other linguistic subfields, and particularly to the documentation of endangered languages. Bridging the space between CL and the rest of linguistics raises a host of issues related to the aims and core ideas of Cyberling 2009: data formatting and availability, code reusability and availability, modularity and generalizability of computational models, etc.
Debbie Anderson
wiki username: DeborahAnderson
website: http://linguistics.berkeley.edu/sei/
email: dwanders@sonic.net
I run a project at UC Berkeley, the Script Encoding Initiative, that assists groups (and individuals) in getting eligible scripts and characters into the Unicode Standard/ISO 10646, the international character code standard. I am particularly keen on making sure that linguists and members of the user communities (especially minority language speakers) have a voice in the development of standards. I am the UC Berkeley representative to the Unicode Consortium, and a member of the US delegation to ISO/IEC JTC 1 SC2 Working Group 2 on coded character sets.
Eric Kansa
wiki username: ekansa
website: http://isd.ischool.berkeley.edu
email: ekansa@ischool.berkeley.edu
Eric C. Kansa is Executive Director of the Information and Service Design Program and an Adjunct Professor at the UC Berkeley School of Information (I School). His primary role is to develop service design projects that bring I School students and faculty into collaboration with partner organizations. His research interests include efforts to enhance the accessibility and usability of research data collected in the field sciences, as well as the impact of ubiquitous information accessibility on the consumer experience of services. Before coming to UC Berkeley, Eric co-founded, and served as Executive Director of, the Alexandria Archive Institute, a nonprofit organization. There he led development of Open Context, an online system for publishing primary research data collected in the field sciences. Before that, he was on the faculty of Harvard University, where he served as Lecturer and Undergraduate Tutor for the Department of Anthropology. He graduated from the University of California, San Diego with a BA in Cultural Anthropology, and was awarded a doctorate in Anthropology by Harvard University in 2001. Eric is currently Convener of the Society for American Archaeology's Digital Data Interest Group.
Pavel Mihaylov
wiki username: pavel.mihaylov
email: bin,at,bash,dot,info
I am a computational linguist working for Ontotext, a mixed industry/research company based in Sofia, Bulgaria. My main work is in web mining/information extraction and finite-state morphologies. Together with Dorothee Beermann, I work on TypeCraft (see Dorothee's bio). Beyond the computational side, I have a general interest in linguistics and languages.
Alicia Wassink
wiki username: AliciaBW
website: http://faculty.washington.edu/wassink/
email: wassink@u.washington.edu
I work in acoustic phonetics, sociolinguistics and creole linguistics. At the Cyberling workshop, I'm wearing my sociophonetician hat. I am currently co-authoring a chapter on best practices in sociophonetics for the instrumental analysis of vowels in addressing research questions of interest to sociolinguists. This chapter is about setting standards for best practices in data analysis rather than about data storage, retrieval or archiving, all of which are topics I'm interested in discussing as part of this working group. As director of the Sociolinguistics Laboratory in the Department of Linguistics at the University of Washington, I've run metadata tutorials to train my students to use established protocols for associating metadata with their audio files, which improves data accessibility and searchability and makes acoustic analysis and querying easier.
Working Group 3: Tools
Bill Byrne (co-chair)
wiki username: bill_byrne
website: About me
email: billb@google.com
I design speech user interfaces at Google (e.g. the iPhone app) and have been in this field for the last ten years. Having completed a linguistics PhD in 1998, I continue to follow theoretical and applied work, but I have always been troubled by the lack of data available to researchers. I see Cyberling as an interesting and encouraging development for the field. I would love to help develop more ways for linguists in all subdisciplines to easily gain access to very large data sets as well as share their own data with the rest of the world.
Robert Forkel (co-chair)
wiki username: robert_forkel
website: https://dev.livingreviews.org/projects/epubtk/wiki/people/robert
email: forkel@mpdl.mpg.de
I am a mathematician-turned-software developer working at the Max Planck Digital Library. My interest in web infrastructure for linguistics started while I was working on WALS Online. With a couple more projects coming in, I'm interested in ways to publish linguistic data as part of the Linked Open Data Cloud. The data I'm mainly concerned with are word lists, interlinear glosses, etc.
What I hope to take away from Cyberling is a clearer idea of the lowest level of quality/granularity at which sharing data is still fruitful; I'm looking for low-hanging fruit.
Dorothee Beermann
wiki username: DorotheeBeermann
website: http://www.hf.ntnu.dorothee.beermann
email: dorothee.beermann@hf.ntnu.no
I am an associate professor at the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway.
My interest in cyberlinguistics results from working across linguistic fields, from grammar engineering on the one hand to African linguistics on the other. Particularly when working across frameworks, one would like to know what defines linguistics as a whole. Can we, for example, find a common answer to the question "What defines linguistic methodology?"
I am particularly interested in the role that interlinear glosses (IGs) play in linguistic research. As I am not entirely happy with the role IGs play at present, I would like to help facilitate a development in which they become an independent linguistic resource, accessible to all of us.
Together with Pavel Mihaylov I have created a linguistic tool (TypeCraft) that helps to generate, store and retrieve IGs in a setting that allows sharing them with a group of colleagues or publishing them online.
Arienne Dwyer
wiki username:
website:
email:
[insert bio here]
Florian Jaeger, University of Rochester
wiki username: FlorianJaeger
website: http://www.hlp.rochester.edu [never mind the security warning; trust me]
email: fjaeger@bcs.rochester.edu
I received my M.A. in Linguistics and Computer Science (HU & TU Berlin, with a visit to UC Berkeley) and my PhD in Linguistics with a designation in cognitive science (Stanford University, with a visit to MIT). I have been at Rochester (Brain and Cognitive Sciences and Computer Science) since January 2007, where I work on efficient language production, maintenance of probabilistic linguistic representations, and other such stuff. My involvement in Cyberling 2009 is related to my interests in the replicability and extensibility of scientific work. This implies developing tools and annotation standards that make the data sets developed by one researcher useful to others. I am also interested in cheap 'technology' with potentially high impact, such as taking laptops to the field to run psycholinguistic studies, or using online platforms like Mechanical Turk to elicit large amounts of data from many languages at low cost. This often results in unbalanced, highly clustered data (similar to corpus data) for which modern statistical methods are required (in which I am also interested ... conveniently).
Jeremy G. Kahn
wiki username: JeremyKahn
website:
email: jgk@washington.edu
I am a Ph.D. student in Linguistics at the University of Washington. I work as a Research Assistant to Mari Ostendorf in the Signal, Speech and Language Interpretation laboratory within UW Electrical Engineering, and I am currently a Visiting Fellow at the SRI Speech Technology and Research laboratory in Menlo Park. My primary research area is in using syntactic information to support a speech-recognition and machine-translation pipeline. I am interested in data annotation both on principle and as a matter of necessity; heavily statistical research software systems have tremendous issues with data compatibility, portability, and reconciliation.
Virach Sornlertlamvanich
wiki username: virach
website: http://www.tcllab.org/virach
email: virach@tcllab.org
I am the Assistant Executive Director of the National Electronics and Computer Technology Center (NECTEC), and the Co-director of the Thai Computational Linguistics Laboratory (TCL). My research interests lie in computational linguistics, covering morphological, syntactic and semantic representation and analysis. My previous work includes publications and implementations such as LEXiTRON (English-Thai dictionary), Asian WordNet, Kui (Knowledge Unifying Initiator: an online collaborative editing tool), SWATH (word segmentation), ORCHID (Thai POS-tagged corpus), ParSit (English-Thai online machine translation), Sansarn (Thai search engine portal), etc. Currently, I chair the Asian Language Resource group of AFNLP (Asian Federation of NLP) and am a director member of AAMT (Asia-Pacific Association for Machine Translation). I am also conducting a series of ADD schools (Asian Applied NLP for Linguistic Diversity and Language Resource Development) for NLP networking and collaboration in language resource development, especially for Asian countries.
Working Group 4: Data Reliability & Provenance
Peter Austin (co-chair)
wiki username: pkaustin
website: http://www.hrelp.org/aboutus/staff/index.php?cd=pa
email: pa2@soas.ac.uk
I am Marit Rausing Chair in Field Linguistics at the School of Oriental and African Studies in London and Director of the Endangered Languages Academic Programme. My research interests lie in the theory and practice of language documentation and description, endangered languages, morphosyntactic typology, Lexical Functional Grammar, and languages of eastern Indonesia and Aboriginal Australia. At SOAS I teach a course on Technology and Language Documentation that covers data modeling, workflow, metadata, archiving, ethics and protocols, and software tools, and I have participated in training workshops that cover these topics.
Martin Haspelmath, Max Planck Institute for Evolutionary Anthropology
wiki username: haspelmath
website: http://www.eva.mpg.de/lingua/staff/haspelmath/home.php
email: haspelmath@eva.mpg.de
I am a typologist interested in linking structural data from as many different languages as possible (typological databases). Since 2008, the World Atlas of Language Structures (http://wals.info) has been online, and I'm interested in getting more such datasets out. I hope that linguists will soon publish their dictionaries and corpora online, and that it will become normal to see such online resources as regular (peer-reviewed) publications, because without the incentive of regular publication I fear linguists will not share their materials.
Kurt Bollacker
wiki username:
website:
email:
[insert bio here]
Tracy Holloway King
wiki username: tracyhollowayking
website: http://www-csli.stanford.edu/~thking/
email: tracyhollowayking@gmail.com
I am on the LSA TAC committee (with a number of people on this list). I currently manage the natural language engineering groups at Powerset, a semantic search company acquired by Microsoft in 2008. I co-organized the first three Grammar Engineering Across Frameworks (GEAF) workshops. I am particularly interested in making sure that the resources, platforms, and theories we create can be used cross-linguistically.
Koenraad de Smedt
wiki username: Koenraad
website: http://ling.uib.no/desmedt/
email: desmedt@uib.no
I am professor of Computational Linguistics at the University of Bergen, Norway, and head of the Research Group on Language Models and Resources (LaMoRe). Lately I have been working on parsebanking. I am currently the national contact person for CLARIN in Norway and a member of the Science Opportunities Panel of the eVITA Programme Committee (Research Council of Norway). I am also coordinator of CLARA, a Marie Curie ITN that will start up in the fall of 2009 (see Organizations and Initiatives). In 2007 I organized a Workshop on Unified Linguistic Annotation.
Paul Trilsbeek
wiki username: paul.trilsbeek
website: http://www.mpi.nl/people/trilsbeek-paul
email: Paul.Trilsbeek@mpi.nl
I am an archive manager for the language archive at the Max Planck Institute for Psycholinguistics, which includes the DOBES archive of endangered languages. At MPI we also develop an array of linguistic tools and a framework for digital archiving of language resources. In the European CLARIN project, which aims at creating a common infrastructure for language resources and technology, the MPI plays an important role in the technical work package (WP2).
Working Group 5: Models from Other Fields
Scott Farrar (co-chair)
wiki username: sofarrar
website: http://faculty.washington.edu/farrar/
email: farrar@u.washington.edu
I am an Assistant Professor of Linguistics at the University of Washington, where I teach computational linguistics in the Professional Master's in Computational Linguistics (CLMA) Program. My primary interest is computational linguistics with a focus on e-linguistics, that is, how to apply computational techniques in traditional linguistics research. I received my PhD in Linguistics from the University of Arizona in 2003. Before joining the CLMA Program, I worked at the University of Bremen in Germany and in Cameroon on a fieldwork assignment researching endangered Beboid languages. I am currently funded by the National Science Foundation under grant BCS-0720670, entitled "Implementing the GOLD Community of Practice: Laying the Foundations for a Linguistics Cyberinfrastructure."
Terry Langendoen (co-chair)
wiki username: TerryLangendoen
website: http://linguistics.arizona.edu/~langendoen
email: langendt@email.arizona.edu
I am Professor Emeritus of Linguistics at the University of Arizona, having retired from academia in 2005. From 2006 to 2008, I was a Program Director in Linguistics at the National Science Foundation, and for the past year I have been working part-time as an Expert in the Robust Intelligence Program in the Division of Information and Intelligent Systems at NSF. I thank Nancy Ide for getting me interested in the problem of annotation of electronic linguistic data by inviting me to become part of the NEH-supported Text Encoding Initiative (TEI) in 1987. In that project, I worked with Gary Simons to develop recommendations for the encoding of linguistic structure in SGML, the predecessor to XML, including a general-purpose annotation format for feature structures. In 2001, I began work on the NSF-supported E-MELD project, and together with Scott Farrar and Will Lewis developed the initial specifications of the General Ontology for Linguistic Description (GOLD), with the idea of enabling linguists to annotate their data without being committed to a specific markup syntax, as in TEI, and of enabling data annotated in different ways to be computationally interoperable. At the January 2009 LSA Annual Meeting, Emily Bender and I organized a special session on Computational Linguistics in Support of Linguistic Theory; in our presentation we touched on several of the themes of this workshop. I discussed Cyberling 2009 briefly at the end of my contribution "Opportunities at NSF" in the special section entitled "Keeping Science Moving in Tight Times" in the September 2009 issue of the Association for Psychological Science's Observer.
Balthasar Bickel
wiki username:
website:
email:
[insert bio here]
Steve Moran
wiki username: Steve_Moran
website: http://staff.washington.edu/stiv/
email: stiv@u.washington.edu
I am a PhD student in the Linguistics Department at the University of Washington. My research interests include language documentation and developing cyberinfrastructure for the interoperability of linguistic data. As a field linguist I am active in Prof. Jeffrey Heath's Dogon languages project, which is documenting the Dogon languages of Mali and creating an online comparative lexicographic (and multimedia) website (http://dogonlanguages.org/). With Richard Wright, I am also developing a typological database of phonological inventories that is tied to ontological models of phonological feature theories. We are making these resources available at our project website, PHOIBLE (http://phoible.org/). Previously I worked for the LINGUIST List, specifically on the E-MELD project to foster consensus on "best practice" standards for the digital archiving of endangered languages data.
Cornelius Puschmann, University of Düsseldorf
wiki username: coffee001
website: http://ynada.com/
email: cornelius.puschmann@uni-duesseldorf.de
I am a postdoc at the Department of English Language and Linguistics at the University of Düsseldorf, Germany. My involvement in Cyberling 2009 stems from my role as the technical coordinator of eLanguage, the LSA's Open Access publishing platform. I am also a strong proponent of Open Access and Open Data in linguistics and in other disciplines.
Dwight van Tuyl, Eastern Michigan University
wiki username: dvantuyl
website: http://linguistlist.org/people/dwight.html
email: dwight@linguistlist.org
I'm a programmer at the LINGUIST List at Eastern Michigan University. We've recently finished the GOLD Community website at http://linguistics-ontology.org, which aims to build a community around the General Ontology for Linguistic Description currently being developed by Scott Farrar of the University of Washington. At the LINGUIST List, we plan to use GOLD in our latest project, LEGO, for annotating lexical data with GOLD concept URIs. I'm hoping to come back from this workshop with an understanding of what tools and interoperable standards could be used for projects like LEGO in order to provide a low barrier to entry for participating in a cyberinfrastructure for linguists.
Working Group 6: Funding Models
Mark Liberman (co-chair)
wiki username: MarkYLiberman
website: http://ling.upenn.edu/~myl
email: MarkYLiberman@gmail.com
Professor (Linguistics, Computer and Information Sciences) at University of Pennsylvania; Director, Linguistic Data Consortium.
David Lightfoot (co-chair)
wiki username: DavidLightfoot
website:
email: lightd@georgetown.edu
David Lightfoot writes mainly on syntactic theory, language acquisition and historical change, which he views as intimately related. He argues that internal language change is contingent and fluky, takes place in a sequence of bursts, and is best viewed as the cumulative effect of changes in individual grammars, where a grammar is a "language organ" represented in a person's mind/brain and embodying his/her language faculty. That, in turn, entails a non-standard view of language acquisition as "cue-based." He has published eleven books, most recently The Development of Language (Blackwell, 1999), Syntactic Effects of Morphological Change (ed.) (Oxford UP, 2002), The Language Organ (with S.R. Anderson) (Cambridge UP, 2002), and How New Languages Emerge (Cambridge UP, 2006). He is also the author of more than 100 articles, book chapters and reviews. He is general editor for the Generative Syntax series published by Blackwell, and serves on the linguistics editorial board at Cambridge University Press. In 2004, he was elected a fellow of the American Association for the Advancement of Science, and in 2006 a fellow of the Linguistic Society of America.
Dr. Lightfoot has held regular professorial appointments at several universities, including McGill University, where he taught many undergraduates who went on to become major figures in linguistics and psychology, including Mark Baltin, Alan Prince, Michael Rochemont, Alison Gopnik, Elan Dresher, Norbert Hornstein, Amy Weinberg, Renée Baillargeon and Elizabeth Cowper; the University of Utrecht in the Netherlands; and the University of Maryland, where he established, and chaired for 12 years, a new department of linguistics with a unique focus: viewing linguistics as the study of the human language organ. He was also the associate director of the neuroscience and cognitive sciences program there. In 2001, he moved to Georgetown University as dean of the graduate school. In addition, he has held short-term appointments at universities in Austria, Brazil, Canada, Germany, Switzerland and the United Kingdom. In June 2005, he became assistant director of the National Science Foundation, heading the Directorate for Social, Behavioral and Economic Sciences.
Anthony Aristar
wiki username: aristar
website: http://linguistlist.org/aristar
email: aristar@linguistlist.org
I started life as a historical linguist and typologist, but was drawn, during the early Internet age (it was called Arpanet then!), to its potential as a medium for academic exchange. I founded the LINGUIST List in 1990 as a step in this direction, and the list grew so much that my co-moderator Helen Aristar-Dry and I realized that we needed to rethink what we were and where LINGUIST could go. We started applying for grants to build infrastructure for the discipline, and in 2004 Helen and I became co-Directors of the Institute for Language Information and Technology, housed at Eastern Michigan University. Our focus for the last few years has continued to be linguistic infrastructure, but we also now deal extensively with standards for linguistics on the Internet (e.g. our EMELD project) and work on digitizing endangered languages data. We have the following ongoing projects, funded by either NSF or NEH: MultiTree (http://multitree.linguistlist.org/), which is collecting all known hypotheses on language relationships; LLMAP (http://llmap.org), which is aimed at making GIS a fundamental part of linguistics; GOLDComm (http://linguistics-ontology.org/), which is aimed at expanding the GOLD ontology for linguistic description; LEGO (http://linguistlist.org/projects/lego.cfm), which has as its goal the development of several "building blocks" for lexical data interoperability within linguistics; and RELISH, a collaborative project with the Max Planck Institute for Psycholinguistics and the Johann Wolfgang Goethe-Universität Frankfurt, aimed at unifying two digital collections of endangered languages, with special attention given to harmonizing the European and American standards for language documentation and lexicon building.
Collin Baker
wiki username: collinfb
website: http://framenet.icsi.berkeley.edu
email: collinb@icsi.berkeley.edu
I am a linguist, working as manager of the FrameNet Project, founded and directed by Prof. Charles Fillmore, which is part of the AI group at the International Computer Science Institute in Berkeley. For the last decade, we have been building a rich lexical semantic database for English, based on frame semantic principles and grounded in manually annotated corpus examples of usage. We are currently participating in a joint annotation project for the American National Corpus (http://americannationalcorpus.org), collaborating with colleagues building FrameNets for Spanish, German, Chinese, Japanese, etc., planning an alignment of FrameNet with WordNet, and exploring crowdsourcing as a means of gathering annotation data. I am interested in the problem of funding resource building, particularly long-term efforts.
Helen Dry
wiki username:
website:
email:
[insert bio here]
Laura Welcher
wiki username: lbwelch
website: http://www.rosettaproject.org
email: laura@longnow.org
I direct The Rosetta Project at The Long Now Foundation in San Francisco, and am one of the co-organizers of Cyberling 2009. My interest in cyberlinguistics originally developed out of my experience in linguistic fieldwork, using specialized tools like Shoebox/Toolbox, as well as trying to make general tools like FileMaker Pro work for lexicography. Both of these tasks quickly gave me the sense that better tools are needed for what linguists do! Besides language documentation, my work at The Rosetta Project has underscored the need for standards upon which to build tools. The Rosetta Project maintains an archival collection for all of the world's languages in multiple media formats. How does one search such an archive? How do experts interact with the content? How do users without any knowledge of language names, ISO codes, and language relationships find information about the nearly 7,000 languages on the planet? These are some of the problems that any project claiming to cover "All Languages" must deal with. Our new archival structure is distributed and publicly interactive: all languages and language relationships are available as open content in our Rosetta Base in Freebase, all of the archived materials are in our Rosetta Collection in the Internet Archive, and we are currently building a user-editable wiki interface on top of this. (It is currently in alpha, so please ask me if you'd like a demo; we also need a good name for it... Rosetta Panglossia?) A companion project to the digital archive is The Rosetta Disk, a microscopic version of the collection built out of materials that can last for millennia. It is one of the showpiece artifacts meant to get people engaged in long-term thinking, along with the Foundation's 10,000 Year Clock of the Long Now.
Working Group 7: Collaboration Structure
Brian MacWhinney (chair)
wiki username: macw
website: talkbank.org
email: macw@cmu.edu
Brian MacWhinney, Professor of Psychology, Computational Linguistics, and Modern Languages at Carnegie Mellon University, has developed a model of first and second language acquisition and processing called the Competition Model. He has also developed the CHILDES Project (childes.psy.cmu.edu) for the computational study of child language transcript data and the TalkBank (talkbank.org) system for the study of conversational interactions.
Emily M. Bender, University of Washington
wiki username: EmilyMBender
website: http://faculty.washington.edu/ebender/
email: ebender@u.washington.edu
I am one of the co-organizers of Cyberling 2009. My interest in cyberinfrastructure for linguistics stems from my work on grammar engineering for linguistic hypothesis testing. I see this as one example of computational methods in support of linguistic analysis: using computers to systematically work with larger data sets and manage greater complexity than we could without computational aids. I am also very interested in the issue of culture change within the field of linguistics, i.e., how to create a culture in which data sharing and the validation of hypotheses against large datasets are expected and rewarded.
Nicoletta Calzolari
wiki username:
website:
email:
[insert bio here]
Nancy Ide
wiki username:
website:
email:
[insert bio here]
David Robinson
wiki username: drobinsonlsa
website: http://www.lsadc.org
email: drobinson@lsadc.org
I am the Director of Membership and Meetings for the Linguistic Society of America. I will be attending Cyberling 2009 in order to assess how the LSA can best put the findings of this workshop at the disposal of LSA members as well as the profession at large, and what role the LSA can play in the development of a cyberinfrastructure for the profession.
Other Collaborators
Jeff Good
wiki username: jcgood
website: http://buffalo.edu/~jcgood/
email: jcgood@buffalo.edu
I am one of the co-organizers of the Cyberling workshop. (Unfortunately, however, I will not be able to attend most of it.) I am interested in how cyberlinguistic infrastructure can facilitate work in language description, typology, and comparative and historical linguistics.
Dan McCloy, University of Washington
wiki username: danmccloy
email: drmccloy@u.washington.edu
I am a graduate student in Linguistics at the University of Washington. My research is primarily in formal semantics. As one of the organizers of Cyberling 2009, I am the primary point of contact for most inquiries about workshop logistics.
Tandy Warnow
wiki username:
website:
email:
[insert bio here]