|
Machine reusability of data
|
Sep 8 2009, 7:27 AM EDT |
added refs and links |
edit |
236 words added
27 words deleted
|
Change:
---Pixabaj, Telma Can (coordinator), Miguel Angel Vicente Méndez, María Vicente Méndez, and Oswaldo Ajcot Damián. Uspanteko text collection, in Text Collections in Four Mayan Languages, 2003-2007. OKMA (Oxlajuuj Keej Maya' Ajtz'iib'), Supported by Endangered Languages Documentation Programme (SOAS, University of London).---Schroeter, Ronald and Nicholas Thieberger.
(Word count: 1360)
|
|
Existing Standards and Technologies
|
Aug 31 2009, 2:25 PM EDT |
|
edit |
|
Change:
There were only format changes (bold, italics, etc.) in this version. See this version for details.
(Word count: 1646)
|
|
Getting Involved in ISO Standards Development
|
Aug 31 2009, 2:23 PM EDT |
Moved from: WG2: Big ideas |
move |
No content added or deleted. |
Change:
Moved by Aug 31 2009, 2:23 PM EDT
|
|
Existing Standards and Technologies
|
Aug 31 2009, 2:23 PM EDT |
|
edit |
11 words added
|
Change:
See also information re: participating in the development of ISO standards
Existing annotation standards and resources
This section lists the various annotation conventions and other resources for developing and discussing annotation standards that were suggested by participants of Cyberling09. It was copied from the working group 1 page on
(Word count: 1646)
|
|
Machine reusability of data
|
Aug 31 2009, 2:20 PM EDT |
|
edit |
294 words added
10 words deleted
|
Change:
Most often when we encounter IGT -- as, in fact, in the table above -- the links between annotation tiers are conveyed through visual aspects
(Word count: 1139)
|
|
Machine reusability of data
|
Aug 31 2009, 2:04 PM EDT |
|
edit |
200 words added
|
Change:
The table above shows two tiers of annotation for the clause shown above the table. The 'MORPHEME' tier shows a segmentation of words into their component morphemes. The 'GLOSS' tier shows a morpheme-by-morpheme gloss of the clause, including both gloss labels for non-stem morphemes (e.g. NEG for kita')
(Word count: 859)
|
|
Machine reusability of data
|
Aug 31 2009, 12:56 PM EDT |
|
edit |
460 words added
70 words deleted
|
Change:
source of the change, how the change should be manifested in the annotation (in other words, what did the previous analysis look like? what does the new analysis look like?), the date and time at which the decision to change the analysis was made, and whether or not the types
(Word count: 648)
|
|
Machine reusability of data
|
Aug 31 2009, 11:14 AM EDT |
|
edit |
186 words added
|
Change:
. Roughly put, the machine learns generalizations over observed data and uses those to predict labels (or structures, etc.) for previously unseen data. In order for it to generalize well, the collection of data must be as internally consistent as possible in the way that it is coded/labeled. 2. study: interlinear glossed text
(Word count: 242)
|
|
Existing Standards and Technologies
|
Aug 31 2009, 10:03 AM EDT |
|
edit |
4 words deleted
|
Change:
(or at least common) practices
Existing annotation standards and resources
This section lists the various annotation conventions and other resources for developing and discussing annotation standards that were suggested by participants of Cyberling09. It was copied from the working group 1 page on 23 July 2009,
(Word count: 1635)
|
|
Machine reusability of data
|
Aug 31 2009, 9:05 AM EDT |
|
edit |
27 words added
19 words deleted
|
Change:
may be re-purposed: as training data for statistical machine learning approaches in computational linguistics and/or natural language processing. Computational linguistics, machine learning, statistical approaches need data... But machines are finicky about the types of data and representation they can work with.
(Word count: 53)
|
|
Existing Standards and Technologies
|
Aug 31 2009, 9:02 AM EDT |
|
edit |
11 words added
|
Change:
subfield-specific best (or at least common) practices
Existing annotation standards and resources
This section lists the various annotation conventions and other resources for developing and discussing annotation standards that were suggested by participants of Cyberling09. It was copied from the working group 1 page on 23 July
(Word count: 1639)
|
|
Genealogical classification with AutoTyp
|
Aug 31 2009, 8:54 AM EDT |
added one link |
edit |
|
Change:
There were only format changes (bold, italics, etc.) in this version. See this version for details.
(Word count: 377)
|
|
Group 2: Standards for Storage, Retrieval, and Search of Data
|
Aug 31 2009, 8:49 AM EDT |
|
edit |
|
Change:
There were only format changes (bold, italics, etc.) in this version. See this version for details.
(Word count: 388)
|
|
WG2: Subfield-specific practices
|
Aug 31 2009, 8:23 AM EDT |
|
edit |
4 words added
30 words deleted
|
Change:
and its offspring, in two forms: 1. Expansion of existing seed lists 2. Creation of new seed lists Contributions are currently being coordinated by Alexis Palmer (apalmer@coli.uni-sb.de). Feel free to contact me if you have questions or need any assistance with the wiki interface.
(Word count: 129)
|
|
WG2: Case Studies
|
Aug 31 2009, 8:20 AM EDT |
|
edit |
26 words deleted
|
Change:
CONTRIBUTE! We welcome the contribution of additional case studies regarding standards for data storage, search, and retrieval (and other relevant topics... there are many). Contributions are currently being coordinated by Alexis Palmer (apalmer@coli.uni-sb.de). Feel free to contact me if you have questions or need any assistance with the wiki interface.
(Word count: 144)
|
|
Machine reusability of data
|
Aug 28 2009, 11:02 AM EDT |
|
edit |
20 words added
|
Change:
SIGH... this page of mine is still heavily under construction. But here anyway are the big ideas. (alexis palmer, apalmer@coli.uni-sb.de) Computational linguistics, machine learning, statistical approaches need data... But machines are finicky about the types of data and representation they can work with.
(Word count: 47)
|
|
WG2: Resources
|
Aug 28 2009, 10:58 AM EDT |
|
edit |
6 words added
|
Change:
PLEASE CONTRIBUTE relevant links and resources
Encoding and Annotation Links:
Autotyp
ISO Script Codes (ISO 15924): http://www.unicode.org/iso15924/
OLAC
TypeCraft
Unicode:
http://www.unicode.org/charts/ (Unicode codecharts)
http://scripts.sil.org/IPAhome (Unicode-enabled fonts with IPA)
http://people.w3.org/rishida/scripts/pickers/ (Unicode character pickers)
http://scripts.sil.org/UniIPAKeyboard (Unicode Keyboard)
http://www.unicode.org/notes/tn19/ (Recommendations on the development of new orthographies)
XSL-transformations (EXtensible Stylesheet Language)
Data
(Word count: 200)
|
|
WG2: Resources
|
Aug 28 2009, 10:57 AM EDT |
|
edit |
18 words added
6 words deleted
|
Change:
Server, 2007 (http://sharepoint.microsoft.com/Pages/Default.aspx)
Infrastructure for Long-Term Archiving:
eSciDoc collaborative eResearch infrastructure
Fedora Commons Repository Software
Sharing Links:
WS/SOAP (standards for web services)
Access Links:
Wiki access control lists (for assignment of usage rights and privileges)
Bibliography:
DiPaolo, M. and Yaeger-Dror, M. (forthcoming) Best Practices in Sociophonetics. Cambridge UP
(Word count: 194)
|
|
Group 2: Standards for Storage, Retrieval, and Search of Data
|
Aug 28 2009, 10:47 AM EDT |
|
edit |
12 words added
|
Change:
(for more on this, see the white paper from Group 4)
Standards are great, now how do I use them?
Adaptability and Fossilization of standards
Issues addressed within the WG2 wiki pages:
Unicode character encoding standards for increasing stable display, readability, and sharing of data
Relational database storage
Wiki-based
(Word count: 383)
|
|
Genealogical classification with AutoTyp
|
Aug 28 2009, 10:29 AM EDT |
Rename |
rename |
No content added or deleted. |
Change:
Renamed from Description of JN's slides by Aug 28 2009, 10:29 AM EDT for: Rename
|