Aug 15 2009, 3:48 PM EDT
69 words added
88 words deleted
Change: that match initial metadata query of data and then searching and filtering might not be the optimal solution. Furthermore, downloading data should be avoided
(Word count: 4481)

Aug 15 2009, 3:42 PM EDT
201 words added
107 words deleted
Change: are located. An infrastructure would not necessarily replace or copy existing repositories and services, but could aim at connecting them. Provenance information is very important as an underlying feature, since a user might combine a parser X with a tagger Y from different places
(Word count: 4502)

Aug 15 2009, 3:27 PM EDT
162 words added
80 words deleted
Change: therefore crucial that repositories offer versioning and updating of the stored materials. Some researchers might prefer to control distribution themselves from their own homepage. A possible solution could be that links or webpages could be generated from the repository automatically. This could
(Word count: 4410)

Aug 15 2009, 3:15 PM EDT
102 words added
112 words deleted
Change: It is challenging to reference mashup data since the mashup process is dynamic and every combination of specific versions of data produces a new mashup
(Word count: 4328)

Aug 15 2009, 3:13 PM EDT
187 words added
152 words deleted
Change: Once you have provenance, what if someone retracts/deletes all or some research data? One could mark data as deleted, invalid or superseded without actually destroying
(Word count: 4338)

Aug 15 2009, 3:03 PM EDT
24 words added
20 words deleted
Change: or superseded without actually destroying the data; privacy would be one reason but it is often economically or politically inconvenient to have data around (cf. Swiss bank case in WikiLeaks; recordings of politically sensitive events, should have been
(Word count: 4302)

Aug 15 2009, 3:00 PM EDT
103 words added
96 words deleted
Change: Reliability, persistence, acceptance, retraction
Data which may be inconsistently or obsoletely coded or which do not contain enough metadata is still abundant. Such data may need to
(Word count: 4300)

Jul 19 2009, 3:12 PM EDT
15 words added
77 words deleted
Change: legacy issues: for some fields, being able to get legacy data into this state may be really important, e.g. lexicography, where one builds on previous versions; legacy behavior is also an issue, e.g. if all examples are in italics and one wants to be able to search them papers. Wikipedia (Wikipedia
(Word count: 4293)

Jul 19 2009, 3:04 PM EDT
95 words added
144 words deleted
Change: data formats and interpretability: what if in bad format, not enough metadata, no/insufficient documentation
Data which may be inconsistently or obsoletely coded or which do not contain enough metadata is still abundant.
(Word count: 4357)

Jul 19 2009, 2:55 PM EDT
66 words added
151 words deleted
Change: can also mask data in certain ways: manual and automatic; e.g. making non-words out of words, but keeping POS; media is harder to deal with
(Word count: 4410)

Jul 19 2009, 2:46 PM EDT
23 words added
Change: Some language data, especially spontaneous speech and sign language, cannot be distributed due to privacy issues, in particular in utterances referring to people. This becomes very complex with international access, where different countries may have different rules for guarding privacy. This situation may require different country-specific licences and legal advice. Some
(Word count: 4508)

Jul 19 2009, 2:30 PM EDT
177 words added
5 words deleted
Change: media is harder to deal with than strict text, especially sign language, where much of the exact original data is needed; any encoding of a
(Word count: 4485)

Jul 19 2009, 1:45 PM EDT
17 words added
3 words deleted
Change: Michael Cysouw wrote a detailed proposal for modern dictionary publication: http://colab.mpdl.mpg.de/mediawiki/Living_Sources_in_Lexical_Description
Saturday morning session
Privacy: could use information from original institution waivers to guide what permissions to put in; can also mask data in certain ways: manual and automatic; e.g.
(Word count: 4308)

Jul 19 2009, 11:10 AM EDT
9 words added
Change: situation may require different country-specific licences and legal advice.
Want: top-level takeaway, one-page takeaway, the whole thing
Work Session 4: clean up, start of report generation, and panel prep
Possible organization of report:
Intro: what are the issues
(Word count: 4291)

Jul 19 2009, 11:09 AM EDT
Change: There were only format changes (bold, italics, etc.) in this version.
(Word count: 4282)

Jul 19 2009, 11:07 AM EDT
24 words added
39 words deleted
Change: educational mission is an important part of this: reach out to students and faculty
make sure it is not the responsibility of the author to "fix" or alter; can license in such a way as to require/encourage republication
machine-produced data is another thing, e.g. all of semantics for English Wikipedia
service-oriented architecture
(Word count: 4282)

Jul 19 2009, 1:07 AM EDT
42 words added
10 words deleted
Change: a specific view among many other possible views of this material? This could be done by generating a unique URI and handle for the transformed and formatted web page. An "I want to cite this" button would make this process easy for the user. This could help
(Word count: 4298)

Jul 19 2009, 1:01 AM EDT
74 words added
24 words deleted
Change: In contrast to paper materials, which are static and pre-edited, a cyberarchive allows the user to participate in filtering and presenting information ("play editor yourself"). An example is the Wittgenstein Archives at Bergen, which contain digitized manuscripts; the user can
(Word count: 4265)

Jul 18 2009, 7:58 PM EDT
96 words added
1 word deleted
Change: European Parliament wants to revise copyright on a larger scale; could include this for availability of data for research purposes
reliability
data formats and interpretability: what if in bad format, not enough metadata, no/insufficient documentation
(re)formatting: let people know what formats are needed
(Word count: 4217)

Jul 18 2009, 7:51 PM EDT
105 words added
2 words deleted
Change: (Atlas of European Languages was published volume by volume with data, comments on what was done and systems used, maps; all on paper)
field worker with annotated corpus with translation: getting and analyzing data is very time-consuming
document what was done: many judgments go into
(Word count: 4121)