Sep 2 2009, 10:48 AM EDT
edit
1 word added
1 word deleted
Change: (co-chair) Kurt Bollacker, Tracy Holloway King (co-chair), Koenraad de Smedt, Paul Trilsbeek. Abstract. In this paper we discuss what data provenance and data reliability are, with special attention to the needs of
(Word count: 3674)

Sep 1 2009, 4:25 AM EDT
edit
130 words added
54 words deleted
Change: system. PID systems only work, though, as long as the administration of the resource links is maintained, so the assignment of PIDs alone does not guarantee the long-term maintenance./stability of the resource references. The linguistic community might consider setting up a registration authority for linguistic data. Every entity
(Word count: 3674)

Aug 31 2009, 12:17 PM EDT
edit
33 words added
30 words deleted
Change: linguists to support this. Publishers would themselves be rated and will/would need to actively advertize their data publications and make them attractive to researchers. Peer review of data publications should be stimulated. Maybe/Perhaps language resources would be published unedited first,
(Word count: 3596)

Aug 31 2009, 11:06 AM EDT
edit
11 words added
11 words deleted
Change: reliable/Reliable provenance/Provenance through/Through publication/Publication. A major question is how to achieve reliable data provenance in the linguistic community and promote the sharing of data. Creating
(Word count: 3595)

Aug 31 2009, 11:05 AM EDT
edit
120 words added
328 words deleted
Change: Comprehensibility is key to data reliability. All data must be tagged with the appropriate metadata and linked to its documentation. This allows researchers to understand
(Word count: 3595)

Aug 31 2009, 10:51 AM EDT
edit
136 words added
120 words deleted
Change: Given that linguists change institutions and that URLs shift over time, it is important that future researchers be able to access the same data that is being used today and to be certain that this is the same data as was used by other researchers. On the Internet, rights/the
(Word count: 3804)

Aug 31 2009, 10:42 AM EDT
edit
3 words added
Change: signed dissertation is submitted. Finally, the establishment of electronic data publishing journals in conjunction with a cyberinfrastructure should be considered, so as to provide a formal channel for establishing authorship of data sets and creating a scholarly reference in addition to a framework for peer review.
(Word count: 3786)

Aug 31 2009, 10:42 AM EDT
edit
60 words added
42 words deleted
Change: until the signed dissertation is submitted. Finally, the establishment of electronic data publishing journals in conjunction with a cyberinfrastructure should be considered, so as to provide a formal channel for establishing authorship and creating a scholarly reference in addition to a framework for peer review.
(Word count: 3783)

Aug 31 2009, 10:35 AM EDT
edit
45 words added
12 words deleted
Change: It will therefore be useful for the linguistic community to engage in extensive dissemination and training efforts and to establish links with ongoing generic projects on metadata standards and preservation (e.g., PREMIS). There are several ways to encourage linguists to
(Word count: 3764)

Aug 31 2009, 10:25 AM EDT
edit
174 words added
147 words deleted
Change: This again is a challenge for provenance information. Ideally, a handle can be assigned to every step in the pipelining process -- note that at every step, intermediate data could be cached. Furthermore, Rosetta, Freebase, the Internet Archive, etc. allow for mashups of data. Cyberinfrastructures
(Word count: 3730)

Aug 31 2009, 10:08 AM EDT
edit
120 words added
115 words deleted
Change: As part of provenance, in recording the who, what, and when of metadata, it is necessary to have trusted identification of individuals, organizations, and services.
(Word count: 3706)

Aug 31 2009, 10:01 AM EDT
edit
1415 words added
1 word deleted
Change: Researchers are often unwilling to turn over their data for storage and distribution in repositories. One reason is that some people feel their data is
(Word count: 3701)

Aug 31 2009, 9:29 AM EDT
edit
296 words added
141 words deleted
Change: Every entity involved in data set creation can be identified by a unique handle. These include entities such as people, organizations, and their roles, the
(Word count: 2273)

Aug 30 2009, 11:12 AM EDT
edit
131 words added
100 words deleted
Change: Individuals contributing to and creating these data sets need to get institutional credit for data publication. For example, these should count for tenure reviews and
(Word count: 2116)

Aug 30 2009, 11:01 AM EDT
edit
273 words added
264 words deleted
Change: is important to know whether the trees were manually constructed, created automatically, or bootstrapped by manually correcting automatically constructed/was trees./created. How to Achieve Provenance. A major question is how to achieve reliable data provenance in the
(Word count: 2085)

Aug 30 2009, 10:37 AM EDT
edit
40 words added
56 words deleted
Change: it. It is important to know who contributed to a data set by collecting the data and by providing the/its data./publication. The data might come from native-speaker informants, from published works of literature, from the web, etc. This/Adequate allows/information the/about quality/provenance, of/i.e.
(Word count: 2077)

Aug 24 2009, 4:12 PM EDT
edit
16 words added
1 word deleted
Change: community in a context where we are moving from simple data sets to more complex cyberinfrastructures. We then suggest some first steps to promote data sharing and publication in the linguistics community. Data Provenance. Provenance is the who, what, and when of metadata. When a data set
(Word count: 2092)

Aug 17 2009, 12:24 PM EDT
edit
Change: There were only format changes (bold, italics, etc.) in this version.
(Word count: 2077)

Aug 17 2009, 12:23 PM EDT
edit
2064 words added
Change: Provenance is extremely important to the linguistic community, which uses natural language data sets as the basis of all of its work. Provenance provides a
(Word count: 2077)

Aug 17 2009, 12:22 PM EDT
create
No content added or deleted.
Change: Created by Aug 17 2009, 12:22 PM EDT for: no reason given