Our charge:
To boldly choose the best facets of cyberinfrastructure from various fields, rather than copying any single field or tool. These facets fall into three categories:
- Data handling practices
- Collaborative/organizational structure
- Internet technology and tools
Data handling practices
- no central data store
- appropriate conceptualizations of the field
- flexible visualization of data
- open data (but sensitive to access restrictions)
- easy to cite (DOIs, URIs)
Collaborative/organizational structure
- collaborative modularity: tailoring data to a single project is not the best model for collaborative data; instead, keep the overall goal of data sharing in mind when projects are planned
Technology
Internet trends and practices
- Linked Data (use of URIs and RDF)
- Cloud computing (infrastructure as a service)
- Web Services
Sources of inspiration (pieces of software)
- PANGAEA (repository and citation, cf. OLAC)
- Freebase (open, free community model, cf. GOLDComm)
- nanoHUB (educational materials portal, cf. LINGUIST List)
- OpenWetWare (info and practices site, cf. Glottopedia)
- Many Eyes (visualization of data, cf. MultiTree)
Open questions:
What role should existing repositories (e.g., for field data) play?
Should we use currently available tools and infrastructure, or build our own linguistics-specific CI?
Is it possible/desirable to organize ourselves as a field (or part of a federation of fields) to achieve a kind of infrastructure we see in other fields?
Synergy between theory and application: e.g., interaction between linguists as a field and the military, industry, etc., who are interested in the data for other reasons.