Design styles for effective use of the Web
The Web has a distributed architecture that sees application in a vast array of domains. Linguistics-specific data-sharing systems should follow best practices for Web design so that linguistic data can be used effectively in conjunction with datasets from other disciplines.
RESTful design patterns (REST stands for Representational State Transfer) are useful for multidisciplinary data sharing and for using the Web as a publishing platform. Some important elements of RESTful design include:
- Data are located at specific addresses (URIs).
- Data can have different representations for different purposes. For instance, a resource may have a "human-readable" representation (a Web page with attractive formatting for use in a browser) and a "machine-readable" representation (such as XML) for easy parsing by software.
- Getting data is very simple and only requires following a hyperlink to a specific address. There is no need to send a more complex message to obtain data.
- In the same vein, there is a very limited set of general actions that can be performed on data. The most common is retrieving data ("GET"). Other actions include creating a wholly new resource ("PUT"), creating or updating a subordinate resource ("POST"), and deleting data ("DELETE").
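The elements listed above can be sketched in a few lines of Python. This is only an illustration: the address http://example.org/texts/42 is hypothetical, and selecting a representation with the HTTP Accept header is one common approach assumed here, not something the discussion above prescribes. The requests are built but never sent, so the sketch runs without a network connection.

```python
from urllib.request import Request

# Hypothetical resource address, used only for illustration.
RESOURCE = "http://example.org/texts/42"


def build_request(method, accept=None, body=None):
    """Build (but do not send) an HTTP request for the resource."""
    headers = {"Accept": accept} if accept else {}
    return Request(RESOURCE, data=body, headers=headers, method=method)


# One address, two representations, selected with the Accept header
# (an assumed mechanism; a service could also use distinct URIs).
html_request = build_request("GET", accept="text/html")
xml_request = build_request("GET", accept="application/xml")

# The same small set of verbs applies to any resource.
delete_request = build_request("DELETE")

for req in (html_request, xml_request, delete_request):
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Note that the two GET requests differ only in the representation they ask for; the address and the verb stay the same.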
The simplicity of RESTful designs helps account for the explosive growth of the Web. However, some efforts at data sharing and shared infrastructure design in the sciences and in industry deviate from RESTful design principles. These efforts add layers of complexity, such as requiring additional verbs or other complex messages to interact with data, or requiring "state" to be tracked across data exchanges. As a result, some cyberinfrastructure and enterprise systems have been more expensive to develop and maintain. These more complex systems are also harder to extend, harder to use in unanticipated applications, and harder to make interoperable.
REST is not a formal standard but a design style, and there are few official standards for designing REST-styled Web services (systems designed for the exchange of machine-readable data). From the perspective of building a cyberinfrastructure where multidisciplinary data sharing is a goal, it is probably more important to adhere to RESTful design styles for data retrieval than for other operations (creating, updating, and deleting data). In other words, if one wants to make linguistic data available to a multidisciplinary community, these data should be obtainable with no more fuss and bother than simply following hyperlinks. (Note: nothing about REST precludes security and authorization systems. Such systems can and do work perfectly well in RESTful systems.)
Atom Standards and REST
Multidisciplinary research may require mixing data and services in ways that cannot be easily anticipated. Making this easy to do experimentally would be useful. This requires styles of service design (such as RESTful architectures) that lower costs and barriers to entry and use.
- Atom Syndication Format (widely implemented). Atom can serve as a convenient "standard container" for more specialized XML payloads (such as an XML document expressing interlinearized annotation of a text). Atom's simple standard metadata can make these specialized payloads more intelligible. The Atom standard is well designed and can be extended to include additional metadata (most commonly with GeoRSS, a standard for expressing geographic data). Atom can also serve as a common format for expressing the results of queries (as feeds, with feed entries as records), and it can be extended to support more specialized applications.
- Atom Publishing Protocol, for updating and contributing to a collection (implemented by the SWORD project). SWORD lets you deposit content into a repository without worrying about what kind of repository it is.
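The "standard container" idea can be illustrated with a minimal sketch that wraps a specialized XML payload in an Atom entry and adds a GeoRSS point. The Atom and GeoRSS namespaces are the real ones; the payload vocabulary (the http://example.org/igt namespace with its phrase and word elements), the entry values, and the helper name make_entry are invented for illustration. The entry is deliberately minimal and omits elements, such as an author, that a complete feed would need.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
IGT = "http://example.org/igt"  # hypothetical payload namespace

ET.register_namespace("atom", ATOM)
ET.register_namespace("georss", GEORSS)
ET.register_namespace("igt", IGT)


def make_entry(entry_id, title, updated, lat, lon, payload):
    """Wrap a specialized XML payload in a minimal Atom entry."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # GeoRSS Simple encodes a point as "latitude longitude".
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = f"{lat} {lon}"
    # Inline XML content carries the specialized payload.
    content = ET.SubElement(entry, f"{{{ATOM}}}content", type="application/xml")
    content.append(payload)
    return entry


# Invented payload: one glossed word of an annotated text.
payload = ET.Element(f"{{{IGT}}}phrase")
word = ET.SubElement(payload, f"{{{IGT}}}word", gloss="dog")
word.text = "perro"

entry = make_entry(
    "http://example.org/texts/42",
    "Sample annotated text",
    "2009-01-01T00:00:00Z",
    51.5,
    -0.12,
    payload,
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A generic feed reader can use the title, date, and location without understanding the payload, while specialized software can go straight to the content element.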
Examples of Atom and REST
The screenshot below illustrates why REST and Atom-based Web services are useful. The example comes from aggregating data from the "Portable Antiquities Scheme", an online database of antiquities found by the public and registered with museums and heritage organizations in the UK. The Portable Antiquities Scheme has a service that expresses the results of queries as a feed. The feed has geographic metadata expressed as GeoRSS. This feed is combined with a similar feed from Open Context.
Yahoo Pipes, a feed manipulation and processing service, was used to combine the feeds from these two sources.

The above example is of interest because the Portable Antiquities Scheme and Open Context have very different underlying data structures, vocabularies, and schemas. Yet data from these two sources can be aggregated to a limited, though still useful, extent. In Open Context's case, specialized XML data (using the ArchaeoML global schema) for each record is available. Open Context's Atom feeds point to these ArchaeoML data, making the specialized data easily available for more sophisticated applications than the one illustrated here.
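The kind of aggregation described above can be approximated with a short sketch. It is not how Yahoo Pipes works internally, and the two feeds below are invented stand-ins rather than real output from the Portable Antiquities Scheme or Open Context. It assumes only that both sources publish Atom entries with a title, an updated timestamp in the same ISO 8601 form, and an optional GeoRSS point. The function names read_entries and merge_feeds are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}

# Invented sample feeds standing in for two independent sources.
FEED_A = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>Source A</title>
  <entry>
    <title>Roman coin</title>
    <updated>2009-03-01T00:00:00Z</updated>
    <georss:point>52.2 0.1</georss:point>
  </entry>
</feed>"""

FEED_B = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>Source B</title>
  <entry>
    <title>Bone tool</title>
    <updated>2009-04-15T00:00:00Z</updated>
    <georss:point>37.1 40.2</georss:point>
  </entry>
  <entry>
    <title>Pottery sherd</title>
    <updated>2009-02-10T00:00:00Z</updated>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Yield one plain dict per Atom entry, using only shared metadata."""
    feed = ET.fromstring(feed_xml)
    source = feed.findtext("atom:title", default="", namespaces=NS)
    for entry in feed.findall("atom:entry", NS):
        point = entry.findtext("georss:point", namespaces=NS)
        if point:
            lat, lon = (float(value) for value in point.split())
        else:
            lat, lon = None, None
        yield {
            "source": source,
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=NS),
            "lat": lat,
            "lon": lon,
        }


def merge_feeds(*feeds):
    """Combine entries from several feeds, newest first.

    Sorting the timestamp strings works only because every sample
    timestamp uses the same ISO 8601 form.
    """
    entries = [entry for feed in feeds for entry in read_entries(feed)]
    return sorted(entries, key=lambda entry: entry["updated"], reverse=True)


merged = merge_feeds(FEED_A, FEED_B)
for item in merged:
    print(item["updated"], item["source"], item["title"], item["lat"], item["lon"])
```

Nothing in the merge step depends on either source's internal schema; only the shared Atom and GeoRSS elements are read.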
The main point of this discussion is that useful and meaningful cooperation across different data sources (potentially across disciplinary boundaries) is possible even when using very simple common standards.