Sociolinguistics case study (version control)


Collaborative research utilizing version control

Case study subdiscipline: Sociolinguistics
Project title: Dialect Evolution and Ongoing Variable Linguistic Input in Pacific Northwest English
(Alicia Beckford Wassink, Principal Investigator, University of Washington,
Department of Linguistics)
Software used: Microsoft SharePoint Server 2007 (for version control and remote collaboration)
Goals of this case study: Demonstrate the use of versioning software to register the changes made to spoken
language recordings and associated data, to enable tracking of modifications, and
to make transparent the nature of and motivations for the changes.
As an example of versioning, we can look at how a sociophonetic study uses versioning software in an online research collaboration area (ORCA).


Project description from the project homepage (http://www.artsci.washington.edu/nwenglish/index.asp):
The Pacific Northwest English project investigates the features of English spoken in the Pacific Northwestern region of the United States (PNW), two hundred years after the introduction of non-indigenous speakers to the region. The Pacific Northwest English (PNWE) project explores the extent of English dialect development in the Pacific Northwest region of the United States. It also documents the stories of families with deep roots in the Pacific Northwest region.

1. Data Elicitation

• Data are elicited with a standard multi-part variationist sociolinguistic interview schedule, which allows data to be collected in different spoken registers (unscripted conversation, interview schedule, reading passage, word lists, semantic differential prompts, syntactic diagnostic prompts).
• While we cannot make all instruments available via the wiki (to avoid exposing materials to potential respondents), similar elicitation materials are publicly available in the Elicitation Materials clearinghouse, Sociolinguistics Laboratory, University of Washington.
• Data are recorded in the field for a judgment sample, and in the laboratory, using telephony devices, for a complementary random sample.

2. Storage

• Original recordings (recorded at a 44.1 kHz sampling rate in uncompressed form to compact flash media, using M-Audio MicroTrack digital flash recording devices) are stored in three locations (as required by IRB protocols): 1) on ISO-9660 formatted compact disks, locked in a cabinet accessible only to the principal investigator; 2) in redacted form on compact disks in a CD archive; 3) in redacted form in an online research collaboration area (Microsoft SharePoint).
• Redacted versions have been edited in Praat to remove potentially identifying information: the acoustic signal in those passages is attenuated to zero, while the time dimension is left intact for version control. Because all versions of the soundfiles retain the original timings, temporal events of interest can be located in any version of the recordings and transcriptions (which have been time-stamped based upon the non-redacted versions of the signal).
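The redaction principle described above (silence a passage without shifting anything after it) can be illustrated with a minimal Python sketch. This is not the project's Praat procedure; the function name `redact`, the list-of-floats representation of the signal, and the example interval are all assumptions made for illustration.

```python
def redact(samples, sample_rate, intervals):
    """Return a copy of `samples` with each (start_s, end_s) interval zeroed.

    Hypothetical helper: the signal keeps its original length, so any
    timestamp made against the non-redacted file still points at the
    same position in the redacted copy.
    """
    redacted = list(samples)  # work on a copy; the original is untouched
    for start_s, end_s in intervals:
        # Convert seconds to sample indices and clamp to the signal bounds.
        start = max(0, round(start_s * sample_rate))
        end = min(len(redacted), round(end_s * sample_rate))
        for i in range(start, end):
            redacted[i] = 0.0
    return redacted


sample_rate = 44100                        # 44.1 kHz, as in the recordings
signal = [1.0] * (sample_rate * 3)         # stand-in for 3 s of audio
redacted = redact(signal, sample_rate, [(1.0, 1.5)])  # assumed interval
```

Because samples are overwritten rather than cut out, an event time-stamped at, say, 2.0 s falls at the same sample index in both versions.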

3. Version control

• Version control is provided in the Microsoft SharePoint online research collaboration workspace. Versioning is particularly used in document libraries, where soundfiles and transcribed materials are stored.
[[insert screenshot: document library navigation bar]]
• Version control requires (in this case, although other versioning software varies) that each user check out a soundfile or transcript from the document library. Only one user may check out a file at a time.
• The file is modified by the user.
• At the end of a work session, the user uploads the modified version of the file to the document library. The software prompts the user to provide comments regarding what changes were made to the document, and automatically stamps the new file with the upload time and version number.
• Crucially, all prior versions are available to the user. This allows full control and comparison of different versions of the documents stored in the library without overwriting data.
• Registering changes
• A discussion area within the ORCA allows substantive changes to analysis protocols to be discussed and documented so that important decisions may be registered as part of the project history.
[[insert screenshot: topic list from the general discussion site]]
[under construction: limitations and advantages of using versioning for spoken language data.]
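The check-out, modify, and check-in cycle described in this section can be modeled with a small Python sketch. It is a toy, in-memory illustration of the workflow only; it does not use or imitate the SharePoint API, and the class names, method names, user names, and file name are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Version:
    number: int
    content: bytes
    author: str
    comment: str
    timestamp: datetime


@dataclass
class LibraryFile:
    versions: list = field(default_factory=list)
    checked_out_by: Optional[str] = None


class DocumentLibrary:
    """Toy model: exclusive check-out, commented check-in, full history."""

    def __init__(self):
        self.files = {}

    def _store(self, name, content, author, comment):
        f = self.files[name]
        # Append rather than overwrite, so every prior version is kept.
        f.versions.append(Version(len(f.versions) + 1, content, author,
                                  comment, datetime.now(timezone.utc)))

    def add(self, name, content, author, comment="initial upload"):
        self.files[name] = LibraryFile()
        self._store(name, content, author, comment)

    def check_out(self, name, user):
        f = self.files[name]
        if f.checked_out_by is not None:
            # Only one user may hold a file at a time.
            raise PermissionError(f"{name} is checked out by {f.checked_out_by}")
        f.checked_out_by = user
        return f.versions[-1].content

    def check_in(self, name, user, content, comment):
        f = self.files[name]
        if f.checked_out_by != user:
            raise PermissionError(f"{user} has not checked out {name}")
        self._store(name, content, user, comment)
        f.checked_out_by = None


library = DocumentLibrary()
library.add("speaker01.TextGrid", b"draft", "pi")
text = library.check_out("speaker01.TextGrid", "analyst_a")
library.check_in("speaker01.TextGrid", "analyst_a",
                 text + b" corrected", "Adjusted vowel boundaries")
history = library.files["speaker01.TextGrid"].versions
```

After the check-in, `history` holds both versions with their authors, comments, and timestamps, which is the property that lets earlier states be compared or restored.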

4. Metadata

• Akustyk software is used for associating project, speaker and token level metadata with events in the sound file.
• A project handbook registers methodology and decisions made.
  • The metadata associated with all recordings is here

5. Access
• SharePoint allows access to be restricted according to permissions criteria for each member of the research team. It is possible, in principle, to share redacted versions of the recordings with all team members who have data analysis functions, and to restrict access to the non-redacted versions to the PI. Permissions criteria are set by the principal investigator.
• [insert screenshot here: permissions interface]
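The access rule described above (redacted files for analysts, non-redacted files for the PI only) amounts to a role-to-permission lookup. The sketch below is a minimal Python illustration of that idea; the role names, the two file kinds, and the function `can_access` are assumptions and do not reflect how SharePoint represents permissions.

```python
REDACTED = "redacted"
ORIGINAL = "original"

# Hypothetical role table: which kinds of recording each role may open.
ROLE_ACCESS = {
    "principal_investigator": {REDACTED, ORIGINAL},
    "analyst": {REDACTED},
}


def can_access(role, file_kind):
    """Return True if `role` may open files of `file_kind`.

    Unknown roles get an empty permission set, so access is denied
    by default.
    """
    return file_kind in ROLE_ACCESS.get(role, set())


pi_sees_original = can_access("principal_investigator", ORIGINAL)
analyst_sees_original = can_access("analyst", ORIGINAL)
```

Denying by default keeps a newly added team member away from non-redacted recordings until the principal investigator explicitly grants access.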


Versioning software:
• Concurrent Versions System (CVS): An open-source revision control system (http://www.nongnu.org/cvs/)
• Subversion: An open-source revision control system (http://subversion.tigris.org/)
• Microsoft SharePoint Server 2007 (http://sharepoint.microsoft.com/Pages/Default.aspx)


Benefits of utilizing version control software:
-Allows research team to avoid the pitfall of saving numerous copies of the same file(s) in the same, or worse, different locations, and having to remember those locations.
-Allows research team to keep track of the current version of a working soundfile or spreadsheet (containing acoustic measures and demographic data in this case) across different users and/or different machines
-Offers the capability to revert to earlier versions of some or all of the files in a given workspace
-Some versioning software (e.g., Subversion) offers the ability to merge work done on the same file by different users
-Members of the working team located remotely may all access common elements in the same workspace when they *do* talk together (e.g., by video or teleconference), and contribute without the risk of overwriting each other's work