While there has been little in the way of broad-based “standards” setting in the field of sociolinguistics, there are common practices developed in research laboratories over years of community-based research in variationist and interactional sociolinguistics, course materials that have provided a foundation for graduate-level training, and online curricular resources prepared and shared by individual practitioners who have chosen to make their materials and protocols available to the wider community. This page is intended to provide a partial, expandable list of such resources. There is also a growing list of published writings on best practices. These links are shared not as standards, but as aids for those interested in best practices (or simply common practices) around data sharing, storage, and retrieval in sociolinguistics.
Recommended Readings:
DiPaolo, M. and Yaeger-Dror, M. (forthcoming). Best Practices in Sociophonetics. Cambridge UP.
Software:
Akustyk (bartus.org): Free, open-source vowel analysis software package. It installs as an add-on to the popular software Praat (http://praat.org) and provides relational database functionality for storing project, speaker, and token-level metadata.
The members of WG2 were aware of common or best practices regarding:
• Ordering of elicitation tasks in a conventional variationist sociolinguistic interview
• Standards for metadata tags for associating sociodemographic data with audio files (example)
• Conventions for labelling soundfiles
• Data analysis time estimations (e.g., the time required to orthographically transcribe one hour of recorded speech, to analyze one vowel at its midpoint, or to summarize the demographic information contained in a database)
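As an illustration of the metadata-tag and soundfile-labelling practices listed above, here is a minimal Python sketch. The field names, the PROJECT_SPEAKER_TASK_YYYYMMDD naming pattern, and the JSON "sidecar" file are all hypothetical choices made for this example, not an established standard in the field.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class SpeakerMetadata:
    """Hypothetical speaker-level record; field names are illustrative only."""
    project: str
    speaker_id: str
    birth_year: int
    gender: str
    region: str
    education: str


def soundfile_label(project: str, speaker_id: str, task: str,
                    recorded: date, ext: str = "wav") -> str:
    """Build a file name of the (hypothetical) form PROJECT_SPEAKER_TASK_YYYYMMDD.ext."""
    parts = [project, speaker_id, task]
    for p in parts:
        # Reject spaces, underscores, and punctuation so names split cleanly on "_"
        if not p.isalnum():
            raise ValueError(f"label components must be alphanumeric: {p!r}")
    return "_".join(p.upper() for p in parts) + "_" + recorded.strftime("%Y%m%d") + "." + ext


def sidecar_json(meta: SpeakerMetadata, filename: str) -> str:
    """Link one audio file to its speaker record in a JSON sidecar document."""
    return json.dumps({"audio_file": filename, "speaker": asdict(meta)},
                      indent=2, sort_keys=True)
```

For example, `soundfile_label("SEA", "S014", "wordlist", date(2010, 6, 3))` returns `"SEA_S014_WORDLIST_20100603.wav"`. Keeping the full sociodemographic record in a separate sidecar file, rather than packing it into the file name, keeps names short while still linking each recording to its speaker.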
Needed standards and data-sharing resources in sociolinguistics:
• Establishing a common, widely used set of metadata tags for associating sociodemographic data with audio and video recordings, so that the data are useful for many types of sociolinguistic analysis and research questions, and may “scale up” in usefulness for other types of linguistic analysis
• Standards and models for the sharing of recordings, social data (assuming adherence to required human subjects protection protocols), transcriptions (orthographic, IGT, phone-level), and measurements drawn from those data (e.g., to support cross-language, cross-dialect, or language change research)
• What is data? Understanding the layering of the notion of data (or the continuum from data to products of research, including full-length or excerpted audio/video recordings, transcriptions, written texts, measures, pages of IGT, summarized data, and primary vs. secondary data).
o What needs to be protected? What may be copyrighted by the researcher? What belongs to the community? How do we serve community interests so that research has broader impact in both scholarly and lay communities?
o Levels of representation of the social data as distinct from the linguistic data
• Training in how to conduct appropriate inferential statistical tests for the range of different data structure types (from phonetic data, to syntactic data, to subjective reaction/attitudinal data)
• Training in generating human subjects applications that will enable sharing of data within the wider research community (at different levels within an institution or more broadly in the field, and in the lay community, as appropriate)
• Versioning practices that enable tracking changes to all types of data and associated products of research, and that record the motivations for changes made to audio or text files
• Online clearinghouses for elicitation materials, research instruments, tools, and recording device configurations for various types of study. These should, at a minimum, be supplied with information regarding proper crediting of the originator of the tool, and instructions regarding how to cite the tool in a bibliographic record (example at UW Sociolinguistics Laboratory)
o Praat scripts
o Commutation tests
o Reading passages
o Word lists
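The versioning item above asks for change tracking that also records why a file was changed. A minimal Python sketch, assuming each version of a file's content is identified by a SHA-256 hash and logged with a free-text reason; the `VersionedFile` and `ChangeRecord` names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeRecord:
    version: int
    sha256: str      # content hash identifying this exact version
    reason: str      # motivation for the change, as the versioning item requires
    timestamp: str   # UTC time the version was recorded (ISO 8601)


@dataclass
class VersionedFile:
    """Hypothetical change log for one audio or text file."""
    name: str
    history: list[ChangeRecord] = field(default_factory=list)

    def commit(self, content: bytes, reason: str) -> ChangeRecord:
        digest = hashlib.sha256(content).hexdigest()
        if self.history and self.history[-1].sha256 == digest:
            # Content is identical to the latest version: record nothing new
            return self.history[-1]
        rec = ChangeRecord(
            version=len(self.history) + 1,
            sha256=digest,
            reason=reason,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.history.append(rec)
        return rec
```

In practice, a general-purpose version-control system such as Git supplies content hashes and commit messages for the same purpose; the sketch only makes explicit the two pieces of information the item calls for, namely what changed and why.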
Needed standards for data analysis in sociophonetics:
• Best practices for recording data in formats that ensure sufficient fidelity for acoustic analysis of various kinds (amplitude, pitch, formant frequency, jitter, shimmer, duration)
• Representational conventions vis-à-vis what qualifies as narrow vs. broad transcription
• Standards regarding inter-measurer and inter-transcriber reliability (verification of measures)
• Training in determining appropriate inferential statistical tests for different data structure types
• Greater transparency with regard to documenting transcription conventions (word class categories and memberships, explanations for use of phonetic symbols and diacritics).
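For the inter-transcriber reliability item, one widely used chance-corrected agreement statistic for categorical codings is Cohen's kappa. The sketch below is an illustration, not a field standard; the (ING) codings in the example are invented, and continuous measurements such as formant values would call for a different statistic (e.g., an intraclass correlation).

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two transcribers' categorical codings of the same tokens."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("codings must be non-empty and of equal length")
    n = len(coder_a)
    # Observed agreement: proportion of tokens both coders labelled identically
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal category counts
    # (Counter returns 0 for categories one coder never used)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        # Both coders used one identical category throughout; kappa is undefined
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)
```

For example, two transcribers coding six hypothetical (ING) tokens as `["ing", "in", "ing", "ing", "in", "ing"]` and `["ing", "in", "in", "ing", "in", "ing"]` agree on 5 of 6 tokens, with chance agreement (4·3 + 2·3)/36 = 0.5, giving κ = (5/6 - 0.5)/(1 - 0.5) ≈ 0.67.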