Metadata Tags Frequently Used by Sociolinguists
Ideally, every audio or video file generated as part of a research project should include data about the data ("metadata"). Metadata facilitates the understanding, use, and management of data. The metadata required will vary with the type of data and the context of use. In sociolinguistics, there is a common core of sociodemographic information that is of interest for addressing particular types of research questions related to language change and variation. For this reason, it seems reasonable to publish a common set of metadata fields that researchers may choose to associate with recordings generated in field research. Supplying information in data fields that are not of immediate interest to a research study is worth the effort: it carries a minimal time burden but greatly increases the usability of the data for other researchers and other types of study.
In its simplest form, metadata may be stored in a text file (.txt or .rtf) that sits in the directory or folder with the sound or video file. For metadata attached to Web pages, the standard encoding scheme is HTML (HyperText Markup Language). RDF (Resource Description Framework) supports multiple metadata schemes.
Below is a list of metadata tags used at the University of Washington, Sociolinguistics Laboratory. These draw in part from the metadata categories used in Akustyk. The recording metadata categories below are also the basis for a digital CD archive (generated using FileMaker Pro software) that drives a CD carousel. Fields in the FileMaker records are searchable, and CDs in the catalogue may be located and automatically ejected using search criteria.
Recording Metadata (often in a readme.txt document in main project archive)
Project name
Project website (url)
Name of sound or video file
Format of sound or video file
Name of database file(s) (transcriptions, text tiers, annotation files, etc)
Name of the data administrator or investigator
How to contact data administrator
IRB approval number
Date recording made
Location of recording
Speakers on recording
Publications associated with the project (appropriate for bibliographic citation)
Register
Type of recorded data (unscripted conversation, dyadic or small group interview, individual interview, reading passage, wordlist, minimal pair list, words in isolation, self-commutation test, map task, attitude or subjective reaction test)
Names of elicitation instruments used to elicit data (with filename, as appropriate)
Version history available for datafiles?
Translations available of transcription files?
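A readme.txt holding the recording metadata can be generated and read back with a few lines of code. The sketch below is illustrative only: the one-line-per-field "Field: value" layout, the helper names write_readme and read_readme, and the choice of six fields are assumptions made for the example, not a format prescribed by the lab.

```python
from pathlib import Path

# Hypothetical subset of the recording-level fields listed above.
RECORDING_FIELDS = [
    "Project name",
    "Name of sound or video file",
    "Format of sound or video file",
    "Date recording made",
    "Location of recording",
    "Speakers on recording",
]


def write_readme(folder, metadata):
    """Write one 'Field: value' line per field into readme.txt inside folder.

    Fields missing from metadata are written with an empty value so that
    the file always lists the same fields in the same order.
    """
    path = Path(folder) / "readme.txt"
    lines = [f"{name}: {metadata.get(name, '')}" for name in RECORDING_FIELDS]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_readme(path):
    """Parse 'Field: value' lines back into a dict, splitting on the first colon."""
    result = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            result[name.strip()] = value.strip()
    return result
```

Calling write_readme on the folder that holds the sound file keeps the metadata next to the recording, as described above. Splitting on the first colon only means values such as URLs or times survive the round trip, provided field names themselves contain no colon.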
Speaker-level tags:
Name
Sex
Age
Age cohort
Known speech impediments or disorders
Ethnicity
Socioeconomic class
Highest educational level attained
Occupation
Place of birth
Residence history (places lived for more than 6 months)
Regionality
Social Network information available (yes/no)? If yes, name of datafile:
Neighborhood
Bi/Multilingual (yes/no)
Language background (all language varieties [dialect region/language name] spoken)
Languages spoken natively
Languages of high fluency
Languages of low fluency
Writing system used or preferred by speaker
Level of literacy
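One way to keep speaker-level tags consistent across a project is to hold them in a small record type and check which tags are still blank. The sketch below is an assumption-laden illustration: the Speaker class, its attribute names (paraphrased from a subset of the tags above), and the missing_tags helper are hypothetical and not part of any lab software.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Speaker:
    # Hypothetical attribute names for a subset of the speaker-level tags.
    # None (or an empty list) means the tag has not been recorded yet.
    name: Optional[str] = None
    sex: Optional[str] = None
    age: Optional[int] = None
    age_cohort: Optional[str] = None
    ethnicity: Optional[str] = None
    occupation: Optional[str] = None
    place_of_birth: Optional[str] = None
    residence_history: List[str] = field(default_factory=list)
    languages_spoken_natively: List[str] = field(default_factory=list)


def missing_tags(speaker):
    """Return the names of tags that are still None or an empty list."""
    missing = []
    for f in fields(speaker):
        value = getattr(speaker, f.name)
        if value is None or value == []:
            missing.append(f.name)
    return missing
```

A check like this makes it easy to see, before archiving, which tags could still be filled in at little cost for the benefit of later studies.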
Group-level tags:
Language
Language modality (signed, spoken)
Dialect
Task (wordlist, reading passage, casual conversation, etc.)
Bi/Multilingual (yes/no)
Token-level tags:
Vowel (IPA category)
Word
Preceding phone
Following phone
Place
Manner
Voicing
Phonation type
Normalized (y/n)
Stress (primary/secondary/unstressed)
Tone level
Other phonetic tags
Window length
Sampling Rate
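Token-level tags lend themselves to a table with one row per measured token. The following is a minimal sketch, assuming a CSV layout with a hypothetical subset of the token-level tags as column headers; the column selection and the tokens_to_csv helper are illustrative choices, not a documented lab format.

```python
import csv
import io

# Hypothetical column subset drawn from the token-level tags above.
TOKEN_COLUMNS = [
    "Vowel",
    "Word",
    "Preceding phone",
    "Following phone",
    "Stress",
    "Normalized",
]


def tokens_to_csv(tokens):
    """Serialize a list of token dicts to CSV text with a header row.

    Tags missing from a token are written as empty cells, and keys that
    are not in TOKEN_COLUMNS are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TOKEN_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for token in tokens:
        writer.writerow({column: token.get(column, "") for column in TOKEN_COLUMNS})
    return buffer.getvalue()
```

The returned text can be saved as a UTF-8 file so that IPA symbols in the Vowel column are preserved, and it can be read back with csv.DictReader or opened in a spreadsheet.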