As a part of the ORAAL project, we have developed the first public corpus of AAL data, the Corpus of Regional African American Language (CORAAL). CORAAL features recorded speech from regional varieties of AAL and includes the audio recordings along with time-aligned orthographic transcription.
CORAAL is a long-term corpus-building project organized into several components. The first two components of CORAAL focus on AAL in Washington, DC, the nation’s capital, a city with a long-standing African American majority, and the site of much early research on AAL (e.g. Fasold 1972). In April 2018, the first additional component, CORAAL:PRV, was released, making available data for 16 speakers from a rural community in central North Carolina. In October 2018, we released CORAAL:ROC, a subset of sociolinguistic interviews conducted by Sharese King as a part of her dissertation project in Rochester, NY (King 2018). In May 2020, we are pleased to release the newest component, CORAAL:ATL, a friendship network from Atlanta, as well as newly transcribed recordings from CORAAL:ROC. Details about CORAAL Components can be found here.
Together, the CORAAL components include data from over 150 sociolinguistic interviews with speakers born between 1891 and 2005, and over a million words of accurate time-aligned transcription of conversational speech.
All interviews have been anonymized and orthographically transcribed with time-alignment at the utterance level. Audio is available in high-quality uncompressed (.wav) format, and transcripts are available in three formats: Praat TextGrid (.TextGrid) files, ELAN (.eaf) files, and plain text (.txt) files with tab-delimited fields.
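For readers who want to work with the plain text transcripts programmatically, the following is a minimal Python sketch of reading a tab-delimited file and computing utterance durations. It assumes each file starts with a header row; the column names used here (Line, Spkr, StTime, Content, EnTime) and the sample rows are illustrative assumptions, not a statement of the official layout, so check the User Guide for the actual fields before relying on them. The function names read_transcript and utterance_durations are hypothetical helpers, not part of any CORAAL tooling.

```python
import csv
import io


def read_transcript(source):
    """Parse tab-delimited transcript text into a list of row dicts.

    `source` is any iterable of lines, such as an open file object.
    The first line is assumed to be a header row.
    """
    # QUOTE_NONE keeps quotation marks inside utterances as literal text.
    reader = csv.DictReader(source, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


def utterance_durations(rows, start_col="StTime", end_col="EnTime"):
    """Return the duration in seconds of each utterance.

    The default column names are assumptions made for this sketch.
    """
    return [float(row[end_col]) - float(row[start_col]) for row in rows]


# Invented sample data standing in for a real transcript file.
SAMPLE = (
    "Line\tSpkr\tStTime\tContent\tEnTime\n"
    "1\tspeaker_a\t0.50\tHello there.\t1.75\n"
    "2\tspeaker_b\t2.00\tHi.\t2.50\n"
)

rows = read_transcript(io.StringIO(SAMPLE))
durations = utterance_durations(rows)
```

To read a downloaded transcript instead of the sample, pass an open file, for example open(path, encoding="utf-8", newline=""), where path points to a .txt file from the corpus.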
In addition to the official components of CORAAL, beginning in May 2020, ORAAL also features a related set of materials, CORAAL Supplements, representing recordings or selections from larger datasets important to the field of sociolinguistics. These CORAAL Supplements highlight newly released public datasets that are not part of CORAAL's official components.
Update, April 2021: Phone-level aligned TextGrids and a language model are now available through LingTools using the Montreal Forced Aligner. A new version of CORAAL is scheduled for release in May 2021 with two new components: CORAAL:SGA, sociolinguistic interviews with speakers from a small city in Southern Georgia, and CORAAL:LES, featuring speakers from Dr. Kara Becker's dissertation work on the Lower East Side of Manhattan, New York.
CORAAL is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (https://creativecommons.org/licenses/by-nc-sa/4.0/). It is free to use and can be downloaded from the above link.
More information is available in the User Guide, and we suggest you read that document for full information about the corpus. As a part of their work on their paper in American Speech, "Contextualizing the Corpus of Regional African American Language DC: AAL in the Nation's Capital", Charlie Farrington and Natalie Schilling prepared an extensive reference list of publications related to AAL in Washington, DC; you can access the reference list here.
As with the ORAAL project, CORAAL was developed with support from the National Science Foundation (Grant No. BCS-1358724) and the University of Oregon.
How to cite CORAAL
Kendall, Tyler and Charlie Farrington. 2020. The Corpus of Regional African American Language. Version 2020.05. Eugene, OR: The Online Resources for African American Language Project. http://oraal.uoregon.edu/coraal.
See the CORAAL User Guide for information about citing CORAAL’s individual components.
Contact the CORAAL team
Contact the CORAAL development team with any questions or comments about CORAAL.