Data Sources

CORAAL allows for increased access to conversational speech recordings, which are available through ORAAL. In developing CORAAL, we also wanted to shed light on other sources of data that are openly available or accessible to educators and researchers. Here, we describe the sources available through ORAAL, including CORAAL Components and CORAAL Supplements, as well as several external sources. These sources cover a wide range of material, including sociolinguistic recordings, oral histories, and conversations. Each source listed below includes information about accessing the recordings, recording availability, and external links for further information.

The following data sources list was prepared by Charlie Farrington and Jaidan McLean. Please contact us at CorpusOfRegionalAAL@gmail.com if you want to suggest more sources to add to this list!

CORAAL: Atlanta

CORAAL:ATL consists of 13 primary speakers across 14 recordings. The speakers for these recordings are part of a modern Atlanta friendship network; many of them were not born and raised in Atlanta but moved to the South from elsewhere. Eight speakers are in Age Group 1 (18-29 years old) and five speakers are in Age Group 2 (30-50 years old). Atlanta is located in north-central Georgia; in 2010, the city had a population of 420,003, with the black population accounting for 54% of that total. The Metro Atlanta population was over 5.2 million in 2020. The city is home to the Martin Luther King Jr. National Historic Site, which remembers and honors African American history through the Civil Rights Movement and the Civil War.

Farrington, Charlie, Tyler Kendall, Patrick 'Slay' Brooks, Emma Mullen, and Chloe Tacata. 2020. The Corpus of Regional African American Language: ATL (Atlanta, GA 2017). Version 2020.05. Eugene, OR: The Online Resources for African American Language Project.

Download CORAAL:ATL from CORAAL, or browse the corpus from the CORAAL Explorer!

CORAAL: D.C. (1968)

CORAAL:DCA consists of 68 speakers across 74 recordings, originally collected as part of Ralph Fasold's foundational study of African American Language in Washington, D.C. (Fasold 1972). The speakers were recorded between March 1968 and August 1969, with dates of birth ranging from 1891 to 1958. The 68 speakers selected for CORAAL are not the same set of speakers analyzed by Fasold (1972). We selected speakers from Fasold’s interviews to best represent four age groups and three social class groups, although a balanced demographic matrix is not possible given the original project's emphasis on young speakers. The youngest age group has additional speakers for two reasons: Fasold’s data include many of these speakers, and their interviews tend to be shorter, so extra speakers were included to increase the total amount of data available for this demographic group. The social class groups are not completely analogous to Fasold’s groups, which are based on the Index of Status Characteristics, but are meant to capture broad social strata. These recordings are publicly available for download through CORAAL and the CORAAL Explorer.

Fasold, Ralph W. 1972. Tense marking in Black English: A linguistic and social analysis. Arlington, VA: Center for Applied Linguistics.

Kendall, Tyler, Ralph Fasold, Charlie Farrington, Jason McLarty, Shelby Arnson, and Brooke Josler. 2018. The Corpus of Regional African American Language: DCA (Washington DC 1968). Version 2018.10.06. Eugene, OR: The Online Resources for African American Language Project.

Download CORAAL:DCA from CORAAL, or browse the corpus from the CORAAL Explorer!

CORAAL: D.C. (2016)

CORAAL:DCB consists of 48 primary speakers across 63 audio files, collected specifically for CORAAL. The speakers were recorded between July 2015 and December 2017. Speakers were recruited through a friend-of-a-friend network to fill the same 4 x 3 demographic matrix (four age groups by three socioeconomic groups) used for DCA. The socioeconomic groups here are meant to capture broad social strata; the qualitative labels are simple descriptors to help orient users to the ordering. These labels are not meant to represent theoretically motivated socioeconomic assessments of individuals, nor are they intended to be perfectly analogous to Fasold’s classifications: there are theoretical and practical issues in comparing socioeconomic indices in the DC community 50 years apart. We have tried to capture and include in the metadata broad information about speakers’ demographic backgrounds, but leave questions of interpretation up to end-users. These recordings are publicly available through CORAAL and the CORAAL Explorer.

Kendall, Tyler, Minnie Quartey, Charlie Farrington, Jason McLarty, Shelby Arnson, and Brooke Josler. 2018. The Corpus of Regional African American Language: DCB (Washington DC 2016). Version 2018.10.06. Eugene, OR: The Online Resources for African American Language Project.

Download CORAAL:DCB from CORAAL, or browse the corpus from the CORAAL Explorer!

CORAAL: Lower East Side, New York City

CORAAL:LES consists of 11 primary speakers, collected in 2008 and 2009 by Kara Becker as part of her dissertation research in New York City. These recordings will be made available in CORAAL in early 2020.

Becker, Kara. 2009. Regional dialect features on the Lower East Side of New York City: Sociophonetics, ethnicity, and identity. Ph.D. dissertation. New York City: New York University. Available via ProQuest.

Coming to CORAAL in 2020!

CORAAL: Princeville, North Carolina

CORAAL:PRV consists of 16 primary speakers across 32 audio files, collected by Ryan Rowe, Walt Wolfram, and colleagues for the North Carolina Language and Life Project. Princeville, NC is the oldest town incorporated by African Americans in the United States. Many community members can trace their families back to the original founders of the town. The speakers were recorded between August 2003 and June 2004. As of the 2000 census, African Americans composed 97% of the population (Rowe 2005). These recordings are publicly available through CORAAL and the CORAAL Explorer. Additional recordings may be available through the Sociolinguistic Archive and Analysis Project.

Rowe, Ryan. 2005. The development of African American English in the oldest black town in America: Plural -s absence in Princeville, NC. MA Thesis. Raleigh: North Carolina State University.

Rowe, Ryan, Walt Wolfram, Tyler Kendall, Charlie Farrington, and Brooke Josler. 2018. The Corpus of Regional African American Language: PRV (Princeville, NC 2004). Version 2018.10.06. Eugene, OR: The Online Resources for African American Language Project.

Available for download from CORAAL. Browse CORAAL:PRV in the CORAAL Explorer, and visit Princeville's OLAC entry for SLAAP.

CORAAL: Rochester, New York

CORAAL:ROC consists of 14 primary speakers across 13 audio files, collected in 2016 and 2017 by Sharese King as part of her dissertation research in Rochester, New York. Rochester is a city on Lake Ontario, in Monroe County in western New York State. Since the early twentieth century, Rochester has been home to a large African American population (King 2018). King selected speakers for CORAAL from a larger dataset to fill a 2 x 3 demographic matrix. Rather than focusing on socioeconomic strata, this matrix aims to provide a distribution across gender and age groups. We have attempted to capture and include in the metadata broad information about speakers’ demographic backgrounds, but leave questions of interpretation up to end-users. These recordings are publicly available through CORAAL and the CORAAL Explorer.

King, Sharese. 2018. Exploring social and linguistic diversity across African Americans from Rochester, New York. Ph.D. dissertation. Palo Alto, CA: Stanford University.

King, Sharese, Charlie Farrington, Tyler Kendall, Emma Mullen, Shelby Arnson, and Lucas Jenson. 2020. The Corpus of Regional African American Language: ROC (Rochester, NY 2018). Version 2020.05. Eugene, OR: The Online Resources for African American Language Project.

Available for download from CORAAL. Browse CORAAL:ROC in the CORAAL Explorer!

CORAAL: South Georgia

CORAAL:SGA consists of 13 primary speakers across 13 recordings, collected in 2017 and 2018 by Minnie Quartey for CORAAL. Valdosta, located in southern Georgia, has a population of over 56,000 residents, over 51% of whom are African American. Since the early nineteenth century, Valdosta has been home to a large African American population, particularly since the Civil War, when it served as a place of refuge for many people fleeing other parts of Georgia where more battles were being fought. We have attempted to capture and include in the metadata broad information about speakers’ demographic backgrounds, but leave questions of interpretation up to end-users. These recordings will be publicly available through CORAAL.

Quartey, Minnie, Charlie Farrington, Tyler Kendall, Chloe Tacata, and Lucas Jenson. Forthcoming. The Corpus of Regional African American Language: SGA (South Georgia 2018). Version TBD. Eugene, OR: The Online Resources for African American Language Project.

Coming soon to CORAAL!

CORAAL Supplements highlight newly available recordings and selections from larger datasets important to the field of sociolinguistics, prepared and curated by the CORAAL team.

For other data resources, please visit ORAAL Resources External Data Sources. This list brings together contextualized links to data readily available elsewhere, such as oral histories and other digitized datasets.

Seattle 1972 (Elaine Tarone Tapes)

This collection consists of several recordings from Dr. Elaine Tarone's (1972) dissertation fieldwork, in which she recorded groups of white and black speakers from Seattle to investigate intonation patterns. These recordings were digitized by members of the Linguistics Lab at NC State University in 2017 and will be available exclusively here.

Tarone, Elaine. 1972. Aspects of intonation in vernacular White and Black English speech. Ph.D. dissertation. University of Washington. <https://eric.ed.gov/?id=ED091923>

Tarone, Elaine. 1973. Aspects of intonation in Black English. American Speech 48.1/2: 29-36.

Coming soon to ORAAL! For more information on the recordings from the CORAAL team, please visit CORAAL Seattle 1972.

Washington DC 1966 (Bengt Loman Tapes)

The first CORAAL Supplement, DC1966, consists of recordings from Bengt Loman's (1967a) Conversations in a Negro American Dialect and represents important data for the study of prosody and intonation. The set of recordings, which Loman made available when the Center for Applied Linguistics released his edited volume, includes 11 samples totaling approximately 23 minutes 30 seconds. The digitized tape is from Walt Wolfram's collection and is available exclusively here!

Loman, Bengt. 1967a. Conversations in a Negro American Dialect. Washington, D.C.: Center for Applied Linguistics. <https://eric.ed.gov/?id=ED013455>

Loman, Bengt. 1967b. Intonation patterns in a Negro American dialect: A preliminary report. Washington, D.C.: Center for Applied Linguistics. Unpublished manuscript.

Available now! For more information about the recording subset and downloads, please visit CORAAL Washington DC 1966!


African American Writers 1892-1912 (AAW) Corpus

From Corpus Linguistics in Context (CLiC): "The CLiC Fiction project team (Professor Michaela Mahlberg and Research Fellow Viola Wiegand) have compiled this corpus in collaboration with Nicholas J. Rosato and Claiborne Rice of the University of Louisiana at Lafayette, who had the idea for this corpus and prepared the text files." This corpus contains novels by Charles W. Chesnutt, Paul Laurence Dunbar, Sutton E. Griggs, Frances E.W. Harper, and James Weldon Johnson.

This information is from the University of Birmingham's Centre for Corpus Research, and the corpus is available through the CLiC web app.

American English Dialect Recordings: The Center for Applied Linguistics

The American English Dialect Recordings collection, available online through the Library of Congress, contains 118 hours of recordings documenting North American English dialects. These include a variety of speech styles, including linguistic interviews, oral histories, conversations, and excerpts from public speeches. They were originally collected as part of Donna Christian's "A Survey and Collection of American English Dialect Recordings" funded by the Center for Applied Linguistics and the National Endowment for the Humanities.

The following page includes links and demographic information about all of the African American speech samples included in the collection.

For access to the African American speakers available in this collection, please visit Charlie Farrington's American English Dialect Recordings page on ORAAL! Here you will find direct links to Library of Congress pages, which contain both MP3 and WAV files.

Donna Christian's American English Speech Recordings: A Guide to Collections (available via ERIC) is an extensive directory of audio recording collections, with a state-by-state breakdown of recordings. Published in 1986, it provides background information on the audio available through the digitized CAL collection.

Asheville 1974

Asheville 1974 is a dataset of sociolinguistic interviews collected by Ronald Butters in Asheville, North Carolina. Forty-six recordings were made between May 1974 and August 1974. There are nineteen African American speakers, born between 1889 and 1960. Please note that there are young males and older females in the dataset. Located in Western North Carolina's Blue Ridge Mountains, Asheville had a population of nearly 58,000 in 1970 and was home to the Allen School, a notable private school in North Carolina that served black students during segregation in the public school system. These recordings are password-protected, but can be accessed through the Sociolinguistic Archive and Analysis Project.

Butters, Ronald. 1981. Unstressed Vowels in Appalachian English. American Speech 56.2: 104-110.

Visit Asheville 1974 on OLAC for access information.

Behind the Veil: Documenting African American Life in the Jim Crow South

Duke University's "The Behind the Veil Oral History Project" was undertaken from 1993 to 1995 with funding from the National Endowment for the Humanities. From Duke: "The primary purpose of this documentary project was to record and preserve the living memory of African American life during the age of legal segregation in the American South, from the 1890s to the 1950s". "Four hundred and ten of the 1,260 interviews have been digitized and made available on this site totaling about 725 hours of recorded audio. One hundred and sixty five of the interviews include transcripts comprising more than 15,000 pages of text." The digitized recordings are freely available online, and researchers can request lossless (e.g., WAV) recordings.

Behind the Veil: Documenting African American Life in the Jim Crow South Digital Collection, John Hope Franklin Research Center, Duke University Libraries.

This digitized dataset is available through Duke University Libraries. For information about the larger collection, visit the Collection Guide from the Rubenstein Library at Duke.

Detroit Dialect Study

The Detroit Dialect Study (DDS) was a large-scale study of the linguistic correlates of social stratification, done in

The newly digitized recordings will be available via the Sociolinguistic Archive and Analysis Project (SLAAP).


Dictionary of American Regional English

Between 1965 and 1970, fieldworkers for the Dictionary of American Regional English (DARE) conducted interviews with nearly 3,000 informants across the country. The resulting recordings, which include both conversational interviews and readings of the "Arthur the Rat" passage, were digitized, anonymized, and made available through the University of Wisconsin-Madison Libraries Digital Collections.

The page linked below includes links and demographic information for all of the African American speech samples included in DARE.

For access to the African American speakers available on DARE, please visit Charlie Farrington's DARE page on ORAAL! Here you will find direct links to the UW page as well as extensive demographic information.

DARE interview files are available in MP3 format from the University of Wisconsin Library.

Oregon African American Railroad Porters: Oral History Collection

This oral history collection consists of 20 primary speakers across 30 recordings, two of which are from a Senior Citizens Association meeting and include multiple unknown speakers. While the African American population in Oregon is quite small, the community's economic impact on the state, through the railroad system, has been substantial. The interviews were conducted in the 1980s, and a brief description of each interview's topics is provided. Audio files and transcripts are available through the Oregon State University Special Collections and Archives Research Center.

Oregon State University. 2016. African American Railroad Porters Oral History Collection (OH 29). Corvallis, OR: Oregon State University Special Collections and Archives Research Center.

Available through the Oregon State University Libraries.

Roanoke Island

Roanoke Island, North Carolina recordings were made in 2003 by members of the Language and Life Project. There are 35 African American speakers in the SLAAP Roanoke Island project, including twenty males and fifteen females, born between 1921 and 1991.

From The Language and Life Project: "Roanoke Island, a thirteen-mile island in the Croatan and Roanoke Sounds, located between the Outer Banks and the mainland coast of North Carolina, is well known as the site of the Lost Colony, where the first settlement of British colonists disappeared in 1587. The untold story of Roanoke Island, however, is its role in the development of Outer Banks African American speech. During the Civil War, the 1862 Battle of Roanoke Island ended Confederate resistance along the Outer Banks and stripped the Confederate army of one of their most vibrant maritime routes for provisions. This Union victory also escalated an influx of freed and runaway slaves to the North Carolina coast and compelled the creation of the Freedmen's Colony of Roanoke Island. The goal of this Freedmen's Colony was to establish a self-sufficient African American community. By the end of the Civil War, the population of the Freedmen's Colony of Roanoke Island was nearly 3500. Unfortunately, the colony was disbanded when the former residents of the island demanded their land back after the war. Following the forced disintegration of the Freedmen's colony, about 300 African American residents remained on Roanoke Island. Many of the approximately 250 current African American residents of Roanoke Island can trace their ancestry back to these people who remained on the island from the Freedmen's Colony."

Carpenter, Jeannine. 2004. The lost community of the Outer Banks: African American speech on Roanoke Island. Master's thesis. Raleigh: North Carolina State University.

Visit Roanoke Island on OLAC for access information.

Robeson County

The Language and Life Project conducted interviews with over a hundred speakers from Robeson County, North Carolina. There are 23 recordings of African American speakers in the SLAAP archive, including six males and seventeen females. Robeson County is a tri-ethnic community located on Interstate 95 near the South Carolina border. Native Americans comprise approximately 40 percent of the county population, African Americans 25 percent, and Anglo Americans the remaining 35 percent.

From The Language and Life Project: "According to historical records, early Anglo settlers from the Scottish Highlands, some of whom were Gaelic speakers, found the Lumbee, the Native-American group within the county, speaking English when they arrived in the Robeson County area in the 1730s. A group of African Americans, including both runaway and free slaves, was also scattered in the region at the time, so that the three ethnic groups have lived in this region for almost three centuries. The ethnic relations of the three groups have shifted through time in response to various sociopolitical events, including the desegregation of county school in the early 1970s. Despite some increase in intercommunication among the three ethnicities, ethnic boundaries remain strong; and Robeson County in large part continues to exist in a state of de facto segregation into three ethnic communities."

Wolfram, Walt, and Clare J. Dannenberg. 1999. Dialect identity in a triethnic context: The case of Lumbee American Indian English. English World-Wide 20: 79-116.

Visit Robeson County on OLAC for access information.

Rochester Voices

Rochester Voices is the home of the Phillis Wheatley Public Library Oral History Collection.

Recordings are available to stream at www.rochestervoices.org, and WAV files of recordings are available upon request.

Voices Remembering Slavery: Freed People Tell Their Stories

This collection consists of 23 primary speakers across 63 recordings. The speakers were born between 1842 and 1882, and the majority were over 80 years old when recorded. The recordings were made between 1932 and 1975 in nine states. The individuals discuss their lives in enslavement and after freedom, and some sing songs that they learned as slaves. Unfortunately, not all of the recordings are clearly audible due to background noise and microphone issues. Some time-aligned transcripts of these recordings are available through the Sociolinguistic Archive and Analysis Project.

Bailey, Guy, Natalie Maynor, and Patricia Cukor-Avila (eds.). 1991. The Emergence of Black English: Text and Commentary. Amsterdam/Philadelphia: John Benjamins. https://doi.org/10.1075/cll.8

For more information about speakers, recordings, and teaching resources, please visit Voices Remembering Slavery on the Library of Congress. For information on the TextGrids available on SLAAP, please contact us.

Wilmington 1973

Wilmington 1973 consists of 79 individual interviews across 79 recordings, collected by Ronald Butters in Wilmington, North Carolina, between July 1973 and August 1974, with speakers born between 1893 and 1957. There are 31 African American participants, including 20 males and 11 females. Wilmington is located on the southern North Carolina coast along the Cape Fear River; its 1970 population was just over 46,000, of whom 34.3% were African American. These recordings can be accessed through the Sociolinguistic Archive and Analysis Project.

Butters, Ronald R. and Ruth A. Nix. 1986. The English of Blacks in Wilmington, NC. In Michael B. Montgomery and Guy Bailey (eds.), Language Variety in the South: Perspectives in Black and White. Tuscaloosa: University of Alabama Press, 254-263.

Thomas, Erik R. 1989. The implications of /o/ fronting in Wilmington, North Carolina. American Speech 64: 327-333.

Visit Wilmington 1973 on OLAC for access information.


Charlie Farrington, May 2020 (last update)