Tuesday, August 6, 2013

tocharian, the most important extinct language family you never heard of

Probably the most forgotten branch of the massive Indo-European family is Tocharian. Unlike the Germanic, Indo-Iranian, or Greek branches, it is one you have likely never heard of unless you have read up on Indo-European studies. It comprises two well-attested languages, which we have conveniently labeled Tocharian A and B, plus a proposed third, Tocharian C, whose existence is still debated.

When the Tochars and their writings were discovered in western China in the 20th century, they overturned much of what we knew about the Indo-Europeans. Here's an example. Before the Tocharian discovery, we had noticed that some phonological features are found only in western or only in eastern IE languages, never in both. The numeral for 100 in western languages, for instance, had a hard /k/ sound (the ancestor of English had it too, but it softened to the fricative /x/ and then to the "h" sound in hundred), while eastern languages had an /s/ or some variant. We called this particular division the "Centum-Satem Split." Centum is an example from a k-language, Latin (pronounced something like kentum), and satem is an example from an s-language, Avestan (the Sanskrit cognate is śatam). The C-S Split was particularly tricky because we did not know which sound, /k/ or /s/, was the original.

Enter Tocharian.

Pictured above is a map of the branches of Indo-European. The red and orange boxes represent satem languages and the blue box represents centum languages. What's Tocharian doing way over there? If Proto-Indo-European split into an east-west dichotomy early on, then did the Tochars make a long migration from Europe to China?

The answer is simpler. Based on this isogloss and several others, it is now hypothesized that the Proto-Indo-European language was spoken by a fairly large nation or nations. The Indo-Iranian, Balto-Slavic, and Armenian families descend from speakers who were located in the center of these nations, while Germanic, Celtic, Italic, Greek, and Tocharian speakers descend from the peripheries - an outer ring of sorts. A /k/ word for 100 was used initially, but a mutation in the sound (/k/ --> /s/) emerged. The trend caught on enough that interior speakers picked up the habit permanently, but it never reached the outer-ring languages, which had already begun to drift and migrate away.
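To make the isogloss concrete, here is a minimal Python sketch that sorts a few languages by their word for 100. The word forms are the usual handbook citations; the table name `HUNDRED`, the function `classify`, and the hand-picked "reflex" consonant for each word are illustrative assumptions, not part of any standard tool.

```python
# Attested words for "100", each paired with the consonant that continues
# the original Proto-Indo-European sound (picked out by hand, an assumption
# of this sketch rather than something the code works out for itself).
HUNDRED = {
    "Latin": ("centum", "k"),
    "Old Irish": ("cét", "k"),
    "Gothic": ("hund", "h"),          # Germanic h comes from an earlier k
    "Tocharian B": ("kante", "k"),
    "Avestan": ("satəm", "s"),
    "Sanskrit": ("śatam", "ś"),
    "Lithuanian": ("šimtas", "š"),
}

VELAR_REFLEXES = {"k", "h"}           # the /k/ was kept (or later weakened to h)
SIBILANT_REFLEXES = {"s", "ś", "š"}   # the /k/ turned into an s-like sound


def classify(reflex: str) -> str:
    """Return 'centum' or 'satem' for a single reflex consonant."""
    if reflex in VELAR_REFLEXES:
        return "centum"
    if reflex in SIBILANT_REFLEXES:
        return "satem"
    raise ValueError(f"unrecognized reflex: {reflex!r}")


groups = {lang: classify(reflex) for lang, (_, reflex) in HUNDRED.items()}

for lang, group in groups.items():
    word = HUNDRED[lang][0]
    print(f"{lang:12} {word:8} {group}")
```

Tocharian B lands in the centum group alongside Latin, Old Irish, and Gothic, even though it was spoken farther east than any of the satem languages in the table.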

So now we have our answer (granted, later information may change our opinion, but subsequent data seem only to confirm our hypothesis). The Centum Languages preserve the older word-form while the Satem Languages represent an innovation. We couldn't have known this without the discovery of Tocharian.

As a final goodbye, I will leave you with the trail of 100 from Proto-Indo-European to English. The asterisk indicates that the word was reconstructed and not directly attested in writing. It has no phonetic value.

5000 BCE: *dkmtom (the dkm- cluster is the zero-grade, meaning vowel-less, form of dekm- "ten")
5000 - 3000 BCE: *kmtom (the d- is snipped off the front in later Proto-Indo-European; the Centum word we just discussed)
2000 BCE: *hundam (Proto-Germanic has softened k to h; the syllabic m of PIE gains a vowel before it and becomes n; o becomes a)
2000 BCE?: *hunda-ratha "120" (Proto-Germanic makes a new word by adding ratha "reckoning" to the end of 100 to fashion a new number)
0 CE: hundrath "100," "120" (Proto-West Germanic has now shortened the "hundred-reckon" compound, and because the individual words are mangled, the original meaning is lost. The influence of Christianity spreads the Latin language, including its counting methods, which blurs the distinction between 100 and 120 among Germanic tribes)
800 CE: hundred "100" (English has hardened the -th into a stop, -d; -a- has weakened to -e- in an unstressed position)
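
For the curious, the trail above can be replayed as a chain of string rewrites. This is only a toy sketch. It works on the simplified spellings used in this post, the rule order and the regular expressions are my own assumptions rather than the historical chronology of the sound changes, and the names `RULES` and `derive` are made up for the illustration.

```python
import re

# Toy rewrite rules applied in order to the post's simplified spellings.
# The ordering and patterns are illustrative assumptions, not a claim
# about when each change actually happened.
RULES = [
    ("drop the initial d",         lambda w: re.sub(r"^dk", "k", w)),
    ("syllabic m gains a vowel",   lambda w: re.sub(r"(?<=k)m(?=t)", "um", w)),
    ("k softens to h",             lambda w: w.replace("k", "h")),
    ("mt becomes nd",              lambda w: w.replace("mt", "nd")),
    ("final -om becomes -am",      lambda w: re.sub(r"om$", "am", w)),
    ("compound with ratha",        lambda w: re.sub(r"m$", "", w) + "ratha"),
    ("shorten the compound",       lambda w: re.sub(r"a$", "", w.replace("ar", "r", 1))),
    ("final -th becomes -d",       lambda w: re.sub(r"th$", "d", w)),
    ("unstressed a weakens to e",  lambda w: re.sub(r"a(?=d$)", "e", w)),
]


def derive(word: str) -> list[str]:
    """Apply every rule in order and return all intermediate forms."""
    stages = [word]
    for _, rule in RULES:
        word = rule(word)
        stages.append(word)
    return stages


stages = derive("dkmtom")
for (label, _), form in zip(RULES, stages[1:]):
    print(f"{label:28} {form}")
```

Several rules are deliberately narrow (the m only gains a vowel between k and t, for example), so the sketch replays this one word and is not a general model of Germanic sound change.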
