Indus Script
© Harappa Archaeological Research Project, in Yadav et al 2010 PLOS One
A typical seal of the Indus Valley Civilization, containing undeciphered signs.
Today, when we've unlocked the secrets of Egyptian hieroglyphs, Maya writing and a host of far lesser-known scripts, it seems as though there's nothing left for enterprising linguists. Fear not, for there are actually a number of ancient writing systems still to be cracked by archaeologists. They include texts of the Olmec and Zapotec (Mesoamerican cultures preceding the Classic Maya), Proto-Elamite (writings of the earliest civilization of present-day Iran) and Rongorongo of Easter Island.

But if it's fame you're after (as well as intense scrutiny and even death threats) there's no better challenge than the symbols of the Indus Valley Civilization, which flourished some 4,000 years ago in present-day Pakistan and northwest India.

From this culture, archaeologists have recovered several thousand short inscriptions, most with just 4 or 5 signs. There is no consensus on how to read them, although dozens of speculative decipherments have been proposed over the past century.

Complicating efforts, the underlying language the script is tied to is disputed, and there are complex modern-day political ramifications to the question. Rival ethnic groups claim to descend from this once-great civilization and knowing its language would help cement cultural ties. Hence the reported threats to scholars immersed in the matter.

Furthermore, some researchers go so far as to deny the existence of an underlying language. That is, they argue the Indus inscriptions were not true writing - visible signs that unambiguously represent speech - but an alternate symbolic system similar to emblems, conveying more general meanings.

Despite naysayers and challenges, decipherment efforts have progressed in the past decade, thanks to better databases of texts and new computational methods for finding patterns among the signs. Here's what we know, for now.

That Lesser-known Great Civilization

Ruins of Mohenjo-daro
© Robinson, Cracking the Indus Script, Nature, vol 526, 2015
Ruins of Mohenjo-daro, one of the largest cities of the Indus Valley Civilization.
Some 4,000 years ago, the Indus Valley Civilization held an estimated one million people spread over a Texas-sized region, twice the area of contemporary Egypt or Mesopotamia. Its largest excavated cities, Harappa and Mohenjo-daro, exhibit levels of urban planning that rival modern standards, including grid-like streets, water management and some of the oldest known toilets. Yet there's no suggestion of royal, religious or military might - no grand palaces, temples or defensive fortifications. And after the civilization flourished between 2600 and 1900 BC, it's unclear what happened to its people, or if any populations today can count themselves as their descendants.

One reason archaeologists, and average people, don't know much about the Indus is that it was only discovered in the 1920s. Since then, researchers have identified more than 1,000 settlements that, judging from the surface, appear to belong to the culture. But fewer than 10 percent have been systematically excavated, due in part to unrest along the India-Pakistan border.

Another reason the Indus is elusive: that undeciphered script.

The Indus Inscriptions

Indus Valley script
© Rao et al, A Markov Model of the Indus Script, PNAS vol 106, 2009
Examples of short inscriptions on seals and tablets from the Indus.
Several thousand Indus texts have been discovered, mostly from Harappa and Mohenjo-daro, but also in far-flung lands of trading partners along the Persian Gulf and in Mesopotamia (and it's probable the Indus people were exposed to the idea of writing by these literate Mesopotamians). The majority are engraved on small stone seals, about one inch square, above the image of an animal, such as a bull, elephant or unicorn-like creature. Fewer inscriptions are found on clay tablets, pottery and metal objects.

With an average of just 4 or 5 signs, the brevity of most inscriptions poses a challenge for decipherment efforts. It's also among the reasons that some scholars argue these characters are not true writing. Most other civilizations with a writing system have left examples that are hundreds of characters long. The longest example of Indus script, by contrast, is fewer than 30 characters.

Since 2004, there's even been a standing $10,000 prize for anyone who discovers an Indus text over 50 characters, offered by an anonymous donor and valid through the lifetime of historian Steve Farmer, a vocal opponent of the view that the Indus civilization was literate.

Tallying all the characters appearing on all known texts, researchers count between 400 and 700 distinct Indus signs. In part, their estimates differ because of subjectivity in judging how much variation is permissible for a single sign. For instance, my handwritten "a" probably looks different from your "a," but they are the same character. Regardless, having several hundred characters suggests the script - if it was writing - was likely logosyllabic, meaning signs represented full words as well as syllabic sounds. Other logosyllabic systems we've deciphered include Mesopotamian cuneiform (~600 signs) and Maya glyphs (~800 signs).

How to Read Long-lost Scripts

Scholars have deciphered many extinct writing systems, such as Egyptian hieroglyphs, Mesopotamian cuneiform and, most recently, a considerable portion of Maya glyphs. Aside from the short inscriptions, why does Indus give us so much trouble?

Successful decipherment efforts have followed similar courses (Part 3). Researchers cataloged the possible characters and their variations to infer the nature of the system - alphabetic, syllabic, logographic, etc. Then they found patterns in the distribution and frequency of signs. For instance, some characters may commonly occur at the beginning of lines or others may usually cluster together.

Though there's some disagreement, we're probably at that point for the Indus script. But serious decipherment breakthroughs have relied on three key elements so far absent from the Indus corpus:

1) Proper names, such as those of kings or cities, known from records of contemporaneous cultures. During the process of deciphering Egyptian hieroglyphs, scholars benefited from the mention of rulers like Ptolemy and Cleopatra in ancient Greek texts, understood at the time. As for the Indus, we don't know any historical figures or certain place names.

2) A bi- or trilingual inscription, which records the same text in both known and unknown writing systems. For Egypt, that was the famous Rosetta Stone, a fractured slab transcribing a priestly decree in two Egyptian scripts and ancient Greek. No such thing has been found for the Indus.

3) The language the script transcribes. For Egypt, successful translators correctly reasoned that hieroglyphs recorded an ancestor of Coptic, a language still used in the liturgy of the Egyptian Coptic Church. And indigenous people of Mesoamerica continue to speak languages descended from those recorded in Maya glyphs.

But the actual identity of the Indus language (or languages) is contested and clouded by modern politics. Presently, many scholars argue for an ancient form of Dravidian, a family of languages found today mostly in southern India, but also in pockets of northern India and Pakistan, near the heart of the Indus Valley Civilization. Alternatively, some favor an Indo-European language, related to ancient Sanskrit, which supports Hindu nationalist claims to the culture. Still others propose different indigenous language families, like Munda, or no language at all.

Where We Stand

Graph of Indus inscriptions
© Rao, Probabilistic Analysis of an Ancient Undeciphered Script, Computer, April 2010
The degree of disorder in different sequences. Indus inscriptions fall near writing systems, between DNA (top) and computer code (bottom).
As early as 1966, archaeologist Shri B. B. Lal concluded the texts were normally read from right to left. But, as Indus scholar Bryan K. Wells wrote in 2015, that is "about the only fact that most researchers can agree on" (page 7). This conclusion is based on spacing of characters: rightmost signs are aligned comfortably at the edge, whereas leftmost signs hang, get squeezed or pushed lower.

For decades, researchers have used statistical analyses to show that certain signs often cluster together, suggesting words and/or word-order (what we would call syntax) exist in the texts - an important counter to claims that Indus signs are not true writing. More recently, computer scientists have reinvigorated efforts. One approach analyzes how random or predictable the order of signs is within a text. By this measure, known as conditional entropy, Indus inscriptions resemble known writing systems, which fall between highly ordered sequences like computer code and disordered ones like DNA. Other methods using statistics and probability theory have brought similar conclusions: Indus inscriptions exhibit a degree of predictability characteristic of true writing.
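To make the idea of conditional entropy concrete, here is a minimal Python sketch. It is an illustration only, not the method used in the published studies, which work with the real sign corpus and more careful statistical estimation. The function name `conditional_entropy` and the toy "inscriptions" (strings of letters standing in for signs) are assumptions made up for this example. The function counts how often each sign is followed by each other sign, then measures, in bits, how uncertain the next sign is once the current one is known.

```python
import math
from collections import Counter

def conditional_entropy(sequences):
    """Estimate H(next sign | current sign) in bits from adjacent sign pairs.

    `sequences` is an iterable of inscriptions, each a string or list of
    sign labels. This is a plain frequency estimate with no smoothing,
    so it is only meaningful as a toy illustration.
    """
    pair_counts = Counter()   # how often sign a is immediately followed by sign b
    first_counts = Counter()  # how often sign a appears with any following sign
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1

    total = sum(pair_counts.values())
    if total == 0:
        return 0.0  # no adjacent pairs, nothing to measure

    h = 0.0
    for (a, b), n in pair_counts.items():
        p_pair = n / total                 # joint probability of the pair (a, b)
        p_next_given_a = n / first_counts[a]  # probability of b given a
        h -= p_pair * math.log2(p_next_given_a)
    return h

# Hypothetical toy data: letters stand in for signs.
rigid = ["ABCD"] * 4                 # every sign fully determines the next one
loose = ["AA", "AB", "BA", "BB"]     # the next sign is a coin flip

print(conditional_entropy(rigid))
print(conditional_entropy(loose))
```

In this toy setup, the rigid sequences score zero bits because each sign is always followed by the same sign, while the loose ones score one bit because either sign is equally likely to come next. Natural-language writing would be expected to land somewhere between such extremes, which is the pattern the researchers report for the Indus inscriptions.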

Reading that putative writing will take future research. Ancient DNA may soon shed light on the ancestry of the Indus people, providing clues about their language. And there's always hope that future excavations will uncover more informative texts, a Rosetta stone of the Indus.