„The basic problem of studying the origins of language is, to understate matters, language leaves few fossils.“ – Edmund Blair Bolles.

In „The Singing Neanderthals“, Steven Mithen, professor of archaeology at the University of Reading, summarizes his views on the co-evolution of music and language in the history of our species. Drawing on evidence from fields such as anthropology, psychology, neuroscience and musicology, he argues that music is not merely a byproduct of language with no evolutionary value of its own, as Steven Pinker, for instance, has claimed. Instead, Mithen proposes a hypothetical proto-music/language that was holistic (not composed of segmented elements), manipulative (influencing emotional states and hence behavior of oneself and others), multimodal (using both sound and movement), musical (temporally controlled, rhythmic, and melodic), and mimetic (utilizing sound symbolism and gesture) – a form of musicking that he calls ‚Hmmmmm’, an acronym built from the communication modes just listed. These holistic utterances, each with its own meaning but lacking any meaningful sub-units (that is to say, words), were used to manipulate other individuals, as commands, threats, greetings and requests. They would have been as much music-like as language-like. According to this theory, „modern language only evolved when holistic utterances were ‚segmented’ to produce words, which could then be composed together to create statements with novel meanings.“

Here is a brief summary of Mithen’s hypothesis in the form of a collage of key quotations taken from his book:

„Music and language are universal features of human society. They are hierarchical, combinatorial systems which involve expressive phrasing and are reliant on rules that provide recursion and generate an infinite number of expressions from a finite set of elements. Both communication systems involve gesture and body movement. They enabled the human mind to switch from a ‚domain-specific’ to a ‚cognitively fluid’ mentality that is attributed to Homo sapiens alone. Cognitive fluidity refers to the combination of knowledge and ways of thinking from different mental modules, which enables the use of metaphor and produces creative imagination.“

Mithen stresses the role bipedalism has played in the evolutionary development of the genus Homo:

„Both the multi-modal and the musical aspects of such utterances would have been greatly enhanced by the evolution of bipedalism. Bipedalism required the evolution of mental mechanisms to maintain the rhythmic coordination of muscle groups. As our ancestors evolved into bipedal humans so, too, would their inherent musical abilities evolve – they got rhythm. The new degrees of motor control, independence of torso and arms from legs, and internal and unconscious time-keeping abilities, would all have dramatically enhanced the potential for gesture and body language in Homo ergaster, hugely expanding the existing potential for holistic communication. This would have added to vocalization an invaluable means of expressing and inducing emotions, and of manipulating behaviour.“

„Bipedalism requires a relatively narrow pelvis and hence puts a severe constraint on the width of the birth canal. To be born at all through the narrow bipedal pelvis, infants effectively had to be born premature, leaving them almost entirely helpless for their first eighteen months of life. This created selective pressures for the development of vocal and gestural mother-infant interactions, which would have been of a music-like nature.“

„Music-making had considerable survival value as a means of communicating emotions, intentions and information and therefore facilitated cooperation, that is: the sharing of information and resources, working as a team during a hunt, caring for each other’s well-being, advertising and consolidating pair-bonding. In all known societies music-making is frequently, if not always, a group activity.“

Mithen then speculates on the transition from a holistic communication system to a referential language:

„Alison Wray uses the term ‚segmentation’ to describe the process whereby humans began to break up holistic phrases into separate units, each of which had its own referential meaning and could then be recombined with units from other utterances to create an infinite array of new utterances. This is the emergence of compositionality, the feature that makes language so much more powerful than any other communication system.“

„Simon Kirby of Edinburgh University is one of several linguists who have begun to explore the evolution of language using computer simulation models. He was able to simulate how children acquire language simply by listening to their parents, siblings and other language-users. In his simulations he gave each speaking-agent a ‚random language’, which is in fact a holistic language, and as the simulation runs, learning-agents are exposed to a sample of speaking-agents and by this means acquire a language of their own. Because they will only ever have heard a sample of the utterances of any single speaking-agent, their language will be unlike that of any other individual. As the simulation proceeds, Kirby finds that some parts of the language systems become stabilized and are passed on faithfully from one generation to the next. A learning-agent mistakenly infers some form of non-random behaviour in a speaking-agent indicating a recurrent association between a symbol string and a meaning, and then uses this association to produce its own utterances, which are now genuinely non-random. Kirby refers to this process as ‚generalization’. Other learning-agents will acquire the same association between the symbol string and its meaning, so that it spreads throughout the population and, eventually, the whole language system will have been stabilized and will constitute a single, compositional language. With his work, Kirby challenges Noam Chomsky’s argument that children are born with an innate language ability, something he called ‚universal grammar’. Instead, Kirby’s simulations show that the process of learning itself can lead to the emergence of grammatical structures.“
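To make the idea of such a simulation a little more concrete, here is a minimal toy sketch in Python. It is not Kirby's actual model, only an illustration under strong assumptions of my own: meanings are shape/colour pairs, signals are four-letter strings, and the learner simply assumes that the first half of a signal belongs to the shape and the second half to the colour. All names (`holistic_language`, `learn`, `compositionality`, `iterate`) are hypothetical. Each generation hears only a sample of the previous generation's utterances, memorises those as wholes, and has to guess the rest.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical toy meaning space: every meaning is a (shape, colour) pair.
SHAPES = ["circle", "square", "star"]
COLOURS = ["red", "green", "blue"]
MEANINGS = list(product(SHAPES, COLOURS))
ALPHABET = "abcd"
HALF = 2  # a signal consists of two halves of HALF letters each


def random_part(rng):
    """Invent a random half-signal."""
    return "".join(rng.choice(ALPHABET) for _ in range(HALF))


def holistic_language(rng):
    """One random signal per meaning, with no link between parts and features."""
    return {m: random_part(rng) + random_part(rng) for m in MEANINGS}


def _half_counts(pairs):
    """Count which halves co-occur with which shape and which colour."""
    first = {s: Counter() for s in SHAPES}
    second = {c: Counter() for c in COLOURS}
    for (shape, colour), signal in pairs.items():
        first[shape][signal[:HALF]] += 1
        second[colour][signal[HALF:]] += 1
    return first, second


def learn(observed, rng):
    """Build a full language from a sample of meaning/signal pairs.

    Heard utterances are memorised as wholes. For unheard meanings the
    learner generalises: it reuses the half it heard most often for that
    shape or colour, and invents a random half if it heard none.
    """
    first, second = _half_counts(observed)

    def guess(counter):
        return counter.most_common(1)[0][0] if counter else random_part(rng)

    language = dict(observed)
    for shape, colour in MEANINGS:
        if (shape, colour) not in language:
            language[(shape, colour)] = guess(first[shape]) + guess(second[colour])
    return language


def compositionality(language):
    """Share of signals that equal 'usual shape half' + 'usual colour half'."""
    first, second = _half_counts(language)
    hits = sum(
        signal == first[s].most_common(1)[0][0] + second[c].most_common(1)[0][0]
        for (s, c), signal in language.items()
    )
    return hits / len(language)


def iterate(generations, sample_size, seed=0):
    """Pass a language down a chain of learners through a learning bottleneck."""
    rng = random.Random(seed)
    language = holistic_language(rng)
    scores = [compositionality(language)]
    for _ in range(generations):
        sample = dict(rng.sample(sorted(language.items()), sample_size))
        language = learn(sample, rng)
        scores.append(compositionality(language))
    return language, scores
```

Calling `language, scores = iterate(generations=30, sample_size=5)` and printing `scores` shows how the share of regular, recombinable signals develops over the generations for a given seed. Note that the fixed split into two halves already builds part of the answer into the learner, so this sketch only illustrates the bottleneck-plus-generalization mechanism described in the quotation and proves nothing about real language evolution.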

„The transition from a predominantly ‚Hmmmmm’ communication system to a compositional language most likely took tens of thousands of years. Some communities may have continued primarily with ‚Hmmmmm’ for much longer than others; some individuals who had become proficient language-users may have died before their knowledge was passed on, but finally compositional language emerged from ‚Hmmmmm’ and changed the nature of human thought and set our species on a path that led to global colonization and, ultimately, the end of the hunting and gathering way of life that had endured ever since the first species of Homo appeared more than 2 million years ago.“

Well, the latest spectacular fossil find, Ardipithecus ramidus or „Ardi“ for short, is about 4.4 million years old and suggests that the human lineage is much older than was assumed until recently. Mithen himself admits that archaeology is always coming up with new pieces of a broader puzzle and that human history has to be rewritten over and over again. But since fossils don’t say much about the language and music of our ancestors, much of the theorizing about the origin of language must remain conjecture, and that is one of the few weak spots of Mithen’s endeavour: there is too much could-be and might-be in the text, and some conclusions appear highly speculative. I also think the book lacks the ethnomusicological background that would have provided a broader, non-western perspective, but it is undoubtedly a great starting point for diving into the academic controversies about the origin and evolution of language and music.