How difficult is language learning? And how difficult is it to explain?
Language is a vertiginously complex symbolic system, and yet humans all learn it, effortlessly, within a remarkably short period of time. How do they do this?
We just have language
As a PhD student, I shrugged my shoulders at the question, and it still provokes some resistance at parties: "Huhh, is that what you use our tax money for? Can't you find out some useful stuff? Say, an innovative, fun, and easy educational method to teach statistics to law and social science students? Or what's going on in criminals' brains?" Here, I would point to science funders' strong commitment to knowledge valorization to reassure my fellow partygoers… But I can see their point: the intuition that we just have language is so compelling that the scientific study of language learning seems like dedicating a whole discipline and half a century of research to the question of how we learn to ride a bike… or to drink a glass of beer.
Who do you think has stolen the banana?
So what excites psycholinguists so much about this question? Here is our answer. Children don't make mistakes with a sentence like "Who do you think has stolen the banana?", even if they have never heard that sentence before. They do not conclude, for example, that the speaker is accusing them of having stolen the banana. This immediate, errorless understanding is not trivial, because the word 'you' sits closer to the word 'stolen' than the word 'who' does, so linking 'you' with 'stolen' would be the straightforward, simple way to make sense of this stream of words. Moreover, children's capability to parse the sentence cannot be explained by their knowledge of and experience with language: the database of language input heard in a few years is much too small to derive the rules of grammar from, as Chomsky pointed out fifty years ago.
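To see why adjacency is so tempting, here is a toy sketch in Python. The word lists and the clause boundary are hand-coded assumptions for this one sentence; it is an illustration of the contrast, not a real parser.

    # Toy illustration (hand-coded for this one sentence, not a real parser):
    # two strategies for deciding who the subject of "stolen" is.
    sentence = "who do you think has stolen the banana".split()

    # Strategy 1: naive adjacency. Link "stolen" to the closest preceding
    # candidate subject. This picks "you": the wrong, accusing reading.
    verb_pos = sentence.index("stolen")
    candidates = [w for w in sentence[:verb_pos] if w in ("who", "you")]
    print("adjacency heuristic:", candidates[-1], "stole the banana")

    # Strategy 2: structure. Treat "do you think" as a matrix clause wrapped
    # around the question, so the fronted "who" links to "has stolen".
    remaining = [w for w in sentence if w not in ("do", "you", "think")]
    print("structural parse:  ", remaining[0], "stole the banana")

The first strategy prints "you stole the banana"; the second prints "who stole the banana". Children reliably land on the second reading, even though nothing on the surface of the word stream favors it.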
Computers
Compare children to computers: these machines have a virtually infinite database of language at their disposal; we can feed them the full corpus of Google in hundreds of languages. Yet anybody who has ever communicated with a speech-recognition system (like Siri or the Railways Time Schedule Information Line) knows how hopelessly poor such systems are at understanding. Ask Siri "Who do you think has stolen the banana?" and, depending on your intonation, her answer will be "I am not sure I understand", "I think therefore I am, but let us not put Descartes before the horse", or "I really cannot say". Notice that we cannot excuse Siri on the grounds that she does not know the context. The only thing she has to do is apply grammar to this particular sentence, i.e., identify its two subject-verb-object parts. Computers have a virtually infinite knowledge of language, and no understanding; children have very little knowledge of language, and excellent understanding.
Simple sentences
Together with colleagues from Cornell University, Cambridge University in the UK, Georgia State University, and Erasmus University Rotterdam, we carried out a series of six experiments (including one online study) to unravel what is behind children's fabulous parsing skills (Poletiek et al., 2018). We did not look at actual children: we created artificial miniature languages that had the same grammatical structure as the 'banana' sentence, and exposed hundreds of participants (from the Netherlands and the US) to hundreds of sentences in these languages. Then we tested their parsing skills. We thus mapped the eight years of children's language acquisition onto a one-hour lab session. The main finding was as simple as it was remarkable: participants who were 'fed' language input ordered by increasing complexity (starting small and simple, with short one-phrase sentences, and ending up complex, with sentences containing multiple phrases) could learn the system, while learners who had seen the same sentences in random order could not. In 2011, we had already demonstrated that a training set consisting of complex sentences only was insufficient for inducing the grammar (Lai & Poletiek, 2011). So grammar learners need to hear simple sentences, and they need them at the start of a training regime that grows gradually in complexity.
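For readers who want to see the manipulation in miniature, the sketch below builds a toy center-embedded language and presents the very same training sentences in the two orders we compared. The syllable vocabulary, the depth range, and the complexity measure (embedding depth) are invented here for illustration; the actual materials are described in the paper.

    import random

    A = ["le", "mo", "fi"]   # hypothetical 'A' syllables
    B = ["ka", "ru", "ti"]   # hypothetical 'B' syllables; 'ka' pairs with 'le', etc.

    def sentence(depth):
        """One center-embedded string, A1 A2 ... B2 B1, each Ai paired with Bi."""
        pairs = [random.randrange(len(A)) for _ in range(depth)]
        return [A[i] for i in pairs] + [B[i] for i in reversed(pairs)]

    corpus = [sentence(random.randint(1, 3)) for _ in range(200)]

    # 'Starting small': the same sentences, presented in order of complexity.
    staged = sorted(corpus, key=len)

    # Control condition: the same sentences in random order.
    shuffled = random.sample(corpus, len(corpus))

    print(" ".join(staged[0]), "...", " ".join(staged[-1]))

The crucial point is that the two regimes contain identical sentences; only the order of presentation differs, and only the staged order led to learning.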
Scientifically
What do we learn from this scientifically? When faced with complex sentences, the 'starting small' group recognized the building blocks (simple sentences) they had been exposed to previously. In the same way, children hearing the banana sentence recognize the separate simple building blocks: "what do you think?" and "who has stolen the [something]?". Crucially, it is children's ability to selectively retrieve these parts of the sentence from memory, as simple events encountered previously, that drives the parse. Luckily, their database of sentences is small. If the database were Google-sized, it would be like looking for a needle in a haystack: the parser might retrieve the wrong strings, like "who do" and "think has stolen the banana…", because far too many building blocks would be available in the gigantic memory store. And that is exactly where things go wrong for computers: they cannot make a meaningful choice. Computers fail because they cannot retrieve selectively. We would need to tell them what to retrieve, or searching the database takes ages and ends in nonsense. Here is the issue: computers have no consciousness that decides what is useful to retrieve. They are shy of an 'I'. And… no I, no parsing.
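Here is one way to make the retrieval idea concrete; a minimal sketch, not the paper's implementation, assuming that parsing amounts to covering the sentence with the longest building blocks found in memory, and with both chunk inventories hand-picked for the example.

    def parse(words, memory):
        """Greedily cover the sentence with the longest chunks found in memory."""
        chunks, i = [], 0
        while i < len(words):
            for j in range(len(words), i, -1):   # try the longest match first
                if tuple(words[i:j]) in memory:
                    chunks.append(" ".join(words[i:j]))
                    i = j
                    break
            else:
                chunks.append(words[i])          # unknown word stands alone
                i += 1
        return chunks

    s = "who do you think has stolen the banana".split()

    # A child-sized store: a few simple frames heard before.
    small = {("who",), ("do", "you", "think"),
             ("has", "stolen", "the", "banana")}

    # A Google-sized store: the same frames drowned among spurious strings.
    huge = small | {("who", "do"), ("you",),
                    ("think", "has", "stolen"), ("the", "banana")}

    print(parse(s, small))  # ['who', 'do you think', 'has stolen the banana']
    print(parse(s, huge))   # ['who do', 'you', 'think has stolen', 'the banana']

With the small store, the lookup recovers exactly the two familiar frames plus the fronted 'who'. With the bloated store, the longest available match wins, and the parse derails at "who do", precisely the wrong string mentioned above.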
Practically
What do we learn from this practically? That watching television may not be the best way for your young child to learn language. Rather, talk with your child and focus their attention. Note that long sentences are no problem (as we found in our study), as long as the structure is simple. So: "Cuty baby boy has a fluffy softy little sleepy teddy bear." But not: "Baby boy that needs a sleep has a cuty teddy bear."
Poletiek, F.H., Conway, C.M., Ellefson, M.R., Lai, J., Bocanegra, B.R., & Christiansen, M.H. (2018). Under what conditions can recursion be learned? Effects of starting small in artificial grammar learning of recursive structure. Cognitive Science. doi:10.1111/cogs.12685 (early online).