Moving forward, I believe that good research into language will be quantitative and interdisciplinary. A diverse set of fields currently has a stake in explaining human language: linguistics, cognitive science/psychology, psycholinguistics, computational linguistics, corpus linguistics, and the emerging digital humanities. Quantitative methods will facilitate productive collaboration between these disciplines, and each of them has much to offer the others.
I see the contribution of each as follows:
Linguistics: Most language research (including my own) is still done on English. Sometimes this research is extended to other colonial languages and a few non-Indo-European languages, and only very rarely to less widely studied languages. This is a pragmatic compromise (we need data), but claims about “big L” language need to be tested against as many languages as possible before our task can be considered done. Linguistics holds the most knowledge about languages other than English, and it reminds us to be tentative in our generalizations.
Psychology: Psychology has been limited by the confines of the laboratory, the undergraduate subject pool, and the difficulty of obtaining natural observational data. “Big data” in the form of language corpora offers psychology an escape from the laboratory and the ability to study whole populations (or at least the population of internet users and other contributors to corpora), wherever psychological theories make predictions about language use.
Psycholinguistics: Like linguistics, psycholinguistics lends much-needed validity to language research. Linguistic theories should be psychologically plausible, and a robust understanding of language requires us to determine exactly how language is represented in the brain. Psycholinguistics provides the experimental interrogation of linguistic theories that this requires.
Computational Linguistics: Computational linguistics gives us the tools, helps us refine them, and allows us to handle big data. It sometimes sacrifices validity for the sake of computational efficiency or simplicity (e.g. bag-of-words models), so it should continue to collaborate with the other disciplines on developing psychologically realistic and valid models of language.
Corpus Linguistics: Corpora are best when they are representative, carefully constructed, and approached at the right level of granularity. Corpus linguists are often more interested in interpretable results than practitioners of computational linguistics or machine learning are, and as scientists we want to be able to make sense of our data.
Digital Humanities: Quantitative methods of language research are potentially applicable to research questions in the humanities. Some humanists already see this, and collaborations between humanists and social scientists should be encouraged going forward.
This is a mutually beneficial relationship. As we better understand the links between psychology and language use, we can apply this knowledge to questions about authors, specific texts, and whole genres. Importantly, unlike experimental data, corpora can come from any time or place for which we can wrangle the data. We could use a historical corpus to, in effect, conduct a psychology experiment in the past!
However, humanists are generally not accustomed to using quantification to answer questions about human behaviour. The wealth of methodological knowledge in the social sciences (e.g. theoretical constructs and the different forms of validity) should be shared with digital humanists going forward.