Using corpora for language teaching and assessment in L2 writing: A narrative review

Ömer Faruk Kaya
Kutay Uzun
Hakan Cangır


Corpora have been used primarily in linguistic research and have not yet become a mainstay of language teaching and assessment practice. This narrative review therefore aims to inform practitioners and researchers by examining the advantages and disadvantages of data-driven learning and by exploring the use of corpora in foreign language teaching, particularly in writing. Specifically, the goals of the paper are: (1) to explain what data-driven learning is and how it can shape the learning experience, (2) to explain and exemplify how learner corpora can guide EFL learners, with particular attention to academic writing, and (3) to provide insights into the indirect uses of corpora in teaching and assessing academic writing in L2. The review addresses these goals by presenting evidence compiled from corpus-related studies and from work on the use of corpora in language instruction.



How to Cite
Kaya, Ö. F., Uzun, K., & Cangır, H. (2022). Using corpora for language teaching and assessment in L2 writing: A narrative review. Focus on ELT Journal, 4(3), 46–62.

