Using corpora for language teaching and assessment in L2 writing: A narrative review
Corpora have been used primarily in linguistic research, but they have not yet become a pedagogical mainstay of language teaching and assessment. This narrative review therefore aimed to inform practitioners and researchers by examining the advantages and disadvantages of data-driven learning (DDL) and by exploring the use of corpora in foreign language teaching, particularly in writing. Specifically, the paper has three goals: (1) to elucidate what data-driven learning is and how it can shape the learning experience, (2) to explain and exemplify how learner corpora can guide EFL learners, with particular attention to academic writing, and (3) to provide insights into the indirect uses of corpora in teaching and assessing L2 academic writing. The review meets these objectives by synthesizing evidence from corpus-related studies and from published accounts of corpus use in language instruction.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.