Authors: Vangelova A., Gancheva V. S.
Title: AI-Based Automated Scoring Layer Using Large Language Models and Semantic Analysis
Keywords: artificial intelligence, automated scoring, Bloom’s taxonomy, large language models, natural language processing, open-ended questions, RAG, semantic analysis
Abstract:
Featured Application: This study presents an AI-based scoring layer for the automated assessment of open-ended student responses. The proposed framework combines large language models, Retrieval-Augmented Generation (RAG), and analytical rubrics to support criterion-based, context-grounded evaluation in e-learning environments. It can be integrated into platforms such as Moodle to assist instructors with grading, improve scoring consistency, reduce grading time, and give learners faster, more structured feedback.

Automated scoring of open-ended questions is an important research direction in educational technology and artificial intelligence, because manual grading is time-consuming and often subject to inter-rater variation. This paper proposes an AI-based framework for automated scoring that combines large language models (LLMs), Retrieval-Augmented Generation (RAG), analytical rubrics, and structured machine-readable output within a Moodle-supported e-learning environment. The framework is designed to support context-grounded, criterion-based evaluation by combining the student response, retrieved instructional context, and rubric-defined scoring criteria within a controlled assessment workflow. The proposed approach aims to improve the consistency, traceability, and practical applicability of automated scoring for open-ended responses. To examine its performance, an experimental study was conducted in a real university setting using a five-task open-ended examination. AI-generated scores were compared with independent human scores using agreement, reliability, correlation, and error metrics. The results indicate strong agreement between automated and expert scoring within the tested setting, with a relatively low average deviation. These findings suggest that the proposed framework has practical potential for supporting automated assessment in digital learning environments, although they should be interpreted within the scope of the experimental design.

References
- Pecuchova J. Benko Ľ. Drlik M. Automated Grading of Open-Ended Questions in Higher Education Using GenAI Models Int. J. Artif. Intell. Educ. 2025 35 3813 3846 10.1007/s40593-025-00517-2
- Jauhiainen J. Guerra A.G. Evaluating Students’ Open-Ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large Adv. Artif. Intell. Mach. Learn. 2024 4 3097 3113 10.54364/AAIML.2024.44177
- Tang X. Chen H. Lin D. Li K. Harnessing LLMs for Multi-Dimensional Writing Assessment: Reliability and Alignment with Human Judgments Heliyon 2024 10 e34262 10.1016/j.heliyon.2024.e34262 39113951
- Yeung S.A. Comparative Study of Rule-Based, Machine Learning and Large Language Model Approaches in Automated Writing Evaluation (AWE) Proceedings of the 15th International Learning Analytics and Knowledge Conference (LAK’25) Dublin, Ireland 3–7 March 2025 984 991 10.1145/3706468.3706566
- Lan G. Li Y. Yang J. He X. Investigating a customized generative AI chatbot for automated essay scoring in a disciplinary writing task Assess. Writ. 2025 66 100959 10.1016/j.asw.2025.100959
- Grévisse C. LLM-based automatic short answer grading in undergraduate medical education BMC Med. Educ. 2024 24 1060 10.1186/s12909-024-06026-5
- Latif E. Zhai X. Fine-tuning ChatGPT for automatic scoring Comput. Educ. Artif. Intell. 2024 6 100210 10.1016/j.caeai.2024.100210
- Xu J. Liu J. Lin M. Lin J. Yu S. Zhao L. Shen J. EPCTS: Enhanced Prompt-Aware Cross-Prompt Essay Trait Scoring Neurocomputing 2025 621 129283 10.1016/j.neucom.2024.129283
- Mendonça P.C. Quintal F. Mendonça F. Evaluating LLMs for Automated Scoring in Formative Assessments Appl. Sci. 2025 15 2787 10.3390/app15052787
- Qiu H. White B. Ding A. Costa R. Hachem A. Ding W. Chen P. SteLLA: A Structured Grading System Using LLMs with RAG arXiv 2025 10.48550/arXiv.2501.09092 2501.09092
- Chu S. Kim J. Wong B. Yi M. Rationale Behind Essay Scores: Enhancing S-LLM’s Multi-Trait Essay Scoring with Rationale Generated by LLMs arXiv 2025 10.48550/arXiv.2410.14202 2410.14202
- Seßler K. Fürstenberg M. Bühler B. Kasneci E. Can AI grade your essays? A comparative analysis of large language models and teacher ratings in multidimensional essay scoring Proceedings of the 15th International Learning Analytics and Knowledge Conference Dublin, Ireland 3–7 March 2025 462 472 10.1145/3706468.3706527
- Papachristou I. Dimitroulakos G. Vassilakis C. Automated Test Generation and Marking Using LLMs Electronics 2025 14 2835 10.3390/electronics14142835
- Emirtekin E. Large Language Model-Powered Automated Assessment: A Systematic Review Appl. Sci. 2025 15 5683 10.3390/app15105683
- Gao R. Merzdorf H.E. Anwar S. Hipwell M.C. Srinivasa A.R. Automatic assessment of text-based responses in post-secondary education: A systematic review Comput. Educ. Artif. Intell. 2024 6 100206 10.1016/j.caeai.2024.100206
- Zlatkin-Troitschanskaia O. Fischer J. Braun H.I. Shavelson R.J. Advantages and challenges of performance assessment of student learning in higher education International Encyclopedia of Education 4th ed. Elsevier Amsterdam, The Netherlands 2023 312 330 10.1016/B978-0-12-818630-5.02055-8
- Sun J. Song T. Peng W. Song J. A Survey of Automated Essay Scoring: Challenges, Advances, and Future Neurocomputing 2025 650 130916 10.1016/j.neucom.2025.130916
- Dikli S. An Overview of Automated Scoring of Essays J. Technol. Learn. Assess. 2006 5 Available online: https://ejournals.bc.edu/index.php/jtla/article/view/1640/1489 (accessed on 3 March 2025)
- Fateen M. Wang B. Mine T. Beyond Scores: A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback IEEE Access 2024 12 185371 185385 10.1109/ACCESS.2024.3508747
- Zhuang M. Long S. Martin F. Castellanos-Reyes D. The affordances of Artificial Intelligence (AI) and ethical considerations across the instruction cycle: A systematic review of AI in online higher education Internet High. Educ. 2025 67 101039 10.1016/j.iheduc.2025.101039
- Sychev O. Anikin A. Prokudin A. Automatic Grading and Hinting in Open-Ended Text Questions Cogn. Syst. Res. 2020 59 264 272 10.1016/j.cogsys.2019.09.025
- Aydın B. Kışla T. Elmas N.T. Bulut O. Automated Scoring in the Era of Artificial Intelligence: An Empirical Study with Turkish Essays System 2025 133 103784 10.1016/j.system.2025.103784
- Stephen T.C. Gierl M.C. King S. Automated Essay Scoring (AES) of Constructed Responses in Nursing Examinations: An Evaluation Nurse Educ. Pract. 2021 54 103085 10.1016/j.nepr.2021.103085
- Jung J.Y. Tyack L. von Davier M. Towards the implementation of automated scoring in international large-scale assessments: Scalability and quality control Comput. Educ. Artif. Intell. 2025 8 100375 10.1016/j.caeai.2025.100375
- Mizumoto A. Eguchi M. Exploring the Potential of Using an AI Language Model for Automated Essay Scoring Res. Methods Appl. Linguist. 2023 2 100050 10.1016/j.rmal.2023.100050
- Pack A. Barrett A. Escalante J. Large Language Models and Automated Essay Scoring of English Language Learner Writing: Insights into Validity and Reliability Comput. Educ. Artif. Intell. 2024 6 100234 10.1016/j.caeai.2024.100234
- Birla N. Jain M.K. Panwar A. Automated Assessment of Subjective Assignments: A Hybrid Approach Expert Syst. Appl. 2022 203 117315 10.1016/j.eswa.2022.117315
- Li X. Chen M. Nie J.-Y. SEDNN: Shared and enhanced deep neural network model for cross-prompt automated essay scoring Knowl.-Based Syst. 2020 210 106491 10.1016/j.knosys.2020.106491
- Wang Q. A Multifaceted Architecture to Automate Essay Scoring for Assessing English Article Writing: Integrating Semantic, Thematic, and Linguistic Representations Comput. Electr. Eng. 2024 118 109308 10.1016/j.compeleceng.2024.109308
- Bonthu S. Rama Sree S. Krishna Prasad M.H.M. Improving the performance of automatic short answer grading using transfer learning and augmentation Eng. Appl. Artif. Intell. 2023 123 106292 10.1016/j.engappai.2023.106292
- Tan L.Y. Hu S. Yeo D.J. Cheong K.H. A Comprehensive Review on Automated Grading Systems in STEM Using AI Techniques Math 2025 13 2828 10.3390/math13172828
- Meyer J. Jansen T. Schiller R. Liebenow L.W. Steinbach M. Horbach A. Fleckenstein J. Using LLMs to Bring Evidence-Based Feedback into the Classroom: AI-Generated Feedback Increases Secondary Students’ Text Revision, Motivation, and Positive Emotions Comput. Educ. Artif. Intell. 2024 6 100199 10.1016/j.caeai.2023.100199
- Quah B. Zheng L. Sng T.J.H. Yong C.W. Islam I. Reliability of ChatGPT in automated essay scoring for dental undergraduate examinations BMC Med. Educ. 2024 24 962 10.1186/s12909-024-05881-6 39227811
- Zhao X. A Hybrid Deep Learning and Fuzzy Logic Framework for Feature-Based Evaluation of English Language Learners Sci. Rep. 2025 15 33657 10.1038/s41598-025-17738-z 41023079
- He X. Xiao X. Fang J. Li Y. Li Y. Zhou R. Exercise-Aware higher-order Thinking skills Assessment via fine-tuned large language model Knowl.-Based Syst. 2025 324 113808 10.1016/j.knosys.2025.113808
- Firoozi T. Bulut O. Gierl M. Language models in automated essay scoring: Insights for the Turkish language Int. J. Assess. Tools Educ. 2023 10 149 163 10.21449/ijate.1394194
- Johnsi R. Kumar G.B. Enhancing automated essay scoring by leveraging LSTM networks with hyper-parameter tuned word embeddings and fine-tuned LLMs Eng. Res. Express 2025 7 025272 10.1088/2631-8695/adcf74
- Córdova-Esparza D.-M. AI-Powered Educational Agents: Opportunities, Innovations, and Ethical Challenges Information 2025 16 469 10.3390/info16060469
- Tyndall E. Gayheart C. Some A. Genz J. Wagner T. Langhals B. Impact of retrieval augmented generation and large language model complexity on undergraduate exams created and taken by AI agents Data Policy 2025 7 e57 10.1017/dap.2025.10024
- Kinder A. Briese F.J. Jacobs M. Dern N. Glodny N. Jacobs S. Leßmann S. Effects of adaptive feedback generated by a large language model: A case study in teacher education Comput. Educ. Artif. Intell. 2025 8 100349 10.1016/j.caeai.2024.100349
- Villegas-Ch W. Gutierrez R. García-Ortiz J. Guevara V. Explainable educational assistant integrated in Moodle: Automated semantic assessment and adaptive tutoring based on NLP and XAI Discov. Artif. Intell. 2025 5 191 10.1007/s44163-025-00438-y
- Oğuz E. Can Generative AI Figure Out Figurative Language? The Influence of Idioms on Essay Scoring by ChatGPT, Gemini, and Deepseek Assess. Writ. 2025 66 100981 10.1016/j.asw.2025.100981
- Morris W. Crossley S. Holmes L. Ou C. Dascalu M. McNamara D. Formative Feedback on Student-Authored Summaries in Intelligent Textbooks Using Large Language Models Int. J. Artif. Intell. Educ. 2025 35 1022 1043 10.1007/s40593-024-00395-0
- Koo T.K. Li M.Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research J. Chiropr. Med. 2016 15 155 163 10.1016/j.jcm.2016.02.012
- Cisneros-González J. Gordo-Herrera N. Barcia-Santos I. Sánchez-Soriano J. JorGPT: Instructor-Aided Grading of Programming Assignments with Large Language Models (LLMs) Future Internet 2025 17 265 10.3390/fi17060265
- Ferreira Mello R. Pereira Junior C. Rodrigues L. Pereira F.D. Cabral L. Costa N. Ramalho G. Gasevic D. Automatic Short Answer Grading in the LLM Era: Does GPT-4 with Prompt Engineering beat Traditional Models? Proceedings of the 15th International Learning Analytics and Knowledge Conference Dublin, Ireland 3–7 March 2025 93 103 10.1145/3706468.3706481
- Cipriano E. Ferrato A. Limongelli C. Schicchi D. Taibi D. Leveraging Large Language Models to Assist Teachers in Code Grading Artificial Intelligence in Education Cristea A.I. Walker E. Lu Y. Santos O.C. Isotani S. Springer Nature Cham, Switzerland 2025 Volume 15880 204 217 10.1007/978-3-031-98459-4_15
- Landis J.R. Koch G.G. The Measurement of Observer Agreement for Categorical Data Biometrics 1977 33 159 174 10.2307/2529310 843571
Issue: Applied Sciences (Switzerland), vol. 16, 2026, Switzerland, https://doi.org/10.3390/app16073537
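The abstract describes a controlled workflow in which the student response, retrieved instructional context, and rubric-defined scoring criteria are combined, and the model returns structured, machine-readable output. The paper's actual prompt, output schema, and model are not given in this record, so what follows is only a minimal Python sketch under stated assumptions: `Criterion`, `build_prompt`, `parse_scores`, and the JSON reply format are hypothetical names chosen for illustration, retrieval is assumed to have already produced the context passages, and the LLM call itself is omitted (the reply in the demo is a hand-written stand-in).

```python
import json
from dataclasses import dataclass


@dataclass
class Criterion:
    """One analytical-rubric criterion (hypothetical structure for this sketch)."""
    name: str
    description: str
    max_points: int


def build_prompt(question, answer, context_chunks, rubric):
    """Combine the task, retrieved course context, rubric, and student answer into one prompt.

    Assumption: retrieval has already run and context_chunks holds its output.
    """
    criteria = "\n".join(
        f"- {c.name} (0-{c.max_points} points): {c.description}" for c in rubric
    )
    context = "\n\n".join(context_chunks)
    return (
        "Grade the student answer using only the rubric and the course material.\n\n"
        f"Question:\n{question}\n\n"
        f"Course material:\n{context}\n\n"
        f"Rubric:\n{criteria}\n\n"
        f"Student answer:\n{answer}\n\n"
        'Reply with JSON only: {"scores": {"<criterion name>": <integer>}, "feedback": "<text>"}'
    )


def parse_scores(raw_reply, rubric):
    """Check a JSON reply against the rubric; return (total, per-criterion scores, feedback).

    Raises ValueError if a criterion is missing or its score is out of range,
    so malformed model output is rejected instead of silently recorded.
    """
    data = json.loads(raw_reply)  # json.JSONDecodeError is a subclass of ValueError
    scores = data.get("scores", {})
    checked = {}
    for c in rubric:
        value = scores.get(c.name)
        valid_int = isinstance(value, int) and not isinstance(value, bool)
        if not valid_int or not 0 <= value <= c.max_points:
            raise ValueError(f"invalid score for {c.name!r}: {value!r}")
        checked[c.name] = value
    return sum(checked.values()), checked, data.get("feedback", "")


if __name__ == "__main__":
    rubric = [
        Criterion("accuracy", "Key concepts are stated correctly", 3),
        Criterion("completeness", "All parts of the question are addressed", 2),
    ]
    prompt = build_prompt(
        "Explain what RAG adds to an LLM.",
        "It retrieves course text and passes it to the model.",
        ["RAG retrieves relevant passages and adds them to the model input."],
        rubric,
    )
    # Hand-written stand-in for a model reply; the actual LLM call is omitted.
    reply = '{"scores": {"accuracy": 2, "completeness": 2}, "feedback": "Name the retriever."}'
    total, per_criterion, feedback = parse_scores(reply, rubric)
    print(total, per_criterion, feedback)
```

Validating every criterion against its maximum before summing is one way to keep the output traceable: a score that cannot be mapped back to a rubric row is rejected rather than stored.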
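The abstract reports that AI and human scores were compared with agreement, reliability, correlation, and error metrics, but it does not name the exact statistics. The reference list includes guidance on interpreting kappa (Landis and Koch) and on selecting ICC forms (Koo and Li), so this minimal Python sketch assumes four common choices: mean absolute error, Pearson's r, quadratically weighted Cohen's kappa for integer scores, and ICC(2,1) (two-way random effects, absolute agreement, single rater). These are assumptions about how such a comparison could be computed, not the paper's reported procedure; the function names and toy scores are hypothetical.

```python
from collections import Counter
from math import sqrt


def mean_absolute_error(human, ai):
    """Average absolute deviation between paired human and AI scores."""
    return sum(abs(h - a) for h, a in zip(human, ai)) / len(human)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def quadratic_weighted_kappa(x, y, min_score, max_score):
    """Cohen's kappa with quadratic disagreement weights for integer scores."""
    n = len(x)
    k = max_score - min_score + 1
    observed = Counter(zip(x, y))            # joint counts of (rater x, rater y) pairs
    hist_x, hist_y = Counter(x), Counter(y)  # marginal counts per score value
    num = den = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            w = (i - j) ** 2 / (k - 1) ** 2       # 0 on the diagonal, 1 at the extremes
            num += w * observed[(i, j)]
            den += w * hist_x[i] * hist_y[j] / n  # counts expected under independence
    return 1.0 - num / den


def icc_2_1(x, y):
    """ICC(2,1) for two raters: two-way random effects, absolute agreement, single rater."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in rows)
    ss_cols = n * sum((sum(col) / n - grand) ** 2 for col in (x, y))
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    human = [4, 3, 5, 2, 4, 1]
    ai = [4, 3, 4, 2, 5, 1]
    print("MAE:", mean_absolute_error(human, ai))
    print("Pearson r:", pearson_r(human, ai))
    print("QWK:", quadratic_weighted_kappa(human, ai, 1, 5))
    print("ICC(2,1):", icc_2_1(human, ai))
```

The statistics are written out in plain Python so the formulas are visible; in practice, scikit-learn's `cohen_kappa_score(..., weights="quadratic")` computes the same weighted kappa.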