Article
Forecasting Student Academic Performance Using Machine Learning
DOI: https://doi.org/10.47344/w85rct27
Keywords: machine learning education, education artificial intelligence, edtech, AI in edtech, predictive power education
Abstract
Educational data mining depends on accurate forecasting of student academic outcomes so that students who need help can be identified early and given targeted support. Traditional linear models have been used extensively, yet they can fail to detect the intricate non-linear patterns present in student achievement data. Systematic evaluations of machine learning algorithms and their features for predicting student outcomes in Portuguese secondary education remain scarce.
This study compares Linear Regression, Random Forest, and K-Nearest Neighbors (KNN) for predicting Portuguese language grades from 649 student records containing 30 demographic, social, and academic attributes. Model performance was evaluated with three established metrics: Mean Squared Error (MSE), R-Squared (R²), and Mean Absolute Error (MAE).
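As a rough illustration of the three metrics named above, the following minimal sketch computes them with the Python standard library only. The function names and the sample grades are hypothetical and are not taken from the study's data or code.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """R-squared: 1 - SS_res / SS_tot, where SS_tot uses the mean of y_true."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical grades (illustrative values, not the study's data).
y_true = [10, 12, 14, 8, 16]
y_pred = [11, 12, 13, 10, 14]

print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Because R² compares the model's squared error with that of always predicting the mean grade, it drops below zero whenever the model does worse than that constant baseline.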
Linear Regression produced the most accurate predictions, with the lowest MSE (9.00) and MAE (2.30), but its R² of 0.01 indicated that it explained almost none of the variance in grades. Random Forest had similar error rates (MSE = 9.48, MAE = 2.34), yet its negative R² (-0.04), meaning it performed slightly worse than always predicting the mean grade, indicated poor generalization, which the study attributes to irrelevant features and suboptimal hyperparameters. KNN performed worst (MSE = 11.10, MAE = 2.57, R² = -0.21), failing to detect important patterns without additional optimization.
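The reported figures can be cross-checked with a little arithmetic. Assuming all three models were scored on the same test set and that R² was computed as 1 - MSE / Var(y), each (MSE, R²) pair implies a value for the variance of the true grades. The sketch below solves for that variance. The helper name `implied_variance` is hypothetical, and because the published R² values are rounded to two decimals, the estimates are only approximate.

```python
# Reported test metrics from the abstract: model -> (MSE, R²).
reported = {
    "Linear Regression": (9.00, 0.01),
    "Random Forest": (9.48, -0.04),
    "KNN": (11.10, -0.21),
}


def implied_variance(mse, r2):
    """Solve R² = 1 - MSE / Var for Var, the variance of the true test grades."""
    return mse / (1.0 - r2)


for name, (mse, r2) in reported.items():
    print(f"{name}: implied target variance = {implied_variance(mse, r2):.2f}")
```

All three pairs imply a variance of roughly 9.1 to 9.2, so the reported numbers are mutually consistent under these assumptions. It also means a constant prediction of the mean grade would score an MSE near 9.1, which is the baseline that Linear Regression only narrowly beats.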
These results show that educational prediction tasks require both careful feature selection and hyperparameter tuning. Linear models can outperform more complex methods in specific situations, although non-linear models may capture the complexity of student achievement better once they are optimized. The study offers guidance for improving feature engineering and machine learning approaches to predicting educational outcomes.
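To make the idea of parameter adjustment concrete, here is a minimal sketch of choosing the number of neighbors k for a KNN regressor by leave-one-out validation. It is written in plain Python and is not the study's procedure. The functions, the candidate values of k, and the toy features (study hours, absences) are all hypothetical.

```python
from statistics import fmean


def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest training points."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    return fmean(y for _, y in nearest)


def loo_mse(xs, ys, k):
    """Leave-one-out MSE: predict each point from all the other points."""
    errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        errors.append((ys[i] - knn_predict(rest_x, rest_y, xs[i], k)) ** 2)
    return fmean(errors)


def best_k(xs, ys, candidates):
    """Return the candidate k with the lowest leave-one-out MSE."""
    return min(candidates, key=lambda k: loo_mse(xs, ys, k))


# Hypothetical toy data: (study hours, absences) -> grade. Not the study's data.
xs = [(1, 9), (2, 8), (3, 6), (4, 5), (5, 4), (6, 3), (7, 2), (8, 1)]
ys = [6, 7, 9, 10, 12, 13, 15, 16]

k = best_k(xs, ys, candidates=[1, 2, 3, 5])
print(k, loo_mse(xs, ys, k))
```

In practice the features would usually be scaled before computing distances, since KNN is sensitive to attributes measured on different ranges. That step is omitted here for brevity.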