Evaluating the performance of machine learning approaches in predicting Albanian Shkumbini River's waters using water quality index model

    Lule Basha; Bederiana Shyti; Lirim Bekteshi

Abstract

The water quality index (WQI) method is widely used to assess the overall quality of surface water and groundwater systems. This research compares four machine learning classifiers, gradient boosting (XGBoost), Naive Bayes, Random Forest, and K-Nearest Neighbour, to determine which is most effective at predicting the water quality index values and classes of the Shkumbini River in Albania. The analysis used data collected over a 4-year period at six monitoring points for nine parameters.
The predictive accuracy of XGBoost, Random Forest, K-Nearest Neighbour, and Naive Bayes was 98.61%, 94.44%, 91.22%, and 94.45%, respectively. XGBoost performed best in terms of F1 score, sensitivity, and prediction accuracy, and had the lowest errors during both the learning (RMSE = 2.1, MSE = 9.8, MAE = 1.13) and evaluation (RMSE = 0.0, MSE = 0.01, MAE = 0.01) stages. The findings indicated that biochemical oxygen demand (BOD), bicarbonate (HCO3), and total phosphorus had the strongest positive effect on the Shkumbini River’s water quality. Additionally, a statistically significant, strong positive correlation (r = 0.85) was identified between BOD and WQI, emphasizing the crucial role of BOD in influencing water quality in the Shkumbini River.
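For readers unfamiliar with the error and correlation measures quoted above, the following is a minimal sketch of how MSE, RMSE, MAE, and Pearson's r are conventionally defined. It is not the authors' code: the function names and all sample values (`wqi_observed`, `wqi_predicted`, `bod`) are hypothetical and chosen only for illustration, not taken from the Shkumbini River dataset.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: by definition the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error: the average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustration values (assumptions, not data from the study).
wqi_observed = [52.0, 61.0, 47.0, 70.0, 58.0]
wqi_predicted = [50.0, 63.0, 47.0, 68.0, 59.0]
bod = [2.1, 3.0, 1.8, 3.9, 2.7]

print("MSE :", mse(wqi_observed, wqi_predicted))
print("RMSE:", rmse(wqi_observed, wqi_predicted))
print("MAE :", mae(wqi_observed, wqi_predicted))
print("r(BOD, WQI):", pearson_r(bod, wqi_observed))
```

Under these definitions RMSE is always the square root of MSE, and r lies between -1 and 1, with values near 1 indicating a strong positive linear association such as the one reported between BOD and WQI.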

Keywords: Water Quality Index model, Shkumbini River, machine learning classifier, model accuracy

How to Cite
Basha, L., Shyti, B., & Bekteshi, L. (2024). Evaluating the performance of machine learning approaches in predicting Albanian Shkumbini River’s waters using water quality index model. Journal of Environmental Engineering and Landscape Management, 32(2), 117–127. https://doi.org/10.3846/jeelm.2024.20979
Published in Issue
Mar 6, 2024
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
