Science for Education Today, 2024, vol. 14, no. 2, pp. 125–151
UDC: 37.012.4+159.9.072

Analysis of the effectiveness of clustering algorithms for multimodal samples using computer simulation of an educational experiment

Abitov R. N. 1 (Kazan, Russian Federation), Safin R. S. 1 (Kazan, Russian Federation)
1 Kazan State University of Architecture and Engineering
Abstract: 

Introduction. The article addresses the problem of primary processing of multimodal data from pedagogical experiments. The purpose of the study is to identify the most effective and universal clustering algorithms for pedagogical experiments.
Materials and Methods. The study used computer modeling of a pedagogical experiment. Five clustering algorithms were analyzed. The effectiveness of the algorithms was evaluated by the proportion of observations with clustering errors at various tolerance levels and by the Jaccard similarity coefficient. Regression analysis was used to assess the influence of the experiment's modeling parameters and of descriptive statistics on the effectiveness of the clustering algorithms.
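The Jaccard similarity coefficient mentioned above can be computed for two cluster labelings as a pair-counting index: of all point pairs co-clustered in at least one labeling, the fraction co-clustered in both. A minimal sketch in pure Python (the function name is illustrative, not taken from the article):

```python
from itertools import combinations

def jaccard_clustering(labels_a, labels_b):
    """Pairwise Jaccard similarity between two clusterings:
    a / (a + b + c), where a = pairs co-clustered in both labelings,
    b = co-clustered only in A, c = co-clustered only in B."""
    a = b = c = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            a += 1
        elif same_a:
            b += 1
        elif same_b:
            c += 1
    return a / (a + b + c) if (a + b + c) else 1.0

# Identical clusterings (up to label renaming) score 1.0.
print(jaccard_clustering([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because the index counts co-clustered pairs rather than raw labels, it is invariant to renaming clusters, which makes it suitable for comparing an algorithm's output against a known ground-truth partition.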
Results. The effectiveness of the various data clustering algorithms was assessed, and a correlation and regression analysis of the factors affecting clustering effectiveness was carried out.
Conclusions. The most effective clustering algorithms for multimodal samples are the K-means algorithm and the agglomerative hierarchical algorithm. The results obtained in this research can be used for statistical analysis of pedagogical, psychological, sociological, biological, and medical research data.
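The conclusion that K-means recovers the modes of a multimodal sample can be illustrated with a minimal one-dimensional sketch. The code below is a toy Lloyd's-algorithm implementation on a simulated bimodal "test score" sample, not the authors' implementation; all names and parameters are illustrative:

```python
import random

def kmeans_1d(data, k=2, iters=50, seed=0):
    """Minimal Lloyd's k-means for 1-D data (illustrative sketch)."""
    rng = random.Random(seed)
    centers = rng.sample(list(data), k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            clusters[min(range(k), key=lambda c: abs(x - centers[c]))].append(x)
        # Recompute each center as its cluster mean (keep old center if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Simulated bimodal sample: two score modes around 40 and 80.
rng = random.Random(42)
sample = ([rng.gauss(40, 5) for _ in range(100)]
          + [rng.gauss(80, 5) for _ in range(100)])
print(kmeans_1d(sample))  # two centers, one near each mode
```

With well-separated modes the recovered centers land close to the true mode locations, which is the behavior the study's effectiveness metrics quantify for more general multimodal samples.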

Keywords: 

Educational experiment modeling; Data clustering algorithms; Multimodal samples; Data analysis in education.

For citation:
Abitov R. N., Safin R. S. Analysis of the effectiveness of clustering algorithms for multimodal samples using computer simulation of an educational experiment. Science for Education Today, 2024, vol. 14, no. 2, pp. 125–151. DOI: http://dx.doi.org/10.15293/2658-6762.2402.06
References: 
  1. Abitov R. N. On the ways to increase the validity and repeatability of experimental pedagogical research. Kazan Pedagogical Journal, 2022, no. 4, pp. 79–90. (In Russian) DOI: https://doi.org/10.51379/kpj.2022.154.4.009  URL: https://elibrary.ru/item.asp?id=49482910  
  2. Ershov K. S., Romanova T. N. Analysis and classification of clustering algorithms. New Information Technologies in Automated Systems, 2016, no. 19, pp. 274–279. (In Russian) URL: https://elibrary.ru/item.asp?id=25864070   
  3. Podvalny S. L., Plotnikov A. V., Belyanin A. M. Comparison of cluster analysis algorithms on a random set of data. Bulletin of Voronezh State Technical University, 2012, vol. 8 (5), pp. 4–6. (In Russian) URL: https://elibrary.ru/item.asp?id=17743528  
  4. Sivogolovko E. V. Methods for assessing the quality of clear clustering. Computer Tools in Education, 2011, no. 4, pp. 14–31. (In Russian) URL: https://elibrary.ru/item.asp?id=21786023  
  5. Xiaowei Xu, Ester M., Kriegel H.-P., Sander J. A distribution-based clustering algorithm for mining in large spatial databases. Proceedings of the 14th International Conference on Data Engineering, 1998. DOI: https://doi.org/10.1109/icde.1998.655795
  6. Azzalini A., Valle A. D. The multivariate skew-normal distribution. Biometrika, 1996, vol. 83 (4), pp. 715–726. DOI: https://doi.org/10.1093/biomet/83.4.715   
  7. Banfield J. D., Raftery A. E. Model-based Gaussian and non-Gaussian clustering. Biometrics, 1993, vol. 49 (3), pp. 803–821. DOI: https://doi.org/10.2307/2532201  
  8. Cheng M.-Y., Hall P. Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society: Series B: Statistical Methodology, 1998, vol. 60 (3), pp. 579–589. DOI: https://doi.org/10.1111/1467-9868.00141
  9. Rodriguez M. Z., Comin C. H., Casanova D., Bruno O. M., Amancio D. R., Costa L. da F., Rodrigues F. A. Clustering algorithms: A comparative approach. PloS One, 2019, vol. 14 (1), e0210236. DOI: https://doi.org/10.1371/journal.pone.0210236
  10. Reynolds A. P., Richards G., de la Iglesia B., Rayward-Smith V. J. Clustering rules: A comparison of partitioning and hierarchical clustering algorithms. Journal of Mathematical Modeling and Algorithms, 2006, vol. 5 (4), pp. 475–504. DOI: https://doi.org/10.1007/s10852-005-9022-1
  11. Kinnunen T., Sidoroff I., Tuononen M., Fränti P. Comparison of clustering methods: A case study of text-independent speaker modeling. Pattern Recognition Letters, 2011, vol. 32 (13), pp. 1604–1617. DOI: https://doi.org/10.1016/j.patrec.2011.06.023
  12. Ameijeiras-Alonso J., Crujeiras R. M., Rodríguez-Casal A. Mode testing, critical bandwidth and excess mass. TEST, 2018, vol. 28 (3), pp. 900–919. DOI: https://doi.org/10.1007/s11749-018-0611-5
  13. Fisher N. I., Marron J. S. Mode testing via the excess mass estimate. Biometrika, 2001, vol. 88 (2), pp. 499–517. DOI: https://doi.org/10.1093/biomet/88.2.499   
  14. Fowlkes E. B., Mallows C. L. A method for comparing two hierarchical clusterings: Rejoinder. Journal of the American Statistical Association, 1983, vol. 78 (383), pp. 584. DOI: https://doi.org/10.2307/2288123   
  15. Guha S., Rastogi R., Shim K. Cure: an efficient clustering algorithm for large databases. Information Systems, 2001, vol. 26 (1), pp. 35–58. DOI: https://doi.org/10.1016/s0306-4379(01)00008-4
  16. Guha S., Rastogi R., Shim K. ROCK: a robust clustering algorithm for categorical attributes. Proceedings 15th International Conference on Data Engineering, 1999. Cat. No.99CB36337. DOI: https://doi.org/10.1109/icde.1999.754967
  17. Hartigan J. A., Hartigan P. M. The dip test of unimodality. The Annals of Statistics, 1985, vol. 13 (1), pp. 70–84. DOI: https://doi.org/10.1214/aos/1176346577
  18. Jung Y. G., Kang M. S., Heo J. Clustering performance comparison using K-means and expectation maximization algorithms. Biotechnology & Biotechnological Equipment, 2014, vol. 28 (sup1), pp. S44–S48. DOI: https://doi.org/10.1080/13102818.2014.949045
  19. Karypis G., Eui-Hong Han, Kumar V.  Chameleon: Hierarchical clustering using dynamic modeling. Computer, 1999, vol. 32 (8), pp. 68–75. DOI: https://doi.org/10.1109/2.781637
  20. Kruskal W. H., Wallis W. Errata: Use of Ranks in One-Criterion Variance Analysis. Journal of the American Statistical Association, 1953, vol. 48 (264), pp. 907. DOI: https://doi.org/10.2307/2281082
  21. Ankerst M., Breunig M. M., Kriegel H.-P., Sander J. OPTICS: Ordering points to identify the clustering structure. ACM Sigmod Record, 1999, vol. 28 (2), pp. 49–60. DOI: https://doi.org/10.1145/304181.304187
  22. Rand W. M. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 1971, vol. 66 (336), pp. 846–850. DOI: https://doi.org/10.1080/01621459.1971.10482356
  23. Sculley D. Web-scale k-means clustering. Proceedings of the 19th international conference on World wide web, 2010, pp. 1177–1178. DOI: https://doi.org/10.1145/1772690.1772862
  24. Shi J., Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, vol. 22 (8), pp. 888–905. DOI: https://doi.org/10.1109/34.868688
  25. Silverman B. W. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society: Series B (Methodological), 1981, vol. 43 (1), pp. 97–99. DOI: https://doi.org/10.1111/j.2517-6161.1981.tb01155.x
  26. Ward J. H. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 1963, vol. 58 (301), pp. 236–244. DOI: https://doi.org/10.1080/01621459.1963.10500845
  27. Wilkin G. A., Huang X. K-means clustering algorithms: Implementation and comparison. Second International Multi-Symposiums on Computer and Computational Sciences (IMSCCS 2007), 2007, pp. 133–136. DOI: https://doi.org/10.1109/imsccs.2007.51
  28. Xu D., Tian Y. A comprehensive survey of clustering algorithms. Annals of Data Science, 2015, vol. 2 (2), pp. 165–193. DOI: https://doi.org/10.1007/s40745-015-0040-1
  29. Zhang T., Ramakrishnan R., Livny M. BIRCH: An efficient data clustering method for very large databases. ACM Sigmod Record, 1996, vol. 25 (2), pp. 103–114. DOI: https://doi.org/10.1145/235968.233324
Date of publication: 30.04.2024