
Modern Science and Innovations


Assessment of binary prediction of fraudulent advertisements in ATS candidate tracking cloud systems

https://doi.org/10.37493/2307-910X.2025.1.4

Abstract

This paper describes the construction of a binary classification model that predicts whether a job advertisement in a cloud-based ATS (Applicant Tracking System) is legitimate or fraudulent. Various machine learning algorithms can be applied to this problem; for this study, three traditional classification algorithms were chosen: LSVC (Linear Support Vector Classifier), GBT (Gradient-Boosted Trees), and RF (Random Forest). Building such a model starts with identifying and collecting attributes (features) that help distinguish fraudulent job advertisements from legitimate ones, such as job location, job description, job requirements, job responsibilities, company information, and recruiter data. The candidate algorithms are then trained on the prepared dataset and assessed with standard procedures such as cross-validation, using metrics such as accuracy, precision, and recall. The best-performing model is selected on the basis of these metrics and deployed in a production environment, where it classifies job advertisements as fraudulent or legitimate. The deployed model should also be re-evaluated and updated over time to remain reliable and effective.

Based on the evaluation metrics, the GBT classifier showed higher performance and accuracy than the LSVC and RF classifiers on the given dataset. However, GBT requires more time for training and prediction: 208.738579 seconds, compared with 64.267132 seconds for LSVC and 71.024914 seconds for RF. Taking the evaluation results into account, the GBT model was used in the operational part of the program.
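The evaluation step can be sketched in plain Python to make the metrics concrete. This is a minimal illustration, not the authors' code: the helper names `confusion_counts`, `evaluate`, and `k_fold_indices` are hypothetical, and the folds are contiguous and unshuffled, whereas in practice a library such as scikit-learn (`cross_val_score`, `precision_score`, `recall_score`) would normally be used. The toy example also shows why accuracy alone is misleading on imbalanced fraud data.

```python
# Minimal stdlib sketch (hypothetical helpers, not the authors' code):
# accuracy/precision/recall from confusion counts, plus k-fold index splits.


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` (e.g. 1 = fraudulent) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Imbalanced toy case: 8 legitimate ads (0), 2 fraudulent ads (1),
# and a degenerate model that predicts "legitimate" for everything.
print(evaluate([0] * 8 + [1] * 2, [0] * 10))
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0}
```

A model that never flags fraud still scores 0.8 accuracy here, while its precision and recall for the fraudulent class are both 0.0 (precision is reported as 0.0 by this sketch's convention when nothing is predicted positive).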
To implement the prediction, the GBT, RF, and LSVC classifiers were trained on a custom dataset, "Job_Fraud", built from the publicly available EMSCAD dataset. To address the significant class imbalance, a library implementation of the Synthetic Minority Over-sampling Technique (SMOTE) was used. First, the text was vectorized with TfidfVectorizer, which removed stop-words, and a classifier was trained on the resulting vector space. Then, after the dimensionality of the data was reduced, the data was reloaded and both the model and the vectorizer were retrained before being used for prediction. The graphical interface was built with the tkinter module, and the predict() function applies the trained model to a feature vector to produce a prediction.
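The vectorization and oversampling steps can be illustrated with a stdlib-only sketch, assuming plain-text ads with fraud as the minority class. It is not the authors' implementation: the stop-word list, the function names, and the parameters `k` and `seed` are hypothetical. The TF-IDF weighting uses smoothed idf, ln((1 + n) / (1 + df)) + 1, with L2 normalization (the default weighting in scikit-learn's TfidfVectorizer), and `smote` follows the original SMOTE scheme of interpolating between a minority sample and one of its k nearest minority neighbours. The abstract does not name the SMOTE library; imbalanced-learn's `SMOTE(...).fit_resample(X, y)` is one common option.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "our", "with"}


def tokenize(text):
    """Lower-case, split on non-alphanumerics, and drop stop-words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def fit_tfidf(docs):
    """Build a sorted vocabulary and smoothed idf: ln((1 + n) / (1 + df)) + 1."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    df = Counter(t for toks in token_lists for t in set(toks))
    n = len(docs)
    idf = [math.log((1 + n) / (1 + df[t])) + 1 for t in vocab]
    return vocab, idf


def transform_tfidf(docs, vocab, idf):
    """Map each document to an L2-normalised TF-IDF vector; unknown words are ignored."""
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for t, count in Counter(tokenize(d)).items():
            if t in index:
                v[index[t]] = count * idf[index[t]]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v] if norm else v)
    return vectors


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority vectors: pick a minority sample, pick one of
    its k nearest minority neighbours (Euclidean), and interpolate between them."""
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because SMOTE interpolates numeric feature vectors, it runs after vectorization in this sketch: fit the vectorizer on the training ads, transform them, and call `smote` on the fraudulent rows only, appending the synthetic vectors with the fraudulent label. Applying it only to training folds keeps synthetic points out of the evaluation data.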

About the Authors

V. V. Ligi-Goryaev
Kalmyk State University
Russian Federation

Vladimir V. Ligi-Goryaev, Head of the Department

Digital Department

Elista

tel.: +79371935125



G. A. Mankaeva
Kalmyk State University
Russian Federation

Galina A. Mankaeva, Senior Lecturer

Elista



T. B. Goldvarg
Kalmyk State University
Russian Federation

Tatyana B. Goldvarg, Associate Professor

Department of Experimental Physics

Elista

tel.: +79093974451



S. S. Muchkaeva
Kalmyk State University
Russian Federation

Svetlana S. Muchkaeva, Associate Professor

Department of Algebra and Analysis

Elista

tel.: +79054007024



E. N. Dzhakhnaeva
Kalmyk State University
Russian Federation

Elena N. Dzhakhnaeva, Senior Lecturer

Elista

tel.: +79371927755



References

1. Customizable workflows in cloud ATS in Russia. Available from: https://huntflow.ru/ [Accessed 22 August 2023]. (In Russ.).

2. Research of the recruiting systems market: functionality of cloud ATS in Russia. 02.11.2021. Available from: https://www.tadviser.ru/a/578060 [Accessed 22 August 2023]. (In Russ.).

3. Screening call with a recruiter: questions that you are most likely to be asked. Available from: https://habr.com/ru/articles/689564//разборискринингзнакомствовблокныхатсвроссии [Accessed 22 August 2023]. (In Russ.).

4. Swetha K, Sravani K. Fake job detection using machine learning approach. Journal of Engineering Sciences. 2023;14(2):67-74.

5. Bondarchuk DV. Selecting the optimal method of data mining for job selection. Informatsionnye tekhnologii modelirovaniya i upravleniya = Modeling and Management of Information Technologies. 2013;84(6):504-513. (In Russ.).

6. Kudryavtsev RV. Organization of Activities to Detect Remote Frauds. Molodoi uchenyi = Young Scientist. 2019;24(262):218-221. Available from: https://moluch.ru/archive/262/60528/ [Accessed 14 August 2023]. (In Russ.).

7. Goryaev VM, Burlykov VD, Proshkin SN, Lidzhi-Garyaev VV, et al. ROC curve and confusion matrix as an effective tool for optimizing machine learning classifiers. Vestnik Bashkirskogo universiteta = Bulletin of Bashkir University. 2023;28(1):22-28. (In Russ.).

8. Laboratory of Information and Communication Systems, University of the Aegean, Samos, Greece. Employment Scam Aegean Dataset (EMSCAD). 2016. Available from: http://icsdweb.aegean.gr/emscad [Accessed 22 August 2023].

9. Goryaev VM, Basangova EO, Bembitov DB, Muchkaeva SS, et al. Study of the performance of various machine learning models in non-invasive blood pressure measurement based on PPG and ECG signals. Vestnik Bashkirskogo universiteta = Bulletin of Bashkir University. 2023;28(1):36-44. (In Russ.).

10. Wong Y, Kamel A. Classification of imbalanced data: a review. International Journal of Pattern Recognition and Artificial Intelligence. doi: 10.1142/S0218001409007326

11. Tabassum H, Ghosh G. Detecting online recruitment fraud using machine learning. In: 2021 9th International Conference on Information and Communication Technology (ICoICT). 2021:472-477. doi: 10.1109/ICoICT52021.2021.9527477

12. Borisov ES. Classifier of texts in natural language. Available from: http://mechanoid.kiev.ua/neural-net-classifier-text.html [Accessed 22 August 2023]. (In Russ.).

13. Coelho LP, Richart V. Building machine learning systems in Python. 2nd ed. Transl. from English by Slinkin AA. Moscow: DMK Press; 2016. 302 p. (In Russ.).

14. Goryaev VM. Development of a methodology for professional and psychological selection of personnel in an organization taking into account aspects of information security. Modern high technologies. 2021;(12-2):342-347. (In Russ.).



For citations:


Ligi-Goryaev V.V., Mankaeva G.A., Goldvarg T.B., Muchkaeva S.S., Dzhakhnaeva E.N. Assessment of binary prediction of fraudulent advertisements in ATS candidate tracking cloud systems. Modern Science and Innovations. 2025;(1):51-62. (In Russ.) https://doi.org/10.37493/2307-910X.2025.1.4



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2307-910X (Print)