Anomaly-Based Financial Fraud Detection Using Autoencoder: A Case Study on the Kaggle Credit Card Dataset

Andri Nata; Dudes Manalu; Jaya Tata Hardinata; Peniel Sam Putra Sitorus

doi:10.56313/jictas.v4i1.431

Authors

Andri Nata, Universitas Royal
Dudes Manalu, Universitas HKBP Nommensen Pematangsiantar
Jaya Tata Hardinata, Universitas HKBP Nommensen Pematangsiantar
Peniel Sam Putra Sitorus, Universitas HKBP Nommensen Pematangsiantar

Keywords:

Autoencoder, Anomaly Detection, Credit Card Transactions, Financial Fraud Detection, Unsupervised Learning

Abstract

Financial fraud remains a critical challenge for banking systems and digital payment platforms worldwide. With the rapid growth of electronic transactions, effective fraud detection mechanisms are essential to maintaining security and user trust. This study explores the application of an unsupervised deep learning model, the autoencoder, to anomaly-based financial fraud detection. The model is trained exclusively on legitimate transactions from the publicly available Kaggle Credit Card Fraud Detection dataset, which comprises 284,807 transactions including 492 fraudulent cases, so that it learns typical behavioral patterns. The dataset's features are anonymized through Principal Component Analysis (PCA), and the numerical columns "Amount" and "Time" were normalized with Min-Max scaling before training. The autoencoder architecture includes three encoder and decoder layers with ReLU activations and is optimized using the Adam optimizer with Mean Squared Error (MSE) as the loss function. Experimental results show that the model achieves a classification accuracy of 94% and an AUC score of 0.931, indicating strong potential for detecting anomalies. However, the precision for identifying fraudulent transactions remains low (5%), reflecting the challenges posed by imbalanced datasets. Despite this, the study demonstrates that the autoencoder offers a promising foundation for fraud detection systems, with further improvements possible through model integration and hybrid ensemble techniques.
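The link between class imbalance and low precision can be illustrated with a short calculation. The sketch below is not the study's evaluation code: it takes only the two dataset counts from the abstract and shows how precision follows from a detector's recall and false positive rate. The function name `expected_precision` and the operating points in the loop are hypothetical values chosen for illustration, not results reported by the authors.

```python
# Dataset counts taken from the abstract.
TOTAL_TRANSACTIONS = 284_807
FRAUD_CASES = 492


def expected_precision(recall: float, false_positive_rate: float,
                       n_fraud: int = FRAUD_CASES,
                       n_total: int = TOTAL_TRANSACTIONS) -> float:
    """Precision implied by a detector's recall and false positive rate.

    Hypothetical helper for illustration; not part of the study.
    """
    n_legit = n_total - n_fraud
    true_positives = recall * n_fraud                # frauds correctly flagged
    false_positives = false_positive_rate * n_legit  # legitimate rows flagged
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


fraud_rate = FRAUD_CASES / TOTAL_TRANSACTIONS
print(f"fraud prevalence: {fraud_rate:.4%}")

# Assumed operating points (NOT figures from the study): recall is held
# fixed while the false positive rate is varied.
for recall, fpr in [(0.80, 0.10), (0.80, 0.05), (0.80, 0.01), (0.80, 0.001)]:
    precision = expected_precision(recall, fpr)
    print(f"recall={recall:.2f}  FPR={fpr:.3f}  ->  precision={precision:.3f}")
```

Because legitimate transactions outnumber fraudulent ones by several hundred to one, even a small false positive rate produces far more false alarms than true detections. This is why high accuracy and AUC can coexist with single-digit precision.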
