APPLYING OCR FOR THE DIGITIZATION AND ARCHIVING OF DIGITAL DATA
DOI: https://doi.org/10.37792/jukanti.v8i2.1857
ABSTRACT
Research in Optical Character Recognition (OCR) is currently driven by rapid advances in machine learning, the availability of large datasets, and the need for efficient information processing. This study develops a closed system for digitizing and archiving data: it extracts text from images and PDF files and stores the results in an integrated database. To protect sensitive data, the system runs over a local area network (LAN) and is hosted on localhost. The core of the digitization process is OCR, implemented with the Tesseract library. For PDF files, extraction is assisted by the PDF.js library, which first converts each PDF into images that are then passed to OCR. Accuracy was tested on two types of documents: computer-typed and typewritten. Computer-typed documents achieved very high accuracy, with a Character Error Rate (CER) of 0.16%; the remaining errors were minor and sporadic, caused by visually ambiguous font glyphs and noise in the documents. Typewritten documents yielded adequate but not yet optimal results, with a CER of 6.41%. Their high substitution error rate is most likely caused by the faded physical condition of the documents, blurry scans, the distinctive shapes of typewriter fonts, and interference from stamps or signatures. Overall, the system proves highly effective for digitizing computer-typed documents and reasonably effective for typewritten ones, substantially easing organized and secure data archiving.
Keywords: Digital Data, OCR, Data Archiving
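The abstract reports accuracy as Character Error Rate (CER) and singles out substitution errors, but does not show how CER was computed. A common definition is CER = (S + D + I) / N, where S, D, and I are the character substitutions, deletions, and insertions in a minimum edit-distance (Levenshtein) alignment between the OCR output and a ground-truth transcription of N characters. The Python sketch that follows assumes that definition; the names `EditCounts`, `edit_counts`, and `corpus_cer` are hypothetical, and this is not presented as the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class EditCounts:
    """Character-level edits between a reference and an OCR hypothesis."""
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int

    @property
    def cer(self) -> float:
        """Character Error Rate: (S + D + I) / N."""
        if self.reference_length == 0:
            raise ValueError("reference text must be non-empty")
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_length


def edit_counts(reference: str, hypothesis: str) -> EditCounts:
    """Levenshtein-align OCR output to ground truth and classify the edits."""
    n, m = len(reference), len(hypothesis)
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # drop all i reference characters
    for j in range(m + 1):
        dist[0][j] = j  # add all j hypothesis characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion: reference char missed
                dist[i][j - 1] + 1,         # insertion: spurious OCR char
            )

    # Walk back along one optimal alignment, counting each edit type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return EditCounts(subs, dels, ins, n)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool edits and reference lengths over (reference, hypothesis) pairs."""
    total_errors = total_chars = 0
    for reference, hypothesis in pairs:
        c = edit_counts(reference, hypothesis)
        total_errors += c.substitutions + c.deletions + c.insertions
        total_chars += c.reference_length
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_errors / total_chars


if __name__ == "__main__":
    # "l" read as "1" (substitution) and a dropped "i" (deletion).
    counts = edit_counts("digital archive", "digita1 archve")
    print(counts.substitutions, counts.deletions, counts.insertions)  # 1 1 0
    print(f"CER = {counts.cer:.2%}")  # CER = 13.33%
```

Pooling errors and reference lengths across pages, as `corpus_cer` does, weights long pages more heavily than averaging per-page CER values; the abstract does not say which aggregation produced the 0.16% and 6.41% figures.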
Copyright (c) 2025 Justin Clarence Setiawan, Ariya Dwika Cahyono

This work is licensed under a Creative Commons Attribution 4.0 International License.
JUKANTI Journal License
JUKANTI (Jurnal Pendidikan Teknologi Informasi) is committed to supporting open access and the dissemination of scholarly knowledge. All articles published in JUKANTI are distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).