Original Article

Scalable Email Spam Detection Using BiLSTM with Large-Scale Hybrid Datasets

Patinavalasa Durga Prasad¹, Suneel Kumar Duvvuri²
¹ Student, M.Sc. (Computer Science), Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.
² Assistant Professor, Department of Computer Science, Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.

Published Online: March-April 2026

Pages: 96-105

Abstract

Email remains central to personal and organizational communication, but the growing volume of unsolicited and malicious messages poses serious challenges. Spam is not only disruptive; it is also widely used as a vehicle for phishing, malware distribution, and financial fraud. Traditional filters, including rule-based systems and classical machine learning models, often rely on keyword frequency and fail to capture the contextual structure of email text, which makes them ineffective against modern, well-crafted spam. This paper presents a deep learning approach to email spam detection based on a Bidirectional Long Short-Term Memory (BiLSTM) network. The proposed system reads each text sequence in both the forward and backward directions, so it captures contextual dependencies on either side of a word and better models semantic relationships within email content. A large-scale dataset was constructed by combining the Email Spam Balanced Dataset and the Enron Spam Dataset, yielding a corpus of over 126,000 labeled email messages. A preprocessing pipeline comprising text normalization, tokenization, stopword removal, and stemming was applied. The processed text was then converted into numerical sequences and padded before being fed into the neural network. Two models were implemented for comparison: a standard LSTM and the proposed BiLSTM architecture. Experimental results show that the BiLSTM model achieves an accuracy of 98.1%, outperforming the LSTM model. Precision, recall, and F1-score further confirm the effectiveness of the proposed approach in minimizing false classifications. The results indicate that contextual deep learning models provide a robust and scalable solution for modern email spam detection.
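
The preprocessing and sequence-conversion steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the stopword list, the toy suffix stripper (standing in for a real stemmer such as Porter), the vocabulary scheme (0 for padding, 1 for unseen words), the choice to pad at the end, and every function name are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "you", "your"}


def normalize(text):
    """Lowercase the text and replace every run of non-letters with one space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def crude_stem(word):
    """Toy suffix stripper, used here only as a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Normalize, tokenize on whitespace, drop stopwords, then stem."""
    tokens = normalize(text).split()
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def build_vocab(token_lists):
    """Assign integer ids in order of first appearance (assumed scheme).

    Id 0 is reserved for padding and id 1 for out-of-vocabulary words.
    """
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and pad with zeros at the end."""
    ids = [vocab.get(t, 1) for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


emails = [
    "Congratulations! You WON a free prize, click here.",
    "Meeting moved to 3pm, see the attached agenda.",
]
token_lists = [preprocess(e) for e in emails]
vocab = build_vocab(token_lists)
padded = [encode_and_pad(t, vocab, max_len=8) for t in token_lists]

for row in padded:
    print(row)
```

Each email becomes a fixed-length list of integers, which is the form an embedding layer followed by an LSTM or BiLSTM would typically consume. The sequence length of 8 is arbitrary here; the abstract does not state the length or vocabulary size actually used.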

https://ijrtmr.com/archives/10.59256/ijrtmr.20260602016