Scalable Email Spam Detection Using BiLSTM with Large-Scale Hybrid Datasets

Patinavalasa Durga Prasad; Suneel Kumar Duvvuri

doi:https://www.doi.org/10.59256/ijrtmr.20260602016

Original Article

Patinavalasa Durga Prasad¹, Suneel Kumar Duvvuri²

¹ Student, M.Sc. (Computer Science), Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.
² Assistant Professor, Department of Computer Science, Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.

Published Online: March-April 2026

Pages: 96-105

Abstract

Email remains central to both personal and organizational communication, but the growing volume of unsolicited and malicious messages poses serious challenges. Spam is not only disruptive; it is also widely used as a vehicle for phishing attacks, malware distribution, and financial fraud. Traditional filtering techniques, including rule-based systems and classical machine learning models, often rely on keyword frequency and fail to capture the contextual structure of email text, which makes them ineffective against modern, well-crafted spam. This paper presents a deep learning approach to email spam detection based on a Bidirectional Long Short-Term Memory (BiLSTM) network. The proposed system captures contextual dependencies in both the forward and backward directions of the text sequence, improving its grasp of semantic relationships within email content. A large-scale dataset was constructed by combining the Email Spam Balanced Dataset and the Enron Spam Dataset, yielding a corpus of over 126,000 labeled email messages. A preprocessing pipeline of text normalization, tokenization, stopword removal, and stemming was applied. The processed text was then mapped to integer sequences and padded to a uniform length before being fed into the neural network. Two models were implemented for comparison: a standard LSTM and the proposed BiLSTM architecture. Experimental results show that the BiLSTM model achieves an accuracy of 98.1%, outperforming the LSTM model. Additional evaluation metrics, including precision, recall, and F1-score, confirm the effectiveness of the proposed approach in minimizing false classifications. The results indicate that contextual deep learning models provide a robust and scalable solution for modern email spam detection systems.
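The pipeline described in the abstract (normalization, tokenization, stopword removal, stemming, integer encoding, padding, then a BiLSTM classifier) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the stopword list, the crude suffix-stripping function (a stand-in for a real stemmer such as Porter's), the maximum sequence length, and the layer sizes in the hypothetical `build_bilstm` helper, which uses the Keras `Bidirectional` wrapper around an `LSTM` layer.

```python
import re

# Tiny illustrative stopword list (assumption; the paper does not list its stopwords).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "you", "your"}


def crude_stem(word):
    """Strip a few common suffixes. A rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def build_vocab(token_lists):
    """Map each token to an integer id; 0 is padding, 1 is out-of-vocabulary."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with 0."""
    ids = [vocab.get(t, 1) for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def build_bilstm(vocab_size, max_len, units=64):
    """Hypothetical binary classifier; needs TensorFlow, imported lazily here."""
    import tensorflow as tf

    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(max_len,)),
        # mask_zero=True tells downstream layers to skip the padding id 0.
        tf.keras.layers.Embedding(vocab_size, 64, mask_zero=True),
        # Reads the sequence forward and backward and concatenates both states.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])


emails = ["Claim your FREE prize now!!!", "Meeting moved to 3pm, see agenda attached."]
token_lists = [preprocess(e) for e in emails]
vocab = build_vocab(token_lists)
padded = [encode(t, vocab, max_len=8) for t in token_lists]
```

With TensorFlow installed, `build_bilstm(vocab_size=len(vocab) + 2, max_len=8)` would return a model that accepts rows of `padded`; the `+ 2` accounts for the reserved padding and out-of-vocabulary ids. Replacing the `Bidirectional(...)` wrapper with a plain `LSTM(units)` layer gives a unidirectional baseline of the kind the abstract compares against.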
