Malware Classification using Deep Learning and Machine Learning

Overview

This repository contains the implementation and experimental framework for the research paper:

Malware Classification Using Deep Learning
Published in the 2025 IEEE International Conference on Emerging Technologies in Computing and Communication (ETCC)

The project explores the effectiveness of machine learning (ML) and deep learning (DL) techniques for malware classification. Experimental results show that deep learning models, particularly Recurrent Neural Networks (RNNs), outperform traditional ML models, achieving high accuracy (99.11% for the RNN versus roughly 92% to 94% for Logistic Regression and Random Forest) and robustness against evolving malware threats.

📄 IEEE Xplore Link:
https://ieeexplore.ieee.org/document/11108579

Research Motivation

Traditional signature-based malware detection systems struggle with:

- Zero-day attacks
- Polymorphic and metamorphic malware
- Obfuscated malicious binaries

This research addresses these challenges by leveraging deep learning architectures (CNN and RNN) that can learn complex static and behavioral patterns from malware data, as validated in the published IEEE ETCC 2025 paper.

Features

Machine Learning Models
- Random Forest
- Logistic Regression
- Support Vector Machine (SVM)
Deep Learning Models
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
Hybrid Analysis
- Combines static and behavioral malware features
High Performance
- RNN Accuracy: 99.11%
- CNN Accuracy: 96.7%
Evaluation Metrics
- Accuracy, Precision, Recall, F1-score
- Confusion Matrix Analysis
Research-Grade Methodology
- Cross-validation
- Hyperparameter optimization
- Feature selection (PCA, RFE)
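The methodology items above (cross-validation, hyperparameter optimization, feature selection) can be combined in a single scikit-learn workflow. The following is a minimal sketch, not the paper's code: it uses synthetic data from `make_classification` in place of the malware dataset, and the PCA sizes and Random Forest grid are placeholder values. Scaling and PCA sit inside the `Pipeline`, so every cross-validation fold refits them on its own training split and the validation fold never leaks into feature selection.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the malware feature matrix (NOT the paper's data).
X, y = make_classification(n_samples=500, n_features=40, n_informative=10, random_state=0)

# Preprocessing and feature selection are pipeline steps, so they are
# refit inside each training fold during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Hypothetical search space; the paper's actual grid is not given here.
param_grid = {
    "pca__n_components": [10, 20],
    "clf__n_estimators": [50, 100],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1",
)
search.fit(X, y)
print(search.best_params_)
print(f"mean CV F1: {search.best_score_:.3f}")
```

To try RFE instead of PCA, replace the `"pca"` step with a step named `"rfe"` holding `RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)` (from `sklearn.feature_selection` and `sklearn.linear_model`) and tune `rfe__n_features_to_select` in the grid.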

Key Results

Model	Accuracy
Logistic Regression	~92%
Random Forest	~94%
CNN	96.7%
RNN	99.11%

Key Insight:
RNN models outperform the other approaches because they capture sequential and behavioral patterns in malware data, which also makes them well suited to stream-based and real-time malware classification.
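To show what "capturing sequential patterns" means mechanically, here is a minimal NumPy sketch of a vanilla (Elman) RNN forward pass. It is an illustration only: the paper's actual RNN architecture (layer types, sizes, training) is not described here, and the weights below are random and untrained. The key point is that the hidden state `h` is updated step by step, so the same inputs presented in a different order produce a different output, unlike a model that treats features as an unordered set.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a vanilla RNN over one sequence and return class probabilities.

    x_seq has shape (T, d): T time steps of d features each
    (e.g. a hypothetical sequence of behavioral events).
    """
    h = np.zeros(W_hh.shape[0])
    for x_t in x_seq:
        # Mix the current input with the previous hidden state, so h
        # summarizes everything seen so far, in order.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    logits = W_hy @ h + b_y
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
d, hidden, classes, T = 8, 16, 2, 5
params = (
    rng.normal(scale=0.3, size=(hidden, d)),       # W_xh
    rng.normal(scale=0.3, size=(hidden, hidden)),  # W_hh
    np.zeros(hidden),                              # b_h
    rng.normal(scale=0.3, size=(classes, hidden)), # W_hy
    np.zeros(classes),                             # b_y
)
seq = rng.normal(size=(T, d))
probs = rnn_forward(seq, *params)
probs_reversed = rnn_forward(seq[::-1], *params)  # same events, reversed order
print(probs, probs_reversed)
```

In Keras, the same recurrence is what a `SimpleRNN` layer computes; LSTM or GRU layers add gating on top of it.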

Prerequisites

Ensure the following dependencies are installed:

- Python 3.8 or later
- TensorFlow / Keras (or PyTorch)
- NumPy
- Pandas
- Scikit-learn
- Matplotlib
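A quick way to confirm these prerequisites is a small check script like the sketch below. It is not part of the repository; it only uses the standard library to test the Python version and whether each package is importable (note that scikit-learn imports as `sklearn` and PyTorch as `torch`).

```python
import importlib.util
import sys

# Import names for the prerequisites listed above.
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]
DL_BACKENDS = ["tensorflow", "torch"]  # either one is enough

def missing_packages(names):
    """Return the module names that cannot be found on this interpreter."""
    return [n for n in names if importlib.util.find_spec(n) is None]

def check_environment():
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    problems += [f"missing: {n}" for n in missing_packages(REQUIRED)]
    if len(missing_packages(DL_BACKENDS)) == len(DL_BACKENDS):
        problems.append("missing: tensorflow or torch")
    return problems

issues = check_environment()
print("\n".join(issues) if issues else "environment OK")
```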

Usage

Prepare the Dataset

Place the dataset in the project directory and run preprocessing:

python preprocess.py
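The contents of `preprocess.py` are not described here, so the following is only a sketch of a typical tabular preprocessing step, under stated assumptions: the label column is named `legitimate` and the identifier columns `Name` and `md5` are dropped (names assumed from the commonly used MalwareData.csv layout; adjust them to the actual file). The scaler is fit on the training split only, so test statistics do not leak into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def preprocess(df, label_col="legitimate", drop_cols=("Name", "md5"), test_size=0.2, seed=42):
    """Split a feature table into scaled train/test arrays.

    label_col and drop_cols are ASSUMED column names, not confirmed ones.
    """
    df = df.drop(columns=[c for c in drop_cols if c in df.columns])
    X = df.drop(columns=[label_col]).to_numpy(dtype=float)
    y = df[label_col].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    # Fit scaling statistics on the training split only.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test

# For the real file, something like the following (the "|" separator is an assumption):
# df = pd.read_csv("MalwareData.csv", sep="|")
demo = pd.DataFrame({
    "Name": [f"f{i}.exe" for i in range(10)],
    "md5": [f"{i:032x}" for i in range(10)],
    "SizeOfCode": range(100, 1100, 100),
    "Sections": [3, 4, 5, 6, 7, 3, 4, 5, 6, 7],
    "legitimate": [0, 1] * 5,
})
X_train, X_test, y_train, y_test = preprocess(demo)
print(X_train.shape, X_test.shape)
```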

Train the Models

Machine Learning Models

python train_ml.py --model random_forest

python train_ml.py --model logistic_regression
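One plausible way a script like `train_ml.py` could map the `--model` values shown above to estimators is a small registry plus `argparse`. This is a hypothetical sketch: the registry, helper names, and hyperparameters are placeholders, not the repository's code or the paper's settings.

```python
import argparse

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical registry keyed by the --model values used above.
MODELS = {
    "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=42),
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
}

def build_model(name):
    """Instantiate a fresh estimator for the given model name."""
    try:
        return MODELS[name]()
    except KeyError:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(MODELS)}") from None

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train a classical ML malware classifier")
    parser.add_argument("--model", choices=sorted(MODELS), required=True)
    return parser.parse_args(argv)

args = parse_args(["--model", "random_forest"])  # e.g. parse_args() for real CLI use
model = build_model(args.model)
print(f"built {type(model).__name__}")
# model.fit(X_train, y_train) would follow, using the preprocessed data.
```

Keeping the registry as factories (lambdas) means each call returns a new, unfitted estimator.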

Deep Learning Models

python train_dl.py --model cnn --epochs 50 --batch_size 32

python train_dl.py --model rnn --epochs 50 --batch_size 32
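The `--epochs` and `--batch_size` options control the mini-batch training loop: each epoch shuffles the data and makes one parameter update per batch, so 1,000 samples with a batch size of 32 give 32 updates per epoch. The NumPy sketch below shows that loop using a simple logistic model as a stand-in for the CNN/RNN; the data and learning rate are illustrative assumptions, and frameworks such as Keras run the same kind of loop internally in `model.fit`.

```python
import math

import numpy as np

def minibatches(n_samples, batch_size, rng):
    """Yield shuffled index batches that cover every sample once (one epoch)."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

def train_logistic(X, y, epochs=50, batch_size=32, lr=0.1, seed=0):
    """Mini-batch SGD on a logistic model; stands in for the DL training loop."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    steps = 0
    for _ in range(epochs):
        for idx in minibatches(len(X), batch_size, rng):
            p = 1.0 / (1.0 + np.exp(-(X[idx] @ w + b)))
            grad = p - y[idx]  # derivative of log-loss w.r.t. the logit
            w -= lr * X[idx].T @ grad / len(idx)
            b -= lr * grad.mean()
            steps += 1
    return w, b, steps

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b, steps = train_logistic(X, y, epochs=50, batch_size=32)
acc = ((X @ w + b > 0) == (y == 1)).mean()
print(steps, math.ceil(1000 / 32) * 50)  # 32 batches per epoch x 50 epochs = 1600 updates
```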

Evaluate the Models

python evaluate.py --model rnn
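The evaluation metrics listed under Features can be computed from true and predicted labels with scikit-learn. This sketch assumes a binary task where label `1` means malware; set `pos_label` to whatever value encodes malware in the actual dataset. The toy labels are made up for illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

def evaluate(y_true, y_pred, pos_label=1):
    """Return accuracy, precision, recall, F1, and the confusion matrix."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=pos_label, zero_division=0
    )
    # Rows are true classes, columns are predicted: [[TN, FP], [FN, TP]] for labels [0, 1].
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": cm,
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here 3 of 4 malware samples are caught and 1 benign sample is flagged, so accuracy, precision, recall, and F1 are all 0.75.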

Predict Malware Samples

python predict.py --model rnn --input_file path/to/sample
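The input format that `predict.py` expects is not specified, so the sketch below makes explicit assumptions: the input file is a CSV of pre-extracted numeric features in the training column order, the saved model is a scikit-learn classifier persisted with `joblib`, and class `1` means malware. It uses a Random Forest for brevity; a Keras RNN would be saved and loaded with Keras' own functions instead. `predict_file` is a hypothetical helper name.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def predict_file(model_path, input_file, threshold=0.5):
    """Load a saved classifier and label each feature row of a CSV file."""
    model = joblib.load(model_path)
    X = pd.read_csv(input_file).to_numpy(dtype=float)
    p_malware = model.predict_proba(X)[:, 1]  # column for class 1 (assumed malware)
    return ["malware" if p >= threshold else "benign" for p in p_malware]

# Demo: train a toy model, save it, and score a two-row sample file.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "rf.joblib")
    sample_path = os.path.join(tmp, "sample.csv")
    joblib.dump(clf, model_path)
    pd.DataFrame([[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0]], columns=["f0", "f1", "f2"]).to_csv(
        sample_path, index=False
    )
    labels = predict_file(model_path, sample_path)
print(labels)
```

Raising `threshold` trades fewer false alarms for more missed malware.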

📂 Directory Structure

malware-classification/
├── MalwareData.csv               # Malware and benign dataset
├── malware_clasification.ipynb   # Jupyter notebook for experiments and analysis
└── README.md                     # Project documentation

Publication

If you use this work in your research, please cite:

Lokesh J. et al., "Malware Classification Using Deep Learning," 2025 IEEE International Conference on Emerging Technologies in Computing and Communication (ETCC). IEEE Xplore: https://ieeexplore.ieee.org/document/11108579

Future Work

- Transformer-based malware classification models
- Explainable AI (XAI) for improved interpretability
- Adversarial training for robustness against evasion attacks
- Federated and privacy-preserving learning approaches
- Real-time malware detection in resource-constrained environments

Contributing

Contributions are welcome! Please open an issue or submit a pull request for improvements, extensions, or optimizations.

