mpjunaid/Malware-Classification_Using_DL

Malware Classification using Deep Learning and Machine Learning

Overview

This repository contains the implementation and experimental framework for the research paper:

Malware Classification Using Deep Learning
Published in the 2025 IEEE International Conference on Emerging Technologies in Computing and Communication (ETCC)

The project explores the effectiveness of machine learning (ML) and deep learning (DL) techniques for malware classification. In the reported experiments, the deep learning models, particularly Recurrent Neural Networks (RNNs), outperform the traditional ML models: the RNN reaches 99.11% accuracy and the CNN 96.7%, compared with roughly 94% for Random Forest and 92% for Logistic Regression. The results point to high accuracy and robustness against evolving malware threats.

📄 IEEE Xplore Link:
https://ieeexplore.ieee.org/document/11108579


Research Motivation

Traditional signature-based malware detection systems struggle with:

  • Zero-day attacks
  • Polymorphic and metamorphic malware
  • Obfuscated malicious binaries

This research addresses these challenges by leveraging deep learning architectures (CNN and RNN) that can learn complex static and behavioral patterns from malware data, as validated in the published IEEE ETCC 2025 paper.


Features

  • Machine Learning Models
    • Random Forest
    • Logistic Regression
    • Support Vector Machine (SVM)
  • Deep Learning Models
    • Convolutional Neural Networks (CNN)
    • Recurrent Neural Networks (RNN)
  • Hybrid Analysis
    • Combines static and behavioral malware features
  • High Performance
    • RNN Accuracy: 99.11%
    • CNN Accuracy: 96.7%
  • Evaluation Metrics
    • Accuracy, Precision, Recall, F1-score
    • Confusion Matrix Analysis
  • Research-Grade Methodology
    • Cross-validation
    • Hyperparameter optimization
    • Feature selection (PCA, RFE)
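The cross-validation step listed above can be illustrated with a small, dependency-free sketch. It only shows how k-fold splitting partitions sample indices; the function name `k_fold_indices`, the fold count, and the seed are assumptions made for illustration and are not taken from the repository. In practice, scikit-learn's `KFold` or `StratifiedKFold` would likely be used instead.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper: every sample lands in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(10, k=5)):
        print(f"fold {fold}: train={len(train)} test={len(test)}")
```

Each model would then be trained on the `train` indices and scored on the `test` indices of every fold, with the scores averaged across folds.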

Key Results

| Model               | Accuracy |
|---------------------|----------|
| Logistic Regression | ~92%     |
| Random Forest       | ~94%     |
| CNN                 | 96.7%    |
| RNN                 | 99.11%   |

Key Insight:
RNN models outperform other approaches due to their ability to capture sequential and behavioral malware patterns, making them highly effective for stream-based and real-time malware classification.
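To make the point about sequential patterns concrete, here is a minimal sketch of the basic recurrent update `h_t = tanh(W_in x_t + W_rec h_{t-1} + b)` in plain Python. It is not the architecture from the paper: the weights, the two-event vocabulary, and the function name `rnn_final_state` are all illustrative assumptions. The example feeds the same two events in different orders and shows that the final hidden state differs, which an order-insensitive count of events could not capture.

```python
import math
from typing import List, Sequence


def rnn_final_state(
    sequence: Sequence[Sequence[float]],
    w_in: Sequence[Sequence[float]],
    w_rec: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Run a simple (Elman-style) recurrent cell and return the last hidden state."""
    hidden = [0.0] * len(bias)
    for x_t in sequence:
        new_hidden = []
        for j in range(len(bias)):
            total = bias[j]
            total += sum(w_in[j][i] * x_t[i] for i in range(len(x_t)))
            total += sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden  # the state carries information to the next step
    return hidden


# Hypothetical fixed weights, chosen only for demonstration.
W_IN = [[0.5, -0.3], [0.2, 0.8]]
W_REC = [[0.1, 0.4], [-0.6, 0.3]]
BIAS = [0.0, 0.0]

# Two one-hot "events" (for example, two behavioral actions) in opposite orders.
SEQ_A = [[1.0, 0.0], [0.0, 1.0]]
SEQ_B = [[0.0, 1.0], [1.0, 0.0]]

if __name__ == "__main__":
    print("order A:", rnn_final_state(SEQ_A, W_IN, W_REC, BIAS))
    print("order B:", rnn_final_state(SEQ_B, W_IN, W_REC, BIAS))
```

A trained model learns the weights from data; frameworks such as Keras or PyTorch provide this recurrence (and gated variants like LSTM and GRU) as built-in layers.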


Prerequisites

Ensure the following dependencies are installed:

  • Python 3.8 or later
  • TensorFlow / Keras (or PyTorch)
  • NumPy
  • Pandas
  • Scikit-learn
  • Matplotlib
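A quick way to confirm the environment before running anything is to check the interpreter version and probe for each package. The sketch below uses only the standard library; the import names (`sklearn` for Scikit-learn, `tensorflow` and `torch` for the two frameworks) are the usual ones, while the helper name `missing_packages` is an assumption for illustration.

```python
import importlib.util
import sys
from typing import Dict, List

# Display name -> import name for the packages listed above.
REQUIRED: Dict[str, str] = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
}
# At least one deep learning framework is expected.
FRAMEWORKS: Dict[str, str] = {
    "TensorFlow / Keras": "tensorflow",
    "PyTorch": "torch",
}


def missing_packages(modules: Dict[str, str]) -> List[str]:
    """Return display names whose import name cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or later is required")
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    if len(missing_packages(FRAMEWORKS)) == len(FRAMEWORKS):
        print("Install TensorFlow / Keras or PyTorch")
```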

Usage

Prepare the Dataset

Place the dataset in the project directory and run preprocessing:

python preprocess.py

Train the Models

Machine Learning Models

python train_ml.py --model random_forest
python train_ml.py --model logistic_regression

Deep Learning Models

python train_dl.py --model cnn --epochs 50 --batch_size 32
python train_dl.py --model rnn --epochs 50 --batch_size 32
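The flags used in the commands above could be parsed along the following lines. This is a hypothetical sketch, not the contents of `train_dl.py`: only the flag names `--model`, `--epochs`, and `--batch_size` come from the commands shown, while the allowed choices, defaults, and help text are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the training commands (assumed layout)."""
    parser = argparse.ArgumentParser(description="Train a deep learning malware classifier")
    parser.add_argument("--model", choices=["cnn", "rnn"], required=True,
                        help="architecture to train")
    parser.add_argument("--epochs", type=int, default=50,
                        help="number of passes over the training set (assumed default)")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="samples per gradient update (assumed default)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"model={args.model} epochs={args.epochs} batch_size={args.batch_size}")
```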

Evaluate the Models

python evaluate.py --model rnn
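For reference, the metrics this project reports (accuracy, precision, recall, F1-score, and the confusion matrix) can be derived from true and predicted labels as sketched below. The sketch assumes a binary setting with 1 for malware and 0 for benign, and the function name `binary_metrics` is illustrative; it does not reproduce the repository's evaluation code. scikit-learn's `classification_report` and `confusion_matrix` offer the same quantities.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix counts and derived metrics (1 = malware, 0 = benign)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1,
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    for name, value in binary_metrics(truth, preds).items():
        print(f"{name}: {value}")
```

Recall matters especially for malware detection, since a false negative means a malicious sample was let through.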

Predict Malware Samples

python predict.py --model rnn --input_file path/to/sample

📂 Directory Structure

malware-classification/
├── MalwareData/                # Malware and benign datasets
├── malware_classification/     # Jupyter notebooks for experiments and analysis
└── README.md                   # Project documentation

Publication

If you use this work in your research, please cite:

Lokesh J. et al., "Malware Classification Using Deep Learning," 2025 IEEE International Conference on Emerging Technologies in Computing and Communication (ETCC). IEEE Xplore: https://ieeexplore.ieee.org/document/11108579


Future Work

  • Transformer-based malware classification models
  • Explainable AI (XAI) for improved interpretability
  • Adversarial training for robustness against evasion attacks
  • Federated and privacy-preserving learning approaches
  • Real-time malware detection in resource-constrained environments

Contributing

Contributions are welcome! Please open an issue or submit a pull request for improvements, extensions, or optimizations.

