This repository contains the implementation and experimental framework for the research paper:
Malware Classification Using Deep Learning
Published in the 2025 IEEE International Conference on Emerging Technologies in Computing and Communication (ETCC)
The project explores the effectiveness of machine learning (ML) and deep learning (DL) techniques for malware classification. In the reported experiments, deep learning models, particularly Recurrent Neural Networks (RNNs), outperform traditional ML models: the RNN reaches 99.11% accuracy, compared with roughly 94% for Random Forest and 92% for Logistic Regression. The results also indicate robustness against evolving malware threats.
📄 IEEE Xplore Link:
https://ieeexplore.ieee.org/document/11108579
Traditional signature-based malware detection systems struggle with:
- Zero-day attacks
- Polymorphic and metamorphic malware
- Obfuscated malicious binaries
This research addresses these challenges by leveraging deep learning architectures (CNN and RNN) that can learn complex static and behavioral patterns from malware data, as validated in the published IEEE ETCC 2025 paper.
- Machine Learning Models
  - Random Forest
  - Logistic Regression
  - Support Vector Machine (SVM)
- Deep Learning Models
  - Convolutional Neural Networks (CNN)
  - Recurrent Neural Networks (RNN)
- Hybrid Analysis
  - Combines static and behavioral malware features
- High Performance
  - RNN Accuracy: 99.11%
  - CNN Accuracy: 96.7%
- Evaluation Metrics
  - Accuracy, Precision, Recall, F1-score
  - Confusion Matrix Analysis
- Research-Grade Methodology
  - Cross-validation
  - Hyperparameter optimization
  - Feature selection (PCA, RFE)
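
The evaluation metrics listed above all derive from the confusion matrix. The following is a minimal, standard-library sketch of that relationship for the binary case (malware vs. benign). The function name `binary_metrics` and the sample labels are illustrative assumptions, not code or data from this repository.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Count confusion-matrix cells and derive the four metrics from them."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # malware correctly flagged
            else:
                fp += 1  # benign sample flagged as malware
        else:
            if truth == positive:
                fn += 1  # malware that slipped through
            else:
                tn += 1  # benign sample correctly passed

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

if __name__ == "__main__":
    for name, value in binary_metrics(y_true, y_pred).items():
        print(name, value)
```

In practice, Scikit-learn (already a dependency of this project) provides equivalent functions; the hand-written version is shown only to make the definitions explicit.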
| Model | Accuracy |
|---|---|
| Logistic Regression | ~92% |
| Random Forest | ~94% |
| CNN | 96.7% |
| RNN | 99.11% |
Key Insight:
RNN models outperform other approaches due to their ability to capture sequential and behavioral malware patterns, making them highly effective for stream-based and real-time malware classification.
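
To illustrate why sequence order matters, here is a toy sketch rather than the model from the paper: a single-unit recurrence with fixed, untrained weights, compared against an order-insensitive count of calls. The API-call traces, the `EMBED` values, and the weights `w_in` and `w_rec` are all hypothetical and chosen only for demonstration.

```python
import math
from collections import Counter

# Hypothetical behavioral traces: the same calls, in a different order.
trace_a = ["open", "read", "encrypt", "write"]
trace_b = ["open", "write", "read", "encrypt"]

# Toy scalar encoding of each call (illustrative values, not learned).
EMBED = {"open": 0.1, "read": 0.4, "encrypt": 0.9, "write": 0.6}


def bag_of_calls(trace):
    """Order-insensitive features: how often each call occurs."""
    return Counter(trace)


def rnn_final_state(trace, w_in=1.5, w_rec=0.8):
    """One-unit recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    for call in trace:
        h = math.tanh(w_in * EMBED[call] + w_rec * h)
    return h


if __name__ == "__main__":
    # The count-based view cannot tell the two traces apart...
    print(bag_of_calls(trace_a) == bag_of_calls(trace_b))
    # ...while the recurrent state depends on the order of the calls.
    print(rnn_final_state(trace_a), rnn_final_state(trace_b))
```

A trained RNN learns its weights from data and uses vector-valued states, but the same property holds: the hidden state after the last step encodes the order in which events occurred.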
Ensure the following dependencies are installed:
- Python 3.8 or later
- TensorFlow / Keras (or PyTorch)
- NumPy
- Pandas
- Scikit-learn
- Matplotlib
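
As a quick sanity check before running anything, a small script along these lines can report which of the dependencies above are missing. This is a sketch, not a script shipped with the repository; it assumes the usual import names (`sklearn` for Scikit-learn, `torch` for PyTorch) and treats either TensorFlow or PyTorch as satisfying the deep learning requirement.

```python
import importlib.util
import sys

# Import names for the packages listed above (assumed, standard names).
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]
# Either backend is accepted, mirroring "TensorFlow / Keras (or PyTorch)".
DL_BACKENDS = ["tensorflow", "torch"]


def is_installed(module_name):
    """Return True if the module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or later is required")
    for name in REQUIRED:
        if not is_installed(name):
            problems.append(f"missing package: {name}")
    if not any(is_installed(name) for name in DL_BACKENDS):
        problems.append("missing deep learning backend: tensorflow or torch")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "All dependencies found.")
```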
Place the dataset in the project directory and run preprocessing:

    python preprocess.py

Train the machine learning models:

    python train_ml.py --model random_forest
    python train_ml.py --model logistic_regression

Train the deep learning models:

    python train_dl.py --model cnn --epochs 50 --batch_size 32
    python train_dl.py --model rnn --epochs 50 --batch_size 32

Evaluate a trained model:

    python evaluate.py --model rnn

Classify a new sample:

    python predict.py --model rnn --input_file path/to/sample

Repository layout:

    malware-classification/
    ├── MalwareData/                # Malware and benign datasets
    ├── malware_classification/     # Jupyter notebooks for experiments and analysis
    └── README.md                   # Project documentation

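
The preprocessing, training, and evaluation commands above can be chained by a small driver. The sketch below copies the script names and flags from those commands; it assumes the scripts are present in the working directory, and the `--run` switch is a hypothetical option of this driver, not of the project. Without `--run` it only prints the commands. The prediction step is left out because its input path is a placeholder.

```python
import subprocess
import sys

# Script names and flags are taken from the commands listed above; that the
# scripts live in the current working directory is an assumption.
PIPELINE = [
    ["preprocess.py"],
    ["train_ml.py", "--model", "random_forest"],
    ["train_ml.py", "--model", "logistic_regression"],
    ["train_dl.py", "--model", "cnn", "--epochs", "50", "--batch_size", "32"],
    ["train_dl.py", "--model", "rnn", "--epochs", "50", "--batch_size", "32"],
    ["evaluate.py", "--model", "rnn"],
]


def build_commands(python=sys.executable):
    """Prefix each step with the interpreter to get full argument lists."""
    return [[python, *step] for step in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical "--run" switch: omit it to preview the commands only.
    run_pipeline(dry_run="--run" not in sys.argv)
```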
If you use this work in your research, please cite:
Lokesh J. et al., "Malware Classification Using Deep Learning," 2025 IEEE International Conference on Emerging Technologies in Computing and Communication (ETCC). IEEE Xplore: https://ieeexplore.ieee.org/document/11108579
Directions for future work:
- Transformer-based malware classification models
- Explainable AI (XAI) for improved interpretability
- Adversarial training for robustness against evasion attacks
- Federated and privacy-preserving learning approaches
- Real-time malware detection in resource-constrained environments
Contributions are welcome! Please open an issue or submit a pull request for improvements, extensions, or optimizations.