MalwareClassificationML

There are 7 families of malware used for training and prediction in this machine learning problem. Each is assigned a numerical value from 0 to 6 as follows:

(BHO, CeeInject, FakeRean, OnLineGames, Renos, Vobfus, Winwebsec) = (0, 1, 2, 3, 4, 5, 6)
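The mapping above can be written as a small lookup table. This is a minimal sketch; the names `FAMILY_TO_LABEL`, `LABEL_TO_FAMILY` and `family_name` are illustrative and do not come from the notebooks.

```python
# Label encoding taken from the mapping stated above.
# The variable and function names here are hypothetical.
FAMILY_TO_LABEL = {
    "BHO": 0,
    "CeeInject": 1,
    "FakeRean": 2,
    "OnLineGames": 3,
    "Renos": 4,
    "Vobfus": 5,
    "Winwebsec": 6,
}

# Inverse mapping, useful for turning predicted labels back into names.
LABEL_TO_FAMILY = {label: family for family, label in FAMILY_TO_LABEL.items()}


def family_name(label: int) -> str:
    """Return the malware family name for a numeric label (0 to 6)."""
    return LABEL_TO_FAMILY[label]
```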

Each feature vector is a 328-byte vector extracted from a malware sample file. The first 64 bytes of the file are taken, and then 264 bytes are appended, starting at the offset stored in the 60th byte. This gives a 328-byte feature vector per sample. There are 6300 such labelled samples.
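The extraction described above might look roughly like the sketch below. Several details are assumptions rather than facts from this repository: the "60th byte" is read as 0-based position 60, the offset is decoded as a 4-byte little-endian integer starting there, and short files are zero-padded to 328 bytes. The function name `extract_feature_vector` is hypothetical.

```python
import struct

HEADER_LEN = 64   # leading bytes taken from the file
TAIL_LEN = 264    # bytes appended from the stored offset
OFFSET_POS = 60   # assumption: 0-based position of the stored offset


def extract_feature_vector(data: bytes) -> list[int]:
    """Build a 328-value feature vector from the raw bytes of one sample."""
    head = data[:HEADER_LEN]

    # Assumption: the offset is a 4-byte little-endian integer at OFFSET_POS.
    if len(data) >= OFFSET_POS + 4:
        (offset,) = struct.unpack_from("<I", data, OFFSET_POS)
    else:
        offset = 0  # assumption: fall back to the start of a too-short file

    tail = data[offset:offset + TAIL_LEN]

    # Assumption: pad with zero bytes when the file is too short.
    vector = (head + tail).ljust(HEADER_LEN + TAIL_LEN, b"\x00")
    return list(vector)
```

Calling `extract_feature_vector(open(path, "rb").read())` for each sample would then give one row of a feature table, where `path` stands for a sample file.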

For the word2vec data, the byte feature vectors are used to compute embedding vectors of length N = 2 with window size W = 7. Each word2vec feature vector has length 512.
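One way a length of 512 could arise is by concatenating a length-2 embedding for each of the 256 possible byte values (256 × 2 = 512). This README does not confirm that layout, so the sketch below is an assumption. It only shows the flattening step, not the word2vec training itself, and the name `flatten_embeddings` and the zero-fill for unseen bytes are hypothetical choices.

```python
N_DIMS = 2          # embedding length N stated above
N_BYTE_VALUES = 256  # number of distinct byte values


def flatten_embeddings(embeddings: dict[int, tuple[float, ...]]) -> list[float]:
    """Concatenate per-byte embeddings, in byte-value order, into one vector.

    `embeddings` maps a byte value (0 to 255) to its N_DIMS-long embedding.
    Assumption: byte values missing from the sample get an all-zero embedding.
    """
    flat: list[float] = []
    for byte_value in range(N_BYTE_VALUES):
        vector = embeddings.get(byte_value, (0.0,) * N_DIMS)
        if len(vector) != N_DIMS:
            raise ValueError(f"embedding for byte {byte_value} has wrong length")
        flat.extend(vector)
    return flat
```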

Model_Predictions_and_Actual.xlsx compares the models: it lists each model's predictions for the 700 unlabelled samples alongside the actual labels. The Random Forest classifier seems to work best for this multi-class problem, with an accuracy of 97.14%.
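The comparison above comes down to counting how often a model's predicted label matches the actual label. A minimal sketch of that calculation follows. The function name `accuracy` is illustrative, and the toy labels in the usage lines are made up for demonstration, not taken from the spreadsheet.

```python
def accuracy(predicted: list[int], actual: list[int]) -> float:
    """Return the fraction of samples whose predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy example with invented labels: 3 of 4 predictions match.
print(accuracy([0, 1, 2, 3], [0, 1, 2, 4]))  # prints 0.75
```

As a check on the arithmetic only, 680 matching predictions out of 700 would correspond to 97.14%. The README itself does not state the raw counts.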

Files in this repository:

MalwareClfCNN.ipynb
MalwareClfLSTM.ipynb
MalwareClfMLP.ipynb
MalwareClfOneVsRest.ipynb
MalwareClfRF.ipynb
Model_Predictions_and_Actual.xlsx
README.md
labelledW2Vec.csv
labelledfeatures.csv
unlabelledW2Vec.csv
unlabelledfeatures.csv

botdotcom/MalwareClassificationML

Folders and files

Latest commit

History

Repository files navigation

MalwareClassificationML

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages