Microsoft and Intel have teamed up to explore new ways of detecting malware. The team has published a report on how it applies deep transfer learning, a branch of deep learning (itself a subfield of machine learning), to the problem.
So far, the joint research has reported a high success rate in identifying malware by first converting binaries into images and then classifying those images. The team says the project lays the foundation for further exploration.
A New Approach From Deep Learning
Machine learning is now widely used for many purposes, and one interesting application is malware detection. Researchers from Intel Labs and the Microsoft Threat Intelligence Team have collaborated to find more ways of detecting malware before it ever runs on the target machine. Their work builds on deep learning techniques such as deep transfer learning borrowed from computer vision.
The initial research combined Intel’s work on deep transfer learning for static malware classification with a real-world dataset from Microsoft. The result is a new approach called STAMINA (Static Malware as Image Network Analysis). It turns binaries (suspicious executables) into grayscale images and studies their textural and structural patterns to classify each one as either malicious or safe.
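To make the binary-to-image idea concrete, here is a minimal sketch in Python using only the standard library. It treats each byte (0 to 255) as one pixel intensity and wraps the stream into rows. The function name `bytes_to_grayscale`, the fixed `width` parameter, and the zero padding of the last row are assumptions made for illustration; they are not taken from the STAMINA report.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte to a pixel intensity and wrap rows at `width` pixels.

    Hypothetical helper: the row width and the zero padding are
    illustrative choices, not the researchers' documented settings.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])   # bytes -> ints in 0..255
        row.extend([0] * (width - len(row)))    # pad a short final row
        rows.append(row)
    return rows


# Ten sample bytes wrapped into rows of four pixels.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
for row in image:
    print(row)
```

In practice the bytes would come from reading an executable in binary mode, and the resulting grid could be saved or viewed as a grayscale picture.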
Static analysis plays a critical role here. It extracts metadata from a file without running it, and machine learning classifiers on the client and in the cloud then examine that metadata to decide whether the file is safe. Dynamic and behavioral analysis can be layered on top of it for more comprehensive coverage.
As the researchers describe it, the STAMINA approach converts a binary into an image, reshapes it into two dimensions, and resizes it. The image is then fed into a deep transfer learning model, which draws on what it has learned from earlier, related cases to classify it. The approach was tested on a dataset of 2.2 million PE file hashes provided by Microsoft, and the results were encouraging.
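The resizing step exists because image models generally expect inputs of one fixed size, while binaries vary in length. The sketch below shows one simple way to do it, nearest-neighbour sampling, in plain Python. The function name `resize_nearest` and the target dimensions are hypothetical, and nearest-neighbour is swapped in here for illustration only; the resizing method the researchers actually used is not stated above.

```python
def resize_nearest(image: list[list[int]], out_h: int, out_w: int) -> list[list[int]]:
    """Resize a 2D grayscale grid with nearest-neighbour sampling.

    Illustrative only: each output pixel copies the source pixel at the
    proportionally scaled row and column.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    if out_h <= 0 or out_w <= 0:
        raise ValueError("output size must be positive")
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Shrink a 4x4 grid to 2x2.
grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
small = resize_nearest(grid, 2, 2)
print(small)  # [[0, 2], [8, 10]]
```

A real pipeline would more likely rely on an imaging library with smoother interpolation, since nearest-neighbour sampling discards detail when shrinking large images.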
The researchers reported that, on the real-world dataset, STAMINA achieved a recall of 87.05% at a 0.1% false-positive rate, and a recall of 99.66% with 99.07% accuracy at a 2.58% false-positive rate. These results encouraged the team to keep collaborating on more sophisticated ways of detecting malware. Learn more about the research here.
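For readers unfamiliar with these metrics, the short Python sketch below shows how recall, false-positive rate, and accuracy are derived from a classifier's confusion counts. The function name `detection_metrics` and the counts in the example are invented for illustration; they are not figures from the study.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Compute standard detection metrics from confusion counts.

    tp: malware correctly flagged      fn: malware missed
    fp: clean files wrongly flagged    tn: clean files correctly passed
    """
    total = tp + fn + fp + tn
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one malicious and one clean sample")
    return {
        "recall": tp / (tp + fn),               # share of malware caught
        "false_positive_rate": fp / (fp + tn),  # share of clean files flagged
        "accuracy": (tp + tn) / total,          # share of all files judged correctly
    }


# Made-up counts for a toy test set of 200 files.
metrics = detection_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Recall and false-positive rate pull against each other: a classifier tuned to flag fewer clean files will usually miss more malware, which is why the report quotes recall at two different false-positive rates.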