Machine learning classification for advanced malware detection

Di Troia, Fabio (2020) Machine learning classification for advanced malware detection. (PhD thesis), Kingston University.

Official URL: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.etho...

Abstract

This introductory document discusses topics related to malware detection through the application of machine learning algorithms. It is intended as a supplement to the submitted published work (a complete list of which can be found in Table 1) and outlines the motivation behind the experiments. The document begins with the following sections:
• Section 2 presents a preliminary discussion of the research methodology employed.
• Section 3 presents the background to malware detection in general and to the use of machine learning.
• Section 4 provides a brief introduction to the most common machine learning algorithms in current use.
The remaining sections present the main body of the experimental work, which leads to the conclusions in Section 10.
• Section 5 analyzes different initialization strategies for machine learning models, with a view to ensuring that the most effective training and testing strategy is employed. Following this, a purely dynamic approach is proposed, which achieves perfect classification of the malware samples against benign files and therefore provides a baseline against which the performance of subsequent static approaches can be compared.
• Section 6 introduces the static-based tests, beginning with the challenging problem of zero-day detection, i.e., malware samples for which not enough data has yet been gathered to train the machine learning models.
• Section 7 describes the testing of several different approaches to static malware detection. During these tests, the effectiveness of these algorithms is analyzed and compared with other means of classification.
• Section 8 proposes and compares techniques to boost detection accuracy by combining the scores obtained from other detection algorithms, with the aim of improving static classification scores and thus reaching the perfect detection obtained with dynamic features.
• Section 9 assesses the detection effectiveness of a generic malware model trained on several different families. These experiments introduce a more realistic scenario in which a single, comprehensive machine learning model is used to detect several families, and they show how difficult it is to build a single model that detects several malware families.
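The idea behind Section 8, combining the scores of several detectors into a single decision, can be illustrated with a minimal sketch. The abstract does not state which combination rule the thesis uses, so the code below assumes one of the most basic score-fusion schemes: min-max normalize each detector's scores, take a weighted average, and apply a threshold. The detector names (`hmm`, `svm`), the weights, and the 0.5 threshold are hypothetical; training a classifier such as an SVM on the vector of per-detector scores is a common, more flexible alternative.

```python
from typing import Dict, List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one detector's scores to [0, 1] so different detectors are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: this detector carries no ranking information.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(detector_scores: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of normalized scores, one fused score per sample.

    Assumes every detector scored the same samples in the same order and
    that all weights are positive (hypothetical setup for illustration).
    """
    n = len(next(iter(detector_scores.values())))
    total_weight = sum(weights[name] for name in detector_scores)
    fused = [0.0] * n
    for name, scores in detector_scores.items():
        if len(scores) != n:
            raise ValueError(f"detector {name!r} scored {len(scores)} samples, expected {n}")
        for i, s in enumerate(min_max_normalize(scores)):
            fused[i] += weights[name] * s / total_weight
    return fused


def classify(fused: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label a sample as malware (True) when its fused score reaches the threshold."""
    return [s >= threshold for s in fused]


if __name__ == "__main__":
    # Hypothetical scores for four samples: HMM log-likelihoods and SVM outputs.
    scores = {"hmm": [-3.2, -1.0, -0.4, -2.8], "svm": [0.1, 0.9, 0.7, 0.2]}
    weights = {"hmm": 1.0, "svm": 1.0}
    print(classify(fuse_scores(scores, weights)))
```

Normalizing first matters because raw scores from different detectors (for example, negative log-likelihoods versus margin-like outputs) live on different scales, so an unnormalized average would be dominated by whichever detector has the widest range.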

Item Type:	Thesis (PhD)
Physical Location:	Online only.
Uncontrolled Keywords:	machine learning; malware detection; clustering; hidden Markov models; support vector machines; dynamic analysis; static analysis
Research Area:	Research Areas > Computer science and informatics
Faculty, School or Research Centre:	Faculty of Science, Engineering and Computing > School of Computer Science and Mathematics
Date Deposited:	17 Feb 2021 18:34
Last Modified:	04 Jul 2023 11:43
URI:	https://eprints.kingston.ac.uk/id/eprint/48022