To handle the large number of malware samples appearing in the wild each day, security analysts and vendors employ automated tools to detect, classify and analyze malicious code. Because malware is typically resistant to static analysis, automated dynamic analysis is widely used for this purpose. Executing malicious software in a controlled environment while observing its behavior can provide rich information on a malware’s capabilities. However, running each malware sample even for a few minutes is expensive. For this reason, malware analysis efforts need to select a subset of samples for analysis. To date, this selection has been performed either randomly or using techniques focused on avoiding re-analysis of polymorphic malware variants. In this paper, we present a novel approach to sample selection that attempts to maximize the total value of the information obtained from analysis, according to an application-dependent scoring function. To this end, we leverage previous work on behavioral malware clustering and introduce a machine-learning-based system that uses all statically available information to predict into which behavioral class a sample will fall, before the sample is actually executed. We discuss scoring functions tailored to two practical applications of large-scale dynamic analysis: the compilation of network blacklists of command and control servers and the generation of remediation procedures for malware infections. We implement these techniques in a tool called FORECAST. Large-scale evaluation on over 600,000 malware samples shows that our prototype can increase the number of potential command and control servers detected by up to 137% over a random selection strategy and 54% over a selection strategy based on sample diversity.
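The selection idea described above can be illustrated with a minimal sketch. It is not the FORECAST implementation. It assumes that a classifier has already produced, for each sample, a probability for each behavioral class, and that the application assigns a value to each class. The abstract does not say how the scores are computed or combined, so the expected-value ranking, the function names (`expected_score`, `select_samples`), the class labels, and all numbers below are hypothetical.

```python
from typing import Dict, List


def expected_score(class_probs: Dict[str, float],
                   class_value: Dict[str, float]) -> float:
    """Expected value of analyzing one sample (assumed scoring rule).

    class_probs: predicted probability of each behavioral class.
    class_value: application-dependent value of observing each class.
    Classes without an assigned value count as 0.
    """
    return sum(p * class_value.get(c, 0.0) for c, p in class_probs.items())


def select_samples(predictions: Dict[str, Dict[str, float]],
                   class_value: Dict[str, float],
                   budget: int) -> List[str]:
    """Pick the `budget` samples with the highest expected score."""
    ranked = sorted(
        predictions,
        key=lambda sample: expected_score(predictions[sample], class_value),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical predictions for three samples (illustrative numbers only).
predictions = {
    "sample_a": {"c2_contact": 0.9, "other": 0.1},
    "sample_b": {"c2_contact": 0.2, "other": 0.8},
    "sample_c": {"c2_contact": 0.5, "other": 0.5},
}

# Hypothetical values for a command-and-control blacklist application.
class_value = {"c2_contact": 10.0, "other": 1.0}

selected = select_samples(predictions, class_value, budget=2)
print(selected)
```

With a budget of two executions, this ranking prefers the samples most likely to fall into the class the application values. A random strategy would ignore the predictions, and a diversity-based strategy would ignore the class values.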
@inproceedings{Neugschwandtner2011ForeCast_-,
  title     = {{ForeCast - Skimming off the Malware Cream}},
  author    = {Neugschwandtner, Matthias and Milani Comparetti, Paolo and Jacob, Gregoire and Kruegel, Christopher},
  booktitle = {Proceedings of the 27th Annual Computer Security Applications Conference},
  series    = {ACSAC},
  month     = {December},
  year      = {2011}
}