The high performance of machine learning in other domains has stimulated significant interest in applying it to network security. However, as has been observed in prior work, despite the breakneck pace of major successes with machine learning in many other domains, and the large amount of effort spent to produce machine learning-based intrusion detection systems, in practice most major network defense providers continue to rely on signature-based methods that have been in active use since the late 1990s.
Drawing on the extensive literature on grammatical inference, we propose that this reflects a fundamental difference between more conventional machine learning domains and network security. In particular, network security – especially applications that focus on analysis of packet contents – operates on the domain of formal grammars that are rigorously interpreted, in contrast to domains such as natural language translation, where human intuition can often “fill in the gaps.” It is therefore an intrinsically difficult problem that a) is demonstrably intractable in the most general case, and b) cannot be addressed with the relatively crude features that appear to be most common in the literature. While some modest success has recently been realized in applying sequence-to-sequence models (thus at least partially sidestepping the question of feature spaces) to grammatical inference on specific instances of specific protocols, there remains no method to demonstrate that such approaches will generalize even to different instances of the same protocol, let alone to novel protocols in the same class.
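The “rigorous interpretation” point can be made concrete: a protocol implementation either accepts an input exactly or rejects it outright, with no analogue of an approximate, intuition-assisted parse. The sketch below illustrates this for a hypothetical length-prefixed record format (the format and function name are illustrative, not drawn from any cited system); note that the length field itself is exactly the kind of construct that pushes real protocols beyond simple regular structure.

```python
import struct

def parse_records(buf: bytes):
    """Strictly parse a hypothetical length-prefixed record format:
    each record is a 2-byte big-endian length followed by that many
    payload bytes. A rigorous interpreter rejects any deviation
    outright -- there is no 'close enough' parse."""
    records, i = [], 0
    while i < len(buf):
        if i + 2 > len(buf):
            raise ValueError("truncated length field")
        (n,) = struct.unpack_from(">H", buf, i)  # declared payload length
        i += 2
        if i + n > len(buf):
            raise ValueError("declared length exceeds buffer")
        records.append(buf[i:i + n])
        i += n
    return records

print(parse_records(b"\x00\x03abc\x00\x01x"))  # [b'abc', b'x']
```

A single-byte deviation (e.g., a length field claiming more payload than is present) yields a hard rejection rather than a degraded interpretation, which is precisely why crude statistical features struggle to capture the accept/reject boundary.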
In fact, results from grammatical inference show that there is quite likely no general method that can be applied to arbitrary data to separate benign from malicious traffic; any practical method should therefore be restricted to a particular domain, analyze that domain carefully, and at least attempt to establish which properties of the protocol under analysis may allow it to be effectively learned. The empirical effectiveness of Snort and Bro signatures suggests that the domain of malicious traffic is likely more tractable, and may be easier to learn. The appearance of particular byte sequences in malicious but not benign traffic can be viewed (informally) as evidence that the class of malicious languages has finite elasticity (due to the absence of a limit language) within the class of all protocols that can produce accepting inputs to the system under consideration, thus supporting identifiability.

Feature representation is also important. N-gram features in particular will often be insufficiently powerful to model complex grammars or protocols; in some cases, sufficiently large values of n may overcome this limitation for specific subclasses of protocols, but this is likely to be highly problem-specific and requires careful evaluation for any given proposed system.
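The limitation of n-gram features can be illustrated directly. In the following toy construction (ours, not drawn from any cited system), two strings over a parenthesis alphabet have identical bigram profiles, yet one belongs to the Dyck language of balanced parentheses – a context-free language standing in for nested protocol structure – and the other does not. Any classifier operating on bigram features alone must therefore mislabel one of them.

```python
from collections import Counter

def ngram_profile(s, n):
    """Multiset of length-n substrings: the feature map used by n-gram detectors."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def balanced(s):
    """Membership in the Dyck language of balanced parentheses (context-free)."""
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

s1, s2 = "(()())", "())(()"
print(ngram_profile(s1, 2) == ngram_profile(s2, 2))  # True: identical bigram features
print(balanced(s1), balanced(s2))                    # True False: opposite labels
```

Larger n only pushes the problem out: for any fixed n, analogous colliding pairs exist at sufficient nesting depth, which is why the adequacy of n-gram features must be argued per protocol subclass rather than assumed.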
While significant open questions remain – such as methods for performing inference on the restricted classes of grammars that in practical terms make up many existing protocols – the immediate results of applying grammatical inference theory to machine learning for intrusion detection both help explain the lack of widespread adoption of such systems, and suggest appropriate avenues for future work.
- R. Sommer and V. Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” in IEEE Symposium on Security and Privacy, 2010.
- V. Paxson, “Bro: a system for detecting network intruders in real-time,” Computer Networks, vol. 31, no. 23, pp. 2435-2463, 1999.
- M. Roesch, “Snort: Lightweight Intrusion Detection for Networks,” in LISA, Seattle, Washington, 1999.
- S. Axelsson, “The base-rate fallacy and the difficulty of intrusion detection,” ACM Transactions on Information and System Security (TISSEC), vol. 3, no. 3, pp. 186-205, 2000.
- A. Kott, “Towards fundamental science of cyber security,” in Network Science and Cybersecurity, New York, Springer, 2014, pp. 1-13.
- M. Sipser, Introduction to the Theory of Computation, Boston, MA: Thomson Course Technology, 2006.
- C. De la Higuera, Grammatical inference: learning automata and grammars, Cambridge University Press, 2010.
- M. E. Gold, “Language identification in the limit,” Information and Control, vol. 10, no. 5, pp. 447-474, 1967.
- L. G. Valiant, “A theory of the learnable,” Commun. ACM, vol. 27, no. 11, pp. 1134–1142, 1984.
- M. E. Gold, “Complexity of automaton identification from given data,” Information and Control, vol. 37, no. 3, pp. 302-320, 1978.
- M. Kearns and L. Valiant, “Cryptographic limitations on learning Boolean formulae and finite automata,” Journal of the ACM, vol. 41, no. 1, pp. 67-95, 1994.
- D. Angluin, “Learning regular sets from queries and counterexamples,” Information and Computation, vol. 75, no. 2, pp. 87-106, 1987.
- R. Parekh and V. Honavar, “Learning DFA from simple examples,” Machine Learning, vol. 44, pp. 9-35, 2001.
- D. Angluin, “Finding patterns common to a set of strings,” in Proceedings of the eleventh annual ACM symposium on Theory of computing, 1979.
- D. Angluin, “Inductive inference of formal languages from positive data,” Information and Control, vol. 45, no. 2, pp. 117-135, 1980.
- K. N. Wood and R. E. Harang, “Grammatical Inference and Language Frameworks for LANGSEC,” in 2015 IEEE Security and Privacy Workshops, 2015.
- T. Motoki, T. Shinohara and K. Wright, “Correct definition of finite elasticity,” Research Institute of Fundamental Information Science, Kyushu University Japan, 1990.
- C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin and W.-Y. Lin, “Intrusion detection by machine learning: A review,” Expert Systems with Applications, vol. 36, no. 10, pp. 11994–12000, 2009.
- P. Laskov, P. Düssel, C. Schäfer and K. Rieck, “Learning intrusion detection: supervised or unsupervised?,” in International Conference on Image Analysis and Processing, Heidelberg, 2005.
- S. Axelsson, “Intrusion detection systems: A survey and taxonomy,” Technical Report, Chalmers University of Technology, 2000.
- K. Wang, J. J. Parekh and S. J. Stolfo, “Anagram: A content anomaly detector resistant to mimicry attack,” in Recent Advances in Intrusion Detection, Heidelberg, 2006.
- R. Perdisci, D. Ariu, P. Fogla, G. Giacinto and W. Lee, “McPAD: A multiple classifier system for accurate payload-based anomaly detection,” Computer Networks, vol. 53, no. 6, pp. 864–881, 2009.
- W. Hu, W. Hu and S. Maybank, “Adaboost-based algorithm for network intrusion detection,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 2, pp. 577–583, 2008.
- P. Sangkatsanee, N. Wattanapongsakorn and C. Charnsripinyo, “Practical real-time intrusion detection using machine learning approaches,” Computer Communications, vol. 34, no. 18, pp. 2227–2235, 2011.
- O. Linda, T. Vollmer and M. Manic, “Neural network based intrusion detection system for critical infrastructures,” in International Joint Conference on Neural Networks, 2009.
- N. Abe, B. Zadrozny and J. Langford, “Outlier detection by active learning,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006.
- H. Xiao, J. Sun, Y. Liu, S.-W. Lin and C. Sun, “Tzuyu: Learning stateful typestates,” in 28th International Conference on Automated Software Engineering, 2013.
- M. Sutton, A. Greene and P. Amini, Fuzzing: brute force vulnerability discovery, Pearson Education, 2007.
- M. Zalewski, “american fuzzy lop,” [Online]. Available: http://lcamtuf.coredump.cx/afl/.
- D. Aitel, “The advantages of block-based protocol analysis for security testing,” Immunity Inc, 2002.
- S. Stolfo, “The Third International Knowledge Discovery and Data Mining,” University of California Irvine, 2002. [Online]. Available: .
- C. Sheridan and R. Harang, “Grammatical Inference and Machine Learning Approaches to Post-Hoc LangSec,” in IEEE Security and Privacy Workshop on Language-Theoretic Security, 2016.