Making WAF ML models go brrr: saving decades of processing time

WAF Attack Score is Cloudflare's machine learning (ML) layer built on top of its Web Application Firewall (WAF). It complements the WAF by detecting attack bypasses that have not been seen before, including exploits of zero-day vulnerabilities, which improves customer protection against emerging and unknown threats. The feature's popularity drove the need for performance improvements.

The WAF ML model was optimized in several steps. First, pre-processing replaced hash map lookups with an Aho-Corasick DFA, which reduced latency by 47.84% (reported as 1.64x). A match-based approach then reduced latency by 53.45% (2.15x). Next, a new letters replacement implementation doubled the pre-processing speed, making it four to five times faster than the original version. After that, a branchless ngram lookup reduced latency by 91.33% (11.54x). It works by precomputing offset lookup tables, so that comparison operations become direct memory accesses, which are much faster and involve no branching. Finally, model inference was optimized by enabling SIMD matrix multiplication and using XNNPack, which reduced latency by 77.17% (4.38x). In addition, an LRU cache skips redundant executions and reduces latency further.

Overall, the optimizations reduced WAF ML execution time by about 81.90%, from 1519 to 275 microseconds, which is 5.5x faster.
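The lookup-table idea behind the branchless ngram step can be sketched as follows. This is a minimal illustration, not Cloudflare's code: the alphabet, the table names (`CLASS_OF`, `ROW_OFFSET`) and the restriction to bigrams are all assumptions made for the example. Python also cannot show the branch-elimination speedup itself; the sketch only shows how precomputed tables turn "which ngram is this?" into plain indexing.

```python
# Hypothetical sketch: the alphabet and all names here are assumptions for illustration.
ALPHABET = b"abcdefghijklmnopqrstuvwxyz0123456789<>=/'\" "
K = len(ALPHABET) + 1      # one extra class that absorbs every other byte
OTHER = K - 1

# Precomputed once: byte value -> class index (256 entries, so any byte is a valid index).
CLASS_OF = [OTHER] * 256
for i, b in enumerate(ALPHABET):
    CLASS_OF[b] = i

# Precomputed once: byte value -> offset contributed by the first byte of a bigram.
ROW_OFFSET = [CLASS_OF[b] * K for b in range(256)]


def bigram_index(pair: bytes) -> int:
    """Position of a two-byte ngram in the flat feature vector: two table reads and an add."""
    return ROW_OFFSET[pair[0]] + CLASS_OF[pair[1]]


def bigram_counts(data: bytes) -> list[int]:
    """Count every adjacent byte pair without comparing against known ngrams."""
    counts = [0] * (K * K)
    for first, second in zip(data, data[1:]):
        counts[ROW_OFFSET[first] + CLASS_OF[second]] += 1
    return counts
```

For example, `bigram_counts(b"<script>")[bigram_index(b"<s")]` reads the count for the `<s` bigram directly from its precomputed slot.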
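The percentage and multiplier figures in the summary are two views of the same measurement. A short sketch of the conversion, using only the numbers quoted above (the function names are made up for the example):

```python
def reduction_pct(before_us: float, after_us: float) -> float:
    """Latency reduction as a percentage of the original time."""
    return (1 - after_us / before_us) * 100


def speedup(before_us: float, after_us: float) -> float:
    """How many times faster the new version is."""
    return before_us / after_us


def speedup_from_reduction(pct: float) -> float:
    """Convert a percentage reduction into a speedup multiplier."""
    return 1 / (1 - pct / 100)


print(round(reduction_pct(1519, 275), 2))       # 81.9
print(round(speedup(1519, 275), 1))             # 5.5
print(round(speedup_from_reduction(77.17), 2))  # 4.38
```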

blog.cloudflare.com


2024-07-25
