The Cloudflare Blog: A good day to trie-hard: saving compute 1% at a time

Source URL: https://blog.cloudflare.com/pingora-saving-compute-1-percent-at-a-time
Source: The Cloudflare Blog
Title: A good day to trie-hard: saving compute 1% at a time

Feedly Summary: Pingora handles 35M+ requests per second, so saving a few microseconds per request can translate to thousands of dollars saved on computing costs. In this post, we share how we freed up over 500 CPU cores by optimizing one function and announce trie-hard, the open source crate that we created to do it.

AI Summary and Description: Yes

Summary: The text provides an in-depth account of optimizing the clear_internal_headers function within Cloudflare’s Pingora system, leading to significant CPU utilization savings. The focus on Rust’s performance engineering, the introduction of an optimized trie data structure, and the resultant cost efficiency highlight notable trends in software optimization relevant to professionals in software and infrastructure security.

Detailed Description:

– **Context of Optimization**: Cloudflare’s CDN processes a vast number of HTTP requests, making performance optimization crucial. With 35 million requests per second handled by the pingora-origin service, even minor improvements can lead to substantial savings in CPU usage.

– **Key Function Analysis**:
  – The clear_internal_headers function sanitizes requests by removing sensitive internal headers before they leave Cloudflare’s infrastructure (a sketch of this removal pass appears below).
  – It initially consumed 1.7% of pingora-origin’s total CPU time, making it a natural optimization target as web traffic continues to grow.
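As a minimal sketch of what such a removal pass can look like, assuming the http crate’s HeaderMap and a hypothetical INTERNAL_HEADERS list (the actual Pingora types and header names are not shown in the post summary):

```rust
use http::HeaderMap;

// Hypothetical names standing in for Cloudflare's 100+ internal headers.
const INTERNAL_HEADERS: &[&str] = &["x-internal-example-a", "x-internal-example-b"];

/// Original-style approach: try to remove every known internal header,
/// whether or not it is present on this particular request.
fn clear_internal_headers(headers: &mut HeaderMap) {
    for name in INTERNAL_HEADERS {
        // `remove` returns the old value if the header was present; only the
        // side effect of deleting it matters here.
        headers.remove(*name);
    }
}
```

The cost of this shape scales with the size of the internal-header list, not with the (usually much smaller) number of headers actually on the request, which is what the inversion described next addresses.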

– **Benchmarking Performance**:
  – The original implementation was benchmarked at 3.65µs per request, and subsequent iterations aimed to reduce this runtime by evaluating different methods of header removal.
  – Inverting the search proved most effective: instead of attempting to remove each of the many known internal headers, the function iterates over the headers actually present on the request and checks each one against the internal set. This brought the runtime down to 1.53µs, a 2.39x speed-up (a sketch of the inverted search appears below).
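A hedged illustration of that inversion, using std::collections::HashSet and the http crate’s HeaderMap; the function and variable names here are placeholders rather than Pingora’s actual code:

```rust
use std::collections::HashSet;
use http::HeaderMap;

/// Inverted approach: walk the headers actually present on the request and
/// remove only those found in the internal-header set.
fn clear_internal_headers_inverted(headers: &mut HeaderMap, internal: &HashSet<&str>) {
    // Collect matching names first so the map is not mutated while iterating.
    let to_remove: Vec<_> = headers
        .keys()
        .filter(|name| internal.contains(name.as_str()))
        .cloned()
        .collect();

    for name in to_remove {
        headers.remove(name);
    }
}
```

Because a request typically carries far fewer headers than the internal-header list contains, this flips the dominant cost from the size of the list to the size of the request.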

– **Data Structure Enhancements**:
  – Initial attempts used std::HashMap for the lookups, but hashing cost grows with key length, so lookups remained relatively expensive even for headers that were not in the internal set.
  – Replacing the hash map with a purpose-built trie, published as the open source trie-hard crate, reduced the average runtime further to 0.93µs.
  – Trie-hard’s performance comes from its compact encoding of node relationships and memory layout: lookups for non-matching headers fail fast after examining only a few characters, minimizing key-length-dependent penalties (see the usage sketch below).
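A usage sketch of the trie-hard crate, modeled on its published README; the exact construction and lookup signatures should be treated as assumptions, and the header names are placeholders:

```rust
use trie_hard::TrieHard;

fn main() {
    // Placeholder names standing in for the real internal-header set.
    let internal = ["x-internal-example-a", "x-internal-example-b"]
        .into_iter()
        .collect::<TrieHard<'_, _>>();

    // A hit returns the stored value; a miss can bail out after inspecting
    // only the first few bytes of the key, regardless of its full length.
    assert_eq!(
        internal.get("x-internal-example-a"),
        Some("x-internal-example-a")
    );
    assert_eq!(internal.get("accept-encoding"), None);
}
```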

– **Real-World Performance Metrics**:
  – After deployment to production, sampled profiles confirmed that the real-world gains matched the benchmark expectations, showing a significant reduction in actual CPU usage compared to the earlier hash-map version.

– **Financial Implications**:
  – The overall optimization resulted in an estimated CPU cost saving of over $90,000 per year, underscoring the critical intersection of performance engineering with financial efficiency within cloud infrastructure contexts.

– **Conclusion**:
  – The discussion emphasizes the significance of profiling and observability to identify and address performance bottlenecks systematically. The insights gleaned not only support better software practices but also resonate with compliance and cost-effectiveness objectives in the security and cloud computing arenas.

This case study is particularly relevant for professionals engaged in software security and infrastructure optimization, providing practical insights into effective performance management methodologies.