Backend Performance Study
Built a microbenchmark suite to study how cache locality, false sharing, and lock contention affect service latency and throughput under concurrency, and to analyze root causes of latency spikes in a storage data plane.
What I did
- Designed benchmarks to isolate cache locality patterns, false sharing between threads, and lock contention under different workload profiles
- Used perf to measure CPU cycles, cache misses, and scheduler effects across single-threaded and multi-threaded code paths
- Used Valgrind tooling to profile memory behavior and identify bottlenecks
- Applied optimizations such as improving data locality, cache-line padding, and memory alignment
What I learned
The project made systems tradeoffs more concrete, especially around concurrency and memory-access patterns. Understanding how the CPU cache hierarchy affects real-world service performance changed how I think about backend data structure design.