
The Problem#

So I was implementing my own version of standard library containers like std::map. It’s a fantastic learning exercise! I got to operator[], the access-or-insert function, and while looking at the existing at() method (which provides bounds-checked access) I thought, “Aha! I can reuse at() and just catch the exception if the key isn’t there!”

It seems elegant, right? I wrote something like this:

```cpp
T &at(const Key &key) {
    if (root == nullptr) {
        throw index_out_of_bound();
    }
    // find() throws index_out_of_bound itself when the key is not in the tree
    return find(key, root);
}

T &operator[](const Key &key) {
    try {
        return at(key);
    } catch (index_out_of_bound &) {
        // Miss: fall back to inserting a default-constructed value
        value_type value(key, T());
        pair<iterator, bool> result = insert(value);
        return result.first->second;
    }
}
```

I compile it, feeling pretty good about the code reuse. Then I run the benchmarks, comparing my sjtu::operator[] against std::map::operator[], focusing especially on scenarios involving insertions (where the key doesn’t initially exist), and boom: Time Limit Exceeded. Why? So I looked at the benchmark script, and it contained something like this:

```cpp
// test: erase()
while (map.begin() != map.end()) {
    map.erase(map.begin());
}
assert(map.empty() && map.size() == 0);
// test: operator[]
for (int i = 0; i < 100000; ++i) {
    std::cout << map[Integer(i)];
}
std::cout << map.size() << std::endl;
```

You have probably already spotted the problem by now, but I wasn’t so lucky. Because the preceding erase loop empties the map, every single map[Integer(i)] call misses and takes the insert path. At the time, I was just thinking, “Oh, maybe the insert function is slow.”

The Performance Surprise and the Profiling Deep Dive#

The benchmark results are shocking. This implementation is dramatically slower than std::map (88% slower in my case), specifically when operator[] ends up inserting a new element. Accessing existing elements might be fine, but the insert path is killing performance.

What gives? Is the tree balancing algorithm inefficient? Is memory allocation slow? This is where debugging tools become essential. Simple code inspection doesn’t immediately reveal why it’s so much slower, since the implementation does achieve O(log N) time complexity.

Time to bring out the profilers. Tools like perf (on Linux) and callgrind (part of the Valgrind suite) are designed to answer the question: “Where is my program actually spending its time?”

  1. Running perf: Starting with perf record ./code followed by perf report is a great first step, as it already provides simple CLI views of which functions are “hot” (consuming the most CPU cycles). The perf report points towards functions with names like _Unwind_Find_FDE and various other functions involved in stack unwinding and exception handling. This already hinted that the problem was in how I was using the language rather than in my tree logic. However, I was unfamiliar with functions like _Unwind_Find_FDE, so I turned to callgrind to inspect the instruction counts.

Perf Report

  2. Running callgrind: I ran valgrind --tool=callgrind ./code. Since I am on macOS, I used qcachegrind to visualize the results.
    • The visualization confirms perf’s findings, but with more detail. I can see that when sjtu::operator[] calls sjtu::at and at executes throw, a massive cascade of function calls related to exception handling follows, costing 88% of the execution time!
    • Crucially, callgrind shows the cost associated not just with the throw itself, but with the entire stack unwinding process: the runtime searching for the catch block and meticulously destroying any local objects created within the try block and intervening function calls.

Callgrind Report

The “Aha!” Moment#

The profilers leave no doubt. The performance bottleneck is the deliberate, designed-in overhead of the C++ exception handling mechanism being triggered repeatedly for a non-exceptional condition (key not found during an insertion).

What Actually Happens When C++ Throws an Exception? (And Why Profilers Flag It)#

After chatting with some AI chatbots and doing some googling, I realized that throwing and catching an exception isn’t just a fancy goto. Instead, it involves a complex runtime process that the profilers pick up as costly operations (there is a small sketch after this list that makes step 2 observable):

  1. Exception Object Creation: throw std::out_of_range(...) creates an object, often involving dynamic memory allocation (heap allocation shows up in profilers).
  2. Stack Unwinding: (The main cost flagged by profilers) The runtime walks backward up the call stack.
    • It destroys local objects (RAII cleanup). Profilers show time spent in destructors during unwinding.
    • It consults compiler-generated “unwinding tables”. Accessing and processing this data takes time/instructions.
  3. Handler Matching: The runtime checks catch blocks using RTTI, adding overhead.
  4. Control Transfer: Jumping to the catch block disrupts linear execution flow, potentially causing instruction cache misses and branch mispredictions (subtler effects seen in very low-level profiling).
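
To make stack unwinding concrete, here is a minimal, self-contained sketch (my own illustration, not code from the container) that counts how many destructors the runtime executes between a throw three frames deep and the catch at the top:

```cpp
#include <iostream>
#include <stdexcept>

// Counts destructor calls so we can observe stack unwinding at work.
static int destroyed = 0;

struct Tracer {
    ~Tracer() { ++destroyed; }  // RAII cleanup, executed during unwinding
};

void level3() {
    Tracer t;
    // Step 1: the exception object is created here (often on the heap).
    throw std::out_of_range("key not found");
}

void level2() {
    Tracer t;
    level3();
}

void level1() {
    Tracer t;
    level2();
}

int main() {
    try {
        level1();
    } catch (const std::out_of_range &) {
        // Steps 2-4: by the time we land here, the runtime has walked the
        // stack, consulted the unwind tables, and destroyed every local.
        std::cout << "destructors run during unwinding: " << destroyed << '\n';
        // prints 3: one Tracer per frame between the throw and the catch
    }
}
```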

Why is This So Slow Compared to a Simple Check?#

The profiling results, combined with understanding the mechanics, paint a clear picture:

  • Stack Unwinding Overhead: As callgrind showed, walking the stack, looking up cleanup actions, and calling destructors is expensive, especially compared to a simple if check.
  • Runtime Machinery: The hidden machinery (dynamic allocation, RTTI, table lookups) adds significant overhead absent in direct conditional logic.
  • Optimization Barriers: Exception handling constructs can limit compiler optimizations compared to simpler control flow, contributing to higher instruction counts seen in callgrind.

In our operator[] example, the case where the key doesn’t exist is expected. By using exceptions here, we frequently trigger the heavyweight process the profilers flagged, leading to poor performance.
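
If you want to reproduce the effect without a full container, a microbenchmark along these lines contrasts the two styles of miss handling. This is a sketch using std::map as a stand-in; the helper names and loop count are my assumptions, not the actual benchmark from the course:

```cpp
#include <chrono>
#include <iostream>
#include <map>
#include <stdexcept>

// Exception-based miss handling: mimics the slow operator[] above.
int &get_or_insert_throwing(std::map<int, int> &m, int key) {
    try {
        return m.at(key);  // throws std::out_of_range on a miss
    } catch (const std::out_of_range &) {
        return m.emplace(key, 0).first->second;
    }
}

// Check-based miss handling: mimics the fixed operator[] below.
int &get_or_insert_checking(std::map<int, int> &m, int key) {
    auto it = m.find(key);
    if (it != m.end()) {
        return it->second;
    }
    return m.emplace(key, 0).first->second;
}

template <typename F>
long long time_ms(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main() {
    const int n = 100000;  // every key misses, so every call takes the insert path
    std::map<int, int> a, b;
    std::cout << "throwing: "
              << time_ms([&] { for (int i = 0; i < n; ++i) get_or_insert_throwing(a, i); })
              << " ms\n";
    std::cout << "checking: "
              << time_ms([&] { for (int i = 0; i < n; ++i) get_or_insert_checking(b, i); })
              << " ms\n";
}
```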

So what does a normal operator[] look like? It should be something like this:

```cpp
T &operator[](const Key &key) {
    Node *node = find_node(key, root);
    if (node != nullptr) {
        return node->data.second;
    } else {
        // Insert new element
        value_type value(key, T());
        pair<iterator, bool> result = insert(value);
        return result.first->second;
    }
}
```
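
Note that the code reuse I originally wanted is still available, just in the other direction: at() can be built on top of the same non-throwing lookup helper, so only the genuinely exceptional path throws. A sketch, assuming the find_node helper above:

```cpp
// at() shares the cheap lookup; the throw now happens only on the
// truly exceptional path (a bounds-checked access to a missing key).
T &at(const Key &key) {
    Node *node = find_node(key, root);
    if (node == nullptr) {
        throw index_out_of_bound();
    }
    return node->data.second;
}
```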

and the profiler results now look something like this:

Normal Report

As you can see in the image, the top CPU-consuming functions are now actual functions in my code, not the exception handling machinery. The find_node function is now the most expensive operation, which is expected since it performs the O(log N) tree traversal.

Key Takeaway#

Exceptions are vital for unexpected errors. However, they have a cost. My debugging journey using perf and callgrind perfectly illustrates this: the tools pinpointed the exception handling mechanism as the source of the slowdown when it was used for expected control flow.

Trust your profiler! When benchmarks show a surprising slowdown, tools like perf and callgrind are indispensable for digging into the why. In this case, they confirmed that C++ exceptions, while powerful, should be reserved for true error conditions, not for everyday logic branches in performance-sensitive code. For predictable paths like “find-or-insert”, stick to conditional logic.
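
For reference, the standard library offers the same find-or-insert pattern without exceptions: since C++17, std::map::try_emplace does it in a single call. A short usage sketch:

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> m;

    // Inserts a default-constructed string only because the key is absent;
    // no exception, no unwinding, a single tree traversal.
    auto [it, inserted] = m.try_emplace(42);
    std::cout << "inserted: " << std::boolalpha << inserted << '\n';  // true
    it->second = "hello";

    // The key now exists, so try_emplace finds it and changes nothing.
    auto [it2, inserted2] = m.try_emplace(42, "ignored");
    std::cout << it2->second << " inserted: " << inserted2 << '\n';   // hello false
}
```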

The Hidden Cost of try-catch
https://start-co.de/blog/try-catch
Author TheUnknownThing
Published at April 12, 2025