How to make your maps, try_emplace and smart pointers play nicely with each other in C++17.
Posted on Sun 18 November 2018 in C++
Trivia:
Lately, I have been working on the reincarnation of a class at work: a hash map.
While this class had interesting internals (a sort of dense hash map) and performed really well, its interface was not up to standard, both literally and metaphorically.
After applying much lipstick to it, the class now fully mimics the interface of the beloved std::unordered_map from the standard library.
A close look at std::unordered_map and its sister std::map reveals a few interesting design choices.
Combining this interface with some smart pointer types can present some challenges to squeeze performance out of your maps.
We will explore these challenges in this blog post, and try to figure out some solutions.
Disclaimer: C++ being C++, I would not be surprised if 1) I wrote some inaccuracies here, 2) some guru could reduce this entire article to a phantasmagoric one-liner.
Some peculiar modifier member functions:
Note: This part of the post will serve as a reminder for some of the folks that are not well versed in the associative containers of the standard. If you are confident, you can always jump straight to the dilemma part.
insert:
If you observe the interface of the associative containers (like std::map or std::unordered_map) in the current standard, you will notice that there are 6 member functions to map a value to a given key: insert, insert_or_assign, emplace, emplace_hint, try_emplace and the subscript operator (operator[]). That number does not even include all the overloads of each of these member functions. It is no wonder that a lot of C++ users tend to make suboptimal calls when inserting values into their associative containers: the choice is not always obvious when you have 6 different functions with slightly different behaviour.
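As a quick tour of all six (a sketch only; each of these also has overloads not shown, and the exact semantics are discussed below), this is roughly what they do on a std::map:

#include <map>
#include <string>

int main()
{
    std::map<std::string, std::string> m;

    m.insert({"key", "value"});              // Inserts the pair; does nothing if "key" already exists.
    m.insert_or_assign("key", "new value");  // Inserts, or assigns over the existing value.
    m.emplace("key", "value");               // Constructs the pair in place; does nothing if "key" exists (with caveats, see below).
    m.emplace_hint(m.end(), "key", "value"); // Same as emplace, with an iterator hint for the insertion position.
    m.try_emplace("key", "value");           // C++17: constructs the value only if "key" is absent.
    m["key"] = "value";                      // Default-constructs the value if absent, then assigns to it.
}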
Typically, you will often see this pattern within a code-base:
std::unordered_map<std::string, std::string> m;
// Check if the key is already in m.
if (m.find("johannes") == m.end()) { // Often written as m.count("johannes") == 0
m["johannes"] = "lucio"; // If not insert they key
}
Little did your colleague, boss, or tired ego know that such code does a relatively costly job twice: checking the existence of the key in the map.
Indeed, in the case of a std::unordered_map, the key "johannes" will be hashed twice: once in find and once in operator[]. In both member functions, the std::unordered_map has to know which bucket the key falls into. Worse! If you have collisions between your keys, checking the existence of a key may require up to N comparisons (even your hash function can be drunk sometimes), where N is the number of stored key-value pairs. Potentially multiplying these comparisons by two is not something you should desire. The situation in std::map is even worse: there, a lookup always costs roughly O(log(N)) comparisons. Comparing two keys may not always be as cheap as it seems, and if you add on top of that the cost of jumping through a tree of nodes, this should be considered harmful.
Obviously, the answer to this problem is to use insert. insert, as its name implies, will only do work if it can insert the key into the associative container, meaning that the insertion will not happen if the same key is already in the map. If you really care to know whether the insertion happened, insert returns a pair of an iterator and a boolean that you can query. The iterator points to the newly inserted key-value pair or to the already existing one; the boolean indicates whether the insertion happened or not.
// Use C++17 structured bindings and class template argument deduction (CTAD)
// See more in my previous post on those topics.
auto [it, result] = m.insert(std::pair("johannes", "lucio")); // Construct a pair and insert it if needed.
// Do whatever you want with it and result.
Here only one existence check is performed; that is much better, isn't it?
Well, while this snippet is shorter and performs better, there is still room for improvement.
Here we are constructing a pair outside of the map, the same pair that then needs to be created inside a node of the map.
Internally, a call to the move-constructor of the pair will be made, or, far worse, a call to the copy-constructor if one of the two types in the pair cannot be moved.
Relying on the move-constructor of a pair to exist AND to be performant is too much wishful thinking.
emplace:
Thankfully, C++11 added to many containers a new member function called emplace. Given a container of type T, emplace accepts the arguments necessary for an in-situ construction of a new instance of T. This means we can easily improve our insertion this way:
auto [it, result] = m.emplace("johannes", "lucio"); // Construct the pair of "johannes", "lucio" straight into m.
I will go slightly against abseil's recommendation and say that emplace should be preferred over insert in C++11 for at least all the associative containers. It performs better (as explained previously) and it also feels more natural (most users think of a key and a value, not a std::pair)!
Now, emplace on associative containers has a vicious specification which cppreference gently warns you about. For some obscure reason, even if the emplace operation does not succeed because the key already exists, your arguments passed as r-values may have been moved from anyway.
More vaguely, the standard mandates effects only on whether the insertion happens and on the return value:
Effects: Inserts a value_type object t constructed with std::forward<Args>(args)... if and only if there is no element in the container with key equivalent to the key of t.
The bool component of the returned pair is true if and only if the insertion takes place, and the iterator component of the pair points to the element with key equivalent to the key of t.
This cryptic language-lawyer text does not explain in any way what happens to the arguments in the case of failure. For all we know, they could be sent by smoke signal to Hong Kong and inserted into some fortune cookies. Why would we care about that? Well, because it prevents you from writing code such as this:
std::unordered_map<std::string, std::unique_ptr<int>> m;
auto my_precious_int = std::make_unique<int>(42);
auto [it, result] = m.emplace("ricky", std::move(my_precious_int)); // We need to move unique pointers.
if (!result) { // Alright the insertion failed, let's do something else with my_precious_int.
    do_something_else(*my_precious_int); // Or can we?
}
Here my_precious_int is unusable right after the call to emplace; it may have been moved-from forever and ever, EVEN if the insertion result is false.
Some confused souls will tell you that this is obvious: we called std::move, so it MUST be moved-from. Like the famous cake, std::move is a lie! It does not move anything; it simply casts objects to an x-value, which makes them POTENTIALLY movable-from (this world would have been a better place if std::move had been named std::to_xvalue, std::moveable, std::to_be_moved...).
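To make that concrete, here is a minimal sketch showing that std::move on its own moves nothing:

#include <string>
#include <utility>

int main()
{
    std::string s = "a fairly long string that would be worth moving";
    std::string&& r = std::move(s); // Nothing is moved here: std::move is only a cast to an r-value reference.
    // At this point s still holds its original content.
    std::string t = std::move(s);   // Only now does a move-constructor actually run, leaving s in a moved-from state.
    (void)r; (void)t;
}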
try_emplace:
This unleashed emplace is a real pain when you are trying to store move-only types in any associative container.
The standard committee was aware of the issue and fixed it in C++17 with another member function called try_emplace.
Here are the expected effects:
Effects: If the map already contains an element whose key is equivalent to k, there is no effect.
Otherwise inserts an object of type value_type constructed with piecewise_construct, forward_as_tuple(std::move(k)), forward_as_tuple(std::forward<Args>(args)...).
It effectively prevents your arguments from being moved-from (or packaged into fortune cookies) in the case of an insertion failure.
Why wasn't emplace simply patched?
If you take a look at the definition of emplace, you will see that it accepts as a key argument any object whose type is compatible with your key type.
Unlike the value arguments, the key argument ALWAYS needs to be somehow converted to the key type to check for its existence in the map.
That potential conversion would defeat the "there is no effect" policy.
try_emplace is stricter and only takes a key of the key type, which guarantees that no conversion sequence will be triggered.
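Roughly speaking, the declarations compare like this (simplified, hint overloads omitted):

// emplace: the key is buried somewhere in args and may need a conversion to key_type,
// which can have side effects even if the insertion ends up failing.
template <class... Args>
std::pair<iterator, bool> emplace(Args&&... args);

// try_emplace: the key is a separate parameter that must already be a key_type,
// and the remaining arguments are left untouched if the key already exists.
template <class... Args>
std::pair<iterator, bool> try_emplace(const key_type& k, Args&&... args);
template <class... Args>
std::pair<iterator, bool> try_emplace(key_type&& k, Args&&... args);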
At least try_emplace can help us to safely rewrite the previous example:
std::unordered_map<std::string, std::unique_ptr<int>> m;
auto my_precious_int = std::make_unique<int>(42);
auto [it, result] = m.try_emplace("ricky", std::move(my_precious_int)); // We need to move unique pointers.
if (!result) { // Alright the insertion failed, let's do something else with my_precious_int.
    do_something_else(*my_precious_int); // It is safe!
}
Hurray! After three corrections in the standard, we can effectively mix associative containers with unique_ptrs for maximum fun and profit!
Well... no. C++ being C++, it cannot be that easy. There is one last boss to slay.
The last dilemma of a map of unique_ptrs:
Before I start on this topic, I would like to remind everyone of a basic rule: you should try to avoid heap allocations.
You should always strive for a std::map<T1, T2> over a std::map<T1, std::unique_ptr<T2>>.
That being said, you may face situations where you cannot do otherwise:
- If you need runtime polymorphism. For instance, you may need to store services into a std::map<std::string, std::unique_ptr<service>> where service is an interface with multiple concrete implementations. Although, there are always ways to hide the inheritance, as explained by Sean Parent...
- If your weapon of choice is a map that closely follows the interface of std::unordered_map or std::map minus its stable addressing guarantee. This is often the case for the hash maps with excellent performance, like Skarupke's. Not having stable addressing means that querying the address of a value in the map (&map["jeremy"]) might give you different results if you do any modifying operations (insert, erase...) on the map. In such a case, adding an extra indirection (a heap allocation) brings back stable addressing, as the sketch below illustrates.
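Here is a minimal sketch of that last point, using std::vector as a stand-in for any container without stable addressing (std::unordered_map itself does guarantee reference stability; the scenario applies to dense, open-addressing maps):

#include <memory>
#include <string>
#include <utility>
#include <vector>

struct service { virtual ~service() = default; };

int main()
{
    // std::vector stands in for a dense map here: its elements move when it grows.
    std::vector<std::pair<std::string, std::unique_ptr<service>>> storage;
    storage.emplace_back("jeremy", std::make_unique<service>());

    service* stable   = storage[0].second.get(); // Address of the heap-allocated service.
    auto*    unstable = &storage[0];             // Address of the slot holding it.

    for (int i = 0; i < 1000; ++i) // Force several reallocations.
        storage.emplace_back("key" + std::to_string(i), std::make_unique<service>());

    // unstable may now dangle (the slots were moved to new storage),
    // but stable is still valid: the unique_ptr indirection kept the service's address fixed.
    (void)stable; (void)unstable;
}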
Not only is dealing with pointers (even smart ones) often tedious, it can also ruin the try_emplace member function of your map.
You will have to choose between a costly object creation and the dreaded double key lookup I mentioned at the very beginning of this post.
Pick your poison!
Unnecessary object creation or double key lookup:
Let's keep the idea of a map of services, std::map<std::string, std::unique_ptr<service>>, and say you would like to register a "file_locator" service only if none was registered earlier.
Hastily, you may write such code:
std::map<std::string, std::unique_ptr<service>> m;
// ...
// Create a file locator that explores a file system on a remote server.
// remote_file_locator implements the service interface.
auto [it, result] = m.try_emplace("file_locator", std::make_unique<remote_file_locator>("8.8.8.8", "/a_folder/"));
if (!result) {
    // Print which file_locator is already in there.
    log("Could not register a remote_file_locator, it has been overridden by: ", it->second->name());
}
If the remote_file_locator is successfully registered, everything is fine!
But in the other scenario, where a file_locator is already in the map, this code has a huge pessimisation.
Here your compiler will emit code that allocates enough memory for a remote_file_locator and then constructs it, under any circumstances.
If allocating is already considered slow in the C++ world, starting a connection to a server is pure hell when it comes to speed.
If you are not planning to use this really costly object, why would you create it in the first place?
So shall we revert to the double lookup?
auto it = m.find("file_locator");
if (it == m.end()) {
m["file_locator"] = std::make_unique<remote_file_locator>("8.8.8.8", "/a_folder/");
} else {
log("Could not register a remote_file_locator, it has been overridden by: ", it->second.name());
}
Hell no! I already explained why I discourage this pattern in the first part of this post. You could argue that here we pay the double-lookup overhead only once, when we try to register the "remote_file_locator". Given more time and coffee, I should be able to come up with an architecture where you could do such insertions of unique_ptrs in a loop. In any case, C++ is all about not paying in performance for unnecessary things.
But don't worry, C++ being C++, there surely are ways to get around this impediment.
Two clumsy solutions:
I, personally, could come up with two solutions. If you have a better one, you are welcome to share it in the comments.
The first one is actually not so hacky. You can start by trying to emplace an empty unique_ptr; if the insertion succeeds, you can always fix it afterwards with a real allocation:
// Try to emplace an empty `unique_ptr` first.
auto [it, result] = m.try_emplace("file_locator", nullptr);
if (result) {
    // The insertion happened, now we can safely create our remote_file_locator without wasting any performance.
    it->second = std::make_unique<remote_file_locator>("8.8.8.8", "/a_folder/");
}
I somehow dislike this solution. It is not consistent with the usage of try_emplace on more classic types, which does not require any extra step.
It really smells like some kind of two-phase initialisation pattern, which is usually frowned upon.
We are temporarily putting our map into a state where "file_locator" cannot be trusted. What if the actual creation of the remote_file_locator throws an exception?
That would leave the map with an empty "file_locator"; that's not great.
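To stay exception-safe, you would have to roll the insertion back by hand, which only makes the two-phase smell worse (a sketch, reusing the same m and remote_file_locator as above):

auto [it, result] = m.try_emplace("file_locator", nullptr);

if (result) {
    try {
        it->second = std::make_unique<remote_file_locator>("8.8.8.8", "/a_folder/");
    } catch (...) {
        m.erase(it); // Roll back, so the map never exposes a null "file_locator".
        throw;
    }
}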
My second solution consists in delaying the construction of the remote_file_locator.
To do so, I wrote a very simple helper struct that I called lazy_convert_construct.
This struct wraps any kind of lambda that acts like a factory: the lambda returns an instance of a given type, result_type, when called.
If at any point the struct needs to be converted to result_type, it calls the internal lambda to generate an instance of result_type.
Any code should speak for itself, so here is the lazy_convert_construct beast in all its beauty:
template<class Factory>
struct lazy_convert_construct
{
using result_type = std::invoke_result_t<const Factory&>; // Use some traits to check what would be the return type of the lambda if called.
constexpr lazy_convert_construct(Factory&& factory)
: factory_(std::move(factory)) // Let's store the factory for a latter usage.
{
}
// ↓ Respect the same nowthrow properties as the lambda factory.
constexpr operator result_type() const noexcept(std::is_nothrow_invocable_v<const Factory&>)
// ^ enable ^ the type this struct can be converted to
// conversion
{
return factory_(); // Delegate the conversion job to the lambda factory.
}
Factory factory_;
};
// Example of usage:
auto l = lazy_convert_construct([]{ return 42; }); // CTAD again; the factory lambda returns an int.

int x = l; // Here l is forced to be converted to an int and will therefore call the lambda to do so.

std::cout << x; // Prints 42.
Note that the lambda is not called if no conversion is needed, which makes this a lazy evaluation.
Note also that, with all optimisations turned on, the lazy_convert_construct disappears entirely and x is simply initialised to 42 when needed.
The next step is to combine this lazy_convert_construct with try_emplace, which works like a charm:
auto [it, result] = m.try_emplace("file_locator", lazy_convert_construct([]{ return std::make_unique<remote_file_locator>("8.8.8.8", "/a_folder/"); }));
lazy_convert_construct is now able to create a unique_ptr<remote_file_locator> on demand. Even with lazy_convert_construct, try_emplace respects its contract: it will not have any side effect on the lazy_convert_construct object if the key "file_locator" is already present, meaning that no conversion happens if the key already exists.
This rather elegant solution fixes one of the main drawbacks of the previous one: it never leaves the map in a state with a null file_locator.
It is also a one-liner!
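For completeness, here is the earlier registration example rewritten with it (same assumptions as before: m, log and remote_file_locator as declared above):

auto [it, result] = m.try_emplace("file_locator",
    lazy_convert_construct([]{ return std::make_unique<remote_file_locator>("8.8.8.8", "/a_folder/"); }));

if (!result) {
    log("Could not register a remote_file_locator, it has been overridden by: ", it->second->name());
}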
Benchmark results:
Some of you may still be a bit skeptical about the importance of optimising the queries on your associative containers.
So I wrote a very simple benchmark which explores multiple insertion scenarios on a std::map<std::string, std::unique_ptr<T>>.
You can fetch it here.
Using clang 6.0 on my Linux laptop, I obtain the following results:
2018-11-18 14:16:17
Running ./fast_try_emplace
Run on (8 X 3600 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------
insertion_double_lookup 636 ns 638 ns 1085075
insertion_construct_before_try_emplace 503 ns 506 ns 1387818
insertion_lazy_convert_try_emplace 503 ns 506 ns 1380488
no_insertion_double_lookup 107 ns 107 ns 6300180
no_insertion_construct_before_try_emplace 8642 ns 8641 ns 80478
no_insertion_lazy_convert_try_emplace 32 ns 32 ns 22393508
Clearly, a successful insertion using the double lookup is more expensive than it should be. The cost will vary depending on the number of key-value pairs already in the map.
For a failed insertion scenario, my lazy_convert_construct is also faster than the double lookup. I cannot explain why! Internally, find and try_emplace should have the same lookup mechanism.
And of course, creating a costly object and destroying it right after is a really bad choice. That explains why no_insertion_construct_before_try_emplace's record is so damn huge compared to the two other cases (I purposely made the type very costly to create for the no-insertion cases).
GCC offers similar results, without the mysterious advantage of try_emplace + lazy_convert_construct over the double lookup in the no-insertion scenario.
2018-11-18 14:17:53
Running ./fast_try_emplace
Run on (8 X 3600 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------
insertion_double_lookup 547 ns 543 ns 1310711
insertion_construct_before_try_emplace 493 ns 496 ns 1394343
insertion_lazy_convert_try_emplace 495 ns 500 ns 1000000
no_insertion_double_lookup 44 ns 44 ns 15787599
no_insertion_construct_before_try_emplace 8659 ns 8658 ns 78869
no_insertion_lazy_convert_try_emplace 44 ns 44 ns 15895884
Now we can proudly claim that we have solved all the insertion issues (at least those that I am aware of). But somehow it still feels like something is off with the smart pointer types of the standard library. Are we missing something here?
An in_place constructor:
Warning: here you will enter the somewhat controversial and imaginary part of my post. I am not claiming that this is the direction C++ should take on these issues; I am merely raising my questions on the topic.
Indeed, unique_ptr and its smart pointer cousins (shared_ptr...) are rather special types.
On one hand, you could see them as simple RAII wrappers that take care of a basic resource: a pointer.
On the other hand, you could, very arguably (<== note this bold statement), see them as some funky value wrappers with a very special storage (one that implies pointer semantics).
Value wrappers are types that enhance the properties of another type (or types). The most recent ones in the standard are std::optional, std::variant and std::any. As expected, all of these new value wrappers have constructors that accept an instance of the type they wrap:
struct A {
A(int args1, std::vector<int> args2) { /* do some with the args */}
};
// Move construct the newly created A into o.
std::optional o(A{42, std::vector<int>{42}});
While such constructors might be sufficient for most usages of your value wrappers, sometimes you really want to avoid any move or copy constructor.
The standard committee was proactive and provided another set of constructors to build the wrapped value in place.
In order to disambiguate from the usual constructors, these new constructors take as a first argument a tag type: std::in_place or std::in_place_type. Here is how std::in_place works with std::optional:
// ↓ No CTAD ↓ The arguments needed for constructing a new instance of A.
std::optional<A> o(std::in_place, 42, std::vector<int>{42});
// ^ dispatch to in place constructor.
With such a constructor, we can safely assume that the wrapped instance of A was built directly in its storage place.
Of course, you can also use this constructor if you are dealing with map::try_emplace:
std::map<std::string, std::optional<A>> m;
m.try_emplace("bruno", std::in_place, 42, std::vector<int>{42});
// ^ Will construct the wrapped A deep down in the map.
At this point, if you are following my analogy between std::unique_ptr and value wrappers, you may start to wonder why we could not get a similar set of constructors for our smart pointers. Maybe something like this:
std::unique_ptr<service> x{std::allocate_in_place<remote_file_locator>, 42, std::vector<int>{42}};
//                         ^ tag with the concrete type to instantiate  ^ the following arguments.
I intentionally chose a different tag type than std::in_place for two reasons:
- This constructor does more than constructing in place: it also allocates. The user should be informed.
- In a similar fashion to std::in_place_type, we somehow need to encode the concrete type we want to instantiate.
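Purely as a sketch of what that imaginary machinery could look like (nothing here exists in the standard; owning_ptr is just a toy stand-in for a smart pointer with such a constructor):

#include <memory>
#include <string>
#include <utility>

// Hypothetical tag carrying the concrete type to allocate.
template <class T>
struct allocate_in_place_t { explicit allocate_in_place_t() = default; };

template <class T>
inline constexpr allocate_in_place_t<T> allocate_in_place{};

// Toy owning pointer sketching the imagined constructor; std::unique_ptr has no such constructor today.
template <class T>
struct owning_ptr
{
    template <class U, class... Args>
    owning_ptr(allocate_in_place_t<U>, Args&&... args)
        : ptr_(std::make_unique<U>(std::forward<Args>(args)...)) // Allocate and construct the concrete type in one step.
    {
    }

    std::unique_ptr<T> ptr_;
};

struct service { virtual ~service() = default; };
struct remote_file_locator : service
{
    remote_file_locator(std::string /*host*/, std::string /*folder*/) {}
};

int main()
{
    owning_ptr<service> x{allocate_in_place<remote_file_locator>, "8.8.8.8", "/a_folder/"};
}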
With such a constructor in the standard for unique_ptr, our issue with try_emplace would become trivial to solve. Just call it!
auto [it, result] = m.try_emplace("file_locator", std::allocate_in_place<remote_file_locator>, "8.8.8.8", "/a_folder/");
The constructor of unique_ptr that accepts the allocate_in_place tag would be called only if the key "file_locator" is not already in there.
No overhead, a simple syntax: you could not ask for more!
As a side effect, my guess is that we could also fully deprecate the usage of make_unique and make_shared:
// Before:
auto x = std::make_shared<something>("michel", "christian");
// After
auto x = std::shared_ptr(std::allocate_in_place<something>, "michel", "christian");
Obviously, the syntax is far from terse. This invented feature also does not take into consideration the allocator arguments a smart pointer could receive. A colleague and I promised ourselves to look a bit more into this topic. Whether we end up formulating a proposal or just some afterthoughts, be sure that I will keep you informed in a further post!
Conclusion:
- If you are inserting a new pair into an associative container, consider using try_emplace first.
- If you cannot use C++17, prefer emplace over insert.
- If you cannot use C++11, I feel sorry for you!
- You can borrow my lazy_convert_construct if you are dealing with smart pointers and try_emplace, to get a blazing-fast insertion.
A special thanks to my colleague Yo Anes with whom I had a lot of fun discussing this specific topic.