Continuing with “Python in Golang”, the next interesting data structure to dive into is the Counter. In Python, this class is also in the “collections” package and is pretty self-explanatory. Given an iterable input, it will give you a count of each item. You can also get the top N most common items. I thought this would be pretty straightforward, and the first part is, however getting the top N items is a little more tricky. In Python, the Counter class actually inherits from dict and provides additional functionality.
In pseudocode, we might say. When adding an item to the counter, if the item is in the hash map, increase its count otherwise set it to 1. This is actually similar behavior to a defaultdict in Python, but that’s a topic for another day. Now when we want to get the top N items, we could iterate through each item in our map and check for the top N items, but given k as the length of our hash map, we would have O(k) time complexity. I think we can do better.
There’s a really neat data structure called a Heap, which can help us with this problem. Similar to how we use two different data structures in the OrderedDict example, we can use two data structures here to help reduce the time complexity.
Golang has a heap package which we can leverage to build our own data structure. In our case, we can actually follow the priority queue example in the docs, however, our priority is the number of times the item has shown up in our Counter. The Golang specific heap has O(log(n)) complexity for both Pushing and Popping. (note: in our above example, having k as the length of our hash map, O(log(k)) would be the time complexity)
Now that we’re using a heap, we can get a better time complexity and our Counter is complete. Unlike Python, we can’t override a map’s get or set methods for accessing values. We instead need to create our own methods that access the internal map.
A pretty large caveat that I still haven’t figured out is how to access the top N items in the heap without popping then re-pushing them. This makes our MostCommon method have a time complexity of O(n log k), where n is the number of items we’re retrieving and k is the length of our heap. If we could remove the use of popping and pushing again, this method would just be O(n). Since we are popping then pushing again, I added a mutex to make our Counter thread-safe; without this, if another thread called MostCommon in the middle of the first thread, it may return incorrect data.
When working with maps in multiple threads with multiple writers, you should look into the sync.Map struct if you don’t want to manage the synchronization yourself.