LeetCode 49: Group Anagrams

code
arrays and hashing
medium
Author

spa-dev

Published

June 3, 2024

Problem Description

Difficulty: Medium

Given an array of strings strs, group all anagrams together into sublists. You may return the output in any order.

Example 1:

Input: strs = ["act","pots","tops","cat","stop","hat"]
Output: [["hat"],["act", "cat"],["stop", "pots", "tops"]]

Example 2:

Input: strs = ["x"]
Output: [["x"]]

Example 3:

Input: strs = [""]
Output: [[""]]

Initial Solution

The key to this problem was to recognize that when anagrams are sorted alphabetically, they become the same ‘word’. With that in mind, a decent solution seemed relatively easy to come up with. Let’s just sort the words and place the original word in a dictionary with the keys being the unique sorted/garbled word.

There are a few parts that could potentially trip up a beginner:

Firstly, calling sorted() on a string returns a list of the sorted characters and not a whole sorted word:

word = 'hello'
sorted_word = sorted(word)
print(sorted_word)
>>> ['e', 'h', 'l', 'l', 'o']

So we must join() the list back together afterwards.

Secondly, adding values to a pre-existing key in the dictionary, instead of overwriting them, takes a little care, but there are a few options noted in this post. For example, using append() and extend(). The TLDR is to use extend() when you have multiple values to append, rather than using append() multiple times. E.g.

a = {}
a.setdefault('abc', []).append(1)       # {'abc': [1]}
a.setdefault('abc', []).extend([2, 3])  # a is now {'abc': [1, 2, 3]}

setdefault() will avoid a KeyError when we try to access a key that does not exist yet. It inserts the key with the specified default: []

Let’s put this all together in our initial attempt at solving the anagram problem.

import logging
from typing import List
class InitialSolution:
    def __init__(self):
        # Set up logging config
        logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")
        self.logger = logging.getLogger(__name__)

    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        # works fine without the if statement, but let's save some later computation:
        if strs == [""]:
            return [[""]]

        words = {}
        for word in strs:
            self.logger.debug(f"Processing word: {word}")
            sorted_word = "".join(sorted(word))
            # n⋅log(n) character sorting * m strings in the list
            self.logger.debug(f"Sorted 'word': {sorted_word}")
            words.setdefault(sorted_word, []).append(word)
            self.logger.debug(
                f"Updated dict value(s): '{sorted_word}': {words[sorted_word]}"
            )

        return list(words.values())
# Example for testing purposes.
strs = ["act", "pots", "tops", "cat", "stop", "hat"]
# Output: [["hat"],["act", "cat"],["stop", "pots", "tops"]]
# Note that output order does not matter.

solution = InitialSolution()
solution.groupAnagrams(strs)
DEBUG: Processing word: act
DEBUG: Sorted 'word': act
DEBUG: Updated dict value(s): 'act': ['act']
DEBUG: Processing word: pots
DEBUG: Sorted 'word': opst
DEBUG: Updated dict value(s): 'opst': ['pots']
DEBUG: Processing word: tops
DEBUG: Sorted 'word': opst
DEBUG: Updated dict value(s): 'opst': ['pots', 'tops']
DEBUG: Processing word: cat
DEBUG: Sorted 'word': act
DEBUG: Updated dict value(s): 'act': ['act', 'cat']
DEBUG: Processing word: stop
DEBUG: Sorted 'word': opst
DEBUG: Updated dict value(s): 'opst': ['pots', 'tops', 'stop']
DEBUG: Processing word: hat
DEBUG: Sorted 'word': aht
DEBUG: Updated dict value(s): 'aht': ['hat']
[['act', 'cat'], ['pots', 'tops', 'stop'], ['hat']]

Initial Results

IntialSolution passes NeetCode submission (if we comment out the logging).

  • Time complexity: \(O(m⋅n⋅log(n))\) due to sorting each string.
  • Space complexity: \(O(m⋅n)\) due to storing the strings in the dictionary keys and values.

NeetCode Solution

from collections import defaultdict


class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        result = defaultdict(list)

        for s in strs:
            count = [0] * 26  # one for each character in a-z
            for c in s:
                # use ord to get unicode value of the character
                count[ord(c) - ord("a")] += 1
            # convert count list to tuple, as lists can't be dict keys
            result[tuple(count)].append(s)
        return result.values()
solution = Solution()
list(solution.groupAnagrams(strs))
[['act', 'cat'], ['pots', 'tops', 'stop'], ['hat']]

I used list() above to display the result nicely in the notebook. The code seems to work fine in NeetCode both with and without the conversion to a list; no TypeError is given.

Conclusion

Neetcode’s solution uses a hashmap (defaultdict) called result that stores lists of anagrams, where the key is a tuple representing the character counts of the strings. Note that a defaultdict never raises a KeyError, but instead provides a default value for a key that does not exist.

  • Time complexity \(O(m⋅n)\) where m is the total number of strings and n is the length of each string (multiplied by 26 possible letters)
  • Space complexity: \(O(m⋅n)\) due to the space needed for storing the strings in the defaultdict values.

Given an extremely long string, the above solution would be optimal: \(O(m⋅n⋅log(n))\) vs. \(O(m⋅n⋅26)\)