LeetCode 271: String Encode and Decode

code
arrays and hashing
medium
Author

spa-dev

Published

June 17, 2024

Problem Description

Difficulty: Medium

Design an algorithm to encode a list of strings to a single string. The encoded string is then decoded back to the original list of strings.

Please implement encode and decode

Example 1:

Input: ["neet","code","love","you"]
Output:["neet","code","love","you"]

Example 2:

Input: ["we","say",":","yes"]
Output: ["we","say",":","yes"]

Constraints:

0 <= strs.length < 100
0 <= strs[i].length < 200
strs[i] contains only UTF-8 characters.

Initial Solution

Not sure why this was classed as medium. Am I missing something here? (Apparently so). I chose to use a new line "\n" as the delimiter between words, and the submission passed on NeetCode. I’m not sure it would on LeetCode, but it’s a premium problem there, so not free to check.

import logging
from typing import List
# Set up logging config (global scope)
logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)
class InitialSolution:
    
    def __init__(self):
        self.logger = logger

    def encode(self, strs: List[str]) -> str:
        encoded = ""

        for string in strs:
            encoded += string + "\n"
            self.logger.debug(f"encoded string: {encoded}")

        return encoded

    def decode(self, s: str) -> List[str]:
        decoded = []
        word = ""

        for letter in s:
            word += letter
            self.logger.debug(f"added letter to word; word = {word}")

            if letter == "\n":
                decoded.append(word[0:-1])
                self.logger.debug(f"added word to final list: {decoded}")
                word = ""

        return decoded
strs = ["neet", "code", "love", "you"]
# Expected output:["neet","code","love","you"]

solution = InitialSolution()
# suppress debugging output to avoid the mess when using '/n'
logger.setLevel(logging.WARNING)

string = solution.encode(strs)
string
'neet\ncode\nlove\nyou\n'
logger.setLevel(logging.DEBUG)
solution.decode(string)
DEBUG: added letter to word; word = n
DEBUG: added letter to word; word = ne
DEBUG: added letter to word; word = nee
DEBUG: added letter to word; word = neet
DEBUG: added letter to word; word = neet

DEBUG: added word to final list: ['neet']
DEBUG: added letter to word; word = c
DEBUG: added letter to word; word = co
DEBUG: added letter to word; word = cod
DEBUG: added letter to word; word = code
DEBUG: added letter to word; word = code

DEBUG: added word to final list: ['neet', 'code']
DEBUG: added letter to word; word = l
DEBUG: added letter to word; word = lo
DEBUG: added letter to word; word = lov
DEBUG: added letter to word; word = love
DEBUG: added letter to word; word = love

DEBUG: added word to final list: ['neet', 'code', 'love']
DEBUG: added letter to word; word = y
DEBUG: added letter to word; word = yo
DEBUG: added letter to word; word = you
DEBUG: added letter to word; word = you

DEBUG: added word to final list: ['neet', 'code', 'love', 'you']
['neet', 'code', 'love', 'you']

Here’s a cleaner version that has the same approach. A few if statements are needed to account for edge cases on LeetCode. However, these cause an inconsistency between the type hints in the function definition and the actual type of the variables. The encode function now returns Union[List[str], str] rather than the original str. Not sure if that would be acceptable either.

class AlternativeSolution:

    def encode(self, strs: List[str]) -> str:  # incorrect type annotation
        if strs == [""]:
            return [""]
        if strs == []:
            return []
        return "\n".join(strs)

    def decode(self, s: str) -> List[str]:  # incorrect type annotation
        if s == [""]:
            return [""]
        if s == []:
            return []
        return s.split("\n")

Initial Results

My intial solution passed the LeetCode submission. I had a little trouble determining the worst-case complexity here. It appears that the intial encode function may be \(O(n^2)\) due to the repeated cost of string concatenation*. The alternative solution may avoid the quadratic time by using use Python’s better optimized join operation.

*Per this source:

String concatenation is best done with ’‘.join(seq) which is an \(O(n)\) process. In contrast, using the’+’ or ‘+=’ operators can result in an \(O(n^2)\) process because new strings may be built for each intermediate step. The CPython 2.4 interpreter mitigates this issue somewhat; however, ’’.join(seq) remains the best practice.

The decode function appears to be \(O(n)\) due to the linear nature of processing each character exactly once.

NeetCode Solution

class Solution:

    def encode(self, strs: List[str]) -> str:
        res = ""
        for s in strs:
            res += str(len(s)) + "#" + s
        return res

    def decode(self, s: str) -> List[str]:
        res = []
        i = 0

        while i < len(s):
            j = i
            while s[j] != "#":
                j += 1
            length = int(s[i:j])
            i = j + 1
            j = i + length
            res.append(s[i:j])
            i = j

        return res

Conclusion

Although my original code passes, apparently it’s against the spirit of the exercise. A simple delimiter could suffer from ‘delimiter collision’. One solution to the delimiter collision problem is use an escape character (‘\’ is the escape character for new line). I tried a few and NeetCode accepted the ASCII escape character ‘\033’ but not ‘’, suggesting we got lucky with ‘’ not appearing in the test cases.

NeetCode’s solution to the encode-decode problem uses a simple length-prefixing approach instead of an escape character. The encode method takes the list of strings and concatenates each string with its length prefixed by a separator “#”. The decode method then reverses the encoding process, parsing the encoded string by iterating through it, extracting each string based on its prefixed length and the separator.

  • Time complexity: encode is possibly \(O(n^2)\); see above. decode is \(O(n)\)
  • Space complexity: both \(O(n)\)