import logging
from typing import List
Problem Description
Difficulty: Medium
Design an algorithm to encode a list of strings to a single string. The encoded string is then decoded back to the original list of strings.
Please implement encode
and decode
Example 1:
Input: ["neet","code","love","you"]
Output:["neet","code","love","you"]
Example 2:
Input: ["we","say",":","yes"]
Output: ["we","say",":","yes"]
Constraints:
0 <= strs.length < 100
0 <= strs[i].length < 200
strs[i] contains only UTF-8 characters.
Initial Solution
Not sure why this was classed as medium. Am I missing something here? (Apparently so). I chose to use a new line "\n"
as the delimiter between words, and the submission passed on NeetCode. I’m not sure it would on LeetCode, but it’s a premium problem there, so not free to check.
# Set up logging config (global scope)
=logging.WARNING, format="%(levelname)s: %(message)s")
logging.basicConfig(level= logging.getLogger(__name__) logger
class InitialSolution:
def __init__(self):
self.logger = logger
def encode(self, strs: List[str]) -> str:
= ""
encoded
for string in strs:
+= string + "\n"
encoded self.logger.debug(f"encoded string: {encoded}")
return encoded
def decode(self, s: str) -> List[str]:
= []
decoded = ""
word
for letter in s:
+= letter
word self.logger.debug(f"added letter to word; word = {word}")
if letter == "\n":
0:-1])
decoded.append(word[self.logger.debug(f"added word to final list: {decoded}")
= ""
word
return decoded
= ["neet", "code", "love", "you"]
strs # Expected output:["neet","code","love","you"]
= InitialSolution()
solution # suppress debugging output to avoid the mess when using '/n'
logger.setLevel(logging.WARNING)
= solution.encode(strs)
string string
'neet\ncode\nlove\nyou\n'
logger.setLevel(logging.DEBUG) solution.decode(string)
DEBUG: added letter to word; word = n
DEBUG: added letter to word; word = ne
DEBUG: added letter to word; word = nee
DEBUG: added letter to word; word = neet
DEBUG: added letter to word; word = neet
DEBUG: added word to final list: ['neet']
DEBUG: added letter to word; word = c
DEBUG: added letter to word; word = co
DEBUG: added letter to word; word = cod
DEBUG: added letter to word; word = code
DEBUG: added letter to word; word = code
DEBUG: added word to final list: ['neet', 'code']
DEBUG: added letter to word; word = l
DEBUG: added letter to word; word = lo
DEBUG: added letter to word; word = lov
DEBUG: added letter to word; word = love
DEBUG: added letter to word; word = love
DEBUG: added word to final list: ['neet', 'code', 'love']
DEBUG: added letter to word; word = y
DEBUG: added letter to word; word = yo
DEBUG: added letter to word; word = you
DEBUG: added letter to word; word = you
DEBUG: added word to final list: ['neet', 'code', 'love', 'you']
['neet', 'code', 'love', 'you']
Here’s a cleaner version that has the same approach. A few if
statements are needed to account for edge cases on LeetCode. However, these cause an inconsistency between the type hints in the function definition and the actual type of the variables. The encode function now returns Union[List[str], str]
rather than the original str
. Not sure if that would be acceptable either.
class AlternativeSolution:
def encode(self, strs: List[str]) -> str: # incorrect type annotation
if strs == [""]:
return [""]
if strs == []:
return []
return "\n".join(strs)
def decode(self, s: str) -> List[str]: # incorrect type annotation
if s == [""]:
return [""]
if s == []:
return []
return s.split("\n")
Initial Results
My intial solution passed the LeetCode submission. I had a little trouble determining the worst-case complexity here. It appears that the intial encode
function may be \(O(n^2)\) due to the repeated cost of string concatenation*. The alternative solution may avoid the quadratic time by using use Python’s better optimized join
operation.
*Per this source:
String concatenation is best done with ’‘.join(seq) which is an \(O(n)\) process. In contrast, using the’+’ or ‘+=’ operators can result in an \(O(n^2)\) process because new strings may be built for each intermediate step. The CPython 2.4 interpreter mitigates this issue somewhat; however, ’’.join(seq) remains the best practice.
The decode
function appears to be \(O(n)\) due to the linear nature of processing each character exactly once.
NeetCode Solution
class Solution:
def encode(self, strs: List[str]) -> str:
= ""
res for s in strs:
+= str(len(s)) + "#" + s
res return res
def decode(self, s: str) -> List[str]:
= []
res = 0
i
while i < len(s):
= i
j while s[j] != "#":
+= 1
j = int(s[i:j])
length = j + 1
i = i + length
j
res.append(s[i:j])= j
i
return res
Conclusion
Although my original code passes, apparently it’s against the spirit of the exercise. A simple delimiter could suffer from ‘delimiter collision’. One solution to the delimiter collision problem is use an escape character (‘\’ is the escape character for new line). I tried a few and NeetCode accepted the ASCII escape character ‘\033’ but not ‘’, suggesting we got lucky with ‘’ not appearing in the test cases.
NeetCode’s solution to the encode-decode problem uses a simple length-prefixing approach instead of an escape character. The encode
method takes the list of strings and concatenates each string with its length prefixed by a separator “#”. The decode
method then reverses the encoding process, parsing the encoded string by iterating through it, extracting each string based on its prefixed length and the separator.
- Time complexity:
encode
is possibly \(O(n^2)\); see above.decode
is \(O(n)\) - Space complexity: both \(O(n)\)