假设我有以下算法来biject
bytes -> Natural
:
def rank(s: bytes) -> int:
k = 2**8
result = 0
offset = 0
for i, w in enumerate(s):
result *= k
result += w
offset += (k**i)
return result + offset
解码器(至少在我有限的能力范围内)如下:
def unrank(value: int) -> bytes:
k = 2**8
# 1. Get length
import itertools
offset = 0
for length in itertools.count(): #! LOOP RUNS O(N) TIMES !#
offset += (k**length) #! LONG ADDITION IS O(N) !#
if offset > value:
value = value - (offset - k**length)
break
# 2. Get value
result = bytearray(length)
for i in reversed(range(length)):
value, result[i] = divmod(value, k) # (Can be done with bit shifts, ignore for complexity)
return bytes(result)
令
N ≈ len(bytes) ≈ log(int)
可知,该解码器的最坏情况运行时间显然为 O(N^2)
。当然,它的性能很好(<2s runtime) for practical cases (≤32KiB of data), but I'm still curious if it's fundamentally possible to beat that into something that swells less as the inputs get bigger.
# Example / test cases:
assert rank(b"") == 0
assert rank(b"\x00") == 1
assert rank(b"\x01") == 2
...
assert rank(b"\xFF") == 256
assert rank(b"\x00\x00") == 257
assert rank(b"\x00\x01") == 258
...
assert rank(b"\xFF\xFF") == 65792
assert rank(b"\x00\x00\x00") == 65793
assert unrank(0) == b""
assert unrank(1) == b"\x00"
assert unrank(2) == b"\x01"
# ...
assert unrank(256) == b"\xFF"
assert unrank(257) == b"\x00\x00"
assert unrank(258) == b"\x00\x01"
# ...
assert unrank(65792) == b"\xFF\xFF"
assert unrank(65793) == b"\x00\x00\x00"
assert unrank(2**48+1) == b"\xFE\xFE\xFE\xFE\xFF\x00"
如果你这样写
rank
函数会更清楚:
def rank(s: bytes) -> int:
k = 2**8
result = 0
for i, w in enumerate(s):
result *= k
result += w + 1
return result
...你可以这样写
unrank
:
def unrank(value: int) -> bytes:
k = 2**8
ret = bytearray(0)
while value > 0:
value -= 1
value, digit = divmod(value, k)
ret.append(digit)
ret.reverse()
return bytes(ret)
(感谢提供测试用例)