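I am trying to solve a sorting problem from HackerRank in Python 3: https://www.hackerrank.com/challenges/fraudulent-activity-notifications/problem
The problem requires finding the median of each running sublist.
My code passes the sample test cases, but it does not fully pass the real test cases because it is terminated due to timeout. I suspect that calling `sort()` on every iteration to find the median is what causes the delay.
How can I improve my code?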

    def activityNotifications(expenditure, d):
        totalDays = len(expenditure)
        notified = 0
        for x in range(d, totalDays):
            check = expenditure[x-d:x]
            check.sort()
            if d % 2 == 0:
                median = (check[int(d/2)] + check[int((d-2)/2)])/2
            else:
                median = check[int((d-1)/2)]
            if expenditure[x] >= median * 2:
                notified += 1
        return notified

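The approach above is slow when `d` is not small: each iteration takes O(d log d) time. To find the median we need the values in sorted order, but we do not need the `sort()` method. Since every `expenditure[i]` lies in the range [0; 200], counting sort is a good fit here. Basically, we use `counts[i]` to store how many times each number `i` occurs. To read the values back in sorted order, we only need to iterate over the indices `j` with `counts[j] > 0`.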
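The counting idea can be sketched on its own. This is a minimal illustration, not part of the original answer: the function name `counting_sort` and the `max_value` parameter are made up for the example, and it assumes every value is an integer in [0; max_value].

```python
def counting_sort(values, max_value=200):
    """Return a sorted copy of values, assuming ints in [0, max_value]."""
    # counts[v] holds how many times v occurs in values.
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1

    # Walking the indices in increasing order yields the sorted sequence.
    result = []
    for j, c in enumerate(counts):
        result.extend([j] * c)
    return result


print(counting_sort([2, 3, 4, 2, 3]))  # [2, 2, 3, 3, 4]
```

The cost is one pass over the input plus one pass over the 201 possible values, independent of how the input is ordered.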
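So, if `counts` holds the frequencies of the numbers in each interval [i; i + d) of `expenditure` (an interval of length `d`), we can find the median by checking at most 201 entries of `counts` (see the code for details). Moving to the next interval [i + 1; i + d + 1) requires decrementing the frequency of the number leaving the window, `counts[expenditure[i]] -= 1`, and incrementing the frequency of the number entering it, `counts[expenditure[i + d]] += 1`. This approach takes O(n * 201) time and O(201) space.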
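The two steps described above, reading a median out of the frequency table and sliding the window by one position, can be sketched in isolation. The helper name `median_from_counts` is hypothetical, and it locates the middle elements by rank, which is one possible formulation rather than necessarily the exact bookkeeping used in the answer's code.

```python
def median_from_counts(counts, d):
    """Median of the d values summarised by counts (counts[v] = occurrences of v)."""
    # 0-based ranks of the two middle elements; they coincide when d is odd.
    lo_rank, hi_rank = (d - 1) // 2, d // 2
    lo_value = None
    seen = 0  # how many values are <= the current index
    for value, c in enumerate(counts):
        seen += c
        if lo_value is None and seen > lo_rank:
            lo_value = value
        if seen > hi_rank:
            return (lo_value + value) / 2
    raise ValueError("counts holds fewer than d values")


expenditure = [2, 3, 4, 2, 3, 6, 8, 4, 5]
d = 5

# Build the table for the first window [0; d).
counts = [0] * 201
for v in expenditure[:d]:
    counts[v] += 1
print(median_from_counts(counts, d))  # median of [2, 3, 4, 2, 3]

# Slide to [1; d + 1): one value leaves, one value enters.
counts[expenditure[0]] -= 1
counts[expenditure[d]] += 1
print(median_from_counts(counts, d))  # median of [3, 4, 2, 3, 6]
```

Each slide touches only two entries of `counts`, so the per-day cost is dominated by the scan over at most 201 entries.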
Now, see the code below:

    def activityNotifications(expenditure, d):
        totalDays = len(expenditure)
        counts = [0] * 201  # counts[v] = occurrences of v in the current window
        notifications = 0

        for i in range(totalDays):
            # now we have enough data to check if there was any fraudulent activity
            if i >= d:
                # let's count frequencies of numbers in range [i - d; i)
                current_num_of_numbers = 0
                prev_number = -1
                for j in range(201):
                    if counts[j] > 0:
                        current_num_of_numbers += counts[j]
                        # now we can determine the median because we have enough numbers
                        if d < (2 * current_num_of_numbers):
                            if (d % 2 == 0) and (current_num_of_numbers - counts[j] == d // 2):
                                median = (prev_number + j) / 2
                            else:
                                median = j
                            # if the condition is met then send a notification
                            if expenditure[i] >= (median * 2):
                                notifications += 1
                            break
                        prev_number = j
                # the oldest day leaves the window
                counts[expenditure[i - d]] -= 1
            # the current day enters the window
            counts[expenditure[i]] += 1
        return notifications

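One way to gain some confidence in a solution like this is to compare it against a slow but simple reference on small random inputs. The sketch below is only a suggestion: `reference_notifications` and `cross_check` are hypothetical names, the reference uses `statistics.median` from the standard library, and the random inputs assume values in [0; 200] with 1 <= d <= n.

```python
import random
from statistics import median


def reference_notifications(expenditure, d):
    """Slow reference: recompute the median of every trailing window of d days."""
    return sum(
        1
        for i in range(d, len(expenditure))
        if expenditure[i] >= 2 * median(expenditure[i - d:i])
    )


def cross_check(candidate, trials=200, seed=0):
    """Return the first input where candidate disagrees with the reference, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 30)
        d = rng.randint(1, n)
        data = [rng.randint(0, 200) for _ in range(n)]
        expected = reference_notifications(data, d)
        actual = candidate(list(data), d)  # pass a copy in case candidate mutates it
        if actual != expected:
            return data, d, expected, actual
    return None


print(reference_notifications([2, 3, 4, 2, 3, 6, 8, 4, 5], 5))  # 2
```

Calling `cross_check(activityNotifications)` with the function under test should return `None` when the two implementations agree on every generated input; otherwise it returns the first failing case.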