Finding a list of substrings in a huge string

Question

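I have a fairly long text (a response) of nearly 600 million characters, and I want to check which of 90,000 names (pmids) occur in it. I want a Python list of the names that are present in the text. This code works, but it takes several hours: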

  for id in pmcids:
      if id in response.text:
          downloadable_pmcids.append(id)

How can I reduce the running time?

Answer 1

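Optimizing substring search in Python, with an example:

To speed up finding a list of substrings in a large string in Python, let's restructure the code around more suitable data structures. Here is an example that demonstrates the optimization:

Python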

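# Original code

downloadable_pmcids = []
for id in pmcids:
    if id in response.text:
        downloadable_pmcids.append(id)

# Optimized code

# Convert the pmcids list to a set so that duplicate names are searched only once
pmcids_set = set(pmcids)
# Read response.text once (assumption: response is a requests.Response,
# whose .text property decodes the body again on every access)
text = response.text
# Use a list comprehension for filtering
downloadable_pmcids = [pmcid for pmcid in pmcids_set if pmcid in text]

# Example data
pmcids = ['123', '456', '789', '1011']
response_text = "This is a sample text containing pmcids 123 and 789."

# Optimized search on the example data (rebuild the set from the example list)
pmcids_set = set(pmcids)
downloadable_pmcids_optimized = [pmcid for pmcid in pmcids_set if pmcid in response_text]

# Contains '123' and '789'; the order may vary because sets are unordered
print(downloadable_pmcids_optimized)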

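In this example:

We convert the pmcids list to a set, pmcids_set, and use a list comprehension to keep the names that occur in response_text. The set removes duplicate names, so each name is searched for only once. It does not make the test itself faster: checking whether a name is in response_text is still a substring scan of the string, not a set lookup.

The list comprehension avoids a call to append on every hit, which is a small saving.

These changes trim some overhead, but the running time is still dominated by scanning the whole text once per name: roughly 90,000 passes over nearly 600 million characters. A substantial reduction requires reading the text once and checking each candidate token against the set.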
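To make set lookups do the real work, the text can be scanned a single time and every token tested for membership in the set of names. The following is only a minimal sketch under stated assumptions: the function name find_present_ids and its pattern parameter are hypothetical, and it assumes each name consists of word characters and appears in the text delimited by non-word characters. Unlike the substring test in the question, it will not report '123' when the text only contains '1234'.

```python
import re


def find_present_ids(text, ids, pattern=r"\w+"):
    """Return a sorted list of the ids that occur in text as whole tokens.

    Hypothetical helper: `pattern` describes what a candidate token looks
    like. The default (runs of word characters) is an assumption about the
    data and should be narrowed if the real ids have a known shape.
    """
    wanted = set(ids)   # O(1) average-time membership tests
    found = set()
    # One pass over the text: each token is checked against the set.
    for match in re.finditer(pattern, text):
        token = match.group()
        if token in wanted:
            found.add(token)
    return sorted(found)


pmcids = ['123', '456', '789', '1011']
response_text = "This is a sample text containing pmcids 123 and 789."
print(find_present_ids(response_text, pmcids))
```

With this shape, the work grows with the length of the text rather than with the number of names multiplied by the length of the text. If the names can appear embedded inside longer tokens, this token-based sketch does not apply and a multi-pattern search algorithm such as Aho-Corasick would be the usual alternative.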
