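I want to iterate over a long word/text letter by letter in a loop. But I want to use threads, because it might be too slow otherwise. I want to split the text into chunks and assign one chunk to each thread.
How can I do that? I have already split the text into chunks, but how do I assign them to the threads?
For example: I have a word with 46 letters. If my argument value is 46, I want to give each of the 46 threads one letter of the word.
My code: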
import logging
import threading
import time
import concurrent.futures
import sys

class MyClass:
    global arg
    arg = int(sys.argv[1])
    global word
    word = 'Pneumonoultramicroscopicsilicovolcanoconiosiss'

    def update(self, name):
        upp = len(word) // arg + len(word) % arg
        for w in word[0:upp]:
            logging.info("Thread %s: starting update", name)
            time.sleep(0.1)
            print(w)
            logging.info("Thread %s: finishing update \n", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")
    database = MyClass()
    print("Starting...")
    with concurrent.futures.ThreadPoolExecutor(max_workers=arg) as executor:
        for index in range(1):
            executor.submit(database.update, index)
    print("Done!")
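And when I set the range to a number greater than 1, unfortunately the same letters just get printed again by every thread. What I want is to give each thread its own chunk of a few letters, so that the work goes faster. I would really appreciate your help, thank you.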
If you want Python threads to process a large text in chunks, you can use Python's threading module.
Try this:
import threading

def process_text(text):
    print(text)

if __name__ == "__main__":
    large_text = "Pneumonoultramicroscopicsilicovolcanoconiosiss"
    # Number of threads to run in parallel
    number_of_threads = 4
    # Size of each chunk: len(large_text) // 4
    chunk_size = len(large_text) // number_of_threads
    threads = []
    start = 0
    for i in range(number_of_threads):
        # The last thread takes everything up to the end of the text,
        # including any leftover characters
        stop = len(large_text) if i == number_of_threads - 1 else start + chunk_size
        part_text_to_process = large_text[start:stop]
        p = threading.Thread(target=process_text, args=(part_text_to_process,))
        p.start()
        threads.append(p)
        start = stop
    # Wait for all threads to finish
    for p in threads:
        p.join()
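Since the question already uses concurrent.futures, the same chunking idea could also be sketched with ThreadPoolExecutor. This is only a rough sketch: split_into_chunks, process_chunk and process_in_threads are hypothetical helper names made up for illustration, and upper() merely stands in for whatever the real per-chunk work is.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, n):
    """Split text into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(text), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks get one additional character each
        stop = start + base + (1 if i < extra else 0)
        chunks.append(text[start:stop])
        start = stop
    return chunks


def process_chunk(chunk):
    # Placeholder work (an assumption): replace with the real processing
    return chunk.upper()


def process_in_threads(text, n):
    chunks = split_into_chunks(text, n)
    with ThreadPoolExecutor(max_workers=n) as executor:
        # executor.map yields results in the same order as the input chunks
        return list(executor.map(process_chunk, chunks))


if __name__ == "__main__":
    word = "Pneumonoultramicroscopicsilicovolcanoconiosiss"
    print(process_in_threads(word, 4))
```

With n equal to the length of the word, every chunk is a single letter, which matches the "46 threads, one letter each" case from the question.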
That said, I would suggest avoiding Python threads for this, because due to Python's GIL, threads bring no benefit for CPU-bound tasks.
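If the per-chunk work really is CPU-bound, one possible alternative is concurrent.futures.ProcessPoolExecutor, which runs the chunks in separate processes instead of threads. The sketch below is only an illustration under assumptions: count_vowels is a made-up stand-in for the heavy work, and split_text and count_vowels_parallel are hypothetical helper names. For a text as short as this word, the cost of starting processes will likely outweigh any gain.

```python
from concurrent.futures import ProcessPoolExecutor


def count_vowels(chunk):
    # Stand-in (an assumption) for CPU-heavy work on one chunk of the text
    return sum(1 for ch in chunk if ch.lower() in "aeiou")


def split_text(text, n):
    # Ceiling division, so that at most n chunks cover the whole text
    size = max(1, -(-len(text) // n))
    return [text[i:i + size] for i in range(0, len(text), size)]


def count_vowels_parallel(text, workers):
    chunks = split_text(text, workers)
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # Each chunk is pickled and sent to a worker process
        return sum(executor.map(count_vowels, chunks))


if __name__ == "__main__":
    # This guard matters on platforms that start workers by re-importing the module
    word = "Pneumonoultramicroscopicsilicovolcanoconiosiss"
    print(count_vowels_parallel(word, 4))
```

The worker function has to be defined at module level so that it can be pickled and found by the worker processes.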