How do I iterate over a long text using threads?

Question

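I want to iterate over a long word/text letter by letter in a loop. I would like to use threads for this, because a single loop might be too slow. My idea is to split the text into chunks and hand one chunk to each thread.

How do I do that? I have already split the text into chunks, but how do I assign them to the individual threads?

For example: I have a word with 46 letters. If my argument value is 46, I want each of the 46 threads to receive one letter of the word.

My code: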

import logging
import threading
import time
import concurrent.futures
import sys

class MyClass:

    global arg
    arg = int(sys.argv[1])
    global word
    word = 'Pneumonoultramicroscopicsilicovolcanoconiosiss'
    def update(self, name):
        upp = len(word) // arg + len(word) % arg
        for w in word[0:upp]:
            logging.info("Thread %s: starting update", name)
            time.sleep(0.1)

            print(w)
            logging.info("Thread %s: finishing update \n", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    database = MyClass()
    print("Starting...")
    with concurrent.futures.ThreadPoolExecutor(max_workers=arg) as executor:
        for index in range(1):
            executor.submit(database.update, index)
    print("Done!")

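When I set the range to a value greater than 1, the same letters are unfortunately just printed again by every thread. What I want instead is to give each thread its own chunk of a few letters so that the work finishes faster. Any help is much appreciated, thanks.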

Answer 1

If you want Python threads to process a large text in chunks, you can use Python's threading module.

Try this:

import threading

def process_text(text):
    print(text)

if __name__ == "__main__":
    large_text = "Pneumonoultramicroscopicsilicovolcanoconiosiss"

    # Number of threads to run in parallel
    number_of_threads = 4

    # Size of each chunk: len(text) // 4; the last chunk also takes the remainder
    chunk_size = len(large_text) // number_of_threads

    threads = []
    start = 0
    for i in range(number_of_threads):
        # The last thread gets everything that is left
        stop = len(large_text) if i == number_of_threads - 1 else start + chunk_size
        p = threading.Thread(target=process_text, args=(large_text[start:stop],))
        p.start()
        threads.append(p)
        start = stop

    # Wait for every thread to finish
    for p in threads:
        p.join()

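That said, I would recommend against Python threads for this kind of work: because of the GIL (global interpreter lock) in CPython, threads give no speedup for CPU-bound tasks.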
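
Since the question already imports concurrent.futures, the same chunking idea can also be sketched with ThreadPoolExecutor. This is only a minimal sketch: the helper names split_into_chunks, process_chunk and process_text_in_chunks are made up for illustration, and uppercasing each letter is a placeholder for whatever per-letter work is really needed. The remainder is spread over the first chunks, so an argument of 46 on a 46-letter word gives each thread exactly one letter.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, n):
    """Split text into at most n contiguous chunks whose lengths differ by at most one."""
    n = max(1, min(n, len(text)))  # never create more chunks than characters
    base, extra = divmod(len(text), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks take one additional character.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(text[start:stop])
        start = stop
    return chunks


def process_chunk(chunk):
    """Placeholder work (an assumption): visit each letter and uppercase it."""
    return "".join(letter.upper() for letter in chunk)


def process_text_in_chunks(text, workers):
    """Give each worker thread its own chunk and reassemble the results in order."""
    chunks = split_into_chunks(text, workers)
    with ThreadPoolExecutor(max_workers=len(chunks)) as executor:
        # executor.map yields results in the same order as the input chunks.
        results = list(executor.map(process_chunk, chunks))
    return "".join(results)


if __name__ == "__main__":
    word = "Pneumonoultramicroscopicsilicovolcanoconiosiss"
    print(split_into_chunks(word, 4))
    print(process_text_in_chunks(word, 4))
```

Each thread only ever sees its own slice, so no letters are repeated, and executor.map keeps the results in chunk order. The GIL caveat above still applies to CPU-bound work.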
