How to iterate over a long text using threads?

Question · votes: 1 · answers: 1

I want to loop over a long word/text letter by letter. Since that may be too slow, I want to use threads: split the text into chunks and assign one chunk to each thread.

How do I do this? I have already divided the text into chunks, but how do I assign them to the threads?

For example: I have a word with 46 letters. If my argument value is 46, I want to give each of the 46 threads one letter of the word.

My code:

import logging
import threading
import time
import concurrent.futures
import sys

class MyClass:

    global arg
    arg = int(sys.argv[1])
    global word
    word = 'Pneumonoultramicroscopicsilicovolcanoconiosiss'
    def update(self, name):
        upp = len(word) // arg + len(word) % arg
        for w in word[0:upp]:
            logging.info("Thread %s: starting update", name)
            time.sleep(0.1)

            print(w)
            logging.info("Thread %s: finishing update \n", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    database = MyClass()
    print("Starting...")
    with concurrent.futures.ThreadPoolExecutor(max_workers=arg) as executor:
        for index in range(1):
            executor.submit(database.update, index)
    print("Done!")

And when I set the range number to something greater than 1, unfortunately the letters are just printed again, but I want to give each thread its own chunk of a few letters so the work goes faster. Any help is much appreciated, thank you.
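A minimal sketch of the chunk-per-thread assignment described above, using the `concurrent.futures` executor already imported in the code (the `chunks` helper and the hard-coded `number_of_threads` stand in for `int(sys.argv[1])` and are assumptions, not part of the original code):

```python
import concurrent.futures

word = 'Pneumonoultramicroscopicsilicovolcanoconiosiss'
number_of_threads = 4  # stand-in for int(sys.argv[1])

def chunks(text, n):
    """Split text into n nearly equal pieces; the first pieces absorb the remainder."""
    size, rem = divmod(len(text), n)
    pieces = []
    start = 0
    for i in range(n):
        stop = start + size + (1 if i < rem else 0)
        pieces.append(text[start:stop])
        start = stop
    return pieces

def update(chunk):
    # each thread walks only its own slice of the word
    for letter in chunk:
        print(letter)

with concurrent.futures.ThreadPoolExecutor(max_workers=number_of_threads) as executor:
    # map submits one task per chunk, so every thread gets a distinct slice
    executor.map(update, chunks(word, number_of_threads))
```

With `number_of_threads` equal to `len(word)`, each chunk is a single letter, which matches the 46-threads example above.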

python python-3.x multithreading python-multithreading
1 Answer

0 votes

If you want python threads to process the large text in chunks, you can use python's threading module.

Try this:

import threading

def process_text(text):
    print(text)

if __name__ == "__main__":
    large_text = "Pneumonoultramicroscopicsilicovolcanoconiosiss"

    # number of parallel threads
    number_of_threads = 4

    # size of each chunk: len(large_text) // number_of_threads
    chunk_size = len(large_text) // number_of_threads

    start = 0
    stop = chunk_size
    for i in range(number_of_threads):
        part_text_to_process = large_text[start:stop]
        p = threading.Thread(target=process_text, args=(part_text_to_process,))
        p.start()

        start = stop
        # the last chunk runs to the end of the text, absorbing any
        # remainder left over by the integer division above
        stop = len(large_text) if i == number_of_threads - 2 else stop + chunk_size

That said, I would suggest avoiding python threads here: because of python's GIL, threads give no speedup on CPU-bound tasks.
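To illustrate that last point, here is a hedged sketch of the same chunking done with the `multiprocessing` module instead, which runs each chunk in a separate process and so can actually parallelize CPU-bound work (`count_vowels` is a made-up stand-in task, not something from the question):

```python
import multiprocessing

def count_vowels(chunk):
    # stand-in CPU-bound work performed on each chunk
    return sum(letter in "aeiou" for letter in chunk)

if __name__ == "__main__":
    large_text = "Pneumonoultramicroscopicsilicovolcanoconiosiss"
    number_of_processes = 4
    chunk_size = len(large_text) // number_of_processes
    # the last chunk absorbs the remainder of the integer division
    chunks = [large_text[i * chunk_size:(i + 1) * chunk_size]
              for i in range(number_of_processes - 1)]
    chunks.append(large_text[(number_of_processes - 1) * chunk_size:])
    with multiprocessing.Pool(number_of_processes) as pool:
        results = pool.map(count_vowels, chunks)
    # summing the per-chunk results gives the same total as one pass
    print(sum(results))
```

The structure is the same as the threaded version; only the pool type changes.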
