How can I speed up this script?


Hi. I'm trying to filter bad words out of word lists that are typically 5 to 10 million lines long. I tried to speed the script up with threading, but after the first 20k words or so it keeps getting slower and slower. Why does this happen, and would it be faster if I switched to multiprocessing instead? I'm running this script on Ubuntu on a machine with 48 CPU cores and 200 GB of RAM.

from tqdm import tqdm
import queue
import threading

a = input("The List: ") + ".txt"
thr = input('Threads: ')
c = input("clear old[y]: ")
inputQueue = queue.Queue()

if c.lower() == 'y':  # clear the old results file
    open("goodWord.txt", 'w').close()

s = ["bad_word"]#bad words list

class myclass:
    def dem(self,my_word):
        for key in s:
            if key in my_word:
                return 1
        return 0

    def chk(self):
        while 1:
            # NOTE: re-reading the entire output file for every single word
            # is what makes the script slower as goodWord.txt grows.
            old = open("goodWord.txt", "r", encoding='utf-8', errors='ignore').readlines()
            my_word = inputQueue.get()
            if my_word not in old:
                rez = self.dem(my_word)
                if rez == 0:
                    sav = open("goodWord.txt", "a+")
                    sav.write(my_word + "\n")
                    sav.close()
            self.pbar.update(1)
            inputQueue.task_done()



    def run_thread(self):
        for y in tqdm(open(a, 'r',encoding='utf-8', errors='ignore').readlines()):
            inputQueue.put(y)

        tqdm.write("All in the Queue")
        self.pbar = tqdm(total=inputQueue.qsize(),unit_divisor=1000)
        for x in range(int(thr)):
            t = threading.Thread(target=self.chk)
            t.daemon = True
            t.start()
        inputQueue.join()

try:
    open("goodWord.txt", "a").close()
except OSError:
    open("goodWord.txt", "w").close()

old = open("goodWord.txt", "r", encoding='utf-8', errors='ignore').readlines()
worker = myclass()
worker.run_thread()



python
1 Answer

Out of curiosity, and for my own education, I wrote an almost identical (functionally) program:
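The answerer's program is not preserved in this copy of the page. A minimal, functionally similar sketch, assuming the same file names as the question (`goodWord.txt` and the input list), would avoid the original's main cost: instead of re-reading `goodWord.txt` and re-opening it for append on every word, it loads the already-seen words into a set once and streams the input in a single pass. The function name `filter_words` is illustrative, not from the original answer.

```python
# Hypothetical reconstruction: filter "bad" words from a large word list.
# Speed comes from reading each file once and using a set for O(1)
# membership tests, not from threads (the original work is I/O plus
# cheap string checks, so threading adds overhead without parallelism).

BAD_SUBSTRINGS = ["bad_word"]  # same placeholder list as the question

def filter_words(input_path, output_path):
    # Load previously saved good words once; the question's worker loop
    # re-read this file per word, which is why it slowed down over time.
    try:
        with open(output_path, encoding="utf-8", errors="ignore") as f:
            seen = set(line.rstrip("\n") for line in f)
    except FileNotFoundError:
        seen = set()

    with open(input_path, encoding="utf-8", errors="ignore") as src, \
         open(output_path, "a", encoding="utf-8") as dst:
        for line in src:
            word = line.rstrip("\n")
            if word in seen:
                continue  # skip duplicates
            if not any(bad in word for bad in BAD_SUBSTRINGS):
                dst.write(word + "\n")
            seen.add(word)
```

For lists of 5 to 10 million lines this single process should be I/O-bound; if the per-word check were genuinely CPU-heavy, `multiprocessing` (e.g. `Pool.imap` over chunks) would help where threads cannot, because of the GIL.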
