Hi, I'm trying to filter bad words out of word lists with this script. The lists I usually feed it are 5 to 10 million lines long. I tried to make it faster with threading, but after the first ~20k words it gets slower and slower. Why does that happen, and would switching to multiprocessing make it faster? I'm running this script on Ubuntu with 48 CPU cores and 200 GB of RAM.
from tqdm import tqdm
import queue
import threading

a = input("The List: ") + ".txt"
thr = input('Threads: ')
c = input("clear old[y]: ")
inputQueue = queue.Queue()

if c == 'y' or c == 'Y':  # clear the old output file
    open("goodWord.txt", 'w').close()

s = ["bad_word"]  # bad words list

class myclass:
    def dem(self, my_word):
        for key in s:
            if key in my_word:
                return 1
        return 0

    def chk(self):
        while 1:
            old = open("goodWord.txt", "r", encoding='utf-8', errors='ignore').readlines()
            my_word = inputQueue.get()
            if my_word not in old:
                rez = self.dem(my_word)
                if rez == 0:
                    sav = open("goodWord.txt", "a+")
                    sav.write(my_word + "\n")
                    sav.close()
                self.pbar.update(1)
            else:
                self.pbar.update(1)
            inputQueue.task_done()

    def run_thread(self):
        for y in tqdm(open(a, 'r', encoding='utf-8', errors='ignore').readlines()):
            inputQueue.put(y)
        tqdm.write("All in the Queue")
        self.pbar = tqdm(total=inputQueue.qsize(), unit_divisor=1000)
        for x in range(int(thr)):
            t = threading.Thread(target=self.chk)
            t.daemon = True
            t.start()
        inputQueue.join()

try:
    open("goodWord.txt", "a")
except:
    open("goodWord.txt", "w")
old = open("goodWord.txt", "r", encoding='utf-8', errors='ignore').readlines()

myclass = myclass()
myclass.run_thread()
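A likely cause of the progressive slowdown is that `chk()` re-reads the entire `goodWord.txt` file for every single queued word, so the cost of each check grows with the number of good words already written (roughly quadratic overall); the GIL and contention on the shared output file make threads of little help for this workload. A minimal single-process sketch of the same logic that keeps the already-seen words in an in-memory set instead (function and variable names here are illustrative, not from the original script):

```python
# Same dedup + bad-substring filter, but with an O(1) in-memory
# membership test instead of re-reading the output file per word.
bad_words = ["bad_word"]  # same bad-words list as `s` above

def is_bad(word, bad_words):
    # True if any bad substring occurs in the word
    return any(key in word for key in bad_words)

def filter_words(lines, bad_words):
    seen = set()   # already-accepted words; replaces re-reading goodWord.txt
    good = []
    for raw in lines:
        word = raw.strip()
        if word and word not in seen and not is_bad(word, bad_words):
            seen.add(word)
            good.append(word)
    return good

print(filter_words(["apple\n", "bad_word1\n", "apple\n", "pear\n"], bad_words))
# → ['apple', 'pear']
```

With the duplicate check reduced to a set lookup, 5-10 million lines should stream through in a single pass; if the substring checks themselves ever become the bottleneck, a `multiprocessing.Pool` over input chunks could use the 48 cores, but the repeated file reads, not CPU, are the probable culprit here.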
Out of curiosity and for my own education, I wrote an almost identical (functionally) program: