Write output from a loop to CSV

Question

I have a script that predicts product names from input files. The code is as follows:

import csv
import re

import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
with open('eng_productnames.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for rowz in reader:
        try:
            filenamez = rowz[1]
            file = open(DIR+filenamez, "r", encoding ='utf-8')
            filecontentszz = file.read()
            for s in filecontentszz:
                filecontentszz = re.sub(r'\s+', ' ', filecontentszz)
                #filecontents = filecontents.encode().decode('unicode-escape')
                filecontentszz = ''.join([line.lower() for line in filecontentszz]) 
                doc2 = nlp2(filecontentszz)
                for ent in doc2.ents:
                    print(filenamez, ent.label_, ent.text)

                break

        except Exception as e:
            print(e)  # the handler body is cut off in the original post; printing the error is an assumption

This gives me output as strings in the following form:

07-09-18 N021024s16PASBUNDLEACK - Acknowledgement P.txt PRODUCT ABC1
06-22-18 Letter from Supl.txt PRODUCT ABC2
06-22-18 Letter from Req to Change .txt PRODUCT ABC3

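Now I want to export all of these details to a CSV with 2 columns: one column for the file name and one for the product, with every file name and product name under its respective column header. In each output string, the product name comes right after the word PRODUCT. How can I do this?

The output CSV should look like this: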

Filename                                                             PRODUCT
  07-09-18 Acknowledgement P.txt                                 ABC1
  06-22-18 Letter Req to Change.txt                              ABC2

Answer 1

You can create a csv.writer and write each row to the output file with writerow instead of printing to the screen.

import csv
import re

import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
# newline='' on both files lets the csv module handle line endings itself
with open('eng_productnames.csv', newline='') as input_file, \
        open('output.csv', 'w', newline='') as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    writer.writerow(["Filename", "Product"])  # this is the header row
    for rowz in reader:
        try:
            filenamez = rowz[1]
            with open(DIR + filenamez, "r", encoding='utf-8') as file:
                filecontentszz = file.read()
            # collapse whitespace runs and lowercase the text before prediction
            filecontentszz = re.sub(r'\s+', ' ', filecontentszz).lower()
            doc2 = nlp2(filecontentszz)
            for ent in doc2.ents:
                # one CSV row per predicted entity
                writer.writerow([filenamez, ent.text])
        except Exception as e:
            # the question's handler body is cut off; reporting the error is an assumption
            print("Skipping row", rowz, e)

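I am assuming here that filenamez and ent.text contain the information you need for each column. If they do not, you can manipulate them to get what you need before writing to the CSV.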
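As a minimal sketch of that last point, here is the same writerow pattern without spaCy, so it can be run on its own. The `predictions` list is made-up stand-in data for the (filenamez, ent.text) pairs produced by the loop, and `write_predictions` is a hypothetical helper name, not something from the question. The clean-up step (strip and upper-case) is only an illustration of adjusting values before they are written.

```python
import csv
import io


def write_predictions(predictions, output_file):
    """Write (filename, product) pairs as CSV rows under a header row."""
    writer = csv.writer(output_file)
    writer.writerow(["Filename", "Product"])
    for filename, product in predictions:
        # Example clean-up before writing: trim whitespace, upper-case the product
        writer.writerow([filename.strip(), product.strip().upper()])


# Hypothetical stand-in for the entities predicted inside the loop
predictions = [
    ("06-22-18 Letter from Supl.txt", "abc2 "),
    ("06-22-18 Letter from Req to Change .txt", "abc3"),
]

# An in-memory buffer is used here; a real file opened with newline='' works the same way
buffer = io.StringIO(newline="")
write_predictions(predictions, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing the open file object into a helper like this keeps the writing logic separate from the prediction loop, which makes it easier to check the CSV layout on sample data first.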

Answer 2

There are many ways you can do this. The one I prefer is pandas, a powerful library for working with CSV files. You can create a dictionary:

predicted_products = {'FILENAME': [], 'PRODUCT': []}

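and, on each iteration, append the file name and the product to the corresponding lists.

Once that is done, convert predicted_products to a DataFrame and call its to_csv method: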

import pandas as pd
predicted_products_df = pd.DataFrame.from_dict(predicted_products)
# index=False leaves out the row-index column, so only the two columns are written
predicted_products_df.to_csv('your_path/file_name.csv', index=False)

I like this approach because it makes it easier to edit the data before saving the file.
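A small sketch of what such an edit might look like, assuming pandas is installed. The sample values in `predicted_products` are invented for illustration, and dropping duplicates and upper-casing the product names are just example edits, not steps the question asks for.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the lists filled inside the loop
predicted_products = {
    'FILENAME': ['a.txt', 'a.txt', 'b.txt'],
    'PRODUCT': ['abc1', 'abc1', 'abc2'],
}

df = pd.DataFrame.from_dict(predicted_products)
# Edit before saving: drop repeated rows and upper-case the product names
df = df.drop_duplicates().reset_index(drop=True)
df['PRODUCT'] = df['PRODUCT'].str.upper()

# Write to an in-memory buffer here; a file path works the same way
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```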

To adapt your existing code, I assume that print(filenamez, ent.label_, ent.text) is what prints the output. If so:

import csv
import re

import pandas as pd
import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
predicted_products = {'FILENAME': [], 'PRODUCT': []}
with open('eng_productnames.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for rowz in reader:
        try:
            filenamez = rowz[1]
            with open(DIR + filenamez, "r", encoding='utf-8') as file:
                filecontentszz = file.read()
            # collapse whitespace runs and lowercase the text before prediction
            filecontentszz = re.sub(r'\s+', ' ', filecontentszz).lower()
            doc2 = nlp2(filecontentszz)
            for ent in doc2.ents:
                print(filenamez, ent.label_, ent.text)
                # the label (PRODUCT) is already the column header, so store only the file name
                predicted_products['FILENAME'].append(filenamez)
                predicted_products['PRODUCT'].append(ent.text)
        except Exception as e:
            # the handler body is cut off in the original; reporting the error is an assumption
            print("Skipping row", rowz, e)

predicted_products_df = pd.DataFrame.from_dict(predicted_products)
# index=False leaves out the row-index column, so only the two columns are written
predicted_products_df.to_csv('your_path/file_name.csv', index=False)
