How do I append rows to an xlsx file when scraping with BeautifulSoup and pandas?

So, I've been searching and I can't seem to figure out why I can't write the results of my scrape to an xlsx file.

If I run this, it prints perfectly:

import csv
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

with open('G-Sauce_Urls.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    for line in csv_reader:
        # Each CSV row holds one URL in its first column.
        r = requests.get(line[0]).text

        soup = BeautifulSoup(r, 'lxml')
        business = soup.find('title')
        companys = business.get_text()
        phones = soup.find_all(text=re.compile("Call (.*)"))
        Website = soup.select('head > link:nth-child(4)')
        profile = Website[0].attrs['href']

        # A new one-row DataFrame is built on every pass, replacing the previous one.
        data = {'Required': [companys], 'Required_no_Email': [phones], 'Business_Fax': [profile]}
        df = pd.DataFrame(data, columns=['Required', 'First', 'Last', 'Required_no_Email', 'Business_Fax'])

But I can't seem to append it to an xlsx file. I only get the last result, and I think that's because it just "writes" instead of appending. I tried:

writer = pd.ExcelWriter("ProspectUploadSheetRob.xlsx", engine='xlsxwriter', mode='a')
df.to_excel(writer, sheet_name='Sheet1', index=False, startrow=4, header=3)

workbook  = writer.book
worksheet = writer.sheets['Sheet1']

and

with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False, startrow=4, header=3)

and

df = pd.DataFrame(data, columns = ['Required','First', 'Last', 'Required_no_Email', 'Business_Fax'])
writer = pd.ExcelWriter("ProspectUploadSheetRob.xlsx", engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False, startrow=4, header=3)

I started reading about openpyxl, but at this point I'm confused and I don't understand it.

Thanks for all your help.

python-3.x pandas openpyxl pandas.excelwriter
1 Answer

The writer only produces output when you run:

writer.save()

I have similar code that opens the file with the following arguments, and it works fine:

writer = pd.ExcelWriter(r'path_to_file.xlsx', engine='xlsxwriter')
... all my modifications ...
writer.save()
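
Applied to the loop in the question, this could look roughly like the sketch below: rows are collected in a plain list while scraping, and the DataFrame is built and written once at the end. The helper names `collect_rows` and `write_rows` are hypothetical, and the scraped values are passed in as simple tuples rather than fetched from the web. A `with` block is used so that the writer is closed automatically; as far as I know, newer pandas releases no longer offer `writer.save()` and expect `writer.close()` or a `with` block instead.

```python
COLUMNS = ["Required", "First", "Last", "Required_no_Email", "Business_Fax"]


def collect_rows(scraped):
    """Turn (company, phones, profile) tuples into one dict per spreadsheet row."""
    rows = []
    for company, phones, profile in scraped:
        rows.append({
            "Required": company,
            # Several phone matches are joined into a single cell.
            "Required_no_Email": ", ".join(phones),
            "Business_Fax": profile,
        })
    return rows


def write_rows(rows, path):
    """Write all rows in one go; needs pandas and xlsxwriter installed."""
    import pandas as pd  # imported here so collect_rows can be tried without pandas

    # 'First' and 'Last' are not in the rows, so they come out as empty columns.
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Leaving the with block closes the writer, which is what saves the file.
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False, startrow=4)


# Made-up sample data standing in for the scraped pages.
rows = collect_rows([
    ("Acme", ["Call 555-0100"], "https://example.com/acme"),
    ("Beta", [], "https://example.com/beta"),
])
```

Calling `write_rows(rows, "ProspectUploadSheetRob.xlsx")` afterwards would then produce a sheet with every scraped row instead of only the last one.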

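Note that, according to the documentation, 'w' (write) is the default mode, and it is also what is used when modifying the object. The documentation does not say much about it, but append seems to be meant only for adding entirely new Excel objects (sheets and so on), or for "extending" the document with another DataFrame that has exactly the same format as the document's structure. To make this reproducible you could add the template xlsx, but I hope this helps. Please let me know.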
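
If the goal really is to add rows below data that is already in the workbook, one possible approach is sketched here. It assumes pandas 1.4 or later (for `if_sheet_exists="overlay"`) and the openpyxl engine, since xlsxwriter can only create new files. Both `writer_options` and `append_frame` are hypothetical helpers, not part of pandas.

```python
import os


def writer_options(path):
    """Pick ExcelWriter keyword arguments depending on whether the workbook exists."""
    if os.path.exists(path):
        # Append mode keeps the existing sheets; "overlay" writes into them in place.
        return {"engine": "openpyxl", "mode": "a", "if_sheet_exists": "overlay"}
    # if_sheet_exists is only accepted in append mode, so it is left out here.
    return {"engine": "openpyxl", "mode": "w"}


def append_frame(df, path, sheet_name="Sheet1"):
    """Write df below whatever is already on sheet_name; needs pandas and openpyxl."""
    import pandas as pd  # imported here so writer_options works without pandas

    with pd.ExcelWriter(path, **writer_options(path)) as writer:
        existing = writer.sheets.get(sheet_name)
        # max_row is 1-based and startrow is 0-based, so this is the first free row.
        startrow = existing.max_row if existing is not None else 0
        df.to_excel(
            writer,
            sheet_name=sheet_name,
            index=False,
            header=existing is None,  # only write the header on a fresh sheet
            startrow=startrow,
        )
```

Calling `append_frame(df, "path_to_file.xlsx")` once per scraped page would add each one-row DataFrame under the previous one, although collecting the rows first and writing once is usually simpler.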