First, I tried writing to the blob directly, but that didn't work. Then I tried writing to a temporary directory and moving the file to the target directory; that didn't work either. I'm looking for a way to write an Excel file with multiple worksheets to Azure Blob Storage.
filename = os.path.join(arg_dict['out_dir'], old_attribute_file_path.replace(old_attribute_file_path.split('/')[-1].split('-')[-1].split('.')[0], attribute_files[0].split('-')[1]))
temp_file_name = os.path.join(TMP_PATH, old_attribute_file_path.replace(old_attribute_file_path.split('/')[-1].split('-')[-1].split('.')[0], attribute_files[0].split('-')[1]))
fill_color = PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')
# Write DataFrames to Excel
with pd.ExcelWriter(temp_file_name, engine='openpyxl') as writer:
    df1.to_excel(writer, index=False, sheet_name='Sheet1')
    df2.to_excel(writer, index=False, sheet_name='Sheet2')
    # Load the workbook
    workbook = writer.book
    # Save the workbook
    workbook.save(temp_file_name)
shutil.move(temp_file_name, filename)
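For comparison, the same write-then-move pattern works fine when both paths are on an ordinary local file system, which suggests the failure is specific to the mounted storage rather than to pandas or openpyxl. A minimal, self-contained sketch (the DataFrames and paths here are made-up stand-ins for the ones in the question):

```python
import os
import shutil
import tempfile

import pandas as pd

# Hypothetical data standing in for df1/df2 in the question.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"x": ["u", "v"]})

# Temp dirs standing in for TMP_PATH and arg_dict['out_dir'].
tmp_dir = tempfile.mkdtemp()
out_dir = tempfile.mkdtemp()
temp_file_name = os.path.join(tmp_dir, "report.xlsx")
filename = os.path.join(out_dir, "report.xlsx")

# ExcelWriter saves the workbook when the context exits, so no
# extra workbook.save() call is needed.
with pd.ExcelWriter(temp_file_name, engine="openpyxl") as writer:
    df1.to_excel(writer, index=False, sheet_name="Sheet1")
    df2.to_excel(writer, index=False, sheet_name="Sheet2")

shutil.move(temp_file_name, filename)
print(os.path.exists(filename))  # True
```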
The error I get:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-6a81eac8-0226-477b-9715-070566214b43/lib/python3.10/site-packages/openpyxl/writer/excel.py:294, in save_workbook(workbook, filename)
292 workbook.properties.modified = datetime.datetime.utcnow()
293 writer = ExcelWriter(workbook, archive)
--> 294 writer.save()
295 return True
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-6a81eac8-0226-477b-9715-070566214b43/lib/python3.10/site-packages/openpyxl/writer/excel.py:275, in ExcelWriter.save(self)
273 def save(self):
274 """Write data into the archive."""
--> 275 self.write_data()
276 self._archive.close()
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-6a81eac8-0226-477b-9715-070566214b43/lib/python3.10/site-packages/openpyxl/writer/excel.py:60, in ExcelWriter.write_data(self)
57 archive = self._archive
59 props = ExtendedProperties()
---> 60 archive.writestr(ARC_APP, tostring(props.to_tree()))
62 archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
63 if self.workbook.loaded_theme:
File /usr/lib/python3.10/zipfile.py:1816, in ZipFile.writestr(self, zinfo_or_arcname, data, compress_type, compresslevel)
1814 zinfo.file_size = len(data) # Uncompressed size
1815 with self._lock:
-> 1816 with self.open(zinfo, mode='w') as dest:
1817 dest.write(data)
File /usr/lib/python3.10/zipfile.py:1182, in _ZipWriteFile.close(self)
1180 self._fileobj.seek(self._zinfo.header_offset)
1181 self._fileobj.write(self._zinfo.FileHeader(self._zip64))
-> 1182 self._fileobj.seek(self._zipfile.start_dir)
1184 # Successfully written: Add file to our caches
1185 self._zipfile.filelist.append(self._zinfo)
OSError: [Errno 95] Operation not supported
A PySpark DataFrame has no to_excel method, and Databricks cannot convert a PySpark DataFrame to an Excel file directly. The solution is to save the file under /databricks/driver (the driver's local file system), then move it to the target folder, which also removes it from the driver.
with pd.ExcelWriter(r'export2.xlsx', engine="openpyxl") as writer:
    # The file is written to /databricks/driver/, i.e. the driver's local file system.
    data.to_excel(writer, index=False, sheet_name='Sheet1')
    data2.to_excel(writer, index=False, sheet_name='Sheet2')
# ExcelWriter saves the workbook when the context exits, so an explicit
# workbook.save() call is not needed.
Here you can see that the file is stored in the driver folder. Then move the file from the driver to a DBFS folder:
from shutil import move
move('/databricks/driver/export2.xlsx','/dbfs/export2.xlsx')
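A note on the move step: shutil.move falls back to copy-plus-delete when source and destination sit on different file systems (as /databricks/driver and /dbfs do), so the driver-side copy is cleaned up automatically. A stdlib-only sketch, with temp directories standing in for the real cluster paths:

```python
import os
import shutil
import tempfile

# Stand-ins for /databricks/driver and /dbfs; the real paths are cluster-specific.
driver_dir = tempfile.mkdtemp()
dbfs_dir = tempfile.mkdtemp()

src = os.path.join(driver_dir, "export2.xlsx")
with open(src, "wb") as f:
    f.write(b"placeholder bytes")

dst = os.path.join(dbfs_dir, "export2.xlsx")
# shutil.move copies then removes the source when crossing file systems,
# so no separate delete step is required on the driver.
shutil.move(src, dst)
print(os.path.exists(src), os.path.exists(dst))  # False True
```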
openpyxl and xlsxwriter only work with pandas DataFrames. Alternatively, you can write Excel files directly to Blob Storage with the spark-excel plugin. Your code would look like the following. First, get your access key and set it in the Spark configuration:
spark.conf.set(
    "fs.azure.account.key.<storage_name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope_name>", key="<access_key>"))
Next, set the path and write the Spark DataFrames to the same path but to different sheets:
path = "abfss://[email protected]/testdir/test.xlsx"
spark_Df1.write.format("com.crealytics.spark.excel")\
    .option("header", "true")\
    .option("dataAddress", "'My Sheet1'!A1")\
    .mode("append")\
    .save(path)

spark_Df2.write.format("com.crealytics.spark.excel")\
    .option("header", "true")\
    .option("dataAddress", "'My Sheet2'!A1")\
    .mode("append")\
    .save(path)
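Note that the com.crealytics.spark.excel format is not bundled with Databricks: the spark-excel library has to be attached to the cluster first, e.g. as a Maven coordinate. The version below is only an example; pick one that matches your cluster's Spark and Scala versions.

```
com.crealytics:spark-excel_2.12:0.13.5
```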