When I try to open a simple .orc file with pyarrow, I get "Fatal Python error: Aborted" and no explanatory error message that I can handle

Question

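I am using: Windows 10 Pro, Intel(R) Xeon(R) W-1250 CPU @ 3.30GHz, 16 GB RAM

Anaconda Navigator 2.5.0, Python 3.10.13 in a venv, pyarrow 11.0.0, pandas 2.1.1, running the scripts in Spyder IDE 5.4.3

I want to open and process .orc files (.csv is not an option in my case). To run a simple test of why my neural network built with TensorFlow is not working properly, I wrote two simple scripts, CreateORC and OpenORC. Opening and reading the ORC file should be a very easy task here, because I created a very simple ORC file, yet reading it causes a crash.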

CreateORC.py (this part works fine and creates an ORC file containing the data):

import pandas as pd
import pyarrow as pa
import pyarrow.orc as orc

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df.head())

# Convert the DataFrame to a PyArrow Table
table = pa.Table.from_pandas(df)

# Write the table to an ORC file
orc_file_path = 'sample_data.orc'

with open(orc_file_path, 'wb') as f:
    orc.write_table(table, f)

print(f"ORC file '{orc_file_path}' created successfully.")

OpenORC.py (this part causes the crash):

import pyarrow.orc as orc

def read_orc_file(file_path):
    # Open the ORC file
    with open(file_path, 'rb') as f:
        orc_file = orc.ORCFile(f)

    # Read the ORC file as a PyArrow Table
    table = orc_file.read()

    return table

# Specify the path to your ORC file
orc_file_path = r'C:\Users\WEY\Desktop\KIT_NN\Notebooks\sample_data.orc'

# Read the ORC file
try:
    table = read_orc_file(orc_file_path)
    print(table)
except Exception as e:
    print("Error reading ORC file:", e)

When I run it, I get this:

`runfile('C:/Users/WEY/Desktop/KIT_NN/Notebooks/OpenORC.py', wdir='C:/Users/WEY/Desktop/KIT_NN/Notebooks')

Fatal Python error: Aborted

Main thread:
Current thread 0x000039ec (most recent call first):
  File "C:\Users\WEY\anaconda3\envs\KIT\lib\site-packages\pyarrow\orc.py", line 187 in read
  File "C:/Users/WEY/Desktop/KIT_NN/Notebooks/OpenORC.py", line 16 in read_orc_file
  File "C:/Users/WEY/Desktop/KIT_NN/Notebooks/OpenORC.py", line 25 in <module>
  File "C:\Users\WEY\anaconda3\envs\KIT\lib\site-packages\debugpy\_vendored\pydevd\_pydev_bundle\_pydev_execfile.py", line 14 in execfile
  File "C:\Users\WEY\anaconda3\envs\KIT\lib\site-packages\debugpy\_vendored\pydevd\_pydev_bundle\pydev_umd.py", line 175 in runfile
  File "C:\Users\WEY\AppData\Local\Temp\ipykernel_17556V212489.py", line 1 in <module>

Restarting kernel...`

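I opened the ORC file with an online ORC viewer, and it opened without any problems.
I also created a new environment in Anaconda and installed pyarrow again there.

It still did not work.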

Answer 1

I have found a solution!

It does not make any sense, but it works.

Every time I get "Fatal Python error: Aborted ... Restarting kernel...", I need to run this script:

import os
import pandas as pd

path1 = "res_32.orc"

res_file_path = os.path.join("C:\\", "Users", "WEY", "Desktop", "KIT_NN", "gr_32", f"{path1}")

df = pd.read_orc(res_file_path)

or

import os
import pyorc

path1 = "tspectra32_0_0.orc"

res_file_path = os.path.join("C:\\", "Users", "WEY", "Desktop", "KIT_NN", "gr_32", f"{path1}")

with open(res_file_path, "rb") as orc_file:
    reader = pyorc.Reader(orc_file)

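If I run either of these two scripts, the other code in different modules works fine as well. It is too complicated for me to understand what is going on, but it works!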
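A side note on why the `try`/`except` in OpenORC.py never prints anything: an abort terminates the interpreter process itself, so no Python exception is raised and there is nothing for `except` to catch. One way to keep the Spyder kernel alive while investigating is to run the risky read in a child process and inspect its exit code. The sketch below assumes that approach; `run_isolated` is a hypothetical helper, and `os.abort()` is only a stand-in for the crashing ORC read.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a child interpreter and report how it ended."""
    result = subprocess.run(
        # -X faulthandler makes the child dump a traceback to stderr on a fatal error.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in for the crashing read (an assumption, not the real ORC call):
# os.abort() kills the child process, and the except clause never runs.
CRASHING_SNIPPET = """
import os
try:
    os.abort()
except Exception as e:
    print("caught:", e)
"""

returncode, out, err = run_isolated(CRASHING_SNIPPET)
print("child exit code:", returncode)
print("child stderr:", err.strip())
```

A non-zero exit code means the child died, while the parent process keeps running. To test the real case, replace the stand-in snippet with the ORC read from OpenORC.py.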
