file.read() UnicodeDecodeError - Devuan Daedalus (Debian 12 w/o systemd)

Question (votes: 0, answers: 1)

I have a Python script for text substitution, written by a friend of mine, that runs fine on his system, Ubuntu Focal.

Here is the script:

#!/usr/bin/env python3
"""
Script Name: replace_text.py
Purpose: This Python script performs text substitution in files within a given directory.
It replaces specific characters as per predefined substitutions, providing a convenient way to modify text files.

Usage:
python replace_text.py /path/to/your/directory

Note:
- Ensure you have Python installed on your system.
- The script processes all files within the specified directory and its subdirectories.
- Files are modified in-place, so have a backup if needed.
"""

import os
import sys

def replace_text_in_files(directory):
    # Character substitutions
    substitutions = {
        '': 'fi',
        '': 'fl',
        'ä': 'ā',
        'é': 'ī',
        'ü': 'ū',
        'å': 'ṛ',
        'è': 'ṝ',
        'ì': 'ṅ',
        'ñ': 'ṣ',
        'ï': 'ñ',
        'ö': 'ṭ',
        'ò': 'ḍ',
        'ë': 'ṇ',
        'ç': 'ś',
        'à': 'ṁ',
        'ù': 'ḥ',
        'ÿ': 'ḷ',
        'û': 'ḹ',
        'Ä': 'Ā',
        'É': 'Ī',
        'Ü': 'Ū',
        'Å': 'Ṛ',
        'È': 'Ṝ',
        'Ì': 'Ṅ',
        'Ñ': 'Ṣ',
        'Ï': 'Ñ',
        'Ö': 'Ṭ',
        'Ò': 'Ḍ',
        'Ë': 'Ṇ',
        'Ç': 'Ś',
        'À': 'Ṁ',
        'Ù': 'Ḥ',
        'ß': 'Ḷ',
        '“': '“',
        '”': '”',
        ' ': ' ',
        '‘': '‘',
        '–': '-',
        '’': '’',
        '—': '—',
        '•': '»',
        '…': '...',
    }

    # Walk through the directory and its subdirectories
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            with open(file_path, 'r', encoding='utf-8') as file:
                file_content = file.read()
            
            # Perform substitutions
            for original, replacement in substitutions.items():
                file_content = file_content.replace(original, replacement)

            # Write the modified content back to the file
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(file_content)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python replace_text.py /path/to/your/directory")
        sys.exit(1)

    directory_path = sys.argv[1]
    replace_text_in_files(directory_path)
    print("Text substitution completed successfully.")

I am running Devuan Daedalus, which is based on Debian 12 but without systemd. When I run this script on my machine, I get the following error:

~/Documents/software-related/software-files$ python3 replace_text.py ~/Desktop/test-dir/
Traceback (most recent call last):
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 89, in <module>
    replace_text_in_files(directory_path)
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 73, in replace_text_in_files
    file_content = file.read()
                   ^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 41: invalid start byte

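He has no idea what is causing this, and I know nothing about Python, so I am asking the knowledgeable people on this forum for help.

I followed Ofer Sadan's suggestion and opened the file in binary mode, but that gave me a different error: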

 binary mode doesn't take an encoding argument

Please ask for more information if:

  1. The question seems too vague, open-ended, or general.
  2. I have not provided enough information.

Thanks,

python python-3.x runtime-error debian-based
1 Answer (0 votes)

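Python complains because, while reading one of the files and decoding its bytes as UTF-8, it reached a byte that is not valid UTF-8. According to the traceback, that is byte 0xad at position 41, which cannot start a UTF-8 character. Are you sure the file is actually UTF-8 encoded? https://www.charset.org/utf-8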
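
To see the failure in isolation, here is a minimal sketch that reproduces it on a made-up byte string and reports where decoding breaks. The sample bytes and the helper name `first_invalid_utf8_offset` are illustrative assumptions, not part of the original script.

```python
# Made-up sample data: 0xad has the bit pattern 10xxxxxx, which UTF-8 reserves
# for continuation bytes, so it can never begin a character.
sample = b"plain ASCII text, then a stray byte: \xad and more text"

def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the offending byte within `data`.
        return exc.start
    return None

offset = first_invalid_utf8_offset(sample)
print(offset, hex(sample[offset]))
```

Running the same helper on the raw bytes of each file in the directory (read with `open(path, 'rb')`) should show which files are not UTF-8 and where they go wrong.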

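Reading the file in binary mode will give you the raw bytes, but you want to replace characters. You would then have to decode the bytes into a string with the utf-8 codec, and I would guess you end up with the same error.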
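
Both errors can be reproduced in a few lines. This is only a sketch using a throwaway temporary file (the file and its contents are assumptions for illustration): passing `encoding=` together with mode `'rb'` is rejected by `open()` itself, and decoding the raw bytes by hand fails just like the text-mode read did.

```python
import os
import tempfile

# Hypothetical test file containing one byte (0xad) that is not valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xad")
    path = tmp.name

# 1. open() refuses an encoding in binary mode, before reading anything.
binary_mode_error = None
try:
    open(path, "rb", encoding="utf-8")
except ValueError as exc:
    binary_mode_error = str(exc)

# 2. Reading the bytes works, but decoding them hits the original problem.
with open(path, "rb") as fh:
    raw = fh.read()
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

os.remove(path)
print(binary_mode_error)
print(decode_failed)
```

So dropping the `encoding` argument makes the second error go away, but it does not make the bytes any more decodable.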

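In your case I would be extra careful (make backups): you may be trying to modify actual binary files. Are you sure you want to modify every file you are touching?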
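
One cautious way to act on that advice, sketched below, is to leave any file that does not decode as UTF-8 untouched and to keep a backup copy of every file that does get changed. The function name `substitute_in_tree`, the `.bak` suffix, and the returned list of skipped paths are assumptions made for this example; the `substitutions` dict is the one from the question's script.

```python
import os
import shutil

def substitute_in_tree(directory, substitutions, backup_suffix=".bak"):
    """Apply substitutions to UTF-8 text files only; return the skipped paths."""
    skipped = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(backup_suffix):
                continue  # never rewrite backups from an earlier run
            path = os.path.join(root, name)
            try:
                # newline="" keeps the file's original line endings intact.
                with open(path, "r", encoding="utf-8", newline="") as fh:
                    text = fh.read()
            except UnicodeDecodeError:
                # Not valid UTF-8 (possibly a binary file): leave it alone.
                skipped.append(path)
                continue

            new_text = text
            for original, replacement in substitutions.items():
                new_text = new_text.replace(original, replacement)

            if new_text != text:
                # Back up first, then overwrite only files that changed.
                shutil.copy2(path, path + backup_suffix)
                with open(path, "w", encoding="utf-8", newline="") as fh:
                    fh.write(new_text)
    return skipped
```

Printing the returned list after a run shows which files were skipped, so you can check by hand whether they are binaries or text in some other encoding.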
