Python - googletrans | 'NoneType' object has no attribute 'group'

I'm trying to translate my dataset into English. It contains data in several different languages, so first I wrote this:

import pandas as pd
from langdetect import detect


df = pd.read_csv('jobs.csv')
def detect_language(text):
    try:
        return detect(text)
    except:
        return None

df['language'] = df['job_title'].apply(detect_language)

print(df.head())

Then I tried to translate the non-English entries into English:

from googletrans import Translator

def translate_to_english(text):
    if pd.isnull(text):
        return text
    else:
        translator = Translator()
        translated = translator.translate(text, src='auto', dest='en')
        return translated.text

english_jobs = df[df['language'] == 'en']

non_english_jobs = df[df['language'] != 'en']
non_english_jobs['translated_job_title'] = non_english_jobs['job_title'].apply(translate_to_english)

translated_df = pd.concat([english_jobs, non_english_jobs])

print(translated_df.head())

But the second snippet raises this error:

AttributeError                            Traceback (most recent call last)
Cell In[29], line 20
     18 # Translate non-English text to English
     19 non_english_jobs = df[df['language'] != 'en']
---> 20 non_english_jobs['translated_job_title'] = non_english_jobs['job_title'].apply(translate_to_english)
     22 # Concatenate English and translated non-English jobs
     23 translated_df = pd.concat([english_jobs, non_english_jobs])

AttributeError: 'NoneType' object has no attribute 'group'

Can you help me fix this?

Tags: python, pandas, dataframe, translation, attributeerror

1 answer

I tested this first and found that the problem is in googletrans itself rather than in your pandas code, so I used deep-translator instead.
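The error message itself is generic Python: re.search returns None when its pattern does not match, and calling .group() on that None raises exactly this AttributeError. The commonly reported cause with googletrans 3.0.0 is that it scrapes a token out of Google Translate's web page with a regular expression that stopped matching after the page changed. A minimal reproduction follows; the HTML string, the tkk pattern and the extract_token helper are hypothetical stand-ins, not googletrans's actual code.

```python
import re

# Hypothetical page text and pattern: the expected token is no longer in the page.
page_html = "<html><script>window.WIZ_global_data = {};</script></html>"
pattern = re.compile(r"tkk:'(.+?)'")

match = pattern.search(page_html)
print(match)  # None: the pattern was not found

try:
    token = match.group(1)  # calling .group() on None
except AttributeError as exc:
    message = str(exc)
    print(message)

def extract_token(html):
    # Hypothetical defensive version: check the match before using it.
    found = pattern.search(html)
    if found is None:
        raise ValueError("token pattern not found; the page layout may have changed")
    return found.group(1)

print(extract_token("tkk:'123.456'"))
```

Because the failure happens inside the library, nothing in the pandas code can avoid it, which is why switching libraries works.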

import pandas as pd
from langdetect import detect
from deep_translator import GoogleTranslator 

I created a dummy DataFrame to test my code:

data = {'job_title': ["Software Engineer", "Ingeniero de Software", "Développeur logiciel", "Data Scientist", "Gerente de Proyecto"]}
df = pd.DataFrame(data)

Instead of the separate detect_language(text) function, I used a lambda:

# pd.notnull skips both None and NaN (pd.read_csv stores empty cells as NaN)
df['language'] = df['job_title'].apply(lambda x: detect(x) if pd.notnull(x) else None)
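With real CSV data the missing-value check matters: pd.read_csv represents an empty cell as NaN, not None, so a test like x is not None lets it through and detect() would receive a float. pd.notnull treats both None and NaN as missing. A small sketch, using an inline CSV string as a stand-in for jobs.csv:

```python
import io

import pandas as pd

# Inline stand-in for jobs.csv: the second row has an empty job_title.
csv_text = "job_title,company\nSoftware Engineer,Acme\n,Globex\n"
df = pd.read_csv(io.StringIO(csv_text))

missing = df.loc[1, "job_title"]
print(missing)                # nan: pandas stored the empty cell as NaN, not None
print(missing is not None)    # True, so an "is not None" guard would not skip it
print(pd.notnull(missing))    # False, so pd.notnull does skip it

# Only rows where this is True should be passed to detect()
has_text = df["job_title"].notna()
print(has_text.tolist())
```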

def translate_to_english(text):
    if pd.isnull(text):  
        return text
    else:
        try:
            translator = GoogleTranslator(source='auto', target='en')
            translated = translator.translate(text)
            return translated
        except Exception as e:
            print(f"Error translating '{text}': {e}")
            return None
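translate_to_english sends one request per row, and job titles tend to repeat. A minimal sketch of translating each distinct title only once with functools.lru_cache; call_translator is a hypothetical offline stand-in for GoogleTranslator(source='auto', target='en').translate, used so the example runs without network access.

```python
from functools import lru_cache

calls = []  # records which strings actually reach the (hypothetical) translator

def call_translator(text):
    # Hypothetical stand-in for GoogleTranslator(source='auto', target='en').translate(text)
    calls.append(text)
    return {"Gerente de Proyecto": "Project Manager"}.get(text, text)

@lru_cache(maxsize=None)
def translate_cached(text):
    # The first call for a given string hits the translator; repeats come from the cache.
    return call_translator(text)

titles = ["Gerente de Proyecto", "Gerente de Proyecto", "Data Scientist"]
translated = [translate_cached(t) for t in titles]
print(translated)
print(len(calls))  # 2: the repeated title was translated once
```

With pandas this would be used as df['job_title'].apply(translate_cached), after filtering out missing values as translate_to_english already does. In the real version it also helps to create the GoogleTranslator object once, outside the function.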

non_english_jobs = df[df['language'] != 'en']

Because non_english_jobs is a filtered slice of df, I made a copy before adding a column to avoid the SettingWithCopyWarning:

non_english_jobs_copy = non_english_jobs.copy()  
non_english_jobs_copy['translated_job_title'] = non_english_jobs_copy['job_title'].apply(translate_to_english)

translated_df = pd.concat([df[df['language'] == 'en'], non_english_jobs_copy])
print(translated_df.head())

The output is:

               job_title language translated_job_title
0      Software Engineer       en                  NaN
1  Ingeniero de Software       de    Software engineer
2   Développeur logiciel       fr   Software developer
3         Data Scientist       it       Data Scientist
4    Gerente de Proyecto       es      Project Manager
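In this output the English row has NaN in translated_job_title, and the copy plus pd.concat is only needed because the new column is assigned on a slice. A possible alternative, sketched here with a hypothetical offline translate function standing in for GoogleTranslator(...).translate, is to assign through .loc on df itself with a boolean mask. That avoids the warning, keeps the original row order, and leaves English titles in the new column instead of NaN.

```python
import pandas as pd

def translate(text):
    # Hypothetical offline stand-in for GoogleTranslator(source='auto', target='en').translate
    lookup = {"Ingeniero de Software": "Software engineer",
              "Gerente de Proyecto": "Project Manager"}
    return lookup.get(text, text)

df = pd.DataFrame({
    "job_title": ["Software Engineer", "Ingeniero de Software", "Gerente de Proyecto"],
    "language": ["en", "es", "es"],
})

mask = df["language"] != "en"
# Start from the original titles so English rows are not left as NaN...
df["translated_job_title"] = df["job_title"]
# ...then overwrite only the non-English rows, writing through .loc on df itself.
df.loc[mask, "translated_job_title"] = df.loc[mask, "job_title"].map(translate)
print(df)
```

Since no slice is modified, there is nothing to copy and no need to concatenate the English and non-English parts back together.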
© www.soinside.com 2019 - 2024. All rights reserved.