Is there a way to add a ZIP code and another column to a dataset in Python?


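I'm working on a machine learning project that requires the scikit-learn library. My preprocessing task is to import a dataset and then make a copy of it for preprocessing. I need to add a ZIP code column based on the "neighbourhood name", and I'd like to know whether there is a way to do this in Python. For reference, the dataset I'm using comes from this link: https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data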

This is what I used, but I don't understand what is happening; I'm new to Python.

import re

# Function to extract zip code from neighborhood column
def extract_zip_code(string):
    try:
        # Take the first standalone 5-digit number found in the string
        zip_code = re.findall(r'\b\d{5}\b', string)[0]
    except (IndexError, TypeError):
        # IndexError: no 5-digit number in the string
        # TypeError: the value is not a string (e.g. NaN)
        zip_code = ''
    return zip_code

# Applying the function to the neighborhood column of the NYC dataset
df_nyc['zip_code'] = df_nyc['neighbourhood'].apply(extract_zip_code)

# Using the zip codes from the NYC dataset to fill in the zip codes in the copied dataset
for i, row in df_nyc.iterrows():
    index = df_copy.index[(df_copy['city'] == row['city']) & (df_copy['state'] == row['state']) & (df_copy['zip_code'] == '')].tolist()
    if len(index) > 0:
        df_copy.at[index[0], 'zip_code'] = row['zip_code']

# Exporting the copied dataset with zip codes as a new CSV file
df_copy.to_csv('us_airbnb_open_data_with_zipcodes.csv', index=False)
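For what it's worth, the regex approach above only finds a ZIP code if a 5-digit number is literally written inside the text. As far as I can tell, the `neighbourhood` column in this dataset holds plain names, so `extract_zip_code` would return `''` for every row. A minimal sketch of an alternative, assuming you can get a separate lookup table that maps each neighbourhood name to one ZIP code: left-merge that table onto the copy. The `zip_lookup` table, the toy rows, and the ZIP values below are all hypothetical placeholders, not real data.

```python
import pandas as pd

# Toy stand-in for the Airbnb data; only the 'neighbourhood' column matters here.
df_nyc = pd.DataFrame({
    "name": ["Listing A", "Listing B", "Listing C"],
    "neighbourhood": ["Harlem", "Midtown", "Nowhere"],
})

# Hypothetical lookup table with PLACEHOLDER ZIP codes (not real values).
# In practice, load a real neighbourhood-to-ZIP table from a trusted source.
zip_lookup = pd.DataFrame({
    "neighbourhood": ["Harlem", "Midtown"],
    "zip_code": ["00001", "00002"],
})

# Work on a copy so the original DataFrame stays untouched.
df_copy = df_nyc.copy()

# A left merge keeps every listing; neighbourhoods missing from the lookup get NaN.
df_copy = df_copy.merge(zip_lookup, on="neighbourhood", how="left")

# Replace missing ZIP codes with an empty string, as the original code did.
df_copy["zip_code"] = df_copy["zip_code"].fillna("")

print(df_copy)
```

Keeping the ZIP codes as strings preserves leading zeros. Note that a real neighbourhood can span several ZIP codes, so the lookup table should contain exactly one row per neighbourhood; otherwise the merge will duplicate listings.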
python pandas dataframe