Problem converting an object-type column to int in pandas


Hi everyone, I have a CSV file that looks like this:

Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"

In pandas, the dtype of the Price column comes out as object rather than int, so I tried to convert the column to int:

import pandas as pd

cars_sales = pd.read_csv("./data/car_sales.csv")

cars_sales["Price"] = cars_sales["Price"].str.replace('[\$\,]|\.\d*', '').astype(int)

But when I run the cell, I get this error:

<>:1: SyntaxWarning: invalid escape sequence '\$'
<>:1: SyntaxWarning: invalid escape sequence '\$'
C:\Users\safkh\AppData\Local\Temp\ipykernel_9780\1557943234.py:1: SyntaxWarning: invalid escape sequence '\$'
  cars_sales["Price"] = cars_sales["Price"].str.replace('[\$\,]|\.\d*', '').astype(int)
C:\Users\safkh\AppData\Local\Temp\ipykernel_9780\1557943234.py:1: SyntaxWarning: invalid escape sequence '\$'
  cars_sales["Price"] = cars_sales["Price"].str.replace('[\$\,]|\.\d*', '').astype(int)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[62], line 1
----> 1 cars_sales["Price"] = cars_sales["Price"].str.replace('[\$\,]|\.\d*', '').astype(int)

File c:\Projects\play_ground\ML_AND_DS\ml_ground\venv\Lib\site-packages\pandas\core\generic.py:6299, in NDFrame.__getattr__(self, name)
   6292 if (
   6293     name not in self._internal_names_set
   6294     and name not in self._metadata
   6295     and name not in self._accessors
   6296     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297 ):
   6298     return self[name]
-> 6299 return object.__getattribute__(self, name)

File c:\Projects\play_ground\ML_AND_DS\ml_ground\venv\Lib\site-packages\pandas\core\accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
    221 if obj is None:
    222     # we're accessing the attribute of the class, i.e., Dataset.geo
    223     return self._accessor
--> 224 accessor_obj = self._accessor(obj)
    225 # Replace the property with the accessor object. Inspired by:
    226 # https://www.pydanny.com/cached-property.html
    227 # We need to use object.__setattr__ because we overwrite __setattr__ on
    228 # NDFrame
    229 object.__setattr__(obj, self._name, accessor_obj)
...
    244 if inferred_dtype not in allowed_types:
--> 245     raise AttributeError("Can only use .str accessor with string values!")
    246 return inferred_dtype

AttributeError: Can only use .str accessor with string values!

How can I fix this? Thanks.

python pandas
1 Answer

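Given the nature of your DataFrame, I suspect you need to tell pandas to treat the pattern as a regex (regex=True) when removing the dollar sign. Converting the column with .astype(str) first also avoids the .str accessor error if Price no longer holds strings when the line runs, and writing the pattern as a raw string (r'...') gets rid of the "invalid escape sequence" SyntaxWarning.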
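To see why those details matter, here is a small self-contained check. It is a sketch assuming pandas 2.x, where Series.str.replace defaults to regex=False; the Series values and variable names are made up for illustration.

```python
import pandas as pd

prices = pd.Series(["$4,000.00", "$5,500.00"])
pattern = r"[\$\,]|\.\d*"

# regex=False (the default since pandas 2.0): the pattern is matched literally,
# so nothing is removed.
literal = prices.str.replace(pattern, "", regex=False)

# regex=True: "$", "," and the decimal part are stripped.
stripped = prices.str.replace(pattern, "", regex=True)

# The .str accessor rejects a column that already holds integers...
ints = pd.Series([4000, 5500])
try:
    ints.str.replace(pattern, "", regex=True)
    raised = False
except AttributeError:
    raised = True

# ...but converting to str first makes the same line safe to re-run.
rerun = ints.astype(str).str.replace(pattern, "", regex=True)

print(literal.tolist())   # ['$4,000.00', '$5,500.00']
print(stripped.tolist())  # ['4000', '5500']
print(raised)             # True
print(rerun.tolist())     # ['4000', '5500']
```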

import pandas as pd

data = {
    "Make": ["Toyota", "Honda", "Ford"],
    "Colour": ["White", "Black", "Blue"],
    "Odometer (KM)": [150043, 86000, 123000],
    "Doors": [4, 4, 2],
    "Price": ["$4,000.00", "$5,500.00", "$7,000.00"]
}

sample_df = pd.DataFrame(data)

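# astype(str) keeps .str usable even if Price is no longer a string column;
# regex=True makes pandas treat the raw-string pattern as a regular expression
sample_df['Price'] = sample_df['Price'].astype(str).str.replace(r'[\$\,]|\.\d*', '', regex=True)
# Parse the cleaned strings as numbers; anything unparseable becomes NaN
sample_df['Price'] = pd.to_numeric(sample_df['Price'], errors='coerce')
# Replace NaN with 0, then cast to int
sample_df['Price'] = sample_df['Price'].fillna(0).astype(int)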

sample_df

Output:

     Make Colour  Odometer (KM)  Doors  Price
0  Toyota  White         150043      4   4000
1   Honda  Black          86000      4   5500
2    Ford   Blue         123000      2   7000
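If you would rather clean the column while loading the file, read_csv also accepts a converters mapping. The sketch below assumes the CSV layout from the question, inlines that text with io.StringIO so it runs on its own, and uses a hypothetical helper parse_price. It keeps the cents as a float and only rounds when an int is actually needed.

```python
import io

import pandas as pd

# The CSV text from the question, inlined so the example is self-contained
# (real code would pass "./data/car_sales.csv" instead).
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
"""


def parse_price(text: str) -> float:
    # Hypothetical helper: drop "$" and the thousands separators, keep the cents.
    return float(text.replace("$", "").replace(",", ""))


cars_sales = pd.read_csv(io.StringIO(csv_text), converters={"Price": parse_price})

# Round to whole dollars before casting, if an integer column is required.
cars_sales["Price"] = cars_sales["Price"].round().astype(int)

print(cars_sales.dtypes["Price"])
print(cars_sales)
```

Note that rounding behaves differently from the regex above, which simply drops the decimal part: "$4,000.60" would become 4001 here but 4000 with \.\d*.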
© www.soinside.com 2019 - 2024. All rights reserved.