Hi everyone, I have a CSV file like the one below:
Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
In pandas, the Price column is loaded with dtype object instead of int, so I tried to convert the column to int:
import pandas as pd
cars_sales = pd.read_csv("./data/car_sales.csv")
cars_sales["Price"] = cars_sales["Price"].str.replace('[\$\,]|\.\d*', '').astype(int)
But when I run the cell, I get this error:
<>:1: SyntaxWarning: invalid escape sequence '\$'
<>:1: SyntaxWarning: invalid escape sequence '\$'
C:\Users\safkh\AppData\Local\Temp\ipykernel_9780\1557943234.py:1: SyntaxWarning: invalid escape sequence '\$'
cars_sales["Price"] = cars_sales["Price"].str.replace('[\$\,]|\.\d*', '').astype(int)
C:\Users\safkh\AppData\Local\Temp\ipykernel_9780\1557943234.py:1: SyntaxWarning: invalid escape sequence '\$'
cars_sales["Price"] = cars_sales["Price"].str.replace('[\$\,]|\.\d*', '').astype(int)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[62], line 1
----> 1 cars_sales["Price"] = cars_sales["Price"].str.replace('[\$\,]|\.\d*', '').astype(int)
File c:\Projects\play_ground\ML_AND_DS\ml_ground\venv\Lib\site-packages\pandas\core\generic.py:6299, in NDFrame.__getattr__(self, name)
6292 if (
6293 name not in self._internal_names_set
6294 and name not in self._metadata
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
File c:\Projects\play_ground\ML_AND_DS\ml_ground\venv\Lib\site-packages\pandas\core\accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
221 if obj is None:
222 # we're accessing the attribute of the class, i.e., Dataset.geo
223 return self._accessor
--> 224 accessor_obj = self._accessor(obj)
225 # Replace the property with the accessor object. Inspired by:
226 # https://www.pydanny.com/cached-property.html
227 # We need to use object.__setattr__ because we overwrite __setattr__ on
228 # NDFrame
229 object.__setattr__(obj, self._name, accessor_obj)
...
244 if inferred_dtype not in allowed_types:
--> 245 raise AttributeError("Can only use .str accessor with string values!")
246 return inferred_dtype
AttributeError: Can only use .str accessor with string values!
How can I fix this? Thanks.
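Given the nature of your DataFrame, I suspect you need to use a regex to handle the dollar sign. Two things are going on. The AttributeError means the Price column no longer holds strings when .str is called (this happens, for example, if the cell is re-run after the column has already been converted), so casting with .astype(str) first keeps the .str accessor usable. Then the pattern has to be treated as a regular expression: write it as a raw string (r'...', which also silences the SyntaxWarning about '\$') and pass regex=True, because since pandas 2.0 str.replace treats the pattern as literal text by default.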
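A minimal sketch of why regex=True matters, assuming pandas 2.x and two made-up price strings in the question's format: with regex=False the pattern is searched for as literal text and nothing is removed, while regex=True strips the dollar sign, the commas and the decimal part.

```python
import pandas as pd

# Made-up values in the same format as the question's Price column
prices = pd.Series(["$4,000.00", "$5,500.00"])

# Raw string (r"...") so Python does not treat "\$" as an escape sequence
pattern = r"[\$\,]|\.\d*"

# regex=False: the pattern is matched as a literal substring, so nothing changes
literal = prices.str.replace(pattern, "", regex=False)

# regex=True: "$", "," and the decimal part (".00") are stripped
cleaned = prices.str.replace(pattern, "", regex=True)

print(literal.tolist())  # ['$4,000.00', '$5,500.00']
print(cleaned.tolist())  # ['4000', '5500']
```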
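import pandas as pd

# Sample data in the same format as the question's CSV
data = {
    "Make": ["Toyota", "Honda", "Ford"],
    "Colour": ["White", "Black", "Blue"],
    "Odometer (KM)": [150043, 86000, 123000],
    "Doors": [4, 4, 2],
    "Price": ["$4,000.00", "$5,500.00", "$7,000.00"]
}
sample_df = pd.DataFrame(data)
# astype(str) guarantees the .str accessor works; with regex=True the raw-string
# pattern removes "$", "," and the decimal part (".00")
sample_df['Price'] = sample_df['Price'].astype(str).str.replace(r'[\$\,]|\.\d*', '', regex=True)
# Convert to numbers; any value that still cannot be parsed becomes NaN
sample_df['Price'] = pd.to_numeric(sample_df['Price'], errors='coerce')
# Replace NaN with 0 and cast to int
sample_df['Price'] = sample_df['Price'].fillna(0).astype(int)
sample_df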
Output:
     Make Colour  Odometer (KM)  Doors  Price
0  Toyota  White         150043      4   4000
1   Honda  Black          86000      4   5500
2    Ford   Blue         123000      2   7000
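If the conversion lives in a notebook cell that may be run more than once, checking the dtype first avoids the AttributeError on the second run. This is a sketch under assumptions: clean_price is a hypothetical helper, and the CSV is inlined through io.StringIO with made-up rows instead of reading ./data/car_sales.csv.

```python
import io

import pandas as pd

# Stand-in for ./data/car_sales.csv, with made-up rows in the question's format
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Black,86000,4,"$5,500.00"
"""


def clean_price(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn '$4,000.00'-style Price strings into ints.

    Modifies df in place and returns it. If Price is already numeric
    (e.g. the cell was run before), it is left untouched, so .str is
    never called on a non-string column.
    """
    if not pd.api.types.is_numeric_dtype(df["Price"]):
        df["Price"] = (
            df["Price"]
            .astype(str)
            .str.replace(r"[\$\,]|\.\d*", "", regex=True)
            .astype(int)
        )
    return df


cars_sales = pd.read_csv(io.StringIO(csv_text))
cars_sales = clean_price(cars_sales)
cars_sales = clean_price(cars_sales)  # second call is a no-op, no AttributeError
print(cars_sales["Price"].tolist())  # [4000, 5500]
```

Note that the \.\d* part of the pattern drops the cents. If they matter, stripping only r"[\$,]" and converting with astype(float) keeps them.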