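I'm trying to do a simple task: trim all whitespace from every column of a DataFrame. Some values have trailing spaces after words, some have leading spaces before words, and some columns contain values that are nothing but " ". I want all of it removed.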
I read this post, which offers a nice way to do this:
data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
However, I frequently get the following:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-31d35db1d48c> in <module>
1 df = (pd.read_csv('C:\\Users\\J39304\Desktop\\aggregated_po_data.csv',
----> 2 encoding = "ISO-8859-1", low_memory=False).apply(lambda x: x.str.strip() if (x.dtype == "object") else x))
3 print(df.shape)
4
5 label = df['ON_TIME']
c:\python367-64\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
6876 kwds=kwds,
6877 )
-> 6878 return op.get_result()
6879
6880 def applymap(self, func) -> "DataFrame":
c:\python367-64\lib\site-packages\pandas\core\apply.py in get_result(self)
184 return self.apply_raw()
185
--> 186 return self.apply_standard()
187
188 def apply_empty_result(self):
c:\python367-64\lib\site-packages\pandas\core\apply.py in apply_standard(self)
294 try:
295 result = libreduction.compute_reduction(
--> 296 values, self.f, axis=self.axis, dummy=dummy, labels=labels
297 )
298 except ValueError as err:
pandas\_libs\reduction.pyx in pandas._libs.reduction.compute_reduction()
pandas\_libs\reduction.pyx in pandas._libs.reduction.Reducer.get_result()
<ipython-input-9-31d35db1d48c> in <lambda>(x)
1 df = (pd.read_csv('C:\\Users\\wundermahn\Desktop\\aggregated_data.csv',
----> 2 encoding = "ISO-8859-1", low_memory=False).apply(lambda x: x.str.strip() if (x.dtype == "object") else x))
3 print(df.shape)
4
5 label = df['ON_TIME']
c:\python367-64\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5268 or name in self._accessors
5269 ):
-> 5270 return object.__getattribute__(self, name)
5271 else:
5272 if self._info_axis._can_hold_identifiers_and_holds_name(name):
c:\python367-64\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
185 # we're accessing the attribute of the class, i.e., Dataset.geo
186 return self._accessor
--> 187 accessor_obj = self._accessor(obj)
188 # Replace the property with the accessor object. Inspired by:
189 # http://www.pydanny.com/cached-property.html
c:\python367-64\lib\site-packages\pandas\core\strings.py in __init__(self, data)
2039
2040 def __init__(self, data):
-> 2041 self._inferred_dtype = self._validate(data)
2042 self._is_categorical = is_categorical_dtype(data)
2043 self._is_string = data.dtype.name == "string"
c:\python367-64\lib\site-packages\pandas\core\strings.py in _validate(data)
2096
2097 if inferred_dtype not in allowed_types:
-> 2098 raise AttributeError("Can only use .str accessor with string values!")
2099 return inferred_dtype
2100
**AttributeError: Can only use .str accessor with string values!**
So, while looking for a workaround, I came across this post, which suggests using:
data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "str" else x)
However, this doesn't remove the empty cells that contain only spaces or tabs.
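A minimal check of why that second version seems to do nothing, assuming the text columns have object dtype (the default for strings in pandas 1.x and 2.x). A NumPy object dtype never compares equal to "str", so the condition is always false and every column comes back untouched. It also shows that even a working strip turns whitespace-only cells into empty strings rather than missing values, so they would not be counted as null afterwards.

```python
import pandas as pd

# Text column stored with object dtype, as pandas 1.x/2.x do by default
s = pd.Series([" foo", "  ", "\t"], dtype=object)

# dtype('O') is never equal to "str", so the strip branch is never taken
print(s.dtype == "str")  # False

unchanged = (lambda x: x.str.strip() if x.dtype == "str" else x)(s)
print(unchanged.tolist())  # [' foo', '  ', '\t']

# A real strip removes the whitespace, but blank cells become "" rather than NaN
stripped = s.str.strip()
print(stripped.tolist())      # ['foo', '', '']
print(stripped.isna().sum())  # 0
```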
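How can I efficiently remove every variant of whitespace? Eventually I'll be dropping columns where more than 50% of the values are null.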
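One possible sketch of the whole task, not taken from the answers below: strip only values that are actually `str` (so object columns holding booleans or numbers cannot raise the `.str` AttributeError), mark the cells that are now empty as missing, then keep columns whose missing share is at most 50%. The helper names `strip_value` and `clean` and the `max_null_frac` parameter are hypothetical, and the element-wise `map` is slower than vectorized `.str` methods on very large frames.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def strip_value(v):
    # Only str values have .strip(); numbers, booleans, None and NaN pass through
    return v.strip() if isinstance(v, str) else v


def clean(df: pd.DataFrame, max_null_frac: float = 0.5) -> pd.DataFrame:
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if is_object_dtype(s) or is_string_dtype(s):
            s = s.map(strip_value)
            # Cells that held only spaces/tabs are now "", so mark them missing
            out[col] = s.mask(s.eq(""), np.nan)
    # Keep columns whose share of missing values is at most max_null_frac
    return out.loc[:, out.isna().mean() <= max_null_frac]


df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": [" foo", "bar ", "  baz  ", "\t"],
    "notes": [" ", "\t", None, "x"],  # 3 of 4 blank or missing, so dropped
    "flag": pd.Series([True, False, True, False], dtype=object),
})
res = clean(df)
print(list(res.columns))  # ['id', 'name', 'flag']
```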
You could try wrapping the strip in a try/except, so that any column where .str fails is returned unchanged:
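def trim(x):
    # .str raises AttributeError for columns without string values; return those unchanged
    try:
        return x.str.strip()
    except AttributeError:
        return x

df = df.apply(trim)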
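A small reproduction of why the try/except version helps. The frame is hypothetical: `flag` is an object-dtype column holding booleans, which is one way a column can report dtype `object` without containing strings, so the dtype-based one-liner from the question raises an AttributeError on it. Catching `AttributeError` specifically, rather than using a bare `except`, keeps unrelated errors visible.

```python
import pandas as pd


def trim(x):
    # Columns without string values make .str raise AttributeError
    try:
        return x.str.strip()
    except AttributeError:
        return x


# Hypothetical frame: 'flag' has object dtype but holds booleans, not strings
df = pd.DataFrame({
    "name": [" foo", "bar ", " baz "],
    "flag": pd.Series([True, False, True], dtype=object),
    "qty": [1, 2, 3],
})

# The dtype-based one-liner from the question reaches 'flag' and raises
try:
    df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
    failed = False
except AttributeError as err:
    failed = True
    print(err)

# The try/except version strips 'name' and passes 'flag' and 'qty' through
trimmed = df.apply(trim)
print(trimmed["name"].tolist())  # ['foo', 'bar', 'baz']
```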
Use select_dtypes to pick out the right columns first:
import pandas as pd

# example dataframe
df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': list('abc'),
                   'col3': [4.0, 5.0, 6.0],
                   'col4': [' foo', ' bar', 'foobar. ']})
col1 col2 col3 col4
0 1 a 4.0 foo
1 2 b 5.0 bar
2 3 c 6.0 foobar.
str_cols = df.select_dtypes('object').columns
df[str_cols] = df[str_cols].apply(lambda x: x.str.strip())
print(df)
col1 col2 col3 col4
0 1 a 4.0 foo
1 2 b 5.0 bar
2 3 c 6.0 foobar.