How do I properly trim whitespace across an entire pandas DataFrame?

Question (votes: 0, answers: 2)

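I'm trying to do something simple: trim all whitespace from every column of a DataFrame. Some values have trailing whitespace after words, some have leading whitespace before words, and some columns contain nothing but " " values. I want all of it removed.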

I read this post, which offers a nice way to do it:

data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

However, I keep getting the following:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-31d35db1d48c> in <module>
      1 df = (pd.read_csv('C:\\Users\\J39304\Desktop\\aggregated_po_data.csv',
----> 2                     encoding = "ISO-8859-1", low_memory=False).apply(lambda x: x.str.strip() if (x.dtype == "object") else x))
      3 print(df.shape)
      4 
      5 label = df['ON_TIME']

c:\python367-64\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   6876             kwds=kwds,
   6877         )
-> 6878         return op.get_result()
   6879 
   6880     def applymap(self, func) -> "DataFrame":

c:\python367-64\lib\site-packages\pandas\core\apply.py in get_result(self)
    184             return self.apply_raw()
    185 
--> 186         return self.apply_standard()
    187 
    188     def apply_empty_result(self):

c:\python367-64\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    294             try:
    295                 result = libreduction.compute_reduction(
--> 296                     values, self.f, axis=self.axis, dummy=dummy, labels=labels
    297                 )
    298             except ValueError as err:

pandas\_libs\reduction.pyx in pandas._libs.reduction.compute_reduction()

pandas\_libs\reduction.pyx in pandas._libs.reduction.Reducer.get_result()

<ipython-input-9-31d35db1d48c> in <lambda>(x)
      1 df = (pd.read_csv('C:\\Users\\wundermahn\Desktop\\aggregated_data.csv',
----> 2                     encoding = "ISO-8859-1", low_memory=False).apply(lambda x: x.str.strip() if (x.dtype == "object") else x))
      3 print(df.shape)
      4 
      5 label = df['ON_TIME']

c:\python367-64\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5268             or name in self._accessors
   5269         ):
-> 5270             return object.__getattribute__(self, name)
   5271         else:
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):

c:\python367-64\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
    185             # we're accessing the attribute of the class, i.e., Dataset.geo
    186             return self._accessor
--> 187         accessor_obj = self._accessor(obj)
    188         # Replace the property with the accessor object. Inspired by:
    189         # http://www.pydanny.com/cached-property.html

c:\python367-64\lib\site-packages\pandas\core\strings.py in __init__(self, data)
   2039 
   2040     def __init__(self, data):
-> 2041         self._inferred_dtype = self._validate(data)
   2042         self._is_categorical = is_categorical_dtype(data)
   2043         self._is_string = data.dtype.name == "string"

c:\python367-64\lib\site-packages\pandas\core\strings.py in _validate(data)
   2096 
   2097         if inferred_dtype not in allowed_types:
-> 2098             raise AttributeError("Can only use .str accessor with string values!")
   2099         return inferred_dtype
   2100 

**AttributeError: Can only use .str accessor with string values!**

While looking for a workaround, I came across another post that suggests using:

data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "str" else x)

However, that does not clean up cells that contain only spaces or tabs.

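How can I effectively strip every variant of whitespace? Ultimately I will drop the columns where more than 50% of the values are null.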

python pandas whitespace trim
2 Answers
0 votes

You can wrap the call in try/except, so that columns without string values are returned unchanged:

def trim(x):
    try:
        return x.str.strip()
    except AttributeError:  # raised when .str is not usable on this column
        return x

df = df.apply(trim)
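
One caveat with the column-level approach: in an object column that mixes strings with other values, the `.str` methods generally return NaN for the non-string cells, so those values can be lost. A minimal sketch of a cell-by-cell alternative follows. The helper names `strip_if_str` and `trim_frame` and the sample frame `mixed` are made up for illustration and are not part of the answer above.

```python
import pandas as pd

def strip_if_str(value):
    """Strip leading/trailing whitespace from strings; return anything else unchanged."""
    return value.strip() if isinstance(value, str) else value

def trim_frame(df):
    # Series.map applies the helper to every cell, one column at a time,
    # so numbers and missing values pass through untouched.
    return df.apply(lambda col: col.map(strip_if_str))

# Hypothetical sample data: "code" is an object column with mixed contents
mixed = pd.DataFrame({
    "code": [" a1 ", 42, None],
    "qty": [1, 2, 3],
})
trimmed = trim_frame(mixed)
print(trimmed)
```

This is likely slower than `.str.strip()` on large frames because it calls a Python function per cell, so it is a trade of speed for safety on mixed columns.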

0 votes

First use select_dtypes to pick out the right columns:

import pandas as pd

# example dataframe
df = pd.DataFrame({'col1':[1,2,3],
                   'col2':list('abc'),
                   'col3':[4.0, 5.0, 6.0],
                   'col4':[' foo', '   bar', 'foobar. ']})

   col1 col2  col3      col4
0     1    a   4.0       foo
1     2    b   5.0       bar
2     3    c   6.0  foobar. 

str_cols = df.select_dtypes('object').columns
df[str_cols] = df[str_cols].apply(lambda x: x.str.strip())

print(df)
   col1 col2  col3     col4
0     1    a   4.0      foo
1     2    b   5.0      bar
2     3    c   6.0  foobar.
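
The question also mentions cells that hold only spaces or tabs, and a later step that drops columns with more than 50% nulls. Stripping alone leaves such cells as empty strings, which pandas does not count as null. The following is a rough sketch of one way to handle both, assuming that whitespace-only cells should be treated as missing. The names `clean_cell`, `trim_and_drop_sparse`, and `max_null_fraction` and the sample data are illustrative only.

```python
import numpy as np
import pandas as pd

def clean_cell(value):
    """Strip strings; map empty or whitespace-only strings to NaN; leave other values alone."""
    if isinstance(value, str):
        stripped = value.strip()
        return stripped if stripped else np.nan
    return value

def trim_and_drop_sparse(df, max_null_fraction=0.5):
    # Clean every cell, column by column
    cleaned = df.apply(lambda col: col.map(clean_cell))
    # Fraction of missing values per column after cleaning
    null_fraction = cleaned.isna().mean()
    # Keep only the columns whose null fraction does not exceed the threshold
    keep = null_fraction[null_fraction <= max_null_fraction].index
    return cleaned[keep]

# Hypothetical sample data
raw = pd.DataFrame({
    "a": [" x", "  ", "\t", ""],          # 3 of 4 cells become NaN
    "b": [1, 2, 3, 4],
    "c": [" foo ", "bar", " ", "baz "],   # 1 of 4 cells becomes NaN
})
result = trim_and_drop_sparse(raw)
print(result)
```

Here column `a` ends up 75% null and is dropped, while `b` and `c` are kept.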