Pandas: filter rows and convert a date string column to datetime64[ns]

Question (votes: 1, answers: 2)

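I'm trying to work out a problem, but so far I haven't found a solution and hope you can help. I have a DataFrame in which I want to convert a str column to datetime, but some rows are invalid and need to be filtered out. Here are two examples: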

In [6]: df
Out[6]:
  #  name    date
  0  aa      2012-11-30T14:00:00+01:00
  1  bb      2012-12-01T08:16:00+01:00
  2  cc      2012-12-01T10:14:00+01:00
  3  ee      2012-12-01T11:05:00+01:00
  4  gg      2012-12-01T11:05:00+01:00

In [7]: df2
Out[7]:
  #  name    date
  0  aa      2012-11-30T14:00:00+01:00
  1  bb      2012-12-01T08:16:00+01:00
  2  cc      2012-12-01T10:14:00+01:00
  3  ee      2012-12-01T11:05:00+01:00
  4  ff      fsadfi2 2ih3ro
  5  gg      2012-12-01T11:05:00+01:00
In [11]: df.dtypes
Out[11]:
name    <class 'str'>
date    <class 'str'>
dtype: object

In [12]: df2.dtypes
Out[12]:
name    <class 'str'>
date    <class 'str'>
dtype: object

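df is fine: its date column contains only valid dates. df2, however, has an invalid row. Let's look at df first. The following line converts the column to datetime: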

df['pdate']=df.date.values.astype('datetime64[ns]')

This works well:


In [16]: df
Out[16]:
  #  name    date                       pdate
  0  aa      2012-11-30T14:00:00+01:00  2012-11-30 13:00:00.000000000
  1  bb      2012-12-01T08:16:00+01:00  2012-12-01 07:16:00.000000000
  2  cc      2012-12-01T10:14:00+01:00  2012-12-01 09:14:00.000000000
  3  ee      2012-12-01T11:05:00+01:00  2012-12-01 10:05:00.000000000
  4  gg      2012-12-01T11:05:00+01:00  2012-12-01 10:05:00.000000000

In [17]: df.dtypes
Out[17]:
name      <class 'str'>
date      <class 'str'>
pdate    datetime64[ns]
dtype: object

Now I try to filter df2 with a very simple str.contains:

In [18]: df2_filtered=df2[df2['date'].str.contains(':00')]

In [19]: df2_filtered
Out[19]:
  #  name    date
  0  aa      2012-11-30T14:00:00+01:00
  1  bb      2012-12-01T08:16:00+01:00
  2  cc      2012-12-01T10:14:00+01:00
  3  ee      2012-12-01T11:05:00+01:00
  4  gg      2012-12-01T11:05:00+01:00

In [20]: df2_filtered.dtypes
Out[20]:
name    <class 'str'>
date    <class 'str'>
dtype: object

The result has only 5 rows. Now I try the conversion and get a nice error message:

In [21]: df2_filtered['pdate']=df2_filtered.date.values.astype('datetime64[ns]')
    ...:
/usr/local/bin/ipython:1: DeprecationWarning: parsing timezone aware datetimes is deprecated; this will raise an error in the future
  #!/opt/local/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-563087d6f949> in <module>
----> 1 df2_filtered['pdate']=df2_filtered.date.values.astype('datetime64[ns]')

/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in __setitem__(self, name, value)
   4370         if isinstance(name, six.string_types):
   4371             if isinstance(value, (np.ndarray, Column)):
-> 4372                 self.add_column(name, value)
   4373             else:
   4374                 self.add_virtual_column(name, value)

/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in add_column(self, name, data, dtype)
   5743         #     self._length_original = len(data)
   5744         #     self._index_end = self._length_unfiltered
-> 5745         super(DataFrameArrays, self).add_column(name, data, dtype=dtype)
   5746         self._length_unfiltered = int(round(self._length_original * self._active_fraction))
   5747         # self.set_active_fraction(self._active_fraction)

/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in add_column(self, name, f_or_array, dtype)
   2872                     # give a better warning to avoid confusion
   2873                     if len(self) == len(ar):
-> 2874                         raise ValueError("Array is of length %s, while the length of the DataFrame is %s due to the filtering, the (unfiltered) length is %s." % (len(ar), len(self), self.length_unfiltered()))
   2875                 raise ValueError("array is of length %s, while the length of the DataFrame is %s" % (len(ar), self.length_original()))
   2876             # assert self.length_unfiltered() == len(data), "columns should be of equal length, length should be %d, while it is %d" % ( self.length_unfiltered(), len(data))

ValueError: Array is of length 5, while the length of the DataFrame is 5 due to the filtering, the (unfiltered) length is 6.

So the error says the array has length 5 and the DataFrame has length 5 because of the filtering, while the unfiltered length is 6.

But as far as I can tell, df2_filtered has only 5 rows. I don't understand why the number of rows in df2 matters.

Basically, my question is: how do I filter out the unwanted rows and convert the column to datetime?


python pandas dataframe hdf5 vaex
2 answers
0 votes

IIUC, you want pd.to_datetime, which converts a column to datetime and accepts several keyword arguments. In this case you need errors='coerce'.
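A minimal sketch of that suggestion, assuming the data lives in a plain pandas DataFrame (the traceback in the question comes from vaex, whose DataFrame behaves differently, so this may not carry over unchanged). The sample frame mirrors df2 from the question; the names parsed and df2_clean are only illustrative. The whole column is parsed first, so the result has the same length as the unfiltered frame, and only then are the rows that failed to parse dropped.

```python
import pandas as pd

# Sample data mirroring df2 from the question (one invalid date string).
df2 = pd.DataFrame({
    "name": ["aa", "bb", "cc", "ee", "ff", "gg"],
    "date": [
        "2012-11-30T14:00:00+01:00",
        "2012-12-01T08:16:00+01:00",
        "2012-12-01T10:14:00+01:00",
        "2012-12-01T11:05:00+01:00",
        "fsadfi2 2ih3ro",
        "2012-12-01T11:05:00+01:00",
    ],
})

# Parse the full column; strings that cannot be parsed become NaT
# instead of raising. utc=True converts the +01:00 offsets to UTC.
parsed = pd.to_datetime(df2["date"], errors="coerce", utc=True)

# Keep only the rows that parsed, and store timezone-naive UTC timestamps.
valid = parsed.notna()
df2_clean = df2[valid].copy()
df2_clean["pdate"] = parsed[valid].dt.tz_localize(None)

print(df2_clean)
```

Parsing before filtering avoids the need for a string heuristic such as str.contains(':00'), since the parser itself decides which rows are valid.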


0 votes

Unfortunately I don't have a complete answer, but this may shed some light on part of the problem:
