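I have been trying to track down a problem, but so far I have not found a solution, and I hope you can help. I have a DataFrame in which I want to convert a str column to datetime, but some invalid rows have to be filtered out first. Here are two examples:
In [6]: df
Out[6]: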
# name date
0 aa 2012-11-30T14:00:00+01:00
1 bb 2012-12-01T08:16:00+01:00
2 cc 2012-12-01T10:14:00+01:00
3 ee 2012-12-01T11:05:00+01:00
4 gg 2012-12-01T11:05:00+01:00
In [7]: df2
Out[7]:
# name date
0 aa 2012-11-30T14:00:00+01:00
1 bb 2012-12-01T08:16:00+01:00
2 cc 2012-12-01T10:14:00+01:00
3 ee 2012-12-01T11:05:00+01:00
4 ff fsadfi2 2ih3ro
5 gg 2012-12-01T11:05:00+01:00
In [11]: df.dtypes
Out[11]:
name <class 'str'>
date <class 'str'>
dtype: object
In [12]: df2.dtypes
Out[12]:
name <class 'str'>
date <class 'str'>
dtype: object
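df is fine: its date column contains only valid dates. df2, however, has some invalid rows. Let's look at df first. The following line converts the column to datetime: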
df['pdate']=df.date.values.astype('datetime64[ns]')
This works well:
In [16]: df
Out[16]:
# name date pdate
0 aa 2012-11-30T14:00:00+01:00 2012-11-30 13:00:00.000000000
1 bb 2012-12-01T08:16:00+01:00 2012-12-01 07:16:00.000000000
2 cc 2012-12-01T10:14:00+01:00 2012-12-01 09:14:00.000000000
3 ee 2012-12-01T11:05:00+01:00 2012-12-01 10:05:00.000000000
4 gg 2012-12-01T11:05:00+01:00 2012-12-01 10:05:00.000000000
In [17]: df.dtypes
Out[17]:
name <class 'str'>
date <class 'str'>
pdate datetime64[ns]
dtype: object
Now I try to filter df2 with a very simple str.contains:
In [18]: df2_filtered=df2[df2['date'].str.contains(':00')]
In [19]: df2_filtered
Out[19]:
# name date
0 aa 2012-11-30T14:00:00+01:00
1 bb 2012-12-01T08:16:00+01:00
2 cc 2012-12-01T10:14:00+01:00
3 ee 2012-12-01T11:05:00+01:00
4 gg 2012-12-01T11:05:00+01:00
In [20]: df2_filtered.dtypes
Out[20]:
name <class 'str'>
date <class 'str'>
dtype: object
The result has only 5 rows. Now I try the conversion and get a nice error message:
In [21]: df2_filtered['pdate']=df2_filtered.date.values.astype('datetime64[ns]')
    ...:
/usr/local/bin/ipython:1: DeprecationWarning: parsing timezone aware datetimes is deprecated; this will raise an error in the future
  #!/opt/local/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-563087d6f949> in <module>
----> 1 df2_filtered['pdate']=df2_filtered.date.values.astype('datetime64[ns]')

/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in __setitem__(self, name, value)
   4370         if isinstance(name, six.string_types):
   4371             if isinstance(value, (np.ndarray, Column)):
-> 4372                 self.add_column(name, value)
   4373             else:
   4374                 self.add_virtual_column(name, value)

/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in add_column(self, name, data, dtype)
   5743         # self._length_original = len(data)
   5744         # self._index_end = self._length_unfiltered
-> 5745         super(DataFrameArrays, self).add_column(name, data, dtype=dtype)
   5746         self._length_unfiltered = int(round(self._length_original * self._active_fraction))
   5747         # self.set_active_fraction(self._active_fraction)

/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/vaex/dataframe.py in add_column(self, name, f_or_array, dtype)
   2872                 # give a better warning to avoid confusion
   2873                 if len(self) == len(ar):
-> 2874                     raise ValueError("Array is of length %s, while the length of the DataFrame is %s due to the filtering, the (unfiltered) length is %s." % (len(ar), len(self), self.length_unfiltered()))
   2875                 raise ValueError("array is of length %s, while the length of the DataFrame is %s" % (len(ar), self.length_original()))
   2876                 # assert self.length_unfiltered() == len(data), "columns should be of equal length, length should be %d, while it is %d" % ( self.length_unfiltered(), len(data))

ValueError: Array is of length 5, while the length of the DataFrame is 5 due to the filtering, the (unfiltered) length is 6.
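In other words, the array has length 5, the filtered DataFrame also has length 5, but the unfiltered length is 6.
As far as I can tell, df2_filtered has only 5 rows, so I don't understand why the number of rows in df2 matters.
Basically, my question is: how do I filter out the unwanted rows and convert the column to datetime?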
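IIUC, you are looking for pd.to_datetime, which converts a column to datetime and accepts several keyword arguments. In this case you need errors='coerce', which turns values that cannot be parsed into NaT instead of raising an error.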
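A minimal sketch of that suggestion, assuming the data is held in a pandas DataFrame (the traceback in the question comes from vaex, so this applies only once the data is in pandas). The frame below is rebuilt by hand from the df2 shown in the question. The names df2_clean and pdate, and the choice of utc=True to normalize the +01:00 offsets, are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

# Rebuild the df2 example from the question, including the invalid row "ff".
df2 = pd.DataFrame({
    "name": ["aa", "bb", "cc", "ee", "ff", "gg"],
    "date": [
        "2012-11-30T14:00:00+01:00",
        "2012-12-01T08:16:00+01:00",
        "2012-12-01T10:14:00+01:00",
        "2012-12-01T11:05:00+01:00",
        "fsadfi2 2ih3ro",
        "2012-12-01T11:05:00+01:00",
    ],
})

# errors="coerce" replaces unparseable strings with NaT instead of raising.
# utc=True converts the timezone-aware values to UTC (an assumption made here
# so that the column gets a single, uniform datetime dtype).
df2["pdate"] = pd.to_datetime(df2["date"], errors="coerce", utc=True)

# Drop the rows whose date could not be parsed.
df2_clean = df2.dropna(subset=["pdate"]).reset_index(drop=True)

print(df2_clean)
```

With this approach the conversion itself acts as the filter, so a separate str.contains(':00') step is not needed to find the invalid rows.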
Unfortunately, I don't have a complete answer, but this may shed some light on part of the problem: