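I have a question about DataFrames. I have a DataFrame of 0.1-second intervals, with the features that belong to each interval. I want to add a column containing the prediction of an earlier algorithm (whether the interval is silent or voiced). I have a dictionary that contains all predicted silent intervals for each audio recording. My DataFrame looks like this. Here df is filtered by audio_id == 0 and sorted by interval_x.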
     audio_id  interval_x  interval_y  predicted_value
0           0    0.579367    0.679367                0
1           0    0.679367    0.779367                0
2           0    0.779367    0.879367                0
3           0    0.879367    0.979367                0
4           0    0.979367    1.079367                0
..        ...         ...         ...              ...
518         0   50.805830   50.905830                0
519         0   50.905830   51.005830                0
520         0   51.005830   51.105830                0
521         0   51.105830   51.205830                0
522         0   51.205830   51.212938                0
My dictionary of silent intervals looks like this.
{'0': [[1.4501383219954658, 2.058138321995466],
[3.298138321995466, 4.762138321995465],
[7.682138321995467, 8.266138321995465],
[11.266138321995466, 11.938138321995465],
[13.242138321995466, 13.706138321995466],
[16.73013832199547, 17.82613832199547],
[24.53813832199547, 25.130138321995467],
[26.394138321995467, 27.042138321995466],
[28.21013832199547, 28.722138321995466]],
'1': [[0.0, 0.31253968253968023],
[4.296539682539681, 5.040539682539681],
[8.64053968253968, 9.296539682539679],
and so on for each audio file.
Is there an efficient way to do this?
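Here is a solution that uses merge_asof to match each interval with the most recent silent span that starts at or before it. d is the dictionary from the question, and intervals is the DataFrame.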
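To see what merge_asof does before applying it to the real data, here is a minimal sketch on toy values (the numbers are made up for illustration and are not from the question). With the default direction="backward", each left row is paired with the last right row whose key is less than or equal to the left key.

```python
import pandas as pd

# Toy data: hypothetical values, not taken from the question.
left = pd.DataFrame({"interval_x": [0.5, 1.5, 2.5, 3.5]})
right = pd.DataFrame({"from_time": [1.0, 3.0], "to_time": [2.0, 4.0]})

# Default direction="backward": each left row gets the last right row
# whose from_time is <= interval_x; rows without such a match get NaN.
matched = pd.merge_asof(left, right, left_on="interval_x", right_on="from_time")
print(matched)
```

Note that the row with interval_x = 2.5 is still paired with the span starting at 1.0, even though that span ended at 2.0. This is why the solution also has to compare against to_time after the merge.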
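import pandas as pd

# Flatten the dictionary into one row per silent span.
silent_times = pd.DataFrame.from_records(
    [(file, from_time, to_time) for file, values in d.items() for from_time, to_time in values],
    columns=["audio_id", "from_time", "to_time"])
silent_times["audio_id"] = silent_times["audio_id"].astype(int)

parts = []
for inx in intervals.audio_id.unique():
    intervals_slice = intervals[intervals.audio_id == inx]
    # merge_asof requires the right-hand key to be sorted.
    silent_times_slice = silent_times[silent_times.audio_id == inx].sort_values("from_time")
    # Backward match: the last silent span starting at or before interval_x.
    t = pd.merge_asof(intervals_slice, silent_times_slice, left_on="interval_x", right_on="from_time")
    # Mark the interval as silent only if it lies entirely inside the matched span.
    t.loc[(t.interval_x >= t.from_time) & (t.interval_y <= t.to_time), "predicted_value"] = 1
    parts.append(t)
# DataFrame.append was removed in pandas 2.0; concatenate once instead.
res = pd.concat(parts)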
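The per-file loop can probably be avoided with the by parameter of merge_asof, which restricts matches to rows with the same audio_id. The following is a sketch under a few assumptions: the helper name label_silence and the toy data are hypothetical, audio_id is an integer column in intervals, and both frames are sorted on their merge keys across all files, as merge_asof requires.

```python
import pandas as pd

def label_silence(intervals: pd.DataFrame, d: dict) -> pd.DataFrame:
    """Hypothetical helper: flag intervals fully contained in a silent span."""
    silent_times = pd.DataFrame(
        [(int(audio_id), start, end) for audio_id, spans in d.items() for start, end in spans],
        columns=["audio_id", "from_time", "to_time"],
    )
    # merge_asof needs both frames sorted by their "on" keys (globally, not per group).
    left = intervals.sort_values("interval_x")
    right = silent_times.sort_values("from_time")
    merged = pd.merge_asof(
        left, right, left_on="interval_x", right_on="from_time", by="audio_id"
    )
    # Comparisons against NaN (no matching span) evaluate to False, giving 0.
    inside = (merged["interval_x"] >= merged["from_time"]) & (
        merged["interval_y"] <= merged["to_time"]
    )
    merged["predicted_value"] = inside.astype(int)
    return merged.sort_values(["audio_id", "interval_x"]).reset_index(drop=True)

# Toy data: hypothetical values chosen for illustration.
toy_intervals = pd.DataFrame({
    "audio_id": [0, 0, 0, 1, 1],
    "interval_x": [0.5, 1.5, 50.9, 0.0, 0.3],
    "interval_y": [0.6, 1.6, 51.0, 0.1, 0.4],
})
toy_d = {"0": [[1.45, 2.06], [50.01, 51.01]], "1": [[0.0, 0.3125]]}
result = label_silence(toy_intervals, toy_d)
print(result[["audio_id", "interval_x", "interval_y", "predicted_value"]])
```

Because audio_id is used as a by key here, it appears once in the output under its original name rather than as audio_id_x and audio_id_y.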
With the DataFrame from the question and the following silent intervals,
d = {'0': [
[1.4501383219954658, 2.058138321995466],
[3.298138321995466, 4.762138321995465],
[7.682138321995467, 8.266138321995465],
[50.01, 51.01]
],
'1': [
[0.0, 0.31253968253968023],
[4.296539682539681, 5.040539682539681],
[8.64053968253968, 9.296539682539679]]}
the result is as follows:
print(res[["audio_id_x", "interval_x", "interval_y", "predicted_value"]])
   audio_id_x  interval_x  interval_y  predicted_value
0           0    0.579367    0.679367                0
1           0    0.679367    0.779367                0
2           0    0.779367    0.879367                0
3           0    0.879367    0.979367                0
4           0    0.979367    1.079367                0
5           0   50.805830   50.905830                1
6           0   50.905830   51.005830                1
7           0   51.005830   51.105830                0
8           0   51.105830   51.205830                0
9           0   51.205830   51.212938                0
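Note that the interval 51.005830 to 51.105830 gets 0 even though it partly overlaps the silent span [50.01, 51.01], because the condition requires the interval to lie entirely inside the span. If partial overlap should count as silence instead, one possible variant is sketched below. It is not part of the original solution: the column name overlaps_silence and the toy values are hypothetical, and it assumes the silent spans within one audio file do not overlap each other.

```python
import pandas as pd

# Toy data: hypothetical values around the [50.01, 51.01] span.
intervals = pd.DataFrame({
    "audio_id": [0, 0, 0, 0],
    "interval_x": [49.95, 50.90, 51.00, 51.10],
    "interval_y": [50.05, 51.00, 51.10, 51.20],
})
silent_times = pd.DataFrame({"audio_id": [0], "from_time": [50.01], "to_time": [51.01]})

# Match on the interval END: find the last silent span that starts
# strictly before interval_y (a span starting exactly there does not overlap).
merged = pd.merge_asof(
    intervals.sort_values("interval_y"),
    silent_times.sort_values("from_time"),
    left_on="interval_y",
    right_on="from_time",
    by="audio_id",
    allow_exact_matches=False,
)
# The interval overlaps that span if it begins before the span ends.
# Rows without a match hold NaN, which compares as False and yields 0.
merged["overlaps_silence"] = (merged["interval_x"] < merged["to_time"]).astype(int)
print(merged[["interval_x", "interval_y", "overlaps_silence"]])
```

Matching on interval_y rather than interval_x is what lets this catch an interval that begins just before a silent span and ends inside it.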