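I have two tf.data.Dataset objects, call them d1 and d2, and I want to build another dataset that contains the elements of both d1 and d2, interleaved. An example makes this easier to explain. Say: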
d1 = [0,1,2,3,4,5,6,7,...] # it is not a list, just the content of the dataset
d2 = ["a", "b", "c", "d",... ]
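I also have a pair that gives the number of consecutive elements to take from each dataset (for example, (3, 1)).
The result I am looking for is: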
result = [0, 1, 2, "a", 3, 4, 5, "b", 6, 7, 8, "c"...]
Edit: d1 and d2 are objects of class tf.data.Dataset. The example above only shows the contents of the datasets; it is not code.
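To pin down the requested ordering, here is a minimal sketch in plain Python that works on ordinary iterables rather than tf.data.Dataset objects. The function name `interleave_chunks` and the sample inputs are illustrative assumptions, not part of the question; the sketch only shows the intended semantics, including what happens when one input runs out first.

```python
from itertools import islice


def interleave_chunks(a, b, n_a, n_b):
    """Yield n_a items from a, then n_b items from b, until both are exhausted."""
    it_a, it_b = iter(a), iter(b)
    while True:
        chunk_a = list(islice(it_a, n_a))  # up to n_a items, fewer near the end
        chunk_b = list(islice(it_b, n_b))  # up to n_b items, fewer near the end
        if not chunk_a and not chunk_b:
            return  # nothing left in either input
        yield from chunk_a
        yield from chunk_b


# Hypothetical sample contents standing in for d1 and d2
d1_items = range(9)
d2_items = ["a", "b", "c"]
result = list(interleave_chunks(d1_items, d2_items, 3, 1))
print(result)  # [0, 1, 2, 'a', 3, 4, 5, 'b', 6, 7, 8, 'c']
```

Unlike a zip-based approach, this version keeps leftover items when the lengths do not match the ratio, which may or may not be what is wanted.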
import pandas as pd

# d1 and d2 are assumed to be existing pandas DataFrames with the same columns
print(d1)
print("---------------------------")
print(d2)
print("---------------------------")

def interweave(x, d1, y, d2):
    """
    x = how many rows of d1 to add before adding rows from d2
    d1 = the d1 DataFrame
    y = how many rows of d2 to add before adding rows from d1 again
    d2 = the d2 DataFrame
    """
    # Collect rows in a list and build the frame once at the end;
    # DataFrame.append was removed in pandas 2.0.
    rows = []
    countx = 0
    county = 0
    length = max(len(d1), len(d2))
    for count in range(length):
        for i in range(countx, countx + x):
            try:  # stop quietly on unequal or indivisible lengths
                row = d1.iloc[i]
            except IndexError:
                break
            rows.append(row)
            countx += 1
        for j in range(county, county + y):
            try:  # stop quietly on unequal or indivisible lengths
                row = d2.iloc[j]
            except IndexError:
                break
            rows.append(row)
            county += 1
    d3 = pd.DataFrame(rows).reset_index(drop=True)
    return d3

d3 = interweave(3, d1, 1, d2)
print(d3)
Output:
Col1 Col2
0 0 0
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
7 7 70
8 8 80
9 9 90
10 10 100
---------------------------
Col1 Col2
0 a A
1 b B
2 c C
---------------------------
Col1 Col2
0 0 0
1 1 10
2 2 20
3 a A
4 3 30
5 4 40
6 5 50
7 b B
8 6 60
9 7 70
10 8 80
11 c C
12 9 90
13 10 100
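Assuming TF 2.0. The trick is to batch each dataset by its share of the ratio, zip the two batched datasets and concatenate each pair of batches, then unbatch the result.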
import tensorflow as tf
# input datasets
d1 = tf.data.Dataset.from_tensors([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).unbatch()
d2 = tf.data.Dataset.from_tensors([100, 101, 102]).unbatch()
# replaced letters with numbers to make tensor types match
# define ratio
r1 = 3
r2 = 1
b1 = d1.batch(r1)
b2 = d2.batch(r2)
zipped = tf.data.Dataset.zip((b1, b2)).map(lambda x, y: tf.concat((x, y), axis=0))
result = zipped.unbatch()
Output:
In [9]: list(result)
Out[9]:
[<tf.Tensor: id=224, shape=(), dtype=int32, numpy=0>,
<tf.Tensor: id=225, shape=(), dtype=int32, numpy=1>,
<tf.Tensor: id=226, shape=(), dtype=int32, numpy=2>,
<tf.Tensor: id=227, shape=(), dtype=int32, numpy=100>,
<tf.Tensor: id=228, shape=(), dtype=int32, numpy=3>,
<tf.Tensor: id=229, shape=(), dtype=int32, numpy=4>,
<tf.Tensor: id=230, shape=(), dtype=int32, numpy=5>,
<tf.Tensor: id=231, shape=(), dtype=int32, numpy=101>,
<tf.Tensor: id=232, shape=(), dtype=int32, numpy=6>,
<tf.Tensor: id=233, shape=(), dtype=int32, numpy=7>,
<tf.Tensor: id=234, shape=(), dtype=int32, numpy=8>,
<tf.Tensor: id=235, shape=(), dtype=int32, numpy=102>]
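Note: this solution may drop some elements at the end of d1 or d2, because zip stops as soon as the shorter of the two batched datasets is exhausted. The lengths must be adjusted to match the ratio. In the example above, the element 9 of d1 is dropped.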
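As a rough check on how much is lost, the arithmetic can be sketched in plain Python. This assumes the default `drop_remainder=False` for `batch` (so a final partial batch is kept) and that `zip` stops at the shortest input; the helper name `kept_counts` is hypothetical and not part of TensorFlow.

```python
import math


def kept_counts(n1, n2, r1, r2):
    """Return how many elements of d1 and d2 survive batch + zip + unbatch."""
    # Number of batches in each dataset, counting a final partial batch
    batches1 = math.ceil(n1 / r1)
    batches2 = math.ceil(n2 / r2)
    # zip yields as many pairs as the shorter batched dataset has batches
    groups = min(batches1, batches2)
    return min(n1, groups * r1), min(n2, groups * r2)


# Lengths from the example above: d1 has 10 elements, d2 has 3, ratio (3, 1)
print(kept_counts(10, 3, 3, 1))  # (9, 3)
```

Nothing is dropped when both datasets produce the same number of batches, for example 9 and 3 elements with a ratio of (3, 1).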