How to extend pydatatable columns with a dictionary whose values are lists?

Question

I have created a sample datatable as follows:

import datatable as dt
from datatable import f

DT_EX = dt.Frame({'recency': ['current','savings','fixex','current','savings','fixed','savings','current'],
                  'amount': [4200,2300,1500,8000,1200,6500,4500,9010],
                  'no_of_pl': [3,2,1,5,1,2,5,4],
                  'default': [True,False,True,False,True,True,True,False]})

It can be viewed as:

   | recency  amount  no_of_pl  default
-- + -------  ------  --------  -------
 0 | current    4200         3        1
 1 | savings    2300         2        0
 2 | fixex      1500         1        1
 3 | current    8000         5        0
 4 | savings    1200         1        1
 5 | fixed      6500         2        1
 6 | savings    4500         5        1
 7 | current    9010         4        0

[8 rows x 4 columns]

I am doing some data manipulation on it, with the following steps.

Step 1: Add two new columns to the datatable:

DT_EX[:, f[:].extend({"total_amount": f.amount*f.no_of_pl,
                      'test_col': f.amount/f.no_of_pl})]

Output:

   | recency  amount  no_of_pl  default  total_amount  test_col
-- + -------  ------  --------  -------  ------------  --------
 0 | current    4200         3        1         12600    1400  
 1 | savings    2300         2        0          4600    1150  
 2 | fixex      1500         1        1          1500    1500  
 3 | current    8000         5        0         40000    1600  
 4 | savings    1200         1        1          1200    1200  
 5 | fixed      6500         2        1         13000    3250  
 6 | savings    4500         5        1         22500     900  
 7 | current    9010         4        0         36040    2252.5

[8 rows x 6 columns]

Step 2: Create a dictionary; note that its values are stored in lists:

test_dict = {'discount': [10,20,30,40,50,60,70,80],
             'charges': [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8]}

Step 3: Create a new datatable from the dictionary above and append it to the datatable DT_EX:

dt.cbind(DT_EX, dt.Frame(test_dict))

Output:

   | recency  amount  no_of_pl  default  discount  charges
-- + -------  ------  --------  -------  --------  -------
 0 | current    4200         3        1        10      0.1
 1 | savings    2300         2        0        20      0.2
 2 | fixex      1500         1        1        30      0.3
 3 | current    8000         5        0        40      0.4
 4 | savings    1200         1        1        50      0.5
 5 | fixed      6500         2        1        60      0.6
 6 | savings    4500         5        1        70      0.7
 7 | current    9010         4        0        80      0.8

[8 rows x 6 columns]

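Here we can see a datatable with the newly added columns (discount, charges).

Step 4: We know that the extend function can be used to add columns, so I tried passing the dictionary test_dict to it: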

DT_EX[:, f[:].extend(test_dict)]

Output:

Out[18]: 
   | recency  amount  no_of_pl  default  discount  discount.0  discount.1  discount.2  discount.3  discount.4  …  charges.2  charges.3  charges.4  charges.5  charges.6
-- + -------  ------  --------  -------  --------  ----------  ----------  ----------  ----------  ----------     ---------  ---------  ---------  ---------  ---------
 0 | current    4200         3        1        10          20          30          40          50          60  …        0.4        0.5        0.6        0.7        0.8
 1 | savings    2300         2        0        10          20          30          40          50          60  …        0.4        0.5        0.6        0.7        0.8
 2 | fixex      1500         1        1        10          20          30          40          50          60  …        0.4        0.5        0.6        0.7        0.8
 3 | current    8000         5        0        10          20          30          40          50          60  …        0.4        0.5        0.6        0.7        0.8
 4 | savings    1200         1        1        10          20          30          40          50          60  …        0.4        0.5        0.6        0.7        0.8
 5 | fixed      6500         2        1        10          20          30          40          50          60  …        0.4        0.5        0.6        0.7        0.8
 6 | savings    4500         5        1        10          20          30          40          50          60  …        0.4        0.5        0.6        0.7        0.8
 7 | current    9010         4        0        10          20          30          40          50          60  …        0.4        0.5        0.6        0.7        0.8

[8 rows x 20 columns]

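Note: In this output, 8 columns are created for each dictionary key (discount, charges), one per list element, and each column is filled with that single element. In total, 16 new columns are added.

Step 5: I thought of creating a dictionary whose values are numpy arrays instead: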

import numpy as np

test_dict_1 = {'discount': np.array([10,20,30,40,50,60,70,80]),
               'charges': np.array([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8])}

I passed test_dict_1 to the extend function as:

DT_EX[:, f[:].extend(test_dict_1)]

Output:

Out[20]: 
   | recency  amount  no_of_pl  default  discount  charges
-- + -------  ------  --------  -------  --------  -------
 0 | current    4200         3        1        10      0.1
 1 | savings    2300         2        0        20      0.2
 2 | fixex      1500         1        1        30      0.3
 3 | current    8000         5        0        40      0.4
 4 | savings    1200         1        1        50      0.5
 5 | fixed      6500         2        1        60      0.6
 6 | savings    4500         5        1        70      0.7
 7 | current    9010         4        0        80      0.8

[8 rows x 6 columns]

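In this step, extend took the dictionary and added the new columns to DT_EX, which is the expected output.

So I would like to know what happened in step 4. Why was the list under each dictionary key not used to add a single new column? And why does it work in step 5?

Could you please share your comments and answers?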

Answer 1

You can wrap the dictionary in the Frame constructor to get the desired result:

>>> DT_EX[:, f[:].extend(dt.Frame(test_dict))]
   | recency  amount  no_of_pl  default  discount  charges
-- + -------  ------  --------  -------  --------  -------
 0 | current    4200         3        1        10      0.1
 1 | savings    2300         2        0        20      0.2
 2 | fixex      1500         1        1        30      0.3
 3 | current    8000         5        0        40      0.4
 4 | savings    1200         1        1        50      0.5
 5 | fixed      6500         2        1        60      0.6
 6 | savings    4500         5        1        70      0.7
 7 | current    9010         4        0        80      0.8

[8 rows x 6 columns]

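As for what happens in step 4, the following logic applies: when we evaluate a dictionary inside a DT[] call, we simply treat it as a list of elements, where each item in the list is named by its corresponding key. If an "item" produces multiple columns, then each of those columns gets the same name from the key. In this case, each "item" is itself a list, and we have no special rule for evaluating such lists of primitives. So they end up being expanded into a list of columns, where each column is a constant.

You are right that the end result looks quite counterintuitive, so we will probably need to adjust how lists are evaluated inside a DT[] expression.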
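
The expansion described above can be imitated in plain Python. This is only a toy model of the behaviour, not datatable's implementation: the function name toy_extend is hypothetical, and the ".0", ".1", ... suffix scheme for duplicate names is inferred from the column names visible in the step 4 output.

```python
def toy_extend(existing_names, new_items, nrows):
    """Toy model: each scalar in a list value becomes its own constant column."""
    names = list(existing_names)
    new_columns = []
    for key, value in new_items.items():
        # A plain list is treated as a list of separate items, not as one column.
        scalars = value if isinstance(value, list) else [value]
        for scalar in scalars:
            # Every column produced by this item gets the key as its name;
            # duplicates are then made unique with a numeric suffix
            # (assumed scheme, matching the names seen in step 4).
            name, suffix = key, 0
            while name in names:
                name = f"{key}.{suffix}"
                suffix += 1
            names.append(name)
            # The scalar is repeated for every row, giving a constant column.
            new_columns.append((name, [scalar] * nrows))
    return new_columns


test_dict = {'discount': [10, 20, 30, 40, 50, 60, 70, 80],
             'charges': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]}

cols = toy_extend(['recency', 'amount', 'no_of_pl', 'default'], test_dict, 8)
for name, values in cols[:3]:
    print(name, values)
```

Under this model, two keys with eight scalars each yield 16 constant columns, which together with the 4 original columns accounts for the 20 columns reported in step 4. Wrapping the dictionary in dt.Frame, or using numpy arrays as in step 5, avoids this because each value is then a single column rather than a list of scalars.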
