How to convert a list to a DataFrame with specific rules?

Question · Votes: 2 · Answers: 5

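I have a list lst that I want to convert into a pandas DataFrame. Elements containing the character ':' should become column names, and the elements that follow each of them (up to the next such element) should be that column's values.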

lst = ['k1:', 'a1', 'a2', 'a3', 'k2:', 'b1', 'k3:', 'c1', 'c2', 'c3', 'k4:', 'd1']

I want this:

              k1    k2            k3    k4
0   [a1, a2, a3]    b1  [c1, c2, c3]    d1

Any help would be appreciated!

python pandas
5 answers
Votes: 1

Here is a vectorized solution using a list comprehension, pandas and numpy:

import pandas as pd

# Split the list into column names (entries containing ":") and values
cols = [x.rstrip(":") for x in lst if ":" in x]
vals = [x for x in lst if ":" not in x]

print(cols)
print(vals)

['k1', 'k2', 'k3', 'k4']
['a1', 'a2', 'a3', 'b1', 'c1', 'c2', 'c3', 'd1']

Create the DataFrame from the lists:

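# Group the values by their first letter. groupby sorts the letters, so this maps
# a -> k1, b -> k2, ... only because the letters happen to follow the key order.
s = pd.DataFrame(vals, columns=['values'])
s['letter'] = s['values'].str.slice(stop=1)
s = pd.DataFrame(s.groupby('letter')['values'].apply(list).reset_index(drop=True))
# Reshape the column of lists into a single row with one cell per key
df = pd.DataFrame(s.to_numpy().reshape(1, len(cols)), columns=cols, index=[0])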

print(df)
             k1    k2            k3    k4
0  [a1, a2, a3]  [b1]  [c1, c2, c3]  [d1]
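Grouping by the first letter of each value only works here because every value happens to start with a letter that identifies its key (a for k1, b for k2, and so on). A sketch that does not rely on that coincidence, assuming every key ends with ':' and the list starts with a key, numbers the groups with a cumulative sum over a boolean mask; this is an alternative technique, not the answerer's code:

```python
import pandas as pd

lst = ['k1:', 'a1', 'a2', 'a3', 'k2:', 'b1', 'k3:', 'c1', 'c2', 'c3', 'k4:', 'd1']

s = pd.Series(lst)
is_key = s.str.endswith(':')             # True for column-name entries
group_id = is_key.astype(int).cumsum()   # 1, 1, 1, 1, 2, 2, 3, ...: each key opens a new group

# Column names in order of appearance, without the trailing colon
cols = s[is_key].str.rstrip(':').tolist()

# Collect the non-key entries of each group into a list, indexed by group id
grouped = s[~is_key].groupby(group_id[~is_key]).apply(list)

# Group ids run from 1 to len(cols); a key with no values gets an empty list
row = {col: grouped.get(i, []) for i, col in enumerate(cols, start=1)}
df = pd.DataFrame([row])
print(df)
```

This should produce the same one-row frame as above. Values that appear before the first key would fall into group 0 and are silently ignored by the dictionary comprehension.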

Votes: 1

Use collections.defaultdict and a for loop to restructure lst:

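from collections import defaultdict

import pandas as pd

d = defaultdict(list)

for i in lst:
    if ':' in i:
        # A key starts a new column; drop the trailing colon from its name
        current_key = i.rstrip(':')
    else:
        # Any other entry belongs to the most recent key
        d[current_key].append(i)

# One row whose cells are the collected lists
df = pd.DataFrame([list(d.values())], columns=list(d.keys()))

[OUT]

             k1    k2            k3    k4
0  [a1, a2, a3]  [b1]  [c1, c2, c3]  [d1]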
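Note that the desired output in the question shows b1 and d1 as plain strings, while this answer stores every value as a list. If that exact shape matters, one option, sketched here under the assumption that single values should be unwrapped, is to convert one-element lists to scalars before building the frame:

```python
import pandas as pd

lst = ['k1:', 'a1', 'a2', 'a3', 'k2:', 'b1', 'k3:', 'c1', 'c2', 'c3', 'k4:', 'd1']

d = {}
current_key = None
for item in lst:
    if item.endswith(':'):
        current_key = item.rstrip(':')
        d[current_key] = []          # start a new (possibly empty) column
    else:
        d[current_key].append(item)  # KeyError if the list does not start with a key

# Unwrap single-value lists so k2 and k4 hold 'b1' and 'd1' instead of ['b1'] and ['d1']
row = {k: v[0] if len(v) == 1 else v for k, v in d.items()}

df = pd.DataFrame([row])
print(df)
```

The trade-off is that a column can then hold either a string or a list, so downstream code has to handle both types.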

Votes: 0

Sample code:

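I first split the list at every element that contains ":", e.g. [['k1:', 'a1', 'a2', 'a3'], ['k2:', 'b1'], ...], then build a dictionary that uses the first element of each chunk as the key and the remaining elements as the list of values, e.g. {'k1': ['a1', 'a2', 'a3'], ...}. Finally, I create the DataFrame from the dictionary.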

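import pandas as pd

lst = ['k1:', 'a1', 'a2', 'a3', 'k2:', 'b1', 'k3:', 'c1', 'c2', 'c3', 'k4:', 'd1']

#----- Split the list wherever an element contains ":" ----#
def group(seq, sep):
    g = []
    for el in seq:
        if sep in str(el):
            yield g          # emit the chunk collected so far (empty the first time)
            g = []
        g.append(el)
    yield g                  # emit the final chunk

result = list(group(lst, ':'))

data = {}
for chunk in result:
    if len(chunk):                       # skip the empty chunk yielded before the first key
        key = chunk[0].rstrip(':')       # column name without the trailing colon
        values = chunk[1:]
        data[key] = [values]             # wrap in a list so each column has exactly one row

df = pd.DataFrame.from_dict(data)
print(df)

Output:

             k1    k2            k3    k4
0  [a1, a2, a3]  [b1]  [c1, c2, c3]  [d1]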

Votes: 0

Here is another approach:

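import pandas as pd

lst = ['k1:', 'a1', 'a2', 'a3', 'k2:', 'b1', 'k3:', 'c1', 'c2', 'c3', 'k4:', 'd1']
ret_dict = {}
last_key = None

for key in lst:
    pos = key.find(':')
    if pos > -1:
        # Text before ':' becomes the column name; [[]] makes a one-row column holding a list
        last_key = key[:pos]
        ret_dict[last_key] = [[]]
    else:
        # Append the value to the list in the single row of the current column
        ret_dict[last_key][0].append(key)

print(pd.DataFrame(ret_dict))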
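This answer, like the others, assumes the list starts with a key: with last_key = None, a value appearing before any key makes ret_dict[last_key] raise a KeyError. A sketch of a reusable helper that reports that case explicitly; the name list_to_row and its sep parameter are hypothetical, not part of pandas:

```python
import pandas as pd


def list_to_row(items, sep=':'):
    """Hypothetical helper: turn ['k1:', 'a1', ..., 'k2:', ...] into a one-row DataFrame.

    Entries ending with `sep` become column names (with `sep` removed); the
    entries that follow, up to the next such entry, become that column's list.
    """
    row = {}
    current = None
    for item in items:
        if item.endswith(sep):
            current = item[:-len(sep)]
            if current in row:
                raise ValueError(f"duplicate column name: {current!r}")
            row[current] = []
        elif current is None:
            raise ValueError(f"value {item!r} appears before any column name")
        else:
            row[current].append(item)
    # A list holding one dict becomes a single-row DataFrame; list values stay lists
    return pd.DataFrame([row])


lst = ['k1:', 'a1', 'a2', 'a3', 'k2:', 'b1', 'k3:', 'c1', 'c2', 'c3', 'k4:', 'd1']
df = list_to_row(lst)
print(df)
```

Keys with no following values end up with an empty list rather than being dropped.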

Votes: 0
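import pandas as pd

d = {}
temp = []
h = None  # current header (column name)

for i in lst:
    if ':' in i:
        if h is not None:
            d[h] = temp  # store the values collected for the previous header
        temp = []
        h = i.split(':')[0]
    else:
        temp.append(i)
d[h] = temp  # store the values for the last header

# A list holding one dict becomes a single-row DataFrame; list values stay lists
pd.DataFrame([d])

Output:

             k1    k2            k3    k4
0  [a1, a2, a3]  [b1]  [c1, c2, c3]  [d1]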
© www.soinside.com 2019 - 2024. All rights reserved.