Comparing multiple single-column Excel files against a specific nested list/tuple

Question

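I'm looking for some advice. I have a list of 30 tuples, each holding a group name and a nested list (converted from a JSON response), in this format: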

[('Group_1',['xyz123','str123','834hsj','nmp001','888tyu','abc123']),...('Group_30' ,['aaaa', 'bbb', 'fff'])]

I have 5 Excel files, each named after one of 5 corresponding groups in the tuple, with rows as follows:

Excel xls file 1: Name: Group_1 Contents:

Column: A
Row1: Group_1
Row2: xyz123
Row3: str123
Row4: 834hsj
Row5: nmp001
Row6: 888tyu
Row7: abc123

Excel xls file 2: Name: Group_2 Contents:

Row1: Group_2

and so on, up to Group_5.

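The goal is to compare the values group by group between the tuple and the Excel files, so that Group_1 through Group_5 in the tuple (with their nested lists) are checked against the column contents of the matching Excel file. If a group differs, I want to list the strings that are missing or extra, together with their positions.

Would you recommend importing the Excel files (each one column wide, with varying lengths) into pandas as separate dataframes, splitting the tuple into separate lists, and then turning those lists into pandas dataframes as well? Or importing the Excel files into dataframes and then converting them to lists (one per group) to compare against the tuple (split into per-group lists)?

Thanks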

Answer 1

The simplest way is to read each file in a loop, turn each list into a set, and go wild:

Assuming your list of tuples is in a list called groups:

groups

[('Group_1',['xyz123','str123','834hsj','nmp001','888tyu','abc123']),
 ('Group_30' ,['aaaa', 'bbb', 'fff'])]

And you have files named after the groups, like this:

Group_1.xls
Group_30.xls
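The question mentions 30 groups but only 5 files, so a loop over every group would stop at the first group that has no file. A minimal sketch of one way to filter the groups first, assuming the files sit in a single folder and use the .xls suffix; the helper name groups_with_files and the throwaway demo folder are illustrative, not part of the original answer:

```python
from pathlib import Path
import tempfile


def groups_with_files(groups, folder, suffix='.xls'):
    """Keep only the (name, values) pairs that have a matching file in folder."""
    folder = Path(folder)
    return [(name, values) for name, values in groups
            if (folder / (name + suffix)).is_file()]


groups = [('Group_1', ['xyz123', 'str123']),
          ('Group_30', ['aaaa', 'bbb', 'fff'])]

# Demo with a throwaway folder that contains only an (empty) Group_1.xls.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'Group_1.xls').touch()
    matched = groups_with_files(groups, tmp)

print([name for name, _ in matched])   # ['Group_1']
```

The filtered list can then be used in place of groups in the loops that follow.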

First, read in the XLS, skipping the first row (i.e. 'A') and using the second row as the column name (i.e. 'Group_1').

import pandas as pd

for group in groups:
    df = pd.read_excel(group[0] + '.xls', header=0, skiprows=[0])

The result should look something like this:

df

  Group_1
0  xyz123
1  str123
2  834hsj
3  nmp001
4  888tyu
5  abc123

Then we convert the elements from the file and from the list into sets and print the results:

import pandas as pd

for group in groups:
    # group[0] is the group name, group[1] is its list of expected strings
    df = pd.read_excel(group[0] + '.xls', header=0, skiprows=[0])

    file_set = set(df[group[0]].to_list())  # values found in the file
    tup_set = set(group[1])                 # values from the tuple

    print()
    print("In file and in tuple")
    print(file_set.intersection(tup_set))
    print("In file, but not in tuple")
    print(file_set.difference(tup_set))
    print("In tuple, but not in file")
    print(tup_set.difference(file_set))

You should get output like this:

In file and in tuple
{'nmp001', '834hsj', '888tyu', 'str123', 'abc123', 'xyz123'}
In file, but not in tuple
set()
In tuple, but not in file
set()

In file and in tuple
set()
In file, but not in tuple
{'nmp001', '834hsj', '888tyu', 'str123', 'abc123', 'xyz123'}
In tuple, but not in file
{'bbb', 'fff', 'aaaa'}

P.S. set() is the empty set.
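The question also asks for the positions of the differing strings, which sets alone cannot provide because they discard order. A minimal sketch of one way to recover positions, assuming the file column and the tuple's list are both available as plain Python lists; the function name diff_with_positions and the sample values (including 'typo01') are made up for illustration:

```python
def diff_with_positions(file_values, tuple_values):
    """Return two dicts mapping each unmatched value to its 0-based positions:
    values only in the file, and values only in the tuple's list."""
    file_set, tup_set = set(file_values), set(tuple_values)

    def positions(values, other):
        found = {}
        for index, value in enumerate(values):
            if value not in other:
                found.setdefault(value, []).append(index)
        return found

    return positions(file_values, tup_set), positions(tuple_values, file_set)


file_values = ['xyz123', 'str123', '834hsj', 'typo01']
tuple_values = ['xyz123', 'str123', '834hsj', 'nmp001', '888tyu']

only_in_file, only_in_tuple = diff_with_positions(file_values, tuple_values)
print(only_in_file)    # {'typo01': [3]}
print(only_in_tuple)   # {'nmp001': [3], '888tyu': [4]}
```

Positions here are indexes into the lists; for the file they match the dataframe index, not the Excel row number, which is offset by the header row or rows.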
