names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]
I have two lists, names and values. Each value has a name: for example, Pat corresponds to the value 1 and Sam corresponds to the value 9.
I want to remove the nan entries from names, along with the corresponding entries in values.
That is, I want a new_names list that looks like this:
[['Pat','Sam', 'Tom', ''], ["Angela", "James", ".", "Jackie"]]
and a new_values list that looks like this:
[[1, 9, 2, 1], [1, 1, 5, 10]]
My attempt was to first find the indices of the nan entries:
all_nan_idx = []
for idx, name in enumerate(names):
    if pd.isnull(name):
        all_nan_idx.append(idx)
However, the code above does not account for the nested lists.
How about this?
import numpy as np
import pandas as pd

names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

new_names = []
new_values = []
# Walk the sublists in parallel, then the (name, value) pairs inside them.
for names_, values_ in zip(names, values):
    n = []
    v = []
    for name, value in zip(names_, values_):
        # Keep the pair only when the name is not NaN/None.
        if not pd.isnull(name):
            n.append(name)
            v.append(value)
    new_names.append(n)
    new_values.append(v)
There is probably some inscrutable comprehension that can do this, but here is a step-by-step approach:
import numpy as np

names = [
    ['Pat', 'Sam', np.nan, 'Tom', ''],
    ["Angela", np.nan, "James", ".", "Jackie"]
]
values = [
    [1, 9, 1, 2, 1],
    [1, 3, 1, 5, 10]
]

new_names = []
new_values = []
for nn, vv in zip(names, values):
    # Start a fresh output sublist for each input sublist.
    new_names.append([])
    new_values.append([])
    for n, v in zip(nn, vv):
        # Identity check: this matches the np.nan object itself.
        if n is not np.nan:
            new_names[-1].append(n)
            new_values[-1].append(v)

print(new_names)
print(new_values)
Output:
[['Pat', 'Sam', 'Tom', ''], ['Angela', 'James', '.', 'Jackie']]
[[1, 9, 2, 1], [1, 1, 5, 10]]
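For comparison, the comprehension mentioned above could look roughly like the following. This is only a sketch: the intermediate name `kept` is made up for illustration, and `pd.isnull` is used for the check instead of the identity test.

```python
import numpy as np
import pandas as pd

names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

# For each pair of sublists, keep only the (name, value) pairs whose name is not null.
# `kept` is a hypothetical intermediate name, not something from the answers here.
kept = [
    [(n, v) for n, v in zip(nn, vv) if not pd.isnull(n)]
    for nn, vv in zip(names, values)
]

# Split the surviving pairs back into two nested lists.
new_names = [[n for n, _ in row] for row in kept]
new_values = [[v for _, v in row] for row in kept]

print(new_names)
print(new_values)
```

Building the pairs first and splitting them afterwards keeps each comprehension short, and it also copes with a sublist in which every name is nan.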
Using a recursive function:
import numpy as np

def filter_nan(names, values):
    new_names, new_values = [], []
    # strict=True (Python 3.10+) raises ValueError if the lists differ in length.
    for name, value in zip(names, values, strict=True):
        if name is np.nan:
            continue
        # Recurse into nested lists so any nesting depth is handled.
        if isinstance(name, list) and isinstance(value, list):
            name, value = filter_nan(name, value)
        new_names.append(name)
        new_values.append(value)
    return new_names, new_values
Try it:
names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]
print(filter_nan(names, values))
'''
(
[['Pat', 'Sam', 'Tom', ''], ['Angela', 'James', '.', 'Jackie']],
[[1, 9, 2, 1], [1, 1, 5, 10]]
)
'''
Maybe a bit overkill, but here is another option:
import numpy as np

names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

new_names = []
new_values = []
for aux_list in zip(names, values):
    filtered_names, filtered_values = zip(*filter(lambda x: x[0] is not np.nan, zip(*aux_list)))
    new_names.append(list(filtered_names))
    new_values.append(list(filtered_values))
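One caveat with the zip(*...) transposition above: if a sublist contains nothing but nan, the filter leaves no pairs, zip(*...) yields nothing, and the two-name unpacking raises ValueError. A minimal sketch of a guard, using made-up sample data and a hypothetical intermediate list `kept`:

```python
import numpy as np

# Illustrative data (not from the question): the first sublist is all nan.
names = [[np.nan, np.nan], ["Angela", np.nan]]
values = [[1, 2], [3, 4]]

new_names, new_values = [], []
for nn, vv in zip(names, values):
    kept = [(n, v) for n, v in zip(nn, vv) if n is not np.nan]
    # zip(*kept) yields nothing when kept is empty, so fall back to two empty tuples.
    filtered_names, filtered_values = zip(*kept) if kept else ((), ())
    new_names.append(list(filtered_names))
    new_values.append(list(filtered_values))

print(new_names)
print(new_values)
```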
Here is another simple way to handle this kind of case, using index-based loops:
import numpy as np

names = [['Pat', 'Sam', np.nan, 'Tom', ''], ["Angela", np.nan, "James", ".", "Jackie"]]
values = [[1, 9, 1, 2, 1], [1, 3, 1, 5, 10]]

new_names = []
new_values = []
for i in range(len(names)):
    new_names.append([])
    new_values.append([])
    for j in range(len(names[i])):
        # np.nan is a float, so this skips it; note that it also skips any other float entry.
        if not isinstance(names[i][j], float):
            new_names[i].append(names[i][j])
            new_values[i].append(values[i][j])

print(new_names)
print(new_values)
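The identity and type checks used in the answers above are not interchangeable with a real NaN test. `x is np.nan` only matches the np.nan object itself, and `isinstance(x, float)` matches every float. The sketch below shows the difference; `is_nan` is a hypothetical helper, not part of NumPy or pandas.

```python
import math
import numpy as np

def is_nan(x):
    """Hypothetical helper: True only for float values that are NaN."""
    return isinstance(x, float) and math.isnan(x)

other_nan = float('nan')  # a NaN that is a different object from np.nan

print(other_nan is np.nan)     # the identity check misses this NaN
print(isinstance(2.5, float))  # the type check would also drop a real number
print(is_nan(other_nan), is_nan(np.nan), is_nan(2.5), is_nan('Pat'))
```

If the names can come from different sources (for example, values parsed with float('nan')), a value-based test such as `is_nan` or `pd.isnull` is likely the safer choice.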
Here is a solution using pandas:
import pandas as pd

result = []
for n, v in zip(names, values):
    # dropna() keeps the original index labels, so they can select the matching values.
    n = pd.Series(n).dropna()
    result.append((n.tolist(), pd.Series(v).loc[n.index].tolist()))
names, values = map(list, zip(*result))
You can also use a one-liner (if you are on Python >= 3.8):
import pandas as pd

names, values = map(list, zip(*(
    ((s := pd.Series(n).dropna()).tolist(), pd.Series(v).loc[s.index].tolist())
    for n, v in zip(names, values)
)))
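To do this efficiently in a single statement, you can transpose the input lists into a sequence of name-value pairs, filter out the null names with a generator expression, and then transpose the result back into two lists: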
import pandas as pd

new_names, new_values = map(list, zip(*(
    map(list, zip(*(
        (name, value)
        for name, value in zip(*pairs)
        if not pd.isnull(name)
    )))
    for pairs in zip(names, values)
)))
Demo: https://replit.com/@blhsing/EnormousHarshFreesoftware#main.py