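So far I have built a bipartite network graph that maps diseases to symptoms, so a disease can be linked to one or more symptoms. I also have some basic statistics, such as which symptoms are linked to more than one disease.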
import networkx as nx

csv_dictionary = {"Da": ["A", "C"], "Db": ["B"], "Dc": ["A", "C", "F"], "Dd": ["D"],
                  "De": ["E", "B"], "Df": ["F"], "Dg": ["F"], "Dh": ["F"]}

G = nx.Graph()
all_symptoms = set()
for disorder, symptoms in csv_dictionary.items():
    for symptom in symptoms:
        G.add_edge(disorder, symptom)
        all_symptoms.add(symptom)

# Symptoms linked to more than one disease, sorted by ascending degree
symptoms_with_multiple_diseases = [symptom for symptom in all_symptoms if G.degree(symptom) > 1]
sorted_symptoms = sorted(symptoms_with_multiple_diseases,
                         key=lambda symptom: G.degree(symptom))
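I need to find the diseases that share at least two symptoms, that is, pairs of diseases with two or more symptoms in common. I have done some research and I think I should add weights to the edges based on how they are connected, but I can't wrap my head around it.
In the example above, Da and Dc share two symptoms (A and C).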
You can iterate over the length-2 combinations of the disorders and, for each combination, collect the shared symptoms returned by nx.common_neighbors.
So start by also keeping track of all disorders:

all_symptoms = set()
all_disorders = set()
for disorder, symptoms in csv_dictionary.items():
    for symptom in symptoms:
        G.add_edge(disorder, symptom)
        all_symptoms.add(symptom)
        all_disorders.add(disorder)

Check which disorders have a degree greater than 1 (a disorder with a single symptom cannot share two symptoms with another):

disorders_with_multiple_diseases = [disorder for disorder in all_disorders
                                    if G.degree(disorder) > 1]

Then iterate over all length-2 combinations of all_disorders:

from itertools import combinations

# Maps each pair of disorders to the list of symptoms they share
common_symptoms = dict()
for nodes in combinations(all_disorders, r=2):
    cn = list(nx.common_neighbors(G, *nodes))
    if len(cn) > 1:
        common_symptoms[nodes] = cn
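
The question mentions adding weights to edges. As a possible alternative to the pairwise loop above, here is a minimal sketch that uses NetworkX's weighted bipartite projection, where each edge weight is the number of symptoms two disorders have in common. It assumes disorder and symptom names never collide, and the names P and shared are only illustrative.

```python
import networkx as nx
from networkx.algorithms import bipartite

csv_dictionary = {"Da": ["A", "C"], "Db": ["B"], "Dc": ["A", "C", "F"], "Dd": ["D"],
                  "De": ["E", "B"], "Df": ["F"], "Dg": ["F"], "Dh": ["F"]}

# Rebuild the bipartite disorder-symptom graph from the question
G = nx.Graph()
for disorder, symptoms in csv_dictionary.items():
    for symptom in symptoms:
        G.add_edge(disorder, symptom)

# Project onto the disorder side; each projected edge carries a
# "weight" attribute equal to the number of shared symptoms
disorders = set(csv_dictionary)
P = bipartite.weighted_projected_graph(G, disorders)

# Keep only pairs of disorders sharing at least two symptoms
# (frozenset keys, because the pair order is not meaningful)
shared = {
    frozenset((u, v)): w
    for u, v, w in P.edges(data="weight")
    if w >= 2
}
print(shared)
```

With the sample data, the only pair that remains is Da and Dc, with a weight of 2. The projection gives the count of shared symptoms only; to list which symptoms are shared, nx.common_neighbors can still be called on the surviving pairs.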