KeyError in the chi-squared section of a nucleotide comparison program


Preface: the photos I've added are not real patient data, just test examples.

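I'm writing a bioinformatics program that takes a patient's nucleotide calls and compares them against a global population reference library, in order to flag any significantly rare variants.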

Here are example photos of the data: PatientData, PopulationData

I tried adding a chi-squared test using scipy and numpy. Everything in the code works except this section, and I'm stumped as to what I'm doing wrong.

#This section will create a chi test for flagging significance in changes
rsIDs = patient_data['NCBI SNP Reference'].unique()
for rsID in rsIDs:  # loop through each rsID and compare the patient's nucleotides to the population reference
    pat_counts = patient_data.loc[patient_data['NCBI SNP Reference'] == rsID, 'Call'].value_counts()
    ref_counts = pop_ref_data[rsID]
    exp_counts = np.array([ref_counts.get(nuc, 0) for nuc in pat_counts.index])
    obs_counts = np.array(pat_counts)

    # calculate the chi-squared statistic and p-value
    chi2, p, _, _ = chi2_contingency([obs_counts, exp_counts])

    # check if the p-value is less than a specified threshold (e.g. 0.05)
    if p < 0.05:
        print(f"Significant difference for rsID {rsID} (p={p:.3f})")

This is the part of the code that is giving me trouble. I can post the rest of the larger program if needed.
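For anyone who wants to run the chi-squared step in isolation, here is a minimal self-contained sketch. The import lines, the helper name compare_counts, and the counts are assumptions made up for illustration; the excerpt above does not show the imports or the real data.

```python
import numpy as np
from scipy.stats import chi2_contingency

def compare_counts(obs_counts, ref_counts):
    """Chi-squared test of independence on a 2 x k table of counts.

    Hypothetical helper: row 0 holds the patient counts, row 1 the
    population reference counts, one column per nucleotide.
    """
    table = np.array([obs_counts, ref_counts])
    # chi2_contingency returns the statistic, the p-value, the degrees
    # of freedom, and the table of expected frequencies.
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p, dof

# Made-up counts for two nucleotides (not real data).
chi2, p, dof = compare_counts([10, 20], [20, 10])
print(f"chi2={chi2:.3f}, p={p:.3f}, dof={dof}")
```

Note that chi2_contingency raises a ValueError when a column of the table is all zeros, so nucleotides absent from both rows would need to be dropped first.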

Here is the error:

Traceback (most recent call last):
  File "C:\Users\xx\PycharmProjects\WorkStuff\venv\lib\site-packages\pandas\core\indexes\base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'hCV32407240'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\xx\AppData\Roaming\JetBrains\PyCharmCE2022.2\scratches\Flag_Process--2.13.py", line 74, in <module>
    ref_counts = pop_ref_data[rsID]
  File "C:\Users\xx\PycharmProjects\WorkStuff\venv\lib\site-packages\pandas\core\frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\xx\PycharmProjects\WorkStuff\venv\lib\site-packages\pandas\core\indexes\base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'hCV32407240'
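
Reading the traceback: the failing call goes through DataFrame.__getitem__ and self.columns.get_loc(key), so pop_ref_data[rsID] is looking for a column labelled 'hCV32407240'. One possible cause is that the identifiers are stored as row values in pop_ref_data rather than as column labels; another is that this identifier is simply absent from the reference. The sketch below shows a row-based lookup with a guard for missing identifiers. The toy frame, its column names ('NCBI SNP Reference', A, C, G, T), the identifiers in it, and the helper lookup_ref_counts are all hypothetical, because the real layout is only shown in the images.

```python
import pandas as pd

# Hypothetical population reference: one row per identifier,
# one column per nucleotide. The real layout may differ.
pop_ref_data = pd.DataFrame({
    "NCBI SNP Reference": ["rs0000001", "rs0000002"],
    "A": [120, 0],
    "C": [0, 75],
    "G": [80, 25],
    "T": [0, 0],
})

def lookup_ref_counts(ref_df, snp_id, id_col="NCBI SNP Reference"):
    """Return the nucleotide counts for snp_id as a Series, or None if absent."""
    indexed = ref_df.set_index(id_col)
    if snp_id not in indexed.index:
        return None  # caller can skip or log identifiers with no reference row
    return indexed.loc[snp_id]

found = lookup_ref_counts(pop_ref_data, "rs0000001")
missing = lookup_ref_counts(pop_ref_data, "hCV32407240")

print(found.get("A", 0))  # Series.get works like the .get call in the loop
print(missing)            # None, instead of a KeyError
```

Printing pop_ref_data.columns and pop_ref_data.head() on the real data would confirm which of the two causes applies before changing the loop.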
python numpy scipy statistics bioinformatics