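I have to write a program that correlates smoking with lung cancer risk. For this I have data in two files. My code pairs up the values that happen to be on the same line of each file (e.g. America,23.3 with Spain,77.9 and Italy,24.2 with Russia,60.8). How can I modify my code so that it pairs the values for the same country and leaves out countries that occur in only one file (it shouldn't compute Germany, France, China, Korea because they are only in one file)? Thank you so much for your help in advance :)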
Smoking file:
Country, Percent Cigarette Smokers Data
America,23.3
Italy,24.2
Russia,23.7
France,14.9
England,17.9
Spain,17
Germany,21.7
Second file:
Lung Cancer Cases per 100000
Spain,77.9
Russia,60.8
Korea,61.3
America,73.3
China,66.8
Vietnam,64.5
Italy,43.9
And my code:
import math

# The def line was missing from the post; the name readFiles is assumed.
def readFiles(smoking_datafile, cancer_datafile):
    '''
    Reads the data from the provided file objects smoking_datafile
    and cancer_datafile. Returns a list of the data read from each
    in a tuple of the form (smoking_datafile, cancer_datafile).
    '''
    # init
    smoking_data = []
    cancer_data = []
    empty_str = ''
    # read past file headers
    smoking_datafile.readline()
    cancer_datafile.readline()
    # read data files
    eof = False
    while not eof:
        # read line of data from each file
        s_line = smoking_datafile.readline()
        c_line = cancer_datafile.readline()
        # check if at end-of-file of both files
        if s_line == empty_str and c_line == empty_str:
            eof = True
        # check if end of smoking data file only
        elif s_line == empty_str:
            raise OSError('Unexpected end-of-file for smoking data file')
        # check if at end of cancer data file only
        elif c_line == empty_str:
            raise OSError('Unexpected end-of-file for cancer data file')
        # append line of data to each list
        else:
            smoking_data.append(s_line.strip().split(','))
            cancer_data.append(c_line.strip().split(','))
    # return list of data from each file
    return (smoking_data, cancer_data)

def calculateCorrelation(smoking_data, cancer_data):
    '''
    Calculates and returns the correlation value for the data
    provided in lists smoking_data and cancer_data
    '''
    # init
    sum_smoking_vals = sum_cancer_vals = 0
    sum_smoking_sqrd = sum_cancer_sqrd = 0
    sum_products = 0
    # calculate intermediate correlation values
    num_values = len(smoking_data)
    for k in range(0, num_values):
        sum_smoking_vals = sum_smoking_vals + float(smoking_data[k][1])
        sum_cancer_vals = sum_cancer_vals + float(cancer_data[k][1])
        sum_smoking_sqrd = sum_smoking_sqrd + \
            float(smoking_data[k][1]) ** 2
        sum_cancer_sqrd = sum_cancer_sqrd + \
            float(cancer_data[k][1]) ** 2
        sum_products = sum_products + float(smoking_data[k][1]) * \
            float(cancer_data[k][1])
    # calculate and return correlation value
    numer = (num_values * sum_products) - \
        (sum_smoking_vals * sum_cancer_vals)
    denom = math.sqrt(abs(
        ((num_values * sum_smoking_sqrd) - (sum_smoking_vals ** 2)) *
        ((num_values * sum_cancer_sqrd) - (sum_cancer_vals ** 2))
    ))
    return numer / denom
Let's just focus on getting the data into a format that is easy to work with. The code below will give you a dictionary of the form:
smokers_cancer_data = {
    'America': {
        'smokers': 23.3,
        'cancer': 73.3
    },
    'Italy': {
        'smokers': 24.2,
        'cancer': 43.9
    },
    ...
}
Once you have this, you can pull out whatever values you need and perform your calculations. See the code below.
def read_data(filename: str) -> dict:
    with open(filename, 'r') as file:
        next(file)  # Skip the header
        data = dict()
        for line in file:
            cleaned_line = line.rstrip()
            # Skip blank lines
            if cleaned_line:
                data_item = cleaned_line.split(',')
                data[data_item[0]] = float(data_item[1])
    return data

# Load data into python dictionaries
smokers_data = read_data('smokersData.txt')
cancer_data = read_data('lungCancerData.txt')

# Build one dictionary that is easy to work with
smokers_cancer_data = dict()
for (key, value) in smokers_data.items():
    # Keep only countries that appear in both files
    if key in cancer_data:
        smokers_cancer_data[key] = {
            'smokers': smokers_data[key],
            'cancer': cancer_data[key]
        }
print(smokers_cancer_data)
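The merging loop can also be written more compactly, since dictionary key views support set operations. The following is only a sketch under assumptions: `merge_common` is a hypothetical helper name, and the two sample dictionaries are typed in by hand from the files in the question rather than read from disk.

```python
def merge_common(smokers: dict, cancer: dict) -> dict:
    """Combine two {country: value} dicts, keeping only shared countries."""
    # keys() returns a set-like view, so & yields the countries present in both
    common = smokers.keys() & cancer.keys()
    return {
        country: {'smokers': smokers[country], 'cancer': cancer[country]}
        for country in sorted(common)  # sorted only to get a stable order
    }

# Sample data copied by hand from the question (assumed to match the files)
smokers_sample = {'America': 23.3, 'Italy': 24.2, 'Russia': 23.7, 'France': 14.9,
                  'England': 17.9, 'Spain': 17.0, 'Germany': 21.7}
cancer_sample = {'Spain': 77.9, 'Russia': 60.8, 'Korea': 61.3, 'America': 73.3,
                 'China': 66.8, 'Vietnam': 64.5, 'Italy': 43.9}

merged_sample = merge_common(smokers_sample, cancer_sample)
print(list(merged_sample))
```

Countries found in only one of the two dictionaries (France, England, Germany, Korea, China, Vietnam in this sample) never enter the result.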
For example, if you want to compute the totals of the smokers and cancer values:
smokers_total = 0
cancer_total = 0
for (key, value) in smokers_cancer_data.items():
    smokers_total += value['smokers']
    cancer_total += value['cancer']
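Since the question's end goal is a correlation value, the same loop idea extends to that calculation. Below is a minimal sketch, assuming the merged dictionary has the shape shown earlier with numeric values. `pearson_from_merged` and `sample` are hypothetical names used for illustration; the formula is the same sum-based one as in the question's `calculateCorrelation`, with extra guards for inputs where the correlation is undefined.

```python
import math

def pearson_from_merged(merged: dict) -> float:
    """Correlation of the 'smokers' and 'cancer' values stored in merged."""
    xs = [entry['smokers'] for entry in merged.values()]
    ys = [entry['cancer'] for entry in merged.values()]
    n = len(xs)
    if n < 2:
        raise ValueError('need at least two countries present in both files')
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numer = n * sum_xy - sum_x * sum_y
    spread_x = n * sum_xx - sum_x ** 2
    spread_y = n * sum_yy - sum_y ** 2
    # A constant series has zero spread, so the correlation is undefined
    if spread_x <= 0 or spread_y <= 0:
        raise ValueError('correlation is undefined when one series is constant')
    return numer / math.sqrt(spread_x * spread_y)

# Hand-typed sample: the four countries that occur in both files of the question
sample = {
    'America': {'smokers': 23.3, 'cancer': 73.3},
    'Italy': {'smokers': 24.2, 'cancer': 43.9},
    'Russia': {'smokers': 23.7, 'cancer': 60.8},
    'Spain': {'smokers': 17.0, 'cancer': 77.9},
}
r = pearson_from_merged(sample)
print(round(r, 3))
```

With only four matched countries the resulting value says very little statistically; the point of the sketch is that each pair of values now belongs to the same country.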
This will return a list of all the countries that have data in both files, along with their data:
l3 = []
with open('smoking.txt', 'r') as f1, open('cancer.txt', 'r') as f2:
    l1, l2 = f1.readlines(), f2.readlines()
for s1 in l1:
    for s2 in l2:
        # Compare the country names (the text before the first comma)
        if s1.split(',')[0] == s2.split(',')[0]:
            cty = s1.split(',')[0]
            smk = s1.split(',')[1].strip()
            cnr = s2.split(',')[1].strip()
            l3.append(f"{cty}: smoking: {smk}, cancer: {cnr}")
print(l3)
Output:
['America: smoking: 23.3, cancer: 73.3', 'Italy: smoking: 24.2, cancer: 43.9', 'Russia: smoking: 23.7, cancer: 60.8', 'Spain: smoking: 17, cancer: 77.9']
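If the goal is to keep the question's `calculateCorrelation` unchanged, the matching can instead be applied to the two lists of `[country, value]` rows before they are passed in. The following is a sketch under assumptions: `match_countries` is a hypothetical helper, the rows are typed in by hand in the format produced by `strip().split(',')`, and a dictionary lookup is swapped in for the nested loops.

```python
def match_countries(smoking_data, cancer_data):
    """Return two lists aligned by country, holding only countries in both inputs."""
    # Index the cancer rows by country name for direct lookup
    cancer_by_country = {
        row[0].strip(): row for row in cancer_data if len(row) >= 2
    }
    matched_smoking, matched_cancer = [], []
    for row in smoking_data:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        country = row[0].strip()
        if country in cancer_by_country:
            matched_smoking.append(row)
            matched_cancer.append(cancer_by_country[country])
    return matched_smoking, matched_cancer

# Hand-typed rows in the same shape the question's reading code produces
smoking_rows = [['America', '23.3'], ['Italy', '24.2'], ['Russia', '23.7'],
                ['France', '14.9'], ['England', '17.9'], ['Spain', '17'],
                ['Germany', '21.7']]
cancer_rows = [['Spain', '77.9'], ['Russia', '60.8'], ['Korea', '61.3'],
               ['America', '73.3'], ['China', '66.8'], ['Vietnam', '64.5'],
               ['Italy', '43.9']]

matched_s, matched_c = match_countries(smoking_rows, cancer_rows)
print([row[0] for row in matched_s])
print([row[0] for row in matched_c])
```

Both returned lists have the same length and the same country at each index, which is what the index-based loop in `calculateCorrelation` relies on.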