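My task is:
3 Data representation:
Write a function that plots the income share of the richest and the poorest 10% of the population for a country or a group of countries. The period to show is 1960 to 2022. Note: a single country can be wrapped in a list by using square brackets. Add informative labels to the chart and save it as a PNG file for use in slides.
4 Fair distribution:
Find out whether there are countries in which income is almost evenly distributed. That is, 10% of the population should also hold 10% of the income. This condition will rarely be met exactly. Consider how much deviation upwards or downwards is acceptable.
5 Top 10 inequality:
Calculate the average income share of the richest and the poorest 10% of the population for all countries. In which ten countries is the distribution particularly unfair? Which countries appear in both top 10 lists?
My problem now is that no data is shown in the chart, and for every country I get the error "Not enough elements in the row."
This is my code for now: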
import csv
import matplotlib.pyplot as plt

def read_income_shares(file_name, wealthiest_file_name):
    income_shares = {}
    countries = []
    years = []  # Initialize the years list
    # Read data from the first file
    try:
        with open(file_name, 'r', encoding='utf-8') as file:
            reader = csv.reader(file)
            # Skip the first 4 rows
            for _ in range(4):
                next(reader)
            # Read the header to get the years
            header = next(reader)
            years = [int(year.strip('"')) for year in header[4:] if year.strip('"').isdigit()]  # Extract valid years
            for i, line in enumerate(reader, start=5):
                try:
                    country_name = line[0].strip('"')
                    # Replace """" with 0, then remove double quotes and extract the value from each line
                    values = []
                    for val in line[4:]:
                        val = val.replace('""""', '0').replace('"', '').strip()
                        if val and val.replace('.', '').isdigit():
                            values.append(float(val))
                        else:
                            values.append(0)
                    income_shares.setdefault(country_name, {}).update({'Values': values})
                    if country_name not in countries:
                        countries.append(country_name)
                except Exception as e:
                    print(f"Error in line {i}: {e}")
                    print(f"Line content: {line}")
    except FileNotFoundError:
        print(f"Error: File '{file_name}' not found.")
    except Exception as e:
        print(f"Error: An unexpected error occurred: {e}")
    # Read data from the second file
    try:
        with open(wealthiest_file_name, 'r', encoding='utf-8') as file:
            reader = csv.reader(file)
            # Skip the first 4 rows
            for _ in range(4):
                next(reader)
            for i, line in enumerate(reader, start=5):
                try:
                    country_name = line[0].strip('"')
                    # Replace """" with 0, then remove double quotes and extract the value from each line
                    values = []
                    for val in line[4:]:
                        val = val.replace('""""', '0').replace('"', '').strip()
                        if val and val.replace('.', '').isdigit():
                            values.append(float(val))
                        else:
                            values.append(0)
                    # Update the existing data or add new data for the country
                    if country_name in income_shares:
                        income_shares[country_name].setdefault('Values_Wealthiest', []).extend(values)
                    else:
                        income_shares.setdefault(country_name, {}).update({'Values_Wealthiest': values})
                    if country_name not in countries:
                        countries.append(country_name)
                except Exception as e:
                    print(f"Error in line {i}: {e}")
                    print(f"Line content: {line}")
    except FileNotFoundError:
        print(f"Error: File '{wealthiest_file_name}' not found.")
    except Exception as e:
        print(f"Error: An unexpected error occurred: {e}")
    return income_shares, countries, years

def plot_income_distribution(countries):
    income_data, _, _ = read_income_shares('C:\\Users\\Fabian\\Desktop\\Python Ausarbeitung\\Bravo\\one.txt', 'C:\\Users\\Fabian\\Desktop\\Python Ausarbeitung\\Bravo\\two.txt')
    formatted_countries = []  # Collect formatted country names
    for country in countries:
        # Cleaning up the country name
        country_formatted = country.strip('" \ufeff')
        formatted_countries.append(country_formatted)  # Collect formatted country names
        # Check if data for the country is available
        if country_formatted in income_data:
            income_data_country = income_data[country_formatted]['Values']
            income_data_wealthiest = income_data[country_formatted].get('Values_Wealthiest', [])
            # Choose only Years from 1960 to 2022
            years_to_plot = list(range(1960, 2023))
            # Convert values to percentage
            income_data_country_percent = [val * 100 for val in income_data_country]
            income_data_wealthiest_percent = [val * 100 for val in income_data_wealthiest]
            # Filter out values equal to 0
            non_zero_years = [year for year, val in zip(years_to_plot, income_data_country_percent) if val > 0]
            non_zero_percentages = [val for val in income_data_country_percent if val > 0]
            non_zero_percentages_wealthiest = [val for val in income_data_wealthiest_percent if val > 0]
            # Print the data for debugging
            print(f"Years for {country_formatted}: {non_zero_years}")
            print(f"Total Percentages for {country_formatted}: {non_zero_percentages}")
            print(f"Wealthiest Percentages for {country_formatted}: {non_zero_percentages_wealthiest}")
            # Plot only if data is available for the year
            plt.plot(non_zero_years, non_zero_percentages, label=f'{country_formatted} - Total')
            plt.plot(non_zero_years, non_zero_percentages_wealthiest, label=f'{country_formatted} - Wealthiest 10%', linestyle='dashed')
    plt.title('Income Distribution Over Years')
    plt.xlabel('Year')
    plt.ylabel('Income Share (%)')
    plt.ylim(0, 100)  # Set the Y-axis to 0 to 100 percent
    plt.axis([1960, 2022, 0, 100])
    plt.grid(True)
    # Display legend only if data is present.
    if any(formatted_country in income_data for formatted_country in formatted_countries):
        plt.legend(loc='upper left', bbox_to_anchor=(1, 1))  # Move legend outside the plot area
    plt.savefig('income_distribution_plot.png', bbox_inches='tight')  # Save the plot as a PNG file
    plt.show()

# Example call
countries_to_plot = ['"Germany"']
plot_income_distribution(countries_to_plot)
A data sample, starting at line 4, looks like this:
"Country Name,""Country Code"",""Indicator Name"",""Indicator Code"",""1960"",""1961"",""1962"",""1963"",""1964"",""1965"",""1966"",""1967"",""1968"",""1969"",""1970"",""1971"",""1972"",""1973"",""1974"",""1975"",""1976"",""1977"",""1978"",""1979"",""1980"",""1981"",""1982"",""1983"",""1984"",""1985"",""1986"",""1987"",""1988"",""1989"",""1990"",""1991"",""1992"",""1993"",""1994"",""1995"",""1996"",""1997"",""1998"",""1999"",""2000"",""2001"",""2002"",""2003"",""2004"",""2005"",""2006"",""2007"",""2008"",""2009"",""2010"",""2011"",""2012"",""2013"",""2014"",""2015"",""2016"",""2017"",""2018"",""2019"",""2020"",""2021"",""2022"","
"Germany,""DEU"",""Income share held by highest 10%"",""SI.DST.10TH.10"","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""",""23.2"",""23.1"",""22.8"",""22.9"",""22.7"",""22.3"",""22.4"",""22.3"",""23.1"",""22.9"",""23.9"",""23.7"",""23.9"",""24"",""25.1"",""24.7"",""25.1"",""24.7"",""24"",""24"",""24.5"",""24.4"",""25"",""24.1"",""24.8"",""24.6"",""24.8"",""25.2"",""25.2"","""","""","""","
Thanks for your help!!!
I already tried `strip()` and even used `del` on the entries without a number, but that doesn't seem to work. For now, my issue is that no data is shown in the figure.
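I could (partially) reproduce and fix this.
First, your data file is broken and cannot be processed by the csv module in its current format. The csv module is great at handling complex data as long as the data respects the CSV rules, but here the quoting is off: judging by your sample, each whole line is wrapped in one pair of quotes and the inner quotes are doubled. As a result, each line is read as one single quoted field, which is not what you expect. The correct way would be to fix the data file, but as a workaround you can ask the csv module to ignore all quotes and then remove them from the data fields yourself, which is what your code already does. Just open the readers with `quoting=csv.QUOTE_NONE` (for both files):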
...
reader = csv.reader(file, quoting=csv.QUOTE_NONE)
...
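That should be enough to get the expected number of fields per row.
But you have a second problem: the files contain data in the 0-100 range, yet you multiply the values by 100. As a result your data ends up in the 0-10000 range and is plotted outside the visible part of the graph.
As a workaround, you could use: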
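To make the difference visible, here is a minimal sketch that parses a shortened, made-up excerpt in the same shape as the sample from the question, once with the default settings and once with `quoting=csv.QUOTE_NONE`. The excerpt and the variable names are assumptions for illustration only, not the real file.

```python
import csv
import io

# Shortened, hypothetical excerpt in the same shape as the file: the whole
# line is wrapped in one pair of quotes and every inner quote is doubled.
sample = (
    '"Country Name,""Country Code"",""Indicator Name"",""Indicator Code"",""2018"",""2019"","\n'
    '"Germany,""DEU"",""Income share held by highest 10%"",""SI.DST.10TH.10"",""25.2"","""","\n'
)

# Default settings: the outer quotes turn each line into one single field.
default_rows = list(csv.reader(io.StringIO(sample)))
print([len(row) for row in default_rows])

# QUOTE_NONE: quotes are ordinary characters, so every comma splits the line.
raw_rows = list(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONE))
print([len(row) for row in raw_rows])

# The quote characters stay inside the fields and still have to be stripped.
cleaned = [[field.strip('"') for field in row] for row in raw_rows]
print(cleaned[1][:5])
```

One caveat of this workaround: with `QUOTE_NONE` every comma is a separator, so a country name that itself contains a comma would be split into two fields and shift the remaining columns.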
# Convert values to percentage
income_data_country_percent = [val for val in income_data_country]
income_data_wealthiest_percent = [val for val in income_data_wealthiest]
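or work directly on the original values.
With those two workarounds in place, I could get a plot.
What you should learn from this: your code already had some debugging prints. Had you added a few more, and in particular had you printed the header line, you would have seen immediately that you only had one field. That is what I did.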
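The scaling problem can be checked without plotting anything. The sketch below uses made-up values and a hypothetical helper named `plottable_points`; it keeps each year paired with its value while dropping the 0 placeholders, then tests whether the values fall inside the 0 to 100 axis limits with and without the extra multiplication.

```python
def plottable_points(years, values):
    """Pair each year with its value and drop the 0 placeholders."""
    return [(year, value) for year, value in zip(years, values) if value > 0]


years = [2017, 2018, 2019, 2020]
shares = [24.8, 25.2, 25.2, 0]  # hypothetical values, already in percent

points = plottable_points(years, shares)
y_min, y_max = 0, 100  # the y limits used in the question's plt.axis call

# Values used as they are stay inside the visible range.
inside = [y_min <= value <= y_max for _, value in points]
# Values multiplied by 100 again all land above the top of the axis.
scaled_inside = [y_min <= value * 100 <= y_max for _, value in points]

print(points)
print(all(inside), any(scaled_inside))
```

Filtering years and values together also keeps the x and y lists the same length. The code in the question filters the two series independently, which could produce lists of different lengths if the two files have gaps in different years.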
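As an illustration of that kind of debugging print, here is a small sketch. `preview_rows` is a hypothetical helper, not something from the question, and the two-line excerpt is invented; the idea is simply to look at the number of fields and the start of the first field before doing anything else with the rows.

```python
import csv
import io
from itertools import islice


def preview_rows(lines, limit=2, **reader_options):
    """Return (field_count, first_field) for the first `limit` parsed rows."""
    reader = csv.reader(lines, **reader_options)
    return [(len(row), row[0] if row else "") for row in islice(reader, limit)]


# Hypothetical two-line excerpt in the same shape as the question's file.
sample = (
    '"Country Name,""Country Code"",""2019"","\n'
    '"Germany,""DEU"",""25.2"","\n'
)

# Default settings: a field count of 1 is the warning sign.
for count, first in preview_rows(io.StringIO(sample)):
    print(f"{count} field(s), first field starts with: {first[:30]!r}")

# Same check with the workaround applied.
for count, first in preview_rows(io.StringIO(sample), quoting=csv.QUOTE_NONE):
    print(f"{count} field(s), first field starts with: {first[:30]!r}")
```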