How do I resolve the "Not enough elements in the line" error when trying to create a plot in Python?


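My task is:

3 Data representation:

Write a function that represents the income share of the richest and poorest 10% of the population of a country or a group of countries. The period to represent is 1960 to 2022. Note: a single country can be wrapped in a list by using square brackets. Add informative labels to the chart and save it as a PNG file for use in slides.

4 Fair distribution:

Find out whether there are countries in which income shares are distributed almost evenly. That is, 10% of the population should also hold a 10% share of income. This condition is hard to meet exactly. Consider how much upward/downward deviation is acceptable.

5 Top 10 inequality:

Calculate the average income share of the richest and poorest 10% of the population for all countries. In which ten countries is the distribution particularly unequal? Which countries appear in both top 10 lists?

My problem right now is that no data is shown in the chart, and the error "Not enough elements in the line." appears for every country.

Here is my current code: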

import csv
import matplotlib.pyplot as plt

def read_income_shares(file_name, wealthiest_file_name):
    income_shares = {}
    countries = []
    years = []  # Initialize the years list

    # Read data from the first file
    try:
        with open(file_name, 'r', encoding='utf-8') as file:
            reader = csv.reader(file)

            # Skip the first 4 rows
            for _ in range(4):
                next(reader)

            # Read the header to get the years
            header = next(reader)
            years = [int(year.strip('"')) for year in header[4:] if year.strip('"').isdigit()]  # Extract valid years

            for i, line in enumerate(reader, start=5):
                try:
                    country_name = line[0].strip('"')

                    # Replace """" with 0, then remove double quotes and extract the value from each line
                    values = []
                    for val in line[4:]:
                        val = val.replace('""""', '0').replace('"', '').strip()
                        if val and val.replace('.', '').isdigit():
                            values.append(float(val))
                        else:
                            values.append(0)

                    income_shares.setdefault(country_name, {}).update({'Values': values})
                    if country_name not in countries:
                        countries.append(country_name)
                except Exception as e:
                    print(f"Error in line {i}: {e}")
                    print(f"Line content: {line}")
    except FileNotFoundError:
        print(f"Error: File '{file_name}' not found.")
    except Exception as e:
        print(f"Error: An unexpected error occurred: {e}")

    # Read data from the second file
    try:
        with open(wealthiest_file_name, 'r', encoding='utf-8') as file:
            reader = csv.reader(file)

            # Skip the first 4 rows
            for _ in range(4):
                next(reader)

            for i, line in enumerate(reader, start=5):
                try:
                    country_name = line[0].strip('"')

                    # Replace """" with 0, then remove double quotes and extract the value from each line
                    values = []
                    for val in line[4:]:
                        val = val.replace('""""', '0').replace('"', '').strip()
                        if val and val.replace('.', '').isdigit():
                            values.append(float(val))
                        else:
                            values.append(0)

                    # Update the existing data or add new data for the country
                    if country_name in income_shares:
                        income_shares[country_name].setdefault('Values_Wealthiest', []).extend(values)
                    else:
                        income_shares.setdefault(country_name, {}).update({'Values_Wealthiest': values})
                        if country_name not in countries:
                            countries.append(country_name)
                except Exception as e:
                    print(f"Error in line {i}: {e}")
                    print(f"Line content: {line}")
    except FileNotFoundError:
        print(f"Error: File '{wealthiest_file_name}' not found.")
    except Exception as e:
        print(f"Error: An unexpected error occurred: {e}")

    return income_shares, countries, years

def plot_income_distribution(countries):
    income_data, _, _ = read_income_shares('C:\\Users\\Fabian\\Desktop\\Python Ausarbeitung\\Bravo\\one.txt', 'C:\\Users\\Fabian\\Desktop\\Python Ausarbeitung\\Bravo\\two.txt')

    formatted_countries = []  # Collect formatted country names

    for country in countries:
        # Cleaning up the country name
        country_formatted = country.strip('" \ufeff')
        formatted_countries.append(country_formatted)  # Collect formatted country names

        # Check if data for the country is available
        if country_formatted in income_data:
            income_data_country = income_data[country_formatted]['Values']
            income_data_wealthiest = income_data[country_formatted].get('Values_Wealthiest', [])

            # Choose only Years from 1960 to 2022
            years_to_plot = list(range(1960, 2023))

            # Convert values to percentage
            income_data_country_percent = [val * 100 for val in income_data_country]
            income_data_wealthiest_percent = [val * 100 for val in income_data_wealthiest]

            # Filter out values equal to 0
            non_zero_years = [year for year, val in zip(years_to_plot, income_data_country_percent) if val > 0]
            non_zero_percentages = [val for val in income_data_country_percent if val > 0]
            non_zero_percentages_wealthiest = [val for val in income_data_wealthiest_percent if val > 0]

            # Print the data for debugging
            print(f"Years for {country_formatted}: {non_zero_years}")
            print(f"Total Percentages for {country_formatted}: {non_zero_percentages}")
            print(f"Wealthiest Percentages for {country_formatted}: {non_zero_percentages_wealthiest}")

            # Plot only if data is available for the year
            plt.plot(non_zero_years, non_zero_percentages, label=f'{country_formatted} - Total')
            plt.plot(non_zero_years, non_zero_percentages_wealthiest, label=f'{country_formatted} - Wealthiest 10%', linestyle='dashed')

    plt.title('Income Distribution Over Years')
    plt.xlabel('Year')
    plt.ylabel('Income Share (%)')
    plt.ylim(0, 100)  # Set the Y-axis to 0 to 100 percent
    plt.axis([1960, 2022, 0, 100])
    plt.grid(True)

    # Display legend only if data is present.
    if any(formatted_country in income_data for formatted_country in formatted_countries):
        plt.legend(loc='upper left', bbox_to_anchor=(1, 1))  # Move legend outside the plot area

    plt.savefig('income_distribution_plot.png', bbox_inches='tight')  # Save the plot as a PNG file
    plt.show()

# Example call
countries_to_plot = ['"Germany"']
plot_income_distribution(countries_to_plot)





A data sample, starting at line 4, looks like this:

"Country Name,""Country Code"",""Indicator Name"",""Indicator Code"",""1960"",""1961"",""1962"",""1963"",""1964"",""1965"",""1966"",""1967"",""1968"",""1969"",""1970"",""1971"",""1972"",""1973"",""1974"",""1975"",""1976"",""1977"",""1978"",""1979"",""1980"",""1981"",""1982"",""1983"",""1984"",""1985"",""1986"",""1987"",""1988"",""1989"",""1990"",""1991"",""1992"",""1993"",""1994"",""1995"",""1996"",""1997"",""1998"",""1999"",""2000"",""2001"",""2002"",""2003"",""2004"",""2005"",""2006"",""2007"",""2008"",""2009"",""2010"",""2011"",""2012"",""2013"",""2014"",""2015"",""2016"",""2017"",""2018"",""2019"",""2020"",""2021"",""2022"","

"Germany,""DEU"",""Income share held by highest 10%"",""SI.DST.10TH.10"","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""","""",""23.2"",""23.1"",""22.8"",""22.9"",""22.7"",""22.3"",""22.4"",""22.3"",""23.1"",""22.9"",""23.9"",""23.7"",""23.9"",""24"",""25.1"",""24.7"",""25.1"",""24.7"",""24"",""24"",""24.5"",""24.4"",""25"",""24.1"",""24.8"",""24.6"",""24.8"",""25.2"",""25.2"","""","""","""","


Thanks for your help!!!

I already tried `strip()` and even used `del` to remove the entries without any number, but this doesn't seem to work. For now, I am having the issue that no data is shown in the figure.
python spyder
1 Answer

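I could (partially) reproduce and fix this.

First, your data file is broken and cannot be processed by the csv module in its current format. The csv module is great at handling complex data as long as it respects the csv rules, but here the quotes are not correctly balanced. As a result, each line is treated as a single quoted field, which is not what you expect. The correct way would be to fix the data file, but as a workaround you can ask the csv module to ignore any quotes and then strip them from the data fields, which your code already does. Just use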

quoting=csv.QUOTE_NONE

when opening the reader (for both files...):

...
            reader = csv.reader(file, quoting=csv.QUOTE_NONE)
...

This should be enough to get the expected number of fields on each line.
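To illustrate the difference, here is a minimal sketch that parses one shortened data line both ways. The `sample` string is an abbreviated, hypothetical version of the Germany row (only two value columns), not the real file content:

```python
import csv
import io

# Shortened, hypothetical version of one line from the data file
sample = ('"Germany,""DEU"",""Income share held by highest 10%"",'
          '""SI.DST.10TH.10"","""",""23.2"","\n')

# Default quoting: the whole line is one quoted field
default_row = next(csv.reader(io.StringIO(sample)))

# QUOTE_NONE: quotes are ordinary characters, so the line splits on commas
raw_row = next(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONE))

# The quotes are still in the fields and have to be stripped afterwards
cleaned = [field.strip('"') for field in raw_row]

print(len(default_row))
print(len(raw_row))
print(cleaned)
```

Note that the trailing `,"` at the end of each line produces one extra empty field after stripping, which the existing code turns into a 0.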

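But you have a second problem: the file contains data in the 0-100 range, but you multiply it by 100. As a result, your data ends up in the 0-10000 range and is plotted outside the figure...

As a workaround, you could use:

            # Convert values to percentage
            income_data_country_percent = [val for val in income_data_country]
            income_data_wealthiest_percent = [val for val in income_data_wealthiest]

or work directly on the raw values.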
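Working on the raw values could look like the sketch below. The helper name `nonzero_points` and the sample numbers are assumptions made for illustration. It leaves the values unscaled and filters years and values together, so both lists passed to `plt.plot` always have the same length:

```python
def nonzero_points(years, values):
    """Return aligned (years, values) lists, dropping the 0 placeholders."""
    pairs = [(year, val) for year, val in zip(years, values) if val > 0]
    return [year for year, _ in pairs], [val for _, val in pairs]

# Hypothetical excerpt: 0 stands for a missing value, as in the question's code
years = list(range(1990, 1994))
shares = [0, 23.2, 23.1, 22.8]  # already percentages, no "* 100" needed

xs, ys = nonzero_points(years, shares)
print(xs)
print(ys)
```

The two returned lists can then be passed straight to `plt.plot(xs, ys, ...)`.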

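With these two workarounds in place, I could get a plot.

What you should learn from this: your code already has some debug prints. If you had added more, and specifically if you had printed the header line, you would have seen immediately that you had only one field - and that is what I did...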
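A debug print of that kind could be as small as the following sketch. The helper `describe_header` and the shortened `header_line` are hypothetical and only serve to show the idea:

```python
import csv
import io

def describe_header(text, **reader_kwargs):
    """Parse the first line of text and return (field count, first fields)."""
    header = next(csv.reader(io.StringIO(text), **reader_kwargs))
    return len(header), header[:3]

# Shortened, hypothetical header line in the same style as the data file
header_line = ('"Country Name,""Country Code"",""Indicator Name"",'
               '""Indicator Code"",""1960"",""1961"","\n')

count, preview = describe_header(header_line)
print(f"default quoting: {count} field(s), starting with {preview}")

count_none, preview_none = describe_header(header_line, quoting=csv.QUOTE_NONE)
print(f"QUOTE_NONE: {count_none} field(s), starting with {preview_none}")
```

A field count of 1 for a header that should list dozens of years points at the parsing step rather than at the plotting code.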
