How to split input text multiple times

Question · Votes: 0 · Answers: 2

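I have an input file structured as follows: years are separated by '-', studies by '=', and students and their performance by '\t'. My goal is to parse the input file to get the numbers. Once I have the numbers, I need the last of the two, which expresses the student's performance as a percentage. The problem is that when I split the input on, for example, the hyphen, I get a list, and at that point I don't know how to continue, because now that it is a list I can no longer strip it. I'm linking the input file here, because otherwise my problem is hard to understand.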

Input file: https://filebin.net/z1x0oetddav3sh52

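Basically, it is a long list of names, each followed by a time in milliseconds and then a percentage, like this: Frank Pierre 1398 81. It is the second number that I want to retrieve for every name in the list, because that number represents a percentage.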

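I have already been able to retrieve the numbers by looping over all items in the input file with a for loop and appending them to a new list if they are integers. The problem is that my solution relies on the fact that all the final numbers are less than or equal to one hundred (because they are percentages) in order to remove them from the new list and add them to a separate list of percentages. However, I want the program to work in a more general way for any input file with the same structure.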

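Imagine a file with the same structure in which, in some cases, the first number after the student's name is less than 100. My program would treat it as a percentage because it is less than 100, but that is not the case! Only the second number, the one after the first, represents a percentage. That is why I think it is better to parse the input file so that the numbers are separated from everything else, and then, for example, use an index to retrieve the second number. I just don't know how to do this.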

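It would be great if anyone knows how to do this. The code has to be in Python 2.7, I can't use any external modules, and I have to define the functions myself. I only need to be able to get a list of the second numbers so that I can analyze them.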

I currently have the following code:

with open("statistics_input.txt", "r") as infile:
    information = infile.read()
    splitted = information.split('-')
first = splitted[0]

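The problem is that I now end up with a list with 6 different indices, one per year, and I don't know how to parse it further. I started by taking the first year as a variable, but how do I now get the numbers for that year and repeat the process for each year?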

python input text structure
2 Answers
0
votes

There are many different ways to accomplish what you are trying to do. However, I have two suggestions:

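  • After splitting on '-', you get a list. However, the entries in that list are all strings, so if you want to split a year's records into studies, you can take one of the strings in the list and split it on '='. That gives you another list, but again its entries are strings, which you can process further as needed.
  • To get the last number on a line, you can split the line on whitespace (' ') and take the last element of the resulting list. You will need to know that the line is a student line (rather than a year or study marker), but it sounds like you may already have that worked out.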
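
The two suggestions above can be combined into one function. What follows is only a minimal sketch: the name last_numbers and the SAMPLE text are made up for illustration, and the layout of SAMPLE (a '-' before each year, a '=' before each study, a tab between name and numbers) is an assumption based on the question, not a copy of the real file. It also assumes no student name contains '-' or '='. It uses only built-in string methods and runs on Python 2.7 as well as Python 3.

```python
# Hypothetical sample text; the real input file may differ in detail.
SAMPLE = (
    "-1999\n"
    "=I\n"
    "Frank Pierre\t1398 81\n"
    "Matty Klop\t50 42\n"
    "=AI\n"
    "Eline Hijma\t430 23\n"
    "-2002\n"
    "=W\n"
    "Wan Bos\t7039 29\n"
)


def last_numbers(text):
    """Return the last number of every student line as a list of ints."""
    percentages = []
    for year_block in text.split('-'):            # one string per year
        for study_block in year_block.split('='):  # one string per study
            for line in study_block.split('\n'):
                parts = line.split()               # splits on spaces and tabs
                # A student line ends in two numbers; year and study
                # markers do not, so they are skipped here.
                if (len(parts) >= 3 and parts[-2].isdigit()
                        and parts[-1].isdigit()):
                    percentages.append(int(parts[-1]))
    return percentages


print(last_numbers(SAMPLE))
```

Because the percentage is picked by position (the last token) rather than by value, a line such as "Matty Klop 50 42", where the first number is below 100, still yields 42.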

0
votes

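You have already worked out how to open and read the file, so I'll skip that part. Assuming the file contents have been read into the variable text, this code: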

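import json

data = {}
years = text.split('\n-')  # text is your source text; one chunk per year

for y in years:
    lines = y.split('\n')
    year = lines[0]   # first line of the chunk is the year
    subj = lines[1:]  # remaining lines: study markers and students

    data[year] = {}

    subject = 'none'
    for s in subj:
        # short lines and lines starting with '=' are treated as study markers
        if len(s) < 5 or s[0] == '=':
            subject = s
            data[year][subject] = []
            continue
        name, result = s.split('\t')  # name, then "time percentage"
        data[year][subject].append((name, result))

# print once, after all years have been processed
print(json.dumps(data, indent=4))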

gives the following result:

{
    "1999": {
        "I": [
            [
                "Willem Jan van Steen", 
                "9859 77"
            ], 
            [
                "Guillaume Kielmann", 
                "5264 77"
            ], 
            [
                "Guillaume Bos", 
                "8200 6"
            ], 
            [
                "Matty Klop", 
                "9066 42"
            ], 
            [
                "Atze Klop", 
                "3318 45"
            ], 
            [
                "Sven Kielmann", 
                "1160 63"
            ], 
            [
                "Wartie Hijma", 
                "1904 65"
            ], 
            [
                "Matty Evers", 
                "2516 100"
            ], 
            [
                "Matty Bos", 
                "2941 99"
            ], 
            [
                "Pieter van der Ploeg", 
                "8873 80"
            ], 
            [
                "Jan Willem van Zeist", 
                "3934 95"
            ], 
            [
                "Thilo van Steen", 
                "9665 61"
            ], 
            [
                "Wan van Raamsdonk", 
                "1771 86"
            ], 
            [
                "Henri Fokkink", 
                "7484 59"
            ], 
            [
                "Jan Willem Evers", 
                "9709 82"
            ]
        ], 
        "=AI": [
            [
                "Sven Swarttouw", 
                "2604 73"
            ], 
            [
                "Eline van Raamsdonk", 
                "9771 60"
            ], 
            [
                "Herbert van der Ploeg", 
                "9325 41"
            ], 
            [
                "Eline Hijma", 
                "430 23"
            ], 
            [
                "Pieter Hijma", 
                "8203 65"
            ], 
            [
                "Eline Silvis Cividjian", 
                "2700 79"
            ]
        ], 
        "=W": [
            [
                "Guillaume Zeggers", 
                "290 47"
            ], 
            [
                "Natalia van Raamsdonk", 
                "2751 55"
            ], 
            [
                "Wartie Zeggers", 
                "3079 92"
            ], 
            [
                "Atze Swarttouw", 
                "9474 30"
            ], 
            [
                "Rene Pierre", 
                "2125 62"
            ], 
            [
                "Pieter van Mantgem", 
                "3023 67"
            ], 
            [
                "Jan Willem Hijma", 
                "7441 86"
            ]
        ], 
        "=BWI": [
            [
                "Rene Zeggers", 
                "7679 8"
            ], 
            [
                "Matty van Mantgem", 
                "7431 44"
            ], 
            [
                "Sven van Raamsdonk", 
                "7248 46"
            ], 
            [
                "Eline Pierre", 
                "5731 86"
            ], 
            [
                "Maarten Kielmann", 
                "7162 59"
            ], 
            [
                "Atze Zeggers", 
                "7065 72"
            ], 
            [
                "Eline van Mantgem", 
                "830 78"
            ], 
            [
                "Natalia van Steen", 
                "6321 49"
            ], 
            [
                "Frank van Raamsdonk", 
                "1380 31"
            ], 
            [
                "Pieter Bos", 
                "9639 94"
            ], 
            [
                "Andy Zeggers", 
                "5232 78"
            ], 
            [
                "Andy van Raamsdonk", 
                "1256 69"
            ], 
            [
                "Eline Gude", 
                "4101 40"
            ], 
            [
                "Matty Fokkink", 
                "9839 89"
            ], 
            [
                "Natalia Hijma", 
                "203 11"
            ], 
            [
                "Henri Bos", 
                "6728 66"
            ], 
            [
                "Guillaume van der Ploeg", 
                "9998 48"
            ], 
            [
                "Jan Willem van Steen", 
                "760 79"
            ], 
            [
                "Matty Pierre", 
                "337 96"
            ], 
            [
                "Wan Gude", 
                "3811 39"
            ]
        ], 
        "=ECTR": [
            [
                "Frank Swarttouw", 
                "6484 49"
            ], 
            [
                "Wan Hijma", 
                "9845 36"
            ], 
            [
                "Herbert Silvis Cividjian", 
                "1544 84"
            ], 
            [
                "Natalia Kielmann", 
                "646 21"
            ]
        ]
    }, 
    "2002": {
        "I": [
            [
                "Eline van Steen", 
                "7817 11"
            ], 
            [
                "Andy van Steen", 
                "9212 51"
            ], 
            [
                "Frank van Zeist", 
                "233 27"
            ], 
            [
                "Rene Swarttouw", 
                "5695 68"
            ], 
            [
                "Wan Bos", 
                "7039 29"
            ], 
            [
                "Eline van der Ploeg", 
                "4410 99"
            ], 
            [
                "Wartie van der Ploeg", 
                "2526 20"
            ], 
            [
                "Sven Bos", 
                "4694 98"
            ], 
            [
                "Wartie Swarttouw", 
                "5371 70"
            ], 
            [
                "Thilo van Zeist", 
                "10009 77"
            ], 
            [
                "Guillaume Fokkink", 
                "4125 86"
            ], 
            [
                "Atze Bos", 
                "4227 97"
            ], 
            [
                "Pieter Silvis Cividjian", 
                "9491 15"
            ], 
            [
                "Sven Evers", 
                "6994 41"
            ]
        ], 
        "=AI": [
            [
                "Matty van Steen", 
                "9702 40"
            ], 
            [
                "Thilo Silvis Cividjian", 
                "5553 42"
            ], 
            [
                "Herbert van Raamsdonk", 
                "6867 90"
            ], 
            [
                "Wartie Evers", 
                "2086 81"
            ], 
            [
                "Jan Willem Bos", 
                "1566 92"
            ], 
            [
                "Maarten van Mantgem", 
                "8960 92"
            ], 
            [
                "Sven van Zeist", 
                "8629 74"
            ], 
            [
                "Matty van Raamsdonk", 
                "496 41"
            ], 
            [
                "Willem Jan Evers", 
                "1853 11"
            ], 
            [
                "Guillaume van Zeist", 
                "9729 62"
            ], 
            [
                "Maarten Klop", 
                "8653 74"
            ], 
            [
                "Henri van der Ploeg", 
                "6755 39"
            ]
        ], 
        "=W": [
            [
                "Herbert Kielmann", 
                "2135 99"
            ], 
            [
                "Andy van Mantgem", 
                "8033 49"
            ], 
            [
                "Guillaume Gude", 
                "5356 52"
            ], 
            [
                "Herbert Bos", 
                "1435 47"
            ], 
            [
                "Pieter Gude", 
                "9460 36"
            ], 
            [
                "Jan Willem van der Ploeg", 
                "8403 25"
            ], 
            [
                "Wan van Mantgem", 
                "9672 68"
            ]
        ], 

Here is how you print the names and scores:

for subjects in data.values():          # one dict of subjects per year
    for students in subjects.values():  # one list of students per subject
        for name, result in students:
            print(name + ' ' + result.split()[1])  # only the last number
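
Since the question asks for a list of the second numbers rather than printed output, here is a minimal sketch that collects them. The function name collect_percentages is hypothetical, and the small example dictionary only imitates the nested year / subject / (name, result) structure shown in the result above, reusing a few of its entries. It runs on Python 2.7 and Python 3.

```python
def collect_percentages(data):
    """Return the second number of every (name, result) pair as an int."""
    percentages = []
    for year in sorted(data):                  # sorted for a stable order
        for subject in sorted(data[year]):
            for name, result in data[year][subject]:
                fields = result.split()        # e.g. "9859 77" -> ['9859', '77']
                percentages.append(int(fields[1]))
    return percentages


# Hypothetical excerpt in the same shape as the parsed data.
example = {
    "1999": {
        "I": [("Willem Jan van Steen", "9859 77"), ("Guillaume Bos", "8200 6")],
        "=AI": [("Eline Hijma", "430 23")],
    },
    "2002": {
        "I": [("Frank van Zeist", "233 27")],
    },
}

print(collect_percentages(example))
```

Indexing with fields[1] takes the second number by position, so a first number below 100 is never mistaken for a percentage.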