Return ranges of indices matching a given condition

Question Votes: 0 Answers: 2

I have a large data file in the following format:

    1 M * 0.86
    2 S * 0.81
    3 M * 0.68
    4 S * 0.53
    5 T . 0.40
    6 S . 0.34
    7 T . 0.25
    8 E . 0.36
    9 V . 0.32
   10 I . 0.26
   11 A . 0.17
   12 H . 0.15
   13 H . 0.12
   14 W . 0.14
   15 A . 0.16
   16 F . 0.13
   17 A . 0.12
   18 I . 0.12
   19 F . 0.22
   20 L . 0.44
   21 I * 0.68
   22 V * 0.79
   23 A * 0.88
   24 I * 0.88
   25 G * 0.89
   26 L * 0.88
   27 C * 0.81
   28 C * 0.82
   29 L * 0.79
   30 M * 0.80
   31 L * 0.74
   32 V * 0.72
   33 G * 0.62

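What I'm trying to figure out is how to loop over every line in the file and, whenever a line contains an asterisk, find the range of consecutive lines that also satisfy this condition. It would also be good to output the size of the largest range in the file.

So, for this example, the desired output would look like: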

1-4,21-33 13

Thanks for your help!

python loops conditional-statements
2 Answers
0 votes

There are several ways to do this.

One solution is to read the file line by line. I suggest taking a look at a very good tutorial on how to read files.
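As a minimal sketch of reading a file line by line: iterating over a file object yields one line at a time, so the whole file never has to be held in memory. The helper name `numbered_lines` is made up for illustration and is not part of any library.

```python
def numbered_lines(path):
    """Yield (line_number, line) pairs, numbering lines from 1."""
    with open(path) as fp:
        # A file object is iterable; each iteration reads a single line.
        for line_no, line in enumerate(fp, start=1):
            # Drop the trailing newline so the text is easier to inspect.
            yield line_no, line.rstrip("\n")
```

Calling `numbered_lines('test.txt')` in a `for` loop would then give you the line number and the text of each line in turn.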

Once that's done, you can try the following:

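  • Iterate over each line of the file:
    • If the line contains a *
    • then:
      • keep the index (this is the start point)
      • keep reading lines while the line contains a "*"
      • keep the index (this is the end point)
    • Read the next line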

In Python:

# your file path
filepath = 'test.txt'

with open(filepath) as fp:
    line = fp.readline()
    # Count the line index
    cnt = 1

    # List of ranges, each stored with its start ("deb") and end line index
    output = []

    # While there are lines in the file (i.e. the end of the file has not been reached)
    while line:
        # Check if the current line has a "*"
        if "*" in line:
            # If yes, keep the count value, it's the starting point
            deb = cnt
            # Iterate while there are "*" in line
            while "*" in line:
                cnt += 1
                line = fp.readline()
            # End of inner loop (end of file reached, or the line has no "*")
            # Add the start and end indices to the output
            output.append({"deb" : deb, "end": cnt - 1})

        # Read next line
        cnt += 1
        line = fp.readline()

    print(output)
    # [{'deb': 1, 'end': 4}, {'deb': 21, 'end': 33}]
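The code above collects the ranges but does not report the size of the largest one, which the question also asks for. Here is a minimal sketch, assuming `output` is the list of `{"deb": ..., "end": ...}` dictionaries built above; the helper name `largest_range_size` is hypothetical.

```python
def largest_range_size(output):
    """Return the number of lines in the longest range, or 0 if there are none."""
    # Both endpoints are inclusive, hence the + 1.
    return max((r["end"] - r["deb"] + 1 for r in output), default=0)


output = [{"deb": 1, "end": 4}, {"deb": 21, "end": 33}]
print(largest_range_size(output))  # 13
```

The `default=0` argument keeps `max` from raising `ValueError` when the file contains no asterisks at all.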

0 votes

Since others are already posting answers, here is one that uses a generator to yield the ranges:

def find_ranges(fn):
    with open(fn) as f:
        start = None
        for line_no, line in enumerate(f):
            if start is None:
                if '*' in line:
                    start = line_no + 1 # start of a range
            elif '*' not in line:
                yield [start, line_no]  # seen end of range
                start = None
        if start is not None: # end of file without seeing end of a range
            yield [start, line_no + 1]

ranges = list(find_ranges('test.txt')) # collect every [start, end] pair without shadowing the built-in range
max_range = max(ranges, key = lambda x: x[1] - x[0]) # largest range seen
print(ranges, max_range[1] - max_range[0] + 1)

Prints:

[[1, 4], [21, 33]] 13

Of course, you can format the ranges however you like.
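For instance, to get the exact `1-4,21-33 13` layout requested in the question, something along these lines should work. It is only a sketch: `format_ranges` is a hypothetical helper, and `ranges` is assumed to be a list of `[start, end]` pairs like the one printed above.

```python
def format_ranges(ranges):
    """Join [start, end] pairs as 'start-end' items separated by commas."""
    return ",".join(f"{start}-{end}" for start, end in ranges)


ranges = [[1, 4], [21, 33]]
# Inclusive length of the longest range; 0 if no range was found.
longest = max((end - start + 1 for start, end in ranges), default=0)
print(format_ranges(ranges), longest)  # 1-4,21-33 13
```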

The same algorithm without a generator:

def find_ranges(fn):
    ranges = []
    with open(fn) as f:
        start = None
        for line_no, line in enumerate(f):
            if start is None:
                if '*' in line:
                    start = line_no + 1 # start of a range
            elif '*' not in line:
                ranges.append([start, line_no]) # end of a range
                start = None
        if start is not None: # end of file without seeing end of a range
            ranges.append([start, line_no + 1])
        max_range = max(ranges, key = lambda x: x[1] - x[0])
        return ranges, max_range[1] - max_range[0] + 1

ranges, max_range = find_ranges('test.txt')
print(ranges, max_range)