Why is the input file read as a list in the for loop after being passed in through argparse?

Question

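I am trying to make a script that takes a file and 2 additional arguments, then uses the start_point and end_point arguments to extract the text between them.

However, when I run it I get the error (line 35) "TypeError: can only concatenate str (not "list") to str". I don't understand this, because the input file is passed to a for loop that should read each line from the input file, run the regex query on that line, and then print the matched string and append it to a file.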

import re
import argparse
#import requests

parser = argparse.ArgumentParser(description='Extracts text between a start string and an end string. It also writes the results to a file called search_output.')
parser.add_argument('--input','-i',
    type = str,
    nargs = '?',
    dest = 'input_file',
    help='Input file name.'
)
parser.add_argument('--start','-s',
    type = str,
    nargs = '+',
    dest = 'start_point',
    help='The string (within quotes) you want to search from.'
)
parser.add_argument('--end','-e',
    type = str,
    nargs = '+',
    dest = 'end_point',
    help='The string (within quotes) you want to search up to.'
)

args = parser.parse_args()

fileName = args.input_file
start_string = args.start_point
end_string = args.end_point

content = open(fileName,'r')
for line in content:
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
    if result:
        print(result.group(1))
        f = open("search_output","a")
        f.write(result.group(1)+"\n")
        f.close()

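I have looked at the argparse documentation and tried different ways of reading the file, such as setting the type of the input_file argument to argparse.FileType('r') and then assigning args.input_file.readlines() to the content variable. However, I think I must be misunderstanding something, because everything I have seen online suggests this should work.

In a previous version of this script I used positional arguments instead of flags, and it worked as expected. However, I want to extend its functionality so that I can pass a URL and have it work directly on a web page.

Full error message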

$python3 betweeny_grabber2.py -i test -s '.asp">' -e '</a></td>'
Traceback (most recent call last):
  File "/home/george/Tools/Scripts/Python/betweeny_grabber2.py", line 35, in <module>
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
                   ~~~~~~^~~~~~~~~~~~~
TypeError: can only concatenate str (not "list") to str

Previous version

import re
import argparse


parser = argparse.ArgumentParser(description='Extracts text between a start string and an end string. It also writes the results to a file called search_output.')
parser.add_argument('input', type=str, help='Input file name.')
parser.add_argument('start_point', type=str, help='The string (within quotes) you want to search from.')
parser.add_argument('end_point', type=str, help='The string (within quotes) you want the search to end at.')
args = parser.parse_args()

input_file = args.input
start_string = args.start_point
end_string = args.end_point

content = open(input_file,"r")
for line in content:
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
    if result:
        print(result.group(1))
        f = open("search_output","a")
        f.write(result.group(1)+"\n")
        f.close()

Answer 1

For the command line

-i test -s '.asp">' -e '</a></td>'

args is

Namespace(input_file='test', start_point=['.asp">'], end_point=['</a></td>'])
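
The list values can be reproduced in isolation. The following is a minimal sketch, not part of the original script: the two parser names are illustrative, and the argument list is passed to parse_args directly instead of being read from the command line.

```python
import argparse

# Parser with nargs='+', as in the question: values are collected into a list.
with_nargs = argparse.ArgumentParser()
with_nargs.add_argument('--start', '-s', nargs='+', dest='start_point')

# Parser without nargs: the option takes exactly one value, stored as a str.
without_nargs = argparse.ArgumentParser()
without_nargs.add_argument('--start', '-s', dest='start_point')

argv = ['-s', '.asp">']
list_value = with_nargs.parse_args(argv).start_point
str_value = without_nargs.parse_args(argv).start_point

print(type(list_value).__name__, list_value)  # list ['.asp">']
print(type(str_value).__name__, str_value)    # str .asp">
```

Only the second form can be concatenated directly into the regex string.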

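Note that start_point and end_point are lists, not strings; that is why you get this error. nargs='+' tells argparse to collect one or more values into a list, even when only one value is given. To fix it, remove nargs from these arguments. You also don't need to specify type=str, because that is the default.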

parser.add_argument("--input", "-i", dest="input_file", help="Input file name.")
parser.add_argument(
    "--start",
    "-s",
    dest="start_point",
    help="The string (within quotes) you want to search from.",
)
parser.add_argument(
    "--end",
    "-e",
    dest="end_point",
    help="The string (within quotes) you want to search up to.",
)
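
Once the arguments are plain strings, the extraction loop works as in the previous version. One related point, offered as a suggestion rather than part of the original answer: the start and end strings are concatenated into the pattern unescaped, so the '.' in '.asp">' acts as a regex wildcard. The sketch below assumes both markers are meant as literal text and wraps them in re.escape; the function name extract_between and the sample lines are hypothetical.

```python
import re

def extract_between(lines, start, end):
    """Yield the text between start and end on each line that contains both."""
    # re.escape treats both markers as literal text (an assumption about intent);
    # without it, the '.' in '.asp">' would match any character.
    pattern = re.compile("(?<=" + re.escape(start) + ")(.*?)(?=" + re.escape(end) + ")")
    for line in lines:
        match = pattern.search(line)
        if match:
            yield match.group(1)

# Hypothetical sample lines standing in for the input file.
sample = [
    '<td><a href="page1.asp">First link</a></td>',
    '<td>no link here</td>',
    '<td><a href="page2.asp">Second link</a></td>',
]
results = list(extract_between(sample, '.asp">', '</a></td>'))
print(results)  # ['First link', 'Second link']
```

In the script itself, the open file object can be passed as lines, with args.start_point and args.end_point as the two markers.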
