Why is the input file read as a list after being passed through argparse and a for loop?

Problem description

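I am trying to write a script that takes a file and 2 additional arguments, then uses the start_point and end_point arguments to extract the text between them.

However, when I run it I get the error (line 35) "TypeError: can only concatenate str (not "list") to str". I don't understand this, because the input file is passed to a for loop in which each line should be read from the file, a regex search is run on that line, and the matching string is then printed and appended to a file.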

import re
import argparse
#import requests

parser = argparse.ArgumentParser(description='Extracts text between a start string and an end string. It also writes the results to a file called search_output.')
parser.add_argument('--input','-i',
    type = str,
    nargs = '?',
    dest = 'input_file',
    help='Input file name.'
)
parser.add_argument('--start','-s',
    type = str,
    nargs = '+',
    dest = 'start_point',
    help='The string (within quotes) you want to search from.'
)
parser.add_argument('--end','-e',
    type = str,
    nargs = '+',
    dest = 'end_point',
    help='The string (within quotes) you want to search up to.'
)

args = parser.parse_args()

fileName = args.input_file
start_string = args.start_point
end_string = args.end_point

content = open(fileName,'r')
for line in content:
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
    if result:
        print(result.group(1))
        f = open("search_output","a")
        f.write(result.group(1)+"\n")
        f.close()

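I have looked at the argparse documentation and tried different ways of reading the file, such as setting the type of the input_file argument to argparse.FileType('r') and then assigning args.input_file.readlines() to the content variable. However, I think I must be misunderstanding something, because everything I have seen online suggests this should work.

In a previous version of this script I used positional arguments instead of flags, and it worked as expected. However, I want to extend its functionality so that I can pass a URL and have it work directly on a web page.

Full error message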

$python3 betweeny_grabber2.py -i test -s '.asp">' -e '</a></td>'
Traceback (most recent call last):
  File "/home/george/Tools/Scripts/Python/betweeny_grabber2.py", line 35, in <module>
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
                   ~~~~~~^~~~~~~~~~~~~
TypeError: can only concatenate str (not "list") to str

Previous version

import re
import argparse


parser = argparse.ArgumentParser(description='Extracts text between a start string and an end string. It also writes the results to a file called search_output.')
parser.add_argument('input', type=str, help='Input file name.')
parser.add_argument('start_point', type=str, help='The string (within quotes) you want to search from.')
parser.add_argument('end_point', type=str, help='The string (within quotes) you want the search to end at.')
args = parser.parse_args()

input_file = args.input
start_string = args.start_point
end_string = args.end_point

content = open(input_file,"r")
for line in content:
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
    if result:
        print(result.group(1))
        f = open("search_output","a")
        f.write(result.group(1)+"\n")
        f.close()
python-3.x argparse
1 Answer

For the command line

-i test -s '.asp">' -e '</a></td>'

args is

Namespace(input_file='test', start_point=['.asp">'], end_point=['</a></td>'])
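
The same result can be reproduced without a shell by passing the arguments to parse_args as a list. The following is only a minimal sketch: it deliberately mixes the three nargs settings (the original script uses nargs='+' for both --start and --end) so the resulting types can be compared side by side.

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='?' consumes at most one value and stores it as a plain string.
parser.add_argument("--input", "-i", nargs="?", dest="input_file")
# nargs='+' consumes one or more values and always stores them as a list.
parser.add_argument("--start", "-s", nargs="+", dest="start_point")
# No nargs (changed here for comparison): exactly one value, stored as a string.
parser.add_argument("--end", "-e", dest="end_point")

# Pass the arguments explicitly instead of reading them from sys.argv.
args = parser.parse_args(["-i", "test", "-s", '.asp">', "-e", "</a></td>"])

for name in ("input_file", "start_point", "end_point"):
    value = getattr(args, name)
    print(name, type(value).__name__, value)
```

Only start_point is reported as a list here, which is the value that cannot be concatenated to a str in the re.search call.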

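Note that start_point and end_point are lists, not strings; that is why you get this error. With nargs='+', argparse collects one or more values into a list, even when only one value is given, whereas nargs='?' on input_file still yields a single string. To fix this, remove nargs from the argument definitions. You also don't have to specify type=str, because that is the default.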

parser.add_argument("--input", "-i", dest="input_file", help="Input file name.")
parser.add_argument(
    "--start",
    "-s",
    dest="start_point",
    help="The string (within quotes) you want to search from.",
)
parser.add_argument(
    "--end",
    "-e",
    dest="end_point",
    help="The string (within quotes) you want to search up to.",
)
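
Putting the corrected arguments together with the loop from the question, the whole script might look like the sketch below. The function names build_parser, extract_between and main are hypothetical, and two things go beyond the fix itself and are assumptions on my part: re.escape is applied so that characters such as '.' in the start and end strings match literally, and the files are opened with a with statement so they are closed automatically.

```python
import argparse
import re


def build_parser():
    """Build the parser with the nargs settings removed."""
    parser = argparse.ArgumentParser(
        description="Extracts text between a start string and an end string."
    )
    parser.add_argument("--input", "-i", dest="input_file", help="Input file name.")
    parser.add_argument("--start", "-s", dest="start_point",
                        help="The string (within quotes) you want to search from.")
    parser.add_argument("--end", "-e", dest="end_point",
                        help="The string (within quotes) you want to search up to.")
    return parser


def extract_between(lines, start, end):
    """Yield the first piece of text between start and end on each line."""
    # re.escape is an addition: it treats start and end as literal text.
    pattern = re.compile(
        "(?<=" + re.escape(start) + ")(.*?)(?=" + re.escape(end) + ")"
    )
    for line in lines:
        match = pattern.search(line)
        if match:
            yield match.group(1)


def main(argv=None):
    """Parse the arguments, print each match and append it to search_output."""
    args = build_parser().parse_args(argv)
    with open(args.input_file, "r") as content, open("search_output", "a") as out:
        for text in extract_between(content, args.start_point, args.end_point):
            print(text)
            out.write(text + "\n")
```

To use this as a command-line script, call main() at the bottom of the file; the invocation from the question (-i test -s '.asp">' -e '</a></td>') should then work unchanged, because start_point and end_point are now plain strings.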