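I'm trying to make a script that takes a file and 2 additional arguments, then uses the start_point and end_point arguments to extract the text between them.
However, when I run it I get the error (line 35) "TypeError: can only concatenate str (not "list") to str". I don't understand this, because the input file is passed to a for loop, where each line of the input file should be read, the regex search run on that line, and the matched string printed out and appended to a file.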
import re
import argparse
#import requests
parser = argparse.ArgumentParser(description='Extracts text between a start string and an end string. It also writes the results to a file called search_output.')
parser.add_argument('--input','-i',
                    type = str,
                    nargs = '?',
                    dest = 'input_file',
                    help='Input file name.'
                    )
parser.add_argument('--start','-s',
                    type = str,
                    nargs = '+',
                    dest = 'start_point',
                    help='The string (within quotes) you want to search from.'
                    )
parser.add_argument('--end','-e',
                    type = str,
                    nargs = '+',
                    dest = 'end_point',
                    help='The string (within quotes) you want to search up to.'
                    )
args = parser.parse_args()
fileName = args.input_file
start_string = args.start_point
end_string = args.end_point
content = open(fileName,'r')
for line in content:
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
    if result:
        print(result.group(1))
        f = open("search_output","a")
        f.write(result.group(1)+"\n")
        f.close()
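I have looked at the argparse documentation and tried reading the file in different ways, for example setting the type of the input_file argument to argparse.FileType('r') and then assigning args.input_file.readlines() to the content variable. However, I think I must be misunderstanding something, because everything I've seen online suggests this should work.
In a previous version of this script I didn't use flags, only positional arguments, and it worked as expected. However, I want to extend its functionality so that I can pass a URL and have it work directly on a web page.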
Full error message
$python3 betweeny_grabber2.py -i test -s '.asp">' -e '</a></td>'
Traceback (most recent call last):
  File "/home/george/Tools/Scripts/Python/betweeny_grabber2.py", line 35, in <module>
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
                       ~~~~~~^~~~~~~~~~~~~
TypeError: can only concatenate str (not "list") to str
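The concatenation fails before re.search is ever called, which is easy to confirm in isolation. This minimal sketch assumes start_string holds a one-element list, which is the shape argparse produces for a flag declared with nargs='+':

```python
# Assumption: start_string is what argparse returned for -s '.asp">'
# when the flag was declared with nargs='+', i.e. a one-element list.
start_string = ['.asp">']

error_message = None
try:
    # str + list raises TypeError, exactly as in the traceback.
    pattern = "(?<=" + start_string + ")(.*?)"
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # can only concatenate str (not "list") to str

# Using the single string inside the list makes the concatenation work.
pattern = "(?<=" + start_string[0] + ")(.*?)"
print(pattern)
```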
Previous version
import re
import argparse
parser = argparse.ArgumentParser(description='Extracts text between a start string and an end string. It also writes the results to a file called search_output.')
parser.add_argument('input', type=str, help='Input file name.')
parser.add_argument('start_point', type=str, help='The string (within quotes) you want to search from.')
parser.add_argument('end_point', type=str, help='The string (within quotes) you want the search to end at.')
args = parser.parse_args()
input_file = args.input
start_string = args.start_point
end_string = args.end_point
content = open(input_file,"r")
for line in content:
    result = re.search("(?<="+start_string+")(.*?)(?="+end_string+")",line)
    if result:
        print(result.group(1))
        f = open("search_output","a")
        f.write(result.group(1)+"\n")
        f.close()
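The previous version worked because a positional argument without nargs consumes exactly one command-line token and stores it as a plain string. A quick check, handing parse_args an explicit list in place of the real command line (the three values simply mirror the ones used with the flag version):

```python
import argparse

# Same positional arguments as the previous version (no nargs).
parser = argparse.ArgumentParser()
parser.add_argument('input', type=str)
parser.add_argument('start_point', type=str)
parser.add_argument('end_point', type=str)

# parse_args accepts an explicit argument list, standing in for sys.argv[1:].
args = parser.parse_args(['test', '.asp">', '</a></td>'])
print(type(args.start_point).__name__)  # str
print(args.start_point)                 # .asp">
```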
For the command line
-i test -s '.asp">' -e '</a></td>'
args is
Namespace(input_file='test', start_point=['.asp">'], end_point=['</a></td>'])
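You can see the difference without a real file by giving parse_args an explicit argument list. A minimal sketch with nargs='+' on --start and no nargs on --end:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--start', '-s', nargs='+', dest='start_point')
parser.add_argument('--end', '-e', dest='end_point')  # no nargs

# A single token after -s still becomes a one-element list.
args_single = parser.parse_args(['-s', '.asp">', '-e', '</a></td>'])
print(args_single.start_point)  # ['.asp">']
print(args_single.end_point)    # </a></td>

# nargs='+' collects every following token up to the next option,
# so unquoted words end up as separate list items.
args_multi = parser.parse_args(['-s', 'foo', 'bar', '-e', 'baz'])
print(args_multi.start_point)   # ['foo', 'bar']
```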
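Note that start_point and end_point are lists, not strings: nargs='+' tells argparse to gather one or more values into a list, even when only one value is given. That's why you get this error. To fix it, remove nargs from the arguments. You also don't need to specify type=str, since that is the default.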
parser.add_argument("--input", "-i", dest="input_file", help="Input file name.")
parser.add_argument(
    "--start",
    "-s",
    dest="start_point",
    help="The string (within quotes) you want to search from.",
)
parser.add_argument(
    "--end",
    "-e",
    dest="end_point",
    help="The string (within quotes) you want to search up to.",
)
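For reference, here is one possible version of the whole script with the fixed arguments. A few changes go beyond the fix above and are my own assumptions: the start and end strings go through re.escape, because '.asp">' contains '.', which is a regex wildcard; both files are opened in with blocks so they get closed; and the script prints its usage if any of the three flags is missing. The function names build_parser, extract_between and main are hypothetical, chosen only for illustration.

```python
import argparse
import re


def build_parser():
    parser = argparse.ArgumentParser(
        description="Extracts text between a start string and an end string "
                    "and appends the results to a file called search_output."
    )
    parser.add_argument("--input", "-i", dest="input_file", help="Input file name.")
    parser.add_argument("--start", "-s", dest="start_point",
                        help="The string (within quotes) you want to search from.")
    parser.add_argument("--end", "-e", dest="end_point",
                        help="The string (within quotes) you want to search up to.")
    return parser


def extract_between(lines, start_string, end_string):
    # re.escape makes the start/end strings match literally, so the '.'
    # in '.asp">' matches only a dot rather than any character.
    pattern = re.compile(
        "(?<=" + re.escape(start_string) + ")(.*?)(?=" + re.escape(end_string) + ")"
    )
    results = []
    for line in lines:
        match = pattern.search(line)
        if match:
            results.append(match.group(1))
    return results


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # All three are optional flags as far as argparse is concerned, so check here.
    if None in (args.input_file, args.start_point, args.end_point):
        parser.print_usage()
        return
    with open(args.input_file, "r") as content:
        results = extract_between(content, args.start_point, args.end_point)
    # Append every match to search_output, opening the file only once.
    with open("search_output", "a") as out:
        for text in results:
            print(text)
            out.write(text + "\n")


if __name__ == "__main__":
    main()
```

Splitting the matching into its own function also makes it easy to reuse later on text fetched from a URL instead of a file.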