How can I use Popen with command-line arguments that contain single and double quotes?


I want to run the following jq command from Python 3 using subprocess.Popen().

$ jq  'INDEX(.images[]; .id) as $imgs | {
    "filename_with_label":[
         .annotations[]
        | select(.attributes.type=="letter" )
        | $imgs[.image_id] + {label:.text}
        | {id:.id} + {filename:.file_name} + {label:.label}
     ]
   }' image_data_annotation.json > image_data_annotation_with_label.json

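Note that the first command-line argument contains dots, dollar signs, and double quotes inside single quotes. For reference, jq is a command-line JSON processor for working with JSON files.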

I wrote the following Python 3 script to automate JSON file processing with the jq utility.

#!python3
# file name: letter_image_tool.py

import os, subprocess

"""
command line example to automate
$ jq  'INDEX(.images[]; .id) as $imgs | {
    "filename_with_label":[
         .annotations[]
        | select(.attributes.type=="letter" )
        | $imgs[.image_id] + {label:.text}
        | {id:.id} + {filename:.file_name} + {label:.label}
     ]
   }' image_data_annotation.json > image_data_annotation_with_label.json
"""

# define first command line argument
jq_filter='\'INDEX(.images[]; .id) as $imgs | { "filename_with_label" : [ .annotations[] | select(.attributes.type=="letter" ) | $imgs[.image_id] + {label:.text} | {id:.id} + {filename:.file_name} + {label:.label} ] }\''

input_json_files= [ "image_data_annotation.json"]
output_json_files= []

for input_json in input_json_files:
    print("Processing %s" %(input_json))
    filename, ext = os.path.splitext(input_json)
    output_json = filename + "_with_label" + ext
    output_json_files.append(output_json)
    print("output file is : %s" %(output_json))

    #jq_command ='jq' + " " +  jq_filter, input_json + ' > ' +  output_json
    jq_command =['jq', jq_filter,  input_json + ' > ' +  output_json]
    print(jq_command)
    subprocess.Popen(jq_command, shell=True)

Running the above Python script from bash produces the following:

$ ./letter_image_tool.py
Processing image_data_annotation.json
output file is : image_data_annotation_with_label.json
['jq', '\'INDEX(.images[]; .id) as $imgs | { "filename_with_label" : [ .annotations[] | select(.attributes.type=="letter" ) | $imgs[.image_id] + {label:.text} | {id:.id} + {filename:.file_name} + {label:.label} ] }\'', 'image_data_annotation.json > image_data_annotation_with_label.json']
jq - commandline JSON processor [version 1.6-124-gccc79e5-dirty]

Usage:  jq [options] <jq filter> [file...]
        jq [options] --args <jq filter> [strings...]
        jq [options] --jsonargs <jq filter> [JSON_TEXTS...]

jq is a tool for processing JSON inputs, applying the given filter to
its JSON text inputs and producing the filter's results as JSON on
standard output.

The simplest filter is ., which copies jq's input to its output
unmodified (except for formatting, but note that IEEE754 is used
for number representation internally, with all that that implies).

For more advanced filters see the jq(1) manpage ("man jq")
and/or https://stedolan.github.io/jq

Example:

        $ echo '{"foo": 0}' | jq .
        {
                "foo": 0
        }

For a listing of options, use jq --help.

It does not process the first argument to the jq utility:

'INDEX(.images[]; .id) as $imgs | {
    "filename_with_label":[
         .annotations[]
        | select(.attributes.type=="letter" )
        | $imgs[.image_id] + {label:.text}
        | {id:.id} + {filename:.file_name} + {label:.label}
     ]
   }'

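The first argument should be enclosed in single quotes as in the snippet above, but my script fails to handle this.

I think the main problem is related to the dots, dollar signs, single quotes, and double quotes used in the first command-line argument (jq_filter in the Python script above). But I don't know how to handle these bash-related metacharacters.

How can I solve this problem?

Thanks for reading.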

Update with my solution

Define jq_filter with triple quotes and build the command with a space-separated join, as follows:

#!python3
# file name: letter_image_tool.py

import os, subprocess

"""
command line example to automate
$ jq  'INDEX(.images[]; .id) as $imgs | {
    "filename_with_label":[
         .annotations[]
        | select(.attributes.type=="letter" )
        | $imgs[.image_id] + {label:.text}
        | {id:.id} + {filename:.file_name} + {label:.label}
     ]
   }' image_data_annotation.json > image_data_annotation_with_label.json
"""

# define first command line argument with triple quotes
jq_filter=""" 'INDEX(.images[]; .id) as $imgs | { 
       "filename_with_label" : [ 
        .annotations[] 
       | select(.attributes.type=="letter" ) 
       | $imgs[.image_id] + {label:.text} 
       | {id:.id} + {filename:.file_name} + {label:.label} ] } ' """

input_json_files= [ "image_data_annotation.json"]
output_json_files= []

for input_json in input_json_files:
    print("Processing %s" %(input_json))
    filename, ext = os.path.splitext(input_json)
    output_json = filename + "_with_label" + ext
    output_json_files.append(output_json)
    print("output file is : %s" %(output_json))

    #jq_command ='jq' + " " +  jq_filter, input_json + ' > ' +  output_json
    # jq command composed with space separated join
    jq_command =' '.join(['jq', jq_filter,  input_json, ' > ',  output_json])
    print(jq_command)

    # shell keyword argument should be set True
    subprocess.Popen(jq_command, shell=True)

With triple double quotes, jq_filter can be written as a multi-line definition instead of a single line, which makes it more readable.
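
If the command does have to go through the shell, one alternative to typing the single quotes by hand is `shlex.quote` from the standard library, which quotes a string for POSIX shells. The following is a minimal sketch, not part of the original script: it only builds the command string and parses it back with `shlex.split` to check the quoting, and it does not run jq. The file names are the ones used in the script above.

```python
import shlex

# The filter is written WITHOUT surrounding single quotes; shlex.quote adds them.
jq_filter = """INDEX(.images[]; .id) as $imgs | {
       "filename_with_label" : [
        .annotations[]
       | select(.attributes.type=="letter" )
       | $imgs[.image_id] + {label:.text}
       | {id:.id} + {filename:.file_name} + {label:.label} ] }"""

input_json = "image_data_annotation.json"
output_json = "image_data_annotation_with_label.json"

# Quote each argument for a POSIX shell; the redirect operator stays unquoted.
jq_command = " ".join(
    ["jq", shlex.quote(jq_filter), shlex.quote(input_json), ">", shlex.quote(output_json)]
)
print(jq_command)

# Split the string back into words using POSIX shell quoting rules.
parsed = shlex.split(jq_command)
```

The resulting `jq_command` string is what would be passed to `subprocess.Popen(jq_command, shell=True)`; the second element of `parsed` should match `jq_filter` exactly if the quoting is correct.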

python bash escaping popen quote
1 answer

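The single quotes are needed to prevent the shell from performing any expansion on the argument. That is only a concern when shell=True is used. If the option is not set, the shell never touches your arguments and there is no need to "protect" them. (On POSIX, passing a list together with shell=True also means that only the first item is run as the command, and the remaining items become arguments to the shell itself, which is why jq printed its usage text.)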
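
A minimal sketch of that point: without a shell, an argument full of metacharacters reaches the child process unchanged. A small Python child process stands in for jq here (an assumption, so that the example runs without jq installed); it simply writes back the first argument it receives. The names `tricky_arg` and `echo_child` are hypothetical.

```python
import subprocess
import sys

# Contains the same kinds of metacharacters as the jq filter:
# dots, a dollar sign, a pipe, and double quotes. No single quotes are added.
tricky_arg = 'INDEX(.images[]; .id) as $imgs | select(.attributes.type=="letter")'

# Stand-in for jq: a child process that writes argv[1] back unchanged.
echo_child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]

# No shell=True: each list item is handed to the child process verbatim.
result = subprocess.run(
    echo_child + [tricky_arg], capture_output=True, text=True, check=True
)
received = result.stdout
print(received == tricky_arg)
```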

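However, the shell is also responsible for the stdout redirection (i.e. [... '>', output_json]). Without a shell, the redirection has to be handled in the Python code. That is as simple as adding the argument stdout=... to Popen.

All in all, this means your code can be rewritten as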
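
The redirection on its own can be sketched as follows. Again a Python child process stands in for jq, and the output path `redirect_demo.txt` in a temporary directory is hypothetical. Unlike a bare `Popen` call, this sketch also waits for the child to finish, so the file is complete before it is read back.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for jq (assumption): a child that prints one line to its stdout.
child = [sys.executable, "-c", "print('{\"label\": \"a\"}')"]

# Hypothetical output path inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "redirect_demo.txt")

# Equivalent of "command > out_path", done without a shell.
with open(out_path, "w") as out_file:
    proc = subprocess.Popen(child, stdout=out_file)
    return_code = proc.wait()  # block until the child has finished writing

# Read the file back to confirm the child's output landed there.
with open(out_path) as in_file:
    written = in_file.read()
```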

import os
import subprocess

# Still define first command line argument with triple quotes for readability
# Note that there are no single quotes though
jq_filter = """INDEX(.images[]; .id) as $imgs | {
       "filename_with_label" : [
        .annotations[]
       | select(.attributes.type=="letter" )
       | $imgs[.image_id] + {label:.text}
       | {id:.id} + {filename:.file_name} + {label:.label} ] }"""

input_json_files = ["image_data_annotation.json"]
output_json_files = []

for input_json in input_json_files:
    print("Processing %s" % (input_json))
    filename, ext = os.path.splitext(input_json)
    output_json = filename + "_with_label" + ext
    output_json_files.append(output_json)
    print("output file is : %s" % (output_json))

    # Keep command as list, since this is what we need when NOT using shell=True
    # Note also that the redirect and the output file are not parts of the argument list
    jq_command = ['jq', jq_filter,  input_json]

    # shell keyword argument should NOT be set to True
    # Instead, redirect stdout to an output file
    # (We must open the file for writing before redirecting)
    with open(output_json, "w") as out_file:
        subprocess.Popen(jq_command, stdout=out_file)

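In general, it is recommended not to use shell=True, because it opens up another attack vector against your code: an injection attack gets full access to the shell. Another small benefit of not using the shell is that fewer child processes are created, since no extra shell process is needed.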
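
The injection risk can be illustrated with a harmless sketch. It assumes a POSIX shell such as /bin/sh is available for the shell=True case; the value of `untrusted` is a hypothetical user-supplied file name, and a Python child process stands in for the real command in the second case.

```python
import subprocess
import sys

# Hypothetical untrusted value, e.g. a file name supplied by a user.
untrusted = "data.json; echo INJECTED"

# With shell=True the string is parsed by the shell, so the text after ";"
# runs as a second command (assumes a POSIX shell).
via_shell = subprocess.run(
    "echo " + untrusted, shell=True, capture_output=True, text=True
)
shell_lines = via_shell.stdout.splitlines()

# With an argument list and no shell, the same value is one literal argument.
print_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
no_shell = subprocess.run(
    print_arg + [untrusted], capture_output=True, text=True
)
literal_lines = no_shell.stdout.splitlines()

print(shell_lines)
print(literal_lines)
```

In the first case the shell sees two commands; in the second the child receives the whole string, semicolon included, as a single argument.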
