Lex program to remove comments from an input file


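I am trying to remove every form of comment from an input file, but I cannot figure out how to remove the specific form "{comment}". I know this site has many regex examples for removing multi-line and single-line comments, but I still cannot work it out.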

Input:

       int j=100;
       /* comment needs to be removed*/
       int c = 200;


      /*
       *comment needs to be removed 
       */

      count = count + 1;

     {comment needs to be removed}

      i++;

Output:

int j=100;
int c =200;
count = count +1;
i++;

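I have been able to remove the first two comments, but not the last one. I tried the regular expression "{}".*, but it does not match my last comment, {comment}. Is there a regular expression that fixes this, or would I be better off writing a function in C to handle this case?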

c regex comments lex
2 Answers
0
votes

I don't know what kind of comment is enclosed in {}, but you should be careful with it.

Try this regular expression:

\/\*[\s\S]*?\*\/|{[^{}]*?}
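
As a minimal sketch of how this pattern might be applied, the Python snippet below replaces every match with the empty string. The names `COMMENT_RE` and `strip_comments` and the sample text are assumptions made for illustration, not part of the answer. The brace alternative is written with escaped braces and a greedy quantifier, which matches the same text because the character class cannot cross a closing brace.

```python
import re

# Same alternation as the regex above: /* ... */ blocks or { ... } blocks.
COMMENT_RE = re.compile(r"/\*[\s\S]*?\*/|\{[^{}]*\}")

def strip_comments(text: str) -> str:
    """Delete every /* ... */ and { ... } span from text."""
    return COMMENT_RE.sub("", text)

sample = """int j=100;
/* comment needs to be removed*/
int c = 200;
/*
 *comment needs to be removed
 */
count = count + 1;
{comment needs to be removed}
i++;
"""

cleaned = strip_comments(sample)
# The substitution leaves empty lines behind, so keep only non-blank lines.
statements = [line.strip() for line in cleaned.splitlines() if line.strip()]
print("\n".join(statements))
```

The plain substitution leaves a blank line wherever a comment stood alone on a line, which is why the sketch filters those out in a second step.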



0
votes

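Note: for all of the regexes below, each match must be replaced with $2 (capture group 2), which writes the non-comment text back. This effectively removes all comments.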
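
The replace-with-group-2 idea can be illustrated with a much smaller pattern. The Python sketch below is a simplified stand-in, not the regex from this answer: group 1 matches a comment, group 2 matches a quoted literal or any other single character, and the replacement keeps only group 2. The names `TOKEN_RE` and `remove_comments` are assumptions made for the example.

```python
import re

TOKEN_RE = re.compile(
    r"""
    ( /\*[\s\S]*?\*/            # group 1: block comment
    | //[^\n]*                  #          line comment, up to end of line
    | \{[^{}]*\}                #          brace comment
    )
    |
    ( "(?:\\[\s\S]|[^"\\])*"    # group 2: double-quoted string
    | '(?:\\[\s\S]|[^'\\])*'    #          single-quoted literal
    | [\s\S]                    #          any other single character
    )
    """,
    re.VERBOSE,
)

def remove_comments(source: str) -> str:
    # A comment matches group 1, so group 2 is None and nothing is written back.
    return TOKEN_RE.sub(lambda m: m.group(2) or "", source)

code = 's = "/* kept */"; /* dropped */ c = \'{\'; {dropped}'
result = remove_comments(code)
print(result)
```

Because quoted literals are consumed whole by group 2, comment-like text inside a string or character literal is written back unchanged instead of being stripped.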

This is a standard C++ comment parser. This one is the extended version that preserves formatting.

Raw:

(?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)

Delimited /regex/:

/(?m)((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/))|(?=\r?\n))))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:\/\*|\/\/))|[^\/"'\\\r\n]*))+|[^\/"'\\\r\n]+)+|[\S\s][^\/"'\\\r\n]*)/

PCRE demo: https://regex101.com/r/UldYK5/1
Python demo: https://regex101.com/r/avfSfB/1

----------------------------------------------------------

This is a modified version of the above, with your { .. } comments added. (This is not recommended, because {} delimits scopes in C.)
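
To see why that caveat matters, the short Python sketch below applies a brace-comment rule to ordinary C code. The pattern name `BRACE_COMMENT_RE` and the sample source are illustrative assumptions; the point is only that a rule for { .. } comments cannot tell a comment from a block body.

```python
import re

# Same shape as the added { .. } rule: an opening brace, anything, a closing brace.
BRACE_COMMENT_RE = re.compile(r"\{[\s\S]*?\}")

c_source = "if (x > 0) { count = count + 1; }\ni++;"
stripped = BRACE_COMMENT_RE.sub("", c_source)
print(stripped)
```

The body of the `if` statement disappears along with its braces, so this approach is only safe on input where braces are never used for blocks.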

Raw:

(?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//|\{)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//|\{))|(?=\r?\n))|\{[\S\s]*?\}(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//|\{)))?))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:/\*|//|\{))|[^/"'\\\r\n{]*))+|[^/"'\\\r\n{]+)+|[\S\s][^/"'\\\r\n{]*)

Delimited /regex/:

/(?m)((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/|\{)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/|\{))|(?=\r?\n))|\{[\S\s]*?\}(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/|\{)))?))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:\/\*|\/\/|\{))|[^\/"'\\\r\n{]*))+|[^\/"'\\\r\n{]+)+|[\S\s][^\/"'\\\r\n{]*)/

PCRE demo (using the sample text): https://regex101.com/r/xHTua7/1

Readable version with comments:

    (?m)                             # Multi-line modifier
    (                                # (1 start), Comments 
         (?:
              (?: ^ [ \t]* )?                  # <- To preserve formatting
              (?:
                   /\*                              # Start /* .. */ comment
                   [^*]* \*+
                   (?: [^/*] [^*]* \*+ )*
                   /                                # End /* .. */ comment
                   (?:                              # <- To preserve formatting 
                        [ \t]* \r? \n                                      
                        (?=
                             [ \t]*                  
                             (?:
                                  \r? \n 
                               |  /\*
                               |  // 
                               |  \{                               # Added:  for {} comments
                             )
                        )
                   )?
                |                                 # or,
                   //                               # Start // comment
                   (?:                              # Possible line-continuation
                        [^\\] 
                     |  \\ 
                        (?: \r? \n )?
                   )*?
                   (?:                              # End // comment
                        \r? \n                               
                        (?=                              # <- To preserve formatting
                             [ \t]*                          
                             (?:
                                  \r? \n 
                               |  /\*
                               |  // 
                               |  \{                               # Added:  for {}  comments
                             )
                        )
                     |  (?= \r? \n )
                   )
                |                                 # or,
                   \{                               # Added:  Start { .. } comment
                   [\S\s]*? 
                   \}                               # Added:  End { .. } comment
                   (?:                              # <- To preserve formatting 
                        [ \t]* \r? \n                                      
                        (?=
                             [ \t]*                  
                             (?:
                                  \r? \n 
                               |  /\*
                               |  // 
                               |  \{                               # Added:  for {} comments
                             )
                        )
                   )?
              )
         )+                               # Grab multiple comment blocks if need be
    )                                # (1 end)

 |                                 ## OR

    (                                # (2 start), Non - comments 
         # Quotes
         # ======================
         (?:                              # Quote and Non-Comment blocks
              "
              [^"\\]*                          # Double quoted text
              (?: \\ [\S\s] [^"\\]* )*
              "
           |                                 # --------------
              '
              [^'\\]*                          # Single quoted text
              (?: \\ [\S\s] [^'\\]* )*
              ' 
           |                                 # --------------

              (?:                              # Qualified line breaks
                   \r? \n                           
                   (?:
                        (?=                              # If comment ahead just stop
                             (?: ^ [ \t]* )?
                             (?:
                                  /\*
                               |  // 
                               |  \{                               # Added:  for {} comments
                             )
                        )
                     |                                 # or,
                                                         # Added:  [^{] for {} comments
                        [^/"'\\\r\n{]*                   # Chars which don't start a comment, string, escape,
                                                         # or line continuation (escape + newline)
                   )
              )+
           |                                 # --------------
                                               # Added:  [^{] for {} comments
              [^/"'\\\r\n{]+                   # Chars which don't start a comment, string, escape,
                                               # or line continuation (escape + newline)

         )+                               # Grab multiple instances

      |                                 # or,
         # ======================
         # Pass through

         [\S\s]                           # Any other char
                                          # Added:  [^{] for {} comments
         [^/"'\\\r\n{]*                   # Chars which don't start a comment, string, escape,
                                          # or line continuation (escape + newline)

    )                                # (2 end), Non - comments 