Writing a parser in C++ to decode a given set of HTML entities

Question

Recently I came across a coding problem in which we had to decode a given set of HTML entities. The following entities need to be decoded:

&quot; to "
&apos; to '
&gt; to >
&lt; to <
&amp; to &
&frasl; to /

A string txt is given and must be decoded according to the rules above. Below is my approach, which works fine.

string parse(string txt){
    int n=txt.size();
    for(int i=0;i<n;i++){             //edit : why don't I get an error even though I loop for full length after erasing some elements of string?
            if(txt[i]=='&'){
                if(i+5<n&&txt.substr(i,6)=="&quot;"){
                    txt[i]='"';
                    txt.erase(i+1,5);
                }
                else if(i+5<n&&txt.substr(i,6)=="&apos;"){
                    txt[i]=(char)(39);            //I also wasn't able to do like this -txt[i]='\''; would be nice if someone tells why this gave error
                    txt.erase(i+1,5);
                }
                else if(i+4<n&&txt.substr(i,5)=="&amp;"){
                    txt[i]='&';
                    txt.erase(i+1,4);
                }
                else if(i+3<n&&txt.substr(i,4)=="&gt;"){
                    txt[i]='>';
                    txt.erase(i+1,3);
                }
                else if(i+3<n&&txt.substr(i,4)=="&lt;"){
                    txt[i]='<';
                    txt.erase(i+1,3);
                }
                else if(i+6<n&&txt.substr(i,7)=="&frasl;"){
                    txt[i]='/';
                    txt.erase(i+1,6);
                }       
            }
        }
    return txt;
}

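I feel I have done this in the most brute-force way possible. I would like to know whether there is a simpler (and possibly shorter) approach than my code.

Any help or alternative approach is appreciated!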

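EDIT: As the comment in my code points out, the loop keeps using the original string length n, even though I erase elements inside the loop and so shrink txt. Surprisingly, I do not get any error. It would help if someone could explain why.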


Answer 1

2
votes

Here is how I would suggest solving this problem:
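A minimal sketch of one table-driven way to do this, assuming C++11 or later. The names Entity, kEntities, and decodeEntities are illustrative assumptions and do not come from the question. Rather than erasing from txt in place, it appends to a separate output string, so the loop bound is always the unchanged input length.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper type: one entity name and the character it decodes to.
struct Entity {
    const char* name;
    char ch;
};

// Hypothetical helper: decodes the six entities listed in the question.
std::string decodeEntities(const std::string& txt) {
    // No entity name is a prefix of another, so the table order does not matter.
    static const Entity kEntities[] = {
        {"&quot;", '"'}, {"&apos;", '\''}, {"&amp;", '&'},
        {"&gt;", '>'},   {"&lt;", '<'},    {"&frasl;", '/'},
    };

    std::string out;
    out.reserve(txt.size());  // the result is never longer than the input

    std::size_t i = 0;
    while (i < txt.size()) {
        bool matched = false;
        if (txt[i] == '&') {
            for (const Entity& e : kEntities) {
                const std::size_t len = std::strlen(e.name);
                // compare() clamps the substring to the end of txt,
                // so no manual "i + len <= size" check is needed.
                if (txt.compare(i, len, e.name) == 0) {
                    out += e.ch;
                    i += len;  // skip the whole entity; its output is not rescanned
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out += txt[i++];  // ordinary character, or '&' that starts no known entity
        }
    }
    return out;
}
```

With this sketch, decodeEntities("x &gt; y &amp;&amp; y &lt; z") yields "x > y && y < z", and "&amp;gt;" decodes to "&gt;" rather than ">", because decoded output is never scanned again. On the two side questions: std::string::operator[] does no range checking, so reading txt[i] past the shrunken size is undefined behavior rather than a reported error (txt.at(i) would throw std::out_of_range instead), and '\'' is a valid character literal in standard C++.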
