How do I remove the content between the first two matching patterns in a string?


Say I have x

x <- 'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'

and I want

x <- 'This is the text I would like. But we must check I can have another chevron, >, in the string.'

How would I do that?

So far I did the following, but it got rid of text I wanted to keep:

sub("<span.*>", "", x)
#> [1] "This is the , in the string."

Thanks

r string substring
1 Answer

Possible duplicate: Remove html tags from a string in R

Are you trying to parse HTML with a regex?

If so, that is not the best approach: RegEx match open tags except XHTML self-contained tags
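For reference, the pattern in the question fails because `.*` is greedy: it extends to the last `>` in the string, not the first. A minimal base-R sketch of that behavior follows, assuming the input is exactly the single-line string from the question; the variable names `greedy` and `narrow` are only illustrative. A pattern like this is still fragile on real HTML, which is why a parser is the safer choice.

```r
x <- 'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'

# Greedy: `.*` runs to the LAST ">" in the string, so too much is removed.
greedy <- sub("<span.*>", "", x)

# Negated character class: `[^>]*` stops at the FIRST ">" after "<span".
# The optional trailing space avoids leaving a double space behind.
narrow <- sub("<span[^>]*> ?", "", x)

print(greedy)
print(narrow)
```

Here `greedy` reproduces the unwanted result from the question, while `narrow` keeps the later chevron because the match cannot cross a `>`.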

Try this instead:

library(rvest)

x <- 'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'

html_text(read_html(x))

Output for x:

[1] "This is the  text I would like. But we must check I can have another chevron, >, in the string."
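Note that the output above still has two spaces where the tag used to be, while the question asks for one. If that matters, one possible cleanup is to collapse repeated spaces afterwards. This is a sketch in base R that works on the output string copied as a literal; `out` and `clean` are illustrative names, and it assumes you do not need to preserve intentional double spaces elsewhere in the text.

```r
# The string below is the output shown above, copied as a literal.
out <- "This is the  text I would like. But we must check I can have another chevron, >, in the string."

# Collapse any run of two or more spaces into a single space.
clean <- gsub(" {2,}", " ", out)
print(clean)
```

In practice the same `gsub()` call could be applied directly to the result of `html_text(read_html(x))`.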

From a txt file:

x <- readr::read_file("temp.txt") # read_file() comes from readr; file content shown below
cat(rvest::html_text(read_html(x)))

Output for temp.txt:

Say I have x
x <- 'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'

and I want x <- 'This is the text I would like. But we must check I can have another chevron, >, in the string.'
How would I do that?
So far I did the following, but it got rid of text I wanted to keep:
sub("<span.*>", "", x)
#> [1] "This is the , in the string."

Thanks

Contents of temp.txt:

<div class="postcell post-layout--right">
    
<div class="s-prose js-post-body" itemprop="text">
                
<p>Say I have x</p>
<pre class="lang-r s-code-block"><code class="hljs language-r">x <span class="hljs-operator">&lt;-</span> <span class="hljs-string">'This is the &lt;span I want to remove this&gt; text I would like. But we must check I can have another chevron, &gt;, in the string.'</span>
</code></pre>
<p>and I want <code>x &lt;- 'This is the text I would like. But we must check I can have another chevron, &gt;, in the string.'</code></p>
<p>How would I do that?</p>
<p>So far I did the following, but it got rid of text I wanted to keep:</p>
<pre class="lang-r s-code-block"><code class="hljs language-r">sub<span class="hljs-punctuation">(</span><span class="hljs-string">"&lt;span.*&gt;"</span><span class="hljs-punctuation">,</span> <span class="hljs-string">""</span><span class="hljs-punctuation">,</span> x<span class="hljs-punctuation">)</span>
<span class="hljs-comment">#&gt; [1] "This is the , in the string."</span>
</code></pre>
<p>Thanks</p>
</div>