What is the most convenient way to convert HTML to plain text while preserving line breaks (with JavaScript)?

Question

Basically, I just need the same effect as copying the HTML from a browser window and pasting it into a textarea element.

For example, I want this:

<p>Some</p>
<div>text<br />Some</div>
<div>text</div>

to become this:

Some
text
Some
text

Answer 1

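If the HTML is visible in your web page, you can do this with the user selection (or just a TextRange in IE). This does preserve line breaks, though not necessarily leading and trailing whitespace.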

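UPDATE 10 December 2012

However, the toString() method of Selection objects is not yet standardized and works inconsistently between browsers, so this approach rests on shaky ground and I don't recommend using it now. I would delete this answer if it weren't accepted.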

Demo: http://jsfiddle.net/wv49v/

Code:

function getInnerText(el) {
    var sel, range, innerText = "";
    if (typeof document.selection != "undefined" && typeof document.body.createTextRange != "undefined") {
        // Old IE: select the element's text with a TextRange and read its text property
        range = document.body.createTextRange();
        range.moveToElementText(el);
        innerText = range.text;
    } else if (typeof window.getSelection != "undefined" && typeof document.createRange != "undefined") {
        // Other browsers: select the element's contents, stringify the Selection, then clear it
        sel = window.getSelection();
        sel.selectAllChildren(el);
        innerText = "" + sel;
        sel.removeAllRanges();
    }
    return innerText;
}

Answer 2

I tried to find some code I wrote for this a while back. It worked nicely. Let me outline what it did; hopefully you can replicate its behavior.

Replace images with their alt or title text.
Replace links with "text [link]".
Replace things that usually produce vertical whitespace: h1-h6, div, p, br, hr, etc. (I know, I know. Some of these could actually be inline elements, but it works out well.)
Strip out the remaining tags, replacing them with an empty string.

You could even extend this to format things like ordered and unordered lists. It really depends on how far you want to go.

EDIT

Found the code!

using System.Text.RegularExpressions;

public static string Convert(string template)
{
    /* Use image alt text. */
    template = Regex.Replace(template, "<img .*?alt=[\"']?([^\"']*)[\"']?.*?/?>", "$1");
    /* Convert links to something useful. The lazy (.*?) keeps several links on one line separate. */
    template = Regex.Replace(template, "<a .*?href=[\"']?([^\"']*)[\"']?.*?>(.*?)</a>", "$2 [$1]");
    /* Let's try to keep vertical whitespace intact; the \s allows "<br />" to match too. */
    template = Regex.Replace(template, "<(/p|/div|/h\\d|br)\\s*/?>", "\n");
    /* Remove the rest of the tags. */
    template = Regex.Replace(template, "<[A-Za-z/][^<>]*>", "");
    return template;
}

Answer 3

I made a function based on this answer: https://stackoverflow.com/a/42254787/3626940

function htmlToText(html) {
    // remove source-code line breaks and tabs
    html = html.replace(/\n/g, "");
    html = html.replace(/\t/g, "");

    // keep HTML line breaks and tabs
    html = html.replace(/<\/td>/g, "\t");
    html = html.replace(/<\/table>/g, "\n");
    html = html.replace(/<\/tr>/g, "\n");
    html = html.replace(/<\/p>/g, "\n");
    html = html.replace(/<\/div>/g, "\n");
    html = html.replace(/<\/h[1-6]>/g, "\n");
    html = html.replace(/<br>/g, "\n");
    html = html.replace(/<br( )*\/>/g, "\n");

    // parse HTML into text
    var dom = (new DOMParser()).parseFromString('<!doctype html><body>' + html, 'text/html');
    return dom.body.textContent;
}

Answer 4

Based on chrmcpn's answer, I had to convert a basic HTML email template into a plain-text version as part of a build script in node.js. I had to use JSDOM to make it work, but here's my code:

const { JSDOM } = require("jsdom");

const htmlToText = (html) => {
    // Source newlines and tabs are not rendered, so drop them
    html = html.replace(/\n/g, "");
    html = html.replace(/\t/g, "");

    // Turn paragraph/heading ends and <br> into line breaks
    html = html.replace(/<\/p>/g, "\n\n");
    html = html.replace(/<\/h1>/g, "\n\n");
    html = html.replace(/<br>/g, "\n");
    html = html.replace(/<br( )*\/>/g, "\n");

    // Let JSDOM parse the markup and read back the text content
    const dom = new JSDOM(html);
    let text = dom.window.document.body.textContent;

    // Remove leftover indentation and surrounding whitespace
    text = text.replace(/  /g, "");
    text = text.replace(/\n /g, "\n");
    text = text.trim();
    return text;
};

Answer 5

Three steps.
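The DOM-based approaches in this thread need either a browser (Selection, DOMParser) or the jsdom package in Node. The outline from Answer 2 (images to alt text, links to "text [link]", block-level elements to line breaks, remaining tags stripped) can also be sketched in plain JavaScript with string replacement only. This is a minimal sketch under stated assumptions, not any answerer's code: the name `htmlToPlainText`, the small entity table, and the list of tags treated as line breaks are all hypothetical choices. Entities are decoded in a single pass, so `&amp;lt;` becomes `&lt;` rather than `<`.

```javascript
// Assumption: only these named entities are decoded; any others are left as-is.
const NAMED_ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"],
  ["quot", '"'], ["apos", "'"],
  ["nbsp", " "], // mapped to a plain space for plain-text output
]);

// Decode numeric and known named entities in ONE pass (no double decoding).
function decodeEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Out-of-range code points would make fromCodePoint throw; keep them verbatim.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return NAMED_ENTITIES.get(body) ?? match;
  });
}

// Hypothetical helper: a regex-only port of the outline in Answer 2.
function htmlToPlainText(html) {
  let text = html
    // Drop comments/doctype and the contents of <script> and <style>.
    .replace(/<!--[\s\S]*?-->|<![^>]*>/g, "")
    .replace(/<(script|style)\b[\s\S]*?<\/\1\s*>/gi, "")
    // Source newlines and tabs are not rendered, so collapse whitespace runs.
    .replace(/\s+/g, " ")
    // 1. Images -> alt text (images without alt are removed by step 4).
    .replace(/<img\b[^>]*?\balt\s*=\s*(["'])(.*?)\1[^>]*>/gi, "$2")
    // 2. Links -> "text [href]".
    .replace(/<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a\s*>/gi, "$3 [$2]")
    // 3. Things that usually produce vertical whitespace -> newline.
    .replace(/<(?:br|hr)\b[^>]*>/gi, "\n")
    .replace(/<\/(?:p|div|h[1-6]|li|tr|table|ul|ol|blockquote|pre)\s*>/gi, "\n")
    // 4. Strip remaining tags; a bare "<" in text (as in "a < b") survives.
    .replace(/<\/?[a-zA-Z][^<>]*>/g, "");

  text = decodeEntities(text);

  // Tidy up: squeeze inner spaces, trim each line, allow at most one blank line.
  return text
    .split("\n")
    .map((line) => line.replace(/ {2,}/g, " ").trim())
    .join("\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Usage with the question's example:
console.log(htmlToPlainText("<p>Some</p>\n<div>text<br />Some</div>\n<div>text</div>"));
// prints:
// Some
// text
// Some
// text
```

Because it is pure string replacement, the sketch runs in Node with no packages. The trade-offs are the usual ones for regex-based HTML handling: attribute values containing ">", malformed or unclosed tags, CSS that changes an element's display, and `<pre>` blocks (whose whitespace is collapsed here) are not handled. When a real DOM is available, letting the parser build the tree is more robust.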
