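I'm saving strings from an HTML file into a database, and I can't trim the string to clear out the whitespace.
I wrote this simplified function to summarize the problem and what I have tried so far.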
<?php
function get_content($html)
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $div = $dom->getElementById('whitespace');
    $content = $div->textContent;
    # Goal: trim leading, trailing, and non-breaking space
    $content = str_replace('&nbsp;','',$content);
    $content = str_replace('U+00A0','',$content);
    $content = str_replace('\u00a0','',$content);
    $content = str_replace('\xa0','',$content);
    $content = str_replace(chr(160),'',$content);
    $content = trim($content);
    return $content;
}
file_put_contents(
    'trim.output',
    get_content('<div id="whitespace">&nbsp; TuffToTrim</div>')
);
?>
The output is:
TuffToTrim
whereas I expected:
TuffToTrim
I'm getting a bit desperate :) Any ideas?
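You should convert it to HTML entities first. After that, the non-breaking space appears as the literal text &nbsp; and you should be able to replace it.
$content = htmlentities($content, ENT_QUOTES, 'UTF-8');
$content = str_replace('&nbsp;', '', $content);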
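For comparison, here is a minimal sketch that removes the non-breaking space directly from the UTF-8 bytes, without converting to entities first. It assumes $content is UTF-8, which is what DOMDocument's textContent returns; the helper name strip_nbsp is hypothetical and not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): remove every
// non-breaking space (U+00A0) from a UTF-8 string.
function strip_nbsp(string $text): string
{
    // In UTF-8, U+00A0 is the two-byte sequence 0xC2 0xA0.
    // Double quotes are required so PHP interprets the \x escapes.
    return str_replace("\xC2\xA0", '', $text);
}

// The bytes textContent would hold for '&nbsp; TuffToTrim' (assumed UTF-8).
$content = "\xC2\xA0 TuffToTrim";

// Strip the non-breaking space first, then let trim() handle the ASCII space.
echo trim(strip_nbsp($content)), "\n";
```

This also suggests why the attempts in the question did not work: '\xa0' and '\u00a0' in single quotes are literal backslash sequences rather than characters, and chr(160) is the single byte 0xA0, so str_replace() removes only the second byte of the UTF-8 pair and leaves a stray 0xC2 behind.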
Instead of
$content = str_replace('&nbsp;','',$content);
$content = str_replace('U+00A0','',$content);
$content = str_replace('\u00a0','',$content);
$content = str_replace('\xa0','',$content);
$content = str_replace(chr(160),'',$content);
$content = trim($content);
you should use
$content = preg_replace('/[\s]+/mu', '', $content);
Note that this removes every whitespace character in the string, not only the leading and trailing ones.
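If whitespace inside the string has to survive, a variant that trims only the two ends might look like the sketch below. It assumes UTF-8 input; the helper name unicode_trim is hypothetical, and U+00A0 is listed explicitly in the character class so the pattern does not depend on how \s is interpreted.

```php
<?php
// Hypothetical helper (not from the original answer): trim whitespace,
// including U+00A0, from both ends of a UTF-8 string only.
function unicode_trim(string $text): string
{
    // ^[\s\x{00A0}]+ matches a leading run, [\s\x{00A0}]+$ a trailing run.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $trimmed = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);

    // preg_replace() returns null on failure (e.g. invalid UTF-8 input);
    // fall back to the untouched input in that case.
    return $trimmed ?? $text;
}

$content = "\xC2\xA0 Tuff To Trim \xC2\xA0";
echo unicode_trim($content), "\n"; // inner spaces are kept
```

The trade-off is that this behaves like trim() extended to non-breaking spaces, whereas the one-liner above collapses the whole string.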