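Hope all is well. Thank you for trying to help, I really appreciate it. I have an automated scraping script that gets values from HTML sites. When I checked the database, I found that the "author" field contains extra whitespace, like this:
$string = "Kada Sarkozi
";
(Note that this form collapses the extra whitespace, so only one space shows up here.) I tried to handle it in the code below, and when I fetch the values I use TRIM() in MySQL, but the extra whitespace still seems to be there, because Kada Sarkozi is treated as different from Kada Sarkozi with the extra whitespace. So I need your help with how to remove the whitespace in the automation script. When fetching the data, I want to group the authors together without any extra whitespace. I will share only the part that removes the extra whitespace and converts the characters. This is the part of the automation script that gets the author value:
$author = $node->textContent;
$author = trim($author);
$author = preg_replace( "/\r|\n/", "", $author);
$author = character_coversion_entities($author);
$author = str_replace("'", "&#039;", $author);
$author = str_replace('"', "&quot;", $author);
$author = trim($author);
The character_coversion_entities function is:
function character_coversion_entities($string) {
return htmlentities($string, ENT_COMPAT,'UTF-8');
}
And on the fetching side, this is the query: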
$query = "SELECT *, news.id AS news_id, feeds_list.id AS news_feed_id, COUNT(*) as count FROM news ";
$query .= "INNER JOIN feeds_list ON feeds_list.id = news.feed_id WHERE news.subject_id = 1 AND feeds_list.active = 1 ";
$query .= "AND feeds_list.insert_date <= UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY) AND news.insert_date >= UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY) ";
$query .= "AND TRIM(news.author) != '' AND LOWER(TRIM(news.author)) != LOWER(feeds_list.feed_name) ";
$query .= "AND TRIM(news.author) NOT LIKE '% and %' AND TRIM(news.author) NOT LIKE '% Editor %' ";
$query .= "AND TRIM(news.author) != 'Editor' ";
$query .= "AND TRIM(news.author) NOT LIKE '% F1 Desk %' AND TRIM(news.author) != 'F1 Desk' ";
$query .= "AND TRIM(news.author) NOT LIKE '% Motorsport Network %' AND TRIM(news.author) NOT LIKE 'Motorsport Network' AND TRIM(news.author) != 'Motorsport Network' ";
$query .= "AND TRIM(news.author) NOT LIKE '% Staff %' AND TRIM(news.author) NOT LIKE '% Staff' AND TRIM(news.author) NOT LIKE '% staff' ";
$query .= "AND TRIM(news.author) NOT LIKE '% & %' AND TRIM(news.author) NOT LIKE '%, %' ";
$query .= "AND TRIM(news.author) NOT LIKE '% RACER Staff %' AND TRIM(news.author) != 'RACER Staff' ";
$query .= "AND LOWER(TRIM(news.author)) != 'gpblog.com' AND LOWER(TRIM(news.author)) NOT LIKE '% gpblog.com %' AND LOWER(TRIM(news.author)) NOT LIKE '%gpblog.com%' ";
$query .= "AND TRIM(news.author) REGEXP '[^0-9]+$' ";
$query .= "GROUP BY TRIM(news.author) ORDER BY count DESC";
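TRIM(col) only trims ordinary spaces.
Try running
SELECT col, HEX(col)
to make sure you understand what is in your column data. In hex, 20 is an ordinary space, 09 is TAB, 0A is newline, and 0D is carriage return. Try something like
TRIM(REPLACE(REPLACE(col, '\n', ' '), '\r', ' '))
instead of TRIM(col) to eliminate leading and trailing newlines and spaces. This converts carriage returns and newlines to spaces. Then, if they are at the beginning or end of the string, they get trimmed.
If you clean up your data in the automation script before putting it into MySQL, this is not necessary. But you do need to flip the order of the first PHP trim() and preg_replace() to get the right result.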
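For the script-side cleanup, here is a minimal sketch of what that could look like. The names normalize_author and hex_dump_string are hypothetical helpers, not part of the original script. The sketch assumes the stray characters may include tabs or non-breaking spaces (U+00A0) as well as CR/LF, since PHP's trim() does not strip U+00A0 by default. It collapses every run of whitespace into one ordinary space first and trims afterwards, and the hex helper mirrors the HEX(col) check on the PHP side.

```php
<?php
// Hypothetical helper (not from the original script): collapse every run of
// whitespace (spaces, tabs, CR, LF, and Unicode separators such as U+00A0)
// into one ordinary space, then trim both ends.
function normalize_author(string $raw): string
{
    $collapsed = preg_replace('/[\s\p{Z}]+/u', ' ', $raw);
    if ($collapsed === null) {
        // preg_replace() returns null when the input is not valid UTF-8;
        // fall back to byte-oriented whitespace handling in that case.
        $collapsed = preg_replace('/\s+/', ' ', $raw) ?? $raw;
    }
    return trim($collapsed);
}

// Hypothetical helper: show the bytes of a string as space-separated hex
// pairs, the PHP-side counterpart of SELECT HEX(col).
function hex_dump_string(string $s): string
{
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

$raw = "Kada Sarkozi\r\n ";
echo hex_dump_string($raw), PHP_EOL;              // the dump ends in 0D 0A 20
echo '[' . normalize_author($raw) . ']', PHP_EOL; // [Kada Sarkozi]
```

In the scraping snippet, a call such as $author = normalize_author($node->textContent); could stand in for the first trim() and the preg_replace() line, so the order of those two calls no longer matters. Rows already stored with stray bytes would still need the TRIM(REPLACE(...)) form in the query, or a one-off cleanup.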