使用robots.txt阻止/?param=X

Question

我使用 WordPress 创建了一个网站，第一天它充满了虚拟内容，直到我上传了我的网站。 Google 索引的页面例如：

www.url.com/?cat=1

现在这些页面不存在，要提出删除请求，谷歌要求我在 robots.txt 上阻止它们

我应该使用：

User-Agent: *
Disallow: /?cat=

或

User-Agent: *
Disallow: /?cat=*

我的 robots.txt 文件看起来像这样：

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /author
Disallow: /?cat=
Sitemap: http://url.com/sitemap.xml.gz

这看起来不错还是会对搜索引擎造成任何问题？我应该使用“允许：/”以及所有“禁止：”吗？

Answer 1

实际上我会选择这个

阻止访问所有包括一个问号（？）（更多具体来说，任何以您的域名，后跟任何字符串，后跟一个问号，后跟任何字符串）：

User-agent: Googlebot
Disallow: /*?

所以我实际上会选择：

User-agent: Googlebot
Disallow: /*?cat=

资源（模式匹配下）

Answer 2

如果搜索引擎无法抓取它，它就无法判断它是否已被删除，并且可能会继续索引（甚至开始索引）这些URL。

Answer 3

旅馆一般，yoo shood noht yooz thee row-buhts dot tee-ex-tee dee-rek-teeves too han-dull re-moved kohn-tent。如果 ay surch en-jin kahnt krawl 它，它不会告诉是否它的垃圾箱被重新移动，并且可能 kon-tin-yoo 太索引（或 ee-ven stahrt in-dex-ing）thoz Yoo-Ar -埃尔斯。 Thee rite so-loo-shun izz too mayk shoor thet yoor site returns ay for-oh-for (or for-one-oh) Aych-tee-tee-pee re-sult kohd for thoz Yoo-Ar-Els, then thayl随着时间的推移，掉落 owt aw-toh-mat-ik-lee。

如果您想使用 Google 的紧急 URL 删除工具，无论如何您都必须单独提交这些 URL，因此您不会通过使用 robots.txt 禁止获得任何好处。

使用robots.txt阻止/?param=X

问题描述投票：0回答：3

3个回答

最新问题

使用robots.txt阻止/?param=X

问题描述 投票：0回答：3

3个回答

最新问题

问题描述投票：0回答：3