robots.txt in CodeIgniter - allowing a view/function

Question

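I've been reading about robots.txt, and I read that I should disallow all folders in my web application, but I want bots to be able to read the home page and one view (the URL is, for example, www.mywebapp/searchresults; this is a CodeIgniter route, called from application/controllers/function).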

For example, the folder structure is:

-index.php (should be readable by bots)
-application
  -controllers
    -controller (contains the function that loads the view)
  -views
-public

Should I create robots.txt like this:

User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /application/controllers/function

Or use the route, like this:

User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /www.mywebapp/searchresults

Or maybe use the view?

User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /application/views/search/index.php

Thanks!

Answer 1

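You don't block the view files, because crawlers cannot access them directly. You need to block (or allow) the URLs that are used to reach the views.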
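One way to check this is to feed the three candidate files from the question to Python's standard-library urllib.robotparser and ask about the URL a crawler would actually request. The host name www.mywebapp.com and the stylesheet path under /public/ are assumptions for illustration. Note that urllib.robotparser applies the first matching rule in file order, while Google uses the longest match, so the two can disagree when an Allow and a Disallow overlap; the URLs checked here avoid that case.

```python
from urllib.robotparser import RobotFileParser

# The three candidate robots.txt files from the question.
CANDIDATES = {
    "controller path": """\
User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /application/controllers/function
""",
    "host in path": """\
User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /www.mywebapp/searchresults
""",
    "view file": """\
User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /application/views/search/index.php
""",
}


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots_txt offline (no network) and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs: the CodeIgniter route and a static file under /public/.
ROUTE_URL = "http://www.mywebapp.com/searchresults"
ASSET_URL = "http://www.mywebapp.com/public/css/site.css"

for name, rules in CANDIDATES.items():
    print(f"{name}: route={can_crawl(rules, ROUTE_URL)} asset={can_crawl(rules, ASSET_URL)}")
```

All three files report route=True and asset=False: none of the Allow lines matches the path /searchresults, and none of the Disallow lines does either, so the route is crawlable simply because nothing blocks it.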

The robots.txt file must be placed in the document root of the host. It will not work anywhere else.

If your host is www.example.com, it needs to be accessible at http://www.example.com/robots.txt.
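A crawler derives that location from the page URL alone: it keeps the scheme and host (including any port) and replaces the path with /robots.txt. The Python sketch below shows the idea; the helper name robots_url is hypothetical. For a CodeIgniter site laid out as in the question, with index.php in the document root, this means robots.txt sits next to index.php, not inside application/.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Only the scheme and network location (host and optional port) are kept;
    the page's path, query and fragment are replaced by /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/searchresults?q=php"))
# http://www.example.com/robots.txt
```

Because the scheme and port are part of the result, http:// and https:// (or a non-default port) are treated as different hosts, each with its own robots.txt.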

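Option 1: robots.txt

To remove directories or individual pages of your website, you can place a robots.txt file at the root of your server. When creating your robots.txt file, keep the following in mind. When deciding which pages to crawl on a particular host, Googlebot will obey the first record in the robots.txt file with a User-agent starting with "Googlebot". If no such entry exists, it will obey the first entry with a User-agent of "*". Additionally, Google has added flexibility to the robots.txt standard through the use of asterisks: Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name.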
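The record-selection rule in that paragraph can be sketched in Python as follows. The Group representation and the name select_group are hypothetical, and the case-insensitive comparison is an assumption (the paragraph does not say). Google's current documentation describes choosing the group with the most specific matching user agent rather than simply the first one, so treat this as an illustration of the rule as quoted above.

```python
from typing import List, Tuple

# One record as parsed from robots.txt: (User-agent value, rule lines), in file order.
Group = Tuple[str, List[str]]


def select_group(groups: List[Group], crawler: str = "Googlebot") -> List[str]:
    """Pick the rules a crawler obeys, per the rule quoted above:
    the first record whose User-agent starts with the crawler name,
    otherwise the first record for "*", otherwise no rules at all.
    """
    wanted = crawler.lower()  # assumption: comparison is case-insensitive
    for agent, rules in groups:
        if agent.lower().startswith(wanted):
            return rules
    for agent, rules in groups:
        if agent == "*":
            return rules
    return []  # no applicable record: everything is allowed


groups = [
    ("*", ["Disallow: /application/", "Disallow: /public/"]),
    ("Googlebot", ["Disallow: /listings"]),
]
print(select_group(groups))             # ['Disallow: /listings']
print(select_group(groups, "Bingbot"))  # ['Disallow: /application/', 'Disallow: /public/']
```

A consequence worth noting: once a Googlebot record exists, Googlebot uses only that record, so rules in the "*" record (here, /application/ and /public/) no longer apply to it unless they are repeated.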

To remove all pages under a particular directory (for example, listings), you'd use the following robots.txt entry:

User-agent: Googlebot
Disallow: /listings

To remove all files of a specific file type (for example, .gif), you'd use the following robots.txt entry:

User-agent: Googlebot
Disallow: /*.gif$

To remove dynamically generated pages, you'd use this robots.txt entry:

User-agent: Googlebot
Disallow: /*?
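The "*" and "$" behaviour used in these entries can be sketched by translating a pattern into a regular expression that is anchored at the start of the URL path. This is an illustrative Python helper (the names pattern_to_regex and matches are hypothetical), not Google's implementation.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    end of the path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors only at the start, which gives prefix matching by default.
    return pattern_to_regex(pattern).match(path) is not None


for pattern, path in [
    ("/listings", "/listings/42"),
    ("/*.gif$", "/images/logo.gif"),
    ("/*.gif$", "/images/logo.gif?v=2"),
    ("/*?", "/search?q=php"),
    ("/*?", "/search"),
]:
    print(f"{pattern!r:12} {path!r:24} {matches(pattern, path)}")
```

The loop prints True, True, False, True, False: "/*.gif$" stops matching once a query string follows ".gif", and "/*?" matches only URLs that contain a "?".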
Option 2: Meta tags

Another standard, which can be more convenient for page-by-page use, involves adding a <META> tag to an HTML page to tell robots not to index the page. This standard is described at http://www.robotstxt.org/wc/exclusion.html#meta.

To prevent all robots from indexing a page on your site, you'd place the following meta tag into the <HEAD> section of your page:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

To allow other robots to index the page on your site, preventing only Google's robots from indexing the page, you'd use the following tag:

<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

To allow robots to index the page on your site but instruct them not to follow outgoing links, you'd use the following tag:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">
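On the crawler side, honoring these tags amounts to reading the name and content attributes of each meta element. A rough Python sketch using the standard html.parser module; the class RobotsMetaParser and the function robots_directives are hypothetical names, and a real crawler handles more cases than this.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and from a
    crawler-specific tag such as <meta name="googlebot">."""

    def __init__(self, crawler: str = "googlebot") -> None:
        super().__init__()
        self.crawler = crawler.lower()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names, not values.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() not in ("robots", self.crawler):
            return
        for token in values.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str, crawler: str = "googlebot") -> Set[str]:
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(sorted(robots_directives(page)))             # ['nofollow', 'noindex']
print(sorted(robots_directives(page, "bingbot")))  # []
```

With crawler="bingbot" the GOOGLEBOT tag is ignored, which is the behaviour described for the second tag above: other robots may still index the page.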

For further reference:

https://www.elegantthemes.com/blog/tips-tricks/how-to-create-and-configure-your-robots-txt-file

Answer 2

Answering my own old question:

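When we want bots to read certain pages, we have to use our URLs (routes), written as paths relative to the site root without the host name, so in this case:

Allow: /searchresults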

In some cases we can also block individual pages with an HTML tag (added to the page's <head>):

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

To block a folder, for example images, just use:

Disallow: /public/images
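How a crawler combines such Allow and Disallow lines: under RFC 9309, which Google follows, the rule with the longest matching path wins, and Allow wins a tie. The Python sketch below implements only plain prefix matching (no "*" or "$" wildcards); the rule list, built from this answer and the question, and the function name is_allowed are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (directive, path prefix) pairs for one User-agent group; wildcards omitted.
Rule = Tuple[str, str]

RULES: List[Rule] = [
    ("disallow", "/application/"),
    ("disallow", "/public/images"),
    ("allow", "/searchresults"),
]


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Longest matching prefix wins; on equal length, allow beats disallow.
    An empty path (a bare 'Disallow:') matches nothing."""
    best: Optional[Tuple[int, bool]] = None
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            candidate = (len(prefix), directive == "allow")
            # Tuples compare by length first; True > False breaks ties toward allow.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


for path in ["/", "/searchresults", "/public/images/logo.png",
             "/application/views/search/index.php"]:
    print(path, is_allowed(path, RULES))
```

Paths that no rule matches, such as "/", are allowed by default. Adding ("allow", "/public/images/logo.png") would re-open that single file, because it is a longer match than "/public/images".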
