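Questions tagged google-crawlers

"Crawler" is a generic term for any program (such as a robot or spider) that automatically discovers and scans websites by following links from one web page to another. Google's main crawler is called Googlebot.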

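I want some content on a page to be crawlable but not indexed

Problem: a page like https://websiteurl/person/{person} has some content related to that person (including images and a description), and within that page there is a section that calls...

indexing web-crawler seo google-crawlers google-index

1 answer, 0 votes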
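Google can only see a noindex rule on a URL it is allowed to crawl. One common pattern is therefore to serve the separate content from its own URL with an X-Robots-Tag: noindex response header and to leave that URL unblocked in robots.txt. The sketch below assumes Node.js and a hypothetical /person/{person}/section URL for that content. If the section is injected into the person page by JavaScript, Google may still index the rendered text as part of the person page.

```javascript
// Hypothetical URL layout: /person/{person} is the page to index, and
// /person/{person}/section serves the part that should be crawled but not indexed.
const FRAGMENT_PATH = /^\/person\/[^/]+\/section\/?$/;

// Extra response headers for a request path.
function robotsHeadersFor(pathname) {
  if (FRAGMENT_PATH.test(pathname)) {
    // Keep this URL crawlable (not blocked in robots.txt), or Google never sees the header.
    return { "X-Robots-Tag": "noindex" };
  }
  return {};
}

// Node.js request handler; wire it up with require("node:http").createServer(handler).listen(8080).
function handler(req, res) {
  const { pathname } = new URL(req.url, "http://localhost");
  res.writeHead(200, {
    "Content-Type": "text/html; charset=utf-8",
    ...robotsHeadersFor(pathname),
  });
  res.end("<p>Page content</p>");
}

console.log(robotsHeadersFor("/person/jane/section"));
console.log(robotsHeadersFor("/person/jane"));
```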

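How to get a list of all existing URLs of a website that match a specific pattern?

I'm trying to analyze all existing URLs of a website that have a specific path. As an example, the URL pattern looks like this: https://www.example.com/users/john I'm trying to...

web-crawler google-crawlers

1 answer, 0 votes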
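There is no general way to enumerate every URL on another site. A practical starting point is the site's XML sitemap, if it publishes one. A minimal sketch, assuming the sitemap XML has already been downloaded into a string and that the URLs of interest look like https://www.example.com/users/<name>. The parsing is regex-based for brevity: it does not decode XML entities such as &amp; and does not follow sitemap index files.

```javascript
// Extracts <loc> entries from sitemap XML and keeps those whose path is /users/<name>.
function matchingUrls(sitemapXml, pathPattern = /^\/users\/[^/]+\/?$/) {
  const locs = [...sitemapXml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
  return locs.filter((loc) => {
    try {
      return pathPattern.test(new URL(loc).pathname);
    } catch {
      return false; // skip entries that are not absolute URLs
    }
  });
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/users/john</loc></url>
  <url><loc>https://www.example.com/users/jane/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>`;

console.log(matchingUrls(sample));
```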

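How to execute a JavaScript function before the page loads for search engine crawlers?

My task is to fetch the page's title from an API. This data must be crawlable by web crawlers. This is what I've done so far:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta name="description" content="Test description" />
  <title>Test title</title>
  <script>
    document.addEventListener('DOMContentLoaded', function () {
      // Placeholder from the question. Without a scheme, fetch() resolves it relative to the page URL.
      const apiEndpoint = 'endpoint.com';
      fetch(apiEndpoint)
        .then(function (response) {
          return response.json();
        })
        .then(function (data) {
          // change title with javascript logic
        })
        .catch(function () {
          // fallback title
        });
    });
  </script>
</head>
<body></body>
</html>
```

Answer: You don't need to delay running the script, because it already runs after the title element has been created. Google and some other search engines do render pages, and they will see your change as long as it doesn't take too long. I've found that about 5 seconds is roughly the limit.

javascript web-crawler seo google-crawlers

1 answer, 0 votes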
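If the title must not depend on the crawler running JavaScript at all, another option is to fetch the title on the server and write it into the HTML before sending the response. Here is a minimal Node.js sketch. withTitle is a hypothetical helper, and the title string stands in for whatever the API returns.

```javascript
// Escapes the characters that are special inside HTML text content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replaces the contents of the first <title> element in an HTML string.
// A replacer function is used so "$" sequences in the title are not treated as patterns.
function withTitle(html, title) {
  return html.replace(/<title>[\s\S]*?<\/title>/i, () => `<title>${escapeHtml(title)}</title>`);
}

const template = `<!DOCTYPE html>
<html lang="en">
<head><title>Test title</title></head>
<body></body>
</html>`;

// In a real server, this string would come from the API call made before responding.
console.log(withTitle(template, "Title from API & more"));
```

Because the title is already in the HTML response, any crawler sees it without rendering the page.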

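Disallow only the home page (/) and allow all other pages in robots.txt

I need to stop Google's web crawler from crawling only my home page, located at /, but I need to allow crawling of all other pages. How can I do this? I tried this: User-agent: * Disallo...

html web-crawler seo google-search google-crawlers

1 answer, 0 votes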
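Google documents two robots.txt syntax features: the * wildcard and a trailing $ end-of-URL anchor. When rules conflict, Google uses the most specific (longest) matching rule and prefers Allow on a tie. Under those rules, a "Disallow: /$" line in the "User-agent: *" group matches only the bare home page. The simplified matcher below is for illustration only. It is not Google's parser and ignores details such as percent-encoding. It shows which paths that rule blocks.

```javascript
// Converts a robots.txt path pattern into a RegExp anchored at the start of the path.
// "*" matches any sequence of characters; a trailing "$" anchors the end of the URL.
function patternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// rules: [{ type: "allow" | "disallow", pattern }]. The longest matching pattern wins;
// on a tie, allow wins. No matching rule means the path is allowed.
function isAllowed(rules, pathWithQuery) {
  let best = null;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(pathWithQuery)) continue;
    const longer = !best || rule.pattern.length > best.pattern.length;
    const tieAllow = best !== null && rule.pattern.length === best.pattern.length && rule.type === "allow";
    if (longer || tieAllow) best = rule;
  }
  return !best || best.type === "allow";
}

// Equivalent to the robots.txt lines "User-agent: *" and "Disallow: /$".
const rules = [{ type: "disallow", pattern: "/$" }];

for (const path of ["/", "/about", "/blog/post", "/?ref=home"]) {
  console.log(path, isAllowed(rules, path) ? "allowed" : "blocked");
}
```

Under this rule, a home-page URL with a query string, such as /?ref=home, is still crawlable. Blocking crawling also does not, by itself, remove an already-known URL from the index.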

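How to properly stop search engines from indexing hash links on WP pages

I noticed that all the hash links in my blog's table of contents have been indexed by Google, and I don't want that. I don't want links like this to be indexed by Google: example.com/blog/post/#

wordpress https seo search-engine google-crawlers

1 answer, 0 votes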
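Google generally treats URLs that differ only in their #fragment as the same page. The usual goal is therefore to make sure each post declares one canonical URL without the fragment. In WordPress, core or an SEO plugin normally outputs that tag. The helpers below are hypothetical. They only illustrate what the normalized URL and tag look like.

```javascript
// Returns the URL without its #fragment, which is the form a canonical tag should use.
function canonicalUrl(href) {
  const url = new URL(href);
  url.hash = ""; // removes "#..." entirely, including a bare trailing "#"
  return url.href;
}

// Builds the <link rel="canonical"> element for a page URL.
function canonicalLinkTag(href) {
  const safe = canonicalUrl(href).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `<link rel="canonical" href="${safe}">`;
}

console.log(canonicalUrl("https://example.com/blog/post/#"));
console.log(canonicalLinkTag("https://example.com/blog/post/#section-2"));
```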

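Unable to install StormCrawler: error, connection refused on port 7071

I'm installing StormCrawler on Ubuntu and everything works, but I can't inject the seeds.txt file. When I run the injector with this command "java -cp target/crawler-1.0-SNAPSHOT.jarcrawlerc...

java web-crawler google-crawlers stormcrawler

1 answer, 0 votes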
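"Connection refused" usually means nothing is listening on that host and port. The service the injector tries to reach is either not running or bound to a different address. Here is a minimal Node.js check. The host is an assumption, and port 7071 is taken from the error message.

```javascript
const net = require("node:net");

// Resolves true if a TCP connection to host:port succeeds, false if it is refused or times out.
function isPortOpen(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (open) => {
      socket.destroy();
      resolve(open);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}

// Assumed host; adjust it to wherever the service is expected to run.
isPortOpen("127.0.0.1", 7071).then((open) => {
  console.log(open ? "something is listening on 7071" : "nothing is listening on 7071");
});
```

If the check prints "nothing is listening", start the missing service or correct its address in the configuration before rerunning the injector.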

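LinkedIn Post Inspector: redirect 400 error

I really need your help. I have a website developed with the MERN stack, and I want to be able to share it on LinkedIn. I did handle SSR for the Facebook and WhatsApp crawlers, and they both...

cors seo server-side-rendering mern google-crawlers

1 answer, 0 votes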
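If SSR is triggered only for specific user agents, LinkedIn's crawler must be on that list as well. Its user agent contains "LinkedInBot". LinkedIn, like Facebook, reads Open Graph (og:) meta tags for previews. In this minimal sketch, the user-agent substrings and the example user-agent string are assumptions to check against each platform's current documentation.

```javascript
// Substrings commonly found in link-preview crawler user agents (verify against current docs).
const PREVIEW_BOTS = [/LinkedInBot/i, /facebookexternalhit/i, /WhatsApp/i, /Twitterbot/i];

function isPreviewCrawler(userAgent = "") {
  return PREVIEW_BOTS.some((re) => re.test(userAgent));
}

function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Builds the Open Graph <meta> tags used for link previews, skipping empty values.
function openGraphTags({ title, description, image, url }) {
  return [
    ["og:title", title],
    ["og:description", description],
    ["og:image", image],
    ["og:url", url],
  ]
    .filter(([, value]) => value)
    .map(([property, value]) => `<meta property="${property}" content="${escapeAttr(value)}">`)
    .join("\n");
}

// Example user-agent string, for illustration only.
const ua = "LinkedInBot/1.0 (compatible; Mozilla/5.0; Apache-HttpClient +http://www.linkedin.com)";
if (isPreviewCrawler(ua)) {
  console.log(openGraphTags({
    title: "My post",
    description: "Post summary",
    image: "https://example.com/cover.png",
    url: "https://example.com/posts/1",
  }));
}
```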

Dynamically loading reviews into a review schema with JavaScript/Google Places (for rich snippets)

The first time, I hardcoded my company's reviews into the page, and with the following code they were indexed (for review rich snippets) within a day:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "CreativeWorkSeries",
  "name": "Review",
  "aggregateRating": {
    "ratingValue": "0",
    "bestRating": "10",
    "worstRating": "1",
    "ratingCount": "0"
  }
}
</script>
```

I then made a script that loads the reviews for my company's two locations. When I inspect the page in the DevTools Elements panel, everything looks fine, and the numbers and ratings load correctly. But nothing shows up in Google's search results. This is the full code I use to load the data:

```html
<script>
  document.addEventListener('DOMContentLoaded', function () {
    var service = new google.maps.places.PlacesService(document.createElement('div'));
    let places = ['placeid1', 'placeid2'];
    var totalReviews = 0;
    var totalRating = 0;
    let numberOfPlaces = 0;
    getRecensies();

    async function getRecensies() {
      let counter = 0;
      await new Promise(function (resolve) {
        places.forEach((place) => {
          service.getDetails({ placeId: place }, function (result, status) {
            console.log("result", result);
            if (result && result.rating && result.user_ratings_total) {
              totalRating = totalRating + result.rating;
              totalReviews = totalReviews + result.user_ratings_total;
              numberOfPlaces++;
            }
            counter++;
            if (counter == places.length) {
              resolve();
            }
          });
        });
      });
      //totalRating = (totalRating / numberOfPlaces); // Get average rating
      var schemaElement = document.getElementById('review-schema-home');
      var schema = JSON.parse(schemaElement.textContent);
      schema.aggregateRating.ratingValue = totalRating.toString();
      schema.aggregateRating.ratingCount = totalReviews.toString();
      schemaElement.textContent = JSON.stringify(schema, null, 2);
    }
  });
</script>
<script type="application/ld+json" id="review-schema-home">
{
  "@context": "https://schema.org/",
  "@type": "CreativeWorkSeries",
  "name": "Review",
  "aggregateRating": {
    "ratingValue": "0",
    "bestRating": "10",
    "worstRating": "1",
    "ratingCount": "0"
  }
}
</script>
```

The following screenshot from my inspector shows that the data loads correctly. I wrote the script so that I wouldn't have to update the review count by hand. It looks like I'll have to hardcode it this week.

Screenshot: https://i.stack.imgur.com/Heqmv.png

Can Google's spider crawl and execute a script like this? Or is there another explanation for why the review snippets don't appear?

Answer: First, although Google's crawler can read JavaScript (https://ipullrank.com/javascript-seo-how-google-crawls-and-indexes-javascript-websites) and can also "wait a bit" for the JS on the page to load, I wouldn't rely on the script shown above.

There is no guarantee that getRecensies will resolve before Googlebot finishes parsing your page.

On top of that, your script runs on every page load, which can add to your Google Cloud bill. Also, if the API key used to load the API onto the page isn't protected, bad actors could abuse it to run their own Places API queries.

Overall, it's not a good idea.

Now, from Google's own troubleshooting page (https://developers.google.com/search/docs/appearance/structured-data/review-snippet#troubleshooting):

"Google does not guarantee that features that consume structured data will show up in search results. For a list of common reasons why Google may not show your content in a rich result, see the General Structured Data Guidelines (https://developers.google.com/search/docs/appearance/structured-data/sd-policies)."

Hardcoding the review snippet and updating it regularly is perfectly fine, as long as the markup is structured correctly. You can test your markup with these two tools (https://developers.google.com/search/docs/appearance/structured-data).

Once Googlebot has consumed your markup, you will see something like the following in GSC (https://search.google.com/search-console/about):

Screenshot: https://i.stack.imgur.com/06Nud.png

However, as far as I know, this still does not guarantee that the enhancement will appear in the SERPs.

javascript google-maps google-crawlers rich-snippets

0 answers, 0 votes
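If the aggregate is hardcoded and refreshed periodically, as the answer suggests, the per-location numbers still need to be combined correctly. The question's script adds the two places' ratings together because the averaging line is commented out. Google Places ratings run from 1 to 5. The following minimal sketch computes a review-count-weighted average and the JSON-LD text to paste into the page. The input objects mirror the rating and user_ratings_total fields used in the question. The function names and example numbers are hypothetical. The sketch also adds an explicit "@type": "AggregateRating" to the rating object.

```javascript
// Combines per-place ratings into one aggregate, weighting each place by its review count.
function aggregateRating(places) {
  let reviews = 0;
  let weightedSum = 0;
  for (const { rating, user_ratings_total: count } of places) {
    if (!rating || !count) continue; // skip places without rating data
    reviews += count;
    weightedSum += rating * count;
  }
  if (reviews === 0) return null;
  return {
    ratingValue: Math.round((weightedSum / reviews) * 10) / 10, // one decimal place
    ratingCount: reviews,
  };
}

// Produces the text for a <script type="application/ld+json"> element.
function reviewJsonLd(agg) {
  return JSON.stringify(
    {
      "@context": "https://schema.org/",
      "@type": "CreativeWorkSeries",
      name: "Review",
      aggregateRating: {
        "@type": "AggregateRating",
        ratingValue: String(agg.ratingValue),
        bestRating: "5",
        worstRating: "1",
        ratingCount: String(agg.ratingCount),
      },
    },
    null,
    2
  );
}

// Example numbers only; in practice they would come from a periodic Places lookup.
const agg = aggregateRating([
  { rating: 4.5, user_ratings_total: 100 },
  { rating: 4.0, user_ratings_total: 300 },
]);
console.log(reviewJsonLd(agg));
```

Weighting by review count keeps a location with few reviews from pulling the average as much as one with many.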

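Search Console "Page is not indexed" shows a competitor's subdomain for my website

I found this indexing issue under "Duplicate without user-selected canonical". Three new URLs have appeared since October 21 of this year. That seems harmless until you look at the URLs. I haven't been to...

subdomain google-search-console google-crawlers

1 answer, 0 votes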
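Another hostname can show up as a duplicate when it serves the same content as your site, for example because its DNS points at your server. Two common mitigations help here. The first is a self-referencing canonical built from a fixed origin rather than from the request's Host header. The second is refusing requests for hosts the site does not serve. A minimal sketch of both, with www.example.com as a placeholder domain and hypothetical helper names:

```javascript
// Placeholder domain: the single origin that should appear in canonical URLs.
const CANONICAL_ORIGIN = "https://www.example.com";
const ALLOWED_HOSTS = new Set(["www.example.com", "example.com"]);

// Builds the canonical tag from the fixed origin, never from the request's Host header,
// so a copy served under another hostname still points back to this site.
function canonicalTag(pathname) {
  const href = new URL(pathname, CANONICAL_ORIGIN).href;
  return `<link rel="canonical" href="${href}">`;
}

// Status to send for a given Host header: 421 (Misdirected Request) for hosts this site
// does not serve. Simplified: ignores the port and does not handle IPv6 literals.
function statusForHost(hostHeader = "") {
  const host = hostHeader.split(":")[0].toLowerCase();
  return ALLOWED_HOSTS.has(host) ? 200 : 421;
}

console.log(canonicalTag("/blog/post"));
console.log(statusForHost("sub.competitor.example"), statusForHost("www.example.com"));
```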

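Log in without a password via a link with an authentication token

To make the user experience more convenient, we implemented a feature that lets users log in automatically when they click a link they received by email. In the past, the user had to...

authentication meteor web google-crawlers

1 answer, 0 votes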
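A common design for such links uses a random, single-use token with a short expiry, and the server stores only a hash of it. Email link scanners and crawlers may open the URL with a GET request. For that reason, many implementations consume the token only after an explicit POST, such as a "Continue" button on the landing page. The following minimal sketch uses Node's crypto module, with an in-memory Map standing in for a database. The function names and the 15-minute lifetime are assumptions.

```javascript
const crypto = require("node:crypto");

const TOKEN_TTL_MS = 15 * 60 * 1000; // assumed 15-minute lifetime
const pending = new Map(); // tokenHash -> { userId, expiresAt } (stand-in for a DB table)

const sha256 = (value) => crypto.createHash("sha256").update(value).digest("hex");

// Creates the token for the email link; only its hash is stored server-side.
function issueLoginToken(userId, now = Date.now()) {
  const token = crypto.randomBytes(32).toString("hex");
  pending.set(sha256(token), { userId, expiresAt: now + TOKEN_TTL_MS });
  return token;
}

// Consumes a token: returns the userId once, or null if unknown, expired, or already used.
// Call this from the POST handler, not from the GET that renders the landing page.
function consumeLoginToken(token, now = Date.now()) {
  const key = sha256(String(token));
  const entry = pending.get(key);
  pending.delete(key); // single use, even when expired
  if (!entry || entry.expiresAt < now) return null;
  return entry.userId;
}

const link = `https://example.com/login?token=${issueLoginToken("user-1")}`;
console.log(link);
```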

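Google indexing my IP address (SiteGround dedicated IP): I tried previously recommended .htaccess redirects, but they don't seem to solve the problem

Google Search has indexed my website's IP address, and it generally seems to be struggling with indexing my site (a robots.txt directive blocked it for a while). I researched previous solutions and tried...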
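The usual goal is a permanent (301) redirect to the same path on the main domain for every request that reaches the server under the bare IP address or any other non-primary host. That redirect signals Google to treat the domain URLs as canonical. The question concerns .htaccess on SiteGround. The sketch below only illustrates the redirect logic in Node.js, with www.example.com as a placeholder domain and a hypothetical helper name.

```javascript
const PRIMARY_HOST = "www.example.com"; // placeholder for the real domain

// Returns the Location for a 301 redirect, or null if the request already uses the primary host.
// Simplified: ignores the port and does not upgrade http to https for the primary host.
function redirectTarget(hostHeader, pathAndQuery) {
  const host = String(hostHeader || "").split(":")[0].toLowerCase();
  if (host === PRIMARY_HOST) return null;
  return `https://${PRIMARY_HOST}${pathAndQuery}`;
}

// Node.js request handler using the helper (wire up with require("node:http").createServer(handler)).
function handler(req, res) {
  const location = redirectTarget(req.headers.host, req.url);
  if (location) {
    res.writeHead(301, { Location: location });
    return res.end();
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("served from the primary host");
}

console.log(redirectTarget("203.0.113.7", "/blog/?p=1"));
```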