Guess: "During domain name resolution, the first Google crawler to show up is Googlebot (Google Web search)."

+-----+----------------+---------------------+-------------------------+------------------+
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 119.147.32.253 | -- :: | Unidentified User Agent | |
| | 183.57.53.197 | -- :: | Mozilla 5.0 | iOS |
| | 123.56.233.103 | -- :: | Unidentified User Agent | |
| | 112.90.142.207 | -- :: | Firefox 3.0 | Windows XP |
| | 183.232.120.37 | -- :: | Firefox 3.0 | Windows XP |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | Safari 534.30 | Android |
| | 117.136.40.218 | -- :: | Safari 534.30 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 55.0.2883.87 | Windows |
| | 177.193.53.212 | -- :: | Googlebot | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 139.162.108.53 | -- :: | Chrome 50.0.2661.102 | Windows |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 61.142.176.19 | -- :: | Firefox 3.6. | Windows |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 61.142.176.20 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 125.39.207.33 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 183.60.48.110 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 101.226.51.229 | -- :: | Chrome 45.0.2454.101 | Windows XP |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
+-----+----------------+---------------------+-------------------------+------------------+
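
The Googlebot row in the log above only reflects the User-Agent header, which any client can set. A minimal sketch of the reverse-then-forward DNS check that Google's help pages describe for confirming a real Googlebot follows. The function name `is_verified_googlebot` and the injectable `reverse`/`forward` resolvers are assumptions made for illustration and offline testing; they are not part of the original post.

```python
import socket

# Host name suffixes that Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`.

    `reverse` and `forward` are hypothetical hooks so the check can be
    exercised without network access; by default they use the socket module.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)  # step 1: reverse DNS lookup (PTR)
    except OSError:
        return False
    # step 2: the host name must belong to one of Google's crawler domains
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(host)  # step 3: forward lookup must give the same IP
    except OSError:
        return False
```

Calling `is_verified_googlebot("177.193.53.212")` would run the real lookups for the address logged as Googlebot; the result depends on live DNS and is not asserted here.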

https://support.google.com/webmasters/answer/1061943?hl=en

Google crawlers

See which robots Google uses to crawl the web

"Crawler" is a generic term for any program (such as a robot or spider) used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot. This table lists information about the common Google crawlers you may see in your referrer logs, and how they should be specified in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.

Each entry lists the crawler, its user agent token, and the full user agent string (as seen in website log files).

Crawler: Googlebot (Google Web search)
User agent token: Googlebot
Full user agent string: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
or (rarely used): Googlebot/2.1 (+http://www.google.com/bot.html)

Crawler: Googlebot News
User agent token: Googlebot-News (Googlebot)
Full user agent string: Googlebot-News

Crawler: Googlebot Images
User agent token: Googlebot-Image (Googlebot)
Full user agent string: Googlebot-Image/1.0

Crawler: Googlebot Video
User agent token: Googlebot-Video (Googlebot)
Full user agent string: Googlebot-Video/1.0

Crawler: Google Smartphone
User agent token: Googlebot
Full user agent string: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Crawler: Google Mobile AdSense
User agent token: Mediapartners-Google or Mediapartners (Googlebot)
Full user agent string: [various mobile device types] (compatible; Mediapartners-Google/2.1+http://www.google.com/bot.html)

Crawler: Google AdSense
User agent token: Mediapartners-Google, Mediapartners (Googlebot)
Full user agent string: Mediapartners-Google

Crawler: Google AdsBot landing page quality check
User agent token: AdsBot-Google
Full user agent string: AdsBot-Google (+http://www.google.com/adsbot.html)

Crawler: Google app crawler (Used to fetch resources for mobile apps, obeys AdsBot-Google robots rules.)
User agent token: AdsBot-Google-Mobile-Apps
Full user agent string: AdsBot-Google-Mobile-Apps
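
The tokens listed above can be matched against the user agent strings in a server log. The sketch below is one possible way to do that, not Google's own logic: the name `classify_google_crawler` is hypothetical, the "Mobile" check for the smartphone crawler is an assumption based on the sample string in the table, and desktop and mobile AdSense are deliberately not distinguished. A match only says what the client claims to be.

```python
# (token, crawler name) pairs taken from the table above.
# More specific tokens come first so "Googlebot-Image" is not reported as plain "Googlebot".
CRAWLER_TOKENS = [
    ("AdsBot-Google-Mobile-Apps", "Google app crawler"),
    ("AdsBot-Google", "Google AdsBot landing page quality check"),
    ("Mediapartners-Google", "Google AdSense"),
    ("Googlebot-News", "Googlebot News"),
    ("Googlebot-Image", "Googlebot Images"),
    ("Googlebot-Video", "Googlebot Video"),
    ("Googlebot", "Googlebot (Google Web search)"),
]


def classify_google_crawler(user_agent):
    """Return the crawler name for a logged user agent string, or None if no token matches."""
    lowered = user_agent.lower()
    for token, name in CRAWLER_TOKENS:
        if token.lower() in lowered:
            # Assumption: the smartphone crawler is plain Googlebot with a mobile browser prefix.
            if token == "Googlebot" and "mobile" in lowered:
                return "Google Smartphone"
            return name
    return None
```

For example, `classify_google_crawler("Googlebot-Image/1.0")` returns `"Googlebot Images"`, while an ordinary browser string returns `None`.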

robots.txt

Where several user-agents are recognized in the robots.txt file, Google will follow the most specific. If you want all of Google to be able to crawl your pages, you don't need a robots.txt file at all. If you want to block or allow all of Google's crawlers from accessing some of your content, you can do this by specifying Googlebot as the user-agent. For example, if you want all your pages to appear in Google search, and if you want AdSense ads to appear on your pages, you don't need a robots.txt file. Similarly, if you want to block some pages from Google altogether, blocking the user-agent Googlebot will also block all Google's other user-agents.

But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don't want images in your personal directory to be crawled. In this case, use robots.txt to disallow the user-agent Googlebot-image from crawling the files in your /personal directory (while allowing Googlebot to crawl all files), like this:

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal

To take another example, say that you want ads on all your pages, but you don't want those pages to appear in Google Search. Here, you'd block Googlebot, but allow Mediapartners-Google, like this:

User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
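
The "most specific user-agent wins" behaviour described above can be sketched in a few lines. This is a simplified model written for illustration, not Google's parser: `parse_groups` and `is_allowed` are hypothetical names, only `User-agent` and `Disallow` lines are handled, and paths are compared by plain prefix (no `Allow` lines or wildcards). The caller passes the crawler's tokens from most to least specific, following the table's convention of a secondary token in parentheses.

```python
IMAGE_RULES = """\
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal
"""

ADS_RULES = """\
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""


def parse_groups(robots_txt):
    """Map each lowercased user-agent token to its list of Disallow path prefixes."""
    groups = {}
    current = []            # tokens of the group currently being read
    reading_agents = False  # True while consecutive User-agent lines are seen
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current = []  # a rule line ended the previous group
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            reading_agents = False
            if value:  # an empty Disallow blocks nothing
                for agent in current:
                    groups[agent].append(value)
    return groups


def is_allowed(robots_txt, tokens, path):
    """Apply the first group found for `tokens` (most specific first), then '*'."""
    groups = parse_groups(robots_txt)
    for token in list(tokens) + ["*"]:
        rules = groups.get(token.lower())
        if rules is not None:
            return not any(path.startswith(prefix) for prefix in rules)
    return True  # no applicable group: nothing is disallowed
```

With the first example, `is_allowed(IMAGE_RULES, ["Googlebot-Image", "Googlebot"], "/personal/cat.jpg")` is `False` while `is_allowed(IMAGE_RULES, ["Googlebot"], "/personal/cat.jpg")` is `True`. With the second, Googlebot is blocked from every path and Mediapartners-Google is not.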

robots meta tag

Some pages use multiple robots meta tags to specify directives for different crawlers, like this:

<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">

In this case, Google will use the sum of the negative directives, and Googlebot will follow both the noindex and nofollow directives. Google's help pages give more detailed information about controlling how Google crawls and indexes your site.
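
As a rough illustration of "the sum of the negative directives", the sketch below collects the directives from every robots meta tag that applies to a given crawler and returns their union. The names `RobotsMetaCollector` and `effective_directives` are hypothetical, and the model is an assumption that covers only the restrictive case shown above; it does not try to resolve conflicts between positive and negative directives.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="<crawler>"> tags."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in self.names:
            # content is a comma-separated list such as "noindex, nofollow"
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def effective_directives(html, crawler="googlebot"):
    """Return the union of directives that apply to `crawler` in `html`."""
    collector = RobotsMetaCollector(crawler)
    collector.feed(html)
    collector.close()
    return collector.directives
```

For the two tags in the example above, `effective_directives(...)` yields both `noindex` and `nofollow` for Googlebot, while a crawler with a different name would see only `nofollow`.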
