Hypothesis: "Once a domain name starts resolving, the first of the Google crawlers to show up is Googlebot, the Google Web search crawler."

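+-----+----------------+---------------------+-------------------------+------------------+
|     | IP address     | Time                | User agent              | Platform         |
+-----+----------------+---------------------+-------------------------+------------------+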
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 119.147.32.253 | -- :: | Unidentified User Agent | |
| | 183.57.53.197 | -- :: | Mozilla 5.0 | iOS |
| | 123.56.233.103 | -- :: | Unidentified User Agent | |
| | 112.90.142.207 | -- :: | Firefox 3.0 | Windows XP |
| | 183.232.120.37 | -- :: | Firefox 3.0 | Windows XP |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | Safari 534.30 | Android |
| | 117.136.40.218 | -- :: | Safari 534.30 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 55.0.2883.87 | Windows |
| | 177.193.53.212 | -- :: | Googlebot | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 139.162.108.53 | -- :: | Chrome 50.0.2661.102 | Windows |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 61.142.176.19 | -- :: | Firefox 3.6. | Windows |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 61.142.176.20 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 125.39.207.33 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 183.60.48.110 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 101.226.51.229 | -- :: | Chrome 45.0.2454.101 | Windows XP |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
+-----+----------------+---------------------+-------------------------+------------------+

https://support.google.com/webmasters/answer/1061943?hl=en

Google crawlers

See which robots Google uses to crawl the web

"Crawler" is a generic term for any program (such as a robot or spider) used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot. This table lists information about the common Google crawlers you may see in your referrer logs, and how they should be specified in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.

| Crawler | User agent token | Full user agent string (as seen in website log files) |
| Googlebot (Google Web search) | Googlebot | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) or (rarely used): Googlebot/2.1 (+http://www.google.com/bot.html) |
| Googlebot News | Googlebot-News (Googlebot) | Googlebot-News |
| Googlebot Images | Googlebot-Image (Googlebot) | Googlebot-Image/1.0 |
| Googlebot Video | Googlebot-Video (Googlebot) | Googlebot-Video/1.0 |
| Google Smartphone | Googlebot | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) |
| Google Mobile AdSense | Mediapartners-Google or Mediapartners (Googlebot) | [various mobile device types] (compatible; Mediapartners-Google/2.1+http://www.google.com/bot.html) |
| Google AdSense | Mediapartners-Google Mediapartners (Googlebot) | Mediapartners-Google |
| Google AdsBot landing page quality check | AdsBot-Google | AdsBot-Google (+http://www.google.com/adsbot.html) |
| Google app crawler (Used to fetch resources for mobile apps, obeys AdsBot-Google robots rules.) | AdsBot-Google-Mobile-Apps | AdsBot-Google-Mobile-Apps |
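The user agent tokens listed above can be used to label entries in an access log. The following is a minimal Python sketch, not Google's own matching logic: the function name `google_crawler_token` and the substring matching are assumptions made for illustration. Because a user agent string is supplied by the client and can be spoofed, a match only says what the request claims to be.

```python
# Tokens from the table above, ordered so that the more specific tokens
# are tested before the shorter tokens they contain.
GOOGLE_TOKENS = [
    "AdsBot-Google-Mobile-Apps",
    "AdsBot-Google",
    "Mediapartners-Google",
    "Googlebot-News",
    "Googlebot-Image",
    "Googlebot-Video",
    "Googlebot",
]


def google_crawler_token(user_agent):
    """Return the first listed token found in the user agent string, or None.

    Hypothetical helper: case-insensitive substring matching is an
    assumption of this sketch, not a documented Google rule.
    """
    lowered = user_agent.lower()
    for token in GOOGLE_TOKENS:
        if token.lower() in lowered:
            return token
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Googlebot-Image/1.0",
        "AdsBot-Google (+http://www.google.com/adsbot.html)",
        "Mozilla/5.0 (Windows NT 6.1) Firefox/3.0",
    ]
    for ua in samples:
        print(google_crawler_token(ua), "<-", ua)
```

Ordering the list from most to least specific keeps a string such as Googlebot-Image/1.0 from being reported as plain Googlebot.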

robots.txt

Where several user-agents are recognized in the robots.txt file, Google will follow the most specific. If you want all of Google to be able to crawl your pages, you don't need a robots.txt file at all. If you want to block or allow all of Google's crawlers from accessing some of your content, you can do this by specifying Googlebot as the user-agent. For example, if you want all your pages to appear in Google search, and if you want AdSense ads to appear on your pages, you don't need a robots.txt file. Similarly, if you want to block some pages from Google altogether, blocking the user-agent Googlebot will also block all Google's other user-agents.

But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don't want images in your personal directory to be crawled. In this case, use robots.txt to disallow the user-agent Googlebot-image from crawling the files in your /personal directory (while allowing Googlebot to crawl all files), like this:

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal

To take another example, say that you want ads on all your pages, but you don't want those pages to appear in Google Search. Here, you'd block Googlebot, but allow Mediapartners-Google, like this:

User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
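The "most specific user-agent wins" rule in the two robots.txt examples can be sketched in Python. This is a simplified illustration and not Google's parser: the helper names `parse_groups` and `is_allowed` are hypothetical, only `User-agent` and `Disallow` lines are handled, "most specific" is approximated as the longest group name that the crawler token starts with, and the fallback by which other Google crawlers also obey a Googlebot group is not modeled.

```python
EXAMPLE_IMAGES = """\
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal
"""

EXAMPLE_ADS = """\
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""


def parse_groups(text):
    """Map each lowercased user-agent to its list of disallowed path prefixes."""
    groups = {}
    current = []      # user-agents of the group currently being read
    in_rules = False  # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a new User-agent after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow blocks nothing
                for agent in current:
                    groups[agent].append(value)
    return groups


def is_allowed(groups, crawler_token, path):
    """Apply the longest matching group; fall back to '*' and then to allow."""
    token = crawler_token.lower()
    candidates = [a for a in groups if a != "*" and token.startswith(a)]
    if candidates:
        agent = max(candidates, key=len)
    elif "*" in groups:
        agent = "*"
    else:
        return True
    return not any(path.startswith(prefix) for prefix in groups[agent])


if __name__ == "__main__":
    images = parse_groups(EXAMPLE_IMAGES)
    print(is_allowed(images, "Googlebot-Image", "/personal/cat.jpg"))  # False
    print(is_allowed(images, "Googlebot", "/personal/cat.jpg"))        # True
    ads = parse_groups(EXAMPLE_ADS)
    print(is_allowed(ads, "Googlebot", "/index.html"))                 # False
    print(is_allowed(ads, "Mediapartners-Google", "/index.html"))      # True
```

A custom matcher is used here because the order of groups in the file should not matter under this rule; a parser that simply takes the first group whose name appears in the crawler token would apply the Googlebot group to Googlebot-Image in the first example.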

robots meta tag

Some pages use multiple robots meta tags to specify directives for different crawlers, like this:

<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">

In this case, Google will use the sum of the negative directives, and Googlebot will follow both the noindex and nofollow directives. Google's help documentation has more detailed information about controlling how Google crawls and indexes your site.
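The "sum of the negative directives" behaviour can be illustrated with a short Python sketch built on the standard library's `html.parser`. The class and function names are hypothetical, and treating the result as a plain set union of the generic `robots` tag and the crawler-specific tag is an assumption based on the description above, not Google's implementation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="<crawler>">."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        self.crawler = crawler.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        # Keep only the generic tag and the tag addressed to this crawler.
        if name in ("robots", self.crawler):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def effective_directives(html, crawler="googlebot"):
    """Return the union of directives that apply to the given crawler."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = ('<meta name="robots" content="nofollow">'
            '<meta name="googlebot" content="noindex">')
    print(sorted(effective_directives(page)))  # ['nofollow', 'noindex']
```

For a crawler that the second tag does not name, only the generic `nofollow` directive would be collected.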
