My guess: "During domain name resolution, the first of the Google crawlers to show up is Googlebot (Google Web search)."

+-----+----------------+---------------------+-------------------------+------------------+
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 119.147.32.253 | -- :: | Unidentified User Agent | |
| | 183.57.53.197 | -- :: | Mozilla 5.0 | iOS |
| | 123.56.233.103 | -- :: | Unidentified User Agent | |
| | 112.90.142.207 | -- :: | Firefox 3.0 | Windows XP |
| | 183.232.120.37 | -- :: | Firefox 3.0 | Windows XP |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 117.136.40.218 | -- :: | Safari 534.30 | Android |
| | 117.136.40.218 | -- :: | Safari 534.30 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 37.0.0.0 | Android |
| | 117.136.40.218 | -- :: | Chrome 55.0.2883.87 | Windows |
| | 177.193.53.212 | -- :: | Googlebot | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 139.162.108.53 | -- :: | Chrome 50.0.2661.102 | Windows |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 61.142.176.19 | -- :: | Firefox 3.6. | Windows |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 61.142.176.20 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 23.251.63.45 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 125.39.207.33 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 183.60.48.110 | -- :: | Unidentified User Agent | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 101.226.51.229 | -- :: | Chrome 45.0.2454.101 | Windows XP |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
+-----+----------------+---------------------+-------------------------+------------------+
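
To check a guess like the one above against the log, the rows can be tallied by user agent. The following is only a minimal sketch: the sample rows are copied from the table, the five-column layout (ID, IP, timestamp, user agent, platform) is an assumption based on the table border, and `parse_row` and `tally` are hypothetical helper names.

```python
from collections import Counter

# A few rows copied from the table above. The ID and timestamp columns
# are empty in the source, so only IP, user agent and platform are used.
SAMPLE_ROWS = """\
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
| | 183.57.53.197 | -- :: | Mozilla 5.0 | iOS |
| | 117.136.40.218 | -- :: | ZTE | Android |
| | 177.193.53.212 | -- :: | Googlebot | Unknown Platform |
| | 111.251.93.170 | -- :: | Unidentified User Agent | |
"""


def parse_row(line):
    """Split one pipe-delimited row into (ip, user_agent, platform)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    # Assumed column order: id, ip, timestamp, user_agent, platform
    return cells[1], cells[3], cells[4]


def tally(rows_text):
    """Count rows per user agent and remember where each one first appears."""
    by_agent = Counter()
    first_seen = {}
    for index, line in enumerate(rows_text.splitlines()):
        if not line.startswith("|"):
            continue  # skip borders and blank lines
        _ip, agent, _platform = parse_row(line)
        by_agent[agent] += 1
        first_seen.setdefault(agent, index)
    return by_agent, first_seen


by_agent, first_seen = tally(SAMPLE_ROWS)
print(by_agent.most_common())
print("Googlebot first appears at sample row index", first_seen.get("Googlebot"))
```

Run against the full table, the same tally would show where the Googlebot row sits relative to the other visitors.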

https://support.google.com/webmasters/answer/1061943?hl=en

Google crawlers

See which robots Google uses to crawl the web

"Crawler" is a generic term for any program (such as a robot or spider) used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot. This table lists information about the common Google crawlers you may see in your referrer logs, and how they should be specified in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.

Crawler: Googlebot (Google Web search)
User agent token: Googlebot
Full user agent string (as seen in website log files): Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  or (rarely used): Googlebot/2.1 (+http://www.google.com/bot.html)

Crawler: Googlebot News
User agent token: Googlebot-News (Googlebot)
Full user agent string: Googlebot-News

Crawler: Googlebot Images
User agent token: Googlebot-Image (Googlebot)
Full user agent string: Googlebot-Image/1.0

Crawler: Googlebot Video
User agent token: Googlebot-Video (Googlebot)
Full user agent string: Googlebot-Video/1.0

Crawler: Google Smartphone
User agent token: Googlebot
Full user agent string: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Crawler: Google Mobile AdSense
User agent token: Mediapartners-Google or Mediapartners (Googlebot)
Full user agent string: [various mobile device types] (compatible; Mediapartners-Google/2.1+http://www.google.com/bot.html)

Crawler: Google AdSense
User agent token: Mediapartners-Google or Mediapartners (Googlebot)
Full user agent string: Mediapartners-Google

Crawler: Google AdsBot landing page quality check
User agent token: AdsBot-Google
Full user agent string: AdsBot-Google (+http://www.google.com/adsbot.html)

Crawler: Google app crawler (used to fetch resources for mobile apps; obeys AdsBot-Google robots rules)
User agent token: AdsBot-Google-Mobile-Apps
Full user agent string: AdsBot-Google-Mobile-Apps
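
The tokens listed above can be used to label requests in a server log. Here is a rough sketch, assuming a simple case-insensitive substring match; `GOOGLE_TOKENS` and `google_crawler_token` are hypothetical names, not part of any Google tooling. Keep in mind that a user agent string is just what the client claims to be, so a match here does not prove the request really came from Google.

```python
# User agent tokens taken from the list above, most specific first so that
# "Googlebot-Image" is not reported as plain "Googlebot".
GOOGLE_TOKENS = [
    "AdsBot-Google-Mobile-Apps",
    "AdsBot-Google",
    "Mediapartners-Google",
    "Googlebot-Image",
    "Googlebot-News",
    "Googlebot-Video",
    "Googlebot",
]


def google_crawler_token(user_agent):
    """Return the first Google token found in user_agent, or None."""
    lowered = user_agent.lower()
    for token in GOOGLE_TOKENS:
        if token.lower() in lowered:
            return token
    return None


samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Googlebot-Image/1.0",
    "AdsBot-Google (+http://www.google.com/adsbot.html)",
    "Mozilla/5.0 (Windows NT 5.1; rv:1.9) Gecko Firefox/3.0",
]
for ua in samples:
    print(google_crawler_token(ua), "<-", ua)
```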

robots.txt

Where several user-agents are recognized in the robots.txt file, Google will follow the most specific. If you want all of Google to be able to crawl your pages, you don't need a robots.txt file at all. If you want to block or allow all of Google's crawlers from accessing some of your content, you can do this by specifying Googlebot as the user-agent. For example, if you want all your pages to appear in Google search, and if you want AdSense ads to appear on your pages, you don't need a robots.txt file. Similarly, if you want to block some pages from Google altogether, blocking the user-agent Googlebot will also block all Google's other user-agents.

But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don't want images in your personal directory to be crawled. In this case, use robots.txt to disallow the user-agent Googlebot-Image from crawling the files in your /personal directory (while allowing Googlebot to crawl all files), like this:

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal

To take another example, say that you want ads on all your pages, but you don't want those pages to appear in Google Search. Here, you'd block Googlebot, but allow Mediapartners-Google, like this:

User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
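
The "most specific user-agent wins" rule in the two examples above can be sketched as follows. This is a simplified illustration, not Google's actual matcher: it only understands `User-agent` and `Disallow` lines with plain prefix paths, and the `FALLBACK_CHAIN` table is an assumption built from the "(Googlebot)" fallbacks listed earlier. All function names are hypothetical.

```python
# Assumed fallback order per crawler, following the "(Googlebot)" entries
# in the crawler list above. Keys and values are lowercase tokens.
FALLBACK_CHAIN = {
    "googlebot": ["googlebot"],
    "googlebot-news": ["googlebot-news", "googlebot"],
    "googlebot-image": ["googlebot-image", "googlebot"],
    "googlebot-video": ["googlebot-video", "googlebot"],
    "mediapartners-google": ["mediapartners-google", "mediapartners", "googlebot"],
}


def parse_groups(robots_txt):
    """Parse robots.txt text into {user-agent token: [disallowed prefixes]}."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current_agents = []  # a new group starts here
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        else:
            if field == "disallow" and value:  # an empty Disallow blocks nothing
                for agent in current_agents:
                    groups[agent].append(value)
            last_was_agent = False
    return groups


def is_allowed(robots_txt, crawler_token, path):
    """Apply only the most specific matching group, then fall back to '*'."""
    groups = parse_groups(robots_txt)
    token = crawler_token.lower()
    chain = FALLBACK_CHAIN.get(token, [token]) + ["*"]
    for agent in chain:
        if agent in groups:
            return not any(path.startswith(rule) for rule in groups[agent])
    return True  # no applicable group: nothing is blocked


IMAGES_EXAMPLE = """\
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: /personal
"""

ADS_ONLY_EXAMPLE = """\
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""

print(is_allowed(IMAGES_EXAMPLE, "Googlebot", "/personal/a.jpg"))        # Googlebot group applies
print(is_allowed(IMAGES_EXAMPLE, "Googlebot-Image", "/personal/a.jpg"))  # Googlebot-Image group applies
print(is_allowed(ADS_ONLY_EXAMPLE, "Mediapartners-Google", "/page.html"))
```

Note that Python's standard `urllib.robotparser` picks the first matching entry rather than the most specific one, so it may answer differently for the first example.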

robots meta tag

Some pages use multiple robots meta tags to specify directives for different crawlers, like this:

<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">

In this case, Google will use the sum of the negative directives, and Googlebot will follow both the noindex and nofollow directives.
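
A small sketch of how the directives from several robots meta tags add up, assuming the combination is a plain set union of the generic `robots` tag and the crawler-specific tag. That is a simplification that only makes sense for restrictive directives such as noindex and nofollow. `RobotsMetaCollector` and `effective_directives` are hypothetical names; the parsing uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect the content directives of every <meta name=...> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name (lowercase) -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        content = attr_map.get("content", "")
        if name:
            found = {part.strip().lower() for part in content.split(",") if part.strip()}
            self.directives.setdefault(name, set()).update(found)


def effective_directives(html, crawler_name="googlebot"):
    """Union of the generic 'robots' directives and the crawler-specific ones."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    generic = collector.directives.get("robots", set())
    specific = collector.directives.get(crawler_name.lower(), set())
    return generic | specific


page = '<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">'
print(sorted(effective_directives(page)))
```

For a crawler with no tag of its own, only the generic `robots` directives apply in this sketch.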
