Search Engine Hacking – Manual and Automation
Introduction:
We are all aware of Google/Yahoo/Bing Search engines; they need no introduction. We use them every now and then to solve our day-to-day queries.
Google and other search engines use automated programs called spiders or crawlers to discover pages, and they maintain a large index of keywords and the pages on which those keywords appear. These crawling and indexing features are what make search engines powerful, but they also open the door for hackers to identify vulnerable targets over the Internet. This is called Search Engine Hacking.
Search Engine Hacking involves using advanced operator-based searching to identify exploitable targets and sensitive data using the search engines.
In this article, we learn to use various Google search operators to identify vulnerable targets over the Internet and also check out a new tool that can be used to automate this process.
Special Search Characters:
Google search engine provides its users with various special search characters for advanced searching. See a partial list below:
- Quotes ["search query"]: Quotes are used to search for a specific phrase or set of words.
E.g. The query ["The monk who sold his Ferrari"] will search for the specific phrase: The monk who sold his Ferrari.
- Minus Sign [-]: The minus sign tells the Google search engine to exclude the word that immediately follows it.
E.g. [-red apple] will display search results for apple that exclude the word red.
- Tilde operator [~]: Adding a tilde in front of a word will search for results containing that word as well as its synonyms. (Google has since retired this operator, so current searches may ignore it.)
E.g. [~jokes] will display search results which include the word jokes as well as related words like funny, humor, etc.
- OR operator or vertical bar [|]: Placing OR (in uppercase) or the vertical bar between two or more keywords tells Google to search for pages that contain any one of the words.
E.g. [Android OR Apple] will display search results containing either of the words.
- Asterisk operator [*]: The asterisk is a wildcard, which allows the search engine to fill in that position with any word or text string. You can also use it within double quotes for more precise searches.
E.g. The query ["today is * day"] will display search results like "today is a good day" or "today is mother's day", etc.
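The special characters above can be combined mechanically. The following is a minimal sketch, assuming a hypothetical helper named build_query (it is not part of any Google library), that assembles a query string from an exact phrase, alternative keywords and excluded words:

```python
def build_query(phrase=None, any_of=(), exclude=(), keywords=()):
    """Assemble a search string from the special characters described above."""
    parts = list(keywords)
    if phrase:
        parts.append(f'"{phrase}"')                # quotes: exact phrase
    if any_of:
        parts.append(" OR ".join(any_of))          # OR must be uppercase
    parts.extend(f"-{word}" for word in exclude)   # minus: exclude a word
    return " ".join(parts)


print(build_query(keywords=["apple"], exclude=["red"]))   # apple -red
print(build_query(phrase="today is * day"))               # "today is * day"
print(build_query(any_of=["Android", "Apple"]))           # Android OR Apple
```

The helper only builds text; how the search engine interprets a mix of OR terms and plain keywords is up to the engine, so queries mixing both should be checked by hand.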
Basic Searching Techniques:
Google search engine provides various operators to customize our search results.
The basic syntax of a Google advanced operator is
operator:search_term
The list below provides some of the key operators useful in creating search queries to retrieve valuable information from the web.
- Intitle operator:
The query [intitle:keyword] in the search engine will return pages containing the keyword in the title.
E.g. 1: The query [intitle:Google] will return all the web pages containing Google in the title.
E.g. 2: Google Hacking using intitle operator
Using the query [intitle:"Index of"] will return all the web pages containing "Index of" in the title. This can be used to identify whether Directory Listing (which displays a list of the directory contents) is enabled on the web server.
- Site operator:
The query [site:www.site.com] narrows a search to a particular site, domain or sub-domain.
E.g. 1: The query [news site:yahoo.com] will search for the keyword “news” on the site and the sub-domains of Yahoo.com.
E.g. 2: Google Hacking – Information gathering on sub domains
The query [site:yahoo.com] will display search results from yahoo.com and all of its indexed sub-domains. This operator is useful for gathering information on the sub-domains of a specific target site.
- Inurl operator:
The query [inurl:keyword] in the search engine will return pages containing the keyword in the URL.
E.g. 1 – The query [inurl:contactus site:www.MySite.com] will search MySite for pages whose URL contains the word “contactus”.
E.g. 2 – Google Hacking – Looking for Admin Portals
The query [inurl:admin.php] will search for websites that might have admin login pages. Such pages attract hackers, who might brute force the login page to gain access to the admin interface.
- Cache operator:
Google keeps a snapshot of the pages it has crawled. The query [cache:URL] in the search engine displays Google’s cached version of that page.
E.g. – The query [cache:www.yahoo.com] will display the cached page of the website Yahoo.com. This operator can be useful for gathering information from previously cached pages.
Another very useful website that can be used to obtain cached pages is http://archive.org/
This website stores snapshots of websites in a calendar format, and can be used to view the pages as they appeared on a previous date. The screenshot below displays a cached page of Yahoo.com dated 9 Feb 2010.
- Filetype operator:
The query [filetype:file extension] searches for pages that end in a particular file extension. Google can search for many different types of files like pdf, doc, image, rtf, ppt, xls, etc.
E.g. The query [filetype:pdf site:yahoo.com] will return all the links to pdf files found on Yahoo.com.
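Since every operator follows the same operator:search_term syntax, queries can be generated rather than typed. Below is a small sketch, assuming two hypothetical helpers (operator_term and search_url) and assuming the usual https://www.google.com/search?q=... address for the results page:

```python
from urllib.parse import urlencode


def operator_term(operator, value):
    """Format one operator:search_term pair, quoting values that contain spaces."""
    if " " in value:
        value = f'"{value}"'
    return f"{operator}:{value}"


def search_url(*terms, base="https://www.google.com/search"):
    """Join the terms into one query and URL-encode it as the q parameter."""
    return base + "?" + urlencode({"q": " ".join(terms)})


terms = [operator_term("filetype", "pdf"), operator_term("site", "yahoo.com")]
print(" ".join(terms))     # filetype:pdf site:yahoo.com
print(search_url(*terms))  # https://www.google.com/search?q=filetype%3Apdf+site%3Ayahoo.com
```

The resulting link is meant to be opened in a browser; fetching it repeatedly from a script runs into the bot detection discussed later in this article.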
Google Hacking through keyword search
Let’s look at some of the keyword searches and the operators that can be used to build search queries to carry out Google Hacking.
- Digging Google for Configuration Files:
Configuration files are used to configure the initial settings for some computer programs. An attacker having access to the configuration file can get a complete understanding of the program deployed.
For example, a Google query like [filetype:ini inurl:ws_ftp.ini] would retrieve the configuration file used by the WS_FTP client program as shown in the screenshot below:
- Digging Google for Log Files:
Web servers log information such as IP addresses, timestamps, HTTP requests, usernames and passwords into log files. These log files are usually stored with the extension .log on the server side and may be accessible over the Internet due to inadequate protection.
For example, a Google query like [filetype:log cron.log] would retrieve the UNIX cron log as shown in the screenshot below:
- Digging Google for database leakage information from web applications:
Google Hackers search Google for pieces of database information leaked from vulnerable servers. This information can be used to identify a vulnerable target and launch a more sophisticated attack against the target.
For example, a Google query like [filetype:inc intext:mysql_connect] will retrieve .inc files that contain the MySQL user credentials and other function details used to connect to the database.
- Digging Google for leakage of information through error messages:
Information leaked through error messages is very useful for information gathering and for launching further attacks on websites. If the application does not have exception/error handling mechanisms, it might leak sensitive details in its error messages, such as database details, stack traces, etc.
E.g. a Google query like [intitle:"Apache Tomcat" "Error Report"] will display search results containing Apache Tomcat error messages.
We have briefly discussed the directives that can be used to carry out search engine hacking. Manually trying out each of these directives can be a cumbersome task. To automate the process of search engine hacking and retrieving juicy information, we make use of automated tools.
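To get a feel for what such automation starts from, here is a minimal sketch that keeps the example queries from this article in a dictionary and scopes each one to a single domain with the site operator. The names DORKS and scoped_queries, and the domain example.com, are placeholders chosen for illustration:

```python
# Query templates taken from the examples in this article.
DORKS = {
    "directory listing": 'intitle:"Index of"',
    "admin portal": "inurl:admin.php",
    "configuration file": "filetype:ini inurl:ws_ftp.ini",
    "log file": "filetype:log cron.log",
    "database credentials": "filetype:inc intext:mysql_connect",
    "error message": 'intitle:"Apache Tomcat" "Error Report"',
}


def scoped_queries(domain):
    """Return (category, query) pairs restricted to one domain via site:."""
    return [(name, f"site:{domain} {query}") for name, query in DORKS.items()]


for name, query in scoped_queries("example.com"):
    print(f"{name}: {query}")
```

Restricting every query to a domain you are responsible for turns the same list into a quick self-audit checklist.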
Automated tools available for Google Hacking:
- Gooscan – Gooscan is a tool that automates queries against Google search appliances, but with a twist. These particular queries are designed to find potential vulnerabilities on web pages.
Ref:
http://www.securitytube-tools.net/index.php@title=Gooscan.html
- Sitedigger – SiteDigger searches Google’s cache to look for vulnerabilities, errors, configuration issues, proprietary information, and interesting security nuggets on web sites.
Ref:
http://www.mcafee.com/in/downloads/free-tools/sitedigger.aspx
- Wikto – This is a multipurpose tool developed by Sensepost which can be used for automating Google Hacking.
The above tools are useful for Google Hacking. However, let’s look at a newer tool called Search Diggity, which provides a graphical user interface and is useful for retrieving a lot of information from both the Bing and Google search engines.
Search Diggity:
It is Stach & Liu’s MS Windows GUI application that serves as a front-end to the most recent versions of the Diggity tools:
- GoogleDiggity
- BingDiggity
- Bing LinkFromDomainDiggity
- CodeSearchDiggity
- DLPDiggity
- FlashDiggity
- MalwareDiggity
- PortScanDiggity
- SHODANDiggity
- BingBinaryMalwareSearch
- NotInMyBackYard Diggity
More information on these modules can be found here:
Ref:
http://www.stachliu.com/resources/tools/google-hacking-diggity-project/attack-tools/
Let’s explore a few of the above key modules of interest to learn about the art of search engine hacking.
GoogleDiggity:
The Google Diggity tool automates the Google Hacking process. It queries the search engine using the Google JSON/ATOM Custom Search API to identify vulnerabilities and information disclosures.
The Google search engine uses bot detection techniques. As a result, querying Google directly with automated tools for Google hacking tends to get blocked. This is overcome with the Google JSON/ATOM Custom Search API, which authenticates each request with an API key. A user can register for an API key with a valid Gmail account and get 100 free requests per day. Additional queries are available at a cost (Google charges $5 per 1000 queries).
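The quota figures above make it easy to budget a scan before running it. The sketch below is a rough estimate only: it assumes that queries beyond the free quota are billed pro rata at the quoted rate, which is an assumption about billing rather than something stated here, and the function name is hypothetical:

```python
FREE_QUERIES_PER_DAY = 100   # free quota quoted in this article
PRICE_PER_1000_USD = 5.0     # paid rate quoted in this article


def estimate_daily_cost(queries):
    """Estimate the daily charge in USD, assuming pro-rata billing above the free quota."""
    billable = max(0, queries - FREE_QUERIES_PER_DAY)
    return billable * PRICE_PER_1000_USD / 1000


for count in (100, 1100, 2600):
    print(count, "queries ->", estimate_daily_cost(count), "USD")
```

For instance, 1100 queries in a day leave 1000 billable queries, which comes to 5.0 USD under this assumption.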
The tool provides a well-structured interface that allows the user to:
- Select the search queries from the list
- Feed the API key
- Specify the target site/domain/IP address
- Use the Scan button to kick off the scan, etc.
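For readers who want to see what a single API-backed query looks like outside the GUI, here is a minimal sketch against the Custom Search JSON API endpoint. It is not how GoogleDiggity is implemented internally; the function names are hypothetical, and "KEY" and "CX" stand in for your own API key and search engine ID:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(api_key, engine_id, query, start=1):
    """Build the GET URL for one page of results."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return API_ENDPOINT + "?" + urlencode(params)


def extract_hits(response):
    """Pull (title, link) pairs out of a decoded JSON response."""
    return [(item.get("title", ""), item.get("link", ""))
            for item in response.get("items", [])]


def run_query(api_key, engine_id, query):
    """Send one query over the network; each call counts against the daily quota."""
    with urlopen(build_request_url(api_key, engine_id, query)) as reply:
        return extract_hits(json.load(reply))


print(build_request_url("KEY", "CX", "site:example.com filetype:log"))
```

Only run_query touches the network, so the URL building and result parsing can be checked offline before any quota is spent.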
Bing Diggity:
Similar to GoogleDiggity, Bing Diggity is a Bing search engine hacking tool. It utilizes the Bing 2.0 API, which allows 1000 results per query, and Stach & Liu’s newly developed Bing Hacking Database (BHDB) to find vulnerabilities and sensitive information disclosures related to your organization that are exposed via Microsoft’s Bing search engine.
The tool provides a well-structured interface that allows the user to:
- Select the search queries specific to Bing search engine from the list
- Feed the API key
- Specify the target site/domain/IP address
- Use the Scan button to kick off the scan, etc.
DLPDiggity:
DLPDiggity is a data loss prevention tool that leverages Google/Bing to identify exposures of sensitive info (e.g. SSNs, credit card numbers, etc.) via common document formats such as .doc, .xls, and .pdf. First, GoogleDiggity and BingDiggity are used to locate and download files belonging to target domains/sites on the Internet. Then, DLPDiggity is used to analyze those downloaded files for sensitive information disclosures.
DLPDiggity utilizes IFilters (an IFilter is a plugin that allows the Windows Indexing Service and the newer Windows Desktop Search to index different file formats so that they become searchable) to search through the actual contents of files, as opposed to just the meta-data. Using .NET regular expressions, DLPDiggity can find almost any type of sensitive data within common document file formats.
Over the last few years, there has been a tremendous increase in the volume of office documents that have been indexed and made searchable by Google and Bing. DLPDiggity taps into that in order to find documents containing sensitive information.
The tool provides a well-structured interface that allows the user to:
- Select the DLPDiggity search queries from the list that will be used to query the Google/Bing search engines for documents.
- Select the regular expressions that will be used to search through the documents in the target directory for leaks of sensitive information such as SSNs and credit card numbers.
- Use the Search button to analyze the documents.
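The regular expression step can be illustrated on plain text. The sketch below uses Python's re module as a stand-in for the .NET regular expressions mentioned above; the two patterns and the Luhn checksum filter are illustrative assumptions, not DLPDiggity's actual rule set:

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:       # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, match) pairs for SSN-like and card-like strings."""
    findings = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    findings += [("card", m.group()) for m in CARD_PATTERN.finditer(text)
                 if luhn_valid(m.group())]
    return findings


sample = "SSN 123-45-6789 on file; card 4111 1111 1111 1111 expires soon."
print(find_sensitive(sample))
```

The checksum filter discards most random 16-digit strings, which keeps false positives down when scanning large batches of downloaded documents.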
FlashDiggity:
FlashDiggity automates Google searching/downloading/decompiling/analysis of SWF files to identify Flash vulnerabilities and information disclosures.
FlashDiggity first leverages the GoogleDiggity tool in order to identify Adobe Flash SWF applications for target domains via Google searches, such as ext:swf. Next, the tool is used to download all of the SWF files in bulk for analysis. The SWF files are disassembled back to their original ActionScript source code, and then analyzed for code-based vulnerabilities.
The tool provides a well-structured interface that allows the user to:
- Select the FlashDiggity search queries from the list that will be used to query the Google search engine for SWF files.
- Select the regular expressions that will be used to search through the ActionScript of decompiled SWF Flash files for code-based vulnerabilities and information disclosures.
- Use the Search button to decompile and analyze the SWF files.
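The analysis step amounts to pattern matching over decompiled source. As a rough sketch, assuming the ActionScript has already been decompiled to text by some other tool, a handful of hypothetical patterns (not FlashDiggity's real signature list) can be applied line by line:

```python
import re

# Illustrative patterns only; a real rule set would be far larger.
RISK_PATTERNS = {
    "hardcoded credential": re.compile(
        r"\b(?:password|passwd|secret)\s*=\s*[\"'][^\"']+[\"']", re.IGNORECASE),
    "external script call": re.compile(r"\bExternalInterface\.call\s*\("),
    "dynamic URL load": re.compile(r"\b(?:getURL|loadMovie)\s*\("),
}


def scan_actionscript(source):
    """Return (line_number, label) for every pattern hit in the source text."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in RISK_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


sample = '''var password = "hunter2";
getURL(_root.clickTag, "_blank");
trace("done");'''
print(scan_actionscript(sample))
```

Each hit is only a lead for manual review; a call such as getURL is risky mainly when its argument comes from user-controlled input.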