Parsing Document Metadata with Bing Scaping

Set up the environment - install goquery package.

https://github.com/PuerkitoBio/goquery

go get github.com/PuerkitoBio/goquery

Modify the Proxy setting if in China. Refer to: https://sum.golang.org/

Unzip an Office file and analyze the Open XML file struct. "creator", "lastModifiedBy" in core.xml and "Application", "Company", "AppVersion" in app.xml are of primary interest.

Defining the metadata Package and mapping the data to structs in GO to open, parse, and extract Office Open XML documents.

package metadata

import (
"archive/zip"
"encoding/xml"
"strings"
) // Open XML type definition and version mapping
type OfficeCoreProperty struct {
XMLName xml.Name `xml:"coreProperties"`
Creator string `xml:"creator"`
LastModifiedBy string `xml:"lastModifiedBy"`
} type OfficeAppProperty struct {
XMLName xml.Name `xml:"Properties"`
Application string `xml:"Application"`
Company string `xml:"Company"`
Version string `xml:"AppVersion"`
} var OfficeVersion = map[string]string{
"16": "2016",
"15": "2013",
"14": "2010",
"12": "2007",
"11": "2003",
} func (a *OfficeAppProperty) GetMajorVersion() string {
tokens := strings.Split(a.Version, ".") if len(tokens) < 2 {
return "Unknown"
}
v, ok := OfficeVersion[tokens[0]]
if !ok {
return "Unknown"
}
return v
} // Processing Open XML archives and embedded XML documents
func NewProperties(r *zip.Reader) (*OfficeCoreProperty, *OfficeAppProperty, error) {
var coreProps OfficeCoreProperty
var appProps OfficeAppProperty for _, f := range r.File {
switch f.Name {
case "docProps/core.xml":
if err := process(f, &coreProps); err != nil {
return nil, nil, err
}
case "docProps/app.xml":
if err := process(f, &appProps); err != nil {
return nil, nil, err
}
default:
continue
}
}
return &coreProps, &appProps, nil
} func process(f *zip.File, prop interface{}) error {
rc, err := f.Open()
if err != nil {
return err
}
defer rc.Close() if err := xml.NewDecoder(rc).Decode(&prop); err != nil {
return err
} return nil
}

Figure out how to search for and retrieve files by using Bing.

1. Submit a search request to Bing with proper filters to retrieve targeted results.

2. Scrape the HTML response, extracting the HRER(link) data to obtain direct URLs for documents.

3. Submit an HTTP request for each direct document URL.

4. Parse the response body to create a zip.Reader.

5. Pass the zip.Reader into the code you already developed to extract metadata.

Analyze the search result elements in Bing.

Now scrap Bing results and parse the document metadata.

package metadata

import (
"archive/zip"
"encoding/xml"
"strings"
) // Open XML type definition and version mapping
type OfficeCoreProperty struct {
XMLName xml.Name `xml:"coreProperties"`
Creator string `xml:"creator"`
LastModifiedBy string `xml:"lastModifiedBy"`
} type OfficeAppProperty struct {
XMLName xml.Name `xml:"Properties"`
Application string `xml:"Application"`
Company string `xml:"Company"`
Version string `xml:"AppVersion"`
} var OfficeVersion = map[string]string{
"16": "2016",
"15": "2013",
"14": "2010",
"12": "2007",
"11": "2003",
} func (a *OfficeAppProperty) GetMajorVersion() string {
tokens := strings.Split(a.Version, ".") if len(tokens) < 2 {
return "Unknown"
}
v, ok := OfficeVersion[tokens[0]]
if !ok {
return "Unknown"
}
return v
} // Processing Open XML archives and embedded XML documents
func NewProperties(r *zip.Reader) (*OfficeCoreProperty, *OfficeAppProperty, error) {
var coreProps OfficeCoreProperty
var appProps OfficeAppProperty for _, f := range r.File {
switch f.Name {
case "docProps/core.xml":
if err := process(f, &coreProps); err != nil {
return nil, nil, err
}
case "docProps/app.xml":
if err := process(f, &appProps); err != nil {
return nil, nil, err
}
default:
continue
}
}
return &coreProps, &appProps, nil
} func process(f *zip.File, prop interface{}) error {
rc, err := f.Open()
if err != nil {
return err
}
defer rc.Close() if err := xml.NewDecoder(rc).Decode(&prop); err != nil {
return err
} return nil
}

Go Pentester - HTTP CLIENTS(5)的更多相关文章

  1. Go Pentester - HTTP CLIENTS(1)

    Building HTTP Clients that interact with a variety of security tools and resources. Basic Preparatio ...

  2. Go Pentester - HTTP CLIENTS(4)

    Interacting with Metasploit msf.go package rpc import ( "bytes" "fmt" "gopk ...

  3. Go Pentester - HTTP CLIENTS(3)

    Interacting with Metasploit Early-stage Preparation: Setting up your environment - start the Metaspl ...

  4. Go Pentester - HTTP CLIENTS(2)

    Building an HTTP Client That Interacts with Shodan Shadon(URL:https://www.shodan.io/)  is the world' ...

  5. Creating a radius based VPN with support for Windows clients

    This article discusses setting up up an integrated IPSec/L2TP VPN using Radius and integrating it wi ...

  6. Deploying JRE (Native Plug-in) for Windows Clients in Oracle E-Business Suite Release 12 (文档 ID 393931.1)

    In This Document Section 1: Overview Section 2: Pre-Upgrade Steps Section 3: Upgrade and Configurati ...

  7. ZK 使用Clients.response

    参考: http://stackoverflow.com/questions/11416386/how-to-access-au-response-sent-from-server-side-at-c ...

  8. MySQL之aborted connections和aborted clients

    影响Aborted_clients 值的可能是客户端连接异常关闭,或wait_timeout值过小. 最近线上遇到一个问题,接口日志发现有很多超时报错,根据日志定位到数据库实例之后发现一切正常,一般来 ...

  9. 【渗透测试学习平台】 web for pentester -2.SQL注入

    Example 1 字符类型的注入,无过滤 http://192.168.91.139/sqli/example1.php?name=root http://192.168.91.139/sqli/e ...

随机推荐

  1. 力扣:二叉树着色游戏(DFS详解)

    有两位极客玩家参与了一场「二叉树着色」的游戏.游戏中,给出二叉树的根节点 root,树上总共有 n 个节点,且 n 为奇数,其中每个节点上的值从 1 到 n 各不相同. 游戏从「一号」玩家开始(「一号 ...

  2. cb07a_c++_迭代器和迭代器的范围

    cb07a_c++_迭代器和迭代器的范围c++primer第4版https://www.cnblogs.com/txwtech/p/12309989.html--每一种容器都有自己的迭代器--所有的迭 ...

  3. 全网最全fiddler使用教程和fiddler如何抓包(fiddler手机抓包)-笔者亲测

    一.前言 抓包工具有很多,比如常用的抓包工具Httpwatch,通用的强大的抓包工具Wireshark.为什么使用fiddler?原因如下:1.Wireshark是通用的抓包工具,但是比较庞大,对于只 ...

  4. 12个Python游戏中的龙穴探险,快速掌握基础,其实很简单

    越来越多的人学习python编程,但更多的人,拿着教程却不知道该怎么学. 今天我给大家举一个例子,是我自己学习python时,用到的方法.     首先,我是一名普通的程序员,相对于十几年开发经验的程 ...

  5. C#数据结构与算法系列(十二):递归(Recursion)

    1.介绍 简单的说:递归就是方法自己调用自己,每次调用时传入不同的变量,递归有助于编程者解决复杂的问题,同时也让代码变得整洁 2.规则 执行一个方法时,就创建一个新的受保护的独立空间(栈空间) 方法的 ...

  6. elasticserach数据库深度分页查询的原理

    深度分页存在的问题 https://segmentfault.com/a/1190000019004316?utm_source=tag-newest 在实际应用中,分页是必不可少的,例如,前端页面展 ...

  7. @Inherited 注解的作用

    @Inherited 用于放在注解上,例如 @Inherited @Documented @Target(ElementType.TYPE) public @interface InheritedAn ...

  8. viewerjs 在html打开图片或打开pdf文件使用案例

    开发者常用到在线访问pdf,txt,浏览图片的插件,这里推荐viewer.js这个插件,简单好用.它的核心亮点就是查看图片和pdf功能.老早以前就用过的,昨天一个小伙伴问我Android开发在线浏览p ...

  9. Python3-设计模式-装饰器模式

    装饰器模式 动态的给原有对象添加一些额外的职责,面向切面编程(AOP),多用于和主业务无关,但又必须的业务,如:登录认证.加锁.权限检查等 Python代码实现示例 需求点: 1.在old_func( ...

  10. 02【熟悉】springboot和微服务的介绍

    1,springboot简介 Spring Boot 是由 Pivotal 团队提供的全新框架,其设计目的是用来简化新 Spring 应用的初始搭建以及开发过程. 该框架使用了特定的方式来进行配置,从 ...