Html Agility Pack - API
Parser
Selectors
Manipulation
Traversing
Writer
Utilities
Attributes

HTML Parser

The HTML parser allows you to parse HTML and returns an HtmlDocument.

Html Parser
Name           Description
From File      Loads an HTML document from a file.
From String    Loads the HTML document from the specified string.
From Web       Gets an HTML document from an Internet resource.
From Browser   Gets an HTML document from a WebBrowser.
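
As a complement to the string-based example in the next section, here is a minimal sketch of the From File route. The temporary file name and its content are made up for illustration; HtmlDocument.Load(path) and HtmlWeb.Load(url) are the library calls assumed to back the From File and From Web rows.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Hypothetical sample file, created only so the example is self-contained.
string path = Path.Combine(Path.GetTempPath(), "hap-sample.html");
File.WriteAllText(path, "<html><body><p>Hello from a file</p></body></html>");

// From File: Load reads the file and parses it into the document.
var doc = new HtmlDocument();
doc.Load(path);

string text = doc.DocumentNode.SelectSingleNode("//p").InnerText;
Console.WriteLine(text);

// From Web needs network access, so it is only outlined here:
// var web = new HtmlWeb();
// HtmlDocument page = web.Load("https://example.com/");
```

The From Web variant is left commented out because it depends on a reachable URL.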

Load Html From String

The HtmlDocument.LoadHtml method loads an HTML document from the specified string.

Example

The following example loads an HTML document from a string and prints the outer HTML of its body.

var html = @"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html> ";

// Parse the string into an HtmlDocument.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Select the <body> element and print it together with its markup.
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
Console.WriteLine(htmlBody.OuterHtml);

HTML Selectors

Selectors allow you to select HTML nodes from an HtmlDocument.

Methods
Name                       Description
SelectNodes()              Selects a list of nodes matching the XPath expression.
SelectSingleNode(String)   Selects the first HtmlNode that matches the XPath expression.
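
The sketch below shows SelectNodes on a small, made-up list. It assumes the library's default behavior, in which SelectNodes returns null rather than an empty collection when nothing matches, so the result is checked before it is enumerated.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><li>one</li><li>two</li></ul>");

// Every <li> element in the document.
HtmlNodeCollection items = doc.DocumentNode.SelectNodes("//li");

// No <table> exists, so this is expected to be null by default.
HtmlNodeCollection missing = doc.DocumentNode.SelectNodes("//table");

if (items != null)
{
    foreach (HtmlNode li in items)
    {
        Console.WriteLine(li.InnerText);
    }
}

Console.WriteLine(missing == null ? "no tables" : $"{missing.Count} tables");
```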

HTML SelectSingleNode

SelectSingleNode Method

Selects the first HtmlNode that matches the given XPath expression.

Parameters:

xpath: The XPath expression. May not be null.

Returns:

The first HtmlAgilityPack.HtmlNode that matches the XPath query or a null reference if no matching node was found.

Examples

The following example selects the first node matching the XPath expression using the SelectSingleNode method.

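// html is assumed to hold the markup to parse.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Read the value attribute of the first <input> inside a <td>.
string name = htmlDoc.DocumentNode
    .SelectSingleNode("//td/input")
    .Attributes["value"].Value;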

// A query such as child.SelectSingleNode("//*[@class=\"titlelnk\"]").InnerText always
// searches the whole document, because an XPath that starts with "//" is evaluated from
// the document root. That is rarely what you want here: the search should be relative
// to the current child node. Starting the expression with ".//" makes it relative.
Write(sw, String.Format("Recommended: {0}", hn.SelectSingleNode(".//*[@class=\"diggnum\"]").InnerText));
Write(sw, String.Format("Title: {0}", hn.SelectSingleNode(".//*[@class=\"titlelnk\"]").InnerText));
Write(sw, String.Format("Summary: {0}", hn.SelectSingleNode(".//*[@class=\"post_item_summary\"]").InnerText));
Write(sw, String.Format("Info: {0}", hn.SelectSingleNode(".//*[@class=\"post_item_foot\"]").InnerText));
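
The difference between a document-wide and a node-relative query can be seen in a small, self-contained sketch. The markup is invented for illustration and reuses only the class names from the snippet above.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(@"
<div class='post_item'><a class='titlelnk'>First post</a></div>
<div class='post_item'><a class='titlelnk'>Second post</a></div>");

// Take the second post as the context node.
HtmlNode second = doc.DocumentNode.SelectNodes("//div[@class='post_item']")[1];

// "//" is evaluated from the document root, so the first match in the
// whole document is returned even though the call is made on `second`.
string absolute = second.SelectSingleNode("//*[@class='titlelnk']").InnerText;

// ".//" is evaluated from the context node, so only its own subtree is searched.
string relative = second.SelectSingleNode(".//*[@class='titlelnk']").InnerText;

Console.WriteLine(absolute); // First post
Console.WriteLine(relative); // Second post
```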

HTML Manipulation

Manipulation allows you to modify HTML nodes: change their content, and add, clone, replace, or remove them.

Properties
Name         Description
InnerHtml    Gets or sets the HTML between the start and end tags of the object.
InnerText    Gets the text between the start and end tags of the object.
OuterHtml    Gets the object and its content in HTML.
ParentNode   Gets the parent of this node (for nodes that can have parents).

Methods
Name                             Description
AppendChild()                    Adds the specified node to the end of the list of children of this node.
AppendChildren()                 Adds the specified node list to the end of the list of children of this node.
Clone()                          Creates a duplicate of the node.
CloneNode(Boolean)               Creates a duplicate of the node.
CloneNode(String)                Creates a duplicate of the node and changes its name at the same time.
CloneNode(String, Boolean)       Creates a duplicate of the node and changes its name at the same time.
CopyFrom(HtmlNode)               Creates a duplicate of the node and the subtree under it.
CopyFrom(HtmlNode, Boolean)      Creates a duplicate of the node.
CreateNode()                     Creates an HTML node from a string representing literal HTML.
InsertAfter()                    Inserts the specified node immediately after the specified reference node.
InsertBefore()                   Inserts the specified node immediately before the specified reference node.
PrependChild()                   Adds the specified node to the beginning of the list of children of this node.
PrependChildren()                Adds the specified node list to the beginning of the list of children of this node.
Remove()                         Removes the node from its parent collection.
RemoveAll()                      Removes all the children and/or attributes of the current node.
RemoveAllChildren()              Removes all the children of the current node.
RemoveChild(HtmlNode)            Removes the specified child node.
RemoveChild(HtmlNode, Boolean)   Removes the specified child node.
ReplaceChild()                   Replaces the child node oldChild with the newChild node.
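
A few of these methods are combined in the sketch below. The markup and the id value are hypothetical; the example only illustrates how CreateNode, AppendChild, PrependChild, and Remove might be used together.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id='box'><p>first</p><span>remove me</span></div>");

HtmlNode box = doc.DocumentNode.SelectSingleNode("//div[@id='box']");

// CreateNode builds a node from literal HTML; AppendChild adds it as the last child.
box.AppendChild(HtmlNode.CreateNode("<p>second</p>"));

// PrependChild adds a node as the first child.
box.PrependChild(HtmlNode.CreateNode("<h2>title</h2>"));

// Remove detaches the <span> from its parent.
box.Element("span").Remove();

// List the names of the remaining children in order.
string childNames = string.Join(", ", box.ChildNodes.Select(n => n.Name));
Console.WriteLine(childNames);
Console.WriteLine(box.InnerHtml);
```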

HTML Traversing

Traversing allows you to navigate through the HTML nodes of a document.

Properties
Name          Description
ChildNodes    Gets all the children of the node.
FirstChild    Gets the first child of the node.
LastChild     Gets the last child of the node.
NextSibling   Gets the HTML node immediately following this element.
ParentNode    Gets the parent of this node (for nodes that can have parents).

Methods
Name                         Description
Ancestors()                  Gets all the ancestors of the node.
Ancestors(String)            Gets the ancestors with a matching name.
AncestorsAndSelf()           Gets all ancestor nodes and the current node.
AncestorsAndSelf(String)     Gets all ancestor nodes and the current node with a matching name.
DescendantNodes()            Gets all descendant nodes of this node, including the descendants of each child node.
DescendantNodesAndSelf()     Returns a collection of this element and all its descendant nodes, in document order.
Descendants()                Gets all descendant nodes as an enumerable list.
Descendants(String)          Gets all descendant nodes with a matching name.
DescendantsAndSelf()         Returns a collection of this element and all its descendant nodes, in document order.
DescendantsAndSelf(String)   Gets this node and all descendant nodes with a matching name.
Element()                    Gets the first direct child node with a matching name.
Elements()                   Gets all direct child nodes with a matching name.
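
The following sketch walks a small, invented fragment in both directions: down with Descendants and Element, and up with ParentNode and Ancestors.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div><ul><li><a href='/a'>A</a></li><li><a href='/b'>B</a></li></ul></div>");

// Descendants(name): every <a> below the document node, in document order.
var links = doc.DocumentNode.Descendants("a").ToList();

// ParentNode and Ancestors(name): walk upwards from the first link.
HtmlNode firstLink = links[0];
string parentName = firstLink.ParentNode.Name;
bool insideList = firstLink.Ancestors("ul").Any();

// Element(name): first direct child with that name; ChildNodes: its children.
HtmlNode ul = doc.DocumentNode.Element("div").Element("ul");
int itemCount = ul.ChildNodes.Count;

Console.WriteLine(links.Count);  // 2
Console.WriteLine(parentName);   // li
Console.WriteLine(insideList);   // True
Console.WriteLine(itemCount);    // 2
```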

HTML Writer

Save an HtmlDocument and write an HtmlNode.

HtmlDocument - Methods
Name                     Description
Save(Stream)             Saves the HTML document to the specified stream.
Save(StreamWriter)       Saves the HTML document to the specified StreamWriter.
Save(TextWriter)         Saves the HTML document to the specified TextWriter.
Save(String)             Saves the mixed document to the specified file.
Save(XmlWriter)          Saves the HTML document to the specified XmlWriter.
Save(Stream, Encoding)   Saves the HTML document to the specified stream.
Save(String, Encoding)   Saves the mixed document to the specified file.

HtmlNode - Methods
Name                         Description
WriteContentTo()             Saves all the children of the node to a string.
WriteContentTo(TextWriter)   Saves all the children of the node to the specified TextWriter.
WriteTo()                    Saves the current node to a string.
WriteTo(TextWriter)          Saves the current node to the specified TextWriter.
WriteTo(XmlWriter)           Saves the current node to the specified XmlWriter.
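
A minimal sketch of the writer methods, using an in-memory StringWriter so that nothing is written to disk. The markup is made up for illustration.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html><body><p>Hello <b>world</b></p></body></html>");

// Save(TextWriter): serialize the whole document into a string.
using var writer = new StringWriter();
doc.Save(writer);
string wholeDocument = writer.ToString();

HtmlNode p = doc.DocumentNode.SelectSingleNode("//p");

// WriteTo(): the node itself plus its content.
string nodeMarkup = p.WriteTo();

// WriteContentTo(): only the children of the node.
string contentMarkup = p.WriteContentTo();

Console.WriteLine(wholeDocument);
Console.WriteLine(nodeMarkup);    // <p>Hello <b>world</b></p>
Console.WriteLine(contentMarkup); // Hello <b>world</b>
```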

HTML Utilities

HtmlDocument Utilities

HtmlDocument Methods
Name                                     Description
DetectEncoding(Stream)                   Detects the encoding of an HTML stream.
DetectEncoding(TextReader)               Detects the encoding of an HTML text provided on a TextReader.
DetectEncoding(String)                   Detects the encoding of an HTML file.
DetectEncodingAndLoad(String)            Detects the encoding of an HTML document from a file first, and then loads the file.
DetectEncodingAndLoad(String, Boolean)   Detects the encoding of an HTML document from a file first, and then loads the file.
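
The sketch below shows how the file-based overloads might be called. The file name and content are hypothetical, and the example assumes that a declared charset in a meta tag is what the detection picks up; since DetectEncoding may find nothing, its result is treated as possibly null.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Hypothetical sample file, written as UTF-8 without a byte order mark.
string path = Path.Combine(Path.GetTempPath(), "hap-encoding.html");
File.WriteAllText(
    path,
    "<html><head><meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\">" +
    "</head><body><p>caf\u00e9</p></body></html>",
    new UTF8Encoding(false));

var doc = new HtmlDocument();

// DetectEncoding(String): inspect the file and report the encoding, if any was found.
Encoding detected = doc.DetectEncoding(path);
Console.WriteLine(detected?.WebName ?? "(not detected)");

// DetectEncodingAndLoad(String): detect first, then load the file with that encoding.
doc.DetectEncodingAndLoad(path);
string text = doc.DocumentNode.SelectSingleNode("//p").InnerText;
Console.WriteLine(text);
```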

HTML Attributes

Attributes allow you to read, add, change, and remove the attributes of an HTML node.

Methods
Name                    Description
Add(HtmlAttribute)      Adds the supplied item to the collection.
Add(String, String)     Adds a new attribute to the collection with the given values.
Append(String)          Creates and inserts a new attribute as the last attribute in the collection.
Append(HtmlAttribute)   Inserts the specified attribute as the last attribute in the collection.
Append(String, String)  Creates and inserts a new attribute as the last attribute in the collection.
Remove()                Removes all attributes from the collection.
Remove(String)          Removes an attribute from the list, using its name. If more than one attribute has this name, they are all removed.
Remove(HtmlAttribute)   Removes a given attribute from the list.
RemoveAll()             Removes all attributes in the list.
RemoveAt()              Removes the attribute at the specified index.
SetAttributeValue()     Helper method to set the value of an attribute of this node. If the attribute is not found, it is created automatically.
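
A short sketch of the attribute methods on an invented link element. GetAttributeValue and Contains do not appear in the table above; they are used here only to read back the result.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<a href='/old' title='tip'>link</a>");
HtmlNode a = doc.DocumentNode.SelectSingleNode("//a");

// Add(String, String): append a new attribute to the collection.
a.Attributes.Add("target", "_blank");

// SetAttributeValue: overwrite an existing attribute, or create a missing one.
a.SetAttributeValue("href", "/new");
a.SetAttributeValue("rel", "nofollow");

// Remove(String): drop an attribute by name.
a.Attributes.Remove("title");

string href = a.GetAttributeValue("href", "");
int count = a.Attributes.Count;
bool hasTitle = a.Attributes.Contains("title");

Console.WriteLine(href);     // /new
Console.WriteLine(count);    // 3
Console.WriteLine(hasTitle); // False
```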
