Html Agility Pack - API
Parser
Selectors
Manipulation
Traversing
Writer
Utilities
Attributes

HTML Parser

HTML Parser allow you to parse HTML and return an HtmlDocument.

Html Parser
Name Description
From File Loads an HTML document from a file.
From String Loads the HTML document from the specified string.
From Web Gets an HTML document from an Internet resource.
From Browser Gets an HTML document from a WebBrowser.

Load Html From String

HtmlDocument.LoadHtml method loads the HTML document from the specified string.

Example

The following example loads an Html from the specified string.

var html = @"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html> "; var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html); var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body"); Console.WriteLine(htmlBody.OuterHtml);

HTML Selectors

Selectors allow you to select HTML node from HtmlDocument.

Methods
Name Description
SelectNodes() Selects a list of nodes matching the XPath expression.
SelectSingleNode(String) Selects the first XmlNode that matches the XPath expression.

HTML SelectSingleNode

SelectSingleNode Method

Selects first HtmlNode matching the HtmlAgilityPack.HtmlNode.XPath expression.

Parameters:

xpath: The XPath expression. May not be null.

Returns:

The first HtmlAgilityPack.HtmlNode that matches the XPath query or a null reference if no matching node was found.

Examples

The following example selects the first node matching the XPath expression using SelectNodes method.

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html); string name = htmlDoc.DocumentNode
.SelectSingleNode("//td/input")
.Attributes["value"].Value;

///如果用child.SelectSingleNode("//*[@class=\"titlelnk\"]").InnerText这样的方式查询,是永远以整个document为基准来查询,
///这点就不好,理应以当前child节点的html为基准才对。 Write(sw, String.Format("推荐:{0}", hn.SelectSingleNode("//*[@class=\"diggnum\"]").InnerText));
Write(sw, String.Format("标题:{0}", hn.SelectSingleNode("//*[@class=\"titlelnk\"]").InnerText));
Write(sw, String.Format("介绍:{0}", hn.SelectSingleNode("//*[@class=\"post_item_summary\"]").InnerText));
Write(sw, String.Format("信息:{0}", hn.SelectSingleNode("//*[@class=\"post_item_foot\"]").InnerText));

HTML Manipulation

Traversing allow you to traverse through HTML node.

Properties
Name Description
InnerHtml Gets or Sets the HTML between the start and end tags of the object.
InnerText Gets the text between the start and end tags of the object.
OuterHtml Gets the object and its content in HTML.
ParentNode Gets the parent of this node (for nodes that can have parents).
Methods
Name Description
AppendChild() Adds the specified node to the end of the list of children of this node.
AppendChildren() Adds the specified node to the end of the list of children of this node.
Clone() Creates a duplicate of the node
CloneNode(Boolean) Creates a duplicate of the node.
CloneNode(String) Creates a duplicate of the node and changes its name at the same time.
CloneNode(String, Boolean) Creates a duplicate of the node and changes its name at the same time.
CopyFrom(HtmlNode) Creates a duplicate of the node and the subtree under it.
CopyFrom(HtmlNode, Boolean) Creates a duplicate of the node.
CreateNode() Creates an HTML node from a string representing literal HTML.
InsertAfter() Inserts the specified node immediately after the specified reference node.
InsertBefore Inserts the specified node immediately before the specified reference node.
PrependChild Adds the specified node to the beginning of the list of children of this node.
PrependChildren Adds the specified node list to the beginning of the list of children of this node.
Remove Removes node from parent collection
RemoveAll Removes all the children and/or attributes of the current node.
RemoveAllChildren Removes all the children of the current node.
RemoveChild(HtmlNode) Removes the specified child node.
RemoveChild(HtmlNode, Boolean) Removes the specified child node.
ReplaceChild() Replaces the child node oldChild with newChild node.

HTML Traversing

Traversing allow you to traverse through HTML node.

Properties
Name Description
ChildNodes Gets all the children of the node.
FirstChild Gets the first child of the node.
LastChild Gets the last child of the node.
NextSibling Gets the HTML node immediately following this element.
ParentNode Gets the parent of this node (for nodes that can have parents).
Methods
Name Description
Ancestors() Gets all the ancestor of the node.
Ancestors(String) Gets ancestors with matching name.
AncestorsAndSelf() Gets all anscestor nodes and the current node.
AncestorsAndSelf(String) Gets all anscestor nodes and the current node with matching name.
DescendantNodes Gets all Descendant nodes for this node and each of child nodes
DescendantNodesAndSelf Returns a collection of all descendant nodes of this element, in document order
Descendants() Gets all Descendant nodes in enumerated list
Descendants(String) Get all descendant nodes with matching name
DescendantsAndSelf() Returns a collection of all descendant nodes of this element, in document order
DescendantsAndSelf(String) Gets all descendant nodes including this node
Element Gets first generation child node matching name
Elements Gets matching first generation child nodes matching name

HTML Writer

Save HtmlDocument && Write HtmlNode

HtmlDocument - Methods
Name Description
Save(Stream) Saves the HTML document to the specified stream.
Save(StreamWriter) Saves the HTML document to the specified StreamWriter.
Save(TextWriter) Saves the HTML document to the specified TextWriter.
Save(String) Saves the mixed document to the specified file.
Save(XmlWriter) Saves the HTML document to the specified XmlWriter.
Save(Stream, Encoding) Saves the HTML document to the specified stream.
Save(String, Encoding) Saves the mixed document to the specified file.
HtmlNode - Methods
Name Description
WriteContentTo() Saves all the children of the node to a string.
WriteContentTo(TextWriter) Saves all the children of the node to the specified TextWriter.
WriteTo() Saves the current node to a string.
WriteTo(TextWriter) Saves the current node to the specified TextWriter.
WriteTo(XmlWriter) Saves the current node to the specified XmlWriter.

HTML Utilities

HtmlDocument Utilities

HtmlDocument Methods
Name Description
DetectEncoding(Stream) Detects the encoding of an HTML stream.
DetectEncoding(TextReader) Detects the encoding of an HTML text provided on a TextReader.
DetectEncoding(String) Detects the encoding of an HTML file.
DetectEncodingAndLoad(String) Detects the encoding of an HTML document from a file first, and then loads the file.
DetectEncodingAndLoad(String, Boolean) Detects the encoding of an HTML document from a file first, and then loads the file.

HTML Attributes

Traversing allow you to traverse through HTML node.

Methods
Name Description
Add(HtmlAttribute) Adds supplied item to collection
Add(String, String) Adds a new attribute to the collection with the given values
Append(String) Creates and inserts a new attribute as the last attribute in the collection.
Append(HtmlAttribute) Inserts the specified attribute as the last attribute in the collection.
Append(String, string) Creates and inserts a new attribute as the last attribute in the collection.
Remove() Removes all attributes from the collection
Remove(String) Removes an attribute from the list, using its name. If there are more than one attributes with this name, they will all be removed.
Remove(HtmlAttribute) Removes a given attribute from the list.
RemoveAll() Remove all attributes in the list.
RemoveAt() Removes the attribute at the specified index.
SetAttributeValue() Helper method to set the value of an attribute of this node. If the attribute is not found, it will be created automatically.

Html Agility Pack - API的更多相关文章

  1. C# 网络爬虫利器之Html Agility Pack如何快速实现解析Html

    简介 现在越来越多的场景需要我们使用网络爬虫,抓取相关数据便于我们使用,今天我们要讲的主角Html Agility Pack是在爬取的过程当中,能够高效的解析我们抓取到的html数据. 优势 在.NE ...

  2. Html Agility Pack 解析Html

    Hello 好久不见 哈哈,今天给大家分享一个解析Html的类库 Html Agility Pack.这个适用于想获取某网页里面的部分内容.今天就拿我的Csdn的博客列表来举例. 打开页面  用Fir ...

  3. 开源项目Html Agility Pack实现快速解析Html

    这是个很好的的东西,以前做Html解析都是在用htmlparser,用的虽然顺手,但解析速度较慢,碰巧今天找到了这个,就拿过来试,一切出乎意料,非常爽,推荐给各位使用. 下面是一些简单的使用技巧,希望 ...

  4. Html Agility Pack基础类介绍及运用

    第一篇只对Html Agility Pack做了一个大概的介绍,在接下来的章节会比较深入的介绍Html Agility Pack. Html Agility Pack 源码中的类大概有28个左右,其实 ...

  5. HTML WEB 和HTML Agility Pack结合

    现在,在不少应用场合中都希望做到数据抓取,特别是基于网页部分的抓取.其实网页抓取的过程实际上是通过编程的方法,去抓取不同网站网页后,再进行分析筛选的过程.比如,有的比较购物网站,会同时去抓取不同购物网 ...

  6. 一款很不错的html转xml工具-Html Agility Pack

    之前发个一篇关于实现html转成xml的劣作<实现html转Xml>,受到不少网友的关心.该实现方法是借助htmlparser去分解html内容,然后按照dom的结构逐个生成xml字符串. ...

  7. Html Agility Pack解析HTML页

    文章来源:Html Agility Pack解析HTML页 现在,在不少应用场合中都希望做到数据抓取,特别是基于网页部分的抓取.其实网页抓取的过程实际上是通过编程的方法,去抓取不同网站网页后,再进行分 ...

  8. C#解析HTML利器-Html Agility Pack

    今天刚开始做毕设....好吧,的确有点晚.我的毕设设计需要爬取豆瓣的电影推荐,于是就需要解析爬取下来的html,之前用Python玩过解析,但目前我使用的是C#,我觉得C#不比python差,有微软大 ...

  9. 强大而灵活的的Html解析器——Html Agility Pack

    一.概述 Html Agility Pack 简称HAP,是一个强大而灵活的解析Html DOM的.Net类库. 二.官方链接 官网:http://html-agility-pack.net/ NuG ...

随机推荐

  1. C# Newtonsoft.Json解析数组的小例子[转]

    https://blog.csdn.net/Sayesan/article/details/79756738 C# Newtonsoft.Json解析数组的小例子  http://www.cnblog ...

  2. [leetcode]Word Ladder II @ Python

    [leetcode]Word Ladder II @ Python 原题地址:http://oj.leetcode.com/problems/word-ladder-ii/ 参考文献:http://b ...

  3. Linq的延迟加载问题

    什么是延迟加载:所谓延迟加载就是当在真正需要数据的时候,才真正执行数据加载操作.可以简单理解为,只有在使用的时候,才会发出sql语句进行查询,数据是分N次读取. 什么是立即加载:所谓立即加载既是所有的 ...

  4. java判断一个字符串是否包含某个字符

    一.contains方法 1:描述 java.lang.String.contains() 方法返回true,当且仅当此字符串包含指定的char值序列 2:声明 public boolean cont ...

  5. Arduino教程:MPU6050的数据获取、分析与处理

    Arduino教程:MPU6050的数据获取.分析与处理 转载 摘要 MPU6050是一种非常流行的空间运动传感器芯片,可以获取器件当前的三个加速度分量和三个旋转角速度.由于其体积小巧,功能强大,精度 ...

  6. 使用 GNU Libtool 创建库

    这篇文档向大家介绍 GNU Libtool 的用途及基本使用方法,同时描述如何结合 GNU Autoconf 和 Automake 来使用 Libtool. 3 评论: 吴 小虎, 程序员, 天用唯勤 ...

  7. How to center body on a page?

      [提问] I'm trying to center the body element on my HTML page. Basically, in the CSS I set the body e ...

  8. ivr

    /************************************************************* 北京高阳圣思园信息技术有限公司IVR业务: 流程说明:公司介绍子流程 发布 ...

  9. javascript常用的方法(二)

    //判断页面加载完毕 document.onreadystatechange = function () { if (document.readyState == "complete&quo ...

  10. asp.net 字符串反序列化

    //在大多数情况下,dynamic 类型与 object 类型的行为是一样的. //Dynamic 在通过 dynamic 类型实现的操作中,该类型的作用是绕过编译时类型检查, 改为在运行时解析这些操 ...