【python】lxml

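lxml is a very powerful Python library for working with XML. It makes parsing and generating XML files straightforward. The content below is a partial translation of the linked tutorial.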

1. Creating an empty XML element

from lxml import etree

root = etree.Element("root")
print(etree.tostring(root, pretty_print=True, encoding="unicode"))

<root/>
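By default etree.tostring() returns bytes, which is why print() in Python 3 shows a b'...' prefix unless encoding="unicode" is passed. A short sketch of the common serialization variants (the element name is only an example):

```python
from lxml import etree

root = etree.Element("root")

as_bytes = etree.tostring(root)                      # bytes: b'<root/>'
as_text = etree.tostring(root, encoding="unicode")   # str: '<root/>'
# With xml_declaration=True and a byte encoding, an <?xml ...?> header is prepended
with_decl = etree.tostring(root, xml_declaration=True, encoding="UTF-8")

print(as_bytes)
print(as_text)
print(with_decl)
```

Use encoding="unicode" when you want a str for printing or further string handling, and a byte encoding when writing to a file or network.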

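2. Creating child elements

from lxml import etree

root = etree.Element("root")
root.append(etree.Element("child1"))       # method 1: create an element, then append it
child2 = etree.SubElement(root, "child2")  # method 2: SubElement creates and attaches in one step
child3 = etree.SubElement(root, "child3")
print(etree.tostring(root, pretty_print=True, encoding="unicode"))

<root>
  <child1/>
  <child2/>
  <child3/>
</root>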
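An lxml element also behaves like a Python list of its children, so indexing, len(), insert() and remove() work on it. A minimal sketch (the tag names are arbitrary):

```python
from lxml import etree

root = etree.Element("root")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

print(len(root))                          # number of direct children: 3
print(root[0].tag)                        # first child: child1
root.insert(0, etree.Element("child0"))   # insert at position 0
root.remove(root[-1])                     # remove the last child (child3)
tags = [child.tag for child in root]
print(tags)                               # ['child0', 'child1', 'child2']
```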

3. Creating an element with text content

from lxml import etree

root = etree.Element("root")
root.text = "Hello World"
print(etree.tostring(root, pretty_print=True, encoding="unicode"))

<root>Hello World</root>
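The value assigned to .text is plain text: special characters are escaped only when the tree is serialized, so you should not escape them yourself. A small sketch:

```python
from lxml import etree

root = etree.Element("root")
root.text = "a < b & c"                        # raw text, no manual escaping
serialized = etree.tostring(root, encoding="unicode")
print(serialized)                              # <root>a &lt; b &amp; c</root>
print(root.text)                               # the .text value itself is unchanged
```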

4. Attributes

lxml exposes an element's attributes as a dictionary-like mapping.

Setting attributes

from lxml import etree

root = etree.Element("root", interesting="totally")  # method 1: keyword arguments to Element()
root.set("hello", "huhu")                            # method 2: .set(name, value)
root.text = "Hello World"
print(etree.tostring(root, encoding="unicode"))

<root interesting="totally" hello="huhu">Hello World</root>

Getting attributes

Method 1:

print(root.get("interesting"))
print(root.get("hello"))

totally
huhu

Method 2:

attributes = root.attrib
print(attributes["interesting"])

Iterating over attributes

for name, value in sorted(root.items()):
    print('%s = %r' % (name, value))
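Because .attrib behaves like a dict, the usual mapping operations apply to it. A minimal sketch reusing the attribute names from above ("missing" is just an example of an absent name):

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
root.set("hello", "huhu")

print(root.get("missing"))              # None when the attribute is absent
print(root.get("missing", "default"))   # or a fallback value
print("hello" in root.attrib)           # membership test: True
del root.attrib["hello"]                # delete an attribute
print(root.keys())                      # remaining names: ['interesting']
```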

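5. Generating mixed content

In XML like the following, the text inside <body> is split by the <br/> element. The text after <br/> has to be set through the .tail attribute of <br/>: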

<html><body>Hello<br/>World</body></html>

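from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"                # text before the first child goes in body.text
br = etree.SubElement(body, "br")
br.tail = "World"                  # text after <br/> belongs to br.tail
print(etree.tostring(html, encoding="unicode"))
# prints <html><body>Hello<br/>World</body></html>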
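Going the other way, parsing the same markup shows where each piece of text ends up. itertext() collects text and tails in document order. A short sketch:

```python
from lxml import etree

html = etree.fromstring("<html><body>Hello<br/>World</body></html>")
body = html.find("body")
br = body.find("br")

print(repr(body.text))   # 'Hello': text before the first child
print(repr(br.text))     # None: <br/> itself contains no text
print(repr(br.tail))     # 'World': text after <br/> is stored on br
full_text = "".join(body.itertext())
print(full_text)         # HelloWorld
```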

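6. Iteration

Iterating over all elements (root.iter() yields the element itself and all of its descendants)

for element in root.iter():
    print("%s - %s" % (element.tag, element.text))

To iterate only over elements with a specific tag, pass the tag name to iter():

for element in root.iter("child"):
    print("%s - %s" % (element.tag, element.text))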
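Plain iter() also yields comments and processing instructions, whose .tag is a factory function rather than a string. Passing etree.Element as the filter keeps only real elements, and reasonably recent lxml versions accept several tag names at once. A sketch with made-up tag names:

```python
from lxml import etree

root = etree.XML(
    "<root><child>1</child><!-- note --><other>2</other><child>3</child></root>"
)

# Plain iter() includes the comment node as well
all_nodes = list(root.iter())
# etree.Element as the tag filter keeps only real elements
element_tags = [e.tag for e in root.iter(tag=etree.Element)]
# Several tag names at once; results stay in document order
selected = [e.text for e in root.iter("child", "other")]

print(len(all_nodes))   # 5: root, child, comment, other, child
print(element_tags)     # ['root', 'child', 'other', 'child']
print(selected)         # ['1', '2', '3']
```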

7. Finding text content with XPath

build_text_list = etree.XPath("//text()")  # compiled XPath object; lxml.etree only!
print(build_text_list(html))
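A compiled XPath expression can also take XPath variables, passed as keyword arguments when it is called, while Element.xpath() is handy for one-off queries. A sketch with an illustrative document:

```python
from lxml import etree

root = etree.XML("<root><a x='1'>one</a><a x='2'>two</a></root>")

# $val is an XPath variable supplied as a keyword argument
text_by_x = etree.XPath("//a[@x = $val]/text()")
print(text_by_x(root, val="2"))      # ['two']

# string() returns the string value of the first matching node
first_text = root.xpath("string(//a)")
print(first_text)                    # one
```

Using a variable instead of formatting the value into the expression string avoids quoting problems when the value contains quote characters.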

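8. Finding elements

iterfind(): iterates over all elements matching a path expression
findall(): returns a list of all matching elements
find(): returns the first matching element
findtext(): returns the .text of the first matching element

Given the following XML: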

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

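Finding direct children (a plain tag name only matches children of the current element, so "b" is not found directly under <root>)

>>> print(root.find("b"))
None
>>> print(root.find("a").tag)
a

Finding elements anywhere in the tree

>>> print(root.find(".//b").tag)
b
>>> [ b.tag for b in root.iterfind(".//b") ]
['b', 'b']

Finding elements that have a given attribute

>>> print(root.findall(".//a[@x]")[0].tag)
a
>>> print(root.findall(".//a[@y]"))
[]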
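findtext() accepts a default for the no-match case, and the path syntax can also match an attribute value. A sketch on the same XML as above:

```python
from lxml import etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

print(root.findtext("a"))                        # aText
print(root.findtext("missing", default="n/a"))   # n/a: returned when nothing matches
# Match on an attribute value, not just its presence
match = root.find(".//a[@x='123']")
print(match.tag)                                 # a
print(root.find(".//a[@x='999']"))               # None
```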

9. Parsing a string into XML

>>> some_xml_data = "<root>data</root>"
>>> root = etree.fromstring(some_xml_data)
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
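One common pitfall: etree.fromstring() rejects a str that carries an XML encoding declaration and raises ValueError, so such data should be passed as bytes. A sketch:

```python
from lxml import etree

data = "<?xml version='1.0' encoding='UTF-8'?><root>data</root>"

raised = False
try:
    etree.fromstring(data)                 # str input with an encoding declaration
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

root = etree.fromstring(data.encode("utf-8"))  # bytes input is accepted
print(root.text)                               # data
```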

10. Building XML and HTML quickly with the E-factory

>>> from lxml import etree
>>> from lxml.builder import E
>>> def CLASS(*args): # class is a reserved word in Python
...     return {"class":' '.join(args)}
...
>>> html = page = (
...   E.html(       # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n      ",
...         E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )
>>> print(etree.tostring(page, pretty_print=True, encoding="unicode"))
<html>
  <head>
    <title>This is a sample document</title>
  </head>
  <body>
    <h1 class="title">Hello!</h1>
    <p>This is a paragraph with <b>bold</b> text in it!</p>
    <p>This is another paragraph, with a
      <a href="http://www.python.org">link</a>.</p>
    <p>Here are some reserved characters: &lt;spam&amp;egg&gt;.</p>
    <p>And finally an embedded XHTML fragment.</p>
  </body>
</html>
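Since the E-factory is ordinary Python calls, children can come from any expression, such as a list comprehension over data. A small sketch (the list items and the id value are made-up example data):

```python
from lxml import etree
from lxml.builder import E

items = ["apple", "banana"]   # example data
# Positional arguments become children, keyword arguments become attributes
ul = E.ul(*[E.li(item) for item in items], id="fruits")
result = etree.tostring(ul, encoding="unicode")
print(result)   # <ul id="fruits"><li>apple</li><li>banana</li></ul>
```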
