一.简介

xml是实现不通语言或程序之间进行数据交换的协议,可扩展标记语言,标准通用标记语言的子集。是一种用于标记电子文件使其具有结构性的标记语言。xml格式如下,是通过<>节点来区别数据结构的。

XML文件示例:

<?xml version="1.0"?>
<data> # data 是根节点 <>开头<>结尾就表示为一个节点
<country name="Liechtenstein"> # country节点,name是节点的属性,属性可以有多个,属性值必须用引号
<rank updated="yes"></rank> # rank节点,updated是节点的属性,2表示文本内容
<year></year>
<gdppc></gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes"></rank>
<year></year>
<gdppc></gdppc>
<neighbor direction="N" name="Malaysia" />
</country>
<country name="Panama">
<rank updated="yes"></rank>
<year></year>
<gdppc></gdppc>
<neighbor direction="W" name="Costa Rica" />
<neighbor direction="E" name="Colombia" />
</country>
</data>

二.XML文件处理

XML文件解析的两种方式:

  1. 解析字符串方式
  2. 解析文件方式
a.解析字符串方式
# 加载模块,设置一个别名
from xml.etree import ElementTree as ET # 打开文件,读取XML内容,print(str_xml)则获得整个xml文件中的内容,str_xml是个字符串
str_xml = open('xo.xml', 'r').read() # 将字符串解析成xml特殊对象,root代指xml文件的根节点,print(root)将获取根节点的内存地址,print(root.tag)可以获取根节点的名称
root = ET.XML(str_xml) b.解析文件方式
from xml.etree import ElementTree as ET # 直接解析xml文件
tree = ET.parse("xo.xml") # 获取xml文件的根节点
root = tree.getroot()

2.XML文件操作

  XML格式类型是节点嵌套节点,对于每一个节点均有以下功能,以便对当前节点进行操

class Element:
"""An XML element. This class is the reference implementation of the Element interface. An element's length is its number of subelements. That means if you
want to check if an element is truly empty, you should check BOTH
its length AND its text attribute. The element tag, attribute names, and attribute values can be either
bytes or strings. *tag* is the element name. *attrib* is an optional dictionary containing
element attributes. *extra* are additional element attributes given as
keyword arguments. Example form:
<tag attrib>text<child/>...</tag>tail """ 当前节点的标签名
tag = None
"""The element's name.""" 当前节点的属性 attrib = None
"""Dictionary of the element's attributes.""" 当前节点的内容
text = None
"""
Text before first subelement. This is either a string or the value None.
Note that if there is no text, this attribute may be either
None or the empty string, depending on the parser. """ tail = None
"""
Text after this element's end tag, but before the next sibling element's
start tag. This is either a string or the value None. Note that if there
was no text, this attribute may be either None or an empty string,
depending on the parser. """ def __init__(self, tag, attrib={}, **extra):
if not isinstance(attrib, dict):
raise TypeError("attrib must be dict, not %s" % (
attrib.__class__.__name__,))
attrib = attrib.copy()
attrib.update(extra)
self.tag = tag
self.attrib = attrib
self._children = [] def __repr__(self):
return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self)) def makeelement(self, tag, attrib):
创建一个新节点
"""Create a new element with the same type. *tag* is a string containing the element name.
*attrib* is a dictionary containing the element attributes. Do not call this method, use the SubElement factory function instead. """
return self.__class__(tag, attrib) def copy(self):
"""Return copy of current element. This creates a shallow copy. Subelements will be shared with the
original tree. """
elem = self.makeelement(self.tag, self.attrib)
elem.text = self.text
elem.tail = self.tail
elem[:] = self
return elem def __len__(self):
return len(self._children) def __bool__(self):
warnings.warn(
"The behavior of this method will change in future versions. "
"Use specific 'len(elem)' or 'elem is not None' test instead.",
FutureWarning, stacklevel=
)
return len(self._children) != # emulate old behaviour, for now def __getitem__(self, index):
return self._children[index] def __setitem__(self, index, element):
# if isinstance(index, slice):
# for elt in element:
# assert iselement(elt)
# else:
# assert iselement(element)
self._children[index] = element def __delitem__(self, index):
del self._children[index] def append(self, subelement):
为当前节点追加一个子节点
"""Add *subelement* to the end of this element. The new element will appear in document order after the last existing
subelement (or directly after the text, if it's the first subelement),
but before the end tag for this element. """
self._assert_is_element(subelement)
self._children.append(subelement) def extend(self, elements):
为当前节点扩展 n 个子节点
"""Append subelements from a sequence. *elements* is a sequence with zero or more elements. """
for element in elements:
self._assert_is_element(element)
self._children.extend(elements) def insert(self, index, subelement):
在当前节点的子节点中插入某个节点,即:为当前节点创建子节点,然后插入指定位置
"""Insert *subelement* at position *index*."""
self._assert_is_element(subelement)
self._children.insert(index, subelement) def _assert_is_element(self, e):
# Need to refer to the actual Python implementation, not the
# shadowing C implementation.
if not isinstance(e, _Element_Py):
raise TypeError('expected an Element, not %s' % type(e).__name__) def remove(self, subelement):
在当前节点在子节点中删除某个节点
"""Remove matching subelement. Unlike the find methods, this method compares elements based on
identity, NOT ON tag value or contents. To remove subelements by
other means, the easiest way is to use a list comprehension to
select what elements to keep, and then use slice assignment to update
the parent element. ValueError is raised if a matching element could not be found. """
# assert iselement(element)
self._children.remove(subelement) def getchildren(self):
获取所有的子节点(废弃)
"""(Deprecated) Return all subelements. Elements are returned in document order. """
warnings.warn(
"This method will be removed in future versions. "
"Use 'list(elem)' or iteration over elem instead.",
DeprecationWarning, stacklevel=
)
return self._children def find(self, path, namespaces=None):
获取第一个寻找到的子节点
"""Find first matching element by tag name or path. *path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name. Return the first matching element, or None if no element was found. """
return ElementPath.find(self, path, namespaces) def findtext(self, path, default=None, namespaces=None):
获取第一个寻找到的子节点的内容
"""Find text for first matching element by tag name or path. *path* is a string having either an element tag or an XPath,
*default* is the value to return if the element was not found,
*namespaces* is an optional mapping from namespace prefix to full name. Return text content of first matching element, or default value if
none was found. Note that if an element is found having no text
content, the empty string is returned. """
return ElementPath.findtext(self, path, default, namespaces) def findall(self, path, namespaces=None):
获取所有的子节点
"""Find all matching subelements by tag name or path. *path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name. Returns list containing all matching elements in document order. """
return ElementPath.findall(self, path, namespaces) def iterfind(self, path, namespaces=None):
获取所有指定的节点,并创建一个迭代器(可以被for循环)
"""Find all matching subelements by tag name or path. *path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name. Return an iterable yielding all matching elements in document order. """
return ElementPath.iterfind(self, path, namespaces) def clear(self):
清空节点
"""Reset element. This function removes all subelements, clears all attributes, and sets
the text and tail attributes to None. """
self.attrib.clear()
self._children = []
self.text = self.tail = None def get(self, key, default=None):
获取当前节点的属性值
"""Get element attribute. Equivalent to attrib.get, but some implementations may handle this a
bit more efficiently. *key* is what attribute to look for, and
*default* is what to return if the attribute was not found. Returns a string containing the attribute value, or the default if
attribute was not found. """
return self.attrib.get(key, default) def set(self, key, value):
为当前节点设置属性值
"""Set element attribute. Equivalent to attrib[key] = value, but some implementations may handle
this a bit more efficiently. *key* is what attribute to set, and
*value* is the attribute value to set it to. """
self.attrib[key] = value def keys(self):
获取当前节点的所有属性的 key """Get list of attribute names. Names are returned in an arbitrary order, just like an ordinary
Python dict. Equivalent to attrib.keys() """
return self.attrib.keys() def items(self):
获取当前节点的所有属性值,每个属性都是一个键值对
"""Get element attributes as a sequence. The attributes are returned in arbitrary order. Equivalent to
attrib.items(). Return a list of (name, value) tuples. """
return self.attrib.items() def iter(self, tag=None):
在当前节点的子孙中根据节点名称寻找所有指定的节点,并返回一个迭代器(可以被for循环)。
"""Create tree iterator. The iterator loops over the element and all subelements in document
order, returning all elements with a matching tag. If the tree structure is modified during iteration, new or removed
elements may or may not be included. To get a stable set, use the
list() function on the iterator, and loop over the resulting list. *tag* is what tags to look for (default is to return all elements) Return an iterator containing all the matching elements. """
if tag == "*":
tag = None
if tag is None or self.tag == tag:
yield self
for e in self._children:
yield from e.iter(tag) # compatibility
def getiterator(self, tag=None):
# Change for a DeprecationWarning in 1.4
warnings.warn(
"This method will be removed in future versions. "
"Use 'elem.iter()' or 'list(elem.iter())' instead.",
PendingDeprecationWarning, stacklevel=
)
return list(self.iter(tag)) def itertext(self):
在当前节点的子孙中根据节点名称寻找所有指定的节点的内容,并返回一个迭代器(可以被for循环)。
"""Create text iterator. The iterator loops over the element and all subelements in document
order, returning all inner text. """
tag = self.tag
if not isinstance(tag, str) and tag is not None:
return
if self.text:
yield self.text
for e in self:
yield from e.itertext()
if e.tail:
yield e.tail

1.遍历XML文档中的所有内容

from xml.etree import ElementTree as ET
############ 解析方式一 ############
# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read()
# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)
############ 解析方式二 ############
# 直接解析xml文件
tree = ET.parse("xo.xml")
# 获取xml文件的根节点
root = tree.getroot()
# 顶层标签
print(root.tag)
# 遍历XML文档的第二层
for child in root:
# 第二层节点的标签名称和标签属性
print(child.tag, child.attrib)
# 遍历XML文档的第三层
for i in child:
# 第二层节点的标签名称和内容
print(i.tag,i.text)
# 输出
country {'name': 'Liechtenstein'}
rank
year
gdppc
neighbor None
neighbor None
country {'name': 'Singapore'}
rank
year
gdppc
neighbor None
country {'name': 'Panama'}
rank
year
gdppc
neighbor None
neighbor None

2.遍历XML中指定的节点

from xml.etree import ElementTree as ET
############ 解析方式一 ############
# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read()
# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)
############ 解析方式二 ############
# 直接解析xml文件
tree = ET.parse("xo.xml")
# 获取xml文件的根节点
root = tree.getroot() # 顶层标签
print(root.tag)
# 遍历XML中所有的year节点
for node in root.iter('year'):
# 节点的标签名称和内容
print(node.tag, node.text)
# 输出
year
year
year

3.修改节点内容

  由于修改节点时,都在内存中进行,不会影响原文件中的内容。如果想要保存修改,则会要将内存中的内容写入文件。

(1)解析字符串方式的修改和保存

from xml.etree import ElementTree as ET
############ 解析方式一 ############
# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read() # 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml) ############ 操作 ############
# 顶层标签
print(root.tag)
# 循环所有的year节点
for node in root.iter('year'):
# 将year节点中的内容自增一
new_year = int(node.text) +
node.text = str(new_year) # 设置属性
node.set('name', 'alex')
node.set('age', '')
# 删除属性
del node.attrib['name'] ############ 保存文件 ############
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')

(2)解析文件方式的修改和保存

from xml.etree import ElementTree as ET
############ 解析方式二 ############
# 直接解析xml文件
tree = ET.parse("xo.xml")
# 获取xml文件的根节点
root = tree.getroot()
# 顶层标签
print(root.tag)
# 循环所有的year节点
for node in root.iter('year'):
# 将year节点中的内容自增一
new_year = int(node.text) +
node.text = str(new_year) # 设置属性
node.set('name', 'alex')
node.set('age', '')
# 删除属性
del node.attrib['name']
############ 保存文件 ############
tree.write("newnew.xml", encoding='utf-8')

4.删除节点

(1)以解析字符串方式打开XML文件的删除

from xml.etree import ElementTree as ET
############ 解析字符串方式打开 ############
# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read()
# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)
# 顶层标签
print(root.tag)
# 遍历data下的所有country节点
for country in root.findall('country'):
# 获取每一个country节点下rank节点的内容
rank = int(country.find('rank').text) if rank > :
# 删除指定country节点
root.remove(country)
############ 保存文件 ############
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')

(2)以解析文件方式打开的删除

from xml.etree import ElementTree as ET
############ 解析文件方式 ############
# 直接解析xml文件
tree = ET.parse("xo.xml")
# 获取xml文件的根节点
root = tree.getroot()
# 顶层标签
print(root.tag)
# 遍历data下的所有country节点
for country in root.findall('country'):
# 获取每一个country节点下rank节点的内容
rank = int(country.find('rank').text) if rank > :
# 删除指定country节点
root.remove(country)
############ 保存文件 ############
tree.write("newnew.xml", encoding='utf-8')

5.创建XML文件

(1)创建方式一:

from xml.etree import ElementTree as ET
# 创建根节点
root = ET.Element("famliy")
# 创建节点大儿子
son1 = ET.Element('son', {'name': '儿1'})
# 创建小儿子
son2 = ET.Element('son', {"name": '儿2'})
# 在大儿子中创建两个孙子
grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson2 = ET.Element('grandson', {'name': '儿12'})
son1.append(grandson1)
son1.append(grandson2)
# 把儿子添加到根节点中
root.append(son1)
root.append(son1)
tree = ET.ElementTree(root)
tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False)

(2)创建方式二:

from xml.etree import ElementTree as ET
# 创建根节点
root = ET.Element("famliy")
# 创建大儿子
# son1 = ET.Element('son', {'name': '儿1'})
son1 = root.makeelement('son', {'name': '儿1'})
# 创建小儿子
# son2 = ET.Element('son', {"name": '儿2'})
son2 = root.makeelement('son', {"name": '儿2'})
# 在大儿子中创建两个孙子
# grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson1 = son1.makeelement('grandson', {'name': '儿11'})
# grandson2 = ET.Element('grandson', {'name': '儿12'})
grandson2 = son1.makeelement('grandson', {'name': '儿12'})
son1.append(grandson1)
son1.append(grandson2)
# 把儿子添加到根节点中
root.append(son1)
root.append(son1)
tree = ET.ElementTree(root)
tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False)

(3)创建方式三:

from xml.etree import ElementTree as ET
# 创建根节点
root = ET.Element("famliy")
# 创建节点大儿子
son1 = ET.SubElement(root, "son", attrib={'name': '儿1'})
# 创建小儿子
son2 = ET.SubElement(root, "son", attrib={"name": "儿2"})
# 在大儿子中创建一个孙子
grandson1 = ET.SubElement(son1, "age", attrib={'name': '儿11'})
grandson1.text = '孙子'
et = ET.ElementTree(root) #生成文档对象
et.write("test.xml", encoding="utf-8", xml_declaration=True, short_empty_elements=False)

以上创建方式的XML默认无缩减,如果要设置缩减的话,需要修改保存方式:

from xml.etree import ElementTree as ET
from xml.dom import minidom def prettify(elem):
"""将节点转换成字符串,并添加缩进。
"""
rough_string = ET.tostring(elem, 'utf-8')
reparsed = minidom.parseString(rough_string)
return reparsed.toprettyxml(indent="\t") # 创建根节点
root = ET.Element("famliy")
# 创建大儿子
# son1 = ET.Element('son', {'name': '儿1'})
son1 = root.makeelement('son', {'name': '儿1'})
# 创建小儿子
# son2 = ET.Element('son', {"name": '儿2'})
son2 = root.makeelement('son', {"name": '儿2'})
# 在大儿子中创建两个孙子
# grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson1 = son1.makeelement('grandson', {'name': '儿11'})
# grandson2 = ET.Element('grandson', {'name': '儿12'})
grandson2 = son1.makeelement('grandson', {'name': '儿12'})
son1.append(grandson1)
son1.append(grandson2)
# 把儿子添加到根节点中
root.append(son1)
root.append(son1)
raw_str = prettify(root)
f = open("xxxoo.xml",'w',encoding='utf-8')
f.write(raw_str)
f.close()

更详细学习:http://www.runoob.com/xml/xml-tutorial.html

python3 xml模块的更多相关文章

  1. Python3 xml模块的增删改查

    xml数据示例 ? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 <data>     < ...

  2. oop、configparser、xml模块

    本节大纲:一:在python中,有两种编程思想.1:函数式编程.2:oop.无论是函数式编程还是oop都是相辅相成.并不是说oop比函数式编程就好.各有各的优缺点.在其他语言中java等只能以面向对象 ...

  3. s14 第5天 时间模块 随机模块 String模块 shutil模块(文件操作) 文件压缩(zipfile和tarfile)shelve模块 XML模块 ConfigParser配置文件操作模块 hashlib散列模块 Subprocess模块(调用shell) logging模块 正则表达式模块 r字符串和转译

    时间模块 time datatime time.clock(2.7) time.process_time(3.3) 测量处理器运算时间,不包括sleep时间 time.altzone 返回与UTC时间 ...

  4. day36-常见内置模块五(collections、xml模块)

    一.collections模块 在内置数据类型(dict.list.set.tuple)的基础上,collections模块还提供了几个额外的数据类型:namedtuple.deque.Counter ...

  5. python 常用模块 time random os模块 sys模块 json & pickle shelve模块 xml模块 configparser hashlib subprocess logging re正则

    python 常用模块 time random os模块 sys模块 json & pickle shelve模块 xml模块 configparser hashlib  subprocess ...

  6. python解析xml模块封装代码

    在python中解析xml文件的模块用法,以及对模块封装的方法.原文转自:http://www.jbxue.com/article/16586.html 有如下的xml文件:<?xml vers ...

  7. python全栈开发-hashlib模块(数据加密)、suprocess模块、xml模块

    一.hashlib模块 1.什么叫hash:hash是一种算法(3.x里代替了md5模块和sha模块,主要提供 SHA1, SHA224, SHA256, SHA384, SHA512 ,MD5 算法 ...

  8. 【python标准库模块五】Xml模块学习

    Xml模块 xml本身是一种格式规范,是一种包含了数据以及数据说明的文本格式规范.在json没有兴起之前各行各业进行数据交换的时候用的就是这个.目前在金融行业也在广泛在运用. 举个简单的例子,xml是 ...

  9. Python xml 模块

    Python xml 模块 TOC 什么是xml? xml和json的区别 xml现今的应用 xml的解析方式 xml.etree.ElementTree SAX(xml.parsers.expat) ...

随机推荐

  1. str

    print('字符串操作') s='abc DEF hij' print('首字母大写') print(s.capitalize()) print('全大写') print(s.upper()) pr ...

  2. Event Recommendation Engine Challenge分步解析第三步

    一.请知晓 本文是基于: Event Recommendation Engine Challenge分步解析第一步 Event Recommendation Engine Challenge分步解析第 ...

  3. ansible基础-playbook剧本的使用

    ansible基础-playbook剧本的使用 作者:尹正杰  版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.YAML概述 1>.YAML的诞生 YAML是一个可读性高,用来表达数据序 ...

  4. Jvm threaddump,heapdump的分析及问题定位

    1 一.Thread Dump介绍 1.1 1.1什么是Thread Dump? 1.2 1.2 Thread Dump特点 1.3 1.3 Thread Dump 能诊断的问题 1.4 1.4如何抓 ...

  5. jQuery视频格式的验证

    $(document).on('change','#videofile',function() { var file = this.files[0]; if (!/video\/\w+/.test(f ...

  6. springboot 日志【转】【补】

    市面上的日志框架 日志门面 (日志的抽象层) 日志实现 JCL(Jakarta Commons Logging)(2014) SLF4j(Simple Logging Facade for Java) ...

  7. 【1】【leetcode-99】 恢复二叉搜索树

    (没思路) 99. 恢复二叉搜索树 二叉搜索树中的两个节点被错误地交换. 请在不改变其结构的情况下,恢复这棵树. 示例 1: 输入: [1,3,null,null,2]   1   /  3   \ ...

  8. HDU 1036(平均速度 **)

    题意是求出跑了 n 圈每圈 m km 的个人的平均速度. 控制格式,特别注意,题意是输出 -:--:-- 的该人成绩作废,但要把他其他的成绩输进去,不能直接就 break ,输出也就只有一个 - ,而 ...

  9. PHP7 学习笔记(十二)gRPC

    GitHub:https://github.com/grpc/grpc/tree/master/src/php 环境:Linux + php7 1.安装grpc pecl install grpc 编 ...

  10. Java秒杀系统方案优化 高性能高并发实战(1)

    首先先把 springboot +thymeleaf 搞起来 ,参考 springboot 官方文档 本次学习 使用 springboot + thymeleaf+mybatis+redis+Rabb ...