URL Parsing

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

Parse a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:


Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.


The scheme argument gives the default addressing scheme.

If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and fragment is set to the empty string in the return value.

urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.

urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a list of name, value pairs.

urllib.parse.urlunparse(parts)

Construct a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.

urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL.

urllib.parse.urlunsplit(parts)

Combine the elements of a tuple as returned by urlsplit() into a complete URL as a string. The parts argument can be any five-item iterable.

urllib.parse.urldefrag(url)

If url contains a fragment identifier, return a modified version of url with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in url, return url unmodified and an empty string.

URL Parsing的更多相关文章

  1. w3 parse a url

     最新链接:https://www.w3.org/TR/html53/ 2.6 URLs — HTML5 li, dd li { margin: 1em 0; } dt, dfn { font-wei ...

  2. GO语言的开源库

    Indexes and search engines These sites provide indexes and search engines for Go packages: godoc.org ...

  3. Python Web Crawler

    Python版本:3.5.2 pycharm URL Parsing¶ https://docs.python.org/3.5/library/urllib.parse.html?highlight= ...

  4. Curl的编译

    下载 curl的官网:https://curl.haxx.se/ libcurl就是一个库,curl就是使用libcurl实现的. curl是一个exe,也可以说是整个项目的名字,而libcurl就是 ...

  5. Go语言(golang)开源项目大全

    转http://www.open-open.com/lib/view/open1396063913278.html内容目录Astronomy构建工具缓存云计算命令行选项解析器命令行工具压缩配置文件解析 ...

  6. Go by Example

    Go by Example Go is an open source programming language designed for building simple, fast, and reli ...

  7. Node.js API

    Node.js v4.4.7 Documentation(官方文档) Buffer Prior to the introduction of TypedArray in ECMAScript 2015 ...

  8. [转]Go语言(golang)开源项目大全

    内容目录 Astronomy 构建工具 缓存 云计算 命令行选项解析器 命令行工具 压缩 配置文件解析器 控制台用户界面 加密 数据处理 数据结构 数据库和存储 开发工具 分布式/网格计算 文档 编辑 ...

  9. python爬虫从入门到放弃(三)之 Urllib库的基本使用

    官方文档地址:https://docs.python.org/3/library/urllib.html 什么是Urllib Urllib是python内置的HTTP请求库包括以下模块urllib.r ...

随机推荐

  1. Python变量、数据类型6

    1.Python变量 变量,即代表某个value的名字. 变量的值存储在内存中,这意味着在创建变量时会在内存中开辟一个空间. !!!即值并没有保存在变量中,它们保存在计算机内存的深处,被变量引用.所以 ...

  2. ucos3的配置文件

    1,配置文件,用于系统的裁剪 均有详细的注释 为组件的开关 ​ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 ...

  3. locate无法open mlocate.db

    # locate xxxx locate: can not open () `/var/lib/mlocate/mlocate.db': No such file or directory 如果出现此 ...

  4. truncate有外键约束的表,报ORA-02266处理。

    问题描述:当父表有子表的外键约束时,无法直接truncate父表.报ORA-02266: unique/primary keys in table referenced by enabled fore ...

  5. PHP、JAVA、C#、Object-C 通用的DES加密

    PHP.JAVA.C#.Object-C 通用的DES加密 PHP: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ...

  6. Window环境下 Git 下载Android源码

    1.需要的工具 git.vpn代理 2. 设置git代理(Google source 无法下载,git设置代理) git config --global http.proxy "localh ...

  7. PHP使用mysqldump备份数据库(以及还原)

    导出数据实例如下: <?php $mdb_host = $g_c["db"][0]["managertool"]["host"]; / ...

  8. python标准库xml.etree.ElementTree的bug

    使用python生成或者解析xml的方法用的最多的可能就数python标准库xml.etree.ElementTree和lxml了,在某些环境下使用xml.etree.ElementTree更方便一些 ...

  9. 在ios8中做的屏幕旋转功能

    http://www.cnblogs.com/smileEvday/archive/2013/04/24/Rotate2.html 思路出自这篇博主的文章. 直接上代码 -(void)willAnim ...

  10. java反射学习笔记

    1.java反射概念 JAVA反射机制是在运行状态中,对于任意一个类,都能够知道这个类的所有属性和方法:对于任意一个对象,都能够调用它的任意一个方法和属性:这种动态获取的信息以及动态调用对象的方法的功 ...