URL Parsing

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

Parse a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:


Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.


The scheme argument gives the default addressing scheme.

If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and fragment is set to the empty string in the return value.

urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.

urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a list of name, value pairs.

urllib.parse.urlunparse(parts)

Construct a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.

urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL.

urllib.parse.urlunsplit(parts)

Combine the elements of a tuple as returned by urlsplit() into a complete URL as a string. The parts argument can be any five-item iterable.

urllib.parse.urldefrag(url)

If url contains a fragment identifier, return a modified version of url with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in url, return url unmodified and an empty string.

URL Parsing的更多相关文章

  1. w3 parse a url

     最新链接:https://www.w3.org/TR/html53/ 2.6 URLs — HTML5 li, dd li { margin: 1em 0; } dt, dfn { font-wei ...

  2. GO语言的开源库

    Indexes and search engines These sites provide indexes and search engines for Go packages: godoc.org ...

  3. Python Web Crawler

    Python版本:3.5.2 pycharm URL Parsing¶ https://docs.python.org/3.5/library/urllib.parse.html?highlight= ...

  4. Curl的编译

    下载 curl的官网:https://curl.haxx.se/ libcurl就是一个库,curl就是使用libcurl实现的. curl是一个exe,也可以说是整个项目的名字,而libcurl就是 ...

  5. Go语言(golang)开源项目大全

    转http://www.open-open.com/lib/view/open1396063913278.html内容目录Astronomy构建工具缓存云计算命令行选项解析器命令行工具压缩配置文件解析 ...

  6. Go by Example

    Go by Example Go is an open source programming language designed for building simple, fast, and reli ...

  7. Node.js API

    Node.js v4.4.7 Documentation(官方文档) Buffer Prior to the introduction of TypedArray in ECMAScript 2015 ...

  8. [转]Go语言(golang)开源项目大全

    内容目录 Astronomy 构建工具 缓存 云计算 命令行选项解析器 命令行工具 压缩 配置文件解析器 控制台用户界面 加密 数据处理 数据结构 数据库和存储 开发工具 分布式/网格计算 文档 编辑 ...

  9. python爬虫从入门到放弃(三)之 Urllib库的基本使用

    官方文档地址:https://docs.python.org/3/library/urllib.html 什么是Urllib Urllib是python内置的HTTP请求库包括以下模块urllib.r ...

随机推荐

  1. poj1026 Cipher ——置换群

    link:http://poj.org/problem?id=1026 其实这道题目和poj2369这道题目一样. 都是基础的置换群题目.把那道题目理解了,这道题就没问题了. 不过我的方法貌似比较挫, ...

  2. uva147 Dollars ——完全背包

    link:http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem& ...

  3. 升级vs工程到vs2010(以上)工程找不到OutputDebugStr报错

    原因是不同版本的系统宏的不同导致报错,OutputDebugStr,它在vs2005的头文件里定义在vs安装目录下的平台sdk目录下的mmsysytem.h, 而到vs2013下这个文件被放到了系统目 ...

  4. 黑马程序员——JAVA基础之标准输入输出流

    ------- android培训.java培训.期待与您交流! ---------- 标准输入输出流: System中的基本字段,in,out 它们各代表了系统标准的输入和输出设备. 默认输入设备是 ...

  5. EDIUS手绘遮罩功能如何用

    学了这么久的EDIUS视频编辑软件,你们学的怎么样了呢?你们知道EIDUS手绘遮罩的用法么,会熟练地使用它么?如果你们还没有学到这一知识点的话也不要着急,因为你们看完下面这篇文章就会明白了.事不宜迟, ...

  6. 一步步构建自己的AngularJS(1)——项目初始化

    Angular1距离2009年发布已经好多年了,Angular2也已经出了Beta版,估计今年就能正式发布.大多数人对于Angular1.X的认识仅限于能够在项目中使用,对于其中的深层原理知道的并不多 ...

  7. Win7 Update 遭遇8024200D

    可用办法如下: 1.点击开始菜单,点运行,输入cmd然后按回车.然后在命令行下输入: net stop WuAuServ 2.点击开始菜单,点运行,输入%windir%然后按回车. 3.在打开的文件夹 ...

  8. web项目引用Java项目,连接报错error HTTP Status 500 - Servlet execution threw an exception

    错误信息 项目背景: 一个web项目引用一个java Project,项目中添加了引用,但是打开页面访问,总报500错误.提示:servlet初始化错误. 环境:Eclipse luna JDK: 1 ...

  9. CentOS 6.8下安装MySQL 5.6.33

    此处操作,包含MySQL的客户端及服务端. MySQL下载地址: http://dev.mysql.com/downloads/mysql/5.6.html MySQL--.linux_glibc2. ...

  10. HA(High available)-Keepalived高可用性集群(双机热备)单点实验-菜鸟入门级

    HA(High available)-Keepalived高可用性集群   Keepalived 是一个基于VRRP虚拟路由冗余协议来实现的WEB 服务高可用方案,虚拟路由冗余协议 (Virtual ...