QUrl

Detailed Description

The QUrl class provides a convenient interface for working with URLs.

It can parse and construct URLs in both encoded and unencoded form. QUrl also has support for internationalized domain names (IDNs).

The most common way to use QUrl is to initialize it via the constructor by passing a QString. Otherwise, setUrl() can also be used.

URLs can be represented in two forms: encoded or unencoded. The unencoded representation is suitable for showing to users, but the encoded representation is typically what you would send to a web server. For example, the unencoded URL "http://bühler.example.com/List of applicants.xml" would be sent to the server as "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
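The path half of that conversion can be sketched in standard C++. This is only an illustration of percent-encoding, not QUrl's implementation: the helper name percentEncodePath is made up for this example, the input is assumed to be UTF-8, and the host-name (IDN) conversion is not covered.

```cpp
#include <string>

// Hypothetical helper (not part of Qt): percent-encodes every byte of a
// UTF-8 path except RFC 3986 unreserved characters and the '/' separator.
std::string percentEncodePath(const std::string &utf8Path)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8Path) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved || c == '/') {
            out += static_cast<char>(c);
        } else {
            // Emit the byte as '%' followed by two uppercase hex digits.
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With this sketch, percentEncodePath("/List of applicants.xml") yields "/List%20of%20applicants.xml", which matches the encoded path in the example above.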

A URL can also be constructed piece by piece by calling setScheme(), setUserName(), setPassword(), setHost(), setPort(), setPath(), setQuery() and setFragment(). Some convenience functions are also available: setAuthority() sets the user name, password, host and port. setUserInfo() sets the user name and password at once.

Call isValid() to check if the URL is valid. This can be done at any point while constructing a URL. If isValid() returns false, you should clear() the URL before proceeding, or start over by parsing a new URL with setUrl().

Constructing a query is particularly convenient through the use of the QUrlQuery class and its methods QUrlQuery::setQueryItems(), QUrlQuery::addQueryItem() and QUrlQuery::removeQueryItem(). Use QUrlQuery::setQueryDelimiters() to customize the delimiters used for generating the query string.

For the convenience of generating encoded URL strings or query strings, there are two static functions called fromPercentEncoding() and toPercentEncoding() which deal with percent encoding and decoding of QString objects.

Calling isRelative() will tell whether or not the URL is relative. A relative URL can be resolved by passing it as argument to resolved(), which returns an absolute URL. isParentOf() is used for determining whether one URL is a parent of another.

fromLocalFile() constructs a QUrl by parsing a local file path. toLocalFile() converts a URL to a local file path.

The human-readable representation of the URL is fetched with toString(). This representation is appropriate for displaying a URL to a user in unencoded form. The encoded form, as returned by toEncoded(), is for internal use, passing to web servers, mail clients and so on. Both forms are technically correct and represent the same URL unambiguously -- in fact, passing either form to QUrl's constructor or to setUrl() will yield the same QUrl object.

Error checking

QUrl is capable of detecting many errors in URLs while parsing them or when components of the URL are set with individual setter methods (like setScheme(), setHost() or setPath()). If the parsing or setter function is successful, any previously recorded error conditions will be discarded.

By default, QUrl setter methods operate in QUrl::TolerantMode, which means they accept some common mistakes and misrepresentations of data. An alternate method of parsing is QUrl::StrictMode, which applies further checks. See QUrl::ParsingMode for a description of the differences between the parsing modes.

QUrl only checks for conformance with the URL specification. It does not try to verify that high-level protocol URLs are in the format they are expected to be by handlers elsewhere. For example, the following URIs are all considered valid by QUrl, even if they do not make sense when used:

  • "http:/filename.html"
  • "mailto://example.com"

When the parser encounters an error, it signals the event by making isValid() return false and toString() / toEncoded() return an empty string. If it is necessary to show the user the reason why the URL failed to parse, the error condition can be obtained from QUrl by calling errorString(). Note that this message is highly technical and may not make sense to end-users.

QUrl is capable of recording only one error condition. If more than one error is found, it is undefined which error is reported.

Character Conversions

Follow these rules to avoid erroneous character conversion when dealing with URLs and strings:

When creating a QString to contain a URL from a QByteArray or a char*, always use QString::fromUtf8().

enum QUrl::ParsingMode

The parsing mode controls the way QUrl parses strings.

Constant Value Description
QUrl::TolerantMode 0 QUrl will try to correct some common errors in URLs. This mode is useful for parsing URLs coming from sources not known to be strictly standards-conforming.
QUrl::StrictMode 1 Only valid URLs are accepted. This mode is useful for general URL validation.
QUrl::DecodedMode 2 QUrl will interpret the URL component in the fully-decoded form, where percent characters stand for themselves, not as the beginning of a percent-encoded sequence. This mode is only valid for the setters setting components of a URL; it is not permitted in the QUrl constructor, in fromEncoded() or in setUrl(). For more information on this mode, see the documentation for QUrl::FullyDecoded.

In TolerantMode, the parser has the following behaviour:

  • Spaces and "%20": unencoded space characters will be accepted and will be treated as equivalent to "%20".
  • Single "%" characters: Any occurrences of a percent character "%" not followed by exactly two hexadecimal characters (e.g., "13% coverage.html") will be replaced by "%25". Note that one lone "%" character will trigger the correction mode for all percent characters.
  • Reserved and unreserved characters: An encoded URL should only contain a few characters as literals; all other characters should be percent-encoded. In TolerantMode, these characters will be accepted if they are found in the URL:

    space / double-quote / "<" / ">" / "\" / "^" / "`" / "{" / "|" / "}"

    Those same characters can be decoded again by passing QUrl::DecodeReserved to toString() or toEncoded(). In the getters of individual components, those characters are often returned in decoded form.
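The rule for stray "%" characters described in the list above can be sketched in standard C++. This follows the wording of the description only and is not taken from QUrl's source; the function names are hypothetical.

```cpp
#include <cctype>
#include <string>

// Returns true if every '%' in s is followed by two hexadecimal digits.
bool percentsWellFormed(const std::string &s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '%')
            continue;
        if (i + 2 >= s.size()
            || !std::isxdigit(static_cast<unsigned char>(s[i + 1]))
            || !std::isxdigit(static_cast<unsigned char>(s[i + 2])))
            return false;
    }
    return true;
}

// Tolerant-style correction as described in the text: one stray '%'
// switches on the correction for all percent characters, and each of
// them is rewritten as "%25".
std::string correctStrayPercents(const std::string &s)
{
    if (percentsWellFormed(s))
        return s;
    std::string out;
    for (char c : s) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

Under these assumptions, "13% coverage.html" becomes "13%25 coverage.html", while a well-formed input such as "a%20b" is returned unchanged.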

When in StrictMode, if a parsing error is found, isValid() will return false and errorString() will return a message describing the error. If more than one error is detected, it is undefined which error gets reported.

Note that TolerantMode is not usually enough for parsing user input, which often contains more errors and expectations than the parser can deal with. When dealing with data coming directly from the user -- as opposed to data coming from data-transfer sources, such as other programs -- it is recommended to use fromUserInput().

enum QUrl::ComponentFormattingOption

The component formatting options define how the components of an URL will be formatted when written out as text. They can be combined with the options from QUrl::FormattingOptions when used in toString() and toEncoded().

Constant Value Description
QUrl::PrettyDecoded 0x000000 The component is returned in a "pretty form", with most percent-encoded characters decoded. The exact behavior of PrettyDecoded varies from component to component and may also change from Qt release to Qt release. This is the default.
QUrl::EncodeSpaces 0x100000 Leave space characters in their encoded form ("%20").
QUrl::EncodeUnicode 0x200000 Leave non-US-ASCII characters encoded in their UTF-8 percent-encoded form (e.g., "%C3%A9" for the U+00E9 codepoint, LATIN SMALL LETTER E WITH ACUTE).
QUrl::EncodeDelimiters 0x400000 | 0x800000 Leave certain delimiters in their encoded form, as would appear in the URL when the full URL is represented as text. The delimiters affected by this option change from component to component. This flag has no effect in toString() or toEncoded().
QUrl::EncodeReserved 0x1000000 Leave US-ASCII characters not permitted in the URL by the specification in their encoded form. This is the default on toString() and toEncoded().
QUrl::DecodeReserved 0x2000000 Decode the US-ASCII characters that the URL specification does not allow to appear in the URL. This is the default on the getters of individual components.
QUrl::FullyEncoded EncodeSpaces | EncodeUnicode | EncodeDelimiters | EncodeReserved Leave all characters in their properly-encoded form, as this component would appear as part of a URL. When used with toString(), this produces a fully-compliant URL in QString form, exactly equal to the result of toEncoded().
QUrl::FullyDecoded FullyEncoded | DecodeReserved | 0x4000000 Attempt to decode as much as possible. For individual components of the URL, this decodes every percent encoding sequence, including control characters (U+0000 to U+001F) and UTF-8 sequences found in percent-encoded form. Use of this mode may cause data loss, see below for more information.

The values of EncodeReserved and DecodeReserved should not be used together in one call. The behavior is undefined if that happens. They are provided as separate values because the behavior of the "pretty mode" with regards to reserved characters is different on certain components and specially on the full URL.

Full decoding

The FullyDecoded mode is similar to the behavior of the functions returning QString in Qt 4.x, in that every character represents itself and never has any special meaning. This is true even for the percent character ('%'), which should be interpreted to mean a literal percent, not the beginning of a percent-encoded sequence. The same actual character, in all other decoding modes, is represented by the sequence "%25".

Whenever re-applying data obtained with QUrl::FullyDecoded into a QUrl, care must be taken to use the QUrl::DecodedMode parameter to the setters (like setPath() and setUserName()). Failure to do so may cause re-interpretation of the percent character ('%') as the beginning of a percent-encoded sequence.

This mode is quite useful when portions of a URL are used in a non-URL context. For example, to extract the username, password or file paths in an FTP client application, the FullyDecoded mode should be used.

This mode should be used with care, since there are two conditions that cannot be reliably represented in the returned QString. They are:

  1. Non-UTF-8 sequences: URLs may contain sequences of percent-encoded characters that do not form valid UTF-8 sequences. Since URLs need to be decoded using UTF-8, any decoder failure will result in the QString containing one or more replacement characters where the sequence existed.

  2. Encoded delimiters: URLs are also allowed to make a distinction between a delimiter found in its literal form and its equivalent in percent-encoded form. This is most commonly found in the query, but is permitted in most parts of the URL.

The following example illustrates the problem:

  QUrl original("http://example.com/?q=a%2B%3Db%26c");
  QUrl copy(original);
  copy.setQuery(copy.query(QUrl::FullyDecoded), QUrl::DecodedMode);
  qDebug() << original.toString(); // prints: http://example.com/?q=a%2B%3Db%26c
  qDebug() << copy.toString(); // prints: http://example.com/?q=a+=b&c

If the two URLs were used via HTTP GET, the interpretation by the web server would probably be different. In the first case, it would interpret as one parameter, with a key of "q" and value "a+=b&c". In the second case, it would probably interpret as two parameters, one with a key of "q" and value "a =b", and the second with a key "c" and no value.
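The difference can be made concrete with a small standard C++ sketch of how a receiver might split a raw query string. The splitQuery function is hypothetical and deliberately naive: it splits on '&' and on the first '=' of each item, and it does no percent-decoding and no '+' to space conversion, so it is not a model of any particular web server.

```cpp
#include <string>
#include <utility>
#include <vector>

using QueryPairs = std::vector<std::pair<std::string, std::string>>;

// Splits a raw query string into key/value pairs. Items are separated by
// '&'; within an item the first '=' separates the key from the value.
QueryPairs splitQuery(const std::string &query)
{
    QueryPairs pairs;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos)
            end = query.size();
        const std::string item = query.substr(start, end - start);
        if (!item.empty()) {
            const std::size_t eq = item.find('=');
            if (eq == std::string::npos)
                pairs.emplace_back(item, std::string()); // key with no value
            else
                pairs.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        }
        start = end + 1;
    }
    return pairs;
}
```

Applied to "q=a%2B%3Db%26c" the sketch finds a single pair with key "q"; applied to "q=a+=b&c" it finds two pairs, the second being the key "c" with an empty value.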

Other Functions

QUrl::QUrl(const QString &url, ParsingMode parsingMode = TolerantMode)

Constructs a URL by parsing url. QUrl will automatically percent-encode all characters that are not allowed in a URL and decode the percent-encoded sequences that represent an unreserved character (letters, digits, hyphens, underscores, dots and tildes). All other characters are left in their original forms.

Parses the url using the parser mode parsingMode. In TolerantMode (the default), QUrl will correct certain mistakes, notably the presence of a percent character ('%') not followed by two hexadecimal digits, and it will accept any character in any position. In StrictMode, encoding mistakes will not be tolerated and QUrl will also check that certain forbidden characters are not present in unencoded form. If an error is detected in StrictMode, isValid() will return false. The parsing mode DecodedMode is not permitted in this context.

Example:

  QUrl url("http://www.example.com/List of holidays.xml");
  // url.toEncoded() == "http://www.example.com/List%20of%20holidays.xml"

To construct a URL from an encoded string, you can also use fromEncoded():

  QUrl url = QUrl::fromEncoded("http://qt-project.org/List%20of%20holidays.xml");

Both functions are equivalent and, in Qt 5, both functions accept encoded data. Usually, the choice of the QUrl constructor or setUrl() versus fromEncoded() will depend on the source data: the constructor and setUrl() take a QString, whereas fromEncoded takes a QByteArray.

QString QUrl::fileName(ComponentFormattingOptions options = FullyDecoded) const

Returns the name of the file, excluding the directory path.

Note that, if this QUrl object is given a path ending in a slash, the name of the file is considered empty.

If the path doesn't contain any slash, it is fully returned as the fileName.

Example:

  QUrl url("http://qt-project.org/support/file.html");
  // url.adjusted(QUrl::RemoveFilename) == "http://qt-project.org/support/"
  // url.fileName() == "file.html"

The options argument controls how to format the file name component. All values produce an unambiguous result. With QUrl::FullyDecoded, all percent-encoded sequences are decoded; otherwise, the returned value may contain some percent-encoded sequences for some control sequences not representable in decoded form in QString.

QUrl QUrl::resolved(const QUrl &relative) const

Returns the result of the merge of this URL with relative. This URL is used as a base to convert relative to an absolute URL.

If relative is not a relative URL, this function will return relative directly. Otherwise, the paths of the two URLs are merged, and the new URL returned has the scheme and authority of the base URL, but with the merged path, as in the following example:

  QUrl baseUrl("http://qt.digia.com/Support/");
  QUrl relativeUrl("../Product/Library/");
  // qDebug() takes a QString through its stream operator, not as a format string
  qDebug() << baseUrl.resolved(relativeUrl).toString();
  // prints "http://qt.digia.com/Product/Library/"

Calling resolved() with ".." returns a QUrl whose directory is one level higher than the original. Similarly, calling resolved() with "../.." removes two levels from the path. If relative is "/", the path becomes "/".
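The path handling described above can be sketched with the standard library alone. The code below follows the merge and dot-segment removal steps of RFC 3986 for the path component only; it is an illustration, not QUrl's implementation. The name resolvePath is hypothetical, and the base path is assumed to be empty or to start with '/'.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: resolves a relative path against an absolute base
// path by merging them and then removing "." and ".." segments.
std::string resolvePath(const std::string &basePath, const std::string &relative)
{
    // Merge step: an absolute reference replaces the base path; otherwise
    // the reference replaces everything after the last '/' of the base.
    std::string merged;
    if (!relative.empty() && relative[0] == '/') {
        merged = relative;
    } else {
        const std::size_t slash = basePath.rfind('/');
        merged = (slash == std::string::npos)
                     ? "/" + relative
                     : basePath.substr(0, slash + 1) + relative;
    }

    // Dot-segment removal: walk the segments after the leading '/'.
    std::vector<std::string> out;
    bool trailingSlash = false;
    std::size_t start = 1;
    while (true) {
        const std::size_t end = merged.find('/', start);
        const bool last = (end == std::string::npos);
        const std::string seg =
            merged.substr(start, last ? std::string::npos : end - start);
        if (seg == "..") {
            if (!out.empty())
                out.pop_back();          // go up one level, never above root
        } else if (seg != "." && !(last && seg.empty())) {
            out.push_back(seg);          // ordinary segment
        }
        if (last) {
            // A final "", "." or ".." means the result names a directory.
            trailingSlash = seg.empty() || seg == "." || seg == "..";
            break;
        }
        start = end + 1;
    }

    std::string result = "/";
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (i > 0)
            result += '/';
        result += out[i];
    }
    if (trailingSlash && !out.empty())
        result += '/';
    return result;
}
```

For the base path "/Support/" and the relative path "../Product/Library/", the sketch returns "/Product/Library/", the same path as in the example above.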

void QUrl::setAuthority(const QString &authority, ParsingMode mode = TolerantMode)

Sets the authority of the URL to authority.

The authority of a URL is the combination of user info, a host name and a port. All of these elements are optional; an empty authority is therefore valid.

The user info and host are separated by a '@', and the host and port are separated by a ':'. If the user info is empty, the '@' must be omitted, although a stray ':' is permitted if the port is empty.

Putting these rules together, a complete authority string has the form userinfo@host:port.

The authority data is interpreted according to mode: in StrictMode, any '%' characters must be followed by exactly two hexadecimal characters and some characters (including space) are not allowed in undecoded form. In TolerantMode (the default), all characters are accepted in undecoded form and the tolerant parser will correct stray '%' not followed by two hex characters.

This function does not allow mode to be QUrl::DecodedMode. To set fully decoded data, call setUserName(), setPassword(), setHost() and setPort() individually.
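The layout of an authority string described in this section can be illustrated with a standard C++ sketch that splits it into its parts. It is not QUrl's parser: the Authority struct and splitAuthority function are hypothetical, no validation is done, and bracketed IPv6 hosts (which contain ':' themselves) are not handled.

```cpp
#include <string>

// Hypothetical container for the three optional parts of an authority.
struct Authority {
    std::string userInfo;
    std::string host;
    std::string port;
};

// Splits "userinfo@host:port". The user info ends at the last '@'; the
// port starts after the last ':' of the remaining host-and-port text.
Authority splitAuthority(const std::string &authority)
{
    Authority a;
    std::string hostPort = authority;
    const std::size_t at = authority.rfind('@');
    if (at != std::string::npos) {
        a.userInfo = authority.substr(0, at);
        hostPort = authority.substr(at + 1);
    }
    const std::size_t colon = hostPort.rfind(':');
    if (colon != std::string::npos) {
        a.host = hostPort.substr(0, colon);
        a.port = hostPort.substr(colon + 1); // may be empty (stray ':')
    } else {
        a.host = hostPort;
    }
    return a;
}
```

For instance, splitAuthority("user:pw@example.com:8080") gives the user info "user:pw", the host "example.com" and the port "8080", and an empty input yields three empty parts.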

void QUrl::setFragment(const QString &fragment, ParsingMode mode = TolerantMode)

Sets the fragment of the URL to fragment. The fragment is the last part of the URL, represented by a '#' followed by a string of characters. It is typically used in HTTP for referring to a certain link or point on a page.

The fragment is sometimes also referred to as the URL "reference".

Passing an argument of QString() (a null QString) will unset the fragment. Passing an argument of QString("") (an empty but not null QString) will set the fragment to an empty string (as if the original URL had a lone "#").

The fragment data is interpreted according to mode: in StrictMode, any '%' characters must be followed by exactly two hexadecimal characters and some characters (including space) are not allowed in undecoded form. In TolerantMode, all characters are accepted in undecoded form and the tolerant parser will correct stray '%' not followed by two hex characters. In DecodedMode, '%' stand for themselves and encoded characters are not possible.

QUrl::DecodedMode should be used when setting the fragment from a data source which is not a URL or with a fragment obtained by calling fragment() with the QUrl::FullyDecoded formatting option.

void QUrl::setPath(const QString &path, ParsingMode mode = DecodedMode)

Sets the path of the URL to path. The path is the part of the URL that comes after the authority but before the query string.

For non-hierarchical schemes, the path will be everything following the scheme declaration. For example, in "mailto:postmaster@example.com" the path is "postmaster@example.com".

void QUrl::setScheme(const QString &scheme)

Sets the scheme of the URL to scheme. As a scheme can only contain ASCII characters, no conversion or decoding is done on the input. It must also start with an ASCII letter.

The scheme describes the type (or protocol) of the URL. It's represented by one or more ASCII characters at the start of the URL.

A scheme is strictly RFC 3986-compliant:

scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
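The grammar rule above translates almost directly into a check. The following standard C++ sketch is one way to write it; isValidScheme is a hypothetical name and this is not the validation code used inside QUrl.

```cpp
#include <string>

// Checks a string against: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool isValidScheme(const std::string &scheme)
{
    auto isAlpha = [](char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    };
    auto isDigit = [](char c) { return c >= '0' && c <= '9'; };

    // The rule requires at least one character, and it must be a letter.
    if (scheme.empty() || !isAlpha(scheme[0]))
        return false;
    for (char c : scheme) {
        if (!(isAlpha(c) || isDigit(c) || c == '+' || c == '-' || c == '.'))
            return false;
    }
    return true;
}
```

The sketch accepts "ftp" and "svn+ssh" and rejects "1http" and "my_scheme". It also rejects the empty string, since the grammar needs at least one letter; an empty scheme on a QUrl is a separate case that marks the URL as relative.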

For example, a URL that begins with "ftp://" has the scheme "ftp". To set the scheme, the following call is used:

  QUrl url;
  url.setScheme("ftp");

The scheme can also be empty, in which case the URL is interpreted as relative.

void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode = TolerantMode)

Sets the user info of the URL to userInfo. The user info is an optional part of the authority of the URL, as described in setAuthority().

The user info consists of a user name and optionally a password, separated by a ':'. If the password is empty, the colon must be omitted. A valid user info string therefore has the form "user:password", or just "user" when there is no password.

QString QUrl::topLevelDomain(ComponentFormattingOptions options = FullyDecoded) const

Returns the TLD (Top-Level Domain) of the URL (e.g. .co.uk, .net). The return value is prefixed with a '.', unless the URL does not contain a valid TLD, in which case the function returns an empty string.

Note that this function considers a TLD to be any domain under which users can register subdomains, including many home, dynamic DNS websites and blogging providers. This is useful for determining whether two websites belong to the same infrastructure and whether communication between them should be allowed, as with browser cookies: two domains should be considered part of the same website if they share at least one label in addition to the value returned by this function.

  • foo.co.uk and foo.com do not share a top-level domain
  • foo.co.uk and bar.co.uk share the .co.uk domain, but the next label is different
  • www.foo.co.uk and ftp.foo.co.uk share the same top-level domain and one more label, so they are considered part of the same site
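The comparison rule in the list above can be sketched in standard C++. The TLD strings are passed in as plain arguments, standing in for the values that topLevelDomain() would return; the function names are hypothetical, and host names are assumed to be lower-case.

```cpp
#include <string>

// Returns the TLD plus the one label in front of it, e.g. "foo.co.uk"
// for the host "www.foo.co.uk" and the TLD ".co.uk". Returns an empty
// string if the host does not end in the TLD or has no extra label.
std::string siteOf(const std::string &host, const std::string &tld)
{
    if (tld.empty() || host.size() <= tld.size())
        return std::string();
    if (host.compare(host.size() - tld.size(), tld.size(), tld) != 0)
        return std::string();
    const std::string prefix = host.substr(0, host.size() - tld.size());
    const std::size_t dot = prefix.rfind('.');
    const std::string label =
        (dot == std::string::npos) ? prefix : prefix.substr(dot + 1);
    if (label.empty())
        return std::string();
    return label + tld;
}

// Two hosts belong to the same site if they share the TLD and one more label.
bool sameSite(const std::string &hostA, const std::string &tldA,
              const std::string &hostB, const std::string &tldB)
{
    const std::string a = siteOf(hostA, tldA);
    const std::string b = siteOf(hostB, tldB);
    return !a.empty() && a == b;
}
```

With this sketch, www.foo.co.uk and ftp.foo.co.uk (both with the TLD ".co.uk") compare as the same site, while foo.co.uk and bar.co.uk do not.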

If options includes EncodeUnicode, the returned string will be in ASCII Compatible Encoding.

QUrlQuery

Detailed Description

The QUrlQuery class provides a way to manipulate key-value pairs in a URL's query.

It is used to parse the query strings found in URLs like the following:

  http://www.example.com/cgi-bin/drawgraph.cgi?type=pie&color=green

Query strings like the above are used to transmit options in the URL and are usually decoded into multiple key-value pairs. The one above would contain two entries in its list, with keys "type" and "color". QUrlQuery can also be used to create a query string suitable for use in QUrl::setQuery() from the individual components of the query.

The most common way of parsing a query string is to initialize it in the constructor by passing it the query string. Otherwise, the setQuery() method can be used to set the query to be parsed. That method can also be used to parse a query with non-standard delimiters, after having set them using the setQueryDelimiters() function.

The encoded query string can be obtained again using query(). This will take all the internally-stored items and encode the string using the delimiters.

Encoding

All of the getter methods in QUrlQuery support an optional parameter of type QUrl::ComponentFormattingOptions, including query(), which dictate how to encode the data in question. Except for QUrl::FullyDecoded, the returned value must still be considered a percent-encoded string, as there are certain values which cannot be expressed in decoded form (like control characters, byte sequences not decodable to UTF-8). For that reason, the percent character is always represented by the string "%25".

Handling of spaces and plus ("+")

Web browsers usually encode spaces found in HTML FORM elements as a plus sign ("+") and plus signs in their percent-encoded form (%2B). However, the Internet specifications governing URLs do not consider spaces and the plus character equivalent.

For that reason, QUrlQuery never encodes the space character to "+" and will never decode "+" to a space character. Instead, space characters will be rendered "%20" in encoded form.

To support encoding like that of HTML forms, QUrlQuery also never decodes the "%2B" sequence to a plus sign, nor does it encode a plus sign. In fact, any "%2B" or "+" sequences found in the keys, values, or query string are left exactly as written (except for the uppercasing of "%2b" to "%2B").

Full decoding

With QUrl::FullyDecoded formatting, all percent-encoded sequences will be decoded fully and the '%' character is used to represent itself. QUrl::FullyDecoded should be used with care, since it may cause data loss. See the documentation of QUrl::FullyDecoded for information on what data may be lost.

This formatting mode should be used only when dealing with text presented to the user in contexts where percent-encoding is not desired. Note that QUrlQuery setters and query methods do not support the counterpart QUrl::DecodedMode parsing, so using QUrl::FullyDecoded to obtain a listing of keys may result in keys not found in the object.

Non-standard delimiters

By default, QUrlQuery uses an equal sign ("=") to separate a key from its value, and an ampersand ("&") to separate key-value pairs from each other. It is possible to change the delimiters that QUrlQuery uses for parsing and for reconstructing the query by calling setQueryDelimiters().

Non-standard delimiters should be chosen from among what RFC 3986 calls "sub-delimiters". They are:

  sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

Use of other characters is not supported and may result in unexpected behaviour. QUrlQuery does not verify that you passed a valid delimiter.

Other Functions

void QUrlQuery::setQueryDelimiters(QChar valueDelimiter, QChar pairDelimiter)

Sets the characters used for delimiting between keys and values, and between key-value pairs in the URL's query string. The default value delimiter is '=' and the default pair delimiter is '&'.

valueDelimiter will be used for separating keys from values, and pairDelimiter will be used to separate key-value pairs. Any occurrences of these delimiting characters in the encoded representation of the keys and values of the query string are percent encoded when returned in query().

If valueDelimiter is set to '(' and pairDelimiter is ')', a query string that would normally read "type=pie&color=green" would instead be represented like this:

  http://www.example.com/cgi-bin/drawgraph.cgi?type(pie)color(green)

Note: Non-standard delimiters should be chosen from among what RFC 3986 calls "sub-delimiters". They are:

  sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

Use of other characters is not supported and may result in unexpected behaviour. This method does not verify that you passed a valid delimiter.
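How the two delimiters shape the generated string, and why delimiter characters inside keys and values have to be percent-encoded, can be sketched in standard C++. This is a simplified illustration and not QUrlQuery's code: buildQuery and escapeDelims are hypothetical names, and keys and values are assumed to be otherwise already encoded.

```cpp
#include <string>
#include <utility>
#include <vector>

// Percent-encodes only the two delimiter characters inside a key or value.
std::string escapeDelims(const std::string &s, char valueDelim, char pairDelim)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (c == static_cast<unsigned char>(valueDelim)
            || c == static_cast<unsigned char>(pairDelim)) {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}

// Joins key/value pairs: valueDelim between a key and its value,
// pairDelim between consecutive pairs.
std::string buildQuery(
    const std::vector<std::pair<std::string, std::string>> &items,
    char valueDelim = '=', char pairDelim = '&')
{
    std::string query;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0)
            query += pairDelim;
        query += escapeDelims(items[i].first, valueDelim, pairDelim);
        query += valueDelim;
        query += escapeDelims(items[i].second, valueDelim, pairDelim);
    }
    return query;
}
```

With the default delimiters, the pairs ("type", "pie") and ("color", "green") produce "type=pie&color=green", and a key such as "a=b" is written as "a%3Db" so that it cannot be mistaken for a key followed by a value. Switching the pair delimiter to ';' leaves a literal '&' inside a value untouched, because it no longer has a special meaning.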

