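As mentioned earlier, the advantage of HttpWebRequest is that it is lightweight and needs no UI. The drawback is that it cannot execute JavaScript. Here I summarize a few more issues.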

1. Setting a proxy

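1) HttpWebRequest does not support HTTPS proxies, meaning a proxy that is itself addressed with the https:// scheme. The .NET Framework rejects such a proxy with "The ServicePointManager does not support proxies with the https scheme". That rules out certain VPNs, if you know what I mean.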

2) The usual way:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

request.Proxy = new WebProxy(proxyUrl, true); // proxyUrl e.g. http://123.123.123.123:80; true = bypass the proxy for local addresses
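Because of the https-scheme limitation in 1), it can help to validate the proxy address before building the request, so that the failure is reported up front rather than when the request is sent. The following is a minimal sketch under that assumption. `ProxyRequestFactory` and `Create` are hypothetical names invented for this example. Only `WebRequest.Create`, `WebProxy(Uri, bool)` and `Uri.UriSchemeHttp` are real framework APIs.

```csharp
using System;
using System.Net;

// Hypothetical helper: builds an HttpWebRequest that goes through a plain HTTP proxy.
public static class ProxyRequestFactory
{
    public static HttpWebRequest Create(string url, string proxyUrl)
    {
        var proxyUri = new Uri(proxyUrl);

        // HttpWebRequest does not accept proxies with the https scheme,
        // so reject them here with a clear message instead of at send time.
        if (proxyUri.Scheme != Uri.UriSchemeHttp)
            throw new NotSupportedException("Only http:// proxies are supported: " + proxyUrl);

#pragma warning disable SYSLIB0014 // WebRequest.Create is marked obsolete on .NET 6+, but still works
        var request = (HttpWebRequest)WebRequest.Create(url);
#pragma warning restore SYSLIB0014

        // Second argument true = BypassProxyOnLocal: local addresses skip the proxy.
        request.Proxy = new WebProxy(proxyUri, true);
        return request;
    }
}
```

Usage would look like `var request = ProxyRequestFactory.Create(url, "http://123.123.123.123:80");`. No network traffic happens until `GetResponse()` is called.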

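3) Using a PAC (proxy auto-config script):

This one is more involved and requires the Win32 API. Below is a class plus the calling code, with detailed comments. Needless to say, it was copied from elsewhere: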

using System;
using System.Runtime.InteropServices;

public class Win32Api
{
    #region AutoProxy Constants
    /// <summary>
    /// Applies only when setting proxy information
    /// </summary>
    public const int WINHTTP_ACCESS_TYPE_DEFAULT_PROXY = 0;
    /// <summary>
    /// Internet accessed through a direct connection
    /// </summary>
    public const int WINHTTP_ACCESS_TYPE_NO_PROXY = 1;
    /// <summary>
    /// Internet accessed using a proxy
    /// </summary>
    public const int WINHTTP_ACCESS_TYPE_NAMED_PROXY = 3;
    /// <summary>
    /// Attempt to automatically discover the URL of the
    /// PAC file using both DHCP and DNS queries to the local network.
    /// </summary>
    public const int WINHTTP_AUTOPROXY_AUTO_DETECT = 0x00000001;
    /// <summary>
    /// Download the PAC file from the URL in the WINHTTP_AUTOPROXY_OPTIONS structure.
    /// </summary>
    public const int WINHTTP_AUTOPROXY_CONFIG_URL = 0x00000002;
    /// <summary>
    /// Executes the Web Proxy Auto-Discovery (WPAD) protocol in-process instead of
    /// delegating to an out-of-process WinHTTP AutoProxy Service, if available.
    /// This flag must be combined with one of the other flags
    /// </summary>
    public const int WINHTTP_AUTOPROXY_RUN_INPROCESS = 0x00010000;
    /// <summary>
    /// By default, WinHTTP is configured to fall back to auto-discover a proxy
    /// in-process. If this fallback behavior is undesirable in the event that
    /// an out-of-process discovery fails, it can be disabled using this flag.
    /// </summary>
    public const int WINHTTP_AUTOPROXY_RUN_OUTPROCESS_ONLY = 0x00020000;
    /// <summary>
    /// Use DHCP to locate the proxy auto-configuration file.
    /// </summary>
    public const int WINHTTP_AUTO_DETECT_TYPE_DHCP = 0x00000001;
    /// <summary>
    /// Use DNS to attempt to locate the proxy auto-configuration file at a
    /// well-known location on the domain of the local computer
    /// </summary>
    public const int WINHTTP_AUTO_DETECT_TYPE_DNS_A = 0x00000002;
    #endregion

    #region Proxy Structures
    /// <summary>
    /// The structure is used to indicate to the WinHttpGetProxyForURL
    /// function whether to specify the URL of the Proxy Auto-Configuration
    /// (PAC) file or to automatically locate the URL with DHCP or DNS
    /// queries to the network
    /// </summary>
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WINHTTP_AUTOPROXY_OPTIONS
    {
        /// <summary>
        /// Mechanisms should be used to obtain the PAC file
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int dwFlags;
        /// <summary>
        /// If dwflags includes the WINHTTP_AUTOPROXY_AUTO_DETECT flag,
        /// then dwAutoDetectFlags specifies what protocols are to be
        /// used to locate the PAC file. If both the DHCP and DNS auto
        /// detect flags are specified, then DHCP is used first;
        /// if no PAC URL is discovered using DHCP, then DNS is used.
        /// If dwflags does not include the WINHTTP_AUTOPROXY_AUTO_DETECT
        /// flag, then dwAutoDetectFlags must be zero.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int dwAutoDetectFlags;
        /// <summary>
        /// If dwflags includes the WINHTTP_AUTOPROXY_CONFIG_URL flag, the
        /// lpszAutoConfigUrl must point to a null-terminated Unicode string
        /// that contains the URL of the proxy auto-configuration (PAC) file.
        /// If dwflags does not include the WINHTTP_AUTOPROXY_CONFIG_URL flag,
        /// then lpszAutoConfigUrl must be NULL.
        /// </summary>
        public string lpszAutoConfigUrl;
        /// <summary>
        /// Reserved for future use; must be NULL.
        /// </summary>
        public IntPtr lpvReserved;
        /// <summary>
        /// Reserved for future use; must be zero.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int dwReserved;
        /// <summary>
        /// Specifies whether the client's domain credentials should be automatically
        /// sent in response to an NTLM or Negotiate Authentication challenge when
        /// WinHTTP requests the PAC file.
        /// If this flag is TRUE, credentials should automatically be sent in response
        /// to an authentication challenge. If this flag is FALSE and authentication
        /// is required to download the PAC file, the WinHttpGetProxyForUrl fails.
        /// </summary>
        public bool fAutoLoginIfChallenged;
    }

    /// <summary>
    /// The structure contains the session or default proxy configuration.
    /// </summary>
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WINHTTP_PROXY_INFO
    {
        /// <summary>
        /// Unsigned long integer value that contains the access type
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int dwAccessType;
        /// <summary>
        /// Pointer to a string value that contains the proxy server list.
        /// WinHTTP allocates it; the caller must release it with GlobalFree.
        /// </summary>
        public IntPtr lpszProxy;
        /// <summary>
        /// Pointer to a string value that contains the proxy bypass list.
        /// WinHTTP allocates it; the caller must release it with GlobalFree.
        /// </summary>
        public IntPtr lpszProxyBypass;
    }
    #endregion

    #region WinHttp
    /// <summary>
    /// This function implements the Web Proxy Auto-Discovery (WPAD) protocol
    /// for automatically configuring the proxy settings for an HTTP request.
    /// The WPAD protocol downloads a Proxy Auto-Configuration (PAC) file,
    /// which is a script that identifies the proxy server to use for a given
    /// target URL. PAC files are typically deployed by the IT department within
    /// a corporate network environment. The URL of the PAC file can either be
    /// specified explicitly or WinHttpGetProxyForUrl can be instructed to
    /// automatically discover the location of the PAC file on the local network.
    /// </summary>
    /// <param name="hSession">The WinHTTP session handle returned by the WinHttpOpen function</param>
    /// <param name="lpcwszUrl">A pointer to a null-terminated Unicode string that contains the
    /// URL of the HTTP request that the application is preparing to send.</param>
    /// <param name="pAutoProxyOptions">A pointer to a WINHTTP_AUTOPROXY_OPTIONS structure that
    /// specifies the auto-proxy options to use.</param>
    /// <param name="pProxyInfo">A pointer to a WINHTTP_PROXY_INFO structure that receives the
    /// proxy setting. This structure is then applied to the request handle using the
    /// WINHTTP_OPTION_PROXY option.</param>
    /// <returns>TRUE on success, FALSE otherwise (call Marshal.GetLastWin32Error for details)</returns>
    [DllImport("winhttp.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern bool WinHttpGetProxyForUrl(
        IntPtr hSession,
        string lpcwszUrl,
        ref WINHTTP_AUTOPROXY_OPTIONS pAutoProxyOptions,
        ref WINHTTP_PROXY_INFO pProxyInfo);

    /// <summary>
    /// The function initializes, for an application, the use of WinHTTP
    /// functions and returns a WinHTTP-session handle
    /// </summary>
    /// <param name="pwszUserAgent">A pointer to a string variable that contains the name of the
    /// application or entity calling the WinHTTP functions.</param>
    /// <param name="dwAccessType">Type of access required. This can be one of the following values</param>
    /// <param name="pwszProxyName"> A pointer to a string variable that contains the name of the
    /// proxy server to use when proxy access is specified by setting dwAccessType to
    /// WINHTTP_ACCESS_TYPE_NAMED_PROXY. The WinHTTP functions recognize only CERN type proxies for HTTP.
    /// If dwAccessType is not set to WINHTTP_ACCESS_TYPE_NAMED_PROXY, this parameter must be set
    /// to WINHTTP_NO_PROXY_NAME</param>
    /// <param name="pwszProxyBypass">A pointer to a string variable that contains an optional list
    /// of host names or IP addresses, or both, that should not be routed through the proxy when
    /// dwAccessType is set to WINHTTP_ACCESS_TYPE_NAMED_PROXY. The list can contain wildcard characters.
    /// Do not use an empty string, because the WinHttpOpen function uses it as the proxy bypass list.
    /// If this parameter specifies the "<local>" macro as the only entry, this function bypasses
    /// any host name that does not contain a period. If dwAccessType is not set to WINHTTP_ACCESS_TYPE_NAMED_PROXY,
    /// this parameter must be set to WINHTTP_NO_PROXY_BYPASS.</param>
    /// <param name="dwFlags">Unsigned long integer value that contains the flags that indicate various options
    /// affecting the behavior of this function</param>
    /// <returns>Returns a valid session handle if successful, or NULL otherwise</returns>
    [DllImport("winhttp.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern IntPtr WinHttpOpen(
        string pwszUserAgent,
        int dwAccessType,
        IntPtr pwszProxyName,
        IntPtr pwszProxyBypass,
        int dwFlags);

    /// <summary>
    /// The function closes a single HINTERNET handle
    /// </summary>
    /// <param name="hInternet">Valid HINTERNET handle to be closed.</param>
    /// <returns>Returns TRUE if the handle is successfully closed, or FALSE otherwise</returns>
    [DllImport("winhttp.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern bool WinHttpCloseHandle(IntPtr hInternet);
    #endregion

    /// <summary>
    /// Frees memory allocated with GlobalAlloc, which is how WinHTTP
    /// allocates the strings in WINHTTP_PROXY_INFO.
    /// </summary>
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern IntPtr GlobalFree(IntPtr hMem);
}

// Member of the calling class (e.g. the form that sends the requests).
// Returns the proxy list chosen by the PAC file, or null on error / DIRECT.
private string getProxyForUrlUsingPac(string DestinationUrl, string PacUri)
{
    IntPtr WinHttpSession = Win32Api.WinHttpOpen("User", Win32Api.WINHTTP_ACCESS_TYPE_DEFAULT_PROXY, IntPtr.Zero, IntPtr.Zero, 0);
    if (WinHttpSession == IntPtr.Zero)
    {
        Console.WriteLine("Error: {0}", Marshal.GetLastWin32Error());
        return null;
    }
    try
    {
        Win32Api.WINHTTP_AUTOPROXY_OPTIONS ProxyOptions = new Win32Api.WINHTTP_AUTOPROXY_OPTIONS();
        Win32Api.WINHTTP_PROXY_INFO ProxyInfo = new Win32Api.WINHTTP_PROXY_INFO();
        // Download the PAC file from an explicit URL.
        ProxyOptions.dwFlags = Win32Api.WINHTTP_AUTOPROXY_CONFIG_URL;
        // Must be zero because dwFlags does not include WINHTTP_AUTOPROXY_AUTO_DETECT.
        ProxyOptions.dwAutoDetectFlags = 0;
        ProxyOptions.lpszAutoConfigUrl = PacUri;
        // Get Proxy
        bool IsSuccess = Win32Api.WinHttpGetProxyForUrl(WinHttpSession, DestinationUrl, ref ProxyOptions, ref ProxyInfo);
        if (!IsSuccess)
        {
            // Read the error before any other API call can overwrite it.
            Console.WriteLine("Error: {0}", Marshal.GetLastWin32Error());
            return null;
        }
        // Copy the proxy list into a managed string, then free both WinHTTP strings.
        string proxy = Marshal.PtrToStringUni(ProxyInfo.lpszProxy);
        if (ProxyInfo.lpszProxy != IntPtr.Zero) Win32Api.GlobalFree(ProxyInfo.lpszProxy);
        if (ProxyInfo.lpszProxyBypass != IntPtr.Zero) Win32Api.GlobalFree(ProxyInfo.lpszProxyBypass);
        return proxy;
    }
    finally
    {
        Win32Api.WinHttpCloseHandle(WinHttpSession);
    }
}

Usage, where url is the target URL and pac is the URL of the PAC file: request.Proxy = new WebProxy(getProxyForUrlUsingPac(url, pac));
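`getProxyForUrlUsingPac` returns WinHTTP's proxy server list as a raw string. Passing it straight to `new WebProxy(...)` works when the list holds a single `host:port` entry. The sketch below is a more defensive conversion, under assumptions that are mine and not the original post's: a null or blank result means the PAC chose DIRECT, several entries are separated by `;` or whitespace, and only the first entry is used, with no failover. `PacProxyResult` and `ToWebProxy` are hypothetical names.

```csharp
using System;
using System.Net;

// Hypothetical helper: turns the string from getProxyForUrlUsingPac into an IWebProxy.
public static class PacProxyResult
{
    public static IWebProxy ToWebProxy(string pacResult)
    {
        // Assumption: null/blank means DIRECT. A null Proxy makes HttpWebRequest connect directly.
        if (string.IsNullOrWhiteSpace(pacResult))
            return null;

        // Assumption: entries are separated by ';' or whitespace. Take the first one only.
        string[] entries = pacResult.Split(new[] { ';', ' ', '\t', '\r', '\n' },
                                           StringSplitOptions.RemoveEmptyEntries);
        if (entries.Length == 0)
            return null;

        string first = entries[0];
        // Entries are normally "host:port" without a scheme, so prefix http://.
        if (!first.Contains("://"))
            first = "http://" + first;
        return new WebProxy(new Uri(first));
    }
}
```

Usage: `request.Proxy = PacProxyResult.ToWebProxy(getProxyForUrlUsingPac(url, pac));`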

One thing to watch out for: once a proxy is set on an HttpWebRequest, avoid setting too many HTTP headers, or problems tend to occur.

2. Reading the cookies in a CookieContainer

// Requires: using System; using System.Collections; using System.Net; using System.Reflection;
// "cookie" is the CookieContainer assigned to request.CookieContainer.
// m_domainTable and m_list are private .NET Framework fields; other runtimes may name them differently.
Hashtable table = (Hashtable)cookie.GetType().InvokeMember("m_domainTable",
    BindingFlags.NonPublic | BindingFlags.GetField | BindingFlags.Instance,
    null,
    cookie,
    new object[] { });
foreach (var tableKey in table.Keys)
{
    string str_tableKey = (string)tableKey;
    if (str_tableKey[0] == '.')
    {
        str_tableKey = str_tableKey.Substring(1);
    }
    SortedList list = (SortedList)table[tableKey].GetType().InvokeMember("m_list",
        BindingFlags.NonPublic | BindingFlags.GetField | BindingFlags.Instance,
        null,
        table[tableKey],
        new object[] { });
    foreach (var listKey in list.Keys)
    {
        string uri = "https://" + str_tableKey + (string)listKey;
        foreach (Cookie c in cookie.GetCookies(new Uri(uri)))
        {
            // Read c.Name, c.Value and other properties here. The URI above uses the https
            // scheme; it is easy to adapt it to cover http as well.
        }
    }
}
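On .NET 6 and later there is a public alternative to the reflection approach above: `CookieContainer.GetAllCookies()` returns every cookie in the container, so no private fields are involved. This sketch assumes a .NET 6+ target, where the API exists. It is not available on the .NET Framework that the rest of this post targets. `CookieDump` and `Describe` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper (.NET 6+ only): lists every cookie without reflection.
public static class CookieDump
{
    public static List<string> Describe(CookieContainer container)
    {
        var lines = new List<string>();
        // GetAllCookies() covers all domains and paths, for both http and https cookies.
        foreach (Cookie c in container.GetAllCookies())
        {
            lines.Add($"{c.Domain}{c.Path} {c.Name}={c.Value}");
        }
        return lines;
    }
}
```

Because it needs neither private field names nor a guessed URI scheme, this version is unaffected by the https/http caveat noted in the reflection code.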

  
