Sending HTTP Requests from a C# Web Crawler

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public class HttpControler
    {
        // Encoding used to decode the response body; change it to match the target site.
        private Encoding m_Encoding = Encoding.GetEncoding("gb2312");

        // Sends a POST request and returns the response body, or "NoUrl" if the request fails.
        public string Request(string strUrl, string postStr)
        {
            HttpWebRequest tHWRq = (HttpWebRequest)WebRequest.Create(strUrl);
            tHWRq.CookieContainer = new CookieContainer();
            CookieContainer cookie = tHWRq.CookieContainer; // Remove if cookies are not needed.
            // Optional HTTP headers. Referer matters most: some sites check it to block hotlinking.
            tHWRq.Referer = "http://www.cninfo.com.cn/cninfo-new/announcement/show";
            tHWRq.Accept = "application/json, text/javascript, */*; q=0.01";
            tHWRq.Headers["Accept-Language"] = "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3";
            //tHWRq.Headers["Accept-Charset"] = "GBK,utf-8;q=0.7,*;q=0.3";
            // Sends "Accept-Encoding: gzip, deflate" and decompresses the response automatically.
            // Setting the header by hand would leave the body compressed when it is read below.
            tHWRq.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
            // The value must not repeat the "User-Agent:" header name.
            tHWRq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0";
            tHWRq.KeepAlive = true;
            // The headers above depend on the site, but the next two are required for a form POST.
            tHWRq.ContentType = "application/x-www-form-urlencoded; charset=UTF-8";
            tHWRq.Method = "POST";
            // Timeout in milliseconds. The original value was lost; 30 seconds is an assumed placeholder.
            tHWRq.Timeout = 30 * 1000;
            Encoding encoding = Encoding.UTF8; // Encoding of the request body; adjust to the site.
            // postStr is the data to send, in the same format as described in the previous post.
            byte[] postData = encoding.GetBytes(postStr);
            try
            {
                tHWRq.ContentLength = postData.Length;
                using (Stream requestStream = tHWRq.GetRequestStream())
                {
                    requestStream.Write(postData, 0, postData.Length);
                }
                using (HttpWebResponse tHWRp = (HttpWebResponse)tHWRq.GetResponse())
                using (Stream tStreamRp = tHWRp.GetResponseStream())
                using (StreamReader tSR = new StreamReader(tStreamRp, m_Encoding))
                {
                    return tSR.ReadToEnd(); // Content returned by the server.
                }
            }
            catch (Exception)
            {
                tHWRq.Abort();
                return "NoUrl";
            }
        }

        // Sends a GET request, extracts the "code" values from the response,
        // and writes them to the file at path, one per line. Returns false on failure.
        public bool RequestCode(string strUrl, string path)
        {
            HttpWebRequest tHWRq = (HttpWebRequest)WebRequest.Create(strUrl);
            tHWRq.CookieContainer = new CookieContainer();
            CookieContainer cookie = tHWRq.CookieContainer; // Remove if cookies are not needed.
            // Optional HTTP headers. Referer matters most: some sites check it to block hotlinking.
            tHWRq.Referer = "http://www.cninfo.com.cn/cninfo-new/announcement/show";
            tHWRq.Accept = "application/json, text/javascript, */*; q=0.01";
            tHWRq.Headers["Accept-Language"] = "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3";
            tHWRq.Headers["Accept-Charset"] = "GBK,utf-8;q=0.7,*;q=0.3";
            // The value must not repeat the "User-Agent:" header name.
            tHWRq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0";
            tHWRq.KeepAlive = true;
            // A GET request has no body, so no Content-Type is set; only the method is required.
            tHWRq.Method = "GET";
            // Timeout in milliseconds. The original value was lost; 30 seconds is an assumed placeholder.
            tHWRq.Timeout = 30 * 1000;
            string result = null;
            try
            {
                using (HttpWebResponse tHWRp = (HttpWebResponse)tHWRq.GetResponse())
                using (Stream tStreamRp = tHWRp.GetResponseStream())
                using (StreamReader tSR = new StreamReader(tStreamRp))
                {
                    result = tSR.ReadToEnd();
                }
                // Use a regular expression to pick out the wanted content:
                // "code":"<six or more digits>".
                string patternCode = "\"code\":\"\\d{6,}\"";
                List<string> lstCode = new List<string>();
                Regex rgxUrl = new Regex(patternCode, RegexOptions.IgnoreCase);
                MatchCollection matches = rgxUrl.Matches(result);
                if (matches.Count > 0)
                {
                    foreach (Match matPage in matches)
                    {
                        string codeItem = matPage.Value;
                        if (!string.IsNullOrEmpty(codeItem))
                        {
                            // Keep everything after the colon. The offset was lost in the source;
                            // + 1 is assumed, which keeps the surrounding quotes.
                            string code = codeItem.Substring(codeItem.IndexOf(":") + 1);
                            lstCode.Add(code);
                        }
                    }
                }
                using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
                using (StreamWriter sw = new StreamWriter(fs))
                {
                    foreach (string code in lstCode)
                    {
                        sw.WriteLine(code);
                    }
                }
                return true;
            }
            catch (Exception)
            {
                tHWRq.Abort();
                return false;
            }
        }

    }
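
The regular-expression step in RequestCode can be tried on its own, without any network access. The following is a minimal sketch, not part of the original class: the `CodeExtractorDemo` type, the `ExtractCodes` helper, and the sample JSON string are all made up for illustration. It uses a capture group so that only the digits are kept, which avoids the Substring/IndexOf arithmetic.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CodeExtractorDemo
{
    // Hypothetical helper: returns the digits of every "code":"dddddd" pair in the text.
    public static List<string> ExtractCodes(string json)
    {
        var codes = new List<string>();
        // Group 1 captures only the digits, so no quotes or colon end up in the result.
        foreach (Match m in Regex.Matches(json, "\"code\":\"(\\d{6,})\""))
        {
            codes.Add(m.Groups[1].Value);
        }
        return codes;
    }

    public static void Main()
    {
        // Made-up sample input shaped like the response the pattern expects.
        string sample = "[{\"code\":\"000001\",\"name\":\"A\"},{\"code\":\"600000\",\"name\":\"B\"}]";
        foreach (string code in ExtractCodes(sample))
        {
            Console.WriteLine(code); // prints 000001, then 600000
        }
    }
}
```

If the quotes around each code are actually wanted in the output file, the original Substring approach can be kept instead.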
