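Fetching web page information with C# and storing it in a database
1. Fetching the product category information
Given a web page, extract the categories of the products listed on it.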
using Skay.WebBot;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading;
using System.Windows.Forms;
using Ivony.Html;
using Ivony.Html.Parser;
using System.Data.SqlClient;

namespace catchGoods
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
public static Thread th;
private void button1_Click(object sender, EventArgs e)
{
th = new Thread(GetJDData);
th.Start();
}
public void GetJDData()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open();
string str = "http://www.htluxe.com";
HttpUtility http = new HttpUtility();
string html = http.GetHtmlText(str);
var documenthtml = new JumonyParser().Parse(html);
var items = documenthtml.Find(".categroup dl");
foreach(var item in items)
{
string name = item.FindFirst("h4 a").InnerText();
string remarkOdd = item.FindFirst("h4 a").Attribute("href").Value();
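// The index was lost in the original post; [1] assumes the href ends in "...=<id>" and takes the part after '='.
string remark = remarkOdd.Split('=')[1];
// WinForms controls may only be touched from the UI thread. This method runs on a worker thread,
// so the ListBox update is marshalled through Invoke with a delegate; adding the item directly raises a cross-thread error.
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(name);
}));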
string into = "insert into exerciseOneSort (className, remark) values ('" + name + "', '" + remark + "')";
SqlCommand com = new SqlCommand(into, conn);
int i = com.ExecuteNonQuery();
var elements = item.Find("dt p a");
foreach(var element in elements)
{
string nameTwo = element.InnerText();
string url = "http://www.htluxe.com/" + element.Attribute("href").Value();
string intoTwo = "insert into exerciseTwoSort (className, url, idplus) values ('" + nameTwo + "', '" + url + "', '" + remark + "')";
SqlCommand comTwo = new SqlCommand(intoTwo, conn);
int j = comTwo.ExecuteNonQuery();
}
}
}
}
}
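The inserts above build SQL by concatenating scraped text, so a category name containing a single quote breaks the statement. Below is a minimal sketch of the same insert written with parameters instead. The class and method names (`CategoryInsertSketch`, `BuildInsert`) are made up for illustration and are not part of the original code, and the sketch assumes both columns hold text.

```csharp
using System;
using System.Data.SqlClient;

static class CategoryInsertSketch
{
    // Hypothetical helper (not in the original post): builds the insert for
    // exerciseOneSort with parameters rather than string concatenation.
    // Assumes className and remark are both text columns.
    public static SqlCommand BuildInsert(SqlConnection conn, string className, string remark)
    {
        var cmd = new SqlCommand(
            "insert into exerciseOneSort (className, remark) values (@className, @remark)",
            conn);
        // The values travel separately from the SQL text, so quotes need no escaping.
        cmd.Parameters.AddWithValue("@className", className);
        cmd.Parameters.AddWithValue("@remark", remark);
        return cmd;
    }
}
```

Inside the loop it would be used roughly as `using (var cmd = CategoryInsertSketch.BuildInsert(conn, name, remark)) { cmd.ExecuteNonQuery(); }`.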
Full version
using Skay.WebBot;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading;
using System.Windows.Forms;
using Ivony.Html;
using Ivony.Html.Parser;
using System.Data.SqlClient;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json;

namespace catchGoods
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
public static Thread th;
private void button1_Click(object sender, EventArgs e)
{
th = new Thread(GetJDDataOne);
th.Start();
//SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
//conn.Open();
//string str = string.Format("delete from exerciseTwoSort");
//SqlCommand com = new SqlCommand(str, conn);
//int w = com.ExecuteNonQuery();
}
public void GetJDDataOne()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open();
string str = "http://www.htluxe.com";
HttpUtility http = new HttpUtility();
string html = http.GetHtmlText(str);
var documenthtml = new JumonyParser().Parse(html);
var items = documenthtml.Find(".categroup dl");
foreach(var item in items)
{
string name = item.FindFirst("h4 a").InnerText();
string remarkOdd = item.FindFirst("h4 a").Attribute("href").Value();
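// The index was lost in the original post; [1] assumes the href ends in "...=<id>" and takes the part after '='.
string remark = remarkOdd.Split('=')[1];
// The ListBox belongs to the UI thread, so the worker thread updates it through Invoke with a delegate;
// adding the item directly raises a cross-thread error.
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(name + " " + remark);
}));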
string into = "insert into exerciseOneSort (className, remark) values ('" + name + "', '" + remark + "')";
SqlCommand com = new SqlCommand(into, conn);
int i = com.ExecuteNonQuery();
var elements = item.Find("dt p a");
foreach(var element in elements)
{
string nameTwo = element.InnerText();
string url = "http://www.htluxe.com/" + element.Attribute("href").Value();
// Same as above: the ListBox update has to go through Invoke because this code runs on a worker thread.
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(nameTwo + " " + url + " " + remark);
}));
string intoTwo = "insert into exerciseTwoSort (className, url, idplus) values ('" + nameTwo + "', '" + url + "', '" + remark + "')";
SqlCommand comTwo = new SqlCommand(intoTwo, conn);
int j = comTwo.ExecuteNonQuery();
}
}
}
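int page = 0; // the initial value was lost in the original post; threadTwo overwrites it before GetJDData reads it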
string surl;
public static Thread th2;
private void button2_Click(object sender, EventArgs e)
{
listBox1.Items.Clear();
th2 = new Thread(threadTwo);
th2.Start();
//SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
//conn.Open();
//string str = string.Format("delete from GoodsList");
//SqlCommand com = new SqlCommand(str, conn);
//int d = com.ExecuteNonQuery();
//MessageBox.Show(Convert.ToString(d));
}
public void threadTwo()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open();
// If a string contains single quotes, fix 1: pass it as a parameter ----------------------------------
//string titlestr = "念佛'夜晚访'问欧诺'法";
//string pricestr = "99.00";
//string sqlstr = string.Format("insert into goods (name,price) values (@name,'" + pricestr + "')");
//SqlCommand sqlcom = new SqlCommand(sqlstr, conn);
//sqlcom.Parameters.AddWithValue("@name", titlestr);
//sqlcom.ExecuteNonQuery();
// Fix 2: escape each single quote by doubling it ------------------------------------------------------
//string bufffuck = "fdgjjf'fgfgf";
//bufffuck = bufffuck.Replace("'", "''");
//string sqlstr = string.Format("insert into goods (name) values ('"+bufffuck+"')");
//SqlCommand sqlcom = new SqlCommand(sqlstr, conn);
//int y = sqlcom.ExecuteNonQuery();
string sel = "select url from exerciseTwoSort";
DataTable dt = new DataTable();
SqlDataAdapter dapt = new SqlDataAdapter(sel, conn);
dapt.Fill(dt);
for (int i = 0; i < dt.Rows.Count; i++)
{
// The query selects a single column (url), so it sits at index 0.
surl = dt.Rows[i][0].ToString();
HttpUtility httpTwo = new HttpUtility();
string htmlTwo = httpTwo.GetHtmlText(surl);
var documenthtml = new JumonyParser().Parse(htmlTwo);
var pageto = Convert.ToString(documenthtml.FindFirst(".goods-page-min label").InnerText());
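// The index was lost in the original post; [1] assumes the label reads "current/total", for example "1/5".
page = Convert.ToInt32(pageto.Split('/')[1]);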
GetJDData();
}
}
void GetJDData()
{
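// The start value was lost in the original post; 1 assumes the site numbers its pages from 1, matching the j <= page bound.
for (int j = 1; j <= page; j++)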
{
string htmlTwo = surl + "&price_min=0&price_max=0&page=" + j + "&sort=sort_order%20asc,last_update&order=DESC";
HttpUtility httpMid = new HttpUtility();
string htmlMid = httpMid.GetHtmlText(htmlTwo);
var documenthtmlMid = new JumonyParser().Parse(htmlMid);
var items = documenthtmlMid.Find(".piclist li");
foreach(var item in items)
{
string title = item.FindFirst(".base a").InnerText();
title = title.Replace("'", "''");
//string goodsurl = "http://www.htluxe.com/"+item.FindFirst(".base a").Attribute("href").Value();
//string subhtml = http.GetHtmlText(goodsurl, "utf-8", "text/html; charset=utf-8");
//string Area_Html = http.GetHtmlText(goodsurl.Split('?')[0] + "?act=price&" + goodsurl.Split('?')[1], "utf-8", "text/html;charset=utf-8", "");
try
{
string nowPrice = item.FindFirst(".minprice").InnerText();
string oldPrice = item.FindFirst(".maxprice").InnerText();
string popular = item.FindFirst(".ratecount strong").InnerText();
string sales = item.FindFirst(".soldnum strong").InnerText();
string contents = item.FindFirst(".commentcount strong").InnerText().ToString();
string htmlThree = "http://www.htluxe.com/" + item.FindFirst("dt a").Attribute("href").Value().ToString();
HttpUtility httpThree = new HttpUtility();
string htmlBuff = httpThree.GetHtmlText(htmlThree);
var documenthtmlThree = new JumonyParser().Parse(htmlBuff);
string sben = documenthtmlThree.FindFirst(".promotionMiddleTop p").InnerText().ToString();
// The index was lost in the original post; [1] assumes the text has the form "label:value".
string num = sben.Split(':')[1];
string starLevel = documenthtmlThree.FindFirst(".m-ratescore i").InnerText().ToString();
bufff(title, nowPrice, oldPrice, popular, sales, num, contents, starLevel);
// The ListBox update goes through Invoke because this code runs on a worker thread.
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(title + " " + nowPrice + " " + num + " " + oldPrice + " " + sales + " " + popular + " " + contents + " " + starLevel);
}));
//this.listBox1.Items.Add("");
//listBox1.Items.Add(title + " " + nowPrice + " " + num + " " + oldPrice + " " + sales + " " + popular);
}
catch
{
MessageBox.Show("Exception");
}
}
}
}
private static void bufff(string title, string nowPrice, string oldPrice,
string popular, string sales, string num, string contents, string starLevel)
{
SqlConnection conn2 = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn2.Open();
string strstr = "insert into GoodsList (name, num, sales, popular, starLevel, contents, price, oldPrice) values ('" + title + "', '" + num + "', '" + sales + "', '" + popular + "', '" + starLevel + "', '" + contents + "', '" + nowPrice + "', '" + oldPrice + "')";
SqlCommand com2 = new SqlCommand(strstr, conn2);
int g = com2.ExecuteNonQuery();
}
}
}
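The full version pulls two values out of scraped text with `Split`: the category id from an href and the page count from the pager label. Both calls throw if the page layout differs from what is expected. Here is a rough sketch of more forgiving helpers. The class and method names are invented for illustration, and the input shapes (`category.php?id=12`, `1/5`) are assumptions about the site, not something the original code confirms. Note that `IdFromHref` takes the text after the last `=`, which differs from `Split('=')[1]` when the href contains more than one `=`.

```csharp
using System;

static class ScrapeParsing
{
    // Hypothetical helper: returns the text after the last '=' in an href
    // such as "category.php?id=12" (assumed form), or null when there is no '='.
    public static string IdFromHref(string href)
    {
        if (string.IsNullOrEmpty(href)) return null;
        int pos = href.LastIndexOf('=');
        return pos >= 0 ? href.Substring(pos + 1) : null;
    }

    // Hypothetical helper: reads the total page count from a label of the
    // assumed form "current/total" (for example "1/5"). Falls back to 1
    // when the label is missing or does not match that form.
    public static int TotalPages(string label)
    {
        if (string.IsNullOrEmpty(label)) return 1;
        string[] parts = label.Split('/');
        int total;
        if (parts.Length == 2 && int.TryParse(parts[1].Trim(), out total) && total > 0)
            return total;
        return 1;
    }
}
```

With these, `page = ScrapeParsing.TotalPages(pageto);` would replace the `Convert.ToInt32` call, and a listing with an unexpected pager would be read as a single page instead of stopping the thread with an exception.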