C#获取网页信息并存入数据库
1,获取以及商品分类信息
给一网页获取网页上商品信息的分类
using Skay.WebBot;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading;
using System.Windows.Forms;
using Ivony.Html;
using Ivony.Html.Parser;
using System.Data.SqlClient; namespace catchGoods
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
public static Thread th;
private void button1_Click(object sender, EventArgs e)
{
th = new Thread(GetJDData);
th.Start();
}
public void GetJDData()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open(); string str = "http://www.htluxe.com";
HttpUtility http = new HttpUtility();
string html = http.GetHtmlText(str);
var documenthtml = new JumonyParser().Parse(html);
var items = documenthtml.Find(".categroup dl");
foreach(var item in items)
{
string name = item.FindFirst("h4 a").InnerText();
string remarkOdd = item.FindFirst("h4 a").Attribute("href").Value();
string remark = remarkOdd.Split('=')[];
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(name); }));//有线程时listbox添加东西的时候要这么写不然报错谁知道什么鬼(委托?
string into = string.Format("insert into exerciseOneSort (className, remark) values ('" + name + "', '" + remark + "')");
SqlCommand com = new SqlCommand(into, conn);
int i = com.ExecuteNonQuery(); var elements = item.Find("dt p a");
foreach(var element in elements)
{
string nameTwo = element.InnerText();
string url = "http://www.htluxe.com/" + element.Attribute("href").Value();
string intoTwo = string.Format("insert into exerciseTwoSort (className, url, idplus) values ('" + nameTwo + "', '" + url + "', '" + remark + "')");
SqlCommand comTwo = new SqlCommand(intoTwo, conn);
int j = comTwo.ExecuteNonQuery();
}
}
}
}
}
完整版
using Skay.WebBot;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading;
using System.Windows.Forms;
using Ivony.Html;
using Ivony.Html.Parser;
using System.Data.SqlClient;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json; namespace catchGoods
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
public static Thread th;
private void button1_Click(object sender, EventArgs e)
{
th = new Thread(GetJDDataOne);
th.Start();
//SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
//conn.Open();
//string str = string.Format("delete from exerciseTwoSort");
//SqlCommand com = new SqlCommand(str, conn);
//int w = com.ExecuteNonQuery();
}
public void GetJDDataOne()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open(); string str = "http://www.htluxe.com";
HttpUtility http = new HttpUtility();
string html = http.GetHtmlText(str);
var documenthtml = new JumonyParser().Parse(html);
var items = documenthtml.Find(".categroup dl");
foreach(var item in items)
{
string name = item.FindFirst("h4 a").InnerText();
string remarkOdd = item.FindFirst("h4 a").Attribute("href").Value();
string remark = remarkOdd.Split('=')[];
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(name+" "+remark); }));//有线程时listbox添加东西的时候要这么写不然报错谁知道什么鬼
string into = string.Format("insert into exerciseOneSort (className, remark) values ('" + name + "', '" + remark + "')");
SqlCommand com = new SqlCommand(into, conn);
int i = com.ExecuteNonQuery(); var elements = item.Find("dt p a");
foreach(var element in elements)
{
string nameTwo = element.InnerText();
string url = "http://www.htluxe.com/" + element.Attribute("href").Value();
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(nameTwo + " " +url + " " + remark); }));//有线程时listbox添加东西的时候要这么写不然报错谁知道什么鬼
string intoTwo = string.Format("insert into exerciseTwoSort (className, url, idplus) values ('" + nameTwo + "', '" + url + "', '" + remark + "')");
SqlCommand comTwo = new SqlCommand(intoTwo, conn);
int j = comTwo.ExecuteNonQuery();
}
}
}
int page = ;
string surl;
public static Thread th2;
private void button2_Click(object sender, EventArgs e)
{
listBox1.Items.Clear();
th2 = new Thread(threadTwo);
th2.Start();
//SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
//conn.Open();
//string str = string.Format("delete from GoodsList");
//SqlCommand com = new SqlCommand(str, conn);
//int d = com.ExecuteNonQuery();
//MessageBox.Show(Convert.ToString(d));
}
public void threadTwo()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open();
//如果字符串中含有单引号,解决方法1----------------------------------
//string titlestr = "念佛'夜晚访'问欧诺'法";
//string pricestr = "99.00";
//string sqlstr = string.Format("insert into goods (name,price) values (@name,'" + pricestr + "')");
//SqlCommand sqlcom = new SqlCommand(sqlstr, conn);
//sqlcom.Parameters.Add("@name", titlestr);
//sqlcom.ExecuteNonQuery();
//解决方法2-----------------------------------------------------------------------
//string bufffuck = "fdgjjf'fgfgf";
//bufffuck = bufffuck.Replace("'", "''");
//string sqlstr = string.Format("insert into goods (name) values ('"+bufffuck+"')");
//SqlCommand sqlcom = new SqlCommand(sqlstr, conn);
//int y = sqlcom.ExecuteNonQuery(); string sel = string.Format("select url from exerciseTwoSort");
DataTable dt = new DataTable();
SqlDataAdapter dapt = new SqlDataAdapter(sel, conn);
dapt.Fill(dt); for (int i = ; i < dt.Rows.Count; i++)
{
surl = dt.Rows[i][].ToString();
HttpUtility httpTwo = new HttpUtility();
string htmlTwo = httpTwo.GetHtmlText(surl);
var documenthtml = new JumonyParser().Parse(htmlTwo);
var pageto = Convert.ToString(documenthtml.FindFirst(".goods-page-min label").InnerText());
page = Convert.ToInt32(pageto.Split('/')[]);
GetJDData();
}
}
void GetJDData()
{
for (int j = ; j <= page; j++)
{
string htmlTwo = surl + "&price_min=0&price_max=0&page=" + j + "&sort=sort_order%20asc,last_update&order=DESC";
HttpUtility httpMid = new HttpUtility();
string htmlMid = httpMid.GetHtmlText(htmlTwo);
var documenthtmlMid = new JumonyParser().Parse(htmlMid);
var items = documenthtmlMid.Find(".piclist li");
foreach(var item in items)
{
string title = item.FindFirst(".base a").InnerText();
title = title.Replace("'", "''");
//string goodsurl = "http://www.htluxe.com/"+item.FindFirst(".base a").Attribute("href").Value();
//string subhtml = http.GetHtmlText(goodsurl, "utf-8", "text/html; charset=utf-8");
//string Area_Html = http.GetHtmlText(goodsurl.Split('?')[0] + "?act=price&" + goodsurl.Split('?')[1], "utf-8", "text/html;charset=utf-8", "");
try
{
string nowPrice = item.FindFirst(".minprice").InnerText();
string oldPrice = item.FindFirst(".maxprice").InnerText();
string popular = item.FindFirst(".ratecount strong").InnerText();
string sales = item.FindFirst(".soldnum strong").InnerText();
string contents = item.FindFirst(".commentcount strong").InnerText().ToString();
string htmlThree = "http://www.htluxe.com/" + item.FindFirst("dt a").Attribute("href").Value().ToString();
HttpUtility httpThree = new HttpUtility();
string htmlBuff = httpThree.GetHtmlText(htmlThree);
var documenthtmlThree = new JumonyParser().Parse(htmlBuff);
string sben = documenthtmlThree.FindFirst(".promotionMiddleTop p").InnerText().ToString();
string num = sben.Split(':')[]; string starLevel = documenthtmlThree.FindFirst(".m-ratescore i").InnerText().ToString();
bufff(title, nowPrice, oldPrice, popular, sales, num, contents, starLevel);
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(title + " " + nowPrice + " " + num + " " + oldPrice + " " + sales + " " + popular + " " + contents + " " + starLevel); }));
//有线程时listbox添加东西的时候要这么写不然报错谁知道什么鬼
//this.listBox1.Items.Add("");
//listBox1.Items.Add(title + " " + nowPrice + " " + num + " " + oldPrice + " " + sales + " " + popular); }
catch
{
MessageBox.Show("异常");
} } }
}
private static void bufff(string title, string nowPrice, string oldPrice,
string popular, string sales, string num, string contents, string starLevel)
{
SqlConnection conn2 = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn2.Open(); string strstr = string.Format("insert into GoodsList (name, num, sales, popular, starLevel, contents, price, oldPrice) values ('" + title + "', '" +num + "', '" + sales + "', '" + popular + "', '"+starLevel+"', '"+contents+"', '" + nowPrice + "', '" + oldPrice + "')");
SqlCommand com2 = new SqlCommand(strstr, conn2);
int g = com2.ExecuteNonQuery();
}
}
}
C#获取网页信息并存入数据库的更多相关文章
- C# HttpWebRequest 绝技 根据URL地址获取网页信息
如果要使用中间的方法的话,可以访问我的帮助类完全免费开源:C# HttpHelper,帮助类,真正的Httprequest请求时无视编码,无视证书,无视Cookie,网页抓取 1.第一招,根据URL地 ...
- 使用URLConnection获取网页信息的基本流程
参考自core java v2, chapter3 Networking. 注:URLConnection的子类HttpURLConnection被广泛用于Android网络客户端编程,它与apach ...
- 使用URLConnection获取网页信息的基本流程 分类: H1_ANDROID 2013-10-12 23:51 3646人阅读 评论(0) 收藏
参考自core java v2, chapter3 Networking. 注:URLConnection的子类HttpURLConnection被广泛用于Android网络客户端编程,它与apach ...
- C# 获取网页信息
获取网页源码 ///通过HttpWebResponse public string GetUrlHtml(string url) { string strHtml = string.Empty; Ht ...
- Python 爬虫 招聘信息并存入数据库
新学习了selenium,啪一下腾讯招聘 from lxml import etree from selenium import webdriver import pymysql def Geturl ...
- C#获取网页信息核心方法(入门一)
目录:信息采集入门系列目录 下面记录的是我自己整理的C#请求页面核心类,主要有如下几个方法 1.HttpWebRequest Get请求获得页面html 2.HttpWebRequest Post请求 ...
- python爬虫爬取ip记录网站信息并存入数据库
import requests import re import pymysql #10页 仔细观察路由 db = pymysql.connect("localhost",&quo ...
- python获取网页信息的三种方法
import urllib.request import http.cookiejar url = 'http://www.baidu.com/' # 方法一 print('方法一') req_one ...
- 获取网页上数据(图片、文字、视频)-b
Demo地址:http://download.csdn.net/detail/u012881779/8831835 获取网页上所有图片.获取所有html.获取网页title.获取网页内容文字... . ...
随机推荐
- python之网络部分
1.C/S B/S架构 C: client端 B: browse 浏览器 S: server端 C/S架构: 基于客户端与服务端之间的通信 QQ, 游戏,皮皮虾, 快手,抖音. 优点: 个性化 ...
- PostgreSQL-事务与commit优化
基本概念 事务 Transaction 是 数据库管理系统DBMS 执行过程中的一个逻辑单元,是一个 sql命令组成的序列. 其特点在于,当事务被提交DBMS后,DBMS需要确保所有的操作被完成:如果 ...
- 解决 SQLPlus无法登陆oracle,PLSql可以登陆,报错ORA-12560
使用Oracle 11g 64位服务器,安装64位.32位客户端,出现SQLPlus无法连接数据库,PLSql可以连接问题. 网上查了很多,都不能解决问题,在下面提供一种. 环境变量 右击计算机属性- ...
- CentOS下配置Apache HTTPS
一.安装Apache支持SSL/TLS yum install mod_ssl openssl 二.创建证书 证书(Cerificate)的基本作用是将一个公钥和安全个体(个人.公司.组织等)的名字绑 ...
- ES中的查询操作
1.前缀查询 先输入数据: PUT /my_index/address/ { "postcode": "W1 3DG" } PUT /my_index/addr ...
- OneDrive高速下载链接分享
目录 1. 下载帮助 2. 本文地址 3. 资源链接 4. 打赏&支持 5. 关于&联系我 1. 下载帮助 OneDrive下载教程,建议不了解的先看下: https://www.cn ...
- Docker之rm: Device or resource busy
docker 容器里 rm -rf /data 提示: rm: cannot remove ‘/data’: Device or resource busy 原因: 在建立容器的时候做了相应目录的挂载 ...
- N皇后问题 回溯法 C/C++
一:问题描述 N皇后问题(含八皇后问题的拓展,规则同四皇后):在N*N的棋盘上,放置N个皇后,要求每一横行每一列,每一对角线上均只能放置一个皇后,求解可能的方案及方案数. 二:代码及结果如下 #inc ...
- (转) 修改weblogic部署的应用名称
通过weblogic管理后台console进行发布本地项目的时候,它会默认以WEB-INF的上一级目录作为访问路径, 如,假如你的项目WEB-INF目录的上一层是WebRoot,那么发布后, 访问的路 ...
- Min-max theorem
wiki链接