C#获取网页信息并存入数据库
1,获取以及商品分类信息
给一网页获取网页上商品信息的分类
using Skay.WebBot;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading;
using System.Windows.Forms;
using Ivony.Html;
using Ivony.Html.Parser;
using System.Data.SqlClient; namespace catchGoods
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
public static Thread th;
private void button1_Click(object sender, EventArgs e)
{
th = new Thread(GetJDData);
th.Start();
}
public void GetJDData()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open(); string str = "http://www.htluxe.com";
HttpUtility http = new HttpUtility();
string html = http.GetHtmlText(str);
var documenthtml = new JumonyParser().Parse(html);
var items = documenthtml.Find(".categroup dl");
foreach(var item in items)
{
string name = item.FindFirst("h4 a").InnerText();
string remarkOdd = item.FindFirst("h4 a").Attribute("href").Value();
string remark = remarkOdd.Split('=')[];
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(name); }));//有线程时listbox添加东西的时候要这么写不然报错谁知道什么鬼(委托?
string into = string.Format("insert into exerciseOneSort (className, remark) values ('" + name + "', '" + remark + "')");
SqlCommand com = new SqlCommand(into, conn);
int i = com.ExecuteNonQuery(); var elements = item.Find("dt p a");
foreach(var element in elements)
{
string nameTwo = element.InnerText();
string url = "http://www.htluxe.com/" + element.Attribute("href").Value();
string intoTwo = string.Format("insert into exerciseTwoSort (className, url, idplus) values ('" + nameTwo + "', '" + url + "', '" + remark + "')");
SqlCommand comTwo = new SqlCommand(intoTwo, conn);
int j = comTwo.ExecuteNonQuery();
}
}
}
}
}
完整版
using Skay.WebBot;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading;
using System.Windows.Forms;
using Ivony.Html;
using Ivony.Html.Parser;
using System.Data.SqlClient;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json; namespace catchGoods
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
public static Thread th;
private void button1_Click(object sender, EventArgs e)
{
th = new Thread(GetJDDataOne);
th.Start();
//SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
//conn.Open();
//string str = string.Format("delete from exerciseTwoSort");
//SqlCommand com = new SqlCommand(str, conn);
//int w = com.ExecuteNonQuery();
}
public void GetJDDataOne()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open(); string str = "http://www.htluxe.com";
HttpUtility http = new HttpUtility();
string html = http.GetHtmlText(str);
var documenthtml = new JumonyParser().Parse(html);
var items = documenthtml.Find(".categroup dl");
foreach(var item in items)
{
string name = item.FindFirst("h4 a").InnerText();
string remarkOdd = item.FindFirst("h4 a").Attribute("href").Value();
string remark = remarkOdd.Split('=')[];
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(name+" "+remark); }));//有线程时listbox添加东西的时候要这么写不然报错谁知道什么鬼
string into = string.Format("insert into exerciseOneSort (className, remark) values ('" + name + "', '" + remark + "')");
SqlCommand com = new SqlCommand(into, conn);
int i = com.ExecuteNonQuery(); var elements = item.Find("dt p a");
foreach(var element in elements)
{
string nameTwo = element.InnerText();
string url = "http://www.htluxe.com/" + element.Attribute("href").Value();
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(nameTwo + " " +url + " " + remark); }));//有线程时listbox添加东西的时候要这么写不然报错谁知道什么鬼
string intoTwo = string.Format("insert into exerciseTwoSort (className, url, idplus) values ('" + nameTwo + "', '" + url + "', '" + remark + "')");
SqlCommand comTwo = new SqlCommand(intoTwo, conn);
int j = comTwo.ExecuteNonQuery();
}
}
}
int page = ;
string surl;
public static Thread th2;
private void button2_Click(object sender, EventArgs e)
{
listBox1.Items.Clear();
th2 = new Thread(threadTwo);
th2.Start();
//SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
//conn.Open();
//string str = string.Format("delete from GoodsList");
//SqlCommand com = new SqlCommand(str, conn);
//int d = com.ExecuteNonQuery();
//MessageBox.Show(Convert.ToString(d));
}
public void threadTwo()
{
SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn.Open();
//如果字符串中含有单引号,解决方法1----------------------------------
//string titlestr = "念佛'夜晚访'问欧诺'法";
//string pricestr = "99.00";
//string sqlstr = string.Format("insert into goods (name,price) values (@name,'" + pricestr + "')");
//SqlCommand sqlcom = new SqlCommand(sqlstr, conn);
//sqlcom.Parameters.Add("@name", titlestr);
//sqlcom.ExecuteNonQuery();
//解决方法2-----------------------------------------------------------------------
//string bufffuck = "fdgjjf'fgfgf";
//bufffuck = bufffuck.Replace("'", "''");
//string sqlstr = string.Format("insert into goods (name) values ('"+bufffuck+"')");
//SqlCommand sqlcom = new SqlCommand(sqlstr, conn);
//int y = sqlcom.ExecuteNonQuery(); string sel = string.Format("select url from exerciseTwoSort");
DataTable dt = new DataTable();
SqlDataAdapter dapt = new SqlDataAdapter(sel, conn);
dapt.Fill(dt); for (int i = ; i < dt.Rows.Count; i++)
{
surl = dt.Rows[i][].ToString();
HttpUtility httpTwo = new HttpUtility();
string htmlTwo = httpTwo.GetHtmlText(surl);
var documenthtml = new JumonyParser().Parse(htmlTwo);
var pageto = Convert.ToString(documenthtml.FindFirst(".goods-page-min label").InnerText());
page = Convert.ToInt32(pageto.Split('/')[]);
GetJDData();
}
}
void GetJDData()
{
for (int j = ; j <= page; j++)
{
string htmlTwo = surl + "&price_min=0&price_max=0&page=" + j + "&sort=sort_order%20asc,last_update&order=DESC";
HttpUtility httpMid = new HttpUtility();
string htmlMid = httpMid.GetHtmlText(htmlTwo);
var documenthtmlMid = new JumonyParser().Parse(htmlMid);
var items = documenthtmlMid.Find(".piclist li");
foreach(var item in items)
{
string title = item.FindFirst(".base a").InnerText();
title = title.Replace("'", "''");
//string goodsurl = "http://www.htluxe.com/"+item.FindFirst(".base a").Attribute("href").Value();
//string subhtml = http.GetHtmlText(goodsurl, "utf-8", "text/html; charset=utf-8");
//string Area_Html = http.GetHtmlText(goodsurl.Split('?')[0] + "?act=price&" + goodsurl.Split('?')[1], "utf-8", "text/html;charset=utf-8", "");
try
{
string nowPrice = item.FindFirst(".minprice").InnerText();
string oldPrice = item.FindFirst(".maxprice").InnerText();
string popular = item.FindFirst(".ratecount strong").InnerText();
string sales = item.FindFirst(".soldnum strong").InnerText();
string contents = item.FindFirst(".commentcount strong").InnerText().ToString();
string htmlThree = "http://www.htluxe.com/" + item.FindFirst("dt a").Attribute("href").Value().ToString();
HttpUtility httpThree = new HttpUtility();
string htmlBuff = httpThree.GetHtmlText(htmlThree);
var documenthtmlThree = new JumonyParser().Parse(htmlBuff);
string sben = documenthtmlThree.FindFirst(".promotionMiddleTop p").InnerText().ToString();
string num = sben.Split(':')[]; string starLevel = documenthtmlThree.FindFirst(".m-ratescore i").InnerText().ToString();
bufff(title, nowPrice, oldPrice, popular, sales, num, contents, starLevel);
this.Invoke((EventHandler)(delegate
{
listBox1.Items.Add(title + " " + nowPrice + " " + num + " " + oldPrice + " " + sales + " " + popular + " " + contents + " " + starLevel); }));
//有线程时listbox添加东西的时候要这么写不然报错谁知道什么鬼
//this.listBox1.Items.Add("");
//listBox1.Items.Add(title + " " + nowPrice + " " + num + " " + oldPrice + " " + sales + " " + popular); }
catch
{
MessageBox.Show("异常");
} } }
}
private static void bufff(string title, string nowPrice, string oldPrice,
string popular, string sales, string num, string contents, string starLevel)
{
SqlConnection conn2 = new SqlConnection("Data Source=.;Initial Catalog=StuTinafirst;User ID=sa;Password=123456");
conn2.Open(); string strstr = string.Format("insert into GoodsList (name, num, sales, popular, starLevel, contents, price, oldPrice) values ('" + title + "', '" +num + "', '" + sales + "', '" + popular + "', '"+starLevel+"', '"+contents+"', '" + nowPrice + "', '" + oldPrice + "')");
SqlCommand com2 = new SqlCommand(strstr, conn2);
int g = com2.ExecuteNonQuery();
}
}
}
C#获取网页信息并存入数据库的更多相关文章
- C# HttpWebRequest 绝技 根据URL地址获取网页信息
如果要使用中间的方法的话,可以访问我的帮助类完全免费开源:C# HttpHelper,帮助类,真正的Httprequest请求时无视编码,无视证书,无视Cookie,网页抓取 1.第一招,根据URL地 ...
- 使用URLConnection获取网页信息的基本流程
参考自core java v2, chapter3 Networking. 注:URLConnection的子类HttpURLConnection被广泛用于Android网络客户端编程,它与apach ...
- 使用URLConnection获取网页信息的基本流程 分类: H1_ANDROID 2013-10-12 23:51 3646人阅读 评论(0) 收藏
参考自core java v2, chapter3 Networking. 注:URLConnection的子类HttpURLConnection被广泛用于Android网络客户端编程,它与apach ...
- C# 获取网页信息
获取网页源码 ///通过HttpWebResponse public string GetUrlHtml(string url) { string strHtml = string.Empty; Ht ...
- Python 爬虫 招聘信息并存入数据库
新学习了selenium,啪一下腾讯招聘 from lxml import etree from selenium import webdriver import pymysql def Geturl ...
- C#获取网页信息核心方法(入门一)
目录:信息采集入门系列目录 下面记录的是我自己整理的C#请求页面核心类,主要有如下几个方法 1.HttpWebRequest Get请求获得页面html 2.HttpWebRequest Post请求 ...
- python爬虫爬取ip记录网站信息并存入数据库
import requests import re import pymysql #10页 仔细观察路由 db = pymysql.connect("localhost",&quo ...
- python获取网页信息的三种方法
import urllib.request import http.cookiejar url = 'http://www.baidu.com/' # 方法一 print('方法一') req_one ...
- 获取网页上数据(图片、文字、视频)-b
Demo地址:http://download.csdn.net/detail/u012881779/8831835 获取网页上所有图片.获取所有html.获取网页title.获取网页内容文字... . ...
随机推荐
- c语言程序命名规范:函数、变量、数组、文件名
函数: //send or recv data task void send_recv_data(void *pvParameters); //get socket error code. retur ...
- 【7.10校内test】T2不等数列
[题目链接luogu] 此题在luogu上模数是2015,考试题的模数是2012. 然后这道题听说好多人是打表找规律的(就像7.9T2一样)(手动滑稽_gc) 另外手动 sy,每次测试都无意之间bib ...
- java 环境配置及开发工具
1.下载JDK 网址:http://www.oracle.com/technetwork/java/javase/downloads/index.html 2 安装jdk 3.安装好jdk后配置环境变 ...
- selenium之京东商品爬虫
#今日目标 **selenium之京东商品爬虫** 自动打开京东首页,并输入你要搜索的东西,进入界面进行爬取信息 ``` from selenium import webdriver import t ...
- php 图像处理 抠图,生成背景透明png 图片
*自定义一个图片等比缩放函数 *@param string $picname 被缩放图片名 *@param string $path 被缩放图片路径 *@param int $maxWidth 图片被 ...
- PHPStorm开启MiniMap功能
Sublime Text 右侧的代码导航(MiniMap)功能相当好用,PHPStorm第三方插件,可以实现相同功能. 只需要下载 CodeGlance 插件即可,如下操作
- webpack3 打包
1. 基于 webpack 3.0 2.步骤.说明 2.1 webpack 本地初始化.安装基本包 npm init > package.json npm i webpack ...
- cobbler装机系统部署
1.cobbler安装 [root@linux-node1 ~]# cp /etc/cobbler/settings{,.ori} # 备份 # server,Cobbler服务器的IP. sed - ...
- PAT Basic 1081 检查密码 (15 分)
本题要求你帮助某网站的用户注册模块写一个密码合法性检查的小功能.该网站要求用户设置的密码必须由不少于6个字符组成,并且只能有英文字母.数字和小数点 .,还必须既有字母也有数字. 输入格式: 输入第一行 ...
- Eureka实现高可用及为Eureka设置登录账号和密码
本文通过两个eureka相互注册实现注册中心的高可用,同时为注册中心配置认证登录. 需要用到的maven配置 <dependency> <groupId>org.springf ...