Follow the WeChat official account FocusBI for more articles; join QQ group 808774277 to get study materials and discuss questions.

  

  PDF download for the Business Intelligence Tutorial (《商业智能教程》):

  Link: https://pan.baidu.com/s/1f9VdZUXztwylkOdFLbcmWw  Password: 2r4v

  When implementing business intelligence for an enterprise, most projects model and visualize internal data only. Very few companies used to employ crawler engineers to prepare external data, but over the past year Python web crawling has become extremely popular, and companies have started hiring crawler engineers to enrich their data sources.

I have used Python to scrape data from a number of sites, such as Meituan, Dianping, Yimutian (一亩田), and rental listings. None of that data was used commercially; I crawled it out of personal interest and for practice. Here I take Yimutian as an example and use the Scrapy framework to crawl its data.

Yimutian (一亩田)

Yimutian is an agricultural products website that aggregates origin and market quotes for most of China's agricultural produce. It was founded by people from the Baidu camp, and in its early days it hired large numbers of field agents to go into the countryside to collect information and teach farmers to publish their product listings on the site.

Yimutian started out as a website, but heavy crawler traffic, plus the fact that farmers working outdoors found the web version inconvenient, led it to switch to an app and retire the original web version; the app's anti-crawling defenses are very strong. However, Yimutian still keeps web pages for origin quotes (产地行情) and market quotes (市场行情), which carry a large amount of information, so I chose to crawl the origin quote data.

I crawl Yimutian with the Scrapy framework. I won't cover how the framework works or walk through a basic demo here; instead I go straight to the analysis approach and the source code for the Yimutian crawler.

Analysis approach for the Yimutian crawler

First, open the Yimutian origin quotes page at http://hangqing.ymt.com/chandi, where the agricultural product categories are listed.

Click the fruit category and you will see the many sub-categories under it. Click "pear" to open the pear quote page, which lists all of its varieties plus a region selector; pick a province and you can see that day's quotes and the one-month trend.

Having followed this chain of pages, I crawl the data along exactly the same path.

Yimutian crawler source code

1. First, create a Spider
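
A minimal sketch of this step, assuming the project is created with the standard scrapy startproject mySpider and scrapy genspider hangqing hangqing.ymt.com commands (the project name mySpider is inferred from the imports used later). The generated skeleton looks roughly like this before it is filled in:

    # Skeleton that "scrapy genspider hangqing hangqing.ymt.com" generates inside the
    # mySpider project; it is then filled in as hangqing.py below.
    import scrapy


    class HangqingSpider(scrapy.Spider):
        name = "hangqing"
        allowed_domains = ["hangqing.ymt.com"]
        start_urls = ["http://hangqing.ymt.com/"]

        def parse(self, response):
            pass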

2. Quote data

Crawl the top-level, mid-level, and small categories and the varieties: hangqing.py

    # -*- coding: utf-8 -*-
    import scrapy
    from mySpider.items import MyspiderItem
    from copy import deepcopy


    class HangqingSpider(scrapy.Spider):
        name = "hangqing"
        allowed_domains = ["hangqing.ymt.com"]
        start_urls = (
            'http://hangqing.ymt.com/',
        )

        # Top-level categories
        def parse(self, response):
            a_list = response.xpath("//div[@id='purchase_wrapper']/div//a[@class='hide']")

            for a in a_list:
                items = MyspiderItem()
                items["ymt_bigsort_href"] = a.xpath("./@href").extract_first()
                items["ymt_bigsort_id"] = items["ymt_bigsort_href"].replace(
                    "http://hangqing.ymt.com/common/nav_chandi_", "")
                items["ymt_bigsort_name"] = a.xpath("./text()").extract_first()

                # Request the category's detail page
                yield scrapy.Request(
                    items["ymt_bigsort_href"],
                    callback=self.parse_medium_detail,
                    meta={"item": deepcopy(items)}
                )

        # Mid-level categories (the small categories live on the same page)
        def parse_medium_detail(self, response):
            items = response.meta["item"]
            li_list = response.xpath("//div[@class='cate_nav_wrap']//a")
            for li in li_list:
                items["ymt_mediumsort_id"] = li.xpath("./@data-id").extract_first()
                items["ymt_mediumsort_name"] = li.xpath("./text()").extract_first()
                yield scrapy.Request(
                    items["ymt_bigsort_href"],
                    callback=self.parse_small_detail,
                    meta={"item": deepcopy(items)},
                    dont_filter=True  # the same URL is requested once per mid-level category
                )

        # Small categories
        def parse_small_detail(self, response):
            item = response.meta["item"]
            mediumsort_id = item["ymt_mediumsort_id"]
            if int(mediumsort_id) > 0:
                nav_product_id = "nav-product-" + mediumsort_id
                a_list = response.xpath(
                    "//div[@class='cate_content_1']//div[contains(@class,'{}')]//ul//a".format(nav_product_id))
                for a in a_list:
                    item["ymt_smallsort_id"] = a.xpath("./@data-id").extract_first()
                    item["ymt_smallsort_href"] = a.xpath("./@href").extract_first()
                    item["ymt_smallsort_name"] = a.xpath("./text()").extract_first()
                    yield scrapy.Request(
                        item["ymt_smallsort_href"],
                        callback=self.parse_variety_detail,
                        meta={"item": deepcopy(item)}
                    )

        # Varieties
        def parse_variety_detail(self, response):
            item = response.meta["item"]
            li_list = response.xpath("//ul[@class='all_cate clearfix']//li")
            if len(li_list) > 0:
                for li in li_list:
                    item["ymt_breed_href"] = li.xpath("./a/@href").extract_first()
                    item["ymt_breed_name"] = li.xpath("./a/text()").extract_first()
                    item["ymt_breed_id"] = item["ymt_breed_href"].split("_")[2]

                    yield item
            else:
                item["ymt_breed_href"] = ""
                item["ymt_breed_name"] = ""
                item["ymt_breed_id"] = -1

                yield item
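
This spider can be tried on its own with the scrapy crawl hangqing command, or from a short script; a minimal sketch using Scrapy's CrawlerProcess:

    # Run the hangqing spider from a script, reusing the project settings
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl("hangqing")   # the spider is looked up by its name attribute
    process.start()             # blocks until the crawl finishes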

3. Producing-area data

Crawl provinces, cities, and counties: chandi.py

    # -*- coding: utf-8 -*-
    import scrapy
    from mySpider.items import MyspiderChanDi
    from copy import deepcopy


    class ChandiSpider(scrapy.Spider):
        name = 'chandi'
        allowed_domains = ['hangqing.ymt.com']
        start_urls = ['http://hangqing.ymt.com/chandi_8031_0_0']

        # Provinces
        def parse(self, response):
            # Producing-area list
            li_list = response.xpath("//div[@class='fl sku_name']/ul//li")
            for li in li_list:
                items = MyspiderChanDi()
                items["ymt_province_href"] = li.xpath("./a/@href").extract_first()
                items["ymt_province_id"] = items["ymt_province_href"].split("_")[-1]
                items["ymt_province_name"] = li.xpath("./a/text()").extract_first()

                yield scrapy.Request(
                    items["ymt_province_href"],
                    callback=self.parse_city_detail,
                    meta={"item": deepcopy(items)}
                )

        # Cities
        def parse_city_detail(self, response):
            item = response.meta["item"]
            option = response.xpath("//select[@class='location_select'][1]//option")

            if len(option) > 0:
                for op in option:
                    name = op.xpath("./text()").extract_first()
                    if name != "全部":
                        item["ymt_city_name"] = name
                        item["ymt_city_href"] = op.xpath("./@data-url").extract_first()
                        item["ymt_city_id"] = item["ymt_city_href"].split("_")[-1]
                        yield scrapy.Request(
                            item["ymt_city_href"],
                            callback=self.parse_area_detail,
                            meta={"item": deepcopy(item)}
                        )
            else:
                # No city list for this province: record empty city/county fields and
                # emit the item directly (an empty URL cannot be requested)
                item["ymt_city_name"] = ""
                item["ymt_city_href"] = ""
                item["ymt_city_id"] = 0
                item["ymt_area_name"] = ""
                item["ymt_area_href"] = ""
                item["ymt_area_id"] = 0
                yield item

        # Counties
        def parse_area_detail(self, response):
            item = response.meta["item"]
            area_list = response.xpath("//select[@class='location_select'][2]//option")

            if len(area_list) > 0:
                for area in area_list:
                    name = area.xpath("./text()").extract_first()
                    if name != "全部":
                        item["ymt_area_name"] = name
                        item["ymt_area_href"] = area.xpath("./@data-url").extract_first()
                        item["ymt_area_id"] = item["ymt_area_href"].split("_")[-1]
                        yield item
            else:
                item["ymt_area_name"] = ""
                item["ymt_area_href"] = ""
                item["ymt_area_id"] = 0
                yield item

4. Price distribution by location

  location_char.py

    # -*- coding: utf-8 -*-
    import scrapy
    import pymysql
    import json
    from copy import deepcopy
    from mySpider.items import MySpiderSmallProvincePrice
    import datetime


    class LocationCharSpider(scrapy.Spider):
        name = 'location_char'
        allowed_domains = ['hangqing.ymt.com']
        start_urls = ['http://hangqing.ymt.com/']

        i = datetime.datetime.now()
        dateKey = str(i.year) + str(i.month) + str(i.day)
        db = pymysql.connect(
            host="127.0.0.1", port=3306,
            user='root', password='mysql',
            db='ymt_db', charset='utf8'
        )

        def parse(self, response):
            cur = self.db.cursor()
            # Small categories that already have today's average price
            location_char_sql = "select small_id from ymt_price_small where dateKey = {} and day_avg_price > 0".format(self.dateKey)

            cur.execute(location_char_sql)
            location_chars = cur.fetchall()
            for ch in location_chars:
                item = MySpiderSmallProvincePrice()
                item["small_id"] = ch[0]
                location_char_url = "http://hangqing.ymt.com/chandi/location_charts"
                small_id = str(item["small_id"])
                form_data = {
                    "locationId": "",
                    "productId": small_id,
                    "breedId": ""
                }
                yield scrapy.FormRequest(
                    location_char_url,
                    formdata=form_data,
                    callback=self.location_char,
                    meta={"item": deepcopy(item)}
                )

        # Province-level distribution for a small category
        def location_char(self, response):
            item = response.meta["item"]

            html_str = json.loads(response.text)
            status = html_str["status"]
            if status == 0:
                item["unit"] = html_str["data"]["unit"]
                item["dateKey"] = self.dateKey
                dataList = html_str["data"]["dataList"]
                for data in dataList:
                    if isinstance(data, list):
                        item["province_name"] = data[0]
                        item["province_price"] = data[1]
                    elif isinstance(data, dict):
                        item["province_name"] = data["name"]
                        item["province_price"] = data["y"]

                    location_char_url = "http://hangqing.ymt.com/chandi/location_charts"
                    small_id = str(item["small_id"])
                    province_name = str(item["province_name"])
                    province_id_sql = "select province_id from ymt_1_dim_cdProvince where province_name = \"{}\" ".format(province_name)
                    cur = self.db.cursor()
                    cur.execute(province_id_sql)
                    province_id = cur.fetchone()

                    item["province_id"] = province_id[0]

                    province_id = str(province_id[0])
                    form_data = {
                        "locationId": province_id,
                        "productId": small_id,
                        "breedId": ""
                    }
                    yield scrapy.FormRequest(
                        location_char_url,
                        formdata=form_data,
                        callback=self.location_char_province,
                        meta={"item": deepcopy(item)}
                    )

        # City-level distribution within a province
        def location_char_province(self, response):
            item = response.meta["item"]

            html_str = json.loads(response.text)
            status = html_str["status"]

            if status == 0:
                dataList = html_str["data"]["dataList"]
                for data in dataList:
                    if isinstance(data, list):
                        item["city_name"] = data[0]
                        item["city_price"] = data[1]
                    elif isinstance(data, dict):
                        item["city_name"] = data["name"]
                        item["city_price"] = data["y"]

                    location_char_url = "http://hangqing.ymt.com/chandi/location_charts"
                    small_id = str(item["small_id"])
                    city_name = str(item["city_name"])
                    city_id_sql = "select city_id from ymt_1_dim_cdCity where city_name = \"{}\" ".format(city_name)
                    cur = self.db.cursor()
                    cur.execute(city_id_sql)
                    city_id = cur.fetchone()

                    item["city_id"] = city_id[0]

                    city_id = str(city_id[0])
                    form_data = {
                        "locationId": city_id,
                        "productId": small_id,
                        "breedId": ""
                    }
                    yield scrapy.FormRequest(
                        location_char_url,
                        formdata=form_data,
                        callback=self.location_char_province_city,
                        meta={"item": deepcopy(item)}
                    )

        # County-level distribution within a city, then one request per variety
        def location_char_province_city(self, response):
            item = response.meta["item"]

            html_str = json.loads(response.text)
            status = html_str["status"]

            if status == 0:
                dataList = html_str["data"]["dataList"]
                for data in dataList:
                    if isinstance(data, list):
                        item["area_name"] = data[0]
                        item["area_price"] = data[1]
                    elif isinstance(data, dict):
                        item["area_name"] = data["name"]
                        item["area_price"] = data["y"]
                    area_name = item["area_name"]
                    area_id_sql = "select area_id from ymt_1_dim_cdArea where area_name = \"{}\" ".format(area_name)
                    cur1 = self.db.cursor()
                    cur1.execute(area_id_sql)
                    area_id = cur1.fetchone()

                    item["area_id"] = area_id[0]

                    breed_id_sql = "select breed_id from ymt_all_info_sort where small_id = {} and breed_id > 0".format(item["small_id"])
                    cur1.execute(breed_id_sql)
                    breed_ids = cur1.fetchall()
                    location_char_url = "http://hangqing.ymt.com/chandi/location_charts"
                    for breed_id in breed_ids:
                        item["breed_id"] = breed_id[0]
                        form_data = {
                            "locationId": str(item["city_id"]),
                            "productId": str(item["small_id"]),
                            "breedId": str(breed_id[0])
                        }
                        yield scrapy.FormRequest(
                            location_char_url,
                            formdata=form_data,
                            callback=self.location_char_province_city_breed,
                            meta={"item": deepcopy(item)}
                        )

        # Per-variety distribution: this callback finally yields the item
        def location_char_province_city_breed(self, response):
            item = response.meta["item"]

            html_str = json.loads(response.text)
            status = html_str["status"]

            if status == 0:
                dataList = html_str["data"]["dataList"]
                for data in dataList:
                    if isinstance(data, list):
                        item["breed_city_name"] = data[0]
                        item["breed_city_price"] = data[1]
                    elif isinstance(data, dict):
                        item["breed_city_name"] = data["name"]
                        item["breed_city_price"] = data["y"]

                    yield item
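
For reference, here is a small standalone probe of the POST endpoint that location_char.py keeps calling. The form fields and the two possible shapes of dataList (a [name, price] pair or a dict with name and y keys) are assumptions inferred from the parsing code above rather than documented API behavior, and the productId value is a made-up example:

    # Standalone probe of the location_charts endpoint (requires the requests package)
    import requests

    resp = requests.post(
        "http://hangqing.ymt.com/chandi/location_charts",
        data={"locationId": "", "productId": "12345", "breedId": ""},  # hypothetical small_id
    )
    payload = resp.json()
    # Assumed shape: {"status": 0, "data": {"unit": "...", "dataList": [...]}} where each
    # dataList element is either a [name, price] pair or {"name": ..., "y": ...}
    if payload.get("status") == 0:
        for entry in payload["data"]["dataList"]:
            if isinstance(entry, list):
                name, price = entry[0], entry[1]
            else:
                name, price = entry["name"], entry["y"]
            print(name, price)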

5. Price trend

  pricedata.py

    # -*- coding: utf-8 -*-
    import scrapy
    import pymysql.cursors
    from copy import deepcopy
    from mySpider.items import MySpiderSmallprice

    import datetime


    class PricedataSpider(scrapy.Spider):
        name = 'pricedata'
        allowed_domains = ['hangqing.ymt.com']
        start_urls = ['http://hangqing.ymt.com/chandi_8031_0_0']
        i = datetime.datetime.now()

        def parse(self, response):
            db = pymysql.connect(
                host="127.0.0.1", port=3306,
                user='root', password='mysql',
                db='ymt_db', charset='utf8'
            )
            cur = db.cursor()

            # All distinct small categories collected by the hangqing spider
            all_small_sql = "select distinct small_id,small_name,small_href from ymt_all_info_sort"

            cur.execute(all_small_sql)
            small_all = cur.fetchall()

            for small in small_all:
                item = MySpiderSmallprice()
                item["small_href"] = small[2]
                item["small_id"] = small[0]
                yield scrapy.Request(
                    item["small_href"],
                    callback=self.small_breed_info,
                    meta={"item": deepcopy(item)}
                )

        def small_breed_info(self, response):
            item = response.meta["item"]
            # Today's average origin price and its unit
            item["day_avg_price"] = response.xpath("//dd[@class='c_origin_price']/p[2]//span[1]/text()").extract_first()
            item["unit"] = response.xpath("//dd[@class='c_origin_price']/p[2]//span[2]/text()").extract_first()
            item["dateKey"] = str(self.i.year) + str(self.i.month) + str(self.i.day)

            if item["day_avg_price"] is None:
                item["day_avg_price"] = 0
                item["unit"] = ""

            yield item
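
Note that pricedata.py reads the ymt_all_info_sort table filled by the hangqing pipeline (section 7) and writes its results to ymt_price_small, which location_char.py then reads, so those tables have to exist before the spiders run. A hedged sketch of minimal DDL for these two tables, with column names taken from the SQL used in this article and column types that are my own guesses:

    # One-off table setup sketch for ymt_db; the column types are assumptions
    import pymysql

    DDL = [
        """CREATE TABLE IF NOT EXISTS ymt_all_info_sort (
               big_id INT, big_name VARCHAR(64), big_href VARCHAR(255),
               medium_id INT, medium_name VARCHAR(64),
               small_id INT, small_name VARCHAR(64), small_href VARCHAR(255),
               breed_id INT, breed_name VARCHAR(64), breed_href VARCHAR(255)
           ) DEFAULT CHARSET=utf8""",
        """CREATE TABLE IF NOT EXISTS ymt_price_small (
               small_href VARCHAR(255), small_id INT,
               day_avg_price DECIMAL(10,2), unit VARCHAR(32), dateKey VARCHAR(8)
           ) DEFAULT CHARSET=utf8""",
    ]

    db = pymysql.connect(host="127.0.0.1", port=3306, user='root',
                         password='mysql', db='ymt_db', charset='utf8')
    with db.cursor() as cur:
        for stmt in DDL:
            cur.execute(stmt)
    db.commit()
    db.close()

The remaining tables (ymt_all_info_cd, ymt_price_provice and the ymt_1_dim_cdProvince/City/Area dimension tables) follow the same pattern and can be derived from the column lists in pipelines.py and from the spiders' SELECT statements.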

6. Defining the items

  items.py

    # -*- coding: utf-8 -*-

    # Define here the models for your scraped items
    #
    # See documentation in:
    # http://doc.scrapy.org/en/latest/topics/items.html

    import scrapy


    # Fields for the quote (hangqing) spider
    class MyspiderItem(scrapy.Item):
        ymt_bigsort_href = scrapy.Field()
        ymt_bigsort_id = scrapy.Field()
        ymt_bigsort_name = scrapy.Field()
        ymt_mediumsort_id = scrapy.Field()
        ymt_mediumsort_name = scrapy.Field()
        ymt_smallsort_id = scrapy.Field()
        ymt_smallsort_href = scrapy.Field()
        ymt_smallsort_name = scrapy.Field()
        ymt_breed_id = scrapy.Field()
        ymt_breed_name = scrapy.Field()
        ymt_breed_href = scrapy.Field()


    # Fields for the producing-area (chandi) spider
    class MyspiderChanDi(scrapy.Item):
        ymt_province_id = scrapy.Field()
        ymt_province_name = scrapy.Field()
        ymt_province_href = scrapy.Field()
        ymt_city_id = scrapy.Field()
        ymt_city_name = scrapy.Field()
        ymt_city_href = scrapy.Field()
        ymt_area_id = scrapy.Field()
        ymt_area_name = scrapy.Field()
        ymt_area_href = scrapy.Field()


    # Daily origin price per small category
    class MySpiderSmallprice(scrapy.Item):
        small_href = scrapy.Field()
        small_id = scrapy.Field()
        day_avg_price = scrapy.Field()
        unit = scrapy.Field()
        dateKey = scrapy.Field()


    # Province / city / county prices per small category
    class MySpiderSmallProvincePrice(scrapy.Item):
        small_id = scrapy.Field()
        unit = scrapy.Field()
        province_name = scrapy.Field()
        province_price = scrapy.Field()   # average price for the province
        province_id = scrapy.Field()
        city_name = scrapy.Field()
        city_price = scrapy.Field()       # average price for the city
        city_id = scrapy.Field()
        area_name = scrapy.Field()
        area_price = scrapy.Field()       # average price for the county
        area_id = scrapy.Field()

        breed_city_name = scrapy.Field()
        breed_city_price = scrapy.Field()
        breed_id = scrapy.Field()

        dateKey = scrapy.Field()

7. Writing the data to the database

  pipelines.py

    # -*- coding: utf-8 -*-

    from pymongo import MongoClient
    import pymysql.cursors


    class MyspiderPipeline(object):
        def open_spider(self, spider):
            # client = MongoClient(host=spider.settings["MONGO_HOST"], port=spider.settings["MONGO_PORT"])
            # self.collection = client["ymt"]["hangqing"]
            pass

        def process_item(self, item, spider):
            db = pymysql.connect(
                host="127.0.0.1", port=3306,
                user='root', password='mysql',
                db='ymt_db', charset='utf8'
            )
            cur = db.cursor()

            if spider.name == "hangqing":
                # Full category hierarchy
                all_sort_sql = "insert into ymt_all_info_sort(big_id, big_name, big_href, " \
                               "medium_id, medium_name, " \
                               "small_id, small_name, small_href, " \
                               "breed_id, breed_name, breed_href) " \
                               "VALUES({},\"{}\",\"{}\",\"{}\",\"{}\",\"{}\",\"{}\",\"{}\",\"{}\",\"{}\",\"{}\")".format(
                                   item["ymt_bigsort_id"], item["ymt_bigsort_name"], item["ymt_bigsort_href"],
                                   item["ymt_mediumsort_id"], item["ymt_mediumsort_name"],
                                   item["ymt_smallsort_id"], item["ymt_smallsort_name"], item["ymt_smallsort_href"],
                                   item["ymt_breed_id"], item["ymt_breed_name"], item["ymt_breed_href"])

                try:
                    cur.execute(all_sort_sql)
                    db.commit()
                except Exception as e:
                    db.rollback()
                finally:
                    cur.close()
                    db.close()

                return item

            elif spider.name == "chandi":
                # Full producing-area hierarchy
                all_cd_sql = "insert into ymt_all_info_cd(" \
                             "province_id, province_name, province_href, " \
                             "city_id, city_name, city_href," \
                             "area_id, area_name, area_href) " \
                             "VALUES({},\"{}\",\"{}\",{},\"{}\",\"{}\",{},\"{}\",\"{}\")".format(
                                 item["ymt_province_id"], item["ymt_province_name"], item["ymt_province_href"],
                                 item["ymt_city_id"], item["ymt_city_name"], item["ymt_city_href"],
                                 item["ymt_area_id"], item["ymt_area_name"], item["ymt_area_href"])
                try:
                    # Producing-area rows
                    cur.execute(all_cd_sql)
                    db.commit()
                except Exception as e:
                    db.rollback()
                finally:
                    cur.close()
                    db.close()

                return item

            elif spider.name == "pricedata":
                # Daily average origin price per small category
                avg_day_price_sql = "insert into ymt_price_small(small_href, small_id, day_avg_price, unit, dateKey) " \
                                    "VALUES(\"{}\",{},{},\"{}\",\"{}\")".format(
                                        item["small_href"], item["small_id"], item["day_avg_price"],
                                        item["unit"], item["dateKey"])
                try:
                    cur.execute(avg_day_price_sql)
                    db.commit()
                except Exception as e:
                    db.rollback()
                finally:
                    cur.close()
                    db.close()

            elif spider.name == "location_char":
                # Province / city / county price distribution per small category and variety
                location_char_sql = "insert into ymt_price_provice(small_id, province_name, provice_price, city_name, city_price, " \
                                    "area_name, area_price, unit, dateKey, area_id, city_id, provice_id, " \
                                    "breed_city_name, breed_city_price, breed_id) " \
                                    "VALUES({},\"{}\",{},\"{}\",{},\"{}\",{},\"{}\",{},{},{},{},\"{}\",{},{})".format(
                                        item["small_id"], item["province_name"], item["province_price"],
                                        item["city_name"], item["city_price"],
                                        item["area_name"], item["area_price"], item["unit"], item["dateKey"],
                                        item["area_id"], item["city_id"], item["province_id"],
                                        item["breed_city_name"], item["breed_city_price"], item["breed_id"])
                try:
                    cur.execute(location_char_sql)
                    db.commit()
                except Exception as e:
                    db.rollback()
                finally:
                    cur.close()
                    db.close()

            else:
                cur.close()
                db.close()
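
For the pipeline to receive items it must be enabled in the project's settings.py; a minimal sketch, assuming the default layout of a project named mySpider:

    # settings.py (sketch): register the pipeline so every spider's items pass through it
    ITEM_PIPELINES = {
        "mySpider.pipelines.MyspiderPipeline": 300,
    }

Also note that location_char.py and pricedata.py read tables filled by the other spiders, so the four spiders should be run in the order hangqing, chandi, pricedata, location_char (for example as four separate scrapy crawl runs).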

Final result

Out of personal interest, I eventually turned the crawled agricultural product data into a web system.
