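Comparing the earlier run results, I found that one function does the same work in two of the jobs, yet its computation time differs widely between them.

  So I ran the 4 jobs separately. The input data differ, but they come from the same generator, so the data distribution is similar and the data volume is the same.

  Below are the run-time tables for these 4 jobs.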


Details for pure RDD job

Completed Stages (7)

| Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|---|---|---|
| 6 | | 2019/01/30 15:58:43 | 94 ms | 41/41 | | | 235.4 KB | |
| 5 | | 2019/01/30 15:58:42 | 0.4 s | 41/41 | | | 382.9 KB | 235.4 KB |
| 4 | | 2019/01/30 15:58:42 | 0.1 s | 41/41 | | | 99.2 KB | 246.0 KB |
| 2 | | 2019/01/30 15:58:41 | 1 s | 41/41 | | | 765.8 KB | 99.2 KB |
| 1 | | 2019/01/30 15:58:38 | 3 s | 41/41 | | | | 750.1 KB |
| 0 | | 2019/01/30 15:58:38 | 3 s | 1/1 | | | | 15.7 KB |
| 3 | | 2019/01/30 15:58:38 | 4 s | 41/41 | | | | 137.0 KB |

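  As the table shows, converting the product info into a pairRDD takes 4 s, and the city info and click info take 3 s each. In the earlier experiments these steps took only a few tenths of a second, which suggests there may have been some automatic caching that reused the earlier results directly.

  These 3 steps run in parallel, so the elapsed time shrinks as well. Run time: 5 s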
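  The effect of the parallel stages can be checked from the table above. What follows is a minimal Python sketch, not part of the original project: it copies the Submitted and Duration values of the pure RDD job and compares the sum of the stage durations with the wall-clock span. The helper name wall_clock_span is made up for illustration, and the Submitted column only has one-second resolution, so the result is approximate.

```python
from datetime import datetime, timedelta

# (stage id, submitted, duration in seconds), copied from the pure RDD job table
STAGES = [
    (6, "2019/01/30 15:58:43", 0.094),
    (5, "2019/01/30 15:58:42", 0.4),
    (4, "2019/01/30 15:58:42", 0.1),
    (2, "2019/01/30 15:58:41", 1.0),
    (1, "2019/01/30 15:58:38", 3.0),
    (0, "2019/01/30 15:58:38", 3.0),
    (3, "2019/01/30 15:58:38", 4.0),
]


def wall_clock_span(stages):
    """Seconds from the earliest submit time to the latest stage end."""
    intervals = []
    for _, submitted, seconds in stages:
        start = datetime.strptime(submitted, "%Y/%m/%d %H:%M:%S")
        intervals.append((start, start + timedelta(seconds=seconds)))
    first_start = min(start for start, _ in intervals)
    last_end = max(end for _, end in intervals)
    return (last_end - first_start).total_seconds()


total_stage_time = sum(seconds for _, _, seconds in STAGES)
span = wall_clock_span(STAGES)
print(f"sum of stage durations: {total_stage_time:.1f} s")
print(f"wall-clock span:        {span:.1f} s")
```

  The stage durations add up to roughly 11.6 s while the span is roughly 5.1 s, which is consistent with stages 0, 1 and 3 (all submitted at 15:58:38) running concurrently.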

Details for pure RDD job with map join

Completed Stages (3)

| Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|---|---|---|
| 3 | | 2019/01/30 16:00:23 | 0.2 s | 41/41 | | | 246.7 KB | |
| 2 | | 2019/01/30 16:00:22 | 0.5 s | 41/41 | | | 477.6 KB | 246.8 KB |
| 1 | | 2019/01/30 16:00:17 | 5 s | 41/41 | | | | 478.2 KB |

  Probably because the map join uses a lot of memory, the mapToPair that carries the city info and the click records takes longer here. Run time: 6 s

Details for original job

Completed Stages (7)

| Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|---|---|---|
| 6 | | 2019/01/30 16:04:04 | 0.8 s | 200/200 | | | 865.5 KB | |
| 5 | | 2019/01/30 16:03:58 | 6 s | 200/200 (2 failed) | | | 899.9 KB | 869.3 KB |
| 3 | | 2019/01/30 16:03:56 | 1 s | 200/200 | | | 224.2 KB | 733.2 KB |
| 2 | | 2019/01/30 16:03:55 | 2 s | 41/41 | | | 766.0 KB | 224.3 KB |
| 4 | | 2019/01/30 16:03:50 | 3 s | 41/41 | | | | 159.9 KB |
| 1 | | 2019/01/30 16:03:49 | 6 s | 41/41 | | | | 750.3 KB |
| 0 | | 2019/01/30 16:03:49 | 3 s | 1/1 | | | | 15.7 KB |

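  The mapToPair on the click records, which is the largest dataset, takes the longest: 6 s.

  None of the other corresponding operations is faster than its counterpart in the pure RDD version. This is especially true of the 2 operations just before collect, which the pure RDD program finishes in under 1 s.

  Judging from the earlier "too many open files" error, the SQL operations presumably create local files for reading and writing. On top of that, some of the SQL statements express the business processing steps less concisely than the RDD code does. Together these slow the job down badly. Run time: 16 s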
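  Since the inference above rests on a "too many open files" error, it may help to look at the per-process limit on open file descriptors. The sketch below is only an assumption about how one might check it on a Unix-like system with Python's standard resource module. It is not taken from the original project, describe_limit is a hypothetical helper, and it reports the limit of the Python process itself, so it is only indicative when run from the same shell that launches the Spark job.

```python
import resource


def describe_limit(value):
    """Render an rlimit value, which may be RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


# The soft limit is what the process is actually held to; the hard limit is
# the ceiling the soft limit can be raised to without extra privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={describe_limit(soft)} hard={describe_limit(hard)}")
```

  A low soft limit would fit the author's guess that the SQL path opens many local files at once, but the numbers printed here do not prove it on their own.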

Details for pure sparkSQL job

Completed Stages (7)

| Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|---|---|---|
| 6 | | 2019/01/30 16:08:23 | 0.8 s | 200/200 | | | 869.0 KB | |
| 5 | | 2019/01/30 16:08:21 | 2 s | 200/200 (1 failed) | | | 894.1 KB | 870.2 KB |
| 3 | | 2019/01/30 16:08:20 | 1 s | 200/200 | | | 224.2 KB | 733.4 KB |
| 2 | | 2019/01/30 16:08:18 | 1 s | 200/200 | | | 405.2 KB | 224.6 KB |
| 4 | | 2019/01/30 16:08:01 | 4 s | 41/41 | | | | 159.9 KB |
| 1 | | 2019/01/30 16:08:01 | 17 s | 1/1 | | | | 4.0 KB |
| 0 | | 2019/01/30 16:08:01 | 6 s | 41/41 (1 failed) | | | | 401.8 KB |

  sparkSQL is slow to begin with, and once the first 2 steps were also rewritten in SQL it became even slower... Run time: 22 s
