Running SQL with Spark SQL from a shell script

[Author]: kwu

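Spark SQL can be driven from a shell script. Like the Hive CLI, spark-sql accepts the -e (run a quoted SQL string), -f (run a SQL file) and -i (run an initialization file first) options.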
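The three options can be wrapped in small shell functions. This is a minimal sketch: the function names are hypothetical, and SPARK_SQL defaults to the spark-sql path used in this post but can be overridden (for example with a stub) for a dry run.

```shell
#!/bin/sh
# Minimal sketch: wrappers around the three Hive-style spark-sql options.
# SPARK_SQL defaults to the path used in this post; override it for a dry run.
# The function names are hypothetical helpers, not part of Spark.
SPARK_SQL=${SPARK_SQL:-/opt/modules/spark/bin/spark-sql}

# -e: run the SQL string passed as the first argument
sql_inline() {
    "$SPARK_SQL" -e "$1"
}

# -f: run all statements stored in a SQL file
sql_file() {
    "$SPARK_SQL" -f "$1"
}

# -i, then -f: run an init file (ADD JAR, CREATE TEMPORARY FUNCTION, ...) before the main file
sql_file_with_init() {
    "$SPARK_SQL" -i "$1" -f "$2"
}
```

For example, `sql_file_with_init /opt/bin/spark_opt/init.sql daily_report.sql` would load the UDFs and then run a (hypothetical) daily_report.sql. As in the Hive CLI, -e and -f are alternatives, while -i can be combined with either.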

1. The script run on a schedule

#!/bin/sh

# Rebuild today's top-20 stock click statistics with Spark SQL, then export them to MySQL

# yesterday's date as yyyyMMdd (GNU date syntax), used to select yesterday's log partition
yesterday=$(date --date='1 days ago' +%Y%m%d)

# Step 1: load the UDFs from init.sql (-i), then run the INSERT OVERWRITE statement (-e)
/opt/modules/spark/bin/spark-sql -i /opt/bin/spark_opt/init.sql \
  --master spark://10.130.2.20:7077 \
  --executor-memory 6g --total-executor-cores 45 \
  --conf spark.ui.port=4075 \
  -e "
insert overwrite table st.stock_realtime_analysis PARTITION (DTYPE='01')
  select t1.stockId as stockId,
         t1.url as url,
         t1.clickcnt as clickcnt,
         0,
         round((t1.clickcnt / (case when t2.clickcntyesday is null then 0 else t2.clickcntyesday end) - 1) * 100, 2) as LPcnt,
         '01' as type,
         t1.analysis_date as analysis_date,
         t1.analysis_time as analysis_time
    from (select stock_code stockId,
                 concat('http://stockdata.stock.hexun.com/', stock_code,'.shtml') url,
                 count(1) clickcnt,
                 substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),1,10) analysis_date,
                 substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),12,8) analysis_time
            from dms.tracklog_5min
           where stock_type = 'STOCK'
             and day =
                 substr(from_unixtime(unix_timestamp(), 'yyyyMMdd'), 1, 8)
           group by stock_code
           order by clickcnt desc limit 20) t1
    left join (select stock_code stockId, count(1) clickcntyesday
                 from dms.tracklog_5min a
                where stock_type = 'STOCK'
                  and substr(datetime, 1, 10) = date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'),1)
                  and substr(datetime, 12, 5) < substr(from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'), 12, 5)
                  and day = '${yesterday}'
                group by stock_code) t2
      on t1.stockId = t2.stockId;
"

# Step 2: export the refreshed partition (fields separated by \001) to MySQL
sqoop export --connect jdbc:mysql://10.130.2.245:3306/charts \
  --username guojinlian --password Abcd1234 \
  --table stock_realtime_analysis --fields-terminated-by '\001' \
  --columns "stockid,url,clickcnt,splycnt,lpcnt,type" \
  --export-dir /dw/st/stock_realtime_analysis/dtype=01
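In the script above, sqoop export runs even if spark-sql fails, which could push the previous (stale) contents of the dtype=01 partition to MySQL. Below is a minimal sketch of one way to guard against that; it is not the author's original script. The two steps are wrapped in a hypothetical run_stock_job function that returns early when spark-sql exits non-zero (assuming spark-sql reports a failed statement through its exit status). SPARK_SQL and SQOOP default to the commands used above and can be overridden for testing.

```shell
#!/bin/sh
# Minimal sketch (not the author's script): run the Spark SQL step and the
# sqoop export as one job, and skip the export if spark-sql fails.
# SPARK_SQL and SQOOP are overridable, e.g. with stubs for a dry run.
SPARK_SQL=${SPARK_SQL:-/opt/modules/spark/bin/spark-sql}
SQOOP=${SQOOP:-sqoop}

# $1: the INSERT OVERWRITE statement to run (with ${yesterday} already expanded)
run_stock_job() {
    if ! "$SPARK_SQL" -i /opt/bin/spark_opt/init.sql \
            --master spark://10.130.2.20:7077 \
            --executor-memory 6g --total-executor-cores 45 \
            --conf spark.ui.port=4075 \
            -e "$1"; then
        echo "spark-sql failed; skipping sqoop export" >&2
        return 1
    fi
    # only reached when the Hive partition was rewritten successfully
    "$SQOOP" export --connect jdbc:mysql://10.130.2.245:3306/charts \
        --username guojinlian --password Abcd1234 \
        --table stock_realtime_analysis --fields-terminated-by '\001' \
        --columns "stockid,url,clickcnt,splycnt,lpcnt,type" \
        --export-dir /dw/st/stock_realtime_analysis/dtype=01
}
```

The caller builds the SQL string exactly as before, for example `SQL="insert overwrite table ... and day = '${yesterday}' ..."`, and then runs `run_stock_job "$SQL" || exit 1` so that cron also sees the failure.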

init.sql loads the UDFs; its contents are:

add jar /opt/bin/UDF/hive-udf.jar;
create temporary function udtf_stockidxfund as 'com.hexun.hive.udf.stock.UDTFStockIdxFund';
create temporary function udf_getbfhourstime as 'com.hexun.hive.udf.time.UDFGetBfHoursTime';
create temporary function udf_getbfhourstime2 as 'com.hexun.hive.udf.time.UDFGetBfHoursTime2';
create temporary function udf_stockidxfund as 'com.hexun.hive.udf.stock.UDFStockIdxFund';
create temporary function udf_md5 as 'com.hexun.hive.udf.common.HashMD5UDF';
create temporary function udf_murhash as 'com.hexun.hive.udf.common.HashMurUDF';
create temporary function udf_url as 'com.hexun.hive.udf.url.UDFUrl';
create temporary function url_host as 'com.hexun.hive.udf.url.UDFHost';
create temporary function udf_ip as 'com.hexun.hive.udf.url.UDFIP';
create temporary function udf_site as 'com.hexun.hive.udf.url.UDFSite';
create temporary function udf_UrlDecode as 'com.hexun.hive.udf.url.UDFUrlDecode';
create temporary function udtf_url as 'com.hexun.hive.udf.url.UDTFUrl';
create temporary function udf_ua as 'com.hexun.hive.udf.useragent.UDFUA';
create temporary function udf_ssh as 'com.hexun.hive.udf.useragent.UDFSSH';
create temporary function udtf_ua as 'com.hexun.hive.udf.useragent.UDTFUA';
create temporary function udf_kw as 'com.hexun.hive.udf.url.UDFKW';
create temporary function udf_chdecode as 'com.hexun.hive.udf.url.UDFChDecode';
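When the function list grows, an init file like this can also be generated from one list of NAME=CLASS pairs. The helper below is a hypothetical sketch, not something from the original post; it only writes the same `add jar` and `create temporary function` statements shown above.

```shell
#!/bin/sh
# Hypothetical helper: generate an init file from a jar path and NAME=CLASS pairs.
# Usage: write_init_sql OUT_FILE JAR NAME=CLASS [NAME=CLASS ...]
write_init_sql() {
    out=$1
    jar=$2
    shift 2
    {
        printf 'add jar %s;\n' "$jar"
        for pair in "$@"; do
            # text before the first '=' is the SQL function name, the rest is the Java class
            printf "create temporary function %s as '%s';\n" "${pair%%=*}" "${pair#*=}"
        done
    } > "$out"
}
```

For example, `write_init_sql /opt/bin/spark_opt/init.sql /opt/bin/UDF/hive-udf.jar udf_md5=com.hexun.hive.udf.common.HashMD5UDF udf_ip=com.hexun.hive.udf.url.UDFIP` writes a three-line init file that can then be passed to spark-sql with -i.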

Setting the UI port

--conf spark.ui.port=4075

The default is 4040, which would conflict with other jobs already running on the same host, so it is changed to 4075 here.
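Note that when the configured UI port is busy, Spark itself moves on to the next port (4041, 4042, ...) up to spark.port.maxRetries times, so pinning 4075 mainly gives this job a predictable UI address. To check beforehand whether a port is free, a best-effort sketch follows. The helper name is hypothetical, and the probe relies on bash's /dev/tcp redirection, so run it under bash (other shells cannot open /dev/tcp and will report every port as free). It only checks listeners reachable on 127.0.0.1, and the port can still be taken between the check and the job start.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first port in [start, end] with no listener on
# 127.0.0.1, probing via bash's /dev/tcp. Returns 1 if every port is taken.
first_free_port() {
    port=$1
    while [ "$port" -le "$2" ]; do
        # the connection attempt fails when nothing is listening on the port
        if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
    done
    return 1
}
```

Usage: `ui_port=$(first_free_port 4075 4099) && spark-sql --conf spark.ui.port="$ui_port" ...`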

Setting the memory and CPU resources the job uses:

--executor-memory 6g --total-executor-cores 45
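As a rough sizing check, assume standalone mode, where --executor-memory is per executor and --total-executor-cores caps the cores across all executors. If each executor were given c cores with --executor-cores (an assumption; the command above does not set it), the job would get at most floor(45 / c) executors, each with 6 GB. With a hypothetical c = 5:

```latex
N_{\text{exec}} \le \left\lfloor \frac{45}{c} \right\rfloor = \left\lfloor \frac{45}{5} \right\rfloor = 9,
\qquad
M_{\text{total}} \le N_{\text{exec}} \times 6\,\text{GB} = 9 \times 6\,\text{GB} = 54\,\text{GB}
```

The actual counts are also bounded by the cores and memory free on each worker.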

The statement was originally run with hive -e; switching to Spark made it much faster.

The run time dropped from 15 min to 45 s, roughly a 20× speedup.
