spark geoip

import java.io.{File, PrintWriter}

import scala.io.Source

import com.sanoma.cda.geoip.MaxMindIpGeo

// Load the MaxMind city database once, with a lookup cache of 1000 entries.
val geoIp = MaxMindIpGeo("/data/elas-input/GeoIP2-City.mmdb", 1000, synchronized = true)

// For every file in srcDir, look up the IP address in column 0 of each
// tab-separated row and write a comma-separated file of the same name to dstDir.
def iter_dir(srcDir: String, dstDir: String): Unit = {
  val files = new File(srcDir).listFiles().filter(_.isFile)
  for (item <- files) {
    println(item.getName)
    val dstname = item.getName
    val out = new PrintWriter(s"$dstDir/$dstname")
    val in = Source.fromFile(item)
    try {
      for (line <- in.getLines()) {
        // A limit of -1 keeps trailing empty columns, so it(5) exists even when it is blank.
        val it = line.split("\t", -1)
        geoIp.getLocation(it(0)) match {
          case None =>
            // No match: write the first five columns plus an empty placeholder.
            // The original format string had four %s for six arguments, so two were dropped.
            out.printf("%s,%s,%s,%s,%s,%s\n", it(0), it(1), it(2), it(3), it(4), "")
          case Some(geoGet) =>
            val countryCode = geoGet.countryCode.getOrElse("")
            val countryName = geoGet.countryName.getOrElse("")
            val region = geoGet.region.getOrElse("")
            val city = geoGet.city.getOrElse("")
            val geoPoint = geoGet.geoPoint
            val latitude = geoPoint.map(_.latitude.toString).getOrElse("")
            val longitude = geoPoint.map(_.longitude.toString).getOrElse("")
            val postalCode = geoGet.postalCode.getOrElse("")
            val continent = geoGet.continent.getOrElse("")
            out.printf("%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n", it(0), it(1), it(2), it(3), it(4), countryCode, countryName, region, city, latitude, longitude, postalCode, continent, it(5))
        }
      }
    } finally {
      // Close both handles even if a row fails to parse.
      in.close()
      out.close()
    }
  }
}

iter_dir("/data/elas-input/uniqServiceDir", "/data/elas-input/tsoutput")
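
The rows written by iter_dir are joined with plain commas, so a region or city name that itself contains a comma would shift the columns of that row. A minimal sketch of one way to guard against this, assuming RFC 4180 style quoting is acceptable to whatever reads the output; `CsvRowDemo`, `csvField`, and `csvRow` are hypothetical names and are not part of the original script:

```scala
// Hypothetical helpers, not part of the original post.
object CsvRowDemo {
  // Quote a field only when it contains a comma, a double quote, or a line break.
  // Embedded double quotes are doubled, as in RFC 4180.
  def csvField(value: String): String =
    if (value.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + value.replace("\"", "\"\"") + "\""
    else value

  // Join already-extracted fields into one CSV line (no trailing newline).
  def csvRow(fields: Seq[String]): String = fields.map(csvField).mkString(",")

  def main(args: Array[String]): Unit = {
    // prints: 1.2.3.4,US,"Washington, D.C."
    println(csvRow(Seq("1.2.3.4", "US", "Washington, D.C.")))
  }
}
```

Under that assumption, each `out.printf(...)` call in iter_dir could be replaced by `out.println(CsvRowDemo.csvRow(Seq(...)))` with the same fields in the same order.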

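// Returns the text that follows the continent name in an enriched row.
// `str` was never defined in the original snippet, so it is taken as a parameter here.
def afterContinent(str: String): String = {
  val str2 = "North America"
  val index = str.indexOf(str2)
  // Skip the marker itself and the one-character separator after it.
  // The original used str.length here, which points past the end of the string.
  val index2 = str2.length + index + 1
  if (index < 0 || index2 > str.length) "" else str.substring(index2)
}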
