9.spark Core 进阶2--Cashe
Storage Level
|
Meaning
|
MEMORY_ONLY
|
Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed. This is the default level.
|
MEMORY_AND_DISK
|
Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on disk, and read them from there when they're needed.
|
MEMORY_ONLY_SER (Java and Scala)
|
Store RDD as serialized Java objects (one byte array per partition). This is generally more space-efficient than deserialized objects, especially when using a fast serializer, but more CPU-intensive to read.
|
MEMORY_AND_DISK_SER (Java and Scala)
|
Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed.
|
DISK_ONLY
|
Store the RDD partitions only on disk.
|
MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
|
Same as the levels above, but replicate each partition on two cluster nodes.
|
OFF_HEAP (experimental)
|
Similar to MEMORY_ONLY_SER, but store the data in off-heap memory. This requires off-heap memory to be enabled.
|
- If your RDDs fit comfortably with the default storage level (MEMORY_ONLY), leave them that way. This is the most CPU-efficient option, allowing operations on the RDDs to run as fast as possible.
- If not, try using MEMORY_ONLY_SER and selecting a fast serialization library to make the objects much more space-efficient, but still reasonably fast to access. (Java and Scala)
- Don’t spill to disk unless the functions that computed your datasets are expensive, or they filter a large amount of the data. Otherwise, recomputing a partition may be as fast as reading it from disk.
- Use the replicated storage levels if you want fast fault recovery (e.g. if using Spark to serve requests from a web application). All the storage levels provide full fault tolerance by recomputing lost data, but the replicated ones let you continue running tasks on the RDD without waiting to recompute a lost partition.
9.spark Core 进阶2--Cashe的更多相关文章
- 8.spark Core 进阶1
(e.g. standalone manager, Mesos, YARN) In "cluster" mode, the framework launches the ...
- Spark 3.x Spark Core详解 & 性能优化
Spark Core 1. 概述 Spark 是一种基于内存的快速.通用.可扩展的大数据分析计算引擎 1.1 Hadoop vs Spark 上面流程对应Hadoop的处理流程,下面对应着Spark的 ...
- Spark Streaming揭秘 Day35 Spark core思考
Spark Streaming揭秘 Day35 Spark core思考 Spark上的子框架,都是后来加上去的.都是在Spark core上完成的,所有框架一切的实现最终还是由Spark core来 ...
- 【Spark Core】任务运行机制和Task源代码浅析1
引言 上一小节<TaskScheduler源代码与任务提交原理浅析2>介绍了Driver側将Stage进行划分.依据Executor闲置情况分发任务,终于通过DriverActor向exe ...
- TypeError: Error #1034: 强制转换类型失败:无法将 mx.controls::DataGrid@9a7c0a1 转换为 spark.core.IViewport。
1.错误描述 TypeError: Error #1034: 强制转换类型失败:无法将 mx.controls::DataGrid@9aa90a1 转换为 spark.core.IViewport. ...
- Spark Core
Spark Core DAG概念 有向无环图 Spark会根据用户提交的计算逻辑中的RDD的转换(变换方法)和动作(action方法)来生成RDD之间的依赖关系,同时 ...
- Spark Streaming 进阶与案例实战
Spark Streaming 进阶与案例实战 1.带状态的算子: UpdateStateByKey 2.实战:计算到目前位置累积出现的单词个数写入到MySql中 1.create table CRE ...
- spark core (二)
一.Spark-Shell交互式工具 1.Spark-Shell交互式工具 Spark-Shell提供了一种学习API的简单方式, 以及一个能够交互式分析数据的强大工具. 在Scala语言环境下或Py ...
- Spark Core 资源调度与任务调度(standalone client 流程描述)
Spark Core 资源调度与任务调度(standalone client 流程描述) Spark集群启动: 集群启动后,Worker会向Master汇报资源情况(实际上将Worker的资 ...
随机推荐
- [USACO06JAN]牛的舞会The Cow Prom
题目描述 The N (2 <= N <= 10,000) cows are so excited: it's prom night! They are dressed in their ...
- pytest-mark跳过
import pytestimport sysenvironment='android' @pytest.mark.skipif(environment=="android",re ...
- 只用200行Go代码写一个自己的区块链!(转)
区块链是目前最热门的话题,广大读者都听说过比特币,或许还有智能合约,相信大家都非常想了解这一切是如何工作的.这篇文章就是帮助你使用 Go 语言来实现一个简单的区块链,用不到 200 行代码来揭示区块链 ...
- 2019-4-15-VisualStudio-如何在-NuGet-包里面同时包含-DEBUG-和-RELEASE-的库
title author date CreateTime categories VisualStudio 如何在 NuGet 包里面同时包含 DEBUG 和 RELEASE 的库 lindexi 20 ...
- 格式化抽象本地地址(实战linux socket编程)
格式化抽象本地地址传统AF_UNIX套接口名字的麻烦之一就在于总是调用文件系统对象.这不是必须的,而且也不方便.如果原始的文件系统对象并没有删除,而在bind调用时使用相同的文件名,名字赋值就会失败. ...
- 利用VUE-CLI脚手架搭建VUE项目
前言 在学习完vue基础语法之后,学着利用vue-cli脚手架搭建一个项目,本篇随笔主要记录搭建的过程,供大家一起学习. 具体内容 搭建vue项目的准备工作 1.安装Nodejs.NPM以及VSCod ...
- ElementUI的Loading组件 —— 想实现在请求后台数据之前开启Loading组件,请求成功或失败之后,关闭Loading组件
我在实际项目开发中,遇到了这个需求,记录一下~~~~~~ 在ElementUI官网上有几种实现Loading的方法,但官网上是在一个方法里写了开启与关闭组件,所以可以根据官网的实现方法进行一个封装,便 ...
- HashMap 1.7 与 1.8 的 区别,说明 1.8 做了哪些优化,如何优化的
JDK1.7用的链表散列结构,JDK1.8用的红黑树 在扩充HashMap的时候,JDK1.7的重新计算hash, JDK1.7只需要看看原来的hash值新增的那个bit是1还是0就好了,是0的话索引 ...
- vue tabNav 点击
<template> <div class="content"> <header class="tab_nav"> < ...
- C# using System.Windows.Media.Imaging;该引用哪个dll
在网上看到BitmapSource和WriteableBitmap一些类听说是用 using System.Windows.Media.Imaging:可是我发现VS中没有什么System.Windo ...