http://solr-vs-elasticsearch.com/

Apache Solr vs Elasticsearch

The Feature Smackdown

API

Feature	Solr 6.2.1	ElasticSearch 5.0
Format	XML, CSV, JSON	JSON
HTTP REST API
Binary API	SolrJ	TransportClient, Thrift (through a plugin)
JMX support		ES specific stats are exposed through the REST API
Official client libraries	Java	Java, Groovy, PHP, Ruby, Perl, Python, .NET, Javascript Official list of clients
Community client libraries	PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Go, Erlang, Clojure	Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x Complete list
3rd-party product integration (open-source)	Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna)	Drupal, Django, Symfony2, Wordpress, CouchBase
3rd-party product integration (commercial)	DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR	SearchBlox, Hortonworks Data Platform, MapR etc Complete list
Output	JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java	JSON, XML/HTML (via plugin)

Infrastructure

Feature	Solr 6.2.1	ElasticSearch 5.0
Master-slave replication	Only in non-SolrCloud. In SolrCloud, behaves identically to ES.	Not an issue because shards are replicated across nodes.
Integrated snapshot and restore	Filesystem	Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories

Indexing

Feature	Solr 6.2.1	ElasticSearch 5.0
Data Import	DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File	[DEPRECATED in 2.x] Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia
ID field for updates and deduplication
DocValues
Partial Doc Updates	with stored fields	with _source field
Custom Analyzers and Tokenizers
Per-field analyzer chain
Per-doc/query analyzer chain
Index-time synonyms		Supports Solr and Wordnet synonym format
Query-time synonyms	especially via hon-lucene-synonyms	Technically, yes, but practically no because multi-word/phrase query-time synonyms are not supported. See ES docs and hon-lucene-synonyms blog for nuances.
Multiple indexes
Near-Realtime Search/Indexing
Complex documents
Schemaless	4.4+
Multiple document types per schema	One set of fields per schema, one schema per core
Online schema changes	Schemaless mode or via dynamic fields.	Only backward-compatible changes.
Apache Tika integration
Dynamic fields
Field copying		via multi-fields
Hash-based deduplication		Murmur plugin or ER plugin
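
As a rough illustration of the "Partial Doc Updates" row above, the two request bodies can be sketched as below. This is a minimal sketch, not taken from the original page: the helper names and the example field `price` are hypothetical, and it assumes Solr's atomic-update `set` modifier and the `doc` key of the Elasticsearch Update API.

```python
import json


def solr_atomic_update(doc_id, changes):
    """Build a Solr atomic-update payload (hypothetical helper).

    Each changed field is wrapped in a {"set": value} modifier so that
    Solr replaces only that field and rebuilds the rest from stored fields.
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    # Solr's JSON update handler accepts a list of documents.
    return [doc]


def es_partial_update(changes):
    """Build an Elasticsearch Update API body (hypothetical helper).

    The changed fields go under "doc" and are merged into _source.
    """
    return {"doc": dict(changes)}


if __name__ == "__main__":
    print(json.dumps(solr_atomic_update("book-1", {"price": 9.99})))
    print(json.dumps(es_partial_update({"price": 9.99})))
```

Either payload only describes the change; the document id travels in the body for Solr and in the request URL for Elasticsearch.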

Searching

Feature	Solr 6.2.1	ElasticSearch 5.0
Lucene Query parsing
Structured Query DSL	Need to programmatically create queries if going beyond Lucene query syntax.
Span queries	via SOLR-2703
Spatial/geo search
Multi-point spatial search
Faceting		Top N term accuracy can be controlled with shard_size
Advanced Faceting	New JSON faceting API as of Solr 5.x	blog post
Geo-distance Faceting
Pivot Facets
More Like This
Boosting by functions
Boosting using scripting languages
Push Queries	JIRA issue	Percolation. Distributed percolation supported in 1.0
Field collapsing/Results grouping
Query Re-Ranking		via Rescoring or a plugin
Index-based Spellcheck		Phrase Suggester
Wordlist-based Spellcheck
Autocomplete
Query elevation		workaround
Intra-index joins	via parent-child query	via has_children and top_children queries
Inter-index joins	Joined index has to be single-shard and replicated across all nodes.
Resultset Scrolling	New to 4.7.0	via scan search type
Filter queries		also supports filtering by native scripts
Filter execution order	local params and cache property
Alternative QueryParsers	DisMax, eDisMax	query_string, dis_max, match, multi_match etc
Negative boosting	but awkward. Involves positively boosting the inverse set of negatively-boosted documents.
Search across multiple indexes	it can search across multiple compatible collections
Result highlighting
Custom Similarity
Searcher warming on index reload		Warmers API
Term Vectors API
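
The "Negative boosting" row above describes Solr's workaround of positively boosting the inverse set. A minimal sketch of what such request parameters might look like follows; the helper name, the field `category` and the boost value are hypothetical, and it assumes the eDisMax parser with its `bq` (boost query) parameter.

```python
def demote_params(user_query, demote_clause, boost=10):
    """Build Solr request parameters that demote matching documents.

    Instead of a negative boost, every document that does NOT match
    demote_clause gets a positive boost, which has the same relative effect.
    """
    inverse_set = "(*:* -%s)^%d" % (demote_clause, boost)
    return {
        "defType": "edismax",  # assumption: eDisMax is the query parser in use
        "q": user_query,
        "bq": inverse_set,
    }


if __name__ == "__main__":
    print(demote_params("running shoes", "category:clearance"))
```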

Customizability

Feature	Solr 6.2.1	ElasticSearch 5.0
Pluggable API endpoints
Pluggable search workflow	via SearchComponents
Pluggable update workflow	via UpdateRequestProcessor
Pluggable Analyzers/Tokenizers
Pluggable QueryParsers
Pluggable Field Types
Pluggable Function queries
Pluggable scoring scripts
Pluggable hashing
Pluggable webapps		[site plugins DEPRECATED in 5.x] blog post
Automated plugin installation		Installable from GitHub, maven, sonatype or elasticsearch.org

Distributed

Feature	Solr 6.2.1	ElasticSearch 5.0
Self-contained cluster	Depends on separate ZooKeeper server	Only Elasticsearch nodes
Automatic node discovery	ZooKeeper	internal Zen Discovery or ZooKeeper
Partition tolerance	The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function.	Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. See this
Automatic failover	If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are returned from the available shards.
Automatic leader election
Shard replication
Sharding
Automatic shard rebalancing		it can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes and it can be configured to not assign the same shard and its replicas to nodes with the same tags.
Change # of shards	Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime.	each index has 5 shards by default. Number of primary shards cannot be changed once the index is created. Replicas can be increased anytime.
Shard splitting
Relocate shards and replicas	can be done by creating a shard replica on the desired node and then removing the shard from the source node	can move shards and replicas to any node in the cluster on demand
Control shard routing	shards or _route_ parameter	routing parameter
Pluggable shard/replica assignment	Rule-based replica assignment	Probabilistic shard balancing with Tempest plugin
Consistency	Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they are finished replicating the index.	Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per document indexing basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available.
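
The N/2+1 rule from the "Partition tolerance" row above can be worked through with a few lines of arithmetic. This is only a sketch of the quorum calculation, with hypothetical function names; it assumes integer division, and it follows the table in counting N as the size of the cluster.

```python
def minimum_master_nodes(cluster_size):
    """Smallest majority of a cluster of the given size: N/2 + 1 (integer division)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def partition_keeps_working(partition_size, cluster_size):
    """True if a network partition still holds a quorum of the original cluster."""
    return partition_size >= minimum_master_nodes(cluster_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes -> quorum of", minimum_master_nodes(n))
```

Because at most one side of a split can hold a majority, two partitions can never both satisfy the check at once, which is what keeps them from diverging.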

Misc

Feature	Solr 6.2.1	ElasticSearch 5.0
Web Admin interface	bundled with Solr	Marvel or Kibana apps
Visualisation	Banana (Port of Kibana)	Kibana
Hosting providers	WebSolr, Searchify, Hosted-Solr, IndexDepot, OpenSolr, gotosolr	Found, ObjectRocket, bonsai.io, Indexisto, qbox.io, IndexDepot, Compose.io, Sematext Logsene

Thoughts...

I'm embedding my answer to this "Solr-vs-Elasticsearch" Quora question verbatim here:

1. Elasticsearch was born in the age of REST APIs. If you love REST APIs, you'll probably feel more at home with ES from the get-go. I don't actually think it's 'cleaner' or 'easier to use', but just that it is more aligned with web 2.0 developers' mindsets.

2. Elasticsearch's Query DSL syntax is really flexible and it's pretty easy to write complex queries with it, though it does border on being verbose. Solr doesn't have an equivalent, last I checked. Having said that, I've never found Solr's query syntax wanting, and I've always been able to easily write a custom SearchComponent if needed (more on this later).
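
To make the comparison in point 2 concrete, here is a minimal sketch of the same search expressed both ways. The field names `title` and `year` and the helper names are hypothetical; the Elasticsearch body assumes the `bool`, `match` and `range` queries, and the Solr side assumes the standard `q` and `fq` parameters with Lucene query syntax.

```python
import json


def es_query(text, min_year):
    """Query DSL body: full-text match on title, filtered by a year range."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"range": {"year": {"gte": min_year}}}],
            }
        }
    }


def solr_query(text, min_year):
    """Roughly equivalent Solr request parameters using Lucene syntax."""
    return {
        "q": "title:(%s)" % text,
        "fq": "year:[%d TO *]" % min_year,
    }


if __name__ == "__main__":
    print(json.dumps(es_query("distributed search", 2010), indent=2))
    print(solr_query("distributed search", 2010))
```

The nested structure is where the DSL's verbosity comes from, but it is also what makes it easy to compose clauses programmatically.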

3. I find Elasticsearch's documentation to be pretty awful. It doesn't help that some examples in the documentation are written in YAML and others in JSON. I wrote an ES code parser once to auto-generate documentation from Elasticsearch's source and found a number of discrepancies between code and what's documented on the website, not to mention a number of undocumented/alternative ways to specify the same config key.

By contrast, I've found Solr to be consistent and really well-documented. I've found pretty much everything I've wanted to know about querying and updating indices without having to dig into code much. Solr's schema.xml and solrconfig.xml are *extensively* documented with most if not all commonly used configurations.

4. Whilst what Rick says about ES being mostly ready to go out-of-box is true, I think that is also a possible problem with ES. Many users don't take the time to do the most simple config (e.g. type mapping) of ES because it 'just works' in dev, and end up running into issues in production.

And once you do have to do config, then I personally prefer Solr's config system over ES'. Long JSON config files can get overwhelming because of JSON's lack of support for comments. Yes, you can use YAML, but it's annoying and confusing to go back and forth between YAML and JSON.
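
For the "most simple config" mentioned in point 4, an explicit type mapping might look roughly like the sketch below. The type name `product`, the field names and the helper are all hypothetical; the layout assumes the ES 5.0-era mapping format (a named mapping type, with `text` and `keyword` field types) and uses `"dynamic": "strict"` to reject fields that were not declared.

```python
import json


def explicit_mapping(doc_type, fields):
    """Build an index-creation body that declares every field up front.

    fields maps a field name to an Elasticsearch field type. Strict dynamic
    mapping makes indexing fail on unknown fields instead of guessing a type.
    """
    properties = {name: {"type": es_type} for name, es_type in fields.items()}
    return {
        "mappings": {
            doc_type: {
                "dynamic": "strict",
                "properties": properties,
            }
        }
    }


if __name__ == "__main__":
    body = explicit_mapping(
        "product",
        {"title": "text", "sku": "keyword", "price": "float"},
    )
    print(json.dumps(body, indent=2))
```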

5. If your own app works/thinks in JSON, then without a doubt go for ES because ES thinks in JSON too. Solr merely supports it as an afterthought. ES has a number of nice JSON-related features such as parent-child and nested docs that make it a very natural fit. Parent-child joins are awkward in Solr, and I don't think there's a Solr equivalent for ES Inner hits.

6. ES doesn't require ZooKeeper for its 'elastic' features, which is nice because I personally find ZK unpleasant. As a result, though, ES does have issues with split-brain scenarios (google 'elasticsearch split-brain' or see this: Elasticsearch Resiliency Status).

7. Overall, from working with clients as a Solr/Elasticsearch consultant, I've found that developer preferences tend to end up along language party lines: if you're a Java/C# developer, you'll be pretty happy with Solr. If you live in JavaScript or Ruby, you'll probably love Elasticsearch. If you're on Python or PHP, you'll probably be fine with either.

Something to add about this: ES doesn't have a very elegant Java API IMHO (you'll basically end up using REST because it's less painful), whereas SolrJ is very satisfactory and more efficient than Solr's REST API. If you're primarily a Java dev team, do take this into consideration for your sanity. There's no scenario in which constructing JSON in Java is fun/simple, whereas in Python it's absolutely pain-free, and believe me, if you have a non-trivial app, your ES JSON query strings will be works of art.

8. ES doesn't have in-built support for pluggable 'SearchComponents', to use Solr's terminology. SearchComponents are (for me) a pretty indispensable part of Solr for anyone who needs to do anything customized and in-depth with search queries.

Yes of course, in ES you can just implement your own RestHandler, but that's just not the same as being able to plug-into and rewire the way search queries are handled and parsed.

9. Whichever way you go, I highly suggest you choose a client library which is as 'close to the metal' as you can get. Both ES and Solr have *really* simple search and update APIs. If a client library introduces an additional DSL layer in an attempt to 'simplify', I suggest you think long and hard about using it, as it's likely to complicate matters in the long run, and make debugging and asking for help on SO more problematic.

In particular, if you're using Rails + Solr, consider using rsolr/rsolr instead of sunspot/sunspot if you can help it. ActiveRecord is complex code and sufficiently magical. The last thing you want is more magic on top of that.
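
To show how thin a 'close to the metal' client from point 9 can be, here is a minimal sketch that builds raw HTTP requests with only the Python standard library. The host names, ports and the collection/index name `books` are assumptions for a default local install, and the helper names are hypothetical; nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def solr_select_url(base_url, collection, params):
    """Build the URL for Solr's /select handler from plain request parameters."""
    return "%s/%s/select?%s" % (base_url.rstrip("/"), collection, urlencode(params))


def es_search_request(base_url, index, body):
    """Build (but do not send) a POST to Elasticsearch's _search endpoint."""
    return Request(
        "%s/%s/_search" % (base_url.rstrip("/"), index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    print(solr_select_url("http://localhost:8983/solr", "books",
                          {"q": "title:solr", "wt": "json"}))
    req = es_search_request("http://localhost:9200", "books",
                            {"query": {"match": {"title": "solr"}}})
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it. What you log and debug is then exactly the request the server's documentation describes.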

---

To conclude, ES and Solr have more or less feature-parity and from a feature standpoint, there's rarely one reason to go one way or the other (unless your app lives/breathes JSON). Performance-wise, they are also likely to be quite similar (I'm sure there are exceptions to the rule. ES' relatively new autocomplete implementation, for example, is a pretty dramatic departure from previous Lucene/Solr implementations, and I suspect it produces faster responses at scale).

ES does offer less friction from the get-go and you feel like you have something working much quicker, but I find this to be illusory. Any time gained in this stage is lost when figuring out how to properly configure ES because of poor documentation - an inevitability when you have a non-trivial application.

Solr encourages you to understand a little more about what you're doing, and the chance of you shooting yourself in the foot is somewhat lower, mainly because you're forced to read and modify the 2 well-documented XML config files in order to have a working search app.

---

EDIT on Nov 2015:

ES has been gradually distinguishing itself from Solr when it comes to data analytics. I think it's fair to attribute this to the immense traction of the ELK stack in the logging, monitoring and analytics space. My guess is that this is where Elastic (the company) gets the majority of its revenue, so it makes perfect sense that ES (the product) reflects this.

We see this manifesting primarily in the form of aggregations, which are a more flexible and nuanced replacement for facets. Read more about aggregations here: Migrating to aggregations

Aggregations have been out for a while now (since 1.4), but with the recently released ES 2.0 come pipeline aggregations, which let you compute aggregations such as derivatives, moving averages, and series arithmetic on the results of other aggregations. Very cool stuff, and Solr simply doesn't have an equivalent. More on pipeline aggregations here: Out of this world aggregations
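
As a rough sketch of what a pipeline aggregation looks like, the request body below nests a `derivative` under a monthly `date_histogram`, and a small pure-Python function mimics what the derivative computes over the bucket values. The field names `date` and `price`, the aggregation names and the helper names are hypothetical; the body assumes the ES 2.x/5.0 syntax, where the histogram bucket size is given as `interval`.

```python
def monthly_sum_with_derivative(date_field, value_field):
    """Request body: monthly sums plus the month-over-month change of those sums."""
    return {
        "size": 0,  # only the aggregation results are wanted, not the hits
        "aggs": {
            "per_month": {
                "date_histogram": {"field": date_field, "interval": "month"},
                "aggs": {
                    "total": {"sum": {"field": value_field}},
                    # Pipeline aggregation: reads the sibling "total" metric.
                    "total_change": {"derivative": {"buckets_path": "total"}},
                },
            }
        },
    }


def derivative(values):
    """What a first-order derivative yields per bucket: the difference from the previous one."""
    if not values:
        return []
    # The first bucket has no predecessor, so it has no derivative value.
    return [None] + [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    print(monthly_sum_with_derivative("date", "price"))
    print(derivative([550, 60, 375]))
```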

If you're currently using or contemplating using Solr in an analytics app, it is worth your while to look into ES aggregation features to see if you need any of it.

Resources

My other sites may be of interest if you're new to Lucene, Solr and Elasticsearch:
The Solr wiki and the Elasticsearch Guide are your friends.

Contribute

If you see any mistakes, or would like to append to the information on this webpage, you can clone the GitHub repo for this site with:

git clone https://github.com/superkelvint/solr-vs-elasticsearch

and submit a pull request.

