Character Sets, Collation, Unicode :: utf8_unicode_ci vs utf8_general_ci
Hi,
You can check and compare sort orders provided by these two collations here:
http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
http://www.collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html
utf8_general_ci is a very simple collation. For each character it just
- removes all accents
- then converts to upper case
and uses the code of the resulting "base letter" for comparison.
For example, these Latin letters: ÀÁÅåāă (and all other Latin letters "a"
with any accents and in any cases) are all compared as equal to "A".
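
To make the "base letter" idea concrete, here is a rough Python sketch. It is not MySQL's implementation: it only approximates the behaviour for accented Latin letters using Unicode decomposition, and the names base_letter and general_ci_like_key are made up for this illustration.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Approximate a 'base letter': drop accents, then upper-case.

    Assumption: Unicode NFD decomposition stands in for MySQL's internal
    weight table, which this sketch does not reproduce.
    """
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    # Keep only non-combining characters, i.e. throw the accents away.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.upper()


def general_ci_like_key(s: str) -> list[str]:
    """Comparison key built one character at a time (hypothetical helper)."""
    return [base_letter(ch) for ch in s]


letters = "ÀÁÅåāă"
# Every letter in the example collapses to the same base letter.
keys = {base_letter(ch) for ch in letters}
print(keys)
```

Two strings then compare as equal whenever their keys match, which is why "Ångström" and "angstrom" would be treated as the same value under this kind of rule.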
utf8_unicode_ci uses the default Unicode collation element table (DUCET).
The main differences are:
1. utf8_unicode_ci supports so-called expansions and ligatures, for example:
German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near "ss"
Letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".
utf8_general_ci does not support expansions/ligatures. It sorts
all these letters as single characters, and sometimes in the wrong order.
2. utf8_unicode_ci is *generally* more accurate for all scripts.
For example, in the Cyrillic block:
utf8_unicode_ci is fine for all these languages:
Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian.
utf8_general_ci, however, is fine only for the Russian and Bulgarian subset of Cyrillic.
The extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian
are not sorted well.
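
The effect of expansions (point 1 above) can be sketched in Python as well. The tables here are tiny, hand-written and purely hypothetical, nothing like the real weight tables. The single-character variant assumes ß is weighted like one "s" and leaves Œ as its own letter compared by code point, which is only a stand-in for what utf8_general_ci actually does.

```python
# Hypothetical, hand-written tables for illustration only.
# With expansions, one character may contribute several letters to the key.
EXPANSIONS = {"ß": "SS", "Œ": "OE", "œ": "OE"}
# Without expansions, every character contributes exactly one letter.
# Assumption: ß is weighted like a single "S".
SINGLE = {"ß": "S"}


def key_with_expansions(s: str) -> str:
    """Sort key where ß and Œ expand to two letters (unicode_ci-like idea)."""
    return "".join(EXPANSIONS.get(ch, ch.upper()) for ch in s)


def key_single_char(s: str) -> str:
    """Sort key where every character stays a single letter."""
    return "".join(SINGLE.get(ch, ch.upper()) for ch in s)


german = ["Strasse", "Straße", "Strast"]
french = ["Ox", "Œuvre", "Odor"]

german_expanded = sorted(german, key=key_with_expansions)
german_single = sorted(german, key=key_single_char)
french_expanded = sorted(french, key=key_with_expansions)
french_single = sorted(french, key=key_single_char)

print(german_expanded, german_single)
print(french_expanded, french_single)
```

With expansions, "Straße" gets the same key as "Strasse" and "Œuvre" lands between "Odor" and "Ox". In the single-character variant "Straße" sorts ahead of "Strasse" and "Œuvre" ends up after "Ox".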
The disadvantage of utf8_unicode_ci is that it is a little bit
slower than utf8_general_ci.
So when you need a better sorting order, use utf8_unicode_ci,
and when performance is your overriding concern, use utf8_general_ci.