Yesterday I had a fun time repairing a 1.5TB ext3 partition containing many millions of files. Of course it should never have happened: this was a decent PowerEdge 2850 box with a RAID volume, ECC memory and the reliable CentOS 4.4 distribution, but it still did. We got a “journal failed” message in the kernel log, and the filesystem needed to be checked and repaired even though it is a journaling file system, which should not need checks in normal use, even after power failures. Checking and repairing took many hours, especially as the automatic check on boot failed and had to be restarted manually.

The same may happen with InnoDB tables. They are designed to never crash, surviving power failures and even partial page writes, but they can still get corrupted because of MySQL bugs, OS bugs or hardware bugs, misconfiguration or failures.

Sometimes
the corruption is mild, so an ALTER TABLE to rebuild the table fixes it.
Sometimes the table needs to be dropped and recovered from backup, but in
certain cases you may need to reimport the whole database, if the corruption
happens to be in the undo tablespace or log files.

So do not forget
to have a recovery plan for this kind of failure. This is one thing you
had better have backups for. Backups, however, take time to restore,
especially if you do point-in-time recovery using the binary log to get to
the actual database state.

A good way to approach this kind of
problem is first to have enough redundancy. I always assume any
component, whether a piece of hardware or software, can fail, even if it
has some internal redundancy of its own, as RAID
or SAN solutions do.

If you can’t afford full redundancy for
everything (and probably even if you can), a good idea is to keep your
objects smaller, so that any maintenance you need to do on them
takes less time. Smaller RAID volumes typically rebuild faster,
a smaller database size per system (yet another reason to like mid-range
commodity hardware) makes recovery faster, and smaller tables
allow per-table backup and recovery to happen faster.

With MySQL
and its blocking ALTER TABLE there is yet another reason to keep tables
small: you do not have to use complicated scenarios to do simple
things. Assume, for example, you need to add an extra column to a 500GB
InnoDB table. It will probably take long hours or even days for ALTER
TABLE to complete, and about 500GB of temporary space will be required,
which you simply might not have. You can of course use master-master
replication and run the statement on one server, switch roles and then do it
on the other, but if the ALTER TABLE takes several days, can you really
afford having no box to fall back to for such a long time?

On
the other hand, if you had 500 tables of 1GB each it would be very easy:
you can simply take small pieces of data offline for a minute and alter
them live. The whole process will also be much faster this way, as whole
indexes will fit well in memory for such small tables.
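
As a rough sketch of that rolling approach, the snippet below only generates one ALTER TABLE statement per small table, so they can be run one at a time while the rest of the data stays online. The `<base>_NNN` table naming scheme, the example `flags` column and the helper name `rolling_alter_statements` are assumptions made up for illustration.

```python
from typing import Iterator


def rolling_alter_statements(base: str, num_tables: int, column_ddl: str) -> Iterator[str]:
    """Yield one ALTER TABLE statement per small table.

    Table names follow a hypothetical `<base>_NNN` scheme; adjust to
    whatever naming your own split uses.
    """
    for i in range(num_tables):
        # Each statement touches a single small table, so only that
        # piece of data is blocked while it runs.
        yield f"ALTER TABLE `{base}_{i:03d}` ADD COLUMN {column_ddl}"
```

Calling `rolling_alter_statements("messages", 500, "flags INT NOT NULL DEFAULT 0")` would give 500 short statements to feed to the server one after another, instead of one statement that blocks a 500GB table.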

Not to mention that splitting 500 tables across several servers will likely be easier than splitting one big one.

There
are a bunch of complications with many tables, of course: it is not always
easy to partition your data appropriately, and the code gets more complicated,
but for many applications it is worth the trouble.

At NNSEEK,
for example, we have data split into 256 groups of tables. The current data
size is small enough that even a single table would not be a big problem, but
it is much easier to write your code to handle the split from the very beginning
than to try to add it later on, when there are 100 helper scripts
written, etc.
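
To make the idea concrete, here is a minimal sketch of how application code might pick one of 256 table groups for a given key. The CRC32 hash, the function names and the `articles_NNN` naming are assumptions for illustration only, not a description of how NNSEEK actually does it.

```python
import zlib

NUM_GROUPS = 256  # number of table groups mentioned in the post


def group_for(key: str) -> int:
    """Map a key to a group number in [0, NUM_GROUPS).

    CRC32 is used here only because it is stable across runs and
    machines; any stable hash would do (assumption).
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_GROUPS


def table_name(base: str, key: str) -> str:
    """Build a hypothetical per-group table name such as `articles_042`."""
    return f"{base}_{group_for(key):03d}"
```

If every query goes through a helper like `table_name`, the split is handled in one place, and helper scripts written later never need to know how many groups exist.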

For the same reason I would recommend setting up
multiple virtual servers even if you work with a single physical one in the
beginning. Different accounts with different permissions will be good
enough. Doing so will ensure you do not have problems once you
really need to scale to multiple servers.
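
One possible way to express this in code is a small routing table that maps table groups to virtual servers, where every virtual server initially points at the same physical host but uses its own account. The host name, account names and the count of four virtual servers below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VirtualServer:
    host: str  # physical host currently backing this virtual server
    user: str  # separate account, so permissions are separate too


# Hypothetical setup: four virtual servers, all on one physical box for now.
VIRTUAL_SERVERS: Dict[int, VirtualServer] = {
    i: VirtualServer(host="db1.example.internal", user=f"app_vs{i}")
    for i in range(4)
}


def server_for_group(group: int) -> VirtualServer:
    """Pick the virtual server responsible for a table group."""
    return VIRTUAL_SERVERS[group % len(VIRTUAL_SERVERS)]
```

Moving a virtual server to new hardware later then means changing only its `host` entry; the application code that calls `server_for_group` stays the same.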

 
Reference:
http://www.mysqlperformanceblog.com/2006/10/08/small-things-are-better/
