https://www.brentozar.com/archive/2013/05/monitoring-ssd-performance/

Everyone wants to make sure they’re getting the best performance out of their solid state storage. If you’re like a lot of people, you want to make sure you’re getting what you paid for, but how do you know for sure that the drive is performing well?

Watch that Average

The first way to monitor performance it to use some perfmon counters. Although there are a lot of perfmon counters that seem helpful, we’re only going to look at two:

  • PhysicalDisk\Avg. Disk Sec/Read
  • PhysicalDisk\Avg. Disk Sec/Write

As soon as you get a solid state drive in your server, start monitoring these numbers. Over time you’ll be able to trend performance over time and watch for poor performance. When the SSDs pass out of your valid performance guidelines (and they probably will), you can pull them out of the storage one at a time and reformat them before adding them back into the RAID array. Note it isn’t necessary to do this

Although it’s risky, this approach can work well for detecting performance problems while they’re happening. The downside is that we don’t have any idea that the drives are about to fail – we can only observe the side effects of writing to the SSDs. As SSD health gets worse, this average is going to trend upwards. Of course, you could also be doing something incredibly dumb with your hardware, so we can’t really use average performance as a potential indicator of impending hardware failure.

Which SMART Attributes Work for SSDs?

What if we could watch SSD wear in real time? It turns out that we’ve been able to do this for a while. Many vendors offer SMART status codes to return detailed information about the status of the drive. Rotational drives can tell you how hot the drive is, provide bad sector counts, and a host of other information about drive health.

SSDs are opaque, right? Think again.

SSD vendors started putting information in SMART counters to give users a better idea of SSD performance, wear, and overall health. Although the SMART counters will vary from vendor to vendor (based on the disk controller), Intel publish documentation on the counters available with their SSDs – check out the “SMART Attributes” section of theIntel 910 documentation. These are pretty esoteric documents, you wouldn’t want to have to parse that information yourself. Thankfully, there are easier ways to get to this information; we’ll get to that in a minute.

Which SMART Attributes Should I Watch?

There are a few things to watch in the SMART status of your SSDS:

  • Write Amplification
  • Media Wear-out Indicator
  • Available Reserved Space

Write Amplification, roughly, is a measure of the ratio of writes issued by your OS compared to the number of writes performed by the SSD. A lower score is better – this can even drop below 1 when the SSD is able to compress your data. Although the Write Amplification doesn’t help you monitor drive health directly, it provides a view of how your use pattern will change the SSD’s lifespan.

The Media Wear-Out Indicator gives us a scale from 100 to 0 of the remaining flash memory life. This starts at 100 and drifts toward 0. It’s important to note that your drive will keep functioning after Media Wear-Out Indicator reports 0. This is, however, a good value to watch.

Available Reserved Space measures the original spare capacity in the drive. SSD vendors provide additional storage capacity to make sure wear leveling and garbage collection can happen appropriately. Like Media Wear-Out Indicator, this starts at 100 and will drift toward 0 over time.

It’s worth noting that each drive can supply additional information. The Intel 910 also monitors battery backup failure and provides two reserved space monitors – one at 10% reserved space available and a second at 1% reserved space available. If you’re going to monitor the SMART attributes of your SSDs, it’s worth doing a quick search to find out what your SSD controllers support.

How do I Watch the SMART Attributes of my SSD?

This is where things could get ugly. Thankfully, we’ve got smartmontools. There are two pieces of smartmontools and we’re only interested in one: smartctl. Smartctl is a utility to view the SMART attributes of a drive. On my (OS X) laptop, I can run smartctl -a disk1 to view the SMART attributes of the drive. On Windows you can either use the drive letter for a basic disk, like this:

smartctl -a X:

Things get trickier, though, for certain PCI-Express SSDs. Many of these drives, the Intel 910 included, present one physical disk per controller on the PCI-Express card. In the case of the Intel 910, there are four. In these scenarios you’ll need to look at each controller’s storage individually. Even if you have configured a larger storage volume using Windows RAID, you can still read the SMART attributes by looking at the physical devices underneath the logical disk.

The first step is to get a list of physical devices using WMI:

wmic diskdrive list brief

The physical device name will be in the DeviceID column. Once you have the physical device name, you can view the SMART attributes with smartctl like this:

  1. smartctl -a /dev/pd0 -q noserial

Run against my virtual machine, it looks like this:

  1. C:\Windows\system32> smartctl -a /dev/pd0 -q noserial
  2. smartctl 6.1 2013-03-16 r3800 [x86_64-w64-mingw32-win8] (sf-6.1-1)
  3. Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
  4. === START OF INFORMATION SECTION ===
  5. Device Model: Windows 8-0 SSD
  6. Serial Number: 0RETRD4FE6AMF823QE7R
  7. Firmware Version: F.2FKG1C
  8. User Capacity: 68,719,476,736 bytes [68.7 GB]
  9. Sector Size: 512 bytes logical/physical
  10. Rotation Rate: Solid State Device
  11. Device is: Not in smartctl database [for details use: -P showall]
  12. ATA Version is: ATA8-ACS, ATA/ATAPI-5 T13/1321D revision 1
  13. SATA Version is: SATA 2.6, 3.0 Gb/s
  14. Local Time is: Sat Apr 27 08:35:03 2013 PDT
  15. SMART support is: Unavailable - device lacks SMART capability.

Unsurprisingly, my virtual drive doesn’t display much information. But a real drive looks something like this:

Intel 910 smartctl output

Holy cow, that’s a lot of information. The Intel 910 clearly has a lot going on. There are two important criteria to watch, simply because they can mean the difference between a successful warranty claim and an unsuccessful one

  • SS Media used endurance indicator
  • Current Drive Temperature

The Intel 910 actually provides more information via SMART, but to get to it, we have to use Intel’s command line tools. By using the included isdct.exe, we can get some very helpful information about battery backup failure (yup, your SSD is protected by a battery), reserve space in the SSD, and the drive wear indicator. Battery backup failure is a simple boolean value – 0 for working and 1 for failure. The other numbers are stored internally as a hexadecimal number, but the isdct.exe program translates them from hex to decimal. These numbers start at zero and work toward 100.

If you’re enterprising, you can take a look at the vendor specification and figure out how to read this data in the SMART payload. Or, if you’re truly lazy, you can parse the text coming out of smartcl or isdct (or the appropriate vendor tool) and use that to fuel your reports. Some monitoring packages even include all SMART counters by default.

The Bad News

The bad news is that if you’re using a hardware RAID controller, you may not be able to see any of the SMART attributes of your SSDs. If you can’t get accurate readings from the drives and you’ll have to resort to using the Performance Monitor counters I mentioned at the beginning of the article. RAID controllers that support smartmontools are listed in the smartctl documentation.

Special thanks go out to a helpful friend who let us abuse their QA Intel 910 cards for a little while in order to get these screenshots.

Monitoring SSD Performance::www.brentozar.com的更多相关文章

  1. 理解 OpenStack Swift (3):监控和一些影响性能的因素 [Monitoring and Performance]

    本系列文章着重学习和研究OpenStack Swift,包括环境搭建.原理.架构.监控和性能等. (1)OpenStack + 三节点Swift 集群+ HAProxy + UCARP 安装和配置 ( ...

  2. Measuring & Optimizing I/O Performance

    By Ilya Grigorik on June 23, 2009 Measuring and optimizing IO performance is somewhat of a black art ...

  3. Monitoring and Tuning the Linux Networking Stack: Receiving Data

    http://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/ ...

  4. Monitor and diagnose performance in Java SE 6--转载

    Java SE 6 provides an in-depth focus on performance, offering expanded tools for managing and monito ...

  5. [Windows Azure] Monitoring SQL Database Using Dynamic Management Views

    Monitoring Windows Azure SQL Database Using Dynamic Management Views 5 out of 7 rated this helpful - ...

  6. MySQL Performance Tuning: Tips, Scripts and Tools

    With MySQL, common configuration mistakes can cause serious performance problems. In fact, if you mi ...

  7. Performance Co-Pilot

    Install Performance Co-Pilot 提前安装依赖 [root@iZrj97j6t7ih9hgz1me35hZ ~]# cat install.sh yum install -y ...

  8. PatentTips - Systems, methods, and devices for dynamic resource monitoring and allocation in a cluster system

    BACKGROUND  1. Field  The embodiments of the disclosure generally relate to computer clusters, and m ...

  9. SSD: Single Shot MultiBox Detector论文阅读摘要

    论文链接: https://arxiv.org/pdf/1512.02325.pdf 代码下载: https://github.com/weiliu89/caffe/tree/ssd Abstract ...

随机推荐

  1. tcp 在调用connect失败后要不要重新socket

    tcp 在调用connect失败后要不要重新socket http://blog.csdn.net/occupy8/article/details/48253251

  2. 【bzoj1068】【SCOI2007】压缩

    一道区间dp f[i][j][0/1]表示[i,j]区间是否加入M,并且之前一位有M的最小长度 可以理解为在第一位之前有一个M 那么就可以转移了. #include<bits/stdc++.h& ...

  3. C 基础框架开发

    引言 有的人真的是天命所归 延安时期炸弹 投到他院子都 没炸. 有些事无法改变 是命! 我们也快'老'了, 常回家看看. 前言 扯淡结束了,今天分享的可能有点多,都很简单,但是糅合在一起就是有点复杂. ...

  4. 获取file中字段,写入到TXT文件中

    一下代码省略了很多,哈哈哈 a.txt文件 uid,type,pointx,pointy,name1,9,911233763,543857286,区间测速起点3,9,906371086,5453354 ...

  5. IntelJ IDEA 进行Java Web开发+热部署+一些开发上的问题

    基本上像放弃MyEclipse或者Eclipse了,因为IDEA现在也有对应的版本旗舰版和社区版了,而且使用更贴心,更给力,为什么还要选一个难用的要死的东西呢? 最近要开发一个Java Web项目,所 ...

  6. Disruptor 线程间共享数据无需竞争

    队列的作用是缓冲 缓冲到 队列的空间里.. 线程间共享数据无需竞争 原文 地址  作者  Trisha   译者:李同杰 LMAX Disruptor 是一个开源的并发框架,并获得2011 Duke’ ...

  7. 微信小程序-二维码汇总

    小程序二维码在生活中的应用场景很多,比如营销类一物一码,扫码开门,扫码付款等...小程序二维码分两种? 1.普通链接二维码 即跟普通的网站链接生成的二维码是一个意思,这种二维码的局限性如下: 对于普通 ...

  8. 关于k8s里的service互访,有说法

    昨天,测试了一个项目的接入.明白了以下几个坑: 1,traefik有可能有性能问题,如果daemonset安装,可重建.也需要通过8580端口查看性能. 2,集群中的service访问自己时,好像性能 ...

  9. [BZOJ5305][Haoi2018]苹果树 组合数

    题目描述 小 C 在自己家的花园里种了一棵苹果树, 树上每个结点都有恰好两个分支. 经过细心的观察, 小 C 发现每一天这棵树都会生长出一个新的结点. 第一天的时候, 果树会长出一个根结点, 以后每一 ...

  10. 如何使用 Java 对 List 中每个对象元素按时间顺序进行排序

    如何使用 Java 对 List 中每个对象元素按时间顺序进行排序 Java 实现 import java.text.SimpleDateFormat; import java.util.ArrayL ...