Research on testing methods for solid-state disks used in the intelligent security field

This paper summarizes the specific requirements for SSDs in the intelligent security field and compares the related lifespan and endurance test methods.


Introduction
An SSD, or solid-state drive, is a storage device that uses semiconductor flash memory (NAND flash) as its medium. Unlike traditional mechanical hard disks, SSDs store data in semiconductors and use purely electronic circuits without any mechanical parts. [1][2][3][4][5][6][7][8][9] As a result, SSDs differ greatly from mechanical hard disks in performance, power consumption, reliability and other respects.
The most notable feature of an SSD is that it is built from an array of NAND flash chips, each of which works as an individual electronic memory chip. Its main components are the controller and the memory chips, and its internal structure is simpler than that of a traditional mechanical hard disk. In detail, SSD hardware includes the following components: controller chip, flash memory, DRAM, PCB (power chip, resistors, capacitors, etc.) and communication interface (SATA, SAS or PCIe). SSD software consists of firmware, which schedules the reading and writing of data between the interface end and the media end, together with the algorithms embedded in the SSD for managing lifespan and reliability and other internal algorithms.

Analysis of the requirements and technical characteristics of storage products applied in the intelligent security field
With the development of information technology and the urgent demand for high-speed data reading and writing, SSDs have become highly sought after because of their physical characteristics.
This is especially true in public transportation, security monitoring and other applications that must store large volumes of audio and video data at high speed, where SSDs have clear technical advantages because they are stable, fast, safe and reliable.

Technology roadmap
The core storage medium of an SSD is flash memory, whose performance directly affects the performance and integrity of data storage. Flash memory is generally divided into SLC, MLC, TLC and QLC, classified by the number of bits stored per cell. TLC (Triple-Level Cell), for example, stores 3 bits of data in a single storage cell; it reads and writes slowly and has a short lifespan (500 to 1500 erase cycles), but it is cheap.
After years of development, flash memory has moved from a 2D process to a 3D stacking process. The fundamental goal is to produce more bits per unit area of silicon wafer (mm²) and so lower the cost per GB.

Performance indicators
Performance indicators for SSDs typically include IOPS, which reflects random read and write behavior; throughput, which reflects sequential read and write performance; and response time, which reflects latency. Each is analyzed below.
IOPS is the number of I/O requests the device completes per second, typically measured with small-block read and write commands. SSDs for applications such as video surveillance have different performance requirements compared with traditional network storage. The bandwidth demand of a single data stream is not high, and most disk arrays can satisfy it easily. With large-scale concurrent data streams, network bandwidth is not the major concern; instead, because multi-path concurrency turns sequential reads and writes into random ones, the demand on the disk subsystem is high. This requires the SSD to handle data blocks of 4 KB or larger. The higher the IOPS, the better the SSD's instantaneous random read and write performance.
Throughput is equivalent to bandwidth: the amount of data transferred per second by read and write commands. Video surveillance involves a large number of monitoring nodes and many coexisting channels, and the per-stream data rate has increased from 500 kbps to 2 Mbps. Large-scale concurrent multi-path streaming has become a common scenario, so the SSD must provide enough throughput. The current interface bandwidth under the SATA 3.0 protocol is 6 Gbps.
Response time, or latency, is the time from when a command is issued until its status reply is received. It is reported as two figures: average latency and maximum latency. The shorter the response time, the better the SSD performs. For video surveillance workloads, the maximum write response time of the SSD should be under 500 ms and the maximum read response time within 100 ms. This means that within a test cycle no command may take longer than 500 ms; otherwise there will be obvious lag that seriously affects the user experience.
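As a rough sizing sketch for the multi-stream case described above, the aggregate bit rate of many concurrent streams can be compared with the SATA 3.0 line rate and converted into an equivalent number of 4 KB I/O requests per second. The function names, the channel count of 256 and the decimal unit conventions are illustrative assumptions, not values from a test specification. The 6 Gbps figure is the raw line rate, so the usable payload bandwidth is lower.

```python
SATA3_LINE_RATE_BPS = 6_000_000_000  # 6 Gbps, the SATA 3.0 figure quoted in the text


def aggregate_stream_bps(channels: int, stream_bps: int) -> int:
    """Total bit rate of `channels` concurrent video streams."""
    return channels * stream_bps


def equivalent_iops(total_bps: int, block_bytes: int = 4096) -> float:
    """I/O requests per second needed if every request carries one block."""
    return (total_bps / 8) / block_bytes


if __name__ == "__main__":
    channels = 256          # hypothetical number of camera channels
    stream_bps = 2_000_000  # 2 Mbps per stream, the upper figure in the text
    total = aggregate_stream_bps(channels, stream_bps)
    print(f"aggregate stream rate: {total / 1e6:.0f} Mbps")
    print(f"share of SATA 3.0 line rate: {total / SATA3_LINE_RATE_BPS:.1%}")
    print(f"equivalent 4 KB IOPS: {equivalent_iops(total):.0f}")
```

The sketch only bounds the raw data rate; it does not model the loss of sequentiality that makes the concurrent case demanding in practice.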
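The latency criteria above can be checked mechanically once per-command response times have been recorded for a test cycle. The following is a minimal sketch under that assumption; the function name, the result keys and the sample values are hypothetical, and only the 500 ms and 100 ms limits come from the text.

```python
from statistics import mean

MAX_WRITE_LATENCY_MS = 500.0  # write limit stated in the text
MAX_READ_LATENCY_MS = 100.0   # read limit stated in the text


def summarize_latency(samples_ms, limit_ms):
    """Return average latency, maximum latency and a pass flag for one test cycle."""
    if not samples_ms:
        raise ValueError("no latency samples recorded")
    worst = max(samples_ms)
    return {
        "avg_ms": mean(samples_ms),
        "max_ms": worst,
        # The cycle passes only if no single command exceeded the limit.
        "passed": worst <= limit_ms,
    }


if __name__ == "__main__":
    write_samples = [12.0, 35.5, 410.0, 18.2]  # made-up measurements
    read_samples = [3.1, 7.8, 130.0, 4.4]      # made-up measurements
    print("write:", summarize_latency(write_samples, MAX_WRITE_LATENCY_MS))
    print("read: ", summarize_latency(read_samples, MAX_READ_LATENCY_MS))
```

Judging the cycle on the maximum rather than the average reflects the point made above: a single slow command is enough to cause visible lag.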
Lifespan is also an important performance indicator for SSDs. There are two main ways to measure SSD life: drive writes per day (DWPD) and total bytes written (TBW). DWPD is the number of times a user can fill the disk per day over the life of the SSD; TBW is the total number of bytes that can be written over the lifetime of the SSD. SSDs used in video surveillance need a relatively long service life because of the requirements on retention time and stream rate. For a 1 TB SSD, for example, the disk must be able to fill up in less than a week, that is, DWPD ≥ 0.143, and the continuous service life should exceed 5 years.
SSD data reliability indicators include UBER and MTBF. UBER (Uncorrectable Bit Error Rate) is a measure of the data corruption rate: the ratio of the number of data errors to the total number of bits read, after any specified error correction mechanism has been applied. For a storage device used in a specific scenario such as video surveillance, the users' main concern is that data can be read back accurately after it has been saved. UBER describes the probability of data errors and gives users an intuitive figure for the likelihood of abnormal data; UBER is generally required to be no greater than 10^-15.
MTBF reflects how long the product operates continuously without failure, and therefore the reliability of the SSD. The main consideration in MTBF is the failure rate of each component in the product. Component failure rates differ across environments and conditions of use: an SSD in a laboratory and one on an outdoor monitoring platform are bound to differ in reliability, and a capacitor with a rated voltage of 6V certainly has a different failure rate at an actual voltage of 25V than at 5V.
For SSDs, the JESD218A specification defines a way to measure the daily read and write volume, but which workload to apply in an MTBF test still needs to be considered. For instance, if an SSD runs a normal workload that writes 20 GB of data per day for 5 years, its MTBF can reach 1 million hours; if the workload is reduced to 10 GB per day, the MTBF becomes 2 million hours; similarly, at 5 GB per day the MTBF is 4 million hours.
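Returning to the lifespan requirement for the 1 TB example, the arithmetic behind DWPD ≥ 0.143 and the total write volume it implies over five years can be sketched as follows. The function names are hypothetical, and the calculation assumes 365-day years and decimal terabytes.

```python
def required_dwpd(fill_period_days: float) -> float:
    """Drive writes per day if the whole disk is filled once every `fill_period_days`."""
    if fill_period_days <= 0:
        raise ValueError("fill period must be positive")
    return 1.0 / fill_period_days


def required_tbw(capacity_tb: float, dwpd: float, years: float) -> float:
    """Total terabytes written over the service life, assuming 365-day years."""
    return capacity_tb * dwpd * 365 * years


if __name__ == "__main__":
    dwpd = required_dwpd(7)           # the disk fills once a week
    tbw = required_tbw(1.0, dwpd, 5)  # 1 TB drive, 5-year service life
    print(f"DWPD >= {dwpd:.3f}, total writes >= {tbw:.0f} TB")
```

Under these assumptions the 1 TB drive must sustain roughly 261 TB of writes over five years.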
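The three data points in this example imply that, for this drive, MTBF scales inversely with the daily write volume. A minimal sketch of that relation follows. The inverse-proportional model is read off the example and is an assumption, not a formula taken from JESD218A, and the function name is hypothetical.

```python
def scaled_mtbf_hours(ref_mtbf_hours: float, ref_gb_per_day: float, gb_per_day: float) -> float:
    """Scale a reference MTBF to another daily write volume, assuming MTBF ~ 1 / workload."""
    if ref_gb_per_day <= 0 or gb_per_day <= 0:
        raise ValueError("daily write volumes must be positive")
    return ref_mtbf_hours * ref_gb_per_day / gb_per_day


if __name__ == "__main__":
    # Reference point from the text: 20 GB/day corresponds to 1 million hours.
    for workload in (20, 10, 5):
        hours = scaled_mtbf_hours(1_000_000, 20, workload)
        print(f"{workload:>2} GB/day -> {hours / 1e6:.0f} million hours")
```

The practical point is that an MTBF figure is only meaningful together with the workload under which it was derived.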

Lifespan test analysis
As mentioned above, the lifespan of an SSD is one of its most important performance metrics and can be measured by DWPD and TBW. From an economic perspective, the larger the DWPD, the higher the cost of the SSD. Users need to determine which SSD is optimized for their scenario and how large a DWPD they actually need, so as to balance performance and economy.
Generally speaking, SSD storage needs to be tiered according to lifetime and data access frequency: data is identified as hot or cold and placed in the corresponding layer. Taking typical video surveillance scenarios as an example, online data processing writes a large amount of data, namely hot data, which has higher performance requirements. It can therefore be stored in the T1-WISSD layer, which has a higher cost per disk and a lower total capacity. The medium layer mainly stores data that is read, namely warm data, which also has high performance demands and can therefore be stored in the T2-RISSD layer. Most data in the bottom layer is cold data that is seldom read or written, so ordinary SSDs can easily meet the need. Overall, video surveillance online data processing uses about 40% of the SSD, making it an application type with high demand for SSDs. The data layer structure and SSD application are shown in the following figure.
Where Y is the nominal usage time, e.g. 5 years. In general, a 200 GB SSD has a rated lifespan of 3600 TB over a five-year period; that is, on average 3600 TB / (5 × 365) ≈ 1972 GB can be written every day, which is equivalent to writing the whole disk about 10 times a day, namely 10 DWPD.
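The equation that defines Y does not survive in the text above. The worked example is consistent with the usual relation between rated endurance and DWPD, sketched here as an assumption; the symbol C (user capacity) and the use of TBW for the rated total write volume are labels introduced for this sketch.

```latex
\mathrm{DWPD} = \frac{\mathrm{TBW}}{C \times 365 \times Y}
\qquad\Rightarrow\qquad
\frac{3600\ \mathrm{TB}}{0.2\ \mathrm{TB} \times 365 \times 5} \approx 9.86 \approx 10
```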

Data reliability test analysis
The storage medium of an SSD is flash memory, which has an inherent bit-flip rate. Bit flips are mainly caused by P/E cycling, read disturb, program disturb, data retention loss and similar effects. Although the SSD controller chip and firmware are designed to correct erroneous data, some errors still cannot be corrected under certain conditions, so the probability that a data error cannot be corrected is expressed by the uncorrectable bit error rate, UBER. Given the raw bit-flip rate of the flash memory and the protection provided by an ECC such as a BCH code, the result can be calculated and converted to UBER. The following figure shows UBER converted from the ECC code length, protection strength, bit error rate, etc.

Figure 2. Relationship between UBER and Protection Strength
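The conversion from raw bit error rate to UBER described above can be sketched numerically. This is not the calculation behind Figure 2, whose parameters are not given. It assumes independent bit errors, an ECC that corrects up to t errors per codeword, and the common convention that UBER is the codeword failure probability divided by the number of user data bits per codeword. All function names and example parameters are hypothetical.

```python
from math import exp, lgamma, log, log1p


def codeword_failure_probability(rber: float, codeword_bits: int, t: int) -> float:
    """P(more than t raw bit errors in one codeword), assuming independent errors."""
    if not 0.0 < rber < 1.0:
        raise ValueError("rber must be strictly between 0 and 1")
    n = codeword_bits
    total = 0.0
    for k in range(t + 1, n + 1):
        # Binomial term C(n, k) p^k (1 - p)^(n - k), evaluated in log space so
        # the huge binomial coefficient never has to be represented directly.
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(rber) + (n - k) * log1p(-rber))
        term = exp(log_term)
        total += term
        # Past the mean error count the terms only shrink; stop once negligible.
        if k > n * rber and term <= total * 1e-17:
            break
    return total


def uber_estimate(rber: float, data_bits: int, parity_bits: int, t: int) -> float:
    """Approximate UBER: codeword failure probability per user data bit."""
    return codeword_failure_probability(rber, data_bits + parity_bits, t) / data_bits


if __name__ == "__main__":
    # Hypothetical configuration: 1 KiB of user data protected by 560 parity bits.
    for t in (8, 24, 40):
        print(f"t = {t:2d}: UBER ~ {uber_estimate(1e-3, 8192, 560, t):.3e}")
```

For a fixed raw bit error rate, increasing the protection strength t lowers the estimated UBER sharply, which is the kind of relationship the figure plots.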
MTBF reflects how long a product runs continuously without failure, and it is also a reliability indicator for SSDs. Several standards exist for MTBF calculation; the most commonly used are MIL-HDBK-217 and GJB/Z 299B, which are used for military products, and Bellcore, which is used for civilian products. MIL-HDBK-217 was proposed by the Reliability Analysis Center of the U.S. Department of Defense and Rome Laboratory; it has become an industry standard and is used specifically for calculating the MTBF of military products. GJB/Z 299B is a Chinese military standard. Bellcore was developed by AT&T Bell Labs and has become the industry standard for MTBF calculation in civilian products.
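In their simplest use, prediction standards of this kind sum the failure rates of the individual components. The following sketch shows that idea only; it assumes constant failure rates and a series model in which any component failure fails the product. The component list and FIT values are made-up placeholders, not handbook data, and the function name is hypothetical.

```python
def system_mtbf_hours(failure_rates_fit: dict) -> float:
    """MTBF of a series system from component failure rates in FIT (failures per 1e9 hours)."""
    total_fit = sum(failure_rates_fit.values())
    if total_fit <= 0:
        raise ValueError("total failure rate must be positive")
    return 1e9 / total_fit


if __name__ == "__main__":
    # Placeholder failure rates for illustration only.
    components = {
        "controller": 200.0,
        "nand_flash": 500.0,
        "dram": 100.0,
        "power_and_passives": 200.0,
    }
    print(f"predicted MTBF: {system_mtbf_hours(components):,.0f} hours")
```

Because the component rates depend on environment and electrical stress, the same bill of materials yields different MTBF predictions for laboratory and outdoor deployments.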

Conclusion
This paper introduces the basis for testing the core performance indicators of solid-state disks used in the video surveillance field and discusses specific application scenarios, which is helpful for the testing and certification of domestic SSD products.