AI For System Papers Index

Published 2020-07-14 | Tags: Storage, AI | 3 min read | 443 words

An index of AI-for-Systems research papers.

The focus is on topics in intelligent storage.

Continuously updated.

AI For Systems

Disk Failure Prediction

FAST20 - Making Disk Failure Predictions SMARTer!
- Slides
- We present analysis and findings from one of the largest disk failure prediction studies covering a total of 380,000 hard drives over a period of two months across 64 sites of a large leading data center operator.
- Our proposed machine learning based models predict disk failures with 0.95 F-measure and 0.95 Matthews correlation coefficient (MCC) for 10-days prediction horizon on average.
- Findings:
  - SMART attributes do not always have strong predictive capability at long prediction horizons for all disks
  - The value of performance metrics (related to capacity, throughput, etc.)
    - Exhibit more variations before the actual drive failure
    - Show distinguishable behavior from healthy disks
  - Prediction can be further improved by incorporating location information (site, room, rack, and server)
    - Disks in close spatial neighborhood
      - Affected by the same environmental factors (such as humidity and temperature)
      - Experience similar vibration level (known to affect the reliability of disks)
  - Data:
  - ML Models:
    - Bayes classifier (Bayes)
    - Random forest (RF)
    - Gradient boosted decision trees (GBDT)
    - Long short-term memory network (LSTM)
    - Convolutional neural network with long short-term memory (CNN-LSTM)
  - Conclusion:
    - The SPL feature group (SMART, performance, and location features combined) performs the best across all ML models, i.e., performance and location features improve the effectiveness of prediction
    - The improvement of adding location info is limited and pronounced only in the presence of performance features
    - CNN-LSTM performs close to the best in all situations
    - Trade-off between models with respect to different availability of feature sets
  - Prediction Horizon:
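
The two headline metrics above, F-measure and Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below is only an illustration of the standard definitions, not the paper's evaluation code; the function names and the toy labels are assumptions. MCC is commonly preferred when classes are heavily imbalanced, which is typically the case for disk failures.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = failed disk, 0 = healthy)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif not t and p:
            fp += 1
        elif t and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example (hypothetical labels, not data from the paper).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f_measure(tp, fp, fn), mcc(tp, fp, tn, fn))
```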

Storage System Tuning

SIGMOD19 - An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning

Optimize IO Behavior

DATE20 - A Machine Learning Based Write Policy for SSD Cache in Cloud Block Storage
- Source Code
- Based on our analysis of a typical cloud block storage system, approximately 47.09% of writes are write-only, i.e., writes to blocks that have not been read during a certain time window.
- We propose ML-WP, a Machine Learning Based Write Policy, which reduces write traffic to SSDs by avoiding writing write-only data.
- The main challenge in this approach is identifying write-only data in real time. The authors ultimately chose the Naive Bayes algorithm.
- Appropriate features:
  - last access timestamp
  - last address information
  - average write size
  - Big request ratio (> 64KB)
  - Small request ratio (< 8KB)
- Experimental results show that, compared with the write-back policy widely deployed in industry, ML-WP decreases write traffic to the SSD cache by 41.52%, while improving the hit ratio by 2.61% and reducing the average read latency by 37.52%.
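
To make the notion of "write-only" concrete, here is a minimal sketch of how writes in an I/O trace could be labeled. It assumes that a write counts as write-only when the same block is not read within `window` time units after the write; the paper's exact definition, trace format, and window length may differ, and the names `label_write_only` and `write_only_ratio` are hypothetical. Labels like these could serve as training targets for a classifier such as Naive Bayes.

```python
import bisect

def label_write_only(trace, window):
    """Label each write in a trace as write-only or not.

    trace: list of (timestamp, op, block) tuples sorted by timestamp,
           where op is "R" or "W".
    Returns a list of (trace_index, is_write_only) for every write.
    """
    # Collect read timestamps per block (already sorted, since the trace is).
    reads = {}
    for ts, op, block in trace:
        if op == "R":
            reads.setdefault(block, []).append(ts)

    labels = []
    for i, (ts, op, block) in enumerate(trace):
        if op != "W":
            continue
        times = reads.get(block, [])
        # Find the first read of this block at or after the write.
        j = bisect.bisect_left(times, ts)
        read_soon = j < len(times) and times[j] - ts <= window
        labels.append((i, not read_soon))
    return labels

def write_only_ratio(trace, window):
    """Fraction of writes labeled write-only (0.0 if there are no writes)."""
    labels = label_write_only(trace, window)
    if not labels:
        return 0.0
    return sum(1 for _, wo in labels if wo) / len(labels)

# Hypothetical toy trace: block "a" is read soon after its write, "b" is not.
trace = [(0, "W", "a"), (1, "W", "b"), (5, "R", "a"), (100, "R", "b")]
print(label_write_only(trace, window=10))
print(write_only_ratio(trace, window=10))
```

A policy in the spirit of ML-WP would then skip the SSD cache for writes predicted to be write-only and send them directly to the backend storage.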
