%include "default.mgp"
%default 1 bgrad 4 4 4 76 1 "black" "blue"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20

Disk And Storage Changes

%center, size 4, font "standard", fore "white", vgap 20
Alan Cox

%page

In The Beginning

	MFM/RLL controllers
		Interleave
		Register command set
		Sector at a time/polled
		250-500K/sec
		C/H/S addressing - requires OS mapping
		30-100MB, 4MB RAM, 1K OS block sizes

%page

In The Beginning

	Early ISA SCSI
		Dumb adapters
		Smart adapters
		Scatter Gather limits
		ISA 16MB limit
		Disconnect

%page

Early IDE

	IDE
		Processor driven
		But we did finally get IRQs
		Forward MFM/RLL controller access cycles down cable to disk
		Saved electronics/money
		Moved the intelligence

%page

ATA Standards

	ATA
		Rapid expansion in disk size
		Logical Block Addressing
		Rename registers to LBA bits
		Initial DMA support
		VLB makes faster modes useful
		No programming standards

%page

ATA on PCI

	PCI
		SFF-8018
		Unifies basic IDE on PCI functionality
		Unifies basic DMA with BIOS mode setting
	From Disk To Device
		Functionality evolving from read/write I/O device
		Door locking
		Feature reporting
		Identify

%page

Limits And Usage Changes

	Limits
		Tied to physical C/H/S assumptions still
	Bigger I/O
		Multi sector I/O with one interrupt
		4K+ I/O becomes a bigger win
		Block sizes of file systems now often 4K
		Evolving Linux page cache favours 4K I/Os

%page

ATA4

	ATAPI
		SCSI type commands for CD to ATA
		Causes older chips headaches
		SCSI layer ide-scsi and a translator
		Alternative ide-cd driver
	Caches
		Caches becoming common
		OS needs to flush them

%page

Ultra DMA

	Ultra DMA
		CRC error handling
		Mode changes can now be required after boot
		Makes a mess
		Data transfer is fast
		Disk read rate is fast
		Command issue time is slow
		Seeks still take forever

%page

Taskfiles - ATA as messages

	Taskfiles
		Redefine the registers as a message
		In serial ATA it *is* a message
		ACPI suspend/restore use taskfiles
		drivers/ide talks taskfiles ..ish
	Command set evolving
		SMART, HPA, Locking
		Still no solution to seek or command rate

%page

LBA48 - Large Disk

	LBA48
		Two sets of writes to each register
		Bigger disk
		Bigger I/O size
		Bigger controller breakage
		Even slower command issue rates
		Bigger I/O size rarely useful

%page

Serial ATA

	Serial ATA
		Moves away from SFF-8018
		64-bit addressing for DMA
		Multiple outstanding commands - NCQ
		Controllers all have differing I/O models
		Hotplug at disk and controller level
		AHCI becoming new standard model
		Attempts to retrofit bits to drivers/ide - not pretty
		New libata layer uses the block/scsi layer work

%page

Higher Level Happenings

	Problems
		Databases want ordering/storage guarantees
		Journalling file systems
		SCSI layer starts using TCQ
		Not much older ATA can do or needed to
	Multiple Outstanding Commands
		Which commands
		How do we merge them
		How do we stay fair
		What about ordering for journals

%page

Barriers and Ordering

	Barriers
		2.6 evolving explicit barrier functionality
		ATA can use this with cache flush commands
		SCSI can use TCQ
		Error handling for ordered commands is hard

%page

Pluggable schedulers

	Pluggable schedulers
		Different tasks have different needs
		Latency v fairness v throughput
		Disk improvements low relative to CPU gains
		Seek time is little improved in years (5-6x)
		Disk throughput has risen 80x
		We now have more CPU and more need to be smart

%page

Anticipatory Scheduler

	Anticipatory Scheduler
		Classic one way elevator
		Adds opportunistic _short_ back seeks
		FIFO expiry times to avoid starvation
		Read and write batching when possible
		Tries to learn "think v I/O" time of apps
		Batch I/O better by delaying
		Good for general I/O
		Dire for swap

%page

Deadline Scheduler

	Deadline Scheduler
		Similar to anticipatory
		Focus heavily on deadline hitting
		Don't delay I/O for batching
		Optionally do minimal merging for smart subsystems
		Good for databases

%page

CFQ and No-Op

	CFQ (Completely Fair Queuing)
		Draws on the CFQ ideas in networking
		Try and provide balanced bandwidth to processes
		Oriented to desktops
		CPU and memory usage to manage is higher
		Fairness costs throughput
	No-Op
		For smart storage systems
		Minimise CPU and memory cost
		Minimise latency
		Simply vomit it all directly at the disk controller

%page

Unsolved Problems

	Application smartness
		Opening millions of files is expensive
		Random gratuitous seeking is costly too
		GTK/GNOME startup
	Research problems
		Re-ordering binaries so they page in linearly
		Log structured swap
		Anticipatory swap-in

%page

Storage Has Changed Dramatically

	Historically
		Simple algorithms to save CPU
		Tightly linked to disk
		OS could understand disk layout
	Today
		Disk is a storage appliance
		CPU power is cheap
		I/O is a bottleneck
		Disk seek time has not scaled
		Need for more, smarter algorithms
		May all change again with new technology

%page
%page
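
A minimal sketch of the "classic one way elevator" that the Anticipatory Scheduler slide starts from. It is an illustration only, not the kernel's implementation: requests at or beyond the head position are served in ascending order, then the head wraps round to the lowest outstanding sector. The function names, the sample queue, and the use of bare sector numbers as a stand-in for seek cost are all assumptions made for this example.

```python
def one_way_elevator(head, pending):
    """Return pending sectors in one-way elevator order.

    Sectors at or beyond the head are served first, in ascending order;
    the sweep then restarts from the lowest remaining sector.
    """
    ahead = sorted(s for s in pending if s >= head)
    behind = sorted(s for s in pending if s < head)
    return ahead + behind


def seek_distance(head, order):
    """Total head movement, in sectors, when serving requests in this order."""
    total = 0
    for sector in order:
        total += abs(sector - head)
        head = sector
    return total


# Hypothetical queue of requested sectors, with the head parked at sector 53.
pending = [98, 183, 37, 122, 14, 124, 65, 67]
fifo_cost = seek_distance(53, pending)
elevator_order = one_way_elevator(53, pending)
elevator_cost = seek_distance(53, elevator_order)
print(elevator_order, fifo_cost, elevator_cost)
```

For this sample queue, serving requests in arrival order moves the head 640 sectors, while the elevator order moves it 322. The sketch leaves out everything the slides add on top: short back seeks, FIFO expiry to avoid starvation, and read/write batching.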