The Flexible I/O Tester (fio) was originally written as a test tool for the kernel block I/O stack. Over the years, however, fio gained many features and detailed performance statistics output, turning it into a standard benchmark application for storage devices.
fio source code is available on GitHub.
Support for zoned block devices was added to fio with version 3.9. Earlier versions do not provide write command ordering guarantees for host managed zoned block devices. Executing workloads with those versions is still possible, but requires writing complex fio scripts.
Changes to fio to support zoned block devices include several new options allowing a user to control zoned block device compliant workloads. fio already implemented the option --zonemode which allows defining workloads operating on disjoint ranges of blocks. This option was reused to define the new zbd zone mode.
When the zbd zone mode is used by an fio job, the --zonerange option is ignored and the --zonesize option is automatically set to the device zone size. Furthermore, the behavior of read and write commands is modified as follows.
Read and write commands are split when a zone boundary is crossed.
For sequential writes, the write stream is always started from a zone write pointer position. If the next zone to be written is not empty, the write stream "jumps" to that zone write pointer and resumes.
For random write workloads, write commands are always issued at the write pointer position of the target zone of the write command.
Any write command targeting a sequential zone that is full (entirely written) will trigger a reset of the zone write pointer before issuing the write I/O.
By default, all read commands target written sectors only, that is, sectors between the start sector and the write pointer position of sequential write zones. This behavior can be disabled, allowing read commands to be issued to any sector, using the new option --read_beyond_wp.
Additionally, finer control over the workload operation can be added with the following new options.
--max_open_zones This option limits the number of zones that are being written by a workload. With this option, a random write workload cannot issue write commands targeting more zones than the limit set. Once a zone that is being written becomes full, another zone is chosen and writes are allowed to target that zone, so that the number of zones being written is always at most equal to the --max_open_zones option value.
--zone_reset_threshold and --zone_reset_frequency These two options allow a user to emulate the execution of zone reset commands being issued by an application.
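As a hedged illustration of how these options combine, the sketch below assembles an fio command line for a random write job that writes to at most 32 zones at a time and emulates application zone resets. According to the fio documentation, both reset options take a ratio between 0 and 1. The device path /dev/sdX, the job name, the block size, the queue depth and all numeric option values are placeholders chosen for illustration, not values from a tested setup.

```shell
DEV=/dev/sdX    # placeholder: block device file of the zoned disk (assumption)

# Build the argument list; every numeric value here is illustrative only.
set -- fio --name=zbd-randwrite --filename="$DEV" --direct=1 --zonemode=zbd \
    --ioengine=libaio --iodepth=8 --rw=randwrite --bs=64K \
    --max_open_zones=32 \
    --zone_reset_threshold=0.5 --zone_reset_frequency=0.1
FIO_CMD="$*"

# Print the command for review. To execute it on a real device, run: "$@"
echo "$FIO_CMD"
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without a zoned disk, since the job overwrites data on the target device.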
In addition to these options, the zbd zone mode automatically enables job synchronization to ensure that a workload spanning multiple threads or processes can concurrently execute write I/Os targeting the same zone.
As discussed in the kernel support documentation, direct write I/O is mandatory for zoned block devices. The zbd zone mode, when enabled, enforces this requirement by checking that the option --direct=1 is specified for any job executing write I/Os.
The --offset and --size options must specify values that are aligned to the device zone size.
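The alignment requirement can be checked before launching a job. The helper functions below are a minimal sketch, assuming a 256 MiB zone size; the function names are hypothetical and are not part of fio. On Linux, the zone size of a zoned block device can usually be read, in 512 B sectors, from /sys/block/sdX/queue/chunk_sectors.

```shell
ZONE_SIZE=$((256 * 1024 * 1024))    # assumed device zone size in bytes (256 MiB)

# Succeeds (exit status 0) when the byte value $1 is a multiple of the zone size.
zone_aligned() {
    [ $(( $1 % ZONE_SIZE )) -eq 0 ]
}

# Prints the byte value $1 rounded down to the nearest zone boundary.
zone_align_down() {
    echo $(( $1 / ZONE_SIZE * ZONE_SIZE ))
}

if zone_aligned 140660178944; then
    echo "140660178944 is zone aligned"
fi
if ! zone_aligned 1000000000; then
    echo "1000000000 is not zone aligned, nearest lower boundary: $(zone_align_down 1000000000)"
fi
```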
This section provides various examples showing how to use fio's new zbd zone mode. In all examples, a 15 TB ZAC host managed SATA disk is used. The disk zone size is 256 MiB. The disk has 524 conventional zones starting at offset 0. The first sequential write required zone of the disk starts at sector 274726912 (512 B sector unit), that is, at the byte offset 140660178944.
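The offset of the first sequential write required zone follows directly from the zone layout described above. The short calculation below reproduces it with shell arithmetic; the variable names are chosen for illustration.

```shell
ZONE_SIZE=$((256 * 1024 * 1024))    # 256 MiB zone size, in bytes
CONV_ZONES=524                      # conventional zones at the start of the disk

SEQ_START_BYTES=$((CONV_ZONES * ZONE_SIZE))     # byte offset of the first sequential zone
SEQ_START_SECTORS=$((SEQ_START_BYTES / 512))    # same position in 512 B sectors

echo "first sequential zone: byte offset $SEQ_START_BYTES, sector $SEQ_START_SECTORS"
# prints: first sequential zone: byte offset 140660178944, sector 274726912
```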
The following command sequentially writes the first 4 sequential zones of the disk using the libaio I/O engine with a queue depth of 8.
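A command matching this description could look like the sketch below. The device path /dev/sdX, the job name and the 256 KiB block size are assumptions made for illustration; the offset and the size (four 256 MiB zones) are derived from the disk layout given above.

```shell
DEV=/dev/sdX                        # placeholder device node (assumption)
ZONE_SIZE=$((256 * 1024 * 1024))    # 256 MiB zones
SEQ_START=$((524 * ZONE_SIZE))      # first sequential zone, after 524 conventional zones

# Sequential write of 4 zones, libaio engine, queue depth 8.
set -- fio --name=seqwrite --filename="$DEV" --direct=1 --zonemode=zbd \
    --offset="$SEQ_START" --size="$((4 * ZONE_SIZE))" \
    --ioengine=libaio --iodepth=8 --rw=write --bs=256K
FIO_CMD="$*"

# Print the command for review. To execute it on a real device, run: "$@"
echo "$FIO_CMD"
```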
The first four sequential write required zones of the disk are now full.
With the disk in this state, executing the same command again without the zbd zone mode enabled causes fio to attempt to write to full zones, resulting in I/O errors.
With the zbd zone mode enabled, the same command executed again with the zones full succeeds.
Note that fio output in this case indicates the number of zones that were reset prior to writing.
With the previous disk state preserved (that is, with the first four sequential write zones full), the previous command can be changed to read operations targeting the written zones.
If the zones are reset before executing this command, no read I/O will be executed as fio will be unable to find zones with written sectors.
Forcing the execution of read I/Os targeting empty zones can be done using the --read_beyond_wp option.
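As a sketch of such a forced read, the command assembled below adds fio's --read_beyond_wp=1 option to a sequential read of the first four sequential zones. The device path, job name and block size are placeholders chosen for illustration.

```shell
DEV=/dev/sdX                        # placeholder device node (assumption)
ZONE_SIZE=$((256 * 1024 * 1024))    # 256 MiB zones
SEQ_START=$((524 * ZONE_SIZE))      # first sequential zone

# Sequential read of 4 zones, allowing reads past the zone write pointers.
set -- fio --name=read-empty --filename="$DEV" --direct=1 --zonemode=zbd \
    --offset="$SEQ_START" --size="$((4 * ZONE_SIZE))" \
    --ioengine=libaio --iodepth=8 --rw=read --bs=256K \
    --read_beyond_wp=1
FIO_CMD="$*"

# Print the command for review. To execute it on a real device, run: "$@"
echo "$FIO_CMD"
```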
The higher IOPS performance observed with this test compared to the previous one (i.e. IOPS=1411 vs. IOPS=951) results from the disk not physically executing any media access as there is no data to read (no written sectors). The disk returns a fill pattern as data without seeking to the sectors specified by the read commands.
The following command randomly writes to the sequential write zones of the disk using 4 jobs, each job operating at a queue depth of 4 (overall queue depth of 16 for the disk). The run time is set to 30 seconds.
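One way to express this workload is sketched below. The device path, job name and 4 KiB block size are assumptions made for illustration; the job count, queue depth and run time come from the description above. No --size is given, so the job range extends from the first sequential zone to the end of the device.

```shell
DEV=/dev/sdX                        # placeholder device node (assumption)
ZONE_SIZE=$((256 * 1024 * 1024))    # 256 MiB zones
SEQ_START=$((524 * ZONE_SIZE))      # skip the 524 conventional zones

# Random write, 4 jobs at queue depth 4 each, limited to 30 seconds.
set -- fio --name=zbd-randwrite --filename="$DEV" --direct=1 --zonemode=zbd \
    --offset="$SEQ_START" \
    --ioengine=libaio --iodepth=4 --numjobs=4 \
    --rw=randwrite --bs=4K --runtime=30 --group_reporting
FIO_CMD="$*"

# Print the command for review. To execute it on a real device, run: "$@"
echo "$FIO_CMD"
```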
zbc_report_zones can be used to explore the state of the disk at the end of this workload execution.
This indicates that 4498+128=4626 zones were written to, with none of the sequential write zones fully written (no full zone). Switching the operation mode to read, the sectors written in this last run can be randomly read.
Resetting all sequential write zones of the disk and executing the random read workload again leads to results similar to the previous sequential read workload case, that is, no read I/O is executed.
Changing the range to be read to include the conventional zones of the disk will result in read I/Os being executed.
The SCSI generic direct access interface can also be used with the zbd zone mode, as long as the block device file (/dev/sdX) is used to specify the disk. The zbd zone mode will not be enabled if the SCSI generic node file (/dev/sgY) is used to specify the disk.
The example below illustrates the use of the sg I/O engine with 8 jobs executing a 64KB random write workload to sequential write zones.
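A command line along these lines is sketched below. The device path /dev/sdX (a block device file, as required for the zbd zone mode), the job name and the 30 second run time are assumptions made for illustration; the engine, job count and block size come from the description above.

```shell
DEV=/dev/sdX                        # placeholder block device file, not /dev/sgY (assumption)
ZONE_SIZE=$((256 * 1024 * 1024))    # 256 MiB zones
SEQ_START=$((524 * ZONE_SIZE))      # first sequential zone

# 64 KiB random writes to sequential zones, 8 jobs, sg I/O engine.
set -- fio --name=zbd-sg --filename="$DEV" --direct=1 --zonemode=zbd \
    --offset="$SEQ_START" \
    --ioengine=sg --numjobs=8 \
    --rw=randwrite --bs=64K --runtime=30 --group_reporting
FIO_CMD="$*"

# Print the command for review. To execute it on a real device, run: "$@"
echo "$FIO_CMD"
```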
SCSI generic direct access bypasses the block layer I/O scheduler. For zoned block devices, this means that the deadline I/O scheduler zone write locking is unable to provide write command ordering guarantees. However, the zbd zone mode ensures mutual exclusion between jobs for write access to the same zone. Such synchronization is in essence identical to zone write locking and allows all write commands to execute without any error.
A typical zoned block device compliant application writes a zone sequentially until the zone is full, then switches to another zone and continues writing. Multiple threads may be operating in this manner, with each thread operating on a different zone.
Such typical behavior can be emulated using the option --rw_sequencer together with a number of I/O operations specified at the end of the --rw=randwrite argument. Below is an example script of 4 jobs sequentially writing zones until full using 512 KiB write operations (that is, 512 I/Os per 256 MiB zone). The zones being written are chosen randomly within disjoint zone ranges for each job. This is controlled with the --offset and --size arguments. The script file streams.fio achieving such workload is shown below.
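A minimal sketch of such a job file is generated below, under stated assumptions: the device path /dev/sdX, the psync I/O engine, the 2 GiB io_size limit per job and the per-job range of 4096 zones (1 TiB) are hypothetical values chosen for illustration. The number of I/Os per zone and the job offsets are computed from the zone size so that every offset and size stays zone aligned.

```shell
DEV=/dev/sdX                            # placeholder device node (assumption)
ZONE_SIZE=$((256 * 1024 * 1024))        # 256 MiB zones
BS=$((512 * 1024))                      # 512 KiB write size
IOS_PER_ZONE=$((ZONE_SIZE / BS))        # 512 I/Os fill one zone
SEQ_START=$((524 * ZONE_SIZE))          # first sequential zone
RANGE=$((4096 * ZONE_SIZE))             # hypothetical per-job range: 4096 zones (1 TiB)

{
    # Options shared by all jobs.
    cat <<EOF
[global]
filename=$DEV
ioengine=psync
direct=1
zonemode=zbd
bs=$BS
rw=randwrite:$IOS_PER_ZONE
rw_sequencer=sequential
size=$RANGE
io_size=2G
group_reporting=1
EOF
    # One job section per stream, each on its own disjoint zone range.
    job=0
    while [ "$job" -lt 4 ]; do
        printf '\n[stream%s]\noffset=%s\n' "$((job + 1))" "$((SEQ_START + job * RANGE))"
        job=$((job + 1))
    done
} > streams.fio

cat streams.fio
```

The generated file can then be passed to fio directly, for example with `fio streams.fio`, on a system where the device path points to a real zoned disk.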
The result for this script execution is shown below.