Benchmarking MyRocks - ZenFS
MyRocks is a storage engine for MySQL based on RocksDB, an embeddable, persistent key-value store. Percona MyRocks is an implementation for Percona Server for MySQL.
-- Percona
ZenFS is a file system plugin that utilizes RocksDB's FileSystem interface to place files into zones on a raw zoned block device.
-- ZenFS
The benchmarking tool sysbench can be used to measure and report the performance of MyRocks backed by ZNS drives.
sysbench is a scriptable multi-threaded benchmark tool based on LuaJIT. It is most frequently used for database benchmarks, but can also be used to create arbitrarily complex workloads that do not involve a database server. -- sysbench
Setup
Percona MyRocks comes with ZNS support through ZenFS. The 'Installation' and 'Configuration' sections of this document must be followed to install and prepare Percona MyRocks for the sysbench benchmarks below.
Also follow the sysbench Building and Installing From Source instructions.
Modifying Linux Open File Limits
After a successful installation, the number of open file descriptors for the root user and the MySQL application needs to be increased by following the steps below. The limits may need to be adjusted to meet your workload goals and hardware specification.
- Edit the limit configuration file /etc/security/limits.conf and add entries with the item parameter nofile for the MySQL user root:
root hard nofile 500000
root soft nofile 500000
- Edit the common-session file /etc/pam.d/common-session and add the statement below:
session required pam_limits.so
- Edit the system control configuration file /etc/sysctl.conf, adding the max file value:
fs.file-max = 2097152
- Edit the systemd service file for MySQL and set the open file limit to a higher value:
# systemctl edit --full mysql
LimitNOFILE = 500000
- Reload the systemd daemon, restart MySQL, and check that the changes have been applied:
# systemctl daemon-reload
# service mysql restart
# sysctl -p
# cat /proc/$(pgrep -x mysqld)/limits
# cat /proc/sys/fs/file-max
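To confirm that the new limit applies specifically to the running MySQL server, the open-file limit of the mysqld process can also be checked directly (a minimal check, assuming the server process is named mysqld):
# grep "Max open files" /proc/$(pgrep -x mysqld)/limits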
Benchmarking
Finally, the test database can be created:
# mysql -u root -p
mysql> create database sbtest;
mysql> quit
To check whether RocksDB is enabled, the following command can be issued in the mysql console:
mysql> show engines;
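Alternatively, the storage engine status can be queried from information_schema, which is easier to filter when scripting the check (an optional alternative, not required by the steps below):
mysql> select engine, support from information_schema.engines where engine = 'ROCKSDB';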
Sysbench Prepare
Prepare the database tables using the bulk load feature of Percona Server, which saves time during data loading. This is done by setting the bulk load flag and disabling binlogging in /etc/mysql/mysql.conf.d/mysqld.cnf:
# service mysql stop
# mv /etc/mysql/mysql.conf.d/mysqld.cnf /etc/mysql/mysql.conf.d/mysqld.cnf-bkup
# cp <path-to>/bulkload-mysqld.cnf /etc/mysql/mysql.conf.d/mysqld.cnf
# service mysql restart
# /usr/local/share/sysbench/oltp_write_only.lua --db-driver=mysql --mysql-user=root --time=0 --create_secondary=off --mysql-password=<password> --mysql-host=localhost --mysql-db=sbtest --mysql-storage-engine=rocksdb --table-size=250000000 --tables=20 --threads=64 --report-interval=5 prepare
Make sure to substitute the mysql-password parameter.
Given that table-size * tables = db-entries, 10 million db-entries correspond to ~2GB of capacity. Thus, for this example a 1TB database will be created.
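As a quick sanity check of that sizing (a back-of-the-envelope sketch based on the ~2GB per 10 million entries ratio stated above, which in practice depends on row size and compression):
# echo "entries: $((250000000 * 20)), approx. capacity: $((250000000 * 20 / 10000000 * 2))GB"
entries: 5000000000, approx. capacity: 1000GB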
MySQL first searches for the configuration file at /etc/my.cnf, which should reference /etc/mysql/mysql.conf.d/mysqld.cnf (see the MySQL documentation).
bulkload-mysqld.cnf:
[mysqld]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
log-error = /var/log/mysql/error.log
loose-rocksdb-fs-uri=zenfs://dev:<short_block_device_name>
default-storage-engine=rocksdb
plugin-load-add=rocksdb=ha_rocksdb.so
default-tmp-storage-engine=MyISAM
# binary log
disable-log-bin
# general
table_open_cache=256K
table_open_cache_instances=64
rocksdb_bulk_load=1
rocksdb_block_cache_size=64G
# connections
ssl=0
connect_timeout=120
back_log=4K
max_connections=4K
max_prepared_stmt_count=4M
# RocksDB
rocksdb_max_open_files=-1
rocksdb_max_background_jobs=8
rocksdb_max_total_wal_size=4G
rocksdb_block_size=16K
rocksdb_table_cache_numshardbits=6
# triggering compaction if there are many sequential deletes
rocksdb_compaction_sequential_deletes_count_sd=1
rocksdb_compaction_sequential_deletes=199999
rocksdb_compaction_sequential_deletes_window=200000
rocksdb_default_cf_options="write_buffer_size=512M;target_file_size_base=1G;max_bytes_for_level_base=512M;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;max_write_buffer_number=8;compression=kLZ4Compression;bottommost_compression=kLZ4Compression;compression_opts=-14:4:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;compaction_pri=kMinOverlappingRatio"
rocksdb_max_subcompactions=4
rocksdb_use_direct_reads=ON
rocksdb_use_direct_io_for_flush_and_compaction=ON
Replace <short_block_device_name> with the device identifier of your ZNS drive, e.g. nvme0n1.
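If you are unsure which device is the zoned one, block devices and their zone model can be listed with lsblk (the ZONED column requires a reasonably recent util-linux):
# lsblk -o NAME,SIZE,ZONED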
With rocksdb_block_cache_size=64G, RocksDB alone will use up to 64GB of system memory. This value can be adjusted according to the memory constraints.
Run sysbench workloads
Before starting the benchmark workloads, the bulk load flag must be disabled and binlogging should be enabled.
# service mysql stop
# mv /etc/mysql/mysql.conf.d/mysqld.cnf /etc/mysql/mysql.conf.d/mysqld.cnf-bulkload-bkup
# cp <path-to>/workload-mysqld.cnf /etc/mysql/mysql.conf.d/mysqld.cnf
# service mysql restart
# /usr/local/share/sysbench/oltp_update_index.lua --db-driver=mysql --mysql-user=root --time=60 --create_secondary=off --mysql-password=<password> --mysql-host=localhost --mysql-db=sbtest --mysql-storage-engine=rocksdb --table-size=250000000 --tables=20 --threads=64 --report-interval=5 run
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 64
Report intermediate results every 5 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[ 5s ] thds: 64 tps: 11740.20 qps: 70494.16 (r/w/o: 0.00/47003.17/23490.99) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 64 tps: 11771.37 qps: 70609.00 (r/w/o: 0.00/47064.27/23544.73) lat (ms,95%): 9.39 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 64 tps: 12180.41 qps: 73105.84 (r/w/o: 0.00/48744.63/24361.21) lat (ms,95%): 8.90 err/s: 0.20 reconn/s: 0.00
[ 20s ] thds: 64 tps: 12531.23 qps: 75172.57 (r/w/o: 0.00/50110.31/25062.26) lat (ms,95%): 8.58 err/s: 0.20 reconn/s: 0.00
[ 25s ] thds: 64 tps: 13180.11 qps: 79099.27 (r/w/o: 0.00/52738.45/26360.82) lat (ms,95%): 8.28 err/s: 0.20 reconn/s: 0.00
[ 30s ] thds: 64 tps: 13615.39 qps: 81692.15 (r/w/o: 0.00/54461.37/27230.78) lat (ms,95%): 7.84 err/s: 0.00 reconn/s: 0.00
[ 35s ] thds: 64 tps: 14092.34 qps: 84546.84 (r/w/o: 0.00/56362.36/28184.48) lat (ms,95%): 7.17 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 64 tps: 14534.30 qps: 87213.40 (r/w/o: 0.00/58144.80/29068.60) lat (ms,95%): 7.30 err/s: 0.00 reconn/s: 0.00
[ 45s ] thds: 64 tps: 14967.48 qps: 89800.88 (r/w/o: 0.00/59865.32/29935.56) lat (ms,95%): 6.55 err/s: 0.40 reconn/s: 0.00
[ 50s ] thds: 64 tps: 15048.07 qps: 90288.80 (r/w/o: 0.00/60192.67/30096.13) lat (ms,95%): 6.55 err/s: 0.00 reconn/s: 0.00
[ 55s ] thds: 64 tps: 15939.32 qps: 95627.92 (r/w/o: 0.00/63752.48/31875.44) lat (ms,95%): 6.43 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 64 tps: 15795.73 qps: 94790.17 (r/w/o: 0.00/63195.52/31594.66) lat (ms,95%): 6.09 err/s: 0.00 reconn/s: 0.00
SQL statistics:
queries performed:
read: 0
write: 3308279
other: 1654141
total: 4962420
transactions: 827068 (13782.32 per sec.)
queries: 4962420 (82694.12 per sec.)
ignored errors: 5 (0.08 per sec.)
reconnects: 0 (0.00 per sec.)
Throughput:
events/s (eps): 13782.3199
time elapsed: 60.0093s
total number of events: 827068
Latency (ms):
min: 0.49
avg: 4.64
max: 33.85
95th percentile: 7.98
sum: 3838911.90
Threads fairness:
events (avg/stddev): 12922.9375/27.80
execution time (avg/stddev): 59.9830/0.00
Make sure to substitute the mysql-password parameter.
Depending on the desired workload, oltp_update_index.lua may be replaced with a different script, along with different sysbench parameters such as --table-size.
For read workloads, the database must already contain tables of the specified dimensions.
ZNS shines at the point where conventional SSDs have reached their steady state. Therefore, consider filling the drive with a sequential workload and afterwards overwriting it with a random workload before measuring.
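One possible preconditioning sequence along those lines is sketched below, reusing the table layout from the prepare step; oltp_update_non_index.lua and oltp_read_only.lua are standard sysbench scripts, and the durations are arbitrary example values:
# /usr/local/share/sysbench/oltp_update_non_index.lua --db-driver=mysql --mysql-user=root --mysql-password=<password> --mysql-host=localhost --mysql-db=sbtest --mysql-storage-engine=rocksdb --table-size=250000000 --tables=20 --threads=64 --time=3600 --report-interval=5 run
# /usr/local/share/sysbench/oltp_read_only.lua --db-driver=mysql --mysql-user=root --mysql-password=<password> --mysql-host=localhost --mysql-db=sbtest --mysql-storage-engine=rocksdb --table-size=250000000 --tables=20 --threads=64 --time=600 --report-interval=5 run
The first invocation overwrites the prepared tables with random updates to precondition the drive; the second then measures read performance on the preconditioned drive.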
workload-mysqld.cnf:
[mysqld]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
log-error = /var/log/mysql/error.log
loose-rocksdb-fs-uri=zenfs://dev:<short_block_device_name>
default-storage-engine=rocksdb
plugin-load-add=rocksdb=ha_rocksdb.so
default-tmp-storage-engine=MyISAM
# binary log
binlog_space_limit=4G
sync_binlog=1K
# general
table_open_cache=256K
table_open_cache_instances=64
rocksdb_bulk_load=0
rocksdb_block_cache_size=64G
# connections
ssl=0
connect_timeout=120
back_log=4K
max_connections=4K
max_prepared_stmt_count=4M
# RocksDB
rocksdb_max_open_files=-1
rocksdb_max_background_jobs=8
rocksdb_max_total_wal_size=4G
rocksdb_block_size=16K
rocksdb_table_cache_numshardbits=6
# triggering compaction if there are many sequential deletes
rocksdb_compaction_sequential_deletes_count_sd=1
rocksdb_compaction_sequential_deletes=199999
rocksdb_compaction_sequential_deletes_window=200000
rocksdb_default_cf_options="write_buffer_size=512M;target_file_size_base=1G;max_bytes_for_level_base=512M;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;max_write_buffer_number=8;compression=kLZ4Compression;bottommost_compression=kLZ4Compression;compression_opts=-14:4:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;compaction_pri=kMinOverlappingRatio"
rocksdb_max_subcompactions=4
rocksdb_use_direct_reads=ON
rocksdb_use_direct_io_for_flush_and_compaction=ON
Replace <short_block_device_name> with the device identifier of your ZNS drive, e.g. nvme0n1.
With rocksdb_block_cache_size=64G, RocksDB alone will use up to 64GB of system memory. This value can be adjusted according to the memory constraints.