Pencarian

Rss Posts

 

 

 

MariaDB 5.3.4 benchmarks

Feb 19, 2012

MariaDB 5.3 has reached the release candidate milestone, and the 5.3 version promises a lot of new features and optimization (i.e in optimizer http://kb.askmonty.org/en/what-is-mariadb-53#query-optimizer). No surprise I wanted to check how all improvements affect general performance.
So I why don’t we run old good sysbench benchmark.

For the benchmark I took:

HP ProLiant DL380 G6 box
sysbench multitables oltp rw workload, 16 tables, 500mil rows each, total datasize about 30GB
working threads from 1 to 256
Versions: MariaDB 5.3.4, MySQL 5.5.20
Data is stored on RAID10 HDD partition
Like in all my recent benchmarks, I make throughput measurements each 10 sec, so we can see the stability of the throughout

The raw results, configuration and scripts are available on our Benchmarks Launchpad
The graphical results:
Throughput (more is better)

Threads MariaDB 5.3.4 MySQL 5.5.20 Ratio
1 252 271 0.9298893
2 412 588 0.7006803
4 801 1097 0.7301732
8 1709 2205 0.7750567
16 3197 4076 0.7843474
32 3303 4166 0.7928469
64 3336 4150 0.8038554
128 3800 4170 0.9112710
256 3710 4131 0.8980876

I was surprised to see that MariaDB shows 20-30% worse throughput.
It seems many changes resulted to performance hit in general. I wonder whether MariaDB team runs performance regression benchmarks, and if they do, why do we see such performance decline.
Follow @VadimTk

Can we improve MySQL variable handling ?

Feb 19, 2012

MySQL Settings (also known as Server Variables) have interesting property. When you set variable in running server this change is not persisted in any way and server will be back to old value upon restart. MySQL also does not have option to re-read config file without restarting as some other software so approach to change config file and when instruct server to re-read it also does not work. This leads to runtime settings being different from settings set in config file, and unexpected change on restart a frequent problem.

pt-config-diff is the tool which can help with this problem a lot, being able to compare settings in my.cnf to those server is currently running with. The problem however this only works well if settings are set in my.cnf as if default option was used and we change it in run time we can’t detect such change easily because MySQL Server does not seems to have an easy way to check what was the default value for given Server Variable.
The only way I’m aware about is running the server from command line with –no-defaults –verbose –help options:

pz@ubuntu:~$ /usr/sbin/mysqld –no-defaults –verbose –help

timed-mutexes FALSE
tmp-table-size 16777216
tmpdir /tmp
transaction-alloc-block-size 8192
transaction-isolation REPEATABLE-READ
transaction-prealloc-size 4096
updatable-views-with-limit YES
userstat FALSE
verbose TRUE
wait-timeout 28800

Which is however rather ugly and only works with shell access to the server which is not always the case.
Interesting enough MySQL Allows you to SET variable to default value (compile time default, not the one server was started with) yet there seems not to be a way to read it:

mysql> set global sort_buffer_size=DEFAULT;
Query OK, 0 rows affected (0.00 sec)

mysql> select @@global.sort_buffer_size;
+—————————+
| @@global.sort_buffer_size |
+—————————+
| 2097152 |
+—————————+
1 row in set (0.00 sec)

This could be used as technique to detect the value for DEFAULT variables for SESSION variables, yet for some GLOBAL variables setting them back and forth would not be safe.
The simple change which would make dealing with MySQL variables in automated way a lot more convenient would be extending INFORMATION_SCHEMA.GLOBAL_VARIABLES Currently as of MySQL 5.5 it contains only variable name and value. Yet I would suggest adding few more columns such as DEFAULT – to hold compile time default value for variable and STARTUP to hold the value the server was started with.
It also might be good idea to extend SELECT syntax to ease querying of variable global value Right now I can select:

mysql> select @@global.sort_buffer_size;
+—————————+
| @@global.sort_buffer_size |
+—————————+
| 2097152 |
+—————————+
1 row in set (0.00 sec)

If I could only refer to “default” or “startup” in addition to “global” and “session” prefixes which are available now it would be quite nice.

The benefit of keeping the InnoDB transaction log in cache

Feb 19, 2012

I was getting inconsistent performance results while running sysbench to generate a workload of point-updates and point-lookups. The rate of rows updated per second would vary between 200 and 600 and the variance appeared to be random. From PMP there was a lot of contention on the transaction log system mutex. It took me about one day to guess at the root cause — the InnoDB transaction log was not in the OS buffer cache and 512-byte aligned log writes frequently required disk reads to get a 4kb aligned page into the cache to apply the write. The problem is avoided when either the transaction log remains in the OS buffer cache or you use the Percona patch that adds all_o_direct as an option for innodb_flush_method.?The problem was harder to debug than it should be because InnoDB didn’t report log write latency via a separate metric. All synchronous writes, log and doublewrite buffer, were reported via the “Sync writes” line and that combines large/slow writes to the doublewrite buffer with small/fast writes to the transaction log:Sync writes: 4836209719 requests, 0 old, 5009.82 bytes/r, svc: 498087.30 secs, 0.10 msecs/r?I have a diff out to fix that:Log writes: 2328355 requests, 0 old, 955.39 bytes/r, svc: 14.44 secs, 0.01 msecs/rDoublewrite buffer writes: 19718 requests, 12 old, 946736.45 bytes/r, svc: 41.29 secs, 2.09 msecs/r?I suspect this is even harder to debug in official MySQL which doesn’t have any of the metrics above in SHOW INNODB STATUS output or SHOW STATUS counters. Perhaps the performance schema makes this easier to debug but I don’t know much about that feature.?I then reproduced the problem by starting the benchmark with the transaction log in the OS buffer cache and then running?echo 1 > /proc/sys/vm/drop_caches to remove it from cache. The results below show the impact on both the rate of rows read and updated per second. The test used 128 client threads doing point-lookups and 32 client threads doing point-updates. The database was 240G on disk and the InnoDB buffer cache was 30G. The rates drop significantly when the cache is dropped.??The results are great prior to removing the transaction log from cache. On a server with 8 10k RPM SAS disks I was able to get 2800 point-lookups and 500 point-updates per second. Using 16kb InnoDB pages the server sustained 2500 page reads/second from disk and 500 page writes/second to disk.??The final graph uses logscale for the y-axis to plot the rate of rows updated/second and the average latency for a log write in microseconds. The update rate drops when the log write latency spikes to more than 10ms per write. It was less than 10us prior to that.?

NoSQL performance numbers – MySQL and Redis

Feb 18, 2012

Links to performance numbers posted wrt various NoSQL solutions:
A top 20 global website announced they have migrated from MySQL to Redis. There will be a keynote and everything. It doesn’t say how big the Redis Cluster is, but they serve 100M pages / day, and clock 300k Redis queries / second.

https://groups.google.com/forum/?fromgroups#!topic/redis-db/d4QcWV0p-YM

Btw, they mention that MySQL remains as the master data store from which the Redis indexes are generated.
(The reason I don’t mention the name of this Redis user is simply I feat my mom is sometimes reading my blog…)
read more

MariaDB: the new MySQL? Interview with Michael Monty Widenius.

Sep 29, 2011

?I want to ensure that the MySQL code base (under the name of MariaDB) will survive as open source, in spite of what Oracle may do.? — Michael ?Monty? Widenius. Michael ?Monty? Widenius is the main author of the original version of the open-source MySQL database and a founding member of the MySQL AB company. [...]

InnoDB at Oracle OpenWorld

Sep 28, 2011

Sunny and I will be presenting at the Oracle OpenWorld next week:

Introduction to InnoDB, MySQL’s Default Storage Engine,? 10/04/11 Tuesday 01:15 PM, ? Marriott Marquis – Golden Gate C3, ? ? Calvin Sun
InnoDB Performance Tuning,? 10/04/11 Tuesday? 03:30 PM, ? Marriott Marquis – Golden Gate C2, ? Sunny Bains

The first session is for beginners, who are new to InnoDB and MySQL. The second session will cover many new performance features in MySQL 5.5 and 5.6, and share some tuning tips to maximize MySQL performance.
What to learn more about MySQL? There will be something for everyone. Come to join us!

 

MySQL Connection Timeouts

Apr 20, 2011

Sometimes on very busy MySQL server you will see sporadic connection timeouts, such as Can’t connect to MySQL server on ‘mydb’ (110). If you have connects timed in your
application you will see some successful connections taking well over the second. The problem may start very slow and be almost invisible for long time, for example having one out of million
of connection attempts to time out… when as the load growths it may become a lot more frequent.

If you time the connect you will often see connection times are being close to 3 and 9 seconds. These are “magic” numbers which I remember from years ago, which correspond to SYN packet being dropped during connection attempt and being resent. 3 seconds corresponds to 1 packet being dropped and 9 seconds correspond to two. If this is happening it is possible you have network issues or more likely you have listen queue overflow. You can check if it is the case by running netstat -s and finding something like:

38409 times the listen queue of a socket overflowed
38409 SYNs to LISTEN sockets dropped

This means some SYN packets have to be dropped because kernel buffer of connection requests on LISTEN socket is overflow – MySQL is not accepting connections as quickly as it needs.
There are 2 tuning places you need to consider if this is what is happening.
First – Linux kernel net.ipv4.tcp_max_syn_backlog This is size of kernel buffer for all sockets.
Default I have on my kernel is 2048 though it might vary for different versions, you might need to increase it to 8192 or so if you have intense connection. I’ll explain the math below.
Second – is MySQL parameter back_log which has default value of just 50. You may want to set it to 1000 or even higher. You may also need to increase
net.core.somaxconn kernel setting which contains the maximum depth of listen queue allowed. The kernel I’m running has it set to just 128 which would be too low for many
conditions.
Now lets look more into the problem and do some Math. First lets look into how MySQL accepts connection. There is single main thread which is accepting connections coming to LISTEN
sockets. Once there is connection coming it it needs to create a new socket for incoming connection and create a new thread or take one out of the thread cache. From this point on MySQL processes network communication in multiple threads and can benefit from multiple cores but this work done by main thread does not.
Usually main thread is able to accept connections pretty quickly, however if it stalls waiting on mutex or doing any other work such as launching new thread takes a lot of time you can have the listen queue to overflow. Lets look at the database which accepts 1000 of connects/sec in average. This is a high number but you can see ones even higher. In most cases because of “random arrivals” nature of traffic you will see some seconds where as much as 3000 connections come in. Under such conditions the default back_log of 50 is enough just for 17 milliseconds, and if main thread stalls somewhere longer than, some SYN packets may be lost.
I would suggest sizing your tcp_max_syn_backlog and back_log value to be enough for at least 2 seconds worth of connection attempts. For example If I have 100 connects/sec which means I should plan for 300 connections using 3x for “peak multiplier”. This means they should be set to at least 600.
Setting it to cover much more than 2 seconds does not make much sense because if client does not get a response within 3 seconds it will consider SYN packet is lost and will send the new one anyway.
There is something else. If you’re creating 1000 of connections a second to MySQL Server you might be pushing your luck and at very least you’re using a lot of resources setting up and tearing down connections. Consider using persistent connections or connection pool at least for applications which are responsible for most of connections being created.

Setting up replication with XtraBackup

Apr 19, 2011

I attended Vadim Tkachenko’s talk on XtraBackup during MySQL conference in Santa Clara last week. Backups are obviously very important, but the use case I had in mind is this:
Replicating a database that has Innodb tables in it, while keeping both master and slave on line if possible.
Tangent: by the way, I love the native backup utility that was once promised in MySQL 6.0, similar to SQL Server’s way of backup. It was like running “BACKUP myDb to DISK = ‘/backupDirectory/myDb.bak’” under mysql client, but I digress…
I have used mysqldump to accomplish this in the past, but I wondered how XtraBackup would fare in this task, especially after hearing Vadim’s talk and reading news on Percona’s development effort. To cut to the chase, this is my conclusion. Reproducing steps are listed immediately afterwards.
1. innobackupex provides a consisten database backup, spitting out log file and log positions in stdout, which is nice and useful for slave initiation;
2. It works with both MyISAM and innodb tables;
3. If MyISAM tables are all you have, just run innobackupex –prepare /directoryWhereBackupIs, and then move the database directory from under /directoryWhereBackupIs to under your slave’s datadir, then make the necessary group and owner change to said directory and its content files, and you are ready to run the “change master” command and start slave;
4. If the database has innodb tables, then in addition to step 3, you will also need to stop mysql on slave, move the ibdata1 file to datadir, then restart mysql, and run “change master…” and “start slave” commands. It does not matter if you are using innodb_file_per_table or not.
It will be nice if I can keep the slave up and running during this step when the database has innodb tables in it. Did I do anything wrong? Is there a better way? What if the slave has a database that has innodb tables and thus uses ibdata1 to begin with? What do you do then? Should I play with Tungsten’s replication? What are the compelling reasons to use Tungsten’s replication?
In any case, from my limited testing, I think I will use innobackupex for future replication creation tasks, if I can afford a mysqld restart. Overall, it feels a bit easier than mysqldump approach that I’ve been using in the past.
Here are the steps needed to reproduce:
1. Fire up 2 Rackspace CentOS 5.5 servers. Rackspace cloud servers beat Amazon EC2 servers hands down, in my view, for developing/sandboxing purposes;
2. Install the required mysql client, server, and XtraBackup on both servers;
3. Make /etc/my.cnf by cloning the sample cnf files under /usr/share/my-small.cnf. 3 minimum changes were necessary: log-bin=mysql-bin, server-id=a unique number, datadir=/var/lib/mysql. The first 2 are necessary for replication, the last is needed for innobackupex
Well, while you are at it, on slave, add in read-only and skip-slave-start if appropriate. That’s best practice for read only slave.
4. Add master server’s public key to authorized_keys on slave, to facilitate easy ssh connection.
5. On master, run this command:

innobackupex –databases=test –stream=tar /tmp/ –slave-info | ssh root@slave "tar xfi – -C /root"
When it finishes, you should see something like this:
110419 18:54:21 innobackupex: completed OK!
tar: Read 6656 bytes from -

Take note of 3 lines immediately above it, where it states the binlog file and log position, like this:

innobackupex: MySQL binlog position: filename 'mysql-bin.000002', position 2515

6. On slave, run this command:

innobackupex –apply-log /locationWhereBackupIs

then, assuming the database name is test, run the 2 commands below to change the group and owner to mysql:

chgrp -R mysql test
chown -R mysql test

move the directory under mysqld’s datadir:

mv test/ /mysql/datadir

If test database has innodb tables in it, stop mysql on slave, then copy ibdata1 to datadir, restart mysql.
7. On master, open up port 3306 if it is not already open, then create the replication account:

grant replication slave, replication client on *.* to repl@'50.56.121.%' identified by 'p@ssw0rd';

8. On slave, run:

change master to master_host='50.56.121.96', master_user='repl', master_password='p@ssw0rd', master_log_file='see output from innobackupex backup command on master', master_log_pos=numFrominnobackupexOutputOnMaster;

start slave;

show slave status\G

MySQL & NoSQL Survey

Mar 13, 2011

Hello,

Could you please take the time and fill in this short survey about using MySQL and NoSQL in companies.
I will publish the results in a week.

Thank you for your time.

<p>Loading…</p>

A cool terminal tip for Mac users

Mar 13, 2011

If you use a Mac, and you are dealing with many similar tasks at once, like examining many database servers in different terminals, you may like this one.I have been using iTerm 2 for a while, and my handling of parallel tasks has improved a lot. (No, I am not talking about Parallel replication, although I have applied this trick while testing that technology as well.)iTerm2 has some cool features, and probably the most striking one is split panes. That alone would be a good reason for giving iTerm2 a try. But the one that I use the most, often in combination with Split Panes, is called Send Input to all tabs.Here is how it works. Let’s say I need to use 4 servers at once, and perform a non-repeating operation in all of them.So I open a separate window and I split the screen into 5 panes. I connect to each server in the first four panes, and I open a vim instance in the fifth.With that done, I enable the magic option.A word of caution. This option sends the input to all the open tabs in your current window. If you don’t want this to happen, do as I do, and open a separate window. Then make sure that all tabs, and eventually split panes, are supposed to receive your input. The application asks you for confirmation.After that, whatever I type on one pane will be mirrored on all the panes. So I will see the commands running on my four servers, and being logged in a text file in the fifth one. All with just single command, I have all servers under control at once: