Pencarian

Rss Posts

 

 

 

The benefit of keeping the InnoDB transaction log in cache

Feb 19, 2012

I was getting inconsistent performance results while running sysbench to generate a workload of point-updates and point-lookups. The rate of rows updated per second would vary between 200 and 600 and the variance appeared to be random. From PMP there was a lot of contention on the transaction log system mutex. It took me about one day to guess at the root cause — the InnoDB transaction log was not in the OS buffer cache and 512-byte aligned log writes frequently required disk reads to get a 4kb aligned page into the cache to apply the write. The problem is avoided when either the transaction log remains in the OS buffer cache or you use the Percona patch that adds all_o_direct as an option for innodb_flush_method.?The problem was harder to debug than it should be because InnoDB didn’t report log write latency via a separate metric. All synchronous writes, log and doublewrite buffer, were reported via the “Sync writes” line and that combines large/slow writes to the doublewrite buffer with small/fast writes to the transaction log:Sync writes: 4836209719 requests, 0 old, 5009.82 bytes/r, svc: 498087.30 secs, 0.10 msecs/r?I have a diff out to fix that:Log writes: 2328355 requests, 0 old, 955.39 bytes/r, svc: 14.44 secs, 0.01 msecs/rDoublewrite buffer writes: 19718 requests, 12 old, 946736.45 bytes/r, svc: 41.29 secs, 2.09 msecs/r?I suspect this is even harder to debug in official MySQL which doesn’t have any of the metrics above in SHOW INNODB STATUS output or SHOW STATUS counters. Perhaps the performance schema makes this easier to debug but I don’t know much about that feature.?I then reproduced the problem by starting the benchmark with the transaction log in the OS buffer cache and then running?echo 1 > /proc/sys/vm/drop_caches to remove it from cache. The results below show the impact on both the rate of rows read and updated per second. The test used 128 client threads doing point-lookups and 32 client threads doing point-updates. The database was 240G on disk and the InnoDB buffer cache was 30G. The rates drop significantly when the cache is dropped.??The results are great prior to removing the transaction log from cache. On a server with 8 10k RPM SAS disks I was able to get 2800 point-lookups and 500 point-updates per second. Using 16kb InnoDB pages the server sustained 2500 page reads/second from disk and 500 page writes/second to disk.??The final graph uses logscale for the y-axis to plot the rate of rows updated/second and the average latency for a log write in microseconds. The update rate drops when the log write latency spikes to more than 10ms per write. It was less than 10us prior to that.?

InnoDB at Oracle OpenWorld

Sep 28, 2011

Sunny and I will be presenting at the Oracle OpenWorld next week:

Introduction to InnoDB, MySQL’s Default Storage Engine,? 10/04/11 Tuesday 01:15 PM, ? Marriott Marquis – Golden Gate C3, ? ? Calvin Sun
InnoDB Performance Tuning,? 10/04/11 Tuesday? 03:30 PM, ? Marriott Marquis – Golden Gate C2, ? Sunny Bains

The first session is for beginners, who are new to InnoDB and MySQL. The second session will cover many new performance features in MySQL 5.5 and 5.6, and share some tuning tips to maximize MySQL performance.
What to learn more about MySQL? There will be something for everyone. Come to join us!

 

Tuning InnoDB Configuration

Dec 21, 2010

I had earlier written a post on tuning the MySQL server configuration which was more geared towards the MyISAM storage engine. While that is not because I didn’t intend on ignoring InnoDB but because I had planned a whole post on tuning InnoDB related configuration. So this post is the post that I had planned, I have discussed the major configuration parameters in here that should help you out most of the times.

SHOW ENGINE INNODB STATUS truncation, innodb_truncated_status_writes

Oct 29, 2010

Another piece of good news for MySQL 5.5 – the output of SHOW ENGINE INNODB STATUS has now been increased from 64kB, to 1MB. For those running with systems that have thousands of running transactions, or large lock outputs, it should take quite a bit more to force truncation now.
We also added a new status variable to track when truncation happens as well – innodb_truncated_status_writes, so you can detect this should you have automated monitoring depending on this output.
Bug#56922 for details.

Bookmark It

Hide Sites

InnoDB memory allocation, ulimit, and OpenSUSE

Aug 23, 2010

I recently encountered an interesting case. A customer reported that mysqld crashed on start on OpenSUSE 11.2 kernel 2.6.31.12-0.2-desktop x86_64 ? with 96 GB RAM when the innodb_buffer_pool_size was set to anything more than 62 GB. I decided to try it with 76 GB. The error message was an assert due to a failed malloc() in ut_malloc_low() in ut/ut0mem.c inside InnoDB source code. InnoDB wraps the majority of its memory allocations in ut_malloc_low(), so to get an idea of the pattern of requested allocations I added a debugging fprintf() to tell me how much was being allocated and whether it was successful.
I discovered something interesting. I expected the allocation to fail on the 76 GB of the buffer pool, due to some weird memory mapping issue and a continuous block of 76 GB not being available. However, that is not what happened. 76 GB buffer was allocated successfully. What was failing is the allocation of 3.37GB after that. What in the world could InnoDB need that was 3.37 GB? There was nothing in the settings that asked for anything close to 3 GB explicitly.
Source code is the ultimate documentation, and I took advantage of that. My good friend GDB guided me to buf_pool_init() in buf/buf0buf.c. There I found the following:
buf_pool->frame_mem = os_mem_alloc_large(
UNIV_PAGE_SIZE * (n_frames + 1),
TRUE, FALSE);
That was the buffer pool itself, the 76 GB of it. And now the buffer pool’s friend:
buf_pool->blocks = ut_malloc(sizeof(buf_block_t) * max_size);
3.6 GB of it!
From the comments in the code (InnoDB code actually has very good comments), max_size is the maximum number of buffer pool pages (16K each), n_frames which is the same thing unless AWE is used, but it was not used, so I did not worry about it.
What shall we call that friend? It is used for storing some meta information about buffer pool pages. The most natural name I could come up with from reading the source code is the blocks array.
Thus we can see that we are allocating another chunk that is in proportion to the setting of innodb_buffer_pool_size for the blocks array. The exact proportions will probably vary from version to version, but roughly about 1 G for every 25 G of the buffer pool. This can become significant in the proper innodb_buffer_pool_size estimations when the system has a lot of RAM and you want to have the largest possible innodb_buffer_pool_size. Do not forget to give the blocks array some room!
While this was an interesting investigation, it nevertheless did not explain why there was not enough room for a 76 GB buffer pool. Even with the extra 3.37 GB allocation, there was still some free memory. Or was there? Maybe some hidden monster was eating it up? I quickly wrote this hack to prove or disprove the monster’s presence.
I verified that I could allocate and initialize two chunks of 40 GB from two separate processes, but not 80 GB from one. In fact, 80GB allocation failed right in malloc(), did not even get to initialization. I tested it with allocating 70 GB concurrently in each process so as to overrun physical memory + swap. Both allocations were successful, one initialized successfully, the other was killed by the OOM kill during initialization.
This smelled like a low ulimit, and sure enough it was. ulimit -m ulimited; ulimit -v unlimited did the magic, and mysqld successfully started with an 80 GB buffer pool. Apparenly OpenSUSE defaults are set in proportion to physical memory to keep the memory-hungry applications from taking the system down. On this particular system (96 GB physical memory, 2 GB swap it decided to set the virtual memory ulimit (-v) to 77.27 GB, and the physical memory (-m) to 80.40 GB).

Entry posted by sasha |
No comment
Add to: | | | |

Rilis Perdana Keuangan Guyub versi KG-20100512-1

May 12, 2010

Dengan mengucap Bismillahirrahmanirrahim
Hari ini 12 Mei 2010, kami merilis versi perdana dari Keuangan Guyub – Aplikasi Keuangan Open Source untuk UKM Jasa.
Bagi teman-teman yang tertarik mencoba bisa download langsung versi perdana di http://keuangan-guyub.googlecode.com/files/KG-20100512-1.zip.
Untuk cara instalasi oleh bisa dilihat di http://code.google.com/p/keuangan-guyub/wiki/PanduanSingkatInstalasi.
Petunjuk penggunaan bisa di download di http://keuangan-guyub.googlecode.com/files/DokumentasiPengguna-KG-20100512-1.pdf.
Dan kami juga sangat mengharapkan saran, masukan, kritikan dan kontribusi [...]

innodb_file_per_table

Feb 10, 2010

Recently I attempted to use this parameter in one of our InnoDB table and had an experience to make a note of. So this is what actually happened.As we know InnoDB writes all the table information into one tablespace file ibdata1.Obviously this lead to a disk space issue, since the ibdata1 file grew to 90G+ and it was not possible to defragment tablespace using the Alter method (Infact is is hardly possible to do in a production environment, since it leads to a huge downtime.)So I decided to go for innodb_file_per_table on our other production database server which was being configured from scratch.Now this database had around 15 tables and it had very high inserts/deletes/updates. For few days, things went ok. But in few days, I started getting complains from development team regarding performance on the tables and queue being backed up. Apparently whenever a delete / update was made on a table, all the other tables slow down and queries started taking more than 4 to 5 secs to complete.Dug around, ran some more optimisations , flush tables but not much difference. I was able to get 660 to 661 qps tops.On digging further and discussing with developers, we figured out that innodb_file_per_table is the culprit.InnoDB uses fsync() instead of fdatasync() to flush both data and log files on non-Windows system. fdatasync() is like fsync(), except it flushes only files data, not its metadata (last modified time etc.). Therefor fsync() can cause more IO. innodb_file_per_table causes each file to be fsync()ed separately which means writes to multiple tables can’t be combined into a single IO operation. This requires InnoDB to perform higher total number of fsync() operations, leading to a IO contention in my case. Finally we had to migrate the database to another machine without innodb_file_per_table since it is also not possible to undo the existing tables for which files are created per table.As quoted in MySQL documentation : The –innodb_file_per_table option affects only table creation, not access to existing tables. If you start the server with this option, new tables are created using .ibd files, but you can still access tables that exist in the shared tablespace. If you start the server without this option, new tables are created in the shared tablespace, but you can still access any tables that were created using multiple tablespaces.Once we migrated the performance started going up , almost upto 1500 qps.So moral of the story : Better test the parameter innodb_file_per_table in a test environment, with the load higher than expected on production. I have read many blogs and many people are benefited by the use of this parametere, but as a general practice I would not favor its use. Sure it can be useful to reclaim some disk space, but I dont think it would be worth at the cost of performance.

InnoDB, InnoDB-plugin vs XtraDB on fast storage

Jan 13, 2010

To continue fun with FusionIO cards, I wanted to check how MySQL / InnoDB performs here. For benchmark I took MySQL 5.1.42 with built-in InnoDB, InnoDB-plugin 1.0.6, and XtraDB 1.0.6-9 ( InnoDB with Percona patches).
As benchmark engine I used tpcc-mysql with 1000 warehouses ( which gives around 90GB of data + indexes) on my workhourse Dell PowerEdge R900 ( details about box ).
On storage configuration: FusionIO 160GB SLC and 320GB MLC cards are configured in software RAID0 to store InnoDB datafiles. For InnoDB logs and system tablespace I used partition on regular RAID10 with regular hard drives, here I followed Yoshinori Matsunobu#8217;s recommendations http://yoshinorimatsunobu.blogspot.com/2009/05/tables-on-ssd-redobinlogsystem.html and taking fact that FusionIO is not perfect for sequential writes http://www.mysqlperformanceblog.com/2010/01/11/fusionio-320gb-mlc-benchmarks/
Full results I put on page http://www.percona.com/docs/wiki/benchmark:fusionio:mysql:start, here are my thoughts and interesting facts.
First, chart with results for InnoDB vs InnoDB-plugin during runs (values are in new order transactions per minute, more is better) :

As you see InnoDB-plugin is doing much better here, it allows to utilize multiple IO threads,
which as we saw ( http://www.mysqlperformanceblog.com/2010/01/11/fusionio-320gb-mlc-benchmarks/ ) is necessary to get most throughput from FusionIO.
Also you may see from graph some waves for InnoDB-plugin. Here we observe innodb_adaptive_flushing in action (which is ON by default), and I think innodb_adaptive_flushing in InnoDB-plugin is not quite balanced, it may do overaggressive flushing, when it is not necessary.
But looking on CPU stats (see graph later), I guess InnoDB-plugin spends most time in buffer_pool mutex, contention here is not fully resolved yet in InnoDB-plugin.
Now, let#8217;s take XtraDB. In additional to multiple IO threads, we have patch to decrease contention on buffer_pool mutex, plus separate purge thread. Also we use different adaptive_checkpoint algorithm.
The results are:

So I guess buffer_pool improvements play here for XtraDB, and looking on summary result:

InnoDB 9439.316 NOTPM
InnoDB-plugin-1.0.6 15299.333 NOTPM
XtraDB-1.0.6-9 26160.551 NOTPM

InnoDB-plugin is 1.6x times better InnoDB, and XtraDB is 1.7x times better InnoDB-plugin.
Now on CPU usage and disk utilization.
Disk throughput:

CPU (user) usage:

Even with improvements, XtraDB performs less 150MB/s in disk writes (from benchmarks we
saw FusionIO can do much more) and with 45-50% of idle CPU.
I assume we still see significant contentions inside XtraDB, and there big room for improvements. As for InnoDB-plugin, I#8217;d wish InnoDB team makes some actions on buffer_pool mutex problem.
Finally I wanted to check what if we put innodb transactional logs and system tablespace on FusionIO also, there is graph for that:

It is not so bad, with final result 23038.283 NOTPM, it is only about 12% worse than with logs on separate partition.
And to make reference point, I run the same but with all files on RAID10 with regular disks,
the graph is there:

with final result: 2873.783 NOTPM ( about 88% worse than all files on FusionIO)
To summarize

MySQL InnoDB/InnoDB-plugin/XtraDB is not fully able to utilize throughput of FusionIO. XtraDB is doing better job with internal contention, but much more can be done.
Still you can have very impressive performance improvement in IO-intensive or IO-bound workloads. You may want to use InnoDB-plugin or XtraDB to get better results.

Putting logs on separate partition may be good idea, but only if you have possibility to do that. Making special setup for that may be not worth improvements

Entry posted by Vadim |
No comment
Add to: | | | |