Pencarian

Rss Posts

 

 

 

Mastering the art of indexing

Apr 29, 2010

Check out this SlideShare Presentation: I’d have to say that this is the coolest presentation on Indexing and how it relates to INNODB. I have written on this subject many times in the pass but this presentation covers all the bases and does a great job at explaining WHY you should do certain statements over others.More mastering the art of indexingView more presentations from Yoshinori Matsunobu.

MySQL Conference Review – Brian Moon

Apr 17, 2010

I am back home from a good week at the 2010 O’Reilly MySQL Conference & Expo. I had a great time and got to see some old friends I had not seen in a while.

Oracle gave the opening keynote and it went pretty much like I thought it would. Oracle said they will keep MySQL alive. They talked about the new 5.5 release. It was pretty much the same keynote Sun gave last year. Time will tell what Oracle does with MySQL.

The expo hall was sparse. Really sparse. There were a fraction of the booths compared to the past. I don’t know why the vendors did not come. Maybe because they don’t want to compete with Oracle/Sun? In the past you would see HP or Intel have a booth at the conference. But, with Oracle/Sun owning MySQL, why even try. Or maybe they are not allowed? I don’t know. It was just sad.

I did stop by the Maatkit booth and was embarrassed to tell Baron (its creator) I was not already using it. I had heard people talk about it in the past, but never stopped to see what it does. It would have only saved me hours and hours of work over the last few years. Needless to say it is now being installed on our servers. If you use MySQL, just go install Maatkit now and start using it. Don’t be like me. Don’t wait for years, writing the same code over and over to do simple maintenance tasks.

Gearman had a good deal of coverage at the conference. There were three talks and a BoF. All were well attended. Some people seemed to have an AHA! moment where they saw how Gearman could help their architecture. I also got to sit down with the PECL/gearman maintainers and discuss the recent bug I found that is keeping me from using it.

I spoke about Memcached as did others. Again, there was a BoF. It was well attended and people had good questions about it. There seemed to be some FUD going around that memcached is somehow inefficient or not keeping up with technology. However, I have yet to see numbers or anything that proves any of this. They are just wild claims by people that have something to sell. Everyone wants to be the caching company since there is no “Memcached, Inc.”. There is no company in charge. That is a good thing, IMO.

That brings me to my favorite topic for the conference, Drizzle. I wrote about Drizzle here on this blog when it was first announced. At the time MySQL looked like it was moving forward at a good pace. So, I had said that it would only replace MySQL in one part of our stack. However, after what, in my opinion, has been a lack of real change in MySQL, I think I may have changed my mind. Brian Aker echoed this sentiment in his keynote address about Drizzle. He talked about how MySQL AB and later Sun had stopped focusing on the things that made MySQL popular and started trying to be a cheap version of Oracle. That is my interpretation of what he said, not his words.

Why is Drizzle different? Like Memcached and Gearman, there is no “Drizzle, Inc.”. It is an Open Source project that is supported by the community. It is being supported by companies like Rackspace who hired five developers to work on it. The code is kept on Launchpad and is completely open. Anyone can create a branch and work on the code. If your patches are good, they will be merged into the main branch. But, you can keep your own branch going if you want to. Unlike the other forks, Drizzle has started over in both the code and the community. I personally see it as the only way forward. It is not ready today, but my money is on Drizzle five or ten years from now.

Can we get faster expression handling in MySQL

Mar 30, 2010

Andrew from Sphinx continues to work on improving SQL (or SphinxQL) support and now he published benchmarks comparing arithmetic expression handling in Sphinx to one in MySQL. The result ? Sphinx scored 3x to 20x faster. Andrew goes to explain results are not 100% comparable (as we can see in the table results are even different) and some performance can be attributed to Sphinx using different typing for expression. Though I#8217;m wondering how much of Performance difference that really explain. I doubt it is 20x. Andrew goes on to explain there are further optimizations possible for Sphinx such as JIT compilation and expression optimization which he claims can increase the lead even further
Andrew is kind enough to provide explanations and benchmark results though this is far from the first notion of this problem I hear about. Many other companies looking to optimize MySQL problem, especially in analytics space when expressions should be computed over hundreds of millions of rows have expressed their concerns with this part of MySQL execution. I#8217;m wondering if we#8217;re going to see any improvements in this space by Oracle, MariaDB or Drizzle. I think Drizzle is in advantage here #8211; with simplifying MySQL functionality a lot they probably can also simplify expression handling to be faster.

Entry posted by peter |
3 comments
Add to: | | | |

Logging with MySQL – Brian Moon

Mar 24, 2010

I was reading a post by Cassandra is my NoSQL solution but..“. In the post, Dathan explains that he uses Cassandra to store clicks because it can write a lot faster than MySQL. However, he runs into problems with the read speed when he needs to get a range of data back from Cassandra. This is the number one problem I have with NoSQL solutions.

SQL is really good at retrieving a set of data based on a key or range of keys. Whereas NoSQL products are really good at writing things and retrieving one item from storage. When looking at redoing our architecture a few years ago to be more scalable, I had to consider these two issues. For what it is worth, the NoSQL market was not nearly as mature as it is now. So, my choices were much more limited. In the end, we decided to stick with MySQL. It turns out that a primary or unique key lookup on a MySQL/InnoDB table is really fast. It is sort of like having a key/value storage system. And, I can still do range based queries against it.

But, back to Dathan’s problem: clicks. We store clicks at dealnews. Lots of clicks. We also store views. We store more views than we do clicks. So, lots of views and lots of clicks. (Sorry for the vague numbers, company secrets and all. We are a top 1,000 Compete.com site during peak shopping season.) And we do it all in MySQL. And we do it all with one server. I should disclose we are deploying a
second server, but it is more for high availability than processing
power. Like Dathan, we only use about the last 24 hours of data at any given time. There are three keys for us doing logging like this in MySQL.

Use MyISAM

MyISAM supports concurrent inserts. Concurrent inserts means that inserts can add rows to the end of a table while selects are being performed on other parts of the data set. This is exactly the use case for our logging. There are caveats with range queries as pointed out by the MySQL Performance Blog.

Rotating tables

MySQL (and InnoDB in particular) really sucks at deleting rows. Like, really sucks. Deleting causes locks. Bleh. So, we never delete rows from our logging tables. Instead, nightly we rotate the tables. RENAME TABLE is an (near) atomic process in MySQL. So, we just create a new table.

create table clicks_new like clicks;rename table clicks to clicks_2010032500001, clicks_new to clicks;

Tada! We now have an empty table for today’s clicks. We now drop any table with a date stamp that is longer than x days old. Drops are fast, we like drops.

For querying these tables, we use UNION. It works really well. We just issue a SHOW TABLES LIKE ‘clicks%’ and union the query across all the tables. Works like a charm.

Gearman

So, I get a lot of flack at work for my outright lust for Gearman. It is my new duct tape. When you have a scalability problem, there is a good chance you can solve it with Gearman. So, how does this help with logging to MySQL? Well, sometimes, MySQL can become backed up with inserts. It happens to the best of us. So, instead of letting that pile up in our web requests, we let it pile up in Gearman. Instead of having our web scripts write to MySQL directly, we have them fire Gearman background jobs with the logging data in them. The Gearman workers can then write to the MySQL server when it is available. Under normal operating procedure, that is in near real time. But, if the MySQL server does get backed up, the jobs just queue up in Gearman and are processed when the MySQL server is available.

BONUS! Insert Delayed

This is our old trick before we used Gearman. MySQL (MyISAM) has a neat feature where you can have inserts delayed until the table is available. The query is sent to the MySQL server and it answers with success immediately to the client. This means your web script can continue on and not get blocked waiting for the insert. But, MySQL will only queue up so many before it starts erroring out. So, it is not as fool proof as a job processing system like Gearman.

Summary

To log with MySQL:

  • Use MyISAM with concurrent inserts
  • Rotate tables daily and use UNION to query
  • Use delayed inserts with MySQL or a job processing agent like Gearman

Happy logging!

PS: You may be asking, “Brian, what about Partitioned Tables?” I asked myself that before deploying this solution. More importantly, in IRC I asked Brian Aker about MySQL partitioned tables. I am paraphrasing, but he said that if I ever think I might alter that table, I would not trust it with the partitions in MySQL. So, that kind of turned me off of them.

Presenting Cluster tutorial at MySQL UC (and discount code!)

Feb 26, 2010

Together with Geert and Andrew I#8217;ll be teaching the MySQL Cluster tutrial at this year#8217;s MySQL Cluster User Conference #8211; Santa Clara, on April 12th. If you#8217;re interested in using MySQL Cluster but aren#8217;t sure how to get started (or you#8217;ve used it but would like some tips) then this is a great opportunity. Check out the tutorial description.
If you register by 15 March then you get the early-bird price and if you use this #8216;friend of a speaker#8217; code then you get an additional 25% off: mys10fsp
mys10fsp

innodb_file_per_table

Feb 10, 2010

Recently I attempted to use this parameter in one of our InnoDB table and had an experience to make a note of. So this is what actually happened.As we know InnoDB writes all the table information into one tablespace file ibdata1.Obviously this lead to a disk space issue, since the ibdata1 file grew to 90G+ and it was not possible to defragment tablespace using the Alter method (Infact is is hardly possible to do in a production environment, since it leads to a huge downtime.)So I decided to go for innodb_file_per_table on our other production database server which was being configured from scratch.Now this database had around 15 tables and it had very high inserts/deletes/updates. For few days, things went ok. But in few days, I started getting complains from development team regarding performance on the tables and queue being backed up. Apparently whenever a delete / update was made on a table, all the other tables slow down and queries started taking more than 4 to 5 secs to complete.Dug around, ran some more optimisations , flush tables but not much difference. I was able to get 660 to 661 qps tops.On digging further and discussing with developers, we figured out that innodb_file_per_table is the culprit.InnoDB uses fsync() instead of fdatasync() to flush both data and log files on non-Windows system. fdatasync() is like fsync(), except it flushes only files data, not its metadata (last modified time etc.). Therefor fsync() can cause more IO. innodb_file_per_table causes each file to be fsync()ed separately which means writes to multiple tables can’t be combined into a single IO operation. This requires InnoDB to perform higher total number of fsync() operations, leading to a IO contention in my case. Finally we had to migrate the database to another machine without innodb_file_per_table since it is also not possible to undo the existing tables for which files are created per table.As quoted in MySQL documentation : The –innodb_file_per_table option affects only table creation, not access to existing tables. If you start the server with this option, new tables are created using .ibd files, but you can still access tables that exist in the shared tablespace. If you start the server without this option, new tables are created in the shared tablespace, but you can still access any tables that were created using multiple tablespaces.Once we migrated the performance started going up , almost upto 1500 qps.So moral of the story : Better test the parameter innodb_file_per_table in a test environment, with the load higher than expected on production. I have read many blogs and many people are benefited by the use of this parametere, but as a general practice I would not favor its use. Sure it can be useful to reclaim some disk space, but I dont think it would be worth at the cost of performance.

328 Thousand Queries Per Second

Feb 10, 2010

Well, I did it again. Two days ago I improved my server’s QPS from 20k to 109k.
Today, after upgrading the MySQL version which allowed me to allocate more memory to the key buffer size (12 Gb), I managed to get it to 328k QPS.

The server isnbsp;similarnbsp;to the one from the previous post. Its a 4 year old, but quite meaty server.

The server ran triggers and stored functions that I wrote to help connect different data sets together. Basically reference another table using fairly complicated search logic, get an ID back and store it. The code was then tweaked to be as fast and light as possible.

This time I ran:

99 million row table to reference a 101 million row table
45 million row table to reference the same 101 million row table
35 million row table (same reference)
29 million row table (same reference)
28 million row table (same reference)

All at the same time..

The server hovered consistently around the 300k QPS mark, with a here and there high of 322-328k.
Also, I had to modify the code of Mtop (the application used to monitor MySQL in the picture) to ask for Queries and not Questions from MySQL after I upgraded it.

I’m very happy with the result. I can’t wait to get my hands on a stronger server to try to get a higher number, and in the meantime I will try to keep tweaking the settings on this server to hopefully get it from a load of 4 to 10.

My sessions at the MySQL Conference amp; Expo 2010

Feb 10, 2010

A preliminary schedule is now available for the MySQL Conference Expo 2010 (in Santa Clara, California, USA). I have two talks and a tutorial, currently scheduled as follows:MySQL Cluster Tutorial, Mon 08:30 on 12 Apr 2010, together with Andrew Hutchings and Andrew MorganMySQL Cluster: An Introduction, Tue 11:55 on 13 Apr 2010Connecting MySQL and Python, Tue 15:05 on 13 Apr 2010Schedule might change a bit, so I suggest keeping an eye on the conference website.And please remember: we might have changed colors a few times past 2 years, but we are still the nice and friendly people we were back in MySQL AB!

Introducing percona-patches for 5.1

Feb 10, 2010

Our patches for 5.0 have attracted significant interest.? You can read about SecondLife#8217;s experience here, as well as what Flickr had to say on their blog.? The main improvements come in both performance gains and improvements to diagnostics (such as the improvements to the slow log output, and INDEX_STATISTICS).
Despite having many requests to port these patches to 5.1, we simply haven#8217;t had the bandwidth as our main focus has been on developing XtraDB and XtraBackup.? Thankfully a customer (who prefers to stay unnamed) as stood up and sponsored the work to move the patches to 5.1.
To refresh, the most interesting patches are:

Performance patches for InnoDB ?. Although many patches are present in XtraDB / InnoDB-plugin, the RC status of plugin does not allow to install it on product for some customer#8217;s policies.
Important fixes are:

- InnoDB IO threads
- Adaptive checkpointing
- Buffer pool mutex split
- Reimplemented read-write locks

Diagnostic patches.

#8211; We provide much more statistics in slow.log, i.e. execution plan, InnoDB timing, profiling info
- Userstat patch

Different patches to help with day to day usage of MySQL ?

Two new features which not available for 5.0:

In slow.log for Stored Procedure call you can see profiling for each individial query from this procedure, not just call storproc()
With userstat you can get additional THREADS_STATISTICS which show similar information to USER/CLIENT_STATISTICS but per THREAD granularity (it#8217;s useful if you have connection pool)

On this stage the patches are available only in source code, you
can get them from Launchpad https://code.launchpad.net/~percona-dev/percona-patches/5.1.43.? Binaries are also on the way, and will be ready soon. We are running intensive stress testing loads on them to provide stable and quality packages.
And to finalize are results for tpce-like benchmark, where I compare MySQL-5.1.43 vs percona-5.1.43.
The results made for TPCE configuration with 2000 customers and 300 tradedays and 16 concurrent users on our R900 server. The dataset is about 25GB, fully fitting into buffer_pool, so disk does not really matter, but data was stored on FusionIO 320GB MLC card.
On chart with results I show amount of TradeResults transactions per 10 sec during 3600 session (more is better)

As you see with percona patches you can get just about 10x improvement.
Yeah, that sounds too cool, but let me explain where difference comes from.
As I mentioned in tpce workload details the load is very SELECT intensive and these SELECTS are mainly scans by secondary keys ( not Primary Keys), so it hits problems in InnoDB rw-lock implementations and in buffer_pool mutex contention, which alredy fixed in percona-patches ( and in XtraDB and InnoDB-plugin also).
So you are welcome to try it!

Entry posted by Vadim |
2 comments
Add to: | | | |

Dedy Hariyadi: Buku MySQL 5 Gratis dari Achmad Solichin

Feb 08, 2010


tanya siapa?

Kabar yang sangat menggembirakan bagi yang sering bersinggungan dengan database MySQL. Pak Achmad Solichin yang saat ini bekerja sebagai dosen dan merangkap sebagai kepala?Laboratorium Komputer Universitas Budi Luhur membuat buku dengan judul MySQL 5, dari Pemula hingga Mahir. Buku setebal 119 halaman termasuk halaman depan memberikan gambaran tentang penggunaan MySQL dari tahap dasar sampai tahap lanjutan. Buku MySQL 5, dari Pemula hingga Mahir ini terdiri dari 11 bab. Saya fikir buku ini cocok untuk anda miliki bagi yang sering menggunakan MySQL sebagai database server.
Tanpa fikir panjang saya pun langsung mengunduh buku tersebut. Sebelum mengunduh anda diwajibkan untuk mendaftar terlebih dahulu pada blog Achmad Solichin yang beralamat di http://achmatim.net. Setelah mendaftar silakan cek pada email anda, selanjutnya login menggunakan password yang dikirim pada email anda. Jika anda berhasil?login langkah selanjutnya mengunduh buku?MySQL 5, dari Pemula hingga Mahir?pada http://achmatim.net/download/21/.?Untuk lebih lengkapnya anda dapat langsung merujuk pada http://achmatim.net/2010/01/30/buku-gratis-mysql-5-dari-pemula-hingga-mahir/. Selamat belajar MySQL 5.