Pencarian

Rss Posts

 

 

 

Making ?Insert Ignore? Fast, by Avoiding Disk Seeks

Jul 06, 2010


In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use ?replace into? or ?insert ignore? over normal inserts, because the semantics of these statements do NOT require disk seeks. In my post last week, I explained how the command ?replace into? can be fast with TokuDB’s fractal trees. Today, I explain how “insert ignore” can be fast, using a strategy that is very similar to what we do with “replace into”.

The semantics of “insert ignore” are similar to that of “replace into”:

if the primary (or unique) key does not exist: insert the new row
if the primary (or unique) key does exist: do nothing

B-trees have the same problem with “insert ignore” that they have with “replace into”. They perform a lookup of the primary key, incurring a disk seek. We have already shown how fractal trees do not incur this disk seek for “replace into”, so let’s see how we can avoid disk seeks with “insert ignore”.

The only difference with “replace into” is when the primary (or unique) key exists, instead of overwriting the old row with the new row, we disregard the new row. So, all we need to do is tweak our tombstone messaging scheme (that we use for deletes and “replace into”) so that when “insert ignore” commands do not overwrite old rows with new rows. Similar to deletes and replace into, with this scheme, “insert ignore? can be two orders of magnitude faster than insertions into a B-tree.

Here is what we do. We insert a message into the fractal tree, with a new message “ii”, to signify that we are doing an “insert ignore”. The only difference between this message and the normal “i” message for insertions is what we do on queries and merges. On queries, if the message is an “ii”, then the value in the LOWER node is read, and not the higher node. On merges, if the higher node has a message of “ii”, the value in the LOWER node takes precedence over the value in the higher node.

Let’s look at an example that is similar to what we looked at for “replace into”:

create table foo (a int, b int, primary key (a));

Suppose the fractal tree for this table looks as follows:

-

- -

- – - -

….

(i (1,1)) (i (2,2)) (i (3,3)) (i (4,4)) … (i (1000,1000)) … (i (2^32, 2^32))

The ?i? stands for insertion message. Now suppose we do:

insert ignore into foo values (1000, 1001).

With fractal trees, we insert (ii (1000,1001)) into the top node. The tree then looks as such:

(ii (1000,1001))

- -

- – - -

….

(i (1,1)) (i (2,2)) (i (3,3)) (i (4,4)) … (i (2^32, 2^32))

So upon querying the key ?1000′, a cursor notices that (1000,1001) has a message of “ii”. If it finds another value for the key 1000 in a lower node, it reads that value, otherwise, it reads (1000,1001). Because (1000,1000) is located in a lower node, the cursor returns (1000,1000) to the user. On merges, the message in the lower node, (1000,1000) overwrites the message in the higher node, (1000,1001).

While “insert ignore” can be fast, there are caveats (indexes, triggers, replication), just as there are with “replace into”. In a future posting, I will get into some of them.

Open source or Open Core or Commercial… Does it matter??

Jul 06, 2010

This is my 2 cents in the Open Source vs. Open Code vs. Commercial debate. And it’s a long one…Maybe some of you reading this are offended already, but bear with me, I’ll get there. The way I see the Open Source model, having worked with OSS at MySQL for 6+ years now, is that this is a great way of developing software. Not brilliant, but great, but I’ll get there also.Users of OSS, in my mind, are OSS users for one or more of three reasons:It’s Open – The users using OSS for this reason believes that being open is in and of itself a great thing, enough so to use OSS even when non-OSS is less expensive and/or better.Cost – OSS is typically less expensive than non-OSS, and this is the reason these users get here. There are then 2 subgroups here, one that represents users that just aren’t funded at all, many websites are in this category, the users building Joomla and Drupal sites and the like, I think you get the point. The second group are those that have funding, but would rather spend their money on luxury items and a new car than of a software license.Technology – This is a category that many think they are in, but I don’t think this is mostly not the case. These are the users on a unique piece of software that is either not existing as non-OSS, or where the OSS variations are so much more powerful than the commercial counterparts. In all honesty, although I am aware these cases exist, I do not think that that there are THAT many. But there are those there Cost + Technology plays in, i.e. even though a commercial option exists, it is just too expensive.OK, so now we know (what I think) are the reasons that Open Source exists, is in wide use and is growing. For the first group, the ones that see Openness as a good enough reason in and of itself, I think this is a smaller group of the total number of users. But that openness is not really, in my mind, well defined.If Oracle would take the sourcecode for the Oracle database and release it under GPL, then it would be Open in most peoples mind I guess. But that piece of code is massive, and few people outside the Oracle developers would have the time, resources and knowledge to understand, extend and modify it, so what how Open is it really then? I think to an extent MySQL is case in point here, although it is GPL licenced and the sourcecode is open and free, there are few outside contributors, as compared to the large number of users. I think most users building a website using Drupal cares much about MySQL being open or closed or whatever. I think most of them care about the cost being low. And one sure could argue that low cost comes from the source being open, that is probably true to a large extent, but that doesn’t mean that commercial software or non-OSS also can be low cost (shareware for example).What this boils down to, in my mind then, is that although we all enjoy the low cost of OSS, less care about it really being open and if so how, and more about it being inexpensive. And I say that as someone who doesn’t actually mind reading sourcecode, and this is something I do on a regular basis, read and sometimes tinker with the MySQL source. But I really do not think that I am typical here.And all this is not to say that there is something wrong with OSS, quite the opposite, but often it is more about cost than actual openness. And this is worrying, but there are exceptions. Linux is one such example, although the kernel is since long ago developed by a rather small closely knit community, utilities and programs surrounding and extending the kernel, such as modules, the GNU packages and that stuff, are developed separately from this group, by individuals or groups of them with specific needs or knowledge. The key here is the open interfaces. You don’t have to understand every aspect of the Linux kernel to develop a well working and efficient utility or even kernel module.But I do not think that even Linux is developed enough in this area as I would like it to see. To me, who really believe that Open Source Software is a good thing and an excellent model for development, I would like to see an even more “contributor friendly” architecture. I think Unix got a long way here in it’s early days, with the principles of simple and easy to use APIs (like pipes) and programs could do one hing and do it well. But those days are gone now, that was 30 – 40 years ago or so, and we need to develop things, and I haven’t seen that happening. FSF and GPL and all that defines to extent the framework for distributed software in terms of legalities and many other aspects, but there is little help in how to make the software that can now in theory be read by anyone truly open. If we assume that Oracle made their sourcecode GPL, but did not provide any documentation on how the sourcecode works (which is not a requirement of GPL) and removed all the sourcecode comments (which is not a requirement either), how open would that be, really? I do not think it would help much in terms of openness, to be honest. Sure, it would be open for someone who wanted to hickjack some intricate part of the Oracle sourcecode, but that would need a large investment in investigating the code, so this would probably only be reasonable for a some other large corporate entity. But the code would really be open for the rest of us.Instead of discussing Open Source vs. Open Code vs. Commercial, I think it would be much more interesting to discuss how we develop software that truly is contributor friendly. Code that is easy to add to, code that lives in an environment where changes and additions can easily be made, reviewed and tested. Code that allows itself to be built by anyone, anywhere so that I can test my code on a 16 CPU x86 box somewhere in australia, provided by a nice person I don’t even know, although I am located in Sweden. Code that is required to have proper commenting, proper structured APIs and natural points for injecting new and changed code. And above all, code that lets someone with excellent domain knowledge (in for example indexing algorithms, GIS, text search, APIs, disk management etc., if we talk about databases) to write code and test, without being a database expert or even knowing the inner details of the system he/she writes code in, and not being brilliant developers themselves.Is this a dream? Maybe, Is Drizzle the answer (I know someone will suggest that), and I say no, it’s just not enough, it’s just more of the same (plugins), it doesn’t really provide anything new in how we develop things or how those developments are published and distributed.In short, I think the Open Source vs. Open Code debate is just nitpicking and boring. Neither model just isn’t good enough to be truly friendly and open for contribution. The difference lies more in how and with what we we can commercialize our efforts, which is a valid concern, but my main concern, as you can see, is that I believe that neither model is truly open. And I would rather see a truly contributor friendly Open Code model than the current state of affairs./Karlsson

Upgrading to MySQL 5.1

Jul 01, 2010

We have been using MySQL 5.1 on a few servers for which partitioning is a much better way to purge old data than delete. We have been working to upgrade more servers despite claims that some of us may have made in the past about using MySQL 4.0 or 5.0 forever.

We spent a lot of time to confirm that MySQL 5.1 was stable and performant using benchmarks and our production workload. mk-upgrade from Maatkit was one of the tools we used. Concurrent dump/reload tests were done to measure performance and check for data drift after reload. A custom tool that replays production workload was run to compare performance between MySQL 5.0 and 5.1. We started with MySQL 5.1.38 and now are at MySQL 5.1.47 with several backports for bugs that will be fixed in more recent 5.1 releases or in 5.5.

We found a few serious bugs in MySQL 5.1 during this process. We fixed some of the bugs, worked with MySQL support to debug some of them and waited for MySQL to fix many others. MySQL support and developers were a huge help. It is great to have so much access to experts. MySQL has been getting things done at an amazing rate this year.

I am excited about MySQL 5.1 and 5.5. With a few recent changes to the Facebook patch we have been able to increase peak QPS by more than 2X and peak IOPs by more than 3X using benchmarks. There are more improvements to be done. Whether or not we match the benchmark results in production, I much prefer an RDBMS that can exceed 100,000 QPS and IOPs than one that is saturated at 10,000. Any of the changes we make for 5.1 will look even better with MySQL 5.5 given support for multiple InnoDB buffer pool instances and some of the changes above the storage engine layer that aren’t easy to describe in a few sentences.

Benchmarking MySQL ACID performance with SysBench

Jun 20, 2010

A couple of question I get a lot from MySQL customers is “how will this hardware upgrade improve my transactions per second (TPS)” and “what level of TPS will MySQL perform on this hardware if I’m running ACID settings?” Running sysbench against MySQL with different values for per-thread and global memory buffer sizes, ACID settings, and other settings gives me concrete values to bring to the customer to show the impact that more RAM, faster CPUs, faster disks, or cnf changes have on the server. Here are some examples for a common question: “If I’m using full ACID settings vs non-ACID settings what performance am I going to get from this server?”
Let’s find out by running sysbench with the following settings (most are self explanatory – if not the man page can explain them):

sysbench –test=oltp –db-driver=mysql –oltp-table-size=1000000 –mysql-engine-trx=yes –oltp-test-mode=complex –oltp-read-only=off –oltp-dist-type=special –max-requests=0 –num-threads=8 –max-time=120 –init-rng=on run

MySQL Settings:
In the first test MySQL is set to the following ACID related settings. This will give us results for TPS performance without full ACID compliance – very common settings on a server that is handling blogs, ad serving, general business websites, and other roles where full ACID is not required and performance is valued over the benefits of full ACID. These are important settings when we look at the difference in performance when we change to full ACID in the second test.

innodb_flush_log_at_trx_commit = 0
sync_binlog=0
transaction-isolation=REPEATABLE-READ

System configuration and InnoDB buffer pool size:

XEON E5345 Series 2.33ghz 8-core, 16GB RAM, Local SATA 7.2K disks
innodb_buffer_pool_size = 10G

Full result set from sysbench:
Summary OLTP test statistics:

queries performed:
transactions: ? ? ? ? ? ? ? ? ? ? ? ?172426 (1436.83 per sec.)
read/write requests: ? ? ? ? ? ? ? ? 3276664 (27304.51 per sec.)
other operations: ? ? ? ? ? ? ? ? ? ?344882 (2873.91 per sec.)

Non-ACID results:
We can simplify the results by looking at the following TPS results for this non-ACID test:

transactions: ? ? ? ? ? ? ? ? ? ? ? ?172426 (1436.83 per sec.)

Full ACID results:
Let’s go ahead and run the test again with different ACID settings. This will give us the TPS results for full ACID compliance:

innodb_flush_log_at_trx_commit = 1
sync_binlog=1
transaction-isolation=REPEATABLE-READ

We get the following results for TPS:

transactions: ? ? ? ? ? ? ? ? ? ? 3197 ? (26.58 per sec.)
read/write requests: ? ? ? ? ? ? ? ? 60743 ?(505.04 per sec.)
other operations: ? ? ? ? ? ? ? ? ? ?6394 ? (53.16 per sec.)

Final Results:
So as you can see the difference between full ACID settings and not (on the same server with only those values on the cnf being changed) results in a huge difference in performance on this standard database server. We can now hand this data to the customer and they will know what impact the settings will have on their application’s performance and what to expect when running full ACID vs non-ACID.
More info on using sysbench here:?http://sysbench.sourceforge.net

Want to understand MySQL indexes?

Jun 15, 2010

I have introduced some new people to MySQL recently and had to back track the years to figure out how I learned what I learned about MySQL indexes.  A quick way to get up to speed about MySQL indexes is these three podcasts by Sheeri Cabral.OurSQL Episode 13: The Nitty Gritty of IndexesOurSQL Episode 17: Hashing it outOurSQL Episode 18: De-myth-tifying IndexesThose three episodes do a good job of explaining how indexes work so that you have a better understanding of how MySQL indexes find your data.

Datetime & Timestamp manipulation / migration explained

Jun 15, 2010

Are you doing some datetime manipulation or maybe you are migrating from some database technology to MySQL or possibly using milliseconds?
Here is an example on how to go about it:
Say you have the following date: MAR 16 2008 09:12:51:893AM
SELECT DATE_FORMAT(STR_TO_DATE(‘MAR 16 2008 09:12:51:893AM’,'%M %d %Y %h:%i:%s:%f%p’),’%Y%m%d%k%i%s.%f’); –> 2008031691251.893000

What if its PM rather than AM
SELECT DATE_FORMAT(STR_TO_DATE(‘MAR 16 2008 09:12:51:893PM’,'%M %d %Y %h:%i:%s:%f%p’),’%Y%m%d%k%i%s.%f’); –> 20080316211251.893000

Ok so this is just simple string manipulation where:
%M is the month name
%d is day number
%Y is the year
%h is the hour
%i is the minute
%s is the second
%f is the microsecond
%p is the period: ante or post meridiem
In the DATE_FORMAT part we se a %k which is in 24hr format in order to loose the period.
A more detailed list is found here
Here is a demo:
mysql Tue Jun 15 12:32:37 2010 > CREATE TABLE test.abc(a DECIMAL(17,3)) ENGINE=MYISAM;
Query OK, 0 rows affected (0.03 sec)
mysql Tue Jun 15 12:32:45 2010 > INSERT INTO abc VALUES ( DATE_FORMAT(STR_TO_DATE(‘MAR 16 2008 09:12:51:893PM’,'%M %d %Y %h:%i:%s:%f%p’),’%Y%m%d%k%i%s.%f’) );
Query OK, 1 row affected (0.01 sec)
mysql Tue Jun 15 12:32:51 2010 > SELECT * FROM abc;
+——————–+
| a ? ? ? ? ? ? ? ? ?|
+——————–+
| 20080316211251.893 |
+——————–+
1 row in set (0.00 sec)

mysql Tue Jun 15 12:32:56 2010 > SELECT TIMESTAMP(a) FROM abc;
+—————————-+
| TIMESTAMP(a) ? ? ? ? ? ? ? |
+—————————-+
| 2008-03-16 21:12:51.893000 |
+—————————-+
1 row in set (0.00 sec)

MySQL Cluster Powers Leading Document Management Web Service

May 24, 2010

A new customer case-study is available for download from http://www.mysql.com/why-mysql/case-studies/mysql_cs-cluster_docudesk_WebServices.php
The DocQ web service eliminates the limitations of sharing physical documents by offering a complete paperless business solution; providing a single place where customers can manage, archive, and send their important documents. DocQ supports secure business transactions and the services to store, edit, collaborate, and publish business documents.

The database needed to deliver the high levels of write throughput, low latency responsiveness and continuous availability demanded by the service
A sharded, multi-master MySQL solution with memcached was rejected due to the complexity of integration and management
MySQL Cluster was selected as it met all of the requirements of the service with one, integrated solution out of the box
MySQL Cluster is handling on average 1 million queries per day across both in-memory and disk-based tables, with the database growing at up to 2% daily
MySQL Cluster handles document metadata and text, PHP session state, ACLs, job queues and tracking of document actions for billing

Using the row buffer in Drizzle (and MySQL)

May 24, 2010

Here#8217;s another bit of the API you may need to use in your storage engine (it also seems to be a rather unknown. I believe the only place where this has really been documented is ha_ndbcluster.cc, so here goes#8230;.
Drizzle (through inheritance from MySQL) has its own (in memory) row format (it could be said that it has several, but we#8217;ll ignore that for the moment for sanity). This is used inside the server for a number of things. When writing a Storage Engine all you really need to know is that you#8217;re expected to write these into your engine and return them from your engine.
The row buffer format itself is kind-of documented (in that it#8217;s mentioned in the MySQL Internals documentation) but everywhere that#8217;s ever pointed to makes the (big) assumption that you#8217;re going to be implementing an engine that just uses a more compact variant of the in-memory row format. The notable exception is the CSV engine, which only ever cares about textual representations of data (calling val_str() on a Field is pretty simple).
The basic layout is a NULL bitmap plus the data for each non-null column:
Except that the NULL bitmap is byte aligned. So in the above diagram, with four nullable columns, it would actually be padded out to 1 byte:

Each column is stored in a type-specific way.
Each Table (an instance of an open table which a Cursor is used to iterate over parts of) has two row buffers in it: record[0] and record[1]. For the most part, the Cursor implementation for your Storage Engine only ever has to deal with record[0]. However, sometimes you may be asked to read a row into record[1], so your engine must deal with that too.
A Row (no, there#8217;s no object for that#8230; you just get a pointer to somewhere in memory) is made up of Fields (as in Field objects). It#8217;s really made up of lots of things, but if you#8217;re dealing with the row format, a row is made up of fields. The Field objects let you get the value out of a row in a number of ways. For an integer column, you can call Field::val_int() to get the value as an integer, or you can call val_str() to get it as a string (this is what the CSV engine does, just calls val_str() on each Field).
The Field objects are not part of a row in any way. They instead have a pointer to record[0] stored in them. This doesn#8217;t help you if you need to access record[1] (because that can be passed into your Cursor methods). Although the buffer passed into various Cursor methods is usually record[0] it is not always record[0]. How do you use the Field objects to access fields in the row buffer then? The answer is the Field::move_field_offset(ptrdiff_t) method. Here is how you can use it in your code:

ptrdiff_t row_offset= buf – table-gt;record[0];
(**field).move_field_offset(row_offset);
(do things with field)
(**field).move_field_offset(-row_offset);

Yes, this API completely sucks and is very easy to misuse and abuse #8211; especially in error handling cases. We#8217;re currently discussing some alternatives for Drizzle.
This blog post (but not the whole blog) is published under the?Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

Rilis Perdana Keuangan Guyub versi KG-20100512-1

May 12, 2010

Dengan mengucap Bismillahirrahmanirrahim
Hari ini 12 Mei 2010, kami merilis versi perdana dari Keuangan Guyub – Aplikasi Keuangan Open Source untuk UKM Jasa.
Bagi teman-teman yang tertarik mencoba bisa download langsung versi perdana di http://keuangan-guyub.googlecode.com/files/KG-20100512-1.zip.
Untuk cara instalasi oleh bisa dilihat di http://code.google.com/p/keuangan-guyub/wiki/PanduanSingkatInstalasi.
Petunjuk penggunaan bisa di download di http://keuangan-guyub.googlecode.com/files/DokumentasiPengguna-KG-20100512-1.pdf.
Dan kami juga sangat mengharapkan saran, masukan, kritikan dan kontribusi [...]

Quick notes: Monty Program Group Blog; Rename Maria

Apr 30, 2010

A couple of things:

There is now a Monty Program Group Blog. Its brand new, and in a company where most people spend time on writing tonnes of code, expect posts to be sporadic, but of great technical nature. We#8217;ll also cover things like events, conferences, etc. i.e. where can you meet a Monty Program person. Do subscribe to our RSS or ATOM feed.
At the MySQL Conference recently, Monty announced the rename the Maria engine contest. The competition is still running, and the winner gets a System76 Meerkat NetTop. Some interesting names have already shown up. Monty says, #8220;Make it tell a story#8221;.

Related posts:Monty speaks about Maria
Trying to reliably make MyISAM crash; Maria is sturdy as
Monty is the world#8217;s first MySQL fellow