Pencarian

Rss Posts

 

 

 

Berita pada kategori ‘Basisdata’

Want to understand MySQL indexes?

Jun 15, 2010

I have introduced some new people to MySQL recently and had to back track the years to figure out how I learned what I learned about MySQL indexes.  A quick way to get up to speed about MySQL indexes is these three podcasts by Sheeri Cabral.OurSQL Episode 13: The Nitty Gritty of IndexesOurSQL Episode 17: Hashing it outOurSQL Episode 18: De-myth-tifying IndexesThose three episodes do a good job of explaining how indexes work so that you have a better understanding of how MySQL indexes find your data.

Datetime & Timestamp manipulation / migration explained

Jun 15, 2010

Are you doing some datetime manipulation or maybe you are migrating from some database technology to MySQL or possibly using milliseconds?
Here is an example on how to go about it:
Say you have the following date: MAR 16 2008 09:12:51:893AM
SELECT DATE_FORMAT(STR_TO_DATE(‘MAR 16 2008 09:12:51:893AM’,'%M %d %Y %h:%i:%s:%f%p’),’%Y%m%d%k%i%s.%f’); –> 2008031691251.893000

What if its PM rather than AM
SELECT DATE_FORMAT(STR_TO_DATE(‘MAR 16 2008 09:12:51:893PM’,'%M %d %Y %h:%i:%s:%f%p’),’%Y%m%d%k%i%s.%f’); –> 20080316211251.893000

Ok so this is just simple string manipulation where:
%M is the month name
%d is day number
%Y is the year
%h is the hour
%i is the minute
%s is the second
%f is the microsecond
%p is the period: ante or post meridiem
In the DATE_FORMAT part we se a %k which is in 24hr format in order to loose the period.
A more detailed list is found here
Here is a demo:
mysql Tue Jun 15 12:32:37 2010 > CREATE TABLE test.abc(a DECIMAL(17,3)) ENGINE=MYISAM;
Query OK, 0 rows affected (0.03 sec)
mysql Tue Jun 15 12:32:45 2010 > INSERT INTO abc VALUES ( DATE_FORMAT(STR_TO_DATE(‘MAR 16 2008 09:12:51:893PM’,'%M %d %Y %h:%i:%s:%f%p’),’%Y%m%d%k%i%s.%f’) );
Query OK, 1 row affected (0.01 sec)
mysql Tue Jun 15 12:32:51 2010 > SELECT * FROM abc;
+——————–+
| a ? ? ? ? ? ? ? ? ?|
+——————–+
| 20080316211251.893 |
+——————–+
1 row in set (0.00 sec)

mysql Tue Jun 15 12:32:56 2010 > SELECT TIMESTAMP(a) FROM abc;
+—————————-+
| TIMESTAMP(a) ? ? ? ? ? ? ? |
+—————————-+
| 2008-03-16 21:12:51.893000 |
+—————————-+
1 row in set (0.00 sec)

MySQL Cluster Powers Leading Document Management Web Service

May 24, 2010

A new customer case-study is available for download from http://www.mysql.com/why-mysql/case-studies/mysql_cs-cluster_docudesk_WebServices.php
The DocQ web service eliminates the limitations of sharing physical documents by offering a complete paperless business solution; providing a single place where customers can manage, archive, and send their important documents. DocQ supports secure business transactions and the services to store, edit, collaborate, and publish business documents.

The database needed to deliver the high levels of write throughput, low latency responsiveness and continuous availability demanded by the service
A sharded, multi-master MySQL solution with memcached was rejected due to the complexity of integration and management
MySQL Cluster was selected as it met all of the requirements of the service with one, integrated solution out of the box
MySQL Cluster is handling on average 1 million queries per day across both in-memory and disk-based tables, with the database growing at up to 2% daily
MySQL Cluster handles document metadata and text, PHP session state, ACLs, job queues and tracking of document actions for billing

Using the row buffer in Drizzle (and MySQL)

May 24, 2010

Here#8217;s another bit of the API you may need to use in your storage engine (it also seems to be a rather unknown. I believe the only place where this has really been documented is ha_ndbcluster.cc, so here goes#8230;.
Drizzle (through inheritance from MySQL) has its own (in memory) row format (it could be said that it has several, but we#8217;ll ignore that for the moment for sanity). This is used inside the server for a number of things. When writing a Storage Engine all you really need to know is that you#8217;re expected to write these into your engine and return them from your engine.
The row buffer format itself is kind-of documented (in that it#8217;s mentioned in the MySQL Internals documentation) but everywhere that#8217;s ever pointed to makes the (big) assumption that you#8217;re going to be implementing an engine that just uses a more compact variant of the in-memory row format. The notable exception is the CSV engine, which only ever cares about textual representations of data (calling val_str() on a Field is pretty simple).
The basic layout is a NULL bitmap plus the data for each non-null column:
Except that the NULL bitmap is byte aligned. So in the above diagram, with four nullable columns, it would actually be padded out to 1 byte:

Each column is stored in a type-specific way.
Each Table (an instance of an open table which a Cursor is used to iterate over parts of) has two row buffers in it: record[0] and record[1]. For the most part, the Cursor implementation for your Storage Engine only ever has to deal with record[0]. However, sometimes you may be asked to read a row into record[1], so your engine must deal with that too.
A Row (no, there#8217;s no object for that#8230; you just get a pointer to somewhere in memory) is made up of Fields (as in Field objects). It#8217;s really made up of lots of things, but if you#8217;re dealing with the row format, a row is made up of fields. The Field objects let you get the value out of a row in a number of ways. For an integer column, you can call Field::val_int() to get the value as an integer, or you can call val_str() to get it as a string (this is what the CSV engine does, just calls val_str() on each Field).
The Field objects are not part of a row in any way. They instead have a pointer to record[0] stored in them. This doesn#8217;t help you if you need to access record[1] (because that can be passed into your Cursor methods). Although the buffer passed into various Cursor methods is usually record[0] it is not always record[0]. How do you use the Field objects to access fields in the row buffer then? The answer is the Field::move_field_offset(ptrdiff_t) method. Here is how you can use it in your code:

ptrdiff_t row_offset= buf – table-gt;record[0];
(**field).move_field_offset(row_offset);
(do things with field)
(**field).move_field_offset(-row_offset);

Yes, this API completely sucks and is very easy to misuse and abuse #8211; especially in error handling cases. We#8217;re currently discussing some alternatives for Drizzle.
This blog post (but not the whole blog) is published under the?Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

PostgreSQL 8.4.4

May 17, 2010

8.4.4 is the latest release in the 8.4 series

Quick notes: Monty Program Group Blog; Rename Maria

Apr 30, 2010

A couple of things:

There is now a Monty Program Group Blog. Its brand new, and in a company where most people spend time on writing tonnes of code, expect posts to be sporadic, but of great technical nature. We#8217;ll also cover things like events, conferences, etc. i.e. where can you meet a Monty Program person. Do subscribe to our RSS or ATOM feed.
At the MySQL Conference recently, Monty announced the rename the Maria engine contest. The competition is still running, and the winner gets a System76 Meerkat NetTop. Some interesting names have already shown up. Monty says, #8220;Make it tell a story#8221;.

Related posts:Monty speaks about Maria
Trying to reliably make MyISAM crash; Maria is sturdy as
Monty is the world#8217;s first MySQL fellow

Mastering the art of indexing

Apr 29, 2010

Check out this SlideShare Presentation: I’d have to say that this is the coolest presentation on Indexing and how it relates to INNODB. I have written on this subject many times in the pass but this presentation covers all the bases and does a great job at explaining WHY you should do certain statements over others.More mastering the art of indexingView more presentations from Yoshinori Matsunobu.

MySQL Conference Review – Brian Moon

Apr 17, 2010

I am back home from a good week at the 2010 O’Reilly MySQL Conference & Expo. I had a great time and got to see some old friends I had not seen in a while.

Oracle gave the opening keynote and it went pretty much like I thought it would. Oracle said they will keep MySQL alive. They talked about the new 5.5 release. It was pretty much the same keynote Sun gave last year. Time will tell what Oracle does with MySQL.

The expo hall was sparse. Really sparse. There were a fraction of the booths compared to the past. I don’t know why the vendors did not come. Maybe because they don’t want to compete with Oracle/Sun? In the past you would see HP or Intel have a booth at the conference. But, with Oracle/Sun owning MySQL, why even try. Or maybe they are not allowed? I don’t know. It was just sad.

I did stop by the Maatkit booth and was embarrassed to tell Baron (its creator) I was not already using it. I had heard people talk about it in the past, but never stopped to see what it does. It would have only saved me hours and hours of work over the last few years. Needless to say it is now being installed on our servers. If you use MySQL, just go install Maatkit now and start using it. Don’t be like me. Don’t wait for years, writing the same code over and over to do simple maintenance tasks.

Gearman had a good deal of coverage at the conference. There were three talks and a BoF. All were well attended. Some people seemed to have an AHA! moment where they saw how Gearman could help their architecture. I also got to sit down with the PECL/gearman maintainers and discuss the recent bug I found that is keeping me from using it.

I spoke about Memcached as did others. Again, there was a BoF. It was well attended and people had good questions about it. There seemed to be some FUD going around that memcached is somehow inefficient or not keeping up with technology. However, I have yet to see numbers or anything that proves any of this. They are just wild claims by people that have something to sell. Everyone wants to be the caching company since there is no “Memcached, Inc.”. There is no company in charge. That is a good thing, IMO.

That brings me to my favorite topic for the conference, Drizzle. I wrote about Drizzle here on this blog when it was first announced. At the time MySQL looked like it was moving forward at a good pace. So, I had said that it would only replace MySQL in one part of our stack. However, after what, in my opinion, has been a lack of real change in MySQL, I think I may have changed my mind. Brian Aker echoed this sentiment in his keynote address about Drizzle. He talked about how MySQL AB and later Sun had stopped focusing on the things that made MySQL popular and started trying to be a cheap version of Oracle. That is my interpretation of what he said, not his words.

Why is Drizzle different? Like Memcached and Gearman, there is no “Drizzle, Inc.”. It is an Open Source project that is supported by the community. It is being supported by companies like Rackspace who hired five developers to work on it. The code is kept on Launchpad and is completely open. Anyone can create a branch and work on the code. If your patches are good, they will be merged into the main branch. But, you can keep your own branch going if you want to. Unlike the other forks, Drizzle has started over in both the code and the community. I personally see it as the only way forward. It is not ready today, but my money is on Drizzle five or ten years from now.

Can we get faster expression handling in MySQL

Mar 30, 2010

Andrew from Sphinx continues to work on improving SQL (or SphinxQL) support and now he published benchmarks comparing arithmetic expression handling in Sphinx to one in MySQL. The result ? Sphinx scored 3x to 20x faster. Andrew goes to explain results are not 100% comparable (as we can see in the table results are even different) and some performance can be attributed to Sphinx using different typing for expression. Though I#8217;m wondering how much of Performance difference that really explain. I doubt it is 20x. Andrew goes on to explain there are further optimizations possible for Sphinx such as JIT compilation and expression optimization which he claims can increase the lead even further
Andrew is kind enough to provide explanations and benchmark results though this is far from the first notion of this problem I hear about. Many other companies looking to optimize MySQL problem, especially in analytics space when expressions should be computed over hundreds of millions of rows have expressed their concerns with this part of MySQL execution. I#8217;m wondering if we#8217;re going to see any improvements in this space by Oracle, MariaDB or Drizzle. I think Drizzle is in advantage here #8211; with simplifying MySQL functionality a lot they probably can also simplify expression handling to be faster.

Entry posted by peter |
3 comments
Add to: | | | |

Logging with MySQL – Brian Moon

Mar 24, 2010

I was reading a post by Cassandra is my NoSQL solution but..“. In the post, Dathan explains that he uses Cassandra to store clicks because it can write a lot faster than MySQL. However, he runs into problems with the read speed when he needs to get a range of data back from Cassandra. This is the number one problem I have with NoSQL solutions.

SQL is really good at retrieving a set of data based on a key or range of keys. Whereas NoSQL products are really good at writing things and retrieving one item from storage. When looking at redoing our architecture a few years ago to be more scalable, I had to consider these two issues. For what it is worth, the NoSQL market was not nearly as mature as it is now. So, my choices were much more limited. In the end, we decided to stick with MySQL. It turns out that a primary or unique key lookup on a MySQL/InnoDB table is really fast. It is sort of like having a key/value storage system. And, I can still do range based queries against it.

But, back to Dathan’s problem: clicks. We store clicks at dealnews. Lots of clicks. We also store views. We store more views than we do clicks. So, lots of views and lots of clicks. (Sorry for the vague numbers, company secrets and all. We are a top 1,000 Compete.com site during peak shopping season.) And we do it all in MySQL. And we do it all with one server. I should disclose we are deploying a
second server, but it is more for high availability than processing
power. Like Dathan, we only use about the last 24 hours of data at any given time. There are three keys for us doing logging like this in MySQL.

Use MyISAM

MyISAM supports concurrent inserts. Concurrent inserts means that inserts can add rows to the end of a table while selects are being performed on other parts of the data set. This is exactly the use case for our logging. There are caveats with range queries as pointed out by the MySQL Performance Blog.

Rotating tables

MySQL (and InnoDB in particular) really sucks at deleting rows. Like, really sucks. Deleting causes locks. Bleh. So, we never delete rows from our logging tables. Instead, nightly we rotate the tables. RENAME TABLE is an (near) atomic process in MySQL. So, we just create a new table.

create table clicks_new like clicks;rename table clicks to clicks_2010032500001, clicks_new to clicks;

Tada! We now have an empty table for today’s clicks. We now drop any table with a date stamp that is longer than x days old. Drops are fast, we like drops.

For querying these tables, we use UNION. It works really well. We just issue a SHOW TABLES LIKE ‘clicks%’ and union the query across all the tables. Works like a charm.

Gearman

So, I get a lot of flack at work for my outright lust for Gearman. It is my new duct tape. When you have a scalability problem, there is a good chance you can solve it with Gearman. So, how does this help with logging to MySQL? Well, sometimes, MySQL can become backed up with inserts. It happens to the best of us. So, instead of letting that pile up in our web requests, we let it pile up in Gearman. Instead of having our web scripts write to MySQL directly, we have them fire Gearman background jobs with the logging data in them. The Gearman workers can then write to the MySQL server when it is available. Under normal operating procedure, that is in near real time. But, if the MySQL server does get backed up, the jobs just queue up in Gearman and are processed when the MySQL server is available.

BONUS! Insert Delayed

This is our old trick before we used Gearman. MySQL (MyISAM) has a neat feature where you can have inserts delayed until the table is available. The query is sent to the MySQL server and it answers with success immediately to the client. This means your web script can continue on and not get blocked waiting for the insert. But, MySQL will only queue up so many before it starts erroring out. So, it is not as fool proof as a job processing system like Gearman.

Summary

To log with MySQL:

  • Use MyISAM with concurrent inserts
  • Rotate tables daily and use UNION to query
  • Use delayed inserts with MySQL or a job processing agent like Gearman

Happy logging!

PS: You may be asking, “Brian, what about Partitioned Tables?” I asked myself that before deploying this solution. More importantly, in IRC I asked Brian Aker about MySQL partitioned tables. I am paraphrasing, but he said that if I ever think I might alter that table, I would not trust it with the partitions in MySQL. So, that kind of turned me off of them.