News in category ‘Database’

MariaDB: the new MySQL? Interview with Michael Monty Widenius.

Sep 29, 2011

"I want to ensure that the MySQL code base (under the name of MariaDB) will survive as open source, in spite of what Oracle may do." - Michael "Monty" Widenius. Michael "Monty" Widenius is the main author of the original version of the open-source MySQL database and a founding member of the MySQL AB company. [...]

InnoDB at Oracle OpenWorld

Sep 28, 2011

Sunny and I will be presenting at the Oracle OpenWorld next week:

Introduction to InnoDB, MySQL’s Default Storage Engine, 10/04/11 Tuesday 01:15 PM, Marriott Marquis – Golden Gate C3, Calvin Sun
InnoDB Performance Tuning, 10/04/11 Tuesday 03:30 PM, Marriott Marquis – Golden Gate C2, Sunny Bains

The first session is for beginners who are new to InnoDB and MySQL. The second session will cover many new performance features in MySQL 5.5 and 5.6, and share some tuning tips to maximize MySQL performance.
Want to learn more about MySQL? There will be something for everyone. Come join us!

MySQL Connection Timeouts

Apr 20, 2011

Sometimes on a very busy MySQL server you will see sporadic connection timeouts, such as Can’t connect to MySQL server on ‘mydb’ (110). If you time connects in your application, you will see some successful connections taking well over a second. The problem may start very slowly and be almost invisible for a long time, for example one out of a million connection attempts timing out, but as the load grows it may become a lot more frequent.

If you time the connects, you will often see connection times close to 3 and 9 seconds. These are “magic” numbers I remember from years ago; they correspond to a SYN packet being dropped during the connection attempt and being resent. 3 seconds corresponds to one packet being dropped and 9 seconds to two. If this is happening, you may have network issues or, more likely, a listen queue overflow. You can check whether this is the case by running netstat -s and looking for something like:

38409 times the listen queue of a socket overflowed
38409 SYNs to LISTEN sockets dropped

This means some SYN packets had to be dropped because the kernel buffer of connection requests on the LISTEN socket overflowed: MySQL is not accepting connections as quickly as it needs to.
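To make this diagnosis concrete, here is a small Python sketch (the function names are hypothetical). The first function computes the cumulative connect times at which SYN retransmissions fire. It assumes the classic 3-second initial timeout that doubles after each loss, which is what the 3 s / 9 s "magic" numbers imply; newer kernels may start with a shorter timeout. The second function pulls the two overflow counters out of `netstat -s` text, assuming the wording shown above.

```python
import re


def cumulative_syn_delays(initial_timeout=3.0, retries=2):
    # Connect time (seconds) at which the n-th SYN retransmission is sent,
    # assuming the timeout starts at initial_timeout and doubles after each loss.
    delays = []
    elapsed = 0.0
    timeout = initial_timeout
    for _ in range(retries):
        elapsed += timeout
        delays.append(elapsed)
        timeout *= 2
    return delays


# Counter lines as printed by `netstat -s` in the example above.
OVERFLOW_PATTERNS = {
    "listen_queue_overflowed": re.compile(r"(\d+) times the listen queue of a socket overflowed"),
    "syns_dropped": re.compile(r"(\d+) SYNs to LISTEN sockets dropped"),
}


def listen_overflow_counters(netstat_output):
    # Return only the counters that are present in the given text.
    counters = {}
    for name, pattern in OVERFLOW_PATTERNS.items():
        match = pattern.search(netstat_output)
        if match:
            counters[name] = int(match.group(1))
    return counters


sample = """
    38409 times the listen queue of a socket overflowed
    38409 SYNs to LISTEN sockets dropped
"""
print(cumulative_syn_delays())           # [3.0, 9.0]
print(listen_overflow_counters(sample))  # {'listen_queue_overflowed': 38409, 'syns_dropped': 38409}
```

Both netstat counters are cumulative since boot. Compare two snapshots taken some time apart to tell whether drops are still happening.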
There are 2 tuning places you need to consider if this is what is happening.
First, the Linux kernel setting net.ipv4.tcp_max_syn_backlog. This is the size of the kernel buffer for all sockets.
The default I have on my kernel is 2048, though it might vary between versions; you might need to increase it to 8192 or so if you have an intense connection rate. I’ll explain the math below.
Second, the MySQL parameter back_log, which has a default value of just 50. You may want to set it to 1000 or even higher. You may also need to increase the net.core.somaxconn kernel setting, which contains the maximum depth of the listen queue allowed. The kernel I’m running has it set to just 128, which would be too low for many conditions.
Now let’s look more into the problem and do some math. First, let’s look at how MySQL accepts connections. There is a single main thread which accepts connections coming to LISTEN sockets. When a connection comes in, it needs to create a new socket for the incoming connection and create a new thread or take one out of the thread cache. From this point on, MySQL processes network communication in multiple threads and can benefit from multiple cores, but the work done by the main thread cannot.
Usually the main thread is able to accept connections pretty quickly. However, if it stalls waiting on a mutex, or if other work such as launching a new thread takes a long time, the listen queue can overflow. Let’s look at a database which accepts 1000 connects/sec on average. This is a high number, but you can see even higher ones. In most cases, because of the “random arrivals” nature of traffic, you will see some seconds where as many as 3000 connections come in. Under such conditions the default back_log of 50 is enough for just 17 milliseconds (50 / 3000 per second), and if the main thread stalls somewhere for longer than that, some SYN packets may be lost.
I would suggest sizing your tcp_max_syn_backlog and back_log values to be enough for at least 2 seconds’ worth of connection attempts. For example, if I have 100 connects/sec, I should plan for 300 connects/sec, using 3x as the “peak multiplier”. Over 2 seconds that is 600 connection attempts, so both should be set to at least 600.
Setting it to cover much more than 2 seconds does not make much sense, because if the client does not get a response within 3 seconds it will consider the SYN packet lost and will send a new one anyway.
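The sizing arithmetic fits in a few lines of Python. The helper names are hypothetical. The 3x peak multiplier and the 2-second window are the rule of thumb described above, not fixed constants.

```python
def backlog_coverage_ms(backlog, peak_rate_per_sec):
    # How long (ms) a backlog of `backlog` pending connects lasts at the peak rate.
    return 1000.0 * backlog / peak_rate_per_sec


def recommended_backlog(avg_rate_per_sec, peak_multiplier=3, seconds=2):
    # Enough room for `seconds` worth of connects arriving at the peak rate.
    return avg_rate_per_sec * peak_multiplier * seconds


print(round(backlog_coverage_ms(50, 3000), 1))  # 16.7 ms with the default back_log
print(recommended_backlog(100))                 # 600
```

Whatever number you arrive at, the kernel caps a socket’s listen backlog at net.core.somaxconn. Raising back_log alone may therefore not be enough.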
There is something else. If you’re creating 1000 connections a second to MySQL Server, you might be pushing your luck, and at the very least you’re using a lot of resources setting up and tearing down connections. Consider using persistent connections or a connection pool, at least for the applications responsible for most of the connections being created.
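For illustration, here is a minimal fixed-size pool sketched in Python. Connections are opened once and then handed out and returned, so 1000 requests do not mean 1000 connects. The `connect` callable is a hypothetical stand-in for your MySQL driver’s connect function. A production pool would also need to validate and replace broken connections.

```python
import queue


class ConnectionPool:
    """Fixed-size pool that opens `size` connections up front and reuses them."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument callable returning a connection
        # (hypothetical here; in practice a wrapper around your MySQL driver).
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        # Wait for an idle connection instead of opening a new one;
        # raises queue.Empty if none frees up within `timeout` seconds.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Hand the connection back for the next caller.
        self._idle.put(conn)


# Demo with a fake connect function that records every connection it opens.
opened = []


def fake_connect():
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(1000):  # 1000 "requests"...
    conn = pool.acquire()
    pool.release(conn)
print(len(opened))  # ...but only 2 connections were ever opened
```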

Setting up replication with XtraBackup

Apr 19, 2011

I attended Vadim Tkachenko’s talk on XtraBackup during the MySQL conference in Santa Clara last week. Backups are obviously very important, but the use case I had in mind is this:
Replicating a database that has InnoDB tables in it, while keeping both master and slave online if possible.
Tangent: by the way, I love the native backup utility that was once promised in MySQL 6.0, similar to SQL Server’s way of backup. It was like running “BACKUP myDb to DISK = ‘/backupDirectory/myDb.bak’” in the mysql client, but I digress…
I have used mysqldump to accomplish this in the past, but I wondered how XtraBackup would fare in this task, especially after hearing Vadim’s talk and reading news on Percona’s development effort. To cut to the chase, here is my conclusion. The steps to reproduce are listed immediately afterwards.
1. innobackupex provides a consistent database backup, printing the log file name and log position to stdout, which is nice and useful for slave initialization;
2. It works with both MyISAM and InnoDB tables;
3. If MyISAM tables are all you have, just run innobackupex --apply-log /directoryWhereBackupIs, then move the database directory from under /directoryWhereBackupIs to your slave’s datadir, make the necessary group and owner changes to that directory and its files, and you are ready to run the “change master” command and start the slave;
4. If the database has InnoDB tables, then in addition to step 3 you will also need to stop mysql on the slave, move the ibdata1 file to the datadir, restart mysql, and run the “change master…” and “start slave” commands. It does not matter whether you are using innodb_file_per_table or not.
It would be nice if I could keep the slave up and running during this step when the database has InnoDB tables in it. Did I do anything wrong? Is there a better way? What if the slave already has a database with InnoDB tables, and thus uses ibdata1, to begin with? What do you do then? Should I play with Tungsten’s replication? What are the compelling reasons to use Tungsten’s replication?
In any case, from my limited testing, I think I will use innobackupex for future replication setup tasks, if I can afford a mysqld restart. Overall, it feels a bit easier than the mysqldump approach I’ve been using in the past.
Here are the steps needed to reproduce:
1. Fire up 2 Rackspace CentOS 5.5 servers. Rackspace cloud servers beat Amazon EC2 servers hands down, in my view, for developing/sandboxing purposes;
2. Install the required mysql client, server, and XtraBackup on both servers;
3. Make /etc/my.cnf by cloning the sample cnf file /usr/share/my-small.cnf. Three minimal changes were necessary: log-bin=mysql-bin, server-id=a unique number, datadir=/var/lib/mysql. The first two are necessary for replication; the last is needed for innobackupex.
Well, while you are at it, on the slave, add read-only and skip-slave-start if appropriate. That’s best practice for a read-only slave.
4. Add the master server’s public key to authorized_keys on the slave, to facilitate easy ssh connections.
5. On master, run this command:

innobackupex --databases=test --stream=tar /tmp/ --slave-info | ssh root@slave "tar xfi - -C /root"
When it finishes, you should see something like this:
110419 18:54:21 innobackupex: completed OK!
tar: Read 6656 bytes from -

Take note of the 3 lines immediately above it, which state the binlog file and log position, like this:

innobackupex: MySQL binlog position: filename 'mysql-bin.000002', position 2515
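If you script step 5, you can pull the binlog coordinates out of the innobackupex output with a regular expression. This Python sketch assumes the line looks exactly like the sample above. Other XtraBackup versions may word it differently. The function name is hypothetical.

```python
import re

BINLOG_RE = re.compile(
    r"MySQL binlog position: filename '(?P<file>[^']+)', position (?P<pos>\d+)"
)


def parse_binlog_position(innobackupex_output):
    # Return (binlog_file, position), or None if the line is not present.
    match = BINLOG_RE.search(innobackupex_output)
    if match is None:
        return None
    return match.group("file"), int(match.group("pos"))


line = "innobackupex: MySQL binlog position: filename 'mysql-bin.000002', position 2515"
print(parse_binlog_position(line))  # ('mysql-bin.000002', 2515)
```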

6. On slave, run this command:

innobackupex --apply-log /locationWhereBackupIs

then, assuming the database name is test, run the 2 commands below to change the group and owner to mysql:

chgrp -R mysql test
chown -R mysql test

Move the directory under mysqld’s datadir:

mv test/ /mysql/datadir

If the test database has InnoDB tables in it, stop mysql on the slave, then copy ibdata1 to the datadir and restart mysql.
7. On the master, open up port 3306 if it is not already open, then create the replication account:

grant replication slave, replication client on *.* to repl@'50.56.121.%' identified by 'p@ssw0rd';

8. On slave, run:

change master to master_host='50.56.121.96', master_user='repl', master_password='p@ssw0rd', master_log_file='see output from innobackupex backup command on master', master_log_pos=numFrominnobackupexOutputOnMaster;

start slave;

show slave status\G
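You can also generate the statement in step 8 from the values noted in step 5. Below is a hypothetical Python helper. It assumes the only quoting needed is escaping backslashes and single quotes in the string literals. Paste its output into the mysql client on the slave.

```python
def change_master_sql(host, user, password, log_file, log_pos):
    # Build a CHANGE MASTER TO statement from the coordinates noted on the master.
    def quote(value):
        # Escape backslashes and single quotes for a MySQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, "
        f"MASTER_USER={quote(user)}, "
        f"MASTER_PASSWORD={quote(password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, "
        f"MASTER_LOG_POS={int(log_pos)};"
    )


print(change_master_sql("50.56.121.96", "repl", "p@ssw0rd", "mysql-bin.000002", 2515))
```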

MySQL & NoSQL Survey

Mar 13, 2011

Hello,

Could you please take the time to fill in this short survey about using MySQL and NoSQL in companies?
I will publish the results in a week.

Thank you for your time.


A cool terminal tip for Mac users

Mar 13, 2011

If you use a Mac and you are dealing with many similar tasks at once, like examining many database servers in different terminals, you may like this one.

I have been using iTerm2 for a while, and my handling of parallel tasks has improved a lot. (No, I am not talking about parallel replication, although I have applied this trick while testing that technology as well.)

iTerm2 has some cool features, and probably the most striking one is split panes. That alone would be a good reason for giving iTerm2 a try. But the one that I use the most, often in combination with split panes, is called Send Input to all tabs.

Here is how it works. Let’s say I need to use 4 servers at once and perform a non-repeating operation in all of them. So I open a separate window and split the screen into 5 panes. I connect to each server in the first four panes, and I open a vim instance in the fifth. With that done, I enable the magic option.

A word of caution: this option sends the input to all the open tabs in your current window. If you don’t want this to happen, do as I do and open a separate window. Then make sure that all tabs, and any split panes, are supposed to receive your input. The application asks you for confirmation.

After that, whatever I type in one pane will be mirrored in all the panes. So I will see the commands running on my four servers, and being logged in a text file in the fifth one. With just a single command, I have all servers under control at once.

MySQL Workbench 5.2.32 GA Available

Mar 05, 2011

We’re proud to announce the next release of MySQL Workbench, version 5.2.32. This is a maintenance release featuring a new and improved UI appearance and several corrections and other enhancements.
The tabbed interface has been refreshed to obtain a clearer separation between the different modules of Workbench, while improving responsiveness when switching between tabs. The Query Formatter has been rewritten and is now faster and more robust in its handling of queries. The layout of the Administration module has been changed to allow for easier future expansion and to use less vertical screen space. Parts that had problems managing MySQL 5.5 servers have been fixed, and a total of 53 bugs or enhancement requests have been addressed.
As always, we want to thank everyone for the great feedback we have received. This helps us to continuously improve the functionality and stability of MySQL Workbench, and we appreciate all your ideas for improving it. Please keep sending us your ideas!
MySQL Workbench 5.2 GA

Data Modeling
Query (replaces the old MySQL Query Browser)
Administration (replaces the old MySQL Administrator)

Please get your copy from our Download site. Sources and binary packages are available for several platforms, including Windows, Mac OS X and Linux.

http://dev.mysql.com/downloads/workbench/

To get started quickly, please take a look at this short tutorial.
MySQL Workbench 5.2 RC Tutorial

http://wb.mysql.com/?p=406

Workbench Documentation can be found here.

http://dev.mysql.com/doc/workbench/en/index.html

In addition to the new Query/SQL Development and Administration modules, version 5.2 features improved stability and performance, especially on Windows, where OpenGL support has been enhanced and the UI was optimized to offer better responsiveness.
This release also includes improvements to the scripting capabilities of the SQL Editor. You can read more about it in

http://wb.mysql.com/workbench/doc/

For a detailed list of resolved issues, see the change log.

http://dev.mysql.com/doc/workbench/en/wb-change-history.html

If you need any additional info or help please get in touch with us.
Post in our forums, leave comments on our blog pages or if you want to talk to us directly you can visit us on our IRC channel #workbench on irc.freenode.net.
- The MySQL Workbench Team

Xtrabackup for MySQL, and issues with streaming mode

Mar 05, 2011

Yes, it has been quite some time since I blogged; work has been very busy lately.
Currently we have a number of backup strategies that our partners may use, one of which is hot backups via xtrabackup (or innobackup/MySQL Enterprise Backup, with a license fee of course).
At this time we have one person dedicated to maintaining the backups, which includes rewriting innobackupex to handle extras and writing wrapper scripts around the original one.
This morning he contacted me because he was running into problems with xtrabackup 1.5 and streaming. No, not the usual performance issues etc., but rather a few random .MYI files were missing, and the xtrabackup_checkpoints and xtrabackup_logfile files were missing.
What was interesting was that the set of missing MYI, frm and MYD files was random: it was mostly the same, but it would change every now and then when his script ran.
At first glance, without reviewing his script, it seemed like something was going on with xtrabackup streaming mode, since the help_*.* files contained the checkpoint info, so it looked like it streamed it to the wrong place.
However, after I was informed about it, I decided to look at the script this person had written, and I realized that he did something like:
innobackupex --backup ……. --stream=tar /path 1>/file.tar 2>/logfile
After testing, it’s clear that xtrabackup does not like that. You are not able to separate stdout and stderr like this, since it breaks the application. By changing 1>/file.tar 2>/logfile to just >/file.tar, it all worked well.
Why does innobackupex write things to STDERR that should be sent to the tar stream? I do not know, but I hope that someone here can help out.
Either way, this is a reminder for all of you who want to use it: you cannot suppress output, since that will break stream mode.
Have a great spring!

Generating Google line charts with SQL, part II

Mar 03, 2011

This post continues Generating Google line charts with SQL, part I, in pursuit of generating time series based image charts.
We ended last post with the following chart:

http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL

which has a nice curve and a proper y-legend, but an incorrect x-legend and no ticks or grids.
To date, Google Image Charts do not support time-series charts. We can’t just throw timestamp values and expect the chart to properly position them. We need to work these by hand.
This is not easily done; if our input consists of evenly spread timestamp values, we are in a reasonable position. If not, what do we do?
There are several solutions to this:

We can present whatever points we have on the chart, making sure to position them properly. This makes for an uneven distribution of ticks on the x-axis, and is not pleasant to watch.
We can extrapolate values for round hours (or otherwise round timestamp resolutions), and so show evenly spread timestamps. I don’t like this solution one bit, since we’re essentially inventing values here. Extrapolation is nice when you know you have nice curves, but not when you’re doing database monitoring, for example. You must have the precise values.
We can do oversampling, then group together several measurements within round timestamp resolutions. For example, we can make a measurement every 2 minutes, yet present only 6 measurements per hour, each averaging the measurements within 10 round minutes. This is the approach I take with mycheckpoint.
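Here is the oversampling approach sketched in Python. It assumes (timestamp, value) samples and a bucket size that divides 60 minutes evenly. The function name is hypothetical, and this is an illustration, not mycheckpoint’s actual code.

```python
from collections import defaultdict
from datetime import datetime


def downsample(samples, bucket_minutes=10):
    # Assign each sample to the round bucket that starts at its timestamp
    # floored to a multiple of `bucket_minutes`, then average each bucket.
    buckets = defaultdict(list)
    for ts, value in samples:
        start = ts.replace(minute=ts.minute - ts.minute % bucket_minutes,
                           second=0, microsecond=0)
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# One measurement every 2 minutes from 10:00 to 10:18, values 1.0 .. 10.0.
samples = [(datetime(2011, 3, 3, 10, 2 * i), float(i + 1)) for i in range(10)]
for start, avg in downsample(samples).items():
    print(start.strftime("%H:%M"), avg)
```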

The last approach goes even beyond that: what if we missed 30 minutes of sampling? Say the server was down. We then need to “invent” the missing timestamps. Note that we invent the timestamps; we do not invent values. We must present the chart with missing values on our invented timestamps.
I may show how to do this in a future post. Meanwhile, let’s simplify and assume our values are evenly spread.
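Inventing timestamps, but not values, for a gap could look like the Python sketch below. It assumes the known timestamps are already aligned to a fixed step. Gaps come back as None, which the simple encoding in this post renders as ‘_’. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def fill_missing(series, step):
    # `series` maps already-aligned timestamps to values. Every timestamp on
    # the regular grid between the first and last one is emitted; timestamps
    # without a sample get the value None (we invent timestamps, not values).
    if not series:
        return []
    ts, end = min(series), max(series)
    filled = []
    while ts <= end:
        filled.append((ts, series.get(ts)))
        ts += step
    return filled


series = {
    datetime(2011, 3, 3, 10, 0): 1.0,
    datetime(2011, 3, 3, 10, 10): 2.0,
    datetime(2011, 3, 3, 10, 40): 5.0,  # server was down at 10:20 and 10:30
}
for ts, value in fill_missing(series, timedelta(minutes=10)):
    print(ts.strftime("%H:%M"), value)
```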
Sample data
We use google_charts.sql. Note that the timestamp values provided in Part I of this post are skewed, so make sure to use this file.
x-axis values
We use chxl to present the x-axis values. We may be tempted to just list all values. Would that work?
Sadly, no, for two reasons:

Google is not smart enough; whatever we throw at it, it will try to present. So, if we have 288 rows, that’s 288 x-axis values. Not enough room, to be sure! Smarter implementations would automatically hide some values, so as to present only non-overlapping ones.
Our URL will turn out to be too long. Remember: 2048 characters is our maximum limit for a GET request!

Also, we must format our timestamps to be of minimal width. In our example, we have a 24 hour range, so we present timestamps in HH:MM format. A naive approach would therefore be:

SELECT
  CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,',
    ROUND(min_value, 1), ',',
    ROUND(max_value, 1),
    '&chd=s:',
    GROUP_CONCAT(
      IF(
        data IS NULL,
        '_',
        SUBSTRING(
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
          1+61*(data - min_value)/(max_value - min_value),
          1
        )
      )
      SEPARATOR ''
    ),
    '&chxl=0:|',
    GROUP_CONCAT(
      DATE_FORMAT(ts, '%H:%i')
      SEPARATOR '|'
    )
  ) FROM chart_data, chart_data_minmax
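Outside SQL, the same simple encoding can be sketched in Python. This is an illustration, not Google’s reference code. It assumes MySQL rounds the fractional SUBSTRING position to the nearest integer, so the sketch rounds half up to match. It also guards against a flat series where max equals min. The names are hypothetical.

```python
# The 62 symbols of the simple encoding, lowest value first.
SIMPLE_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"


def simple_encode(values, min_value, max_value):
    span = (max_value - min_value) or 1  # avoid division by zero on a flat series
    out = []
    for v in values:
        if v is None:
            out.append("_")  # missing value, like the SQL's data IS NULL branch
        else:
            # Scale into 0..61 and round half up.
            index = int(61 * (v - min_value) / span + 0.5)
            out.append(SIMPLE_ALPHABET[index])
    return "".join(out)


print(simple_encode([0, 5, 10, None], 0, 10))  # Af9_
```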

The resulting URL is just too long.
Solution? Let’s only consider round-hour timestamps! Our next attempt looks like this (we also throw in chxs, to show ticks):

SELECT
  CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,',
    ROUND(min_value, 1), ',',
    ROUND(max_value, 1),
    '&chd=s:',
    GROUP_CONCAT(
      IF(
        data IS NULL,
        '_',
        SUBSTRING(
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
          1+61*(data - min_value)/(max_value - min_value),
          1
        )
      )
      SEPARATOR ''
    ),
    '&chxs=0,505050,10,0,lt',
    '&chxl=0:|',
    GROUP_CONCAT(
      IF(
        MINUTE(ts) = 0,
        DATE_FORMAT(ts, '%H:%i'),
        NULL
      )
      SEPARATOR '|'
    )
  ) FROM chart_data, chart_data_minmax

and the result is:

http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL&chxs=0,505050,10,0,lt&chxl=0:|00:00|01:00|02:00|03:00|04:00|05:00|06:00|07:00|08:00|09:00|10:00|11:00|12:00|13:00|14:00|15:00|16:00|17:00|18:00|19:00|20:00|21:00|22:00|23:00

Too messy, isn’t it?
A word about ticks
You would think: OK then, let’s just present every fourth round-hour timestamp. But there’s a catch: a tick will show only where there’s an x-axis value. It’s nice to have a tick for every hour, but we only want to present values every 4 hours.
Fortunately, we can provide an unseen value: a space (URL-encoded as ‘+’). So we complicate the chxl part a bit, to read:

SELECT
  CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,',
    ROUND(min_value, 1), ',',
    ROUND(max_value, 1),
    '&chd=s:',
    GROUP_CONCAT(
      IF(
        data IS NULL,
        '_',
        SUBSTRING(
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
          1+61*(data - min_value)/(max_value - min_value),
          1
        )
      )
      SEPARATOR ''
    ),
    '&chxs=0,505050,10,0,lt',
    '&chxl=0:|',
    GROUP_CONCAT(
      IF(
        MINUTE(ts) = 0,
        IF(
          HOUR(ts) MOD 4 = 0,
          DATE_FORMAT(ts, '%H:%i'),
          '+'
        ),
        NULL
      )
      SEPARATOR '|'
    )
  ) FROM chart_data, chart_data_minmax
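For readers who build the URL outside the database, here is the chxl logic of this query sketched in Python. It assumes a chronologically ordered list of datetime values, and the function name is hypothetical. Only round hours produce an entry, and therefore a tick. Only every fourth hour gets a visible label; the others get the invisible ‘+’.

```python
from datetime import datetime, timedelta


def x_axis_labels(timestamps, label_every_hours=4):
    # Mirror the SQL: non-round timestamps are skipped (NULL in GROUP_CONCAT),
    # round hours become either an HH:MM label or an invisible '+'.
    labels = []
    for ts in timestamps:
        if ts.minute == 0:
            if ts.hour % label_every_hours == 0:
                labels.append(ts.strftime("%H:%M"))
            else:
                labels.append("+")
    return "0:|" + "|".join(labels)


# 288 samples at 5-minute intervals covering one day (an assumed sampling rate).
start = datetime(2011, 3, 3)
timestamps = [start + timedelta(minutes=5 * i) for i in range(288)]
print("chxl=" + x_axis_labels(timestamps))
```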

and get:

http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL&chxs=0,505050,10,0,lt&chxl=0:|00:00|+|+|+|04:00|+|+|+|08:00|+|+|+|12:00|+|+|+|16:00|+|+|+|20:00|+|+|+

OK, I cheated
Who says the sample data starts at a round hour? We have that hidden assumption here, since the first tick is necessarily a round hour in our code. Yet our data may start at 12:35, for example. Sorry, you’ll have to dig into mycheckpoint’s source code to see a thorough solution. It’s just too much for this post.
Grids
Let’s wrap this up with grids. Grids work by specifying the step size (in percent of overall height/width) and initial offset (again, in percent).
Wouldn’t it be nicer if grids were automatically attached to ticks? I mean, REALLY! What were those guys thinking? (I know, they’re doing great work. Keep it up!)
Problem is, I have no idea how Google chooses to distribute values on the y-axis, so I don’t know where the y-axis ticks will be placed. So on the y-axis, I just choose to split the chart into 4 even parts and draw horizontal grids between them. Percent is 25 (100/4), offset is 0.
But I do have control over the x-axis. In our case, I know how many ticks we’ll be having. Plus, I made life easier by assuming we start with a round hour, so no offset is required.
Umm… How many ticks do we have? Easy: the number of round hours. This can be calculated by SUM(MINUTE(ts) = 0). Actually, we need to take 1 off, since N ticks delimit N - 1 intervals.
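Here is the chg arithmetic in Python, assuming 24 round-hour ticks as in our 24 hour sample and 4 horizontal parts on the y-axis. The helper is hypothetical. The trailing 1,2,0,0 are the dash length, gap length and the x and y offsets used in the query.

```python
def chg_param(round_hour_ticks, y_parts=4):
    # N ticks delimit N - 1 intervals, each taking an equal share of 100%.
    x_step = round(100.0 / (round_hour_ticks - 1), 2)
    y_step = 100.0 / y_parts
    return f"chg={x_step:g},{y_step:g},1,2,0,0"


print(chg_param(24))  # chg=4.35,25,1,2,0,0
```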
We now build the chg parameter:

SELECT
  CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,',
    ROUND(min_value, 1), ',',
    ROUND(max_value, 1),
    '&chd=s:',
    GROUP_CONCAT(
      IF(
        data IS NULL,
        '_',
        SUBSTRING(
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
          1+61*(data - min_value)/(max_value - min_value),
          1
        )
      )
      SEPARATOR ''
    ),
    '&chxs=0,505050,10,0,lt',
    '&chxl=0:|',
    GROUP_CONCAT(
      IF(
        MINUTE(ts) = 0,
        IF(
          HOUR(ts) MOD 4 = 0,
          DATE_FORMAT(ts, '%H:%i'),
          '+'
        ),
        NULL
      )
      SEPARATOR '|'
    ),
    '&chg=', ROUND(100.0/((SUM(MINUTE(ts) = 0) - 1)), 2), ',25,1,2,0,0'
  ) FROM chart_data, chart_data_minmax

and get:

http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL&chxs=0,505050,10,0,lt&chxl=0:|00:00|+|+|+|04:00|+|+|+|08:00|+|+|+|12:00|+|+|+|16:00|+|+|+|20:00|+|+|+&chg=4.35,25,1,2,0,0

Phew!
Conclusion
So we haven’t worked on offsets. And this is a single-line chart. What about multiple lines? A legend? A chart with those is harder to achieve. I’m leaving this up to you!

MepSQL Debs for Ubuntu now released – courtesy of cool tweaks to the build system.

Feb 19, 2011

After another week of hacking on MepSQL, the DEB files for Ubuntu are now available. (MepSQL is my new “just a hobby” MySQL fork project.)
The Download page has instructions on how to install the packages with a simple apt-get install command. Debian packages will appear soon as they are now easy to add – I mostly just need to add new Amazon images for each.