Pencarian

Rss Posts

 

 

 

Berita pada kategori ‘Basisdata’

MySQL Limitations Part 3: Subqueries

Oct 25, 2010

This is the third in a series on what?s seriously limiting MySQL in certain circumstances (links: part 1, 2). This post is about subqueries, which in some cases execute outside-in instead of inside-out as users expect.
It’s easy to pick on subqueries in MySQL, so I’ll try to be gentle. The following query will surprise users unpleasantly:

select * from a where a.id in (select id from b);

Users expect the inner query to execute first, then the results to be substituted into the IN() list. But what happens instead is usually a full scan or index scan of table a, followed by N queries to table b. This is because MySQL rewrites the query to make the inner query dependent on the outer query, which could be an optimization in some cases, but de-optimizes the query in many other cases. NOT IN(SELECT …) queries execute badly, too. (Note: putting a literal list of items in the IN() clause performs fine. It’s only when there is a SELECT inside it that it works poorly.)
The fix for this has been in progress for a few years, and Sergey Petrunia committed working code to the stalled 6.0 release. But it’s not quite clear whether that code was a complete solution. It has not been in any GA or RC release, so it hasn’t been used widely.
To be fair, many other database servers also have poor subquery performance, or have had it in the past and have fixed it. And many MySQL users have learned to simply write JOINs instead, so it isn’t that much of a limitation. But it would be a big improvement if it were fixed.
See if you can guess what limitation number 4 will be!

Entry posted by Baron Schwartz |
One comment
Add to: | | | |

How often should you use OPTIMIZE TABLE? ? followup

Oct 04, 2010

This post follows up on Baron’s How often should you use OPTIMIZE TABLE?. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on table’s indexes. I worked on some production data I was authorized to provide as example.
The use case
I’ll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I’ve rewritten some column names for privacy:

mysql> show create table logs \G

Create Table: CREATE TABLE `logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`message` text NOT NULL,
`level` tinyint(11) NOT NULL DEFAULT ‘0′,
`s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT ”,
PRIMARY KEY (`id`),
KEY `s` (`s`),
KEY `name` (`name`,`ts`),
KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

The table had log records starting 2010-08-23 and up till 2010-09-02 noon. Table status:

mysql> show table status like ‘logs’\G
*************************** 1. row ***************************
Name: logs
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 22433048
Avg_row_length: 206
Data_length: 4625285120
Max_data_length: 0
Index_length: 1437073408
Data_free: 4194304
Auto_increment: 186878920
Create_time: 2010-08-24 18:10:49
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=8
Comment:

(A bit puzzled on the Create_time; the table was taken from an LVM snapshot of another server, so it existed for a very long time before. Not sure why the Create_time field is as it is here; I assume the MySQL upgrade marked it so, did not have the time nor need to look into it).
I was using Percona-Server-5.1.47-11.2, and so was able to look at the index statistics for that table:

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name=’logs’;
+————–+————+————–+——–+—————-+————+————+
| table_schema | table_name | index_name?? | fields | row_per_keys?? | index_size | leaf_pages |
+————–+————+————–+——–+—————-+————+————+
| newsminer??? | logs?????? | PRIMARY????? |????? 1 | 1????????????? |???? 282305 |???? 246856 |
| newsminer??? | logs?????? | s |????? 2 | 17, 1????????? |????? 38944 |????? 33923 |
| newsminer??? | logs?????? | name???????? |????? 3 | 2492739, 10, 2 |????? 22432 |????? 19551 |
| newsminer??? | logs?????? | origin?????? |????? 3 | 1303, 4, 1???? |????? 26336 |????? 22931 |
+————–+————+————–+——–+—————-+————+————+

Status after massive purge
My first requirement was to purge out all record up to 2010-09-01 00:00:00. I did so in small chunks, using openark kit‘s oak-chunk-update (same can be achieved with maatkit‘s mk-archiver). The process purged 1000 rows at a time, with some sleep in between, and ran for about a couple of hours. It may be interesting to note that since ts is in monotonically ascending values, purging of old rows also means purging of lower PKs, which means we’re trimming the PK tree from left.
Even while purging took place, I could see the index_size/leaf_pages values dropping, until, finally:

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name=’logs’;
+————–+————+————–+——–+————–+————+————+
| table_schema | table_name | index_name?? | fields | row_per_keys | index_size | leaf_pages |
+————–+————+————–+——–+————–+————+————+
| newsminer??? | logs?????? | PRIMARY????? |????? 1 | 1??????????? |????? 40961 |????? 35262 |
| newsminer??? | logs?????? | s |????? 2 | 26, 1??????? |????? 34440 |?????? 3798 |
| newsminer??? | logs?????? | name???????? |????? 3 | 341011, 4, 1 |?????? 4738 |?????? 2774 |
| newsminer??? | logs?????? | origin?????? |????? 3 | 341011, 4, 2 |????? 10178 |?????? 3281 |
+————–+————+————–+——–+————–+————+————+

The number of deleted rows was roughly 85% of total rows, so down to 15% number of rows.
Status after OPTIMIZE TABLE
Time to see whether OPTIMIZE really optimizes! Will it reduce number of leaf pages in PK? In secondary keys?

mysql> OPTIMIZE TABLE logs;

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name=’logs’;
+————–+————+————–+——–+————–+————+————+
| table_schema | table_name | index_name?? | fields | row_per_keys | index_size | leaf_pages |
+————–+————+————–+——–+————–+————+————+
| newsminer??? | logs?????? | PRIMARY????? |????? 1 | 1??????????? |????? 40436 |????? 35323 |
| newsminer??? | logs?????? | s |????? 2 | 16, 1??????? |?????? 5489 |?????? 4784 |
| newsminer??? | logs?????? | name???????? |????? 3 | 335813, 7, 1 |?????? 3178 |?????? 2749 |
| newsminer??? | logs?????? | origin?????? |????? 3 | 335813, 5, 2 |?????? 3951 |?????? 3446 |
+————–+————+————–+——–+————–+————+————+
4 rows in set (0.00 sec)

The above shows no significant change in either of the indexes: not for index_size, not for leaf_pages, not for statistics (row_per_keys). The OPTIMIZE did not reduce index size. It did not reduce the number of index pages (leaf_pages are the major factor here). Some leaff_pages values have even increased, but in small enough margin to consider as equal.
Index-wise, the above example does not show an advantage to using OPTIMIZE. I confess, I was surprised. And for the better. This indicates InnoDB makes good merging of index pages after massive purging.
So, no use for OPTIMIZE?
Think again: file system-wise, things look different.
Before purging of data:

bash:~# ls -l logs.* -h
-rw-r—– 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r—– 1 mysql mysql 2.9G 2010-09-02 14:01 logs.ibd

After purging of data:

bash:~# ls -l logs.* -h
-rw-r—– 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r—– 1 mysql mysql 2.9G 2010-09-02 14:21 logs.ibd

Recall that InnoDB never releases table space back to file system!
After OPTIMIZE on table:

bash:~# ls -l logs.* -h
-rw-rw—- 1 mysql mysql 8.6K 2010-09-02 14:26 logs.frm
-rw-rw—- 1 mysql mysql 428M 2010-09-02 14:43 logs.ibd

On innodb_file_per_table an OPTIMIZE creates a new table space, and the old one gets destroyed. Space goes back to file system. Don’t know about you; I like to have my file system with as much free space as possible.
Need to verify
I’ve tested Percona Server, since this is where I can find INNODB_INDEX_STATS. But this begs the following questions:

Perhaps the results only apply for Percona Server? (I’m guessing not).
Or only for InnoDB plugin? Does the same hold for “builtin” InnoDB? (dunno)
Only on >= 5.1? (Maybe; 5.0 is becoming rare now anyway)
Only on InnoDB (Well, of course this test is storage engine dependent!)

Conclusion
The use case above is a particular example. Other use cases may include tables where deletions often occur in middle of table (remember we were trimming the tree from left side only). Other yet may need to handle UPDATEs to indexed columns. I have some more operations to do here, with larger tables (e.g. 40GB compressed). If anything changes, I’ll drop a note.

Call for Papers for the O’Reilly MySQL Conference

Sep 21, 2010

The call for papers for the O’Reilly MySQL Conference is now open, and closes October 25th.  Submit your proposal now at http://en.oreilly.com/mysql2011/user/proposal/propose/cfp/126!

Rails on PostgreSQL: Rails 3, bundler, and the pg gem

Sep 17, 2010

I was moving a Rails 2 app up to Rails 3 and, because I have PostgreSQL installed in a non-standard location on my server, I ran into problems when bundler was trying to install the pg gem. After fiddling about for a bit I ended up with this in my config/deploy.rb:

require 'bundler/capistrano'

task :set_config_for_pg_gem, :roles => [:app, :db] do
  run "cd #{current_path} && bundle config build.pg --with-pg-config=/usr/local/pgsql/bin/pg_config --no-rdoc --no-ri"
end

before "bundle:install", :set_config_for_pg_gem

This sets up the appropriate command line flags for the pg gem so that they’re in place when Capistrano runs the bundle:install task. The --no-rdoc --no-ri part isn’t necessary, but I figured it’ll save a second or two. Note that these flags end up in the deploy user’s home directory on the server:

$ cat ~/.bundle/config
---
BUNDLE_BUILD__PG: --with-pg-config=/usr/local/pgsql/bin/pg_config --no-rdoc --no-ri

Running this task every time you deploy is a little wasteful since it sets the configuration unnecessarily – really you just need it before the first time you deploy. So you could optimize things by touching a file in shared/system/ or some such and checking it as part of this task.

MySQL relay-log-space-limit vs. your page cache

Sep 10, 2010

Leif (who works for Spinn3r) published a great post on the mySQL relay-log-space-limit variable and using the page cache efficiently:
When the SQL thread is not able to keep up, logs will accumulate. You can set the location of your relay logs with the relay-log option; in the default configuration (and many production configurations), the relay logs are stored on the same disk as the rest of MySQL?s data. If the total size of the relay logs ever exceeds relay-log-space-limit, the IO thread will pause until the SQL thread has moved on to its next file and deleted its previous one.
Unfortunately, relay-log-space-limit defaults to zero (unlimited). This means that when the SQL thread can?t keep up, the slave?s relay logs will continue to accumulate indefinitely (until its disk fills up). When the total size of those logs approaches the amount of vfs page cache available, the SQL thread will begin to read them from disk (instead of getting cache hits), which will exacerbate the IO shortage that caused the problem in the first place.

MySQL Cluster: 5 Steps to Getting Started, then 5 More to Scale for the Web

Sep 03, 2010

Join us for a live and interactive webinar session where we will demonstrate how to start an evaluation of the
MySQL Cluster database in 5 easy steps, and then how to expand your
deployment for web & telecoms-scale services.Just register here: http://www.mysql.com/news-and-events/web-seminars/display-566.htmlGetting Started will describe how to:

Get the softwareInstall itConfigure itRun itTest it

Scaling for HA and the web will describe how to:

Review the requirements for a HA configurationInstall the software on more serversUpdate & extend the configuration from a single host to 4Roll out the changesOn-line scaling to add further nodesWhen: Wednesday, September 08, 2010: 09:00 Pacific time (America)

Wed, Sep 08: 11:00 Central time (America)

Wed, Sep 08: 12:00 Eastern time (America)

Wed, Sep 08: 16:00 UTC

Wed, Sep 08: 17:00 Western European time

The presentation will be approximately 45 minutes long followed by Q&A.

dbbenchmark.com ? configuring OpenBSD for MySQL benchmarking

Sep 03, 2010

Here are some quick commands for installing the proper packages and requirements for the MySQL dbbenchmark program.

export PKG_PATH=”ftp://openbsd.mirrors.tds.net/pub/OpenBSD/4.7/packages/amd64/”
pkg_add -i -v wget
wget http://dbbenchmark.googlecode.com/files/dbbenchmark-version-0.1.beta_rev26.tar.gz
pkg_add -i -v python
Ambiguous: choose package for python
a 0:
1: python-2.4.6p2
2: python-2.5.4p3
3: python-2.6.3p1
Your choice: 2

pkg_add -i -v py-mysql
pkg_add -i -v mysql
pkg_add -i -v mysql-server
ln -s /usr/local/bin/python2.5 /usr/bin/python
gzip -d dbbenchmark-version-0.1.beta_rev26.tar.gz
tar -xvf dbbenchmark-version-0.1.beta_rev26.tar
cd dbbenchmark-version-0.1.beta_rev26
./dbbenchmark.py –print-sql
– login to mysql and execute sql commands
./dbbenchmark.py

MySQL 5.1 Plugins Development Published

Aug 28, 2010

MySQL 5.1 has a great feature which not many people know about, that is the fact it can be extended via. the use of plugins. ?Unfortunately how you go about this is not incredibly well documented. ?You can search for examples on the internet, dig through the MySQL source code and ask on the forums and you may figure it all out. ?But doing all this is time consuming and could easily put someone off. So Sergei Golubchik and I have got together to bring you this book which will show you, using examples, how to write your own plugins.
We start by explaining the UDF API which has been around for a long time, and then move on to Daemon Plugins, Information Schema Plugins, Full-text Search Plugins and Storage Engine Plugins. ?Each with?usable?examples.
MySQL 5.1 Plugins Development has just been published by Packt Publishing and I believe it is well worth a look if you are thinking of extending MySQL.

MariaDB 5.1.49 for Mac OS X

Aug 27, 2010


Stuttgart: a rainy day, waiting for Iftar. Good time for good news!
During my vacation I read about a request for a MariaDB package for Mac OS X? and did some research. Back from vacation I have an alpha version of MariaDB package for Mac OS X for? our community to test.
Caution: this is the first installer I ever wrote on a Mac, so use it on a test system only!
I would like you to test the installer and provide us with feedback.
Known issues in the MariaDB installer:

The Preferences Pane app for starting/stopping the server instance is missing
The installer for setting up MariaDB as a Startup Item is missing.

Side node: while digging into the Mac installer I found two
bugs in the MySQL Mac OS X installer.

http://bugs.mysql.com/bug.php?id=56279

Mac installer does not work as documented

http://bugs.mysql.com/bug.php?id=56280

Mac installer’s postflight script does not work all the time

You can grab the package from here:

http://lisas.de/~hakan/file/mariadb-5.1.49-osx10.6-x86.pkg

MySQL GIS ? Part 1

Aug 23, 2010

TweetIn my business (weather) we use lots map based (Geo) information.? Almost every table has latitude and longitude. Working with this kind of data can be exciting and frustrating.? This should give you a quick start into GIS with MySQL.
“A geographic information system (GIS), or geographical information system, is any system that captures, stores, analyzes, manages, and presents data that are linked to location. In the simplest terms, GIS is the merging of cartography, statistical analysis, and database technology. GIS systems are used in cartography, remote sensing, land surveying, utility management, natural resource management, photogrammetry, geography, urban planning, emergency management, navigation, and localized search engines.” – Wikipedia
GIS / Mapping Systems work with both text data and graphical data.? Applications and utilities often blur the lines between the two types and make understanding difficult.? Map servers blend raster images, with point or polygon data, and bitmap images to make complete images to display in the user’s client application.? For this post I will concentrate on the text type “data”.? The type we can index in a MySQL database.
THE SEARCH
After months of reading, [1]? I’m writing this post to describes what I have learned about how to get started using GEO coding data as quickly as possible.? I found very little piratical information on GIS and MySQL.? The MySQL manual covers the functions but doesn’t supply much practical information on GEO.? Anders Karlsson wrote a nice and short story about GIS that give me a good start.
The best information has be written by Florin Duroiu in his post titled “Political boundaries overlay in Google maps”. A good part of my post is based on his work.
STEP BY STEP
Below are the detailed needed to to produce a MySQL database with the Points of Interest (POI).? This is based on CentOS 5.5 with MySQL 5.1.
yum install gdal
mkdir geo
mkdir data
mkdir data/Oklahoma
cd geo/data/Oklahoma
wget http://downloads.cloudmade.com/north_america/united_states/oklahoma/oklahoma.shapefiles.zip
unzip oklahoma.shapefiles.zip
mysql -e ‘create database geo’
ogr2ogr -f “MySQL” MySQL:”geo,user=root,host=localhost,password=” -nln oklahoma_poi -lco engine=MYISAM oklahoma_poi.shp
mysql geo -e ‘desc oklahoma_poi’

+———-+————–+——+—–+———+—————-+
| Field | Type | Null | Key | Default | Extra |
+———-+————–+——+—–+———+—————-+
| OGR_FID | int(11) | NO | PRI | NULL | auto_increment |
| SHAPE | geometry | NO | MUL | NULL | |
| category | varchar(30) | YES | | NULL | |
| name | varchar(113) | YES | | NULL | |
+———-+————–+——+—–+———+—————-+

mysql geo -e “select name, category, Y(SHAPE) as lat, X(SHAPE) as lng from oklahoma_poi where NAME like ‘School:Putnam%’”

+————————————-+——————————–+————+————-+
| name | category | lat | lng |
+————————————-+——————————–+————+————-+
| School:Putnam City West High School | Government and Public Services | 35.492557 | -97.6605975 |
| School:Putnam City North School | Government and Public Services | 35.5892209 | -97.6372648 |
| School:Putnam City School | Government and Public Services | 35.5122794 | -97.6142079 |
| School:Putnam High School | Government and Public Services | 35.5214459 | -97.6086523 |
| School:Putnam Heights Academy | Government and Public Services | 35.5081143 | -97.5397619 |
+————————————-+——————————–+————+————-+

In a coming set of post I’ll go over:

The “Data” types your will find and how to convert between them.
What data is available and where can you find it?
More examples on what you can do with GIS data.
Viewing our GIS data.
How to collect your own GIS data.
Good and bad examples of searching GIS data.
Optimizing MySQL GIS.? Is it really worth using?

[1] Books: GIS for Dummies – Author: Michael N. DeMers – John Wiley & Sons?(2009) – ISBN: 0470236825
Open Source GIS: A GRASS GIS Approach. Third Edition.-? Author: Markus Neteler and Helena Mitasova – ISBN: 978-0-38735767-6
Web Mapping Illustrated: Using Open Source GIS Toolkits – Author: Tyler Mitchell – ISBN: 9780596008659
?