SHOW ENGINE INNODB STATUS truncation, innodb_truncated_status_writes
Oct 29, 2010
Guyub adalah perusahaan TI berpusat di Palembang dengan fokus pada F/OSS Produk-produk >> Layanan-layanan >>
Oct 26, 2010
MASTER=()
CURRENT=()
FIRST=1
for SERVER in ${SERVERS}; do
# collect all builds from server and populate CURRENT list
COMMAND=”${LS} -1fd ${WEBROOT}/${SITE}.*”
BUILDS=`${SSH} ${SSHOPTS} root@${SERVER} “${COMMAND}”`
for BUILD in ${BUILDS}; do
CURRENT=( ${CURRENT[@]-} ${BUILD} )
done
# if this is our first time around, copy CURRENT to MASTER
if [ ${FIRST} -eq 1 ]; then
MASTER=( ${CURRENT[@]} )
FIRST=0
fi
# now we do a compare between MASTER and CURRENT to see what builds
# are common
INTERSECT=()
for ENTRY in ${CURRENT[@]}; do
in_array “${ENTRY}” “${MASTER[@]}”
RET=$?
if [ "${RET}" -eq 0 ]; then
INTERSECT=( ${INTERSECT[@]-} ${ENTRY} )
fi
done
MASTER=( ${INTERSECT[@]} )
# clear the CURRENT array
CURRENT=()
done
Let me take a moment to explain the code above:
In order to check for array intersection, you need an in_array()
function
The first argument as the “needle” and the second is the
“haystack”
We verify that both parameters were passed
We simply loop through the haystack checking for the needle
If we find it, return success. Otherwise, eventually return
false
We need to loop through each server eventually, but we’ll start with
the first one
Run an SSH command to get a listing of builds
Populate an array ($CURRENT) with the builds that were found
Since the first server has no previous server to compare with, so we
just copy it to $MASTER
We then loop to the 2nd server, and put the result of getting builds
into $CURRENT
Now that we have the first server’s builds in $MASTER, we perform an
intersect with $CURRENT
We realize the need for an $INTERSECT array to hold the intersections
found above
$INTERSECT becomes $MASTER since it only contains similar builds from
the 1st and 2nd server
Looping to the 3rd server, we get the builds and put them in $CURRENT
Since $MASTER contains only the similar builds thus far, we again
compare it with $CURRENT
The intersect can now be used to compare against builds on the 4th
server, and so on
Once you finish looping through all servers, your $MASTER should
contain only similar builds
There are a few guides out there which show you how to do this via
forking, but I thought someone may appreciate the elegance of using 100%
bash to accomplish this. I hope this helps someone else out there!
Oct 25, 2010
select * from a where a.id in (select id from b);
Users expect the inner query to execute first, then the results to be substituted into the IN() list. But what happens instead is usually a full scan or index scan of table a, followed by N queries to table b. This is because MySQL rewrites the query to make the inner query dependent on the outer query, which could be an optimization in some cases, but de-optimizes the query in many other cases. NOT IN(SELECT …) queries execute badly, too. (Note: putting a literal list of items in the IN() clause performs fine. It’s only when there is a SELECT inside it that it works poorly.)
The fix for this has been in progress for a few years, and Sergey Petrunia committed working code to the stalled 6.0 release. But it’s not quite clear whether that code was a complete solution. It has not been in any GA or RC release, so it hasn’t been used widely.
To be fair, many other database servers also have poor subquery performance, or have had it in the past and have fixed it. And many MySQL users have learned to simply write JOINs instead, so it isn’t that much of a limitation. But it would be a big improvement if it were fixed.
See if you can guess what limitation number 4 will be!
Entry posted by Baron Schwartz |
One comment
Add to: | | | |
Oct 04, 2010
mysql> show create table logs \G
Create Table: CREATE TABLE `logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`message` text NOT NULL,
`level` tinyint(11) NOT NULL DEFAULT ‘0′,
`s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT ”,
PRIMARY KEY (`id`),
KEY `s` (`s`),
KEY `name` (`name`,`ts`),
KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
The table had log records starting 2010-08-23 and up till 2010-09-02 noon. Table status:
mysql> show table status like ‘logs’\G
*************************** 1. row ***************************
Name: logs
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 22433048
Avg_row_length: 206
Data_length: 4625285120
Max_data_length: 0
Index_length: 1437073408
Data_free: 4194304
Auto_increment: 186878920
Create_time: 2010-08-24 18:10:49
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=8
Comment:
(A bit puzzled on the Create_time; the table was taken from an LVM snapshot of another server, so it existed for a very long time before. Not sure why the Create_time field is as it is here; I assume the MySQL upgrade marked it so, did not have the time nor need to look into it).
I was using Percona-Server-5.1.47-11.2, and so was able to look at the index statistics for that table:
mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name=’logs’;
+————–+————+————–+——–+—————-+————+————+
| table_schema | table_name | index_name?? | fields | row_per_keys?? | index_size | leaf_pages |
+————–+————+————–+——–+—————-+————+————+
| newsminer??? | logs?????? | PRIMARY????? |????? 1 | 1????????????? |???? 282305 |???? 246856 |
| newsminer??? | logs?????? | s |????? 2 | 17, 1????????? |????? 38944 |????? 33923 |
| newsminer??? | logs?????? | name???????? |????? 3 | 2492739, 10, 2 |????? 22432 |????? 19551 |
| newsminer??? | logs?????? | origin?????? |????? 3 | 1303, 4, 1???? |????? 26336 |????? 22931 |
+————–+————+————–+——–+—————-+————+————+
Status after massive purge
My first requirement was to purge out all record up to 2010-09-01 00:00:00. I did so in small chunks, using openark kit‘s oak-chunk-update (same can be achieved with maatkit‘s mk-archiver). The process purged 1000 rows at a time, with some sleep in between, and ran for about a couple of hours. It may be interesting to note that since ts is in monotonically ascending values, purging of old rows also means purging of lower PKs, which means we’re trimming the PK tree from left.
Even while purging took place, I could see the index_size/leaf_pages values dropping, until, finally:
mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name=’logs’;
+————–+————+————–+——–+————–+————+————+
| table_schema | table_name | index_name?? | fields | row_per_keys | index_size | leaf_pages |
+————–+————+————–+——–+————–+————+————+
| newsminer??? | logs?????? | PRIMARY????? |????? 1 | 1??????????? |????? 40961 |????? 35262 |
| newsminer??? | logs?????? | s |????? 2 | 26, 1??????? |????? 34440 |?????? 3798 |
| newsminer??? | logs?????? | name???????? |????? 3 | 341011, 4, 1 |?????? 4738 |?????? 2774 |
| newsminer??? | logs?????? | origin?????? |????? 3 | 341011, 4, 2 |????? 10178 |?????? 3281 |
+————–+————+————–+——–+————–+————+————+
The number of deleted rows was roughly 85% of total rows, so down to 15% number of rows.
Status after OPTIMIZE TABLE
Time to see whether OPTIMIZE really optimizes! Will it reduce number of leaf pages in PK? In secondary keys?
mysql> OPTIMIZE TABLE logs;
…
mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name=’logs’;
+————–+————+————–+——–+————–+————+————+
| table_schema | table_name | index_name?? | fields | row_per_keys | index_size | leaf_pages |
+————–+————+————–+——–+————–+————+————+
| newsminer??? | logs?????? | PRIMARY????? |????? 1 | 1??????????? |????? 40436 |????? 35323 |
| newsminer??? | logs?????? | s |????? 2 | 16, 1??????? |?????? 5489 |?????? 4784 |
| newsminer??? | logs?????? | name???????? |????? 3 | 335813, 7, 1 |?????? 3178 |?????? 2749 |
| newsminer??? | logs?????? | origin?????? |????? 3 | 335813, 5, 2 |?????? 3951 |?????? 3446 |
+————–+————+————–+——–+————–+————+————+
4 rows in set (0.00 sec)
The above shows no significant change in either of the indexes: not for index_size, not for leaf_pages, not for statistics (row_per_keys). The OPTIMIZE did not reduce index size. It did not reduce the number of index pages (leaf_pages are the major factor here). Some leaff_pages values have even increased, but in small enough margin to consider as equal.
Index-wise, the above example does not show an advantage to using OPTIMIZE. I confess, I was surprised. And for the better. This indicates InnoDB makes good merging of index pages after massive purging.
So, no use for OPTIMIZE?
Think again: file system-wise, things look different.
Before purging of data:
bash:~# ls -l logs.* -h
-rw-r—– 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r—– 1 mysql mysql 2.9G 2010-09-02 14:01 logs.ibd
After purging of data:
bash:~# ls -l logs.* -h
-rw-r—– 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r—– 1 mysql mysql 2.9G 2010-09-02 14:21 logs.ibd
Recall that InnoDB never releases table space back to file system!
After OPTIMIZE on table:
bash:~# ls -l logs.* -h
-rw-rw—- 1 mysql mysql 8.6K 2010-09-02 14:26 logs.frm
-rw-rw—- 1 mysql mysql 428M 2010-09-02 14:43 logs.ibd
On innodb_file_per_table an OPTIMIZE creates a new table space, and the old one gets destroyed. Space goes back to file system. Don’t know about you; I like to have my file system with as much free space as possible.
Need to verify
I’ve tested Percona Server, since this is where I can find INNODB_INDEX_STATS. But this begs the following questions:
Perhaps the results only apply for Percona Server? (I’m guessing not).
Or only for InnoDB plugin? Does the same hold for “builtin” InnoDB? (dunno)
Only on >= 5.1? (Maybe; 5.0 is becoming rare now anyway)
Only on InnoDB (Well, of course this test is storage engine dependent!)
Conclusion
The use case above is a particular example. Other use cases may include tables where deletions often occur in middle of table (remember we were trimming the tree from left side only). Other yet may need to handle UPDATEs to indexed columns. I have some more operations to do here, with larger tables (e.g. 40GB compressed). If anything changes, I’ll drop a note.
Sep 21, 2010
Sep 10, 2010
Sep 03, 2010
Get the softwareInstall itConfigure itRun itTest it
Scaling for HA and the web will describe how to:
Review the requirements for a HA configurationInstall the software on more serversUpdate & extend the configuration from a single host to 4Roll out the changesOn-line scaling to add further nodesWhen: Wednesday, September 08, 2010: 09:00 Pacific time (America)
Wed, Sep 08: 11:00 Central time (America)
Wed, Sep 08: 12:00 Eastern time (America)
Wed, Sep 08: 16:00 UTC
Wed, Sep 08: 17:00 Western European time
The presentation will be approximately 45 minutes long followed by Q&A.
Sep 03, 2010
export PKG_PATH=”ftp://openbsd.mirrors.tds.net/pub/OpenBSD/4.7/packages/amd64/”
pkg_add -i -v wget
wget http://dbbenchmark.googlecode.com/files/dbbenchmark-version-0.1.beta_rev26.tar.gz
pkg_add -i -v python
Ambiguous: choose package for python
a 0:
1: python-2.4.6p2
2: python-2.5.4p3
3: python-2.6.3p1
Your choice: 2
pkg_add -i -v py-mysql
pkg_add -i -v mysql
pkg_add -i -v mysql-server
ln -s /usr/local/bin/python2.5 /usr/bin/python
gzip -d dbbenchmark-version-0.1.beta_rev26.tar.gz
tar -xvf dbbenchmark-version-0.1.beta_rev26.tar
cd dbbenchmark-version-0.1.beta_rev26
./dbbenchmark.py –print-sql
– login to mysql and execute sql commands
./dbbenchmark.py
Aug 28, 2010
Aug 23, 2010
+———-+————–+——+—–+———+—————-+
| Field | Type | Null | Key | Default | Extra |
+———-+————–+——+—–+———+—————-+
| OGR_FID | int(11) | NO | PRI | NULL | auto_increment |
| SHAPE | geometry | NO | MUL | NULL | |
| category | varchar(30) | YES | | NULL | |
| name | varchar(113) | YES | | NULL | |
+———-+————–+——+—–+———+—————-+
mysql geo -e “select name, category, Y(SHAPE) as lat, X(SHAPE) as lng from oklahoma_poi where NAME like ‘School:Putnam%’”
+————————————-+——————————–+————+————-+
| name | category | lat | lng |
+————————————-+——————————–+————+————-+
| School:Putnam City West High School | Government and Public Services | 35.492557 | -97.6605975 |
| School:Putnam City North School | Government and Public Services | 35.5892209 | -97.6372648 |
| School:Putnam City School | Government and Public Services | 35.5122794 | -97.6142079 |
| School:Putnam High School | Government and Public Services | 35.5214459 | -97.6086523 |
| School:Putnam Heights Academy | Government and Public Services | 35.5081143 | -97.5397619 |
+————————————-+——————————–+————+————-+
In a coming set of post I’ll go over:
The “Data” types your will find and how to convert between them.
What data is available and where can you find it?
More examples on what you can do with GIS data.
Viewing our GIS data.
How to collect your own GIS data.
Good and bad examples of searching GIS data.
Optimizing MySQL GIS.? Is it really worth using?
[1] Books: GIS for Dummies – Author: Michael N. DeMers – John Wiley & Sons?(2009) – ISBN: 0470236825
Open Source GIS: A GRASS GIS Approach. Third Edition.-? Author: Markus Neteler and Helena Mitasova – ISBN: 978-0-38735767-6
Web Mapping Illustrated: Using Open Source GIS Toolkits – Author: Tyler Mitchell – ISBN: 9780596008659
?