<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Guyub - Konsultan F/OSS &#187; Basisdata</title>
	<atom:link href="http://guyub.co.id/category/basisdata/feed/" rel="self" type="application/rss+xml" />
	<link>http://guyub.co.id</link>
	<description>GNU/Linux - Java, PHP, Ruby - MySQL, PostgreSQL</description>
	<lastBuildDate>Thu, 29 Jul 2010 10:11:46 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Will Oracle kill MySQL?</title>
		<link>http://guyub.co.id/will-oracle-kill-mysql/</link>
		<comments>http://guyub.co.id/will-oracle-kill-mysql/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 14:32:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[F/OSS]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[oracle]]></category>

		<guid isPermaLink="false">http://guyub.co.id/will-oracle-kill-mysql/</guid>
		<description><![CDATA[I get asked this question often. It was mentioned again recently in a NYTECH executive breakfast with RedHat CIO Lee Congdon.
The short answer is No.
There is clear evidence that in the short to medium term Oracle will continue to promote and enhance MySQL. Some of these indicators include:
EU 10 point commitment in December 2009 &#8211; [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://ronaldbradford.com/blog/will-oracle-kill-mysql-2010-07-28/">I get asked this question often. It was mentioned again recently in a NYTECH executive breakfast with RedHat CIO Lee Congdon.<br />
The short answer is No.<br />
There is clear evidence that in the short to medium term Oracle will continue to promote and enhance MySQL. Some of these indicators include:</p>
<p>EU 10 point commitment in December 2009 &#8211; See Oracle Makes Commitments to Customers, Developers and Users of MySQL<br />
MySQL Conference April 2010 &#8211; Opening keynote by Edward Screven State of the Dolphin<br />
Oracle Magazine Jul/Aug 2010 &#8211; Interview with Edward Screven Open for Business.</p>
<p>It is clear from these sources that Oracle intends to incorporate MySQL into Oracle Backup and Security Vault products. Both a practical and necessary step.  There is also a clear mention of focusing on the Microsoft platform, a clear indicator that SQL Server is in their sights without actually saying it.<br />
What is unknown is exactly how and when features will be implemented.  Also important is how much these may cost the end user.  Oracle is in the business of selling, now an entire H/W and S/W stack. They also have a complicated pricing model of different components with product offerings. I assume this will continue. There are already two indications: InnoDBbackup included for Enterprise Backup (from April Keynote) and the 5.1 enterprise split. (Note: while this split may have existed prior to Oracle, it is now more obvious).<br />
MySQL can never be seen as drawing away from any Oracle sales of the core entry level database product. It is likely Oracle will provide a SQL Syntax compatibility layer for MySQL within 2 years, however it will I&#8217;m sure be a commercial add-on.  Likewise, I would suspect a PL/SQL lite layer within 5 years, but again at a significant cost to offset the potential loss of sales in the low end of the server market.  There continues to be active development in the MySQL Enterprise Monitor, MySQL Workbench and MySQL Connectors which is all excellent news for users.<br />
Moving forward, how long will this ancillary development of free tools continue?  What will happen to the commercial storage engine, OEM and licensing model after the 5 year commitment? How will the MySQL ecosystem survive? There is active development in the Percona, MariaDB and Drizzle forks; however, unless all players that want to provide a closely MySQL-compatible solution work together, progress will continue to be disappointing and disjointed.    The 2011 conference season will also see a clear line drawn, with competing MySQL conferences scheduled at the same time in April: the O&#8217;Reilly MySQL conference in Santa Clara, California and the Oracle supported(*) Collaborate 2011 in Orlando, Florida.<br />
I have a number of predictions on what Oracle ME MySQL may look like in 5 years however this is a topic for a personal discussion.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/will-oracle-kill-mysql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rails on PostgreSQL: Pivotal Labs Talk &#8211; Scaling a Rails App with Postgres</title>
		<link>http://guyub.co.id/rails-on-postgresql-pivotal-labs-talk-scaling-a-rails-app-with-postgres/</link>
		<comments>http://guyub.co.id/rails-on-postgresql-pivotal-labs-talk-scaling-a-rails-app-with-postgres/#comments</comments>
		<pubDate>Fri, 23 Jul 2010 18:01:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[Pemrograman]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scaling]]></category>

		<guid isPermaLink="false">http://guyub.co.id/rails-on-postgresql-pivotal-labs-talk-scaling-a-rails-app-with-postgres/</guid>
		<description><![CDATA[I&#8217;m slowly catching up with my podcast backlog and came across a Pivotal Labs talk from May 2009.  In this talk Josh Susser and Damon McCormick are presenting on Scaling a Rails App with Postgres.  It&#8217;s a little dated now &#8211; this talk was given when PostgreSQL 8.4 was in beta [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://railsonpostgresql.com/2010/07/23/pivotal-labs-talk-scaling-a-rails-app-with-postgres">I&#8217;m slowly catching up with my podcast backlog and came across a Pivotal Labs talk from May 2009.  In this talk Josh Susser and Damon McCormick are presenting on <a href="http://pivotallabs.com/talks/67-scaling-a-rails-app-with-postgres">Scaling a Rails App with Postgres</a>.  It&#8217;s a little dated now &#8211; this talk was given when PostgreSQL 8.4 was in beta &#8211; but, still, lots of good stuff.  Here are some notes:</p>
<ul>
<li>They started with an existing Rails app with lots of data, so they had some constraints &#8211; not greenfield development.</li>
<li>Around the 5-6 minute mark there&#8217;s a good discussion of PostgreSQL&#8217;s query optimizer and how it analyzes a table&#8217;s data distribution.  One takeaway (mentioned around 16:20) is to run <code>vacuum</code> more often on a particular table if there are a lot of writes.</li>
<li>10:00 How to set STATISTICS for a particular table.</li>
<li>11:00 Using partial indexes.</li>
<li>14:00 Indexing on expressions.</li>
<li>18:10-23:00 A nice discussion of the <code>EXPLAIN</code> output.</li>
<li>23:45 Here they talk about wide columns.  I&#8217;ve seen this in MySQL as well, where splitting text data out into a separate table yielded some good speedups.</li>
<li>26:10 Some discussion of <code>pgbench</code>.</li>
<li>35:30 How long does it take to add an index to large tables?  They saw times of up to an hour for tables with millions of rows.</li>
<li>36:30 Clustering your data in order to get PostgreSQL to write it more efficiently.</li>
<li>37:30-48:00 A thorough discussion of partitioning tables via table inheritance.  They used an ActiveRecord model (39:23) with a bunch of utility methods.  They also had a cron job to periodically create new partitions.  At 45:15 they make a nice distinction between using partial indexes and partitions &#8211; one advantage is that a partition&#8217;s indexes can be different from its parent&#8217;s indexes.  At 49:00 they mention maybe doing a plugin, not sure if that happened.</li>
<li>52:00 Some discussion of full text search via <code>tsearch</code>.</li>
<li>53:00 PostgreSQL&#8217;s lack of built in replication outside of WAL shipping, Slony, etc.  Thank goodness 9.0 will address this!</li>
<li>54:00 Some props to <a href="http://www.engineyard.com/">Engine Yard</a> on their PostgreSQL support.</li>
</ul>
<p>Good stuff all around, and thanks to Pivotal for posting these great talks!</p>
<p></a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/rails-on-postgresql-pivotal-labs-talk-scaling-a-rails-app-with-postgres/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Estimating Replication Capacity</title>
		<link>http://guyub.co.id/estimating-replication-capacity/</link>
		<comments>http://guyub.co.id/estimating-replication-capacity/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 19:51:11 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[Server, Jaringan & Keamanan]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[capacity]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://guyub.co.id/estimating-replication-capacity/</guid>
		<description><![CDATA[It is easy for MySQL replication to become a bottleneck even when the master server is not seriously loaded, and the more cores and hard drives the servers get, the larger the difference becomes, as long as replication
remains a single-threaded process.   At the same time it is a lot easier to optimize your system when your replication [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.mysqlperformanceblog.com/2010/07/20/estimating-replication-capacity/">It is easy for MySQL replication to become a bottleneck even when the master server is not seriously loaded, and the more cores and hard drives the servers get, the larger the difference becomes, as long as replication remains a single-threaded process. At the same time it is a lot easier to optimize your system when your replication runs normally: if you need to add or remove indexes or make other schema changes, you would probably be looking at methods involving replication if you can&#8217;t take your system down. So here comes the catch in many systems: we find the system needs optimization when replication can&#8217;t catch up, yet the optimization process we&#8217;re going to use relies on replication being functional and able to catch up quickly.<br />
So the question becomes: how can we estimate replication capacity, so we can deal with replication load before the slave is unable to catch up?<br />
Knowing replication capacity is needed not only when you&#8217;re planning to use replication to perform system optimization; it is also needed in other cases. For example, in a sharded environment you may need to schedule downtime or set an object read-only to move it to another shard. It is much nicer if this can be planned in advance rather than done on an emergency basis when the slave(s) are unable to catch up and the application is suffering because of stale data. This especially applies to Software as a Service providers, which often have very strict SLAs with their customers and can have a lot of data per customer, so a move can take a considerable amount of time.<br />
So what is replication capacity? I call replication capacity the ability to replicate the master load. If replication is able to replicate 3 times the write load from the master without falling behind, I will call it a replication capacity of 3. When used in the context of applying binary logs (for example point-in-time recovery from backup), a replication capacity of 1 means you can replay 1 hour&#8217;s worth of binary logs within 1 hour. I will call &#8220;replication load&#8221; the inverse of replication capacity: this is basically the percentage of time the replication thread was busy replicating events vs staying idle.<br />
Note you can speak about idle replication capacity, when the box does not do anything else, as well as loaded replication capacity, when the box serves the normal load. Both are important. You care about idle replication capacity when you have no load on the slave and need it to catch up, or when restoring from backup; the loaded replication capacity matters during normal operation.<br />
So we have defined what replication capacity is. There are, however, no tools that can tell us directly what the replication capacity is for a given system. It also tends to float depending on the load, similar to the loadavg metric. Here are some of the ways to measure it:<br />
1) Use the &#8220;UserStats&#8221; functionality from the Google patches, which is now available in Percona Server and MariaDB. This is probably the easiest and most accurate approach, but it does not work in Oracle MySQL Server. Set userstat_running=1 and run the following query:</p>
<p>mysql&gt; SELECT * FROM information_schema.user_statistics WHERE user='#mysql_system#' \G</p>
<p>*************************** 1. row ***************************</p>
<p>USER: #mysql_system#</p>
<p>TOTAL_CONNECTIONS: 1</p>
<p>CONCURRENT_CONNECTIONS: 0</p>
<p>CONNECTED_TIME: 446</p>
<p>BUSY_TIME: 74</p>
<p>CPU_TIME: 0</p>
<p>BYTES_RECEIVED: 0</p>
<p>BYTES_SENT: 63</p>
<p>BINLOG_BYTES_WRITTEN: 0</p>
<p>ROWS_FETCHED: 0</p>
<p>ROWS_UPDATED: 127576</p>
<p>TABLE_ROWS_READ: 4085689</p>
<p>SELECT_COMMANDS: 0</p>
<p>UPDATE_COMMANDS: 119127</p>
<p>OTHER_COMMANDS: 89557</p>
<p>COMMIT_TRANSACTIONS: 90259</p>
<p>ROLLBACK_TRANSACTIONS: 0</p>
<p>DENIED_CONNECTIONS: 1</p>
<p>LOST_CONNECTIONS: 0</p>
<p>ACCESS_DENIED: 0</p>
<p>EMPTY_QUERIES: 0</p>
<p>1 row in set (0.00 sec)</p>
<p>In this case CONNECTED_TIME is 446 seconds, of which the replication thread was busy (BUSY_TIME) for 74 seconds, which means replication capacity is 446/74 = 6.03, or roughly 6.<br />
You normally would not want to measure it from server start, but rather take the difference in these counters every 5 minutes or at another interval of your choice.<br />
2) Use the full slow query log and mk-query-digest. This method is great for one-time execution, especially as it also gives you the list of queries which load replication the most. It works, however, only with statement-level replication. You need to set long_query_time=0 and log_slave_slow_statements=1 for this method to work.<br />
Get the log file, which will include all queries the MySQL server ran with their times, and run mk-query-digest with a filter to check only queries from the replication thread:<br />
mk-query-digest slow-log --filter '($event-&gt;{user} || "") =~ m/[SLAVE_THREAD]/' &gt; /tmp/report-slave.txt<br />
In the report you will see something like this as a header:</p>
<p># 475s user time, 1.2s system time, 80.41M rss, 170.38M vsz</p>
<p># Current date: Mon Jul 19 15:12:24 2010</p>
<p># Files: slow-log</p>
<p># Overall: 1.22M total, 1.27k unique, 558.56 QPS, 0.37x concurrency ______</p>
<p># total min max avg 95% stddev median</p>
<p># Exec time 819s 1us 92s 669us 260us 120ms 93us</p>
<p># Lock time 28s 0 166ms 23us 49us 192us 25us</p>
<p># Rows sent 4.27k 0 325 0.00 0 1.04 0</p>
<p># Rows exam 30.88M 0 1.28M 26.48 0 3.07k 0</p>
<p># Time range 2010-07-19 14:35:53 to 2010-07-19 15:12:22</p>
<p># bytes 350.99M 5 1022.34k 301.01 719.66 5.75k 124.25</p>
<p># Bytes sen 1.94M 0 9.42k 1.67 0 110.38 0</p>
<p># Killed 0 0 0 0 0 0 0</p>
<p># Last errn 34.11M 0 1.55k 29.26 0 185.83 0</p>
<p># Merge pas 0 0 0 0 0 0 0</p>
<p># Rows affe 875.19k 0 17.55k 0.73 0.99 25.61 0.99</p>
<p># Rows read 2.20M 0 14.83k 1.88 1.96 24.68 1.96</p>
<p># Tmp disk 4.15k 0 1 0.00 0 0.06 0</p>
<p># Tmp table 14.19k 0 2 0.01 0 0.14 0</p>
<p># Tmp table 8.30G 0 2.01M 7.12k 0 117.75k 0</p>
<p># 0% (5k) Filesort</p>
<p># 0% (5k) Full_join</p>
<p># 0% (7k) Full_scan</p>
<p># 0% (10k) Tmp_table</p>
<p># 0% (4k) Tmp_table_on_disk </p>
<p>There is a lot of interesting information you can find in this header, but in relation to replication capacity you can get the replication load, which is the same as the &#8220;concurrency&#8221; figure (0.37x). The concurrency as reported by mk-query-digest is the sum of query execution time divided by the time range the log file covers. In this case, as we know there is only one replication thread, it will be the same as replication load. This gives us a replication capacity of 1/0.37 = 2.70.<br />
This method should work with the original MySQL Server in theory, though I have not tested it. In some versions log_slave_slow_statements was unreliable, and you may also need to adjust the regular expression to match the user name the replication thread uses.<br />
3) Processlist polling. This method is simple: the slave thread has a different status in SHOW PROCESSLIST depending on whether it is processing a query or simply waiting. By polling the processlist frequently (for example 10 times a second) we can compute the approximate percentage of time the thread was busy vs idle. Of course, running SHOW PROCESSLIST very aggressively can add overhead, especially on a busy system with a lot of connections.</p>
<p>mysql&gt; SHOW processlist;</p>
<p>+--------+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+</p>
<p>| Id | User | Host | db | Command | Time | State | Info |</p>
<p>+--------+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+</p>
<p>| 801812 | system user | | NULL | Connect | 2665 | Waiting for master to send event | NULL |</p>
<p>| 801813 | system user | | NULL | Connect | 0 | Has read all relay log; waiting for the slave I/O thread to update it | NULL |</p>
<p>| 802354 | root | localhost | NULL | Query | 0 | NULL | SHOW processlist |</p>
<p>+--------+-------------+-----------+------+---------+------+-----------------------------------------------------------------------+------------------+</p>
<p>3 rows in set (0.00 sec)</p>
<p>4) Slave catch-up / binlog application method. We can take a spare server with backups restored on it and apply binary logs to it. If 1 hour&#8217;s worth of binary logs applies in 10 minutes, we have a replication capacity of 6. The challenge, of course, is having a spare server around, and it is quite labor intensive. At the same time it can be a good measurement to take during backup recovery trials, when you&#8217;re doing this activity anyway. Using this approach you can also measure &#8220;cold&#8221; vs &#8220;hot&#8221; replication capacity, as well as how long replication warmup takes. It is very typical for servers with a cold cache to perform a lot slower than when they are warmed up. Measuring times for each binary log separately should give you these numbers.<br />
A less intrusive process, which can be done in production (especially if you have a slave which is used for backups, reporting, etc.), is to stop replication for some time and then see how long it takes to catch up. If you paused replication for 10 minutes and it took 5 minutes to catch up, your replication capacity is 3 (not 2), because you not only had to process the events for the outstanding 10 minutes but also for the 5 minutes it took to catch up. The formula is (Time_Replication_Paused+Time_Took_To_Catchup)/Time_Took_To_Catchup.<br />
So how much replication capacity do you need in a healthy system? It depends on many things, including how fast you need to be able to recover from backups and how much your load varies. A lot of systems also have special requirements on the time it takes to warm up (there are different things you can do about that too). First, I would measure replication capacity at 5-minute intervals (or something similar) because it tends to vary a lot. Then I would suggest ensuring the loaded replication capacity is at least 3 during peak load and 5 during normal load. This applies to normal operational load &#8211; if you push heavy ALTER TABLE statements through replication, they will surely bring your replication capacity down for their duration.<br />
One more thing about these methods: methods 1, 2 and 3 work well only if replication capacity is above 1, so that the system is caught up. If it is less than 1, so that the master writes more binary logs than the slave can process, they will show a number close to 1. Method 4, however, will work even if replication can never catch up: if 1 hour&#8217;s worth of binary logs takes 2 hours to apply, your replication capacity is 0.5.</p>
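<p>The arithmetic behind methods 1, 2 and 4 can be sketched in a few lines of Python. This is only an illustration of the formulas in the post: the function names are hypothetical, and collecting the inputs (counter samples, the concurrency figure, pause and catch-up times) is assumed to happen elsewhere.</p>

```python
def capacity_from_counters(connected_before, busy_before,
                           connected_after, busy_after):
    """Method 1: capacity over an interval, from two samples of the
    CONNECTED_TIME and BUSY_TIME counters of the replication user."""
    elapsed = connected_after - connected_before
    busy = busy_after - busy_before
    if busy == 0:
        return float("inf")  # replication thread was idle the whole interval
    return elapsed / busy


def capacity_from_concurrency(concurrency):
    """Method 2: replication load equals the reported concurrency,
    and capacity is its inverse."""
    return 1.0 / concurrency


def capacity_from_catchup(paused_seconds, catchup_seconds):
    """Method 4: (time paused + time to catch up) / time to catch up."""
    return (paused_seconds + catchup_seconds) / catchup_seconds


if __name__ == "__main__":
    # Numbers taken from the examples in the post.
    print(round(capacity_from_counters(0, 0, 446, 74), 2))  # 6.03
    print(round(capacity_from_concurrency(0.37), 2))        # 2.7
    print(capacity_from_catchup(10 * 60, 5 * 60))           # 3.0
```

<p>Sampling the counters twice, rather than reading them once since startup, matches the advice above to measure over intervals of about 5 minutes.</p>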
<p>Entry posted by peter</a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/estimating-replication-capacity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Database Architectures &amp; Performance</title>
		<link>http://guyub.co.id/database-architectures-performance/</link>
		<comments>http://guyub.co.id/database-architectures-performance/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 12:12:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[Server, Jaringan & Keamanan]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://guyub.co.id/database-architectures-performance/</guid>
		<description><![CDATA[For decades the debate between shared-disk and shared-nothing databases has raged. The shared-disk camp points to the laundry list of functional benefits such as improved data consistency, high-availability, scalability and elimination of partitioning/replication/promotion. The shared-nothing camp shoots back with superior performance and reduced costs. Both sides have a point. First, let&#8217;s look at the performance issue. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://scaledb.blogspot.com/2010/07/database-architectures-performance.html">For decades the debate between shared-disk and shared-nothing databases has raged. The shared-disk camp points to the laundry list of functional benefits such as improved data consistency, high-availability, scalability and elimination of partitioning/replication/promotion. The shared-nothing camp shoots back with superior performance and reduced costs. Both sides have a point.</p>
<p>First, let&#8217;s look at the performance issue. RAM (average access time of 200 nanoseconds) is considerably faster than disk (average access time of 12,000,000 nanoseconds). Let me put this 200:12,000,000 ratio into perspective. A task that takes a single minute in RAM would take 41 days on disk. So why do I bring this up?</p>
<p>Shared-Nothing: Since the shared-nothing database has sole ownership of its data (it doesn&#8217;t share the data with other nodes), it can operate in the machine&#8217;s local RAM, only writing infrequently to disk (flushing the data to disk). This makes shared-nothing databases very fast.</p>
<p>Shared-Disk: Cannot rely on the machine&#8217;s local RAM, because every write by one node must be instantly available to the other nodes, to ensure that they don&#8217;t use stale data and corrupt the database. So instead of relying on local RAM, all write transactions must be written to disk. This is where the 1 minute to 41 days ratio above comes into play and kills performance of shared-disk databases.</p>
<p>Let&#8217;s look at some of the ways databases can utilize RAM instead of disk to improve performance:</p>
<p>Read Cache: Databases typically use the RAM as a fast read cache. Upon reading data from the disk, this data is stored in the read cache so that subsequent use of that data is satisfied from RAM instead of the disk. For example, upon reading a person&#8217;s name from disk, that name is stored in the cache for fast access. The database wouldn&#8217;t need to read that name from disk again until that person&#8217;s name is changed (rare), or that RAM space is reused for a piece of data that is used more frequently. Read cache can significantly improve database performance. BOTH shared-disk and shared-nothing databases can exploit read cache. The shared-disk database just needs a system to either invalidate or update the data in read cache when one of the nodes has made a change. This is pretty standard in shared-disk databases.</p>
<p>Background Writing: Writing data to the disk is by far the most time consuming process in a write transaction. During the transaction, that portion of the data is locked, meaning it is unavailable for other functions. So, if you can move the writing of the data outside of the transaction (write the data in the background), you get faster transactions, which means less locking contention, which means faster throughput. SHARED-NOTHING can exploit this performance enhancement, since each server owns the data in its RAM. However, shared-disk databases cannot do this because they need to share that updated data with the other database nodes in the cluster. Since the local node&#8217;s cache is not shared, in a shared-disk database, the only option is to use the shared disk to share that data across the nodes.</p>
<p>Transactional Cache: The next step in utilizing RAM instead of disk is to use it in a transactional manner.  This means that the database can make multiple changes to data in RAM prior to writing the final results to disk. For example, if you have 100 widgets, you can store that inventory count in RAM, and then decrement it with each sale. If you sell 23 widgets, then instead of writing each transaction to disk, you update it in RAM. When you flush this data to disk, it results in a single disk write, writing the inventory number 77, instead of writing each of the 23 transactions individually to disk.</p>
<p>SHARED-NOTHING can perform transactions on data while it is in RAM. Once again, shared-disk databases cannot do this because you might have multiple nodes updating the inventory. Since they cannot look into each other&#8217;s local RAM, they must once again write each transaction to disk.</p>
<p>As you can see, shared-nothing databases have an inherent performance advantage. The next blog post will address how modern shared-disk databases address these performance challenges.</a></p>
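<p>The widget example can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not how any particular database works: CountingDisk, WriteThroughInventory and CachedInventory are hypothetical names, and the "disk" is just a dictionary that counts writes.</p>

```python
class CountingDisk:
    """Hypothetical stand-in for a disk: stores values, counts writes."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, key, value):
        self.data[key] = value
        self.writes += 1


class WriteThroughInventory:
    """Every sale is written to disk immediately (the shared-disk case)."""

    def __init__(self, disk, key):
        self.disk = disk
        self.key = key

    def sell(self, quantity=1):
        self.disk.write(self.key, self.disk.data[self.key] - quantity)


class CachedInventory:
    """Sales update a count held in RAM; flush() writes the result once."""

    def __init__(self, disk, key):
        self.disk = disk
        self.key = key
        self.count = disk.data[key]  # read once into the in-memory copy
        self.dirty = False

    def sell(self, quantity=1):
        self.count -= quantity
        self.dirty = True

    def flush(self):
        if self.dirty:
            self.disk.write(self.key, self.count)
            self.dirty = False


if __name__ == "__main__":
    shared_disk, local_disk = CountingDisk(), CountingDisk()
    shared_disk.data["widgets"] = 100  # preloaded, not counted as a write
    local_disk.data["widgets"] = 100

    write_through = WriteThroughInventory(shared_disk, "widgets")
    cached = CachedInventory(local_disk, "widgets")
    for _ in range(23):
        write_through.sell()
        cached.sell()
    cached.flush()

    print(shared_disk.data["widgets"], shared_disk.writes)  # 77 23
    print(local_disk.data["widgets"], local_disk.writes)    # 77 1
```

<p>Both variants end with 77 widgets on disk; the cached one gets there with a single write, which is the saving the post attributes to a transactional cache.</p>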
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/database-architectures-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OpenSQL Camp Europe: Time to cast your votes!</title>
		<link>http://guyub.co.id/opensql-camp-europe-time-to-cast-your-votes/</link>
		<comments>http://guyub.co.id/opensql-camp-europe-time-to-cast-your-votes/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 13:46:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[F/OSS]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[europe]]></category>
		<category><![CDATA[opensql]]></category>

		<guid isPermaLink="false">http://guyub.co.id/opensql-camp-europe-time-to-cast-your-votes/</guid>
		<description><![CDATA[
If you wonder why there hasn&#8217;t been an update from me for quite a while: I just returned from two months of paternity leave, in which I actually managed to stay away from the PC most of the time. In the meantime, I&#8217;ve officially become an Oracle employee and there are a lot of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.lenzg.net/archives/300-OpenSQL-Camp-Europe-Time-to-cast-your-votes!.html"><br />
If you wonder why there hasn&#8217;t been an update from me for quite a while: I just returned from two months of paternity leave, in which I actually managed to stay away from the PC most of the time. In the meantime, I&#8217;ve officially become an Oracle employee and there are a lot of administrative things to take care of&#8230; But it feels good to be back!</p>
<p>During my absence, Giuseppe and Felix kicked off the Call for Papers for this year&#8217;s European OpenSQL Camp, which will again take place in parallel to FrOSCon in St. Augustin (Germany) on August 21st/22nd. We&#8217;ve received a number of great submissions, now we would like to ask our community about your favourites!</p>
<p>Basically it&#8217;s &#8220;one vote per person per session&#8221; and you can cast your votes in two ways, either by twittering @opensqlcamp or via the opensqlcamp mailing list. The procedure is outlined in more detail on this wiki page. </p>
<p>As we need to finalize the schedule and inform the speakers, the voting period will close this coming Sunday, 18th of July. So don&#8217;t hesitate, cast your votes now! Based on your feedback we will compile the session schedule for this year&#8217;s camp. Thanks for your help!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/opensql-camp-europe-time-to-cast-your-votes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making &#8220;Insert Ignore&#8221; Fast, by Avoiding Disk Seeks</title>
		<link>http://guyub.co.id/making-insert-ignore-fast-by-avoiding-disk-seeks/</link>
		<comments>http://guyub.co.id/making-insert-ignore-fast-by-avoiding-disk-seeks/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 13:57:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[Pemrograman]]></category>
		<category><![CDATA[Server, Jaringan & Keamanan]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://guyub.co.id/making-insert-ignore-fast-by-avoiding-disk-seeks/</guid>
		<description><![CDATA[
In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use &#8220;replace into&#8221; or &#8220;insert ignore&#8221; over normal inserts, because [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://tokutek.com/2010/07/making-insert-ignore-fast-by-avoiding-disk-seeks/"><br />
In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use &#8220;replace into&#8221; or &#8220;insert ignore&#8221; over normal inserts, because the semantics of these statements do NOT require disk seeks. In my post last week, I explained how the command &#8220;replace into&#8221; can be fast with TokuDB&#8217;s fractal trees. Today, I explain how &#8220;insert ignore&#8221; can be fast, using a strategy that is very similar to what we do with &#8220;replace into&#8221;.</p>
<p>The semantics of &#8220;insert ignore&#8221; are similar to those of &#8220;replace into&#8221;:</p>
<p> if the primary (or unique) key does not exist: insert the new row<br />
 if the primary (or unique) key does exist: do nothing</p>
<p>B-trees have the same problem with &#8220;insert ignore&#8221; that they have with &#8220;replace into&#8221;. They perform a lookup of the primary key, incurring a disk seek. We have already shown how fractal trees do not incur this disk seek for &#8220;replace into&#8221;, so let&#8217;s see how we can avoid disk seeks with &#8220;insert ignore&#8221;.</p>
<p>The only difference from &#8220;replace into&#8221; is that when the primary (or unique) key exists, we disregard the new row instead of overwriting the old row with it. So all we need to do is tweak our tombstone messaging scheme (the one we use for deletes and &#8220;replace into&#8221;) so that &#8220;insert ignore&#8221; commands do not overwrite old rows with new rows. As with deletes and &#8220;replace into&#8221;, with this scheme &#8220;insert ignore&#8221; can be two orders of magnitude faster than insertions into a B-tree.</p>
<p>Here is what we do. We insert a message into the fractal tree, with a new message &#8220;ii&#8221;, to signify that we are doing an &#8220;insert ignore&#8221;. The only difference between this message and the normal &#8220;i&#8221; message for insertions is what we do on queries and merges. On queries, if the message is an &#8220;ii&#8221;, then the value in the LOWER node is read, and not the higher node. On merges, if the higher node has a message of &#8220;ii&#8221;, the value in the LOWER node takes precedence over the value in the higher node.</p>
<p>Let&#8217;s look at an example that is similar to what we looked at for &#8220;replace into&#8221;:</p>
<p>create table foo (a int, b int, primary key (a));</p>
<p>Suppose the fractal tree for this table looks as follows:</p>
<p>- </p>
<p>- -</p>
<p>- - - -</p>
<p>&#8230;.</p>
<p>(i (1,1)) (i (2,2)) (i (3,3)) (i (4,4)) &#8230; (i (1000,1000)) &#8230; (i (2^32, 2^32))</p>
<p>The &#8220;i&#8221; stands for insertion message. Now suppose we do:</p>
<p>insert ignore into foo values (1000, 1001);</p>
<p>With fractal trees, we insert (ii (1000,1001)) into the top node. The tree then looks like this:</p>
<p>(ii (1000,1001)) </p>
<p>- -</p>
<p>- - - -</p>
<p>&#8230;.</p>
<p>(i (1,1)) (i (2,2)) (i (3,3)) (i (4,4)) &#8230; (i (2^32, 2^32))</p>
<p>So upon querying the key &#8220;1000&#8221;, a cursor notices that (1000,1001) has a message of &#8220;ii&#8221;. If it finds another value for the key 1000 in a lower node, it reads that value; otherwise, it reads (1000,1001). Because (1000,1000) is located in a lower node, the cursor returns (1000,1000) to the user. On merges, the message in the lower node, (1000,1000), overwrites the message in the higher node, (1000,1001).</p>
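<p>The query and merge rules above can be sketched in a few lines of Python. This is only a toy two-level model (one message buffer above one leaf node), not TokuDB&#8217;s actual implementation; the class ToyFractalTree and its method names are hypothetical and exist purely for illustration.</p>
<pre>
```python
# Toy two-level model of the "ii" message scheme described above.
# Illustrative assumption only: this is not TokuDB's real data structure.

class ToyFractalTree:
    def __init__(self, rows):
        self.leaf = dict(rows)   # lower node: key -> value
        self.buffer = []         # higher node: pending (message, key, value)

    def insert_ignore(self, key, value):
        # No lookup of the existing key here: just enqueue an "ii" message.
        self.buffer.append(("ii", key, value))

    def query(self, key):
        # A value in the lower node wins over any "ii" message above it.
        if key in self.leaf:
            return self.leaf[key]
        # Otherwise the earliest "ii" message for the key supplies the value.
        for message, k, value in self.buffer:
            if message == "ii" and k == key:
                return value
        return None

    def merge(self):
        # Push messages down; "ii" never overwrites an existing lower value.
        for message, key, value in self.buffer:
            if message == "ii":
                self.leaf.setdefault(key, value)
        self.buffer = []


tree = ToyFractalTree((k, k) for k in range(1, 1001))
tree.insert_ignore(1000, 1001)   # key exists: the new row is ignored
tree.insert_ignore(2000, 2001)   # key is new: the new row is kept
before_merge = (tree.query(1000), tree.query(2000))
tree.merge()
after_merge = (tree.query(1000), tree.query(2000))
print(before_merge, after_merge)
```
</pre>
<p>In this sketch insert_ignore never reads the leaf, which mirrors the point of the scheme: the existence check is deferred to queries and merges instead of costing a seek at insert time.</p>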
<p>While &#8220;insert ignore&#8221; can be fast, there are caveats (indexes, triggers, replication), just as there are with &#8220;replace into&#8221;. In a future posting, I will get into some of them.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/making-insert-ignore-fast-by-avoiding-disk-seeks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Upgrading to MySQL 5.1</title>
		<link>http://guyub.co.id/upgrading-to-mysql-5-1/</link>
		<comments>http://guyub.co.id/upgrading-to-mysql-5-1/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 16:36:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Aplikasi]]></category>
		<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[F/OSS]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[upgrade]]></category>

		<guid isPermaLink="false">http://guyub.co.id/upgrading-to-mysql-5-1/</guid>
		<description><![CDATA[We have been using MySQL 5.1 on a few servers for which partitioning is a much better way to purge old data than delete. We have been working to upgrade more servers despite claims that some of us may have made in the past about using MySQL 4.0 or 5.0 forever.
We spent a lot of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.facebook.com/note.php?note_id=406991295932">We have been using MySQL 5.1 on a few servers for which partitioning is a much better way to purge old data than delete. We have been working to upgrade more servers despite claims that some of us may have made in the past about using MySQL 4.0 or 5.0 forever.</p>
<p>We spent a lot of time confirming, with benchmarks and our production workload, that MySQL 5.1 was stable and performant. mk-upgrade from Maatkit was one of the tools we used. Concurrent dump/reload tests were done to measure performance and check for data drift after reload. A custom tool that replays production workload was run to compare performance between MySQL 5.0 and 5.1. We started with MySQL 5.1.38 and are now at MySQL 5.1.47 with several backports for bugs that will be fixed in more recent 5.1 releases or in 5.5.</p>
<p>We found a few serious bugs in MySQL 5.1 during this process. We fixed some of the bugs, worked with MySQL support to debug some of them and waited for MySQL to fix many others. MySQL support and developers were a huge help. It is great to have so much access to experts. MySQL has been getting things done at an amazing rate this year.</p>
<p>I am excited about MySQL 5.1 and 5.5. With a few recent changes to the Facebook patch we have been able to increase peak QPS by more than 2X and peak IOPS by more than 3X using benchmarks. There are more improvements to be done. Whether or not we match the benchmark results in production, I much prefer an RDBMS that can exceed 100,000 QPS and IOPS to one that is saturated at 10,000. Any of the changes we make for 5.1 will look even better with MySQL 5.5 given support for multiple InnoDB buffer pool instances and some of the changes above the storage engine layer that aren&#8217;t easy to describe in a few sentences.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/upgrading-to-mysql-5-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Improving MySQL Productivity – From Design to Implementation</title>
		<link>http://guyub.co.id/improving-mysql-productivity-from-design-to-implementation/</link>
		<comments>http://guyub.co.id/improving-mysql-productivity-from-design-to-implementation/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 15:18:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[Pemrograman]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[implementation]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://guyub.co.id/improving-mysql-productivity-from-design-to-implementation/</guid>
		<description><![CDATA[My closing presentation at the dedicated MySQL track at ODTUG Kaleidoscope 2010 discussed various techniques and best practices for improving the ROI of developer resources using MySQL.  Included in the sections on Design, Security, Development, Testing, Implementation, Instrumentation and Support were also a number of horror stories of what not to do, combined with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://ronaldbradford.com/blog/improving-mysql-productivity-from-design-to-implementation-2010-07-01/">My closing presentation at the dedicated MySQL track at ODTUG Kaleidoscope 2010 discussed various techniques and best practices for improving the ROI of developer resources using MySQL.  Included in the sections on Design, Security, Development, Testing, Implementation, Instrumentation and Support were also a number of horror stories of what not to do, combined with practical examples of improving productivity.<br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/improving-mysql-productivity-from-design-to-implementation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running MySQL Cluster without arbitrator: what it&#8217;s really about.</title>
		<link>http://guyub.co.id/running-mysql-cluster-without-arbitrator-what-its-really-about/</link>
		<comments>http://guyub.co.id/running-mysql-cluster-without-arbitrator-what-its-really-about/#comments</comments>
		<pubDate>Sat, 26 Jun 2010 14:00:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[Server, Jaringan & Keamanan]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[arbitrator]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://guyub.co.id/running-mysql-cluster-without-arbitrator-what-its-really-about/</guid>
		<description><![CDATA[Geert made us aware that MySQL Cluster now provides the possibility to disable arbitration in order to use an external arbitration mechanism. This is a really important feature, because&#8230; well, not really, but only because I was the one who designed it 
Coming up with the concept and the two parameters Arbitration=WaitExternal and ArbitrationTimeout=n took [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://openlife.cc/blogs/2010/june/running-mysql-cluster-without-arbitrator-what-its-really-about">Geert made us aware that MySQL Cluster now provides the possibility to disable arbitration in order to use an external arbitration mechanism. This is a really important feature, because&#8230; well, not really, but only because I was the one who designed it <img src='http://guyub.co.id/site/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /><br />
Coming up with the concept and the two parameters Arbitration=WaitExternal and ArbitrationTimeout=n took a few weeks of discussion. Once we agreed on how to do it, I think Jonas coded it in 20 minutes on the mezzanine floor of the Hyatt, Santa Clara. Soon after that MySQL conference I resigned from Sun, so I had no idea what then happened to this feature.<br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/running-mysql-cluster-without-arbitrator-what-its-really-about/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Benchmarking MySQL ACID performance with SysBench</title>
		<link>http://guyub.co.id/benchmarking-mysql-acid-performance-with-sysbench/</link>
		<comments>http://guyub.co.id/benchmarking-mysql-acid-performance-with-sysbench/#comments</comments>
		<pubDate>Sun, 20 Jun 2010 16:40:50 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Basisdata]]></category>
		<category><![CDATA[Server, Jaringan & Keamanan]]></category>
		<category><![CDATA[Sindikasi]]></category>
		<category><![CDATA[acid]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[sysbench]]></category>

		<guid isPermaLink="false">http://guyub.co.id/benchmarking-mysql-acid-performance-with-sysbench/</guid>
		<description><![CDATA[A couple of questions I get a lot from MySQL customers are &#8220;how will this hardware upgrade improve my transactions per second (TPS)&#8221; and &#8220;what level of TPS will MySQL perform on this hardware if I&#8217;m running ACID settings?&#8221; Running sysbench against MySQL with different values for per-thread and global memory buffer sizes, ACID settings, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://feedproxy.google.com/~r/Themattreid/~3/m7-NK79I2BE/">A couple of questions I get a lot from MySQL customers are &#8220;how will this hardware upgrade improve my transactions per second (TPS)&#8221; and &#8220;what level of TPS will MySQL perform on this hardware if I&#8217;m running ACID settings?&#8221; Running sysbench against MySQL with different values for per-thread and global memory buffer sizes, ACID settings, and other settings gives me concrete values to bring to the customer to show the impact that more RAM, faster CPUs, faster disks, or cnf changes have on the server. Here are some examples for a common question: &#8220;If I&#8217;m using full ACID settings vs non-ACID settings what performance am I going to get from this server?&#8221;<br />
Let&#8217;s find out by running sysbench with the following settings (most are self-explanatory; if not, the man page can explain them):</p>
<p>sysbench --test=oltp --db-driver=mysql --oltp-table-size=1000000 --mysql-engine-trx=yes --oltp-test-mode=complex --oltp-read-only=off --oltp-dist-type=special --max-requests=0 --num-threads=8 --max-time=120 --init-rng=on run</p>
<p>MySQL Settings:<br />
In the first test MySQL is set to the following ACID related settings. This will give us results for TPS performance without full ACID compliance &#8211; very common settings on a server that is handling blogs, ad serving, general business websites, and other roles where full ACID is not required and performance is valued over the benefits of full ACID. These are important settings when we look at the difference in performance when we change to full ACID in the second test.</p>
<p>innodb_flush_log_at_trx_commit = 0<br />
sync_binlog=0<br />
transaction-isolation=REPEATABLE-READ</p>
<p>System configuration and InnoDB buffer pool size:</p>
<p>Xeon E5345 series 2.33 GHz 8-core, 16 GB RAM, local SATA 7.2K disks<br />
innodb_buffer_pool_size = 10G</p>
<p>Full result set from sysbench:<br />
Summary OLTP test statistics:</p>
<p>queries performed:<br />
transactions: 172426 (1436.83 per sec.)<br />
read/write requests: 3276664 (27304.51 per sec.)<br />
other operations: 344882 (2873.91 per sec.)</p>
<p>Non-ACID results:<br />
We can simplify the results by looking at the following TPS results for this non-ACID test:</p>
<p>transactions: 172426 (1436.83 per sec.)</p>
<p>Full ACID results:<br />
Let&#8217;s go ahead and run the test again with different ACID settings. This will give us the TPS results for full ACID compliance:</p>
<p>innodb_flush_log_at_trx_commit = 1<br />
sync_binlog=1<br />
transaction-isolation=REPEATABLE-READ</p>
<p>We get the following results for TPS:</p>
<p>transactions: 3197 (26.58 per sec.)<br />
read/write requests: 60743 (505.04 per sec.)<br />
other operations: 6394 (53.16 per sec.)</p>
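<p>To put the two runs side by side, the gap can be worked out directly from the reported TPS figures. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the two numbers are copied from the sysbench output above.</p>
<pre>
```python
# TPS figures copied from the two sysbench runs above.
non_acid_tps = 1436.83    # innodb_flush_log_at_trx_commit = 0, sync_binlog = 0
full_acid_tps = 26.58     # innodb_flush_log_at_trx_commit = 1, sync_binlog = 1

# How many times slower the full-ACID run was, and what share of the
# non-ACID throughput it retained.
slowdown = non_acid_tps / full_acid_tps
retained_pct = 100 * full_acid_tps / non_acid_tps

print(f"slowdown: {slowdown:.1f}x, throughput retained: {retained_pct:.2f}%")
```
</pre>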
<p>Final Results:<br />
As you can see, switching between full ACID and non-ACID settings (on the same server, with only those values in the cnf changed) makes a huge difference in performance on this standard database server: 1436.83 TPS without full ACID versus 26.58 TPS with it, roughly 54 times slower. We can now hand this data to the customer and they will know what impact the settings will have on their application&#8217;s performance and what to expect when running full ACID vs non-ACID.<br />
More info on using sysbench here: http://sysbench.sourceforge.net<br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://guyub.co.id/benchmarking-mysql-acid-performance-with-sysbench/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
