Pencarian

Rss Posts

 

 

 

Berita pada bulan August, 2010

Why GRANT ALL is bad

Aug 06, 2010

A common observation for many LAMP stack products is the use of poor MySQL security practices. Even for more established products such as Wordpress don’t always assume that the provided documentation does what it best for you. As per my earlier posts where I detailed installation instructions and optimal permissions for both WordPress and Mediawiki, and not just directed readers to online documentation.
In this post I will detail why GRANT ALL is bad.
Let’s start with what GRANT ALL [PRIVILEGES] provides. As per the MySQL 5.1 Reference Manual you get the following:
ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE USER, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, FILE, INDEX, INSERT, LOCK TABLES, PROCESS, REFERENCES, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, SELECT, SHOW DATABASES, SHOW VIEW, SHUTDOWN, SUPER, TRIGGER, UPDATE, USAGE
I am going to focus on just one privilege that is included with ALL, and that is SUPER. This privilege can do the following which can be destructive for an application level user:

Bypasses read_only
Bypasses init_connect
Can Disable binary logging
Change configuration dynamically
No reserved connection

User Permissions
This is how a user should be created, granting only the required permissions to a given schema.

CREATE USER goodguy@localhost IDENTIFIED BY ’sakila’;
GRANT CREATE,SELECT,INSERT,UPDATE,DELETE ON odtug.* TO goodguy@localhost;

This is what is commonly seen.

CREATE USER superman@’%';
GRANT ALL ON *.* TO superman@’%';

Bypasses read_only
Many MySQL replication environments rely on ensuring the MySQL slave is consistent with the master. Did you know that an application can bypass this security when read_only=true is used?

$ mysql -ugoodguy -psakila odtug
mysql> insert into test1(id) values(1);
ERROR 1290 (HY000): The MySQL server is running with the –read-only option so it cannot execute this statement

$ mysql -usuperman odtug
mysql> insert into test1(id) values(1);
Query OK, 1 row affected (0.01 sec)

GRANT ALL is bad for data consistency.
Bybasses init_connect
A common practices used for UTF8 communications is to use the init_connect configuration variable.

#my.cnf
[client]
init_connect=SET NAMES utf8

$ mysql -ugoodguy -psakila odtug

mysql> SHOW SESSION VARIABLES LIKE ‘ch%’;
+————————–+———-+
| Variable_name | Value |
+————————–+———-+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+————————–+———-+

$ mysql -usuperman odtug

mysql> SHOW SESSION VARIABLES LIKE ‘character%’;
+————————–+———-+
| Variable_name | Value |
+————————–+———-+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+————————–+———-+

GRANT ALL is bad for data integrity.
Disables Binary Logging.

$ mysql -usuperman odtug

mysql> SHOW MASTER STATUS;
+——————-+———-+————–+——————+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+——————-+———-+————–+——————+
| binary-log.000001 | 354 | | |
+——————-+———-+————–+——————+

mysql> DROP TABLE time_zone_leap_second;
mysql> SET SQL_LOG_BIN=0;
mysql> DROP TABLE time_zone_name;
mysql> SET SQL_LOG_BIN=1;
mysql> DROP TABLE time_zone_transition;
mysql> SHOW MASTER STATUS;
+——————-+———-+————–+——————+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+——————-+———-+————–+——————+
| binary-log.000001 | 674 | | |
+——————-+———-+————–+——————+

$ mysqlbinlog binary-log.000001 –start-position=354 –stop-position=674

# at 354
#100604 18:00:08 server id 1 end_log_pos 450 Query thread_id=1 exec_time=0 error_code=0
use mysql/*!*/;
SET TIMESTAMP=1275688808/*!*/;
DROP TABLE time_zone_leap_second
/*!*/;
# at 579
#100604 18:04:31 server id 1 end_log_pos 674 Query thread_id=2 exec_time=0 error_code=0
use mysql/*!*/;
SET TIMESTAMP=1275689071/*!*/;
DROP TABLE time_zone_transition
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;

Should that statement be run on MySQL Slaves?
Is the binary log used for any level of auditing?
GRANT ALL is bad for slave consistency.
The reserved connection
MySQL reserved one connection for an administrator to be able to login to a server. For example.

$ mysql -uroot

mysql> show global variables like ‘max_connections’;
+—————–+——-+
| Variable_name | Value |
+—————–+——-+
| max_connections | 3 |
+—————–+——-+
1 row in set (0.07 sec)

mysql> show global status like ‘threads_connected’;
+——————-+——-+
| Variable_name | Value |
+——————-+——-+
| Threads_connected | 4 |
+——————-+——-+

mysql> SHOW PROCESSLIST;
+—-+——+———–+——-+———+——+————+—————
| Id | User | Host | db | Command | Time | State | Info
+—-+——+———–+——-+———+——+————+—————
| 13 | good | localhost | odtug | Query | 144 | User sleep | UPDATE test1 …
| 14 | good | localhost | odtug | Query | 116 | Locked | select * from test1
| 15 | good | localhost | odtug | Query | 89 | Locked | select * from test1
| 15 | root | localhost | odtug | Query | 89 | Locked | SHOW PROCESSLIST

However if all application users are already using the SUPER privilege, the administrator will get.

$ mysql -uroot
ERROR 1040 (HY000): Too many connections

There is no way to be able to login and see what’s happening, or kill threads for example. In this case you either wait, or you are required to kill the mysqld process, or fine the client threads to kill. The result of the former may lead to a corrupt database requiring additional recovery.
GRANT ALL is bad for system administration and monitoring.
Conclusion
Don’t use GRANT ALL for application users. For more information, including why I only listed just 5 issues, check out my MySQL Idiosyncrasies that BITE presentation. I will also be presenting this talk at MySQL Sunday at Oracle Open World in September.

Open Source BI Report tools

Aug 06, 2010

I am going to take a mulligan on a previous Blog Post on Open Source BI reporting tools. Part of my job at Calpont is getting familiar with tools folks with use with the InfiniDB storage engine. After all, what is the point of having all those terabytes in a data warehouse if you can not produce some sort of report from them. Three popular open source BI reporting tools are available from BIRT, Pentaho, and Jaspersoft.Proceeding alphabetically, BIRT is part of the Eclipse IDE world. A BI report is a new project or new report under a project. And if you like Eclipse or Java IDEs, then you will probably like BIRT for reports.Pentaho’s Rerport Designer is a stand alone program. As is iReport from Jaspersoft. I had no problems connection to a data mart in a InfiniDB instance with the old JDBC connection with any of the products. And all three produced reports of various levels of complexity from my SQL. Plus you can pretty much format the exams to your heart’s content.So which is better? That would be up to the user having to run the software. For those of you really interested, there will be quick start guides up soon on each of the three BI tools on the Calpont InfiniDB website. If I missed your favorite BI reporting tool, please let me know.Now I am digging through the ETL tools. I was getting a millions and a half rows a second into an InfiniDB table using the cpimport tool. Now to get the tools and cpimport working together!

First Development Milestone of ZF 2.0 Released – Zend Developer Zone

Aug 06, 2010

Yesterday, the Zend Framework team tagged the first development milestone of
Zend Framework 2.0 (2.0.0dev1). It is immediately downloadable from the Zend
Framework servers:

ZendFramework 2.0.0dev1 zip package
ZendFramework 2.0.0dev1 tar.gz package

NOTE! This release is not considered of production quality, and is
released solely to provide a development snapshot for purposes of testing
and research. Use at your own risk.

Serialization surprises

Aug 06, 2010

CONTEXT

I want to share a problem that we had in our project. We were doing a real-time “Profits and Loss” server (P&L). The server sends stock updates to all the users subscribed to the stocks, basically as Google Finance or Yahoo Finance.

SIMPLE IMPLEMENTATION

I will used a basic approach (no aggregation and no optimization) to explain the problem that we had with Serialization and Hessian.

The server always keep the last value on the stocks, because when a user ask for a quote we want to send back as soon as possible the
last value (from the cache) that we had on that stock).

We kept the stock prices (BID and ASK) in a simple object : SymbolSerializable.

When a update is received for a stock, we update the values in his SymbolSerializable and send back the updated values to all
client subscribed to this stock.


...

symbol.setBid(update.getBid());
symbol.setAsk(update.getAsk());

...

for(Client client : clientList){
    client.sendUpdate(symbol);
}

....

It can’t be more simpler than that. The client will received each updates on his stocks.

The problems that we had was surprising. The clients were receiving the always the same price on a stock !

WHY ?

here an example :

stock : ABC

Updates :

1 : bid=10.25$, ask=10.50$
2 : bid=11.50$, ask=11.75$
3 : bid=12.00$, ask=12.15$

and the clients received :

1 : bid=10.25$, ask=10.50$
2 : bid=10.25$, ask=10.50$
3 : bid=10.25$, ask=10.50$

To debug our server we printed the values sent to the client, and we saw in the server’s logs that the values were corrects.

value sent to client :
bid=10.25$, ask=10.50$

value sent to client :
bid=11.50$, ask=11.75$

value sent to client :
bid=12.00$, ask=12.15$

Everything look fine, why it doesn’t work on the client side ?

INVESTIGATION/SOLUTION

The only thing that could cause the problem were our Serialization in the server.

I’ll show you four different test cases. See the implementations below.

  • #1 : Reference implementation using Serializable (failed)
  • #2 : Changed Serializable for Externalizable interface (failed)
  • #3 : Serializable using new (passed)
  • #4 : Serializable using reset (passed)
#1 : Serializable : Reference #2 : Externalizable #3 : Serializable with new #4 : Serializable with reset
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • All items read
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • bid= 0.0 ask= 0.0
  • All items read
  • bid= 0.0 ask= 0.0
  • bid= 1.0 ask= 1.0
  • bid= 2.0 ask= 2.0
  • bid= 3.0 ask= 3.0
  • bid= 4.0 ask= 4.0
  • bid= 5.0 ask= 5.0
  • bid= 6.0 ask= 6.0
  • bid= 7.0 ask= 7.0
  • bid= 8.0 ask= 8.0
  • bid= 9.0 ask= 9.0
  • All items read
  • bid= 0.0 ask= 0.0
  • bid= 1.0 ask= 1.0
  • bid= 2.0 ask= 2.0
  • bid= 3.0 ask= 3.0
  • bid= 4.0 ask= 4.0
  • bid= 5.0 ask= 5.0
  • bid= 6.0 ask= 6.0
  • bid= 7.0 ask= 7.0
  • bid= 8.0 ask= 8.0
  • bid= 9.0 ask= 9.0
  • All items read

Here the Serializable pojo


package ca.sebastiendionne.demo.model;

import java.io.Serializable;

public class SymbolSerializable implements Serializable {

    private static final long serialVersionUID = 7853892880704628717L;

    Double bid = null;
    Double ask = null;

    public SymbolSerializable(){
    }

    public Double getBid() {
        return bid;
    }
    public void setBid(Double bid) {
        this.bid = bid;
    }
    public Double getAsk() {
        return ask;
    }
    public void setAsk(Double ask) {
        this.ask = ask;
    }

    @Override
    public String toString() {
        return "bid = " + bid + "  ask=" + ask;
    }

}

Here the Externalizable pojo


package ca.sebastiendionne.demo.model;

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class SymbolExternalizable implements Externalizable {
    private static final long serialVersionUID = 1853892880704628717L;

    Double bid = null;
    Double ask = null;

    public SymbolExternalizable(){

    }

    public Double getBid() {
        return bid;
    }
    public void setBid(Double bid) {
        this.bid = bid;
    }
    public Double getAsk() {
        return ask;
    }
    public void setAsk(Double ask) {
        this.ask = ask;
    }

    @Override
    public String toString() {
        return "bid = " + bid + "  ask=" + ask;
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        bid = (Double)in.readObject();
        ask = (Double)in.readObject();
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(bid);
        out.writeObject(ask);
    }

}

For the test cases I use a for loop to create Symbol and serialized on the hard drive and unserialized them back.

Here the code for the test case #1 (used as reference)


package ca.sebastiendionne.demo;

import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import ca.sebastiendionne.demo.model.SymbolSerializable;

public class SerializationFailed {

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {

        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(new File("test.ser"))); 

        SymbolSerializable symbol = new SymbolSerializable();

        // dummy values
        symbol.setAsk(new Double(-1));
        symbol.setBid(new Double(-1));

        for(int i=0;i<10;i++){
            // update values
            symbol.setAsk(new Double(i));
            symbol.setBid(new Double(i));

            oos.writeObject(symbol);
            oos.flush();
        }

        oos.flush();
        oos.close();

        // read

        ObjectInputStream ois = new ObjectInputStream(new FileInputStream(new File("test.ser")));

        try {
            Object obj = null;
            while((obj = ois.readObject())!=null){
                System.out.println((SymbolSerializable)obj);
            }
        } catch (EOFException e) {
            System.out.println("All items read");
        } catch(Exception e){
            e.printStackTrace();
        } finally {
            ois.close();
        }

    }

}

Here the code for the test case #2


package ca.sebastiendionne.demo;

import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import ca.sebastiendionne.demo.model.SymbolExternalizable;

public class SerializationFailed2 {

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {

        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(new File("test.ser"))); 

        SymbolExternalizable symbol = new SymbolExternalizable();

        // dummy values
        symbol.setAsk(new Double(-1));
        symbol.setBid(new Double(-1));

        for(int i=0;i<10;i++){
            // update values
            symbol.setAsk(new Double(i));
            symbol.setBid(new Double(i));

            oos.writeObject(symbol);
            oos.flush();
        }

        oos.flush();
        oos.close();

        // read

        ObjectInputStream ois = new ObjectInputStream(new FileInputStream(new File("test.ser")));

        try {
            Object obj = null;
            while((obj = ois.readObject())!=null){
                System.out.println((SymbolExternalizable)obj);
            }
        } catch (EOFException e) {
            System.out.println("All items read");
        } catch(Exception e){
            e.printStackTrace();
        } finally {
            ois.close();
        }

    }

}

Here the code for the test case #3


package ca.sebastiendionne.demo;

import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import ca.sebastiendionne.demo.model.SymbolSerializable;

public class SerializationWithNew {

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {

        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(new File("test.ser"))); 

        SymbolSerializable symbol = new SymbolSerializable();

        // dummy values
        symbol.setAsk(new Double(-1));
        symbol.setBid(new Double(-1));

        for(int i=0;i<10;i++){
            // update values
            symbol.setAsk(new Double(i));
            symbol.setBid(new Double(i));

            SymbolSerializable symbol2 = new SymbolSerializable();
            symbol2.setAsk(symbol.getAsk());
            symbol2.setBid(symbol.getBid());

            oos.writeObject(symbol2);
            oos.flush();
        }

        oos.flush();
        oos.close();

        // read

        ObjectInputStream ois = new ObjectInputStream(new FileInputStream(new File("test.ser")));

        try {
            Object obj = null;
            while((obj = ois.readObject())!=null){
                System.out.println((SymbolSerializable)obj);
            }
        } catch (EOFException e) {
            System.out.println("All items read");
        } catch(Exception e){
            e.printStackTrace();
        } finally {
            ois.close();
        }

    }

}

Here the code for the test case #4


package ca.sebastiendionne.demo;

import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import ca.sebastiendionne.demo.model.SymbolSerializable;

public class SerializationWithReset {

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {

        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(new File("test.ser"))); 

        SymbolSerializable symbol = new SymbolSerializable();

        // dummy values
        symbol.setAsk(new Double(-1));
        symbol.setBid(new Double(-1));

        for(int i=0;i<10;i++){
            // update values
            symbol.setAsk(new Double(i));
            symbol.setBid(new Double(i));

            oos.writeObject(symbol);
            oos.flush();
            oos.reset(); // magic line
        }

        oos.flush();
        oos.close();

        // read

        ObjectInputStream ois = new ObjectInputStream(new FileInputStream(new File("test.ser")));

        try {
            Object obj = null;
            while((obj = ois.readObject())!=null){
                System.out.println((SymbolSerializable)obj);
            }
        } catch (EOFException e) {
            System.out.println("All items read");
        } catch(Exception e){
            e.printStackTrace();
        } finally {
            ois.close();
        }

    }

}

I can't say what is the performance impact if we use reset() on each updates sent. It is better to use new or reset ?

You can follow me on Twitter : http://twitter.com/survivant

Rails on PostgreSQL: PGCon 2010 talk on Rails and PostgreSQL

Aug 03, 2010

A while back I posted a link to a talk by Gleb Arshinov that he gave at the SF PUG. This talk was on “PostgreSQL for high performance Rails apps”, and was full of fine suggestions from their experiences with their Rails apps.

Gleb is back again, this time on May 21 2010 at PGCon where he and Alexander Dymo talked about PostgreSQL as a secret weapon for Rails apps. Some of the same ground is covered (use SQL DDL vs ActiveRecord create_table, etc), but there’s lots of new information too. Here are some notes:

  • 1:10 They’re using PostgreSQL 8.4, nginx, and mongrel
  • 4:00-6:00 Talks about dropping down into SQL via ActiveRecord
  • 6:30 Use include to eliminate N+1 queries.
  • 7:30 Watch for things like acts_as_tree that reintroduce lots of queries in exchange for the improvement in abstraction.
  • 9:00 One query, 12 joins – complicated, but query time goes from 8 seconds to 60 ms.
  • 14:00-17:00 A technique for recording SQL queries; this helps ensure you’re not running unexpected queries
  • 19:00 Suggests use straight SQL for DDL rather than the ActiveRecord DSL
  • 20:00 Use constraints, FKs, etc to preserve data integrity – “anything you don’t have a constraint on will get corrupted”
  • 23:00 Don’t use CASCADE since app won’t know about the deletions
  • 28:00 Keep a log of times for the most frequent user requests. Alex suggests using integration tests for this; code is at 29:10 and 29:30.
  • 32:30 A technique for loading data with ActiveRecord’s select option with PostgreSQL arrays to save on object creation. Questions from the audience about normalization vs efficiency.
  • 38:50 Role/user/privilege checking can be slow; shows a technique for using PostgreSQL’s bool_or and GROUP BY to get the data in one fell swoop. Query time went from 2+ seconds to 64 ms.
  • 42:00 Do analytics in the database. Saw speed improve from 90s to 5s and saved tons of RAM.
  • 44:40 Some excellent new PostgreSQL features that are either here now or are on the way (replication, windowing functions)
  • 46:30 Demonstrates a problem with PostgreSQL’s LIMIT and OFFSET when used with subselects. Some discussion of pagination with the audience. Here’s an excellent discussion of pagination alternatives written by Justin French.
  • 50:30 How to force PostgreSQL to use a subselect vs a join; the example goes from 605ms to 325 ms.
  • 52:20 Be careful with generate_series. Apparently these functions cannot generate hints for the planner.
  • 55:30 General props to PostgreSQL community.
  • 59:40 Need to test queries both in cold state and hot state; they saw 14x speed difference.
  • 1:01:40 Tune PostgreSQL – shared_buffers, work_mem, autovacuum, etc. Rely on community knowledge for initial configuration.

Lots of good stuff there, enjoy!

MySQL Master HA at Yahoo

Aug 03, 2010

I was asked to write a blog post about MySQL High Availability at Yahoo, particularly for writes. Our standard practice is not particularly high-tech, but we’ve been using it for over 4 years now and it has become a company-wide standard with a few exceptions.
 
Let me start by saying that at Yahoo! we consider a datacenter as a Single Point of Failure (SPoF). We build and manage many of our own datacenters, and we still don’t assume they are invulnerable. How many people can attest to the fact that however to configure your racks, how many redundant switches, power supplies, drives, etc. you buy, if your leased datacenter has power or network issues, you are at their mercy.
 

read more

Tips for taking MySQL backups using LVM

Aug 03, 2010

LVM uses copy-on-write to implement snapshots. Whenever you’re writing data to some page, LVM copies the original page (the way it looked like when the snapshot was taken) to the snapshot volume. The snapshot volume must be large enough to accommodate all pages written to for the duration of the snapshot’s lifetime. In other words, you must be able to copy the data somewhere outside (tape, NFS, rsync, etc.) in less time than it would take for the snapshot to fill up.
While LVM allows for hot backups of MySQL, it still poses an impact on the disks. An LVM snapshot backup may not go unnoticed by the MySQL users.
Some general guidelines for making life easier with LVM backups follow.
Lighter, longer snapshots
If you’re confident that you have enough space on your snapshot volume, you may take the opportunity to make for a longer backup time. Why? Because you would then be able to reduce the stress from the file system. Use ionice when copying your data from the snapshot volume:

ionice -c 2 cp -R /mnt/mysql_snapshot /mnt/backup/daily/20100719/

Are you running out of space?
Monitor snapshot’s allocated size: if there’s just one snapshot, do it like this:

Every 10.0s: lvdisplay | grep Allocated????????????????????????????????????????????????????????????????????????????????????????????????????????????????? Mon Jul 19 09:51:29 2010

Allocated to snapshot? 3.63%

Don’t let it reach 100%.
Avoid running out of space
To make sure you don’t run out of snapshot allocated size, stop all administrative scripts.

Are you running your weekly purging of old data? You will be writing a lot of pages, and all will have to fit in the snapshot.
Building your reports? You may be creating large temporary tables; make sure these are not on the snapshot volume.
Rebuilding your Sphinx fulltext index? Make sure it is not on the snapshot volume, or postpone till after backup.

You will gain not only snapshot space, but also faster backups.
Someone did the job before you
Use mylvmbackup: the MySQL LVM backup script by Lenz Grimmer. Or do it manually: follow this old-yet-relevant post by Peter Zaitsev.