[GE users] mysql path + arco issues

jana jana.olivova at sun.com
Wed Feb 18 14:15:58 GMT 2009


    [ The following text is in the "iso-8859-1" character set. ]
    [ Your display is set for the "ISO-8859-10" character set.  ]
    [ Some special characters may be displayed incorrectly. ]

On 02/17/09 22:06, crhea wrote:




The limit was added due to the issue
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2284.
I don't have much experience with Mysql database so I will appreciate
yours, do you think it won't cause any problem when the limit is removed
from delete statement for Mysql?



I'm not an expert in MySQL yet (I've done the DBA coursework, but not the certification exam)-- my feeling is that 500 is too low. For a table that can easily have 1M+ rows, I'd think a limit of 5,000 or 10,000 would be more reasonable (the Oracle case you referenced above did not have details on how big they made their tran log space. If they can't handle a delete of more than 500 rows, the tran log space is too small [tune the DB, not the application]).

It might make sense to make this parameter something that can be changed based on a confg setting.


You might see a difference once you use the INNodB as the storage engine. The MyISAm does not support transaction safe processing, i.e. no primary / foreign key matters, you can easily insert data in the child table even though the parent key does not exist. This was also problem when deleting the rows, on PostgreSQL and Oracle. Deleting too many rows at once would just make it hang, I think this usually happend with the sge_job table who has 4 child tables, with each delete the constraints of all the child tables have to be checked. This is what made it hang, because a sequential scan over all the four tables had to be performed. By selecting too big of a row range, the query optimizer might choose to use sequential scan, instead of an index scan, even if indexes are configured on a table. Setting the limit actually made it possible to speed it up and run through. Since you are using MyISAM, the performance might be totally different, because no constraints are checked, therefore you cannot also assure data integrity.

I am actually not sure why you mean that it can never catch up and it trashes the db. Even if the dbwriter crashes, you might not be deleting that many rows, some rules are set for older than 2 days, some 1 week etc. The deletion rules are run continually every hour (so are the derived values), and it deletes data depending on what intervals are set in the $SGE_ROOT/dbwriter/database/mysql/dbwriter.xml. You can adjust those to more suit your needs, and add or delete more rules. You can read more at:

http://wikis.sun.com/display/GridEngine/Derived+Values+and+Deletion+Rules

However, I agree that this limit could be configurable.

Regards,

Jana Olivova





If you have problems with deleting data I can recommend to disable
deletion rules and delete them manually or write a script which would do it.

Regards,
aja



In our case, we did fix the problem by just runing the delete by hand (by taking an educated guess at what ARCO was doing). Including something in the docs on how to do this would be helpful-- I just guessed based on what I was seeing by watching the MySQL processes/queries. We also changed the reporting interval and removed some of the variables being recorded.

Thank you very much for you insight on why things are the way they are...

--- Cris

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=108404

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net<mailto:users-unsubscribe at gridengine.sunsource.net>].




    [ Part 2, "jana_olivova.vcf"  Text/X-VCARD (Name: "jana_olivova.vcf") ]
    [ ~339 bytes. ]
    [ Unable to print this part. ]



More information about the gridengine-users mailing list