Over 150,000 Products, Slow Queries Making Backend Hard To Use

In an attempt to improve this for my client, I turned on the slow query log and set it to log anything over 1 second. This is by far the biggest offender in both frequency and time. I need a way to optimize this query through indexing or other means, and I would really appreciate some help from CS-Cart employees on this one, as this query takes 7 seconds at minimum, sometimes upwards of 10. I'd say the average is 8.


```sql

Query_time: 7 Lock_time: 0 Rows_sent: 24 Rows_examined: 1567261

SELECT SQL_CALC_FOUND_ROWS products.*, descr1.product as product, MIN(IF(prices.percentage_discount = 0, prices.price, prices.price - (prices.price * prices.percentage_discount)/100)) as price, descr1.short_description, IF(descr1.short_description = '', descr1.full_description, '') as full_description, companies.company as company_name, GROUP_CONCAT(IF(products_categories.link_type = 'M', CONCAT(products_categories.category_id, 'M'), products_categories.category_id)) as category_ids, products_categories.position, cscart_seo_names.name as seo_name FROM cscart_products as products LEFT JOIN cscart_product_features_values ON cscart_product_features_values.product_id = products.product_id LEFT JOIN cscart_product_descriptions as descr1 ON descr1.product_id = products.product_id AND descr1.lang_code = 'EN' LEFT JOIN cscart_product_prices as prices ON prices.product_id = products.product_id AND prices.lower_limit = 1 LEFT JOIN cscart_companies AS companies ON companies.company_id = products.company_id INNER JOIN cscart_products_categories as products_categories ON products_categories.product_id = products.product_id INNER JOIN cscart_categories ON cscart_categories.category_id = products_categories.category_id AND (cscart_categories.usergroup_ids = '' OR FIND_IN_SET(0, cscart_categories.usergroup_ids) OR FIND_IN_SET(1, cscart_categories.usergroup_ids)) AND cscart_categories.status IN ('A', 'H') LEFT JOIN cscart_seo_names ON cscart_seo_names.object_id = products.product_id AND cscart_seo_names.type = 'p' AND cscart_seo_names.dispatch = '' AND cscart_seo_names.lang_code = 'EN' LEFT JOIN cscart_product_popularity as popularity ON popularity.product_id = products.product_id WHERE 1 AND (((descr1.search_words LIKE '%STM%') OR descr1.product LIKE '%STM%' OR descr1.short_description LIKE '%STM%' OR descr1.full_description LIKE '%STM%' OR (descr1.meta_keywords LIKE '%STM%' OR descr1.meta_description LIKE '%STM%') OR cscart_product_features_values.value LIKE '%STM%')) AND 
(companies.status = 'A' OR products.company_id = 0) AND products.status IN ('A') AND (products.usergroup_ids = '' OR FIND_IN_SET(0, products.usergroup_ids) OR FIND_IN_SET(1, products.usergroup_ids)) AND products.status IN ('A') AND prices.usergroup_id IN (0, 0, 1) GROUP BY products.product_id ORDER BY popularity.total desc LIMIT 432, 24;



```



I believe this is the query that runs on the products page, the second most used page after our orders page.

Hi Atrix,

From the query posted I can see that you are using CS-Cart 3, the page is the product search (page 17), and the searched word is STM. It reads better like this:
SELECT SQL_CALC_FOUND_ROWS products.*, descr1.product as product, MIN(IF(prices.percentage_discount = 0, prices.price, prices.price - (prices.price * prices.percentage_discount)/100)) as price, descr1.short_description, IF(descr1.short_description = '', descr1.full_description, '') as full_description, companies.company as company_name, GROUP_CONCAT(IF(products_categories.link_type = 'M', CONCAT(products_categories.category_id, 'M'), products_categories.category_id)) as category_ids, products_categories.position, cscart_seo_names.name as seo_name FROM cscart_products as products
LEFT JOIN cscart_product_features_values ON cscart_product_features_values.product_id = products.product_id
LEFT JOIN cscart_product_descriptions as descr1 ON descr1.product_id = products.product_id AND descr1.lang_code = 'EN'
LEFT JOIN cscart_product_prices as prices ON prices.product_id = products.product_id AND prices.lower_limit = 1
LEFT JOIN cscart_companies AS companies ON companies.company_id = products.company_id
INNER JOIN cscart_products_categories as products_categories ON products_categories.product_id = products.product_id
INNER JOIN cscart_categories ON cscart_categories.category_id = products_categories.category_id AND (cscart_categories.usergroup_ids = '' OR FIND_IN_SET(0, cscart_categories.usergroup_ids) OR FIND_IN_SET(1, cscart_categories.usergroup_ids)) AND cscart_categories.status IN ('A', 'H')
LEFT JOIN cscart_seo_names ON cscart_seo_names.object_id = products.product_id AND cscart_seo_names.type = 'p' AND cscart_seo_names.dispatch = '' AND cscart_seo_names.lang_code = 'EN'
LEFT JOIN cscart_product_popularity as popularity ON popularity.product_id = products.product_id
WHERE 1 AND
(((descr1.search_words LIKE '%STM%') OR
descr1.product LIKE '%STM%' OR
descr1.short_description LIKE '%STM%' OR
descr1.full_description LIKE '%STM%' OR
(descr1.meta_keywords LIKE '%STM%' OR descr1.meta_description LIKE '%STM%') OR
cscart_product_features_values.value LIKE '%STM%')) AND
(companies.status = 'A' OR products.company_id = 0) AND
products.status IN ('A') AND
(products.usergroup_ids = '' OR FIND_IN_SET(0, products.usergroup_ids) OR FIND_IN_SET(1, products.usergroup_ids)) AND
products.status IN ('A') AND
prices.usergroup_id IN (0, 0, 1)
GROUP BY products.product_id ORDER BY popularity.total desc
LIMIT 432, 24;




Do you have multiple stores?

Are you using different prices per users?

For the admin area, some optimizations can be done by removing extra MySQL conditions from the query. If the one you sent is from the backend area, these can easily be removed: seo name, usergroup, features, company.



You can send a quote request on hungryweb.net and together we can find solutions for your store to work as you wish and fit your business.





Valentin

[color=#808080][size=2]part of hungryweb.net[/size][/color]

I'm a dev; I was just wanting feedback from the community. And yeah, I had already figured the query out, I was just wondering if anyone else had run into the issue. I had problems with it even filling up our tmp partition. I just had to do some query optimizations: take out unneeded joins, limit it to active products, and change the MySQL server configuration to the recommended "big server" settings. It's working better now, but there's still room for improvement.

Update: Using EXPLAIN on the query that was causing issues, I found some real problems with the prices join (native cscart) not using indexes. I fixed this by adding a single-column index on product_id to the prices table; there was an index already, but I believe it was a multi-column index and thus useless for this query. I also took out the IN statement, which wasn't letting it use indexing well.
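For anyone following along, the kind of change described here would look something like the statements below. This is a sketch only: the index name is my own, and you should check what indexes already exist before adding one.

```sql
-- Inspect existing indexes on the prices table first; the stock
-- CS-Cart schema may already have a composite index, and a composite
-- index only helps this join if product_id is its leftmost column.
SHOW INDEX FROM cscart_product_prices;

-- Add a single-column index on product_id so the prices join can
-- seek by product instead of scanning (index name is hypothetical)
ALTER TABLE cscart_product_prices ADD INDEX product_id_only (product_id);
```

Re-run EXPLAIN afterwards; the prices row should change from join type ALL to ref.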

I could optimize it further if I could make GROUP BY and ORDER BY use the same fields, or get rid of those clauses altogether. The query was taking 11 seconds (we've added more products since my original post); now it takes 5.5 seconds. I tried taking out seo names, but it gave no measurable speed boost.

Be careful in reducing conditions. Most cs-cart queries are shared between the admin side (with its various administrative security criteria) and the customer side. Indexing product_id in prices is probably a good idea. It should probably be a unique index on product_id/usergroup_id/company_id.
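A sketch of that suggestion, with the caveat that the exact column set is an assumption; if the table also distinguishes rows by lower_limit (quantity tiers), that column would need to be part of the unique index too:

```sql
-- Hypothetical composite unique index on the prices table; verify
-- against your actual schema before applying, since any existing
-- duplicate rows on these columns would make the statement fail.
ALTER TABLE cscart_product_prices
    ADD UNIQUE INDEX idx_price_lookup (product_id, usergroup_id, company_id);
```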



Strongly suggest you have your server's MySQL config use its own tmp directory rather than the system /tmp directory, so you can maintain DB integrity relative to space issues (e.g. another process creating a large file in /tmp).
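A minimal sketch of that change in my.cnf, assuming the example path below is created beforehand and owned by the mysql user:

```ini
[mysqld]
# Dedicated temp directory on a partition with plenty of free space,
# instead of the system /tmp (example path; adjust to your disks)
tmpdir = /var/lib/mysql-tmp
```

Restart mysqld after changing this; `SHOW VARIABLES LIKE 'tmpdir';` confirms it took effect.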

Here is the complete query that I ended up with.


SELECT SQL_CALC_FOUND_ROWS products.product_id, products.product_code, products.product_type, products.status, products.company_id, products.list_price, products.amount, products.weight, products.tracking, products.is_edp, descr1.product as product, MIN(IF(prices.percentage_discount = 0, prices.price, prices.price - (prices.price * prices.percentage_discount)/100)) as price, companies.company as company_name, cscart_seo_names.name as seo_name
FROM cscart_products as products
LEFT JOIN cscart_product_descriptions as descr1 ON descr1.product_id = products.product_id AND descr1.lang_code = 'EN'
LEFT JOIN cscart_product_prices as prices ON prices.product_id = products.product_id AND prices.lower_limit = 1
LEFT JOIN cscart_companies AS companies ON companies.company_id = products.company_id
LEFT JOIN cscart_seo_names ON cscart_seo_names.object_id = products.product_id AND cscart_seo_names.type = 'p' AND cscart_seo_names.dispatch = '' AND cscart_seo_names.lang_code = 'EN'
LEFT JOIN cscart_product_popularity as popularity ON popularity.product_id = products.product_id
WHERE 1
GROUP BY products.product_id
ORDER BY popularity.total LIMIT 0, 200;




This is the actual query used to build the products page with no search applied. The one in my OP was with an example search, but I learned the slowest case is when it returns all products, so this is the one I worked on.



The one change I had to make, following EXPLAIN's lead in MySQL, was on the prices join. It was using join type ALL, which is very bad: it means MySQL had to scan the entire table and no indexes were used.

The index I said I added before was completely redundant, and it was single-column, so it wasn't an indexing problem. The problem was using an IN statement:


WHERE 1 AND prices.usergroup_id IN (0)


This was edited in the fn_get_products function in fn.catalog.php. You can probably do some of it with hooks, though CS-Cart doesn't have enough hooks to modify everything you need. I've been in that spot before, and once it happens… well, let's just say my installation is pretty customized by now and upgrading is a pain because of it. I did use hooks for as long as I could, but some things I needed to do aren't doable with them.



I could safely take out the usergroup condition because we don't use usergroups or any of their features in our store. Doing this alone cut the query time in half. I had been told many times before, on Stack Overflow and other places, that using IN statements is a big no-no and very inefficient.



So I also changed the products.status condition to AND conditions with parentheses around them and ORs on the inside. This lets MySQL do its job and not slow way down.
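To illustrate the two rewrites described above (a sketch of the intent, not the exact SQL CS-Cart generates, which varies per install):

```sql
-- Before: single-element IN lists
--   ... AND products.status IN ('A') AND prices.usergroup_id IN (0)

-- After: plain parenthesized equality tests, which the optimizer on
-- this MySQL version handled better
SELECT products.product_id
FROM cscart_products AS products
LEFT JOIN cscart_product_prices AS prices
    ON prices.product_id = products.product_id AND prices.lower_limit = 1
WHERE (products.status = 'A') AND (prices.usergroup_id = 0);
```

Note that newer MySQL versions are generally expected to treat a single-element IN the same as an equality, so measure with EXPLAIN before and after rather than assuming.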



Secondly, I changed my server's settings to the suggested settings in the my-big.cnf file that ships with MySQL; this helped a lot too.



Thirdly, before tbirnseth suggested it, I had already moved my tmp directory for MySQL to the root drive. This meant it was no longer limited to the 1.8 GB of the tmp partition for temp tables. However, I still think the best solution is to leave it in the tmp partition but increase the size of the partition, if you have the time to do that. I have noticed large speedups because it is no longer waiting for space to clear to finish queries.



Making this query more efficient from here is difficult, but I do believe getting rid of ORDER BY and GROUP BY, if doable, will give another increase (when I tried it, the time dropped to 3.3 seconds, and MySQL's EXPLAIN also flags this pattern as bad, especially when the ORDER BY and GROUP BY use different fields, which they do here). So I basically can't do that without more time spent. The ORDER BY clause could certainly be taken out for a blank search, and shouldn't be there at all with no search query or parameters. The GROUP BY could maybe be taken out too, if the query were written in a way that didn't need it. I don't know if that's possible, but that's where the next biggest speedups lie.



The other option I am facing is just purchasing a lot of RAM and turning caching up really high. When the query is cached it takes 0.001 seconds, but it doesn't stay in memory long due to our low-RAM, high-demand server. RAM isn't cheap for cloud hosting, so my client doesn't see that as too practical at the moment.



Here is the my-big.cnf file I found; I believe it ships with MySQL, sometimes named my-huge.cnf.

I additionally used MySQLTuner (the major/MySQLTuner-perl script on GitHub) to help suggest adjustments to my config, and used the tmpdir variable to change the directory MySQL uses for temp files. I adjusted as my RAM allowed but mostly stuck with the my-big.cnf settings.


```ini

# Example MySQL config file for very large systems.
#
# This is for a large system with memory of 1G-2G where the system runs mainly
# MySQL.
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is /var/lib/mysql) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.


# The following options will be passed to all MySQL clients
[client]
#password = your_password
port = 3306
socket = /var/lib/mysql/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port = 3306
socket = /var/lib/mysql/mysql.sock
skip-locking
key_buffer = 384M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking

# Disable Federated by default
skip-federated

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
# the syntax is:
#
# CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
# MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
# where you replace <host>, <user>, <password> by quoted strings and
# <port> by the master's port number (3306 by default).
#
# Example:
#
# CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
# MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
# start replication for the first time (even unsuccessfully, for example
# if you mistyped the password in master-password and the slave fails to
# connect), the slave will create a master.info file, and any later
# change in this file to the variables' values below will be ignored and
# overridden by the content of the master.info file, unless you shutdown
# the slave server, delete master.info and restart the slave server.
# For that reason, you may want to leave the lines below untouched
# (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id = 2
#
# The replication master for this slave - required
#master-host =
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user =
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password =
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port =
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Point the following paths to different dedicated disks
#tmpdir = /tmp/
#log-update = /path-to-dedicated-directory/hostname

# Uncomment the following if you are using BDB tables
#bdb_cache_size = 384M
#bdb_max_lock = 100000

# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /var/lib/mysql/
#innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
#innodb_log_group_home_dir = /var/lib/mysql/
#innodb_log_arch_dir = /var/lib/mysql/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 384M
#innodb_additional_mem_pool_size = 20M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 100M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50
max_heap_table_size = 32M
tmp_table_size = 64M

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[isamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout
```

Lastly, here is the EXPLAIN of the final query. As you can see, it uses an index or ref lookup on every table, where before the prices join type was listed as ALL:


+----+-------------+------------------+--------+--------------------------------------------------+------------+---------+---------------------------------------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+--------------------------------------------------+------------+---------+---------------------------------------------------------+--------+---------------------------------+
| 1 | SIMPLE | products | index | NULL | PRIMARY | 3 | NULL | 213564 | Using temporary; Using filesort |
| 1 | SIMPLE | descr1 | eq_ref | PRIMARY,product_id | PRIMARY | 9 | wanaryd_cart_utf8.products.product_id,const | 1 | |
| 1 | SIMPLE | prices | ref | usergroup,product_id,lower_limit,product_id_only | product_id | 3 | wanaryd_cart_utf8.products.product_id | 1 | |
| 1 | SIMPLE | companies | eq_ref | PRIMARY | PRIMARY | 4 | wanaryd_cart_utf8.products.company_id | 1 | |
| 1 | SIMPLE | cscart_seo_names | ref | PRIMARY,dispatch | PRIMARY | 206 | wanaryd_cart_utf8.products.product_id,const,const,const | 340 | |
| 1 | SIMPLE | popularity | eq_ref | PRIMARY,total | PRIMARY | 3 | wanaryd_cart_utf8.products.product_id | 1 | |
+----+-------------+------------------+--------+--------------------------------------------------+------------+---------+---------------------------------------------------------+--------+---------------------------------+
6 rows in set (0.03 sec)

Just for others: Atrix has done a great job of analysis, but has modified things to fit the specific needs of his environment, i.e. he has eliminated standard features that he doesn't use.



Instead of an IN clause for the usergroups, one could write "AND (usergroup_id=0 OR usergroup_id=1 OR usergroup_id=2 OR ...)". But that's where the optimizer in the DB server should take over and correctly adjust the IN clause: if it is IN (0), the query optimizer should rewrite it to usergroup_id=0.



The vast majority of product data in the cart comes from the fn_get_product_info() function and there are hooks available to modify the queries (get_product_data_pre). However, parsing out what's to be used is awkward at best… Product lists are usually handled by queries specific to the needs of the list…

[quote name='tbirnseth' timestamp='1395166052' post='179539']

he has eliminated standard features that he doesn't use.



Instead of an IN clause for the usergroups, one could write "AND (usergroup_id=0 OR usergroup_id=1 OR usergroup_id=2 OR ...)". But that's where the optimizer in the DB server should take over and correctly adjust the IN clause: if it is IN (0), the query optimizer should rewrite it to usergroup_id=0.



The vast majority of product data in the cart comes from the fn_get_product_info() function and there are hooks available to modify the queries (get_product_data_pre). However, parsing out what's to be used is awkward at best… Product lists are usually handled by queries specific to the needs of the list…

[/quote]



I suppose you could write some sort of query optimizer, but IN clauses should only be used when they are the only choice (the only case that comes to mind is a subquery in a WHERE clause). They are handy, and most people don't know that IN is a performance hit. The queries in fn_get_products are used a lot, and you'd have to account for every case. There are hooks all over the place in CS-Cart, but sometimes the variables you need to adjust aren't even passed to any hook. I'd give examples but I forget just where. This problem could probably be solved with hooks. It saves me dev time not to have to worry about it, however, and my client told me he wishes to freeze the CS-Cart version due to problems with upgrades in the past. Maybe he'll change his mind one day and I'll have my work cut out for me.

But I have found that shoving best practices down a client's throat usually ends up losing clients. They care about time and cost; maintenance is something they usually care about much later. I explained everything to him and it was his call. I'm just saying that just because something is considered best practice doesn't always mean it's best business practice. (Say, for instance, he's trying this store idea out and needs it up for as little as possible, because it might be a negative-profit venture. If it flops, maintainability is never an issue. If it's successful, then it can later pay to be made maintainable, etc.)



Don't get me wrong, I love CS-Cart's hooks, but they can't account for every change you want to make. They do cover most of the big ones, though.

hi guys

Our website was just suspended.

Our host provider told us there are many temp tables created on the disk.

They said: "This has far from resolved the issue and is not a resolution. The queries must be fixed or disabled." Below are samples:

-----

| Query | 613 | Copying to tmp table | SELECT SQL_CALC_FOUND_ROWS products.*, descr1.product as product, MIN(prices.price) as price, descr1 |

| Query | 571 | Copying to tmp table | SELECT SQL_CALC_FOUND_ROWS products.*, descr1.product as product, MIN(prices.price) as price, descr1 |

| Query | 567 | Copying to tmp table | SELECT SQL_CALC_FOUND_ROWS products.*, descr1.product as product, MIN(prices.price) as price, descr1 |

| Query | 414 | Copying to tmp table | SELECT SQL_CALC_FOUND_ROWS products.*, descr1.product as product, MIN(prices.price) as price, descr1 |

| Query | 128 | Copying to tmp table | SELECT SQL_CALC_FOUND_ROWS products.*, descr1.product as product, MIN(prices.price) as price, descr1 |



They need us to disable the query or find a way to fix it.



Could someone help me here? We used CSV files to import about 100,000 products. After that: disk I/O abuse, then suspension.



Thanks

martin

[quote name='martin1979' timestamp='1395673250' post='180013']

hi guys

our website was just suspended,

our host provider told us there are many temp tables created on the disk.

We used CSV files to import about 100,000 products

[/quote]



Is this a shared host? A VPS? CS-Cart 3 can't run that many products on a shared host. You'll probably have to end up doing what I did: either increase the size of your /tmp or move the MySQL tmp directory off of /tmp and put it somewhere else on the main disk partition (neither of which you can do on a shared host).



This is the same query I optimized, but I optimized for speed, not for temp table size on disk. To get it to stop using temporary tables, there are some things to look at:


Temporary tables can be created under conditions such as these:

- If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

- DISTINCT combined with ORDER BY may require a temporary table.

- If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
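One way to check whether a particular query hits these cases is EXPLAIN: "Using temporary" (and usually "Using filesort") shows up in the Extra column. A sketch using table names from this thread:

```sql
-- The GROUP BY and ORDER BY here use columns from different tables,
-- so EXPLAIN should report "Using temporary; Using filesort" in Extra
EXPLAIN SELECT products.product_id
FROM cscart_products AS products
LEFT JOIN cscart_product_popularity AS popularity
    ON popularity.product_id = products.product_id
GROUP BY products.product_id
ORDER BY popularity.total;
```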




In this case it is probably the ORDER BY and GROUP BY, and there's not much you can do about having to use them. So I'm afraid your best option is to get a better server or use fewer products.