Wilf

Massively improving database INSERTs

Thursday 10th August 2023 (updated: Wednesday 23rd August 2023)

JavaScript

PHP

Research

Development

I use MySQL a lot. I run it locally using MAMP for a ton of data projects that never find their way online.

My current development project involves a one-off operation to insert 1,000,000 rows of data to a small table.
The table has 2 indexes - a Primary Key and a specific index on 2 columns.

The project is driven by JavaScript that then AJAX's out calls to PHP which runs the database operation.

The actual project is a pixel scanner which uses HTML5's Canvas API to examine a pixel, determine its RGBA values and then based on some conditions writes a specific value to the database.
The client side of things is remarkably quick and a 1,000 x 1,000 pixel image is scanned in around 30 seconds.
The server side of things is very very slow.

Here's how I was doing it initially.

  • Determine the values from the pixel
  • Fire off an AJAX call with jQuery
  • Create a DB connection
  • Perform INSERT in SQL

I shudder to think of the overheads being placed on MAMP for this.
The whole task equated to 1,000,000 INSERT operations and was off-the-scale slow and unreliable. In most cases the browser just hung and the system (an iMac) gave up.

Then I remembered the syntax used by PHPMyAdmin involved chaining the values to an INSERT statement much like this: INSERT INTO table (columnA,columnB) VALUES (val1,val2),(val3,val4)...

So in the JavaScript I set a threshold to 50 rows such that as the code looped through its pixels I bunched up INSERTS to contain 50 bracketed values per INSERT statement.

So this:

INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);
INSERT INTO table (columnA,columnB) VALUES (x,y);

became

INSERT INTO table (columnA,columnB) VALUES (x,y),(x,y),(x,y),(x,y),(x,y),(x,y),(x,y),(x,y);

The result was staggering and I am now writing 1,000,000 rows of data in around 2 minutes.
That's still 20,000 database connections being established but the performance boost is fantastic.
I continue to develop more of an understanding for DBMS and optimisation.

Note: in an attempt to reduce the number of database connections to just one I tried to use the $_SESSION collection to store the connection.
I ran into problems with serialisation when attempting to re-assign the $_SESSION value to the $conn parameter which I use in my sql.