Importing Large CSV files with PHP Part 1: Import using one query

Importing CSV files into your application can become a problem when the file is really big: more than 65,000 rows big. Each row of the file needs to be parsed, converted into an object, and then saved to a database, and all of this needs to happen within a 30-second timeout window. It may sound like an impossible task, but there are actually a couple of solutions to this problem. While working on a project at Grok, I was tasked with doing exactly that.

Initially I hoped to save the CSV file to a directory and iterate through it in a single request. I was only able to get through about a third of the file before the request timed out. The second attempt was to upload the file and process it at a later time: I saved the CSV file to a temp directory, used a support class to split the original file, and saved 10,000-row chunks in a processing directory. I decided to split the file in order to reduce the margin of error for large files; if the import did fail at some point, only part of the file would be affected instead of the entire file.

I also set up an artisan command that looks for files in the processing directory and processes each 10,000-row chunk without worrying about the HTTP request timeout. I hooked the artisan command into a cron job and ran it every minute on the server. (Stay tuned for my guide on how to process CSV files using a command.) This solved the problem, but it was not very well tested since it relied primarily on the file system. If the import of a chunk failed, we didn't have any information to debug why. After about six months it came time to add more features to the CSV importer, which was a perfect time to refactor this feature.

After hours of research I came upon this MySQL statement:

LOAD DATA LOCAL INFILE 'path/to/file.csv' INTO TABLE table_name;

You can read more about the LOAD DATA statement here. This was an exciting discovery because the performance of importing a CSV file directly into a MySQL database was unparalleled. Using this SQL query I was able to import a 65,000-row file in under 10 seconds; it is BLAZING fast. Below I will explain how to use this query to quickly import a CSV file into your Laravel app.

Controller

First, let's set up our CsvImportController.php in app/Acme/Controllers/

Note: I am using PSR-4 namespacing; check out the PSR-4 documentation for details


<?php

namespace Acme\Controllers;

use Acme\Importing\CsvFileImporter;
use Input;
use Redirect;

class CsvImportController extends BaseController
{
    /**
     * [POST] Form which will submit the file
     */
    public function store()
    {
        // Check if form submitted a file
        if (Input::hasFile('csv_import')) {
            $csv_file = Input::file('csv_import');

            // You may wish to do file validation at this point
            if ($csv_file->isValid()) {

                // We can also create a CsvStructureValidator class
                // So that we can validate the structure of our CSV file

                // Lets construct our importer
                $csv_importer = new CsvFileImporter();

                // Import our csv file
                if ($csv_importer->import($csv_file)) {
                    // Provide success message to the user
                    $message = 'Your file has been successfully imported!';
                } else {
                    $message = 'Your file did not import';
                }

            } else {
                // Provide a meaningful error message to the user
                // Perform any logging if necessary
                $message = 'You must provide a CSV file for import.';
            }

            return Redirect::back()->with('message', $message);
        }
    }
}
?>

WARNING: We are making an assumption that the user is providing us with a correctly formatted file. THIS IS NOT A SAFE ASSUMPTION! You should never trust the user to provide correct content! You must ALWAYS validate user input, including files. I will be adding another blog post with CSV validation in the near future, stay tuned!

store() - will be used for the POST route that the form submits to. We perform some basic input validation to make sure that the file is present, then construct a CsvFileImporter and call its import() method with our file.
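For completeness, the form needs a POST route wired up to this action. In a Laravel 4 app (which this post assumes) the routes file entry might look like the following sketch; the URI and route name here are hypothetical:

```php
// app/routes.php
// Hypothetical route pointing the upload form at the controller action
Route::post('csv-import', array(
    'as'   => 'csv-import.store',
    'uses' => 'Acme\Controllers\CsvImportController@store',
));
```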

Importer Class

Lets make CsvFileImporter.php inside app/Acme/Importing. This class is responsible for saving the file to a file system and importing it using the MySQL query I mentioned above.


<?php

namespace Acme\Importing;

use DB;

class CsvFileImporter
{
    /**
     * Import method used for saving file and importing it using a database query
     * 
     * @param Symfony\Component\HttpFoundation\File\UploadedFile $csv_import
     * @return int number of lines imported
     */
    public function import($csv_import)
    {
        // Save file to temp directory
        $moved_file = $this->moveFile($csv_import);

        // Normalize line endings
        $normalized_file = $this->normalize($moved_file);

        // Import contents of the file into database
        return $this->importFileContents($normalized_file);
    }

    /**
     * Move File to a temporary storage directory for processing
     * temporary directory must have 0755 permissions in order to be processed
     *
     * @param Symfony\Component\HttpFoundation\File\UploadedFile $csv_import
     * @return Symfony\Component\HttpFoundation\File $moved_file
     */
    private function moveFile($csv_import)
    {
        // Check if the directory exists and has correct permissions; if not, make it
        if (is_dir($destination_directory = storage_path('imports/tmp'))) {
            chmod($destination_directory, 0755);
        } else {
            mkdir($destination_directory, 0755, true);
        }

        // Get file's original name
        $original_file_name = $csv_import->getClientOriginalName();

        // Return moved file as File object
        return $csv_import->move($destination_directory, $original_file_name);
    }

    /**
     * Convert file line endings to a uniform "\n" to solve EOL issues
     * Files created on different platforms use different EOL characters
     * This method converts all line endings to the Unix standard
     *
     * @param string $file_path
     * @return string $file_path
     */
    protected function normalize($file_path)
    {
        // Load the file into a string
        $string = @file_get_contents($file_path);

        // Bail out if the file could not be read
        if ($string === false) {
            return $file_path;
        }

        // Convert all line endings using a regular expression
        $string = preg_replace('~\r\n?~', "\n", $string);

        file_put_contents($file_path, $string);

        return $file_path;
    }

    /**
     * Import CSV file into Database using LOAD DATA LOCAL INFILE function
     *
     * NOTE: PDO settings must have attribute PDO::MYSQL_ATTR_LOCAL_INFILE => true
     *
     * @param $file_path
     * @return mixed Will return number of lines imported by the query
     */
    private function importFileContents($file_path)
    {
        $query = sprintf("LOAD DATA LOCAL INFILE '%s' INTO TABLE file_import_contents 
            FIELDS TERMINATED BY ',' 
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES (`content`)", addslashes($file_path));

        return DB::connection()->getPdo()->exec($query);
    }
}
?>

Now let's walk through each method and look at what it does.

import($csv_import) is the only public method of this class; that's because callers should only need to worry about the import method when importing. It accepts one parameter, which should be an instance of the UploadedFile class from the Symfony HttpFoundation component that Laravel uses. This way we have convenient access to methods like getClientOriginalName() and move(). This method calls the moveFile(), normalize(), and importFileContents() methods, and returns the result of importFileContents(), which is the number of lines imported by the SQL query.

moveFile($csv_import) moves our uploaded file to a temp directory and also verifies that we have appropriate file permissions. The directory and file permissions must be 0755; this is important because MySQL needs to be able to read the directory and file when we execute the query. This method returns a new File object.

normalize($file_path) is a very useful workaround for files that are created on different platforms. This method normalizes all line endings to the \n format, which is standard on Unix systems. Learn more about line endings here. This is also important for our query because we need to tell MySQL where each row of the file stops. The method returns the path to the newly written file.

NOTE: We are specifying the parameter to be a $file_path. This is because we only need a file path to use inside the file_get_contents() method. Here is where some nice magic happens: the File class extends the SplFileInfo class provided by PHP, which has a __toString() method that returns the full path to the file. Neat!
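To see the normalization in action, here is the regular expression from normalize() run on a string with mixed line endings (a standalone sketch, separate from the importer class):

```php
<?php

// A string with Windows (\r\n), old Mac (\r), and Unix (\n) line endings
$string = "name\r\nAlice\rBob\n";

// Collapse \r\n and bare \r into \n, leaving existing \n untouched
$normalized = preg_replace('~\r\n?~', "\n", $string);

echo $normalized; // "name\nAlice\nBob\n"
```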

importFileContents() is where the import actually happens. This method builds up a SQL query and executes it using the DB class. It returns the number of rows processed by the query; if nothing was imported, it returns 0.

Query breakdown

Let's look closely at the query we are executing. LOAD DATA LOCAL INFILE tells MySQL to stream the contents of the file into the specified table. The LOCAL flag tells the MySQL server that the file resides on the client host (the machine the connection originates from) and should be sent over the connection; without LOCAL, MySQL expects the file to be on the database server itself. Since we are using the sprintf() function, we specify the file path with the %s string format.

INTO TABLE is fairly straightforward: this is the table where you want the data to go.

LINES TERMINATED BY '\\n' specifies the character sequence that ends each row; you could declare \n, \r, \r\n, or even another character such as \t. This tells the query where each row ends, and thus where to begin creating a new record, which is why we normalized all line endings to \n first.

FIELDS TERMINATED BY ',' tells MySQL where each field ends. You may also specify the optional ENCLOSED BY '' parameter, which tells MySQL to only look for values enclosed in the specified character. This is helpful if you know that your data will be enclosed in " or ' characters.

IGNORE 1 LINES tells MySQL to skip the specified number of lines at the beginning of the file. This is helpful when your CSV file contains a header row.

Additionally, you can map the incoming fields to specific columns by listing them in parentheses, e.g. (column_name, column_name2); a separate SET clause can assign computed values to other columns.
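Putting the clauses together, a complete statement might look like this sketch (the users table, its columns, and the file path are hypothetical):

```sql
LOAD DATA LOCAL INFILE '/path/to/users.csv'
INTO TABLE users
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(name, email)
SET created_at = NOW();
```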

Configuration notes:

In order to use LOAD DATA LOCAL INFILE, your PDO connection must have the PDO::MYSQL_ATTR_LOCAL_INFILE => true option. It is set to false by default, but it's fairly simple to enable.

Open your app/config/database.php file. It should look something like this:

<?php

return array(

    /* other settings removed for brevity */
    'connections' => array(


        'mysql' => array(
            'driver'    => 'mysql',
            'host'      => 'localhost',
            'database'  => 'database',
            'username'  => 'root',
            'password'  => '',
            'charset'   => 'utf8',
            'collation' => 'utf8_unicode_ci',
            'prefix'    => ''
        ),
        /* other connections removed for brevity */
    )
);
?>

Now we need to add an options array to the end of the mysql connection, so your database.php file should end up looking like this:

<?php

return array(

    /* other settings removed for brevity */
    'connections' => array(


        'mysql' => array(
            'driver'    => 'mysql',
            'host'      => 'localhost',
            'database'  => 'database',
            'username'  => 'root',
            'password'  => '',
            'charset'   => 'utf8',
            'collation' => 'utf8_unicode_ci',
            'prefix'    => '',
            'options'   => array(
                PDO::MYSQL_ATTR_LOCAL_INFILE => true,
            ),
        ),
        /* other connections removed for brevity */
    )
);
?>

Thanks to fideloper.com for discovering this nice trick.
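One more switch to be aware of: the MySQL server has its own local_infile system variable, and if it is disabled the server will reject the statement even when the PDO option is set. You can check and enable it from a MySQL console (this requires sufficient privileges):

```sql
-- Check whether the server accepts LOCAL INFILE
SHOW VARIABLES LIKE 'local_infile';

-- Enable it for the running server
SET GLOBAL local_infile = 1;
```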

NOTE: From my experience this works best if your MySQL server version is around 5.6; earlier versions of MySQL seem to have problems with the LOCAL parameter.

Using this method, large CSV files can be imported into MySQL really fast and thus do not need to be processed later. In a later post I will describe how to validate the contents of the file before attempting to import it. I will also write a post about storing the CSV file and processing it at a later time using a php artisan command and a cron job. If you enjoyed this post, share it; if you have questions, comments, or concerns, feel free to leave a comment below.

Next we will look at validating the CSV file structure before we run this import; head on over to Importing Large CSV files with PHP Part 2: Validating CSV file structure

Categories: Software Development | Tags: CSV, MySQL, PHP, Laravel

Anton Domratchev

Anton graduated from the University of Texas at San Antonio with a degree in Cyber Security. Anton started front end development with HTML, CSS and JavaScript in 2008, and has been developing in PHP since 2013. He is currently honing his Ruby on Rails skills while discovering the wonderful world of Software Architecture.
