RDS S3 Import/Export

 

Overview

Hello. This article explains the AWS RDS S3 import/export feature and shares the results of import performance testing.


What is S3 import/export?

RDS S3 import/export is a feature that exports RDS data to S3 or imports S3 data into RDS. To allow RDS to access S3, you need to configure an IAM role connection. This article walks through hands-on practice with both Aurora MySQL and Aurora Postgres.



INFILE Import vs. S3 Import Performance Comparison

I ran a performance test importing 18 million records on an Aurora MySQL db.r6i.xlarge instance. I executed only the import itself, without any additional database settings such as indexes.

For the 18 million records of data, I used TPC-H, which I documented in a previous blog post.

Comparing LOAD DATA LOCAL INFILE with LOAD DATA FROM S3 PREFIX, the INFILE import was approximately 12% faster than the S3 import. The S3 import performs worse than the INFILE method because of the additional network hops involved in accessing S3.

  • LOAD DATA LOCAL INFILE import
LOAD DATA LOCAL INFILE 'lineitem.tbl'
INTO TABLE LINEITEM
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';


  • LOAD DATA FROM S3 import
LOAD DATA FROM S3 PREFIX '$BUCKET'
INTO TABLE LINEITEM
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';


The AWS blog doesn't have an article directly comparing INFILE and S3 import, but it does note that INFILE import performance is very good.



To compare INFILE and S3 import performance more accurately, it would be necessary to test with hundreds of millions of records and with realistic database configurations such as indexes.


When should you use S3 import/export?

S3 import/export is useful when you need to store and work with data in S3. For example, you can use it to periodically back up a subset of data instead of taking RDS snapshots, or use S3 as shared storage when performing an RDS migration.

I used both S3 import and export to selectively restore only the damaged portion of data after executing RDS Point in Time Recovery.


Usage Method

To use S3 import/export in RDS, you need to configure three things. Both the configuration method and the import/export SQL differ depending on the RDS engine.

  1. Connect IAM role to RDS
  2. Configure database to use S3 import/export
  3. Execute SQL

MySQL Aurora Configuration

Reference: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html

  1. Connect IAM role to RDS

In the RDS dashboard, attach an IAM role from Connectivity & security → Managed IAM roles.
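If you prefer the AWS CLI, the same attachment can be done with add-role-to-db-cluster. The cluster identifier and role ARN below are placeholders for your own values.

# Attach the IAM role to the Aurora MySQL cluster (identifiers are placeholders)
aws rds add-role-to-db-cluster \
    --db-cluster-identifier my-aurora-mysql-cluster \
    --role-arn arn:aws:iam::123456789012:role/rds-s3-role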



Configure the IAM role's policy as follows.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::akubun-9umx9v5r"
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::akubun-9umx9v5r/*"
        }
    ]
}


Configure the IAM role's Trust relationship so RDS can assume it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AssumeRoleForBackup",
            "Effect": "Allow",
            "Principal": {
                "Service": "[rds.amazonaws.com](<http://rds.amazonaws.com>)"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}



  2. Enable S3 import/export feature

After connecting the IAM role to RDS, modify the Aurora MySQL cluster parameter group. Set the "aws_default_s3_role" parameter to the ARN of the connected IAM role. After completing the configuration, reboot the RDS instance.
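For reference, a CLI sketch of the same change, assuming a custom cluster parameter group named my-aurora-mysql-params and a writer instance named my-aurora-instance-1 (both placeholders):

# Point aws_default_s3_role at the attached IAM role
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name my-aurora-mysql-params \
    --parameters "ParameterName=aws_default_s3_role,ParameterValue=arn:aws:iam::123456789012:role/rds-s3-role,ApplyMethod=pending-reboot"

# Reboot the writer instance so the parameter takes effect
aws rds reboot-db-instance --db-instance-identifier my-aurora-instance-1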




  3. Grant S3 import/export permissions to the database user

Grant S3 import/export permissions to MySQL users. The rds_root user does not need permission configuration.

AWS_LOAD_S3_ACCESS permission is for S3 import and AWS_SELECT_S3_ACCESS is for S3 export.

GRANT AWS_LOAD_S3_ACCESS TO 'user'@'domain-or-ip-address';
GRANT AWS_SELECT_S3_ACCESS TO 'user'@'domain-or-ip-address';
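
You can confirm the result with SHOW GRANTS (same placeholder user and host as above).

SHOW GRANTS FOR 'user'@'domain-or-ip-address';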

MySQL Aurora Practice Data Preparation

For MySQL Aurora import/export data, I used TPC-H.



I used the lineitem table with approximately 18 million rows, generated with the dbgen command.

git clone https://github.com/electrum/tpch-dbgen
cd tpch-dbgen
make
time ./dbgen -s 3 -T L




I split the approximately 18 million records into multiple files to execute S3 import in parallel.

split -l 700000 lineitem.tbl lineitem_




I uploaded the split data to S3.
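
A minimal upload sketch with the AWS CLI, assuming the split files are in the current directory and using the bucket path from the import SQL below:

# Upload only the split lineitem_* files to the import prefix
aws s3 cp . s3://akubun-9umx9v5r/import/ --recursive --exclude "*" --include "lineitem_*"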



I then created the tpch database and the tables used in the S3 import/export examples. The table creation SQL (dss.ddl) is in the tpch-dbgen GitHub repo.

mysql -h $HOST -u$USER -p$PASSWORD << EOF
CREATE DATABASE tpch;
USE tpch;
SOURCE dss.ddl;
EOF

MySQL Aurora Import SQL

Import S3 data with LOAD DATA FROM S3 PREFIX SQL.

BUCKET="s3://akubun-9umx9v5r/import/lineitem_"
time mysql -h $HOST -u$USER -p$PASSWORD tpch << EOF
LOAD DATA FROM S3 PREFIX '$BUCKET'
INTO TABLE LINEITEM
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';
EOF



This imports all objects in the bucket whose keys start with lineitem_.




With db.t4g.medium specs, the S3 import operation of 18 million records took approximately 7 minutes.



To verify that the data was imported correctly, I used a COUNT query.

mysql -h $HOST -u$USER -p$PASSWORD tpch << EOF
SELECT COUNT(*) FROM LINEITEM;
EOF



When the S3 import operation starts, RDS instance resource usage such as CPU increases. In this example, CPU usage went up to a maximum of 70% during S3 import.



Performance Insights shows the SQL used during the S3 import.



MySQL Aurora Export SQL

S3 export is executed with the SELECT ... INTO OUTFILE S3 statement. Specify a prefix for the bucket path.

BUCKET="s3://akubun-9umx9v5r/export/mysql_export"
time mysql -h $HOST -u$USER -p$PASSWORD tpch << EOF
SELECT * FROM LINEITEM
LIMIT 10
INTO OUTFILE S3 '${BUCKET}'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
MANIFEST ON
OVERWRITE ON;
EOF

Exported data is stored in partition units (part_xxxx).
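
You can list the exported part files with the CLI:

aws s3 ls s3://akubun-9umx9v5r/export/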



The manifest stores information about the exported data.




The manifest can be used during S3 import.

LOAD DATA FROM S3 MANIFEST ...
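
A fuller sketch of a manifest-based import, assuming the manifest written by the export above is named mysql_export.manifest (the exact object name may differ in your bucket):

LOAD DATA FROM S3 MANIFEST 's3://akubun-9umx9v5r/export/mysql_export.manifest'
INTO TABLE LINEITEM
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';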

Differences between Postgres Aurora S3 import/export Configuration and MySQL Aurora

  1. As of November 2025, Postgres Aurora does not have prefix S3 import.

MySQL Aurora supports prefix S3 import, allowing you to import files matching a pattern like abc_* at once. Postgres does not support prefix-based S3 import, so you need to implement parallel import logic yourself (a sketch is shown in the Postgres import section below).


  2. The configuration when attaching IAM to RDS is different.

MySQL Aurora needs only one IAM role for import and export.

On the other hand, when attaching IAM to Postgres Aurora, separate IAM roles are needed for import and export.



  3. There are differences in parameter group configuration.

MySQL Aurora requires parameter group configuration to use S3 import/export, but Postgres Aurora does not use parameter groups for this; it uses the aws_s3 extension instead.

  4. There are differences in instance rebooting.

MySQL Aurora requires an RDS instance reboot to apply the parameter change. Postgres Aurora does not need a reboot because no parameter group is changed.

  5. There are differences in file extensions.

Unlike MySQL, PostgreSQL must explicitly specify the file format when doing S3 import/export. PostgreSQL supports CSV, text, and gzip-compressed files.







Files in the MySQL S3 bucket don't need extensions, but files in the Postgres S3 bucket do. Also, because MySQL supports prefix import, you can upload multiple files to the bucket and import them at once; Postgres does not, so I uploaded a single file.

  6. There are differences in database user permission settings.

In MySQL Aurora you have to explicitly grant S3 import/export permissions to specific users, while Postgres has no separate user permission settings for this. Postgres handles these permissions through roles.

Postgres Aurora Configuration

  1. Connect IAM role to RDS

In the RDS dashboard, attach IAM roles from Connectivity & security → Managed IAM roles. You need to create separate IAM roles for S3 import and S3 export.
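
With the CLI, the two roles are attached with explicit feature names; the cluster identifier and role ARNs below are placeholders:

# Role used by aws_s3.table_import_from_s3
aws rds add-role-to-db-cluster \
    --db-cluster-identifier my-aurora-postgres-cluster \
    --role-arn arn:aws:iam::123456789012:role/rds-s3-import-role \
    --feature-name s3Import

# Role used by aws_s3.query_export_to_s3
aws rds add-role-to-db-cluster \
    --db-cluster-identifier my-aurora-postgres-cluster \
    --role-arn arn:aws:iam::123456789012:role/rds-s3-export-role \
    --feature-name s3Export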



Configure the import IAM role policy as follows. I only granted permissions to read objects from S3.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::akubun-9umx9v5r"
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::akubun-9umx9v5r/*"
        }
    ]
}

Configure the export IAM role policy as follows. I added policies related to uploading items to S3.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::akubun-9umx9v5r"
        },
        {
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::akubun-9umx9v5r/*"
        }
    ]
}

Configure the IAM role's Trust relationship so RDS can assume it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AssumeRoleForBackup",
            "Effect": "Allow",
            "Principal": {
                "Service": "[rds.amazonaws.com](<http://rds.amazonaws.com>)"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}


  2. Install the aws_s3 extension

To use S3 import/export in Postgres Aurora, you need to install the aws_s3 extension.

Since a Postgres extension is installed per database, create the database first.

USER=testuser

psql -h $HOST -U $ROOT_USER -d $DATABASE << EOF
CREATE USER $USER;
GRANT $USER TO $ROOT_USER;
CREATE DATABASE tpch OWNER $USER;
EOF


Install the aws_s3 extension in the tpch database you just created. The \dx meta-command lists the extensions installed in the current database; its output should include the aws_s3 extension.

psql -h $HOST -U $ROOT_USER -d tpch << EOF
CREATE EXTENSION aws_s3 CASCADE;
\dx
EOF


Postgres Aurora Practice Data Preparation

For Postgres Aurora import/export data, I used TPC-H.

Clone tpch-dbgen.

git clone https://github.com/electrum/tpch-dbgen
cd tpch-dbgen

For Postgres, the data must not have a trailing delimiter at the end of each line when importing. If there is one, you'll encounter the following error.

ERROR:  extra data after last expected column





Therefore, you need to modify the dbgen compile options so that the trailing delimiter is not generated. Add -DEOL_HANDLING to CFLAGS, then compile with make to generate the dbgen binary.

$ vi makefile
CFLAGS  = -g -DDBNAME=\"dss\" -D_FILE_OFFSET_BITS=64 -DEOL_HANDLING

$ make

I used the lineitem table with approximately 6 million rows, generated with the dbgen command.

time ./dbgen -s 1 -T L

Verify that there is no delimiter at the end of the generated lineitem.tbl file.
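
For example, a quick check of the last line (cat -A marks the end of each line with $, so a trailing | would be visible right before it):

tail -n 1 lineitem.tbl | cat -A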



I uploaded the data to S3. Change the extension from tbl to csv before uploading.

mv lineitem.tbl lineitem.csv
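
Then upload it to the bucket path referenced in the import SQL below:

aws s3 cp lineitem.csv s3://akubun-9umx9v5r/import/lineitem.csv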



For Postgres, I created the tpch database and tables. Unlike MySQL, Postgres requires an owner role to be set when creating the database.

psql -h $HOST -U $ROOT_USER -d $DATABASE << EOF
CREATE USER $USER;
GRANT $USER TO $ROOT_USER;
CREATE DATABASE tpch OWNER $USER;
EOF

I used dss.ddl as-is for the table creation SQL; it is in the tpch-dbgen GitHub repo.

psql -h $HOST -U $ROOT_USER -d tpch -f dss.ddl


Postgres Aurora S3 import SQL

Postgres uses the aws_s3.table_import_from_s3 function together with aws_commons.create_s3_uri for S3 import.

  • Bucket location: s3://akubun-9umx9v5r/import/lineitem.csv

time psql -h $HOST -U $ROOT_USER -d tpch << EOF
SELECT aws_s3.table_import_from_s3(
    'LINEITEM',
    '',
    '(format text, delimiter ''|'')',
    aws_commons.create_s3_uri(
        'akubun-9umx9v5r',
        'import/lineitem.csv',
        'ap-northeast-2'
    )
);
EOF
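
Because Aurora Postgres has no prefix import, parallel loading has to be scripted on the client side. Below is a minimal bash sketch, assuming the data was split into hypothetical objects named import/lineitem_aa.csv, import/lineitem_ab.csv, and so on in the same bucket:

# Run one import per split file in the background, then wait for all of them
for part in aa ab ac ad; do
  psql -h $HOST -U $ROOT_USER -d tpch -c "
    SELECT aws_s3.table_import_from_s3(
        'LINEITEM', '', '(format text, delimiter ''|'')',
        aws_commons.create_s3_uri('akubun-9umx9v5r', 'import/lineitem_${part}.csv', 'ap-northeast-2')
    );" &
done
wait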


Postgres Aurora S3 export SQL

Postgres S3 export uses the query_export_to_s3 function of the aws_s3 extension.

psql -h $HOST -U $ROOT_USER -d tpch << EOF
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT * FROM LINEITEM LIMIT 10',
    aws_commons.create_s3_uri(
        'akubun-9umx9v5r',
        'export/postgres/lineitem_sample.csv',
        'ap-northeast-2'
    ),
    options := 'format csv, delimiter ''|'', header true'
);
EOF


You can see the CSV file created by Postgres in S3. When you download and view it, the data was exported as intended.
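
For example, to download and preview the exported file:

aws s3 cp s3://akubun-9umx9v5r/export/postgres/lineitem_sample.csv .
head -n 5 lineitem_sample.csv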

