How to connect an AWS Glue job to an AWS Aurora Serverless RDS instance (in a VPC)

Dec 1, 2023

In this article we will look at how to connect an AWS Glue job to an AWS Aurora RDS instance using a JDBC connection. Note that the RDS instance discussed here sits in a virtual private cloud (the default VPC, in a public subnet in this case).

First, identify the engine type of your Aurora Serverless RDS cluster. In this example we connect to a MySQL engine.

Let's write the Glue job code to connect to the RDS instance.

connection_mysql5_options = {
    "url": "jdbc:mysql://<jdbc-host-name>:3306/db",
    "dbtable": "test",  # table in the DB
    "user": "admin",  # user name for the connection
    "password": "pwd",  # password for the connection
}

df_mysql5 = glueContext.create_dynamic_frame.from_options(connection_type="mysql",
                                                          connection_options=connection_mysql5_options)

For the code to work, you also need to create a Glue connection. Go to the Data connections tab in the left menu of the AWS Glue console and choose Create connection.

To connect to the Aurora Serverless DB residing in a subnet of your VPC, choose JDBC from the options and select the target for the connection.

Click Next and paste your JDBC connection string. For a MySQL engine on Aurora Serverless, the JDBC connection string looks as follows:

jdbc:mysql://<host-name>:<port number - usually 3306 >/<db-name>
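As a small illustration of the format above, the connection string can be assembled from its parts. This is a minimal sketch: the helper name `build_mysql_jdbc_url` and the example endpoint are hypothetical and are not part of any AWS library.

```python
def build_mysql_jdbc_url(host: str, db_name: str, port: int = 3306) -> str:
    """Return a JDBC URL of the form jdbc:mysql://<host-name>:<port>/<db-name>."""
    if not host or not db_name:
        raise ValueError("host and db_name must be non-empty")
    return f"jdbc:mysql://{host}:{port}/{db_name}"


# Hypothetical cluster endpoint and database name, for illustration only.
print(build_mysql_jdbc_url("my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com", "db"))
```

Keeping the port as a parameter with a default of 3306 matches the usual MySQL setup while still allowing a custom port.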

For connection strings of other engines, please refer to the AWS Glue documentation: https://docs.aws.amazon.com/glue/latest/dg/connection-properties.html

In the VPC connection section, attach the VPC, subnet and security group of your RDS DB. Open the RDS cluster in the RDS console and you will find these details under the respective tabs of the cluster.

Name the connection. We shall name it “JDBC connection” in this case.

The connection to the DB has now been created.

The final job code should look as follows:
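The same connection can also be created from a script instead of the console. The sketch below assumes boto3 is installed and credentials are configured. The helper names and every ID passed in are hypothetical placeholders. The dictionary follows the `ConnectionInput` shape accepted by the Glue `create_connection` API.

```python
def build_connection_input(name, jdbc_url, username, password,
                           subnet_id, security_group_ids, availability_zone):
    """Build the ConnectionInput payload for a JDBC Glue connection inside a VPC."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
        # Network details of the RDS cluster (subnet, security groups, AZ).
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


def create_connection(connection_input):
    """Create the connection in Glue; this performs a real AWS API call."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_connection(ConnectionInput=connection_input)
```

Separating the payload builder from the API call makes it easy to inspect the payload before anything is sent to AWS.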

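import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job setup
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

try:
    # Replace the placeholders with your cluster endpoint, port and database name
    url = "jdbc:mysql://<cluster-address>:<port>/<db>"
    dbtable = "order_type"
    order_type_mysql_options = {
        "url": url,  # JDBC URL of the cluster
        "dbtable": dbtable,  # target table in the DB
        "user": "admin",  # name of the user
        "password": "password",  # password
    }

    order_type_ddf = glueContext.create_dynamic_frame.from_options(
        connection_type="mysql", connection_options=order_type_mysql_options
    )
    order_type_df = order_type_ddf.toDF()
    # Print the first row as a quick check that the read worked
    loggingdict = order_type_df.first().asDict()
    print(loggingdict)
    job.commit()

except Exception as err:
    e_log_desc = str(err).replace("'", "")
    print(e_log_desc)
    print("error occurred")
    # Exit with a non-zero status to signal failure
    sys.exit(1)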

The exception block will help you track the precise error in case of an issue.

Go to Job details and add the connection you created (“JDBC connection”) to the job's list of connections.

Does an AWS Glue connection associate your Glue job with the VPC?

Yes. Although the documentation does not state this clearly, the assumption is based on the following point from the AWS Glue user guide and on a Stack Overflow response:
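To double-check that the connection is attached to the job, you can inspect the job definition returned by the Glue `get_job` API. This is a rough sketch: the helper names and the job name are hypothetical, and it assumes boto3 is installed and credentials are configured.

```python
def job_uses_connection(job_response: dict, connection_name: str) -> bool:
    """Return True if the get_job response lists the given connection name."""
    connections = (
        job_response.get("Job", {})
        .get("Connections", {})
        .get("Connections", [])
    )
    return connection_name in connections


def check_job(job_name: str, connection_name: str) -> bool:
    """Fetch the job definition from Glue and check it; performs a real AWS API call."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    return job_uses_connection(glue.get_job(JobName=job_name), connection_name)
```

For example, `check_job("my-glue-job", "JDBC connection")` would return `True` once the connection has been added, where `my-glue-job` stands in for your own job name.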

You can configure your AWS Glue ETL jobs to run within a VPC when using connectors. You must configure your VPC for the following, as needed:

Configure a VPC for your ETL job

docs.aws.amazon.com

Why AWSS Glue connections need a VPC S3 endpoint

I have a psql RDS on the same AWS account where I am trying to set up a glue connection to it. I used the RDS option…

stackoverflow.com

Hence you need to add an S3 VPC endpoint, since the Glue job scripts reside in an S3 bucket, which is not part of your VPC.
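If you prefer to script this step, an S3 gateway endpoint can be created with the EC2 `create_vpc_endpoint` API. The following is a sketch under assumptions: the helper names are hypothetical, the VPC ID, region and route table IDs are placeholders for your own values, and boto3 with configured credentials is required for the actual call.

```python
def build_s3_endpoint_request(vpc_id, region, route_table_ids):
    """Build the arguments for an S3 gateway endpoint in the given VPC."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Route tables of the subnets that should reach S3 through the endpoint.
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_endpoint(request):
    """Create the endpoint; this performs a real AWS API call."""
    import boto3  # imported lazily so the builder above works without boto3

    ec2 = boto3.client("ec2")
    return ec2.create_vpc_endpoint(**request)
```

The route tables passed in should include the one used by the subnet you selected in the Glue connection, so that traffic from the job to S3 goes through the endpoint.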

Recommendation: Use AWS Secrets Manager instead of putting the admin user name and password directly in the job code, and add a Secrets Manager VPC endpoint so the job can reach it from within the VPC.
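A minimal sketch of this recommendation is shown below. It assumes the secret stores a JSON document with the keys `username`, `password`, `host`, `port` and `dbname`, which is the layout commonly used for RDS secrets. Verify the keys against your own secret. The helper names and the secret ID are hypothetical.

```python
import json


def connection_options_from_secret(secret_string: str, dbtable: str) -> dict:
    """Turn a JSON secret into the connection options dict used by the Glue job."""
    secret = json.loads(secret_string)
    url = f"jdbc:mysql://{secret['host']}:{secret['port']}/{secret['dbname']}"
    return {
        "url": url,
        "dbtable": dbtable,
        "user": secret["username"],
        "password": secret["password"],
    }


def fetch_secret_string(secret_id: str, client=None) -> str:
    """Read the secret value; with the default client this is a real AWS API call."""
    if client is None:
        import boto3  # imported lazily so the parser above works without boto3

        client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=secret_id)["SecretString"]
```

In the job, the hardcoded options could then be replaced with `connection_options_from_secret(fetch_secret_string("my-rds-secret"), "order_type")`, where `my-rds-secret` stands in for your own secret ID.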