Hey guys, in this blog we will see how to schedule a Python script in AWS Glue as a Job that runs every hour. I have tried to make this tutorial as easy as possible, with every step explained.
So without further ado, let’s do it…
Step 1 – Search and Open AWS Glue in your AWS account
Step 2 – Open Jobs from Legacy Pages.
- In the left sidebar, we can see Legacy Pages.
- Click on that and open Jobs from there.
Step 3 – Add Job
Click on Add Job.
Step 4 – Configure your Job
- Name your Job.
- Choose the S3 bucket where Glue will store your Python script and temporary files.
- And finally, choose an IAM Role.
- Leave everything else as it is.
- Scroll down and click on Next.
- A screen like the one below will pop up and ask you to make connections. My Python Script doesn’t require any connection, so I will not select any of the connections.
- You can click on Save job and edit script.
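If you prefer code over clicking, the same configuration can be sketched with boto3. This is only a rough sketch and not what the console does behind the scenes: the job name, role ARN, bucket paths, the "glueetl" job type and Glue version "4.0" are all assumptions I made up for illustration, so swap in your own values.

```python
def build_job_definition(name, role_arn, script_s3_path, temp_dir_s3_path,
                         command_name="glueetl", glue_version="4.0"):
    """Build the keyword arguments for the Glue CreateJob API call.

    command_name and glue_version are assumptions: "glueetl" is a Spark job,
    "pythonshell" is a plain Python shell job.
    """
    return {
        "Name": name,
        "Role": role_arn,  # the IAM Role chosen in Step 4
        "Command": {
            "Name": command_name,
            "ScriptLocation": script_s3_path,  # where Glue stores the script
            "PythonVersion": "3",
        },
        # --TempDir is the Glue job parameter for the temporary files location
        "DefaultArguments": {"--TempDir": temp_dir_s3_path},
        "GlueVersion": glue_version,
    }


def create_job(definition):
    """Create the job in your AWS account (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(**definition)["Name"]
```

You would call `create_job(build_job_definition(...))` with your own name, role ARN and S3 paths. Keeping the dictionary in its own function makes it easy to print and double-check before anything is created.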
Step 5 – Let’s add our Python code
Now, in the left menu bar, click on Jobs (new). This opens a console where we can add our code and schedule it later.
A screen like the one below will open where you need to select your Glue Job.
- Once you click on your Job, a code editor will open where you need to paste the Python Code that you want to schedule.
- Following is the demo code that I wrote to check my Glue Job.
- Now you can see that I imported pandas and numpy in my code and these are not already present in the Glue environment.
- So now we will add these libraries to our environment.
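In case you need something to paste while following along, here is a small stand-in script. It is not the exact demo code from my job, just a hypothetical example that imports pandas and numpy and prints a few messages so there is something to look for in the logs later.

```python
import numpy as np
import pandas as pd


def build_demo_frame(rows=5, seed=0):
    """Create a tiny DataFrame with an id column and random values."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"id": np.arange(rows), "value": rng.random(rows)})


# Everything printed here should show up in the job's output logs.
print("Glue job started")
df = build_demo_frame()
print(df)
print("Mean value:", df["value"].mean())
print("Glue job finished")
```

Any script works here; the only point is that it uses external libraries and prints something you can recognise.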
Steps to include external libraries:
- Open Job details.
- Scroll down and click on Advanced Properties.
- Scroll down and under Job Parameters click on ‘Add new parameter’.
- Under Key add ‘--additional-python-modules’ (with two hyphens at the start) and under Value add the libraries as a comma-separated list, for example pandas,numpy.
- Click on Save.
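The key and value from the steps above can be sketched as a small helper. The function name is my own invention; it only builds the parameter, which you can then paste into the console or pass as part of `DefaultArguments` if you create the job with boto3.

```python
def additional_modules_argument(modules):
    """Turn a list of library names into the Glue job parameter.

    Glue expects one comma-separated string, so blank entries and
    surrounding spaces are dropped before joining.
    """
    cleaned = [m.strip() for m in modules if m.strip()]
    return {"--additional-python-modules": ",".join(cleaned)}


print(additional_modules_argument(["pandas", "numpy"]))
```

As far as I know you can also pin versions in the same value, for example `pandas==1.5.3`, but check this against the Glue version you are running.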
Step 6 – Let’s schedule a Python Script in AWS Glue as a Job
- Click on Schedules.
- Click on Create Schedule.
- Add a name, choose how often the Job should run (hourly in our case), and create the schedule.
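The same hourly schedule can also be sketched in code as a scheduled Glue trigger. The trigger and job names below are hypothetical, and I am assuming you want the Job to start at minute 0 of every hour. Glue cron expressions have six fields: minutes, hours, day of month, month, day of week and year.

```python
# Runs at minute 0 of every hour, every day.
HOURLY_CRON = "cron(0 * * * ? *)"


def build_hourly_trigger(trigger_name, job_name):
    """Build the keyword arguments for the Glue CreateTrigger API call."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": HOURLY_CRON,
        "Actions": [{"JobName": job_name}],  # the Job to start each hour
        "StartOnCreation": True,  # activate the trigger right away
    }


def create_hourly_trigger(glue_client, trigger_name, job_name):
    """Create the trigger; glue_client is expected to be boto3.client("glue")."""
    response = glue_client.create_trigger(
        **build_hourly_trigger(trigger_name, job_name)
    )
    return response["Name"]
```

Passing the client in as an argument keeps the function easy to try out with a fake client before pointing it at a real account.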
Step 7 – Let’s run it
- Click on Run and it will run your Job.
- And it should run successfully.
You can also see All Logs, Output Logs, and Error Logs on this page.
You can see the messages here that we printed from our code.
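If you want to start a run and wait for the result without opening the console, here is a minimal sketch with boto3. The function name and the 30-second polling interval are my own choices, and `glue_client` is assumed to be `boto3.client("glue")`.

```python
import time

# Job run states after which the run will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_and_wait(glue_client, job_name, poll_seconds=30):
    """Start one run of the Job and poll until it reaches a terminal state."""
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        print(f"Run {run_id}: {state}")
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

A return value of `SUCCEEDED` matches the successful run you see in the console; for anything else, the Error Logs are the first place to look.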
And this is how you can schedule a Python Script in AWS Glue as a Job.
So this is all for this blog, folks. Thanks for reading it, I hope you are taking something with you, and till next time…
Read my previous post: Easiest Way to use an Amazon S3 trigger to invoke a Lambda function