Automating Python Scripts

Running automated scripts on a server on a continuous basis, instead of running them manually.
data science
python
seo
automation
Author

Elias Dabbas

Published

June 16, 2024

This is the summary of the video, together with the code used. We discuss the process of automating Python scripts so they can run online on a continuous basis. This way, you don’t have to worry about manually running them every time, and can scale your work. We’ll cover everything from basic script writing to setting up a server that’s always connected to the internet. Some parts might be familiar if you’re experienced, while others might be new. Let’s get started.

Note

The transcription of the video was done with OpenAI’s Whisper (speech-to-text) model.

After that, I used OpenAI’s GPT-4o model to format the text as you can see on this page. It knows how to format text, code, file names, Python, and bash. I basically copied and pasted that output onto this page.

Enjoy!

Writing a Simple Script

We’ll start with a very simple task: writing a script that prints the current date and time. The following script uses Python’s datetime module to achieve this:

import datetime

now = datetime.datetime.utcnow()
print("The time is now", now)
Note

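datetime.datetime.utcnow() is deprecated as of Python 3.12. The timezone-aware alternative is

datetime.datetime.now(datetime.UTC)

The datetime.UTC alias requires Python 3.11 or later. utcnow() is kept in the scripts here for consistency with the video.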
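
As a minimal sketch of the timezone-aware alternative mentioned in the note, the first script could be written as follows. It uses datetime.timezone.utc, which also works on Python versions older than 3.11. The helper name utc_now is only an illustration and does not appear in the video.

```python
import datetime


def utc_now():
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.datetime.now(datetime.timezone.utc)


now = utc_now()
print("The time is now", now)
```

The printed value ends with +00:00, because the datetime object now carries its UTC offset instead of being a naive timestamp.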

Adding Time to a File

Next, let’s update the script to append the current date and time to a file called dates.txt.

import datetime

now = datetime.datetime.utcnow()
with open("dates.txt", "a") as file:
    print("The time is now", now, file=file)

You can run this script multiple times, and it will add new lines with updated times to dates.txt.

Running the Script from Command Line

To run the script from the command line, save it as print_dates.py and use the following commands:

# Print the current directory
pwd

# List files
ls

# Run the script
python print_dates.py

After running the command, you can check dates.txt to see if new lines with the current date and time were added.
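
If you prefer to do that check from Python rather than by opening the file, a small sketch like the one below could work. The function name summarize_log is hypothetical, and the default path assumes dates.txt sits in the directory you run the check from.

```python
from pathlib import Path


def summarize_log(path="dates.txt"):
    """Return (number_of_lines, last_line), or (0, None) if the file is missing or empty."""
    log_file = Path(path)
    if not log_file.exists():
        return 0, None
    lines = log_file.read_text().splitlines()
    return len(lines), (lines[-1] if lines else None)


count, last = summarize_log()
print(f"{count} lines; last entry: {last}")
```

Running it before and after the script should show the line count going up by one.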

Scheduling Scripts with Crontab

Now, let’s automate the script so it runs at regular intervals. For this, we’ll use cron, a time-based job scheduler in Unix-like operating systems. Open the crontab with the following command:

crontab -e

Add the following line to schedule the script to run every minute:

* * * * * /path/to/python /path/to/print_dates.py

Replace /path/to/python and /path/to/print_dates.py with the actual paths on your system.
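
The five fields at the start of the line are minute, hour, day of month, month, and day of week, so five asterisks mean "every minute", while 0 * * * * would run at the start of every hour. To find the actual paths, one option is to let Python report them. The sketch below is only a convenience and assumes you run it with the interpreter you want cron to use, from the directory containing the script. The helper name cron_line is made up for this example.

```python
import sys
from pathlib import Path


def cron_line(script, schedule="* * * * *", python=sys.executable):
    """Build a crontab entry from absolute interpreter and script paths."""
    return f"{schedule} {python} {Path(script).resolve()}"


# Prints a line you can paste into the editor opened by `crontab -e`.
print(cron_line("print_dates.py"))
```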

Moving to an Online Server

Setting Up the Server

We’ll use a server that’s online 24/7. Here’s how you can connect to your server using SSH:

ssh username@server_ip_address

Create the script print_dates_online.py on the server, copy the code from print_dates.py, and save it. Make sure Python is installed on the server. On Debian or Ubuntu, you can install it with the following command:

sudo apt-get install python3

Setting Up a Python Virtual Environment

Create a virtual environment to ensure consistent execution with all necessary dependencies:

python3 -m venv myenv
source myenv/bin/activate
pip install <necessary-packages>
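
To confirm that a script is really running with the environment's interpreter, you can compare sys.prefix with sys.base_prefix, which differ inside an environment created by venv. This is a small sketch, and the function name in_virtualenv is only illustrative.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs from a virtual environment created with venv."""
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

After source myenv/bin/activate, the interpreter path should point inside myenv, and that is the path to use in the crontab entry.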

Running the Script with Crontab on the Server

Add the crontab entry to run the script every minute, just like before:

crontab -e

Add the following line:

* * * * * /path/to/myenv/bin/python /path/to/print_dates_online.py
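
One thing to watch for under cron: a relative file name such as dates.txt is resolved against the job's working directory, which is usually the user's home directory rather than the folder holding the script. A hedged sketch of one way to make the output location explicit is shown below. The function name append_time and the example folder are assumptions, not part of the video.

```python
import datetime
from pathlib import Path


def append_time(output_dir):
    """Append the current UTC time to dates.txt inside output_dir and return the file path."""
    out_file = Path(output_dir).expanduser().resolve() / "dates.txt"
    out_file.parent.mkdir(parents=True, exist_ok=True)
    now = datetime.datetime.now(datetime.timezone.utc)
    with open(out_file, "a") as file:
        print("The time is now", now, file=file)
    return out_file


# Example call (hypothetical folder; replace with one on your server):
# append_time("~/automation_output")
```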

Collecting Sitemaps

For a more practical example, we’ll collect sitemaps and store them as CSV files:

import datetime
import advertools as adv

name = "Google"
sitemap_url = "https://www.google.com/sitemap.xml"
sitemap_df = adv.sitemap_to_df(sitemap_url, recursive=False)

now = datetime.datetime.utcnow().strftime('%Y_%m_%d_%H_%M_%S')
sitemap_df.to_csv(f"sitemap_{name}_{now}.csv", index=False)
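
The timestamp in the file name is what keeps each run from overwriting the previous one. To make the naming scheme easier to see, here is the same logic pulled out into a small helper. The name sitemap_filename is hypothetical, and the sketch uses a timezone-aware timestamp instead of utcnow().

```python
import datetime


def sitemap_filename(name, when=None):
    """Return a file name such as sitemap_google_2024_06_16_09_30_00.csv."""
    if when is None:
        when = datetime.datetime.now(datetime.timezone.utc)
    return f"sitemap_{name}_{when.strftime('%Y_%m_%d_%H_%M_%S')}.csv"


# A fixed date is used here only to show the resulting pattern.
fixed = datetime.datetime(2024, 6, 16, 9, 30, 0)
print(sitemap_filename("google", fixed))  # sitemap_google_2024_06_16_09_30_00.csv
```

Because every part of the timestamp is zero-padded, sorting the file names alphabetically also sorts them chronologically.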

Handling Multiple Sitemaps and Errors

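To collect and handle multiple sitemaps, create a list of (name, URL) tuples. The last entry is deliberately invalid. Any exception raised while processing a sitemap is written to sitemap_errors.txt, and the loop moves on to the next one: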

import datetime
import advertools as adv

names_sitemaps = [
    ("google", "https://www.google.com/sitemap.xml"),
    ("yahootopics", "https://www.yahoo.com/topics-sitemap_index_US_en-US.xml.gz"),
    ("bingmapsapi", "https://www.bing.com/api/maps/mapcontrol/isdk/sitemap.xml"),
    ("wrong", "https://doesnotexist.com/sitemap.xml")
]

for name, sitemap in names_sitemaps:
    try:
        sitemap_df = adv.sitemap_to_df(sitemap, recursive=False)
        now = datetime.datetime.utcnow()
        time = now.strftime('%Y_%m_%d_%H_%M_%S')
        sitemap_df.to_csv(f"sitemap_{name}_{time}.csv", index=False)
    except Exception as e:
        with open("sitemap_errors.txt", "at") as file:
            print(str(e), file=file)
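
The error file above records only the message, so after a few days it can be hard to tell which sitemap failed and when. A possible refinement, sketched here with a hypothetical helper named log_error, is to write the time, the name, and the URL on the same line.

```python
import datetime


def log_error(name, sitemap, error, path="sitemap_errors.txt"):
    """Append one tab-separated line: UTC time, sitemap name, URL, and the error."""
    now = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as file:
        # repr() keeps the exception type and escapes any newlines in the message.
        print(now, name, sitemap, repr(error), sep="\t", file=file)
```

Inside the except block of the loop, the call would be log_error(name, sitemap, e).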

Synchronizing Files with Rsync

You can use rsync to synchronize files between your server and your local machine efficiently:

rsync username@server_ip_address:/home/username/*.csv ./sitemap_files/

This command will copy all CSV files from the server to the sitemap_files directory on your local machine.
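
Once the files are on your machine, you will probably want the most recent file for each site. The sketch below is one way to pick them out, relying only on the sitemap_<name>_<timestamp>.csv naming used by the collection loop. The function name latest_per_site and the sample file names are made up for illustration.

```python
import re

# Matches names such as sitemap_google_2024_06_16_09_30_00.csv
PATTERN = re.compile(r"^sitemap_(?P<name>.+)_(?P<stamp>\d{4}(?:_\d{2}){5})\.csv$")


def latest_per_site(filenames):
    """Map each sitemap name to its most recent file, judged by the timestamp in the name."""
    latest = {}
    for filename in filenames:
        match = PATTERN.match(filename)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        name, stamp = match["name"], match["stamp"]
        # Zero-padded timestamps sort chronologically as plain strings.
        if name not in latest or stamp > latest[name][0]:
            latest[name] = (stamp, filename)
    return {name: filename for name, (stamp, filename) in latest.items()}


files = [
    "sitemap_google_2024_06_16_09_30_00.csv",
    "sitemap_google_2024_06_16_09_31_00.csv",
    "sitemap_bingmapsapi_2024_06_16_09_30_00.csv",
    "notes.txt",
]
print(latest_per_site(files))
```

With real data, the list of names could come from the sitemap_files directory, for example via pathlib's Path("sitemap_files").glob("*.csv") and each path's name attribute.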

More detail

You will find more details in the video, along with a few questions and answers and a walkthrough of how to set up a server. The code here should get you started with creating and automating your scripts any way you want.