This is the summary of the video, together with the code used. We discuss the process of automating Python scripts so they can run online on a continuous basis. This way, you don’t have to worry about manually running them every time, and can scale your work. We’ll cover everything from basic script writing to setting up a server that’s always connected to the internet. Some parts might be familiar if you’re experienced, while others might be new. Let’s get started.
The transcription of the video was done with OpenAI’s Whisper (speech-to-text) model. After that, I used OpenAI’s GPT-4o model to format the text as you can see on this page. It knows how to format text, code, file names, Python, and bash. I basically copied and pasted that output onto this page.
Enjoy!
Writing a Simple Script
We’ll start with a very simple task: writing a script that prints the current date and time. The following script uses Python’s datetime module to achieve this:
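A minimal sketch of such a script. The use of UTC and the wording of the message are assumptions that mirror the later examples on this page:

```python
import datetime

# Timezone-aware current time in UTC. Calling datetime.datetime.now()
# with no argument would return local time instead.
now = datetime.datetime.now(datetime.timezone.utc)
print("The time is now", now)
```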
Adding Time to a File
Next, let’s update the script to append the current date and time to a file called dates.txt.
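import datetime

# Timezone-aware UTC timestamp (utcnow() is deprecated since Python 3.12).
now = datetime.datetime.now(datetime.timezone.utc)
# Mode "a" appends, so lines written by earlier runs are kept.
with open("dates.txt", "a") as file:
    print("The time is now", now, file=file)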
You can run this script multiple times, and it will add new lines with updated times to dates.txt.
Running the Script from Command Line
To run the script from the command line, save it as print_dates.py and run it with the Python interpreter from the directory where you saved it:
python print_dates.py
After running the command, you can check dates.txt
to see if new lines with the current date and time were added.
Scheduling Scripts with Crontab
Now, let’s automate the script so it runs at regular intervals. For this, we’ll use cron, a time-based job scheduler in Unix-like operating systems. Open the crontab with the following command:
crontab -e
Add the following line to schedule the script to run every minute:
* * * * * /path/to/python /path/to/print_dates.py
Replace /path/to/python and /path/to/print_dates.py with the actual paths on your system.
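To find the actual paths mentioned above, you can ask Python itself. The sketch below builds the crontab entry from the interpreter that is currently running and the absolute path of the script. The helper name cron_line is hypothetical, and the five-asterisk schedule is standard cron syntax for "every minute":

```python
import sys
from pathlib import Path


def cron_line(script, python=sys.executable, schedule="* * * * *"):
    """Return a crontab entry that runs `script` with `python`."""
    # cron starts jobs with a minimal environment and its own working
    # directory, so both paths should be absolute.
    script_path = Path(script).resolve()
    return f"{schedule} {python} {script_path}"


# Run this from the directory that contains print_dates.py.
print(cron_line("print_dates.py"))
```

The printed line can be pasted into the editor that crontab -e opens.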
Moving to an Online Server
Setting Up the Server
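We’ll use a server that’s online 24/7. Here’s how you can connect to your server using SSH, replacing username and server_address with your own login name and the server’s address:
ssh username@server_address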
Create the script print_dates_online.py on the server, copy the code from print_dates.py, and save it. Make sure Python is installed on the server. One way to check is to ask for its version:
python3 --version
Setting Up a Python Virtual Environment
Create a virtual environment to ensure consistent execution with all necessary dependencies:
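A virtual environment can be created from the shell with python3 -m venv, or from Python itself through the standard-library venv module. A minimal sketch of the latter follows. The create_env helper and the directory name venv are illustrative assumptions, and the bin/python location assumes a Unix-like server:

```python
import venv
from pathlib import Path


def create_env(env_dir="venv", with_pip=True):
    """Create a virtual environment and return the path of its interpreter."""
    # with_pip=True bootstraps pip so dependencies can be installed later.
    venv.create(env_dir, with_pip=with_pip)
    # On Unix-like systems the interpreter lives in bin/ (Windows uses Scripts/).
    return Path(env_dir).resolve() / "bin" / "python"
```

The returned path is the interpreter to use in the crontab entry, so the scheduled script runs with the packages installed in that environment.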
Running the Script with Crontab on the Server
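Add the crontab entry to run the script every minute, just like before:
crontab -e
Add the following line, replacing the placeholder paths with the virtual environment’s interpreter and the script’s location on the server:
* * * * * /path/to/venv/bin/python /path/to/print_dates_online.py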
Collecting Sitemaps
For a more practical example, we’ll collect sitemaps and store them as CSV files:
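A minimal sketch of this step, assuming the advertools package is installed (pip install advertools). It reuses the sitemap_to_df call and the file-naming pattern from the fuller example later on this page. The helper names csv_name and save_sitemap are hypothetical:

```python
import datetime


def csv_name(name, now=None):
    """Build a timestamped file name for one sitemap download."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return f"sitemap_{name}_{now.strftime('%Y_%m_%d_%H_%M_%S')}.csv"


def save_sitemap(name, url):
    """Download one sitemap and write it to a timestamped CSV file."""
    # Imported here so csv_name can be used without advertools installed.
    import advertools as adv

    # recursive=False reads only this file and does not follow a sitemap index.
    sitemap_df = adv.sitemap_to_df(url, recursive=False)
    path = csv_name(name)
    sitemap_df.to_csv(path, index=False)
    return path
```

Calling save_sitemap("google", "https://www.google.com/sitemap.xml") downloads that sitemap and returns the name of the CSV file it wrote. Because the name includes the time of the download, each scheduled run produces a new file instead of overwriting the previous one.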
Handling Multiple Sitemaps and Errors
To collect and handle multiple sitemaps, create a list of (name, URL) tuples:
import datetime
import advertools as adv

# (name, URL) pairs; the name becomes part of the output file name.
names_sitemaps = [
    ("google", "https://www.google.com/sitemap.xml"),
    ("yahootopics", "https://www.yahoo.com/topics-sitemap_index_US_en-US.xml.gz"),
    ("bingmapsapi", "https://www.bing.com/api/maps/mapcontrol/isdk/sitemap.xml"),
    ("wrong", "https://doesnotexist.com/sitemap.xml"),
]

for name, sitemap in names_sitemaps:
    try:
        sitemap_df = adv.sitemap_to_df(sitemap, recursive=False)
        # Timezone-aware UTC timestamp (utcnow() is deprecated since Python 3.12).
        now = datetime.datetime.now(datetime.timezone.utc)
        time = now.strftime('%Y_%m_%d_%H_%M_%S')
        sitemap_df.to_csv(f"sitemap_{name}_{time}.csv", index=False)
    except Exception as e:
        # Record the error and move on to the next sitemap.
        with open("sitemap_errors.txt", "at") as file:
            print(str(e), file=file)
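The error file above stores only the exception message, so it can be hard to tell which sitemap failed and when. One possible refinement, not taken from the video, is to log the time, name, and URL as well. The log_error helper and the tab-separated layout are assumptions:

```python
import datetime


def log_error(name, url, error, path="sitemap_errors.txt"):
    """Append one tab-separated line describing a failed sitemap download."""
    now = datetime.datetime.now(datetime.timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    line = f"{stamp}\t{name}\t{url}\t{type(error).__name__}: {error}"
    # Mode "a" appends, so earlier errors stay in the file.
    with open(path, "a", encoding="utf-8") as file:
        print(line, file=file)
    return line
```

Inside the loop, the body of the except branch would then become log_error(name, sitemap, e).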
Synchronizing Files with Rsync
You can use rsync to synchronize files between your server and your local machine efficiently:
This command will copy all CSV files from the server to the sitemap_files directory on your local machine.
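The transfer described above can also be scripted from Python. This is a sketch under assumptions, not the command from the video: the helper names are hypothetical, the remote argument is a placeholder of the form user@host:/path/to/dir/, and the include/exclude filters are one way to restrict rsync to CSV files in that directory:

```python
import subprocess


def rsync_command(remote, local_dir="sitemap_files/"):
    """Build an rsync command that copies only *.csv files from `remote`."""
    return [
        "rsync",
        "-avz",             # archive mode, verbose, compress in transit
        "--include=*.csv",  # keep CSV files ...
        "--exclude=*",      # ... and skip everything else
        remote,
        local_dir,
    ]


def sync(remote, local_dir="sitemap_files/"):
    """Run rsync on the local machine; raises CalledProcessError on failure."""
    subprocess.run(rsync_command(remote, local_dir), check=True)
```

rsync only transfers files that are new or changed, so calling sync repeatedly is cheap once the first copy is done.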
More detail
The video goes into more detail, including a few questions and answers and how to set up a server. The code here should get you started with creating and automating your scripts any way you want.