This article is linked to my Pycon Italia presentation “Exploring Art with Python: Building an Italian Art Bot”.
Or maybe it’s the other way around, and a partial draft of this article was the foundation of the slides? Hard to say; what is certain is that without building the bot, neither the talk nor this post would have seen the light.
This is going to be a pretty long post, so if you only want to see the bot click here.
Otherwise grab a cup of coffee, or better an espresso, and let’s start from the origin.
Context
Last year I challenged myself to build 3 side projects between my 30th and 31st birthday, given that the previous year I met the same goal.
In the meanwhile I got early access to Bluesky. For those of you who don’t like to try out new social network apps, Bluesky is a decentralized microblogging platform. It’s like good ol’ Twitter, except its code is open source and you get much more ownership of your timeline.
While reading some posts, I discovered some bots publishing relevant information and interesting images. In particular my attention was drawn to art bots: some were focused on specific painters like Van Gogh and Monet, others were much more general. After a quick search, it looked like it was possible to build those with Python too, and the Bluesky documentation was good.
At this point, I had zero new projects live and I thought: why not build a bot myself?
Now I had a general goal, but I had to decide something more specific, unless I wanted to delegate this task to ChatGPT (Me: I want to make a Bluesky bot, possibly focused on art, what are your suggestions? ChatGPT: go outside and have a walk).
After thinking about it, I resolved I wanted something more inclusive than a single painter bot and less general than one publishing any great artwork. Following an incredible association (I am Italian → Italians are not so present on Bluesky → I could bring something more about Italy on this platform) I decided the bot would be on Italian art, in particular Italian paintings.
Another choice I made was to be kind to followers and not overwhelm them with a picture every 15 minutes, even though, given the multitude of Italian artworks, it would be quite easy to publish at that frequency. 3 images per day seemed like a good number.
Last point was to make the project as inexpensive as possible, as in the case of whilemodeltrains.com. Probably, this time I could even avoid paying for a domain.
So now the goal was clear: building a Python bot of Italian art publishing 3 images per day on Bluesky.
Looking closely at the last sentence, I could not help but think that I had seen something similar in the past. It didn’t take long to find that the unconscious source of inspiration was the blog post Building a Twitter art bot with Python, AWS, and socialist realism art by Vicki Boykis, one of the best bloggers and practitioners in the data realm.
Basically, instead of socialist realism I would publish Italian paintings, and instead of Twitter I would publish on Bluesky. On the technical side I would of course keep Python, but not AWS, since I wanted to spend as little as possible.
Project
Having the idea, I needed to outline a roadmap to get the bot live and working. The project could be divided into 3 phases:
- Collecting the images: retrieving and selecting the images I want to publish
- Building the Python bot: implementing the actual bot that takes an image and publishes it on Bluesky
- Deployment: publishing the bot online
Collecting the images
Reading Vicki’s post again, I found out that her source for images was WikiArt, which is basically the Wikipedia of artworks. There you can find a huge number of artworks, divided by nationality, genre, art movement and so on.
The really good thing, from a developer point of view, is that it has a documented API.
From behind the curtain - Gaetano Bellei
First of all, I wanted to get all the artists available, and there is an endpoint just for that
import requests
import json

# inPublicDomain accepts true or false; here we keep only public domain artists
response = requests.get(
    "http://www.wikiart.org/en/App/Artist/AlphabetJson?v=new&inPublicDomain=true"
)
if response.status_code == 200:
    content = json.loads(response.text)
    with open("artists.json", "w") as file:
        json.dump(content, file, indent=4)
Unfortunately, I did not find in the documentation how to select just the Italian artists. So I was in for a little detective work.
For each painter we get a dictionary with relevant information like biographical entries, the `url`, and a `dictionaries` field (actually it is mistyped everywhere as `dictonaries`, but I make way bigger errors, so who am I to judge?). Here is an example with Vincent van Gogh
{
    "contentId": 204915,
    "artistName": "Vincent van Gogh",
    "url": "vincent-van-gogh",
    "lastNameFirst": "van Gogh Vincent ",
    "birthDay": "/Date(-3684528000000)/",
    "deathDay": "/Date(-2506464000000)/",
    "birthDayAsString": "March 30, 1853",
    "deathDayAsString": "July 29, 1890",
    "image": "https://uploads8.wikiart.org/images/vincent-van-gogh.jpg!Portrait.jpg",
    "wikipediaUrl": "https://en.wikipedia.org/wiki/Vincent_van_Gogh",
    "dictonaries": [16743, 317]
}
and this is for Sandro Botticelli
{
    "contentId": 188828,
    "artistName": "Sandro Botticelli",
    "url": "sandro-botticelli",
    "lastNameFirst": "Botticelli Sandro ",
    "birthDay": "/Date(-16567372800000)/",
    "deathDay": "/Date(-14504486400000)/",
    "birthDayAsString": "c.1445",
    "deathDayAsString": "May 17, 1510",
    "image": "https://uploads0.wikiart.org/images/sandro-botticelli.jpg!Portrait.jpg",
    "wikipediaUrl": "http://en.wikipedia.org/wiki/Sandro_Botticelli",
    "dictonaries": [302, 2451, 925]
}
and Leonardo da Vinci
{
    "contentId": 225091,
    "artistName": "Leonardo da Vinci",
    "url": "leonardo-da-vinci",
    "lastNameFirst": "da Vinci Leonardo",
    "birthDay": "/Date(-16337462400000)/",
    "deathDay": "/Date(-14221785600000)/",
    "birthDayAsString": "April 15, 1452",
    "deathDayAsString": "May 2, 1519",
    "image": "https://uploads0.wikiart.org/images/leonardo-da-vinci.jpg!Portrait.jpg",
    "wikipediaUrl": "http://en.wikipedia.org/wiki/Leonardo_da_Vinci",
    "dictonaries": [303, 925]
}
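As an aside, the `birthDay` and `deathDay` fields use the .NET-style `/Date(milliseconds)/` format, with milliseconds relative to the Unix epoch. The bot does not need them, but if you ever wanted the actual dates, a small helper (my own sketch, not part of the bot) could decode them:

```python
import re
from datetime import datetime, timezone

def parse_wikiart_date(raw):
    """Decode WikiArt's /Date(ms)/ strings into UTC datetimes."""
    match = re.fullmatch(r"/Date\((-?\d+)\)/", raw)
    if match is None:
        return None
    millis = int(match.group(1))
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

print(parse_wikiart_date("/Date(-3684528000000)/").date())  # 1853-03-30, matching van Gogh's birthDayAsString
```

Negative values are fine here: they simply mean dates before 1970.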
It looks like the cryptic `dictonaries` field could be helpful for finding Italian painters: they should be those containing the value 925. Looking for occurrences of this number confirmed the idea. I later figured out that there was an endpoint returning this information, but who does not want to play some CTRL+F?
Since the API provided an endpoint to list all the paintings of a painter given its `url` property, I wrote this piece of code to keep only the artists whose `dictonaries` value contains 925 and to get their paintings’ information
ITALIAN_CODE = 925

def get_paintings_by_artist(artist_url):
    url = f"https://www.wikiart.org/en/App/Painting/PaintingsByArtist?artistUrl={artist_url}&json=2"
    response = requests.get(url)
    if response.status_code == 200:
        return json.loads(response.text)

italian_artists = [artist for artist in artists if ITALIAN_CODE in artist["dictonaries"]]

paintings_by_artist = {
    artist["url"]: get_paintings_by_artist(artist_url=artist["url"])
    for artist in italian_artists
}

# Keep only the artists for which the API returned a non-empty list of paintings
paintings_by_artist = {
    artist_url: value
    for artist_url, value in paintings_by_artist.items()
    if value is not None and len(value) > 0
}
Each painting object contained information like title, artist name, year of completion and a url to retrieve the image. For instance this is the one for “The Birth of Venus”
{
    "title": "The Birth of Venus",
    "contentId": 189114,
    "artistContentId": 188828,
    "artistName": "Sandro Botticelli",
    "completitionYear": 1485,
    "yearAsString": "1485",
    "width": 1600,
    "image": "https://uploads6.wikiart.org/images/sandro-botticelli/the-birth-of-venus-1485(1).jpg!Large.jpg",
    "height": 1067
}
Having this information, we can download the images. To make things easier I saved each image under its `contentId`, since it is unique. I had information for around 19500 paintings, so I decided to add a little sleep between downloads to avoid overwhelming the API
import requests
from itertools import chain
import time
from random import random

def save_image(image, path):
    with open(path, "wb") as file:
        file.write(image)

unpacked_paintings = list(chain(*paintings_by_artist.values()))

for item in unpacked_paintings:
    time.sleep(0.1 + round(random(), 2))
    image = requests.get(item["image"])
    if image.status_code == 200:
        save_image(image.content, f"images/{item['contentId']}.JPG")
At the end of this operation I had roughly 16000 images, because some were not available. I didn’t make a big deal out of it, since they were already more than enough for my scope.
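Had I wanted to recover some of the missing ones, a simple retry with exponential backoff would have been the first thing to try. Here is a hypothetical sketch (`download_with_retries` is my own helper, not part of the original pipeline):

```python
import time

def download_with_retries(fetch, url, attempts=3, base_delay=1.0):
    """Call fetch(url) until it returns a 200 response,
    sleeping with exponential backoff between failed attempts."""
    for attempt in range(attempts):
        response = fetch(url)
        if response.status_code == 200:
            return response
        time.sleep(base_delay * 2 ** attempt)
    # Give up after the configured number of attempts
    return None
```

Passing `requests.get` as `fetch`, e.g. `download_with_retries(requests.get, item["image"])`, would plug it straight into the download loop above.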
Choosing the paintings
Now it was time for a fun activity (that is also really important when you have to build an image recognition model): looking at the images!
Discarding Round
Indeed I found that many had issues:

- some had low quality, for instance with the frame and the wall around the painting included

The Three Dead and Three Loud - Jacopo Bellini

- others were not paintings, or were really uninteresting details (at least to my untrained eyes)

Procession of the Queen of Sheba (detail) - Piero della Francesca (1466)

- a few subjects were overrepresented, like scenes of the Virgin holding the Child, crucifixions and beheadings. Of course this is due to the cultural context of the painters: during the Renaissance, and before, Biblical themes were central (also because members of the clergy were among the most important patrons)

The Virgin and Child - Giovanni Antonio Boltraffio (1480)

After this discarding phase around 8000 images were left.
Picking Round
Then I had to pick my favorites. I wanted to reduce the collection to around 2000, given the goal of publishing 3 images per day. The criteria I used for the selection were:

- picking really famous paintings. Could I call it an Italian Art Bot if I did not include the best-known masterpieces?

Mona Lisa - Leonardo da Vinci (1519)

- selecting artworks from different art movements and styles, otherwise more recent centuries would be underrepresented

Apparenti Tinte - Carla Accardi (1990)

- choosing painters less known to the larger public

Cosmic Sunflowers - Chiara Magni (2020)

Overall this choice was based on personal taste. After countless hours of artwork selection, I was down to 1761 paintings. Enough for a year and a half without bothering followers. Now it was time to build the bot.
Building the Bot
Before diving into the code for the bot, let’s take a look at the architecture and the database structure.
Architecture
- on the left, a PostgreSQL database where I dumped the information about the selected paintings
- in the middle, a monolith containing the Python code that gets the information from the database, selects the image and publishes the post
- on the right, Bluesky
Database Table Structure
A single PostgreSQL table is enough to contain all the information the bot needs.
The fields are:
- `title`: title of the painting
- `artistName`: name of the artist
- `year`: completion year (when available)
- `contentId`: unique key of the image, useful to retrieve the file since it is named after it
- `isPublished`: a boolean field tracking whether the artwork has already been published, to avoid posting the same entry multiple times
Here are 5 of the entries:

| title | artistName | year | contentId | isPublished |
| --- | --- | --- | --- | --- |
| Apparenti tinte | Carla Accardi | 1990 | 319335 | true |
| Ecstasy of St. Francis | Giotto | 1300 | 192729 | false |
| The Birth of Saint John the Baptist | Artemisia Gentileschi | null | 9223372032559906181 | false |
| The Last Judgement | Michelangelo | 1541 | 193265 | false |
| St. Augustine in his cell | Sandro Botticelli | 1490 | 189106 | false |
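The post does not show the DDL, but a table matching these fields could look like the sketch below (the table name `paintings` is my assumption; note that Postgres folds unquoted identifiers to lowercase, which is fine as long as the queries also use unquoted names, as the bot's do):

```python
# Hypothetical DDL for the table described above (table name assumed)
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS paintings (
    contentId   BIGINT PRIMARY KEY,
    title       TEXT NOT NULL,
    artistName  TEXT NOT NULL,
    year        INTEGER,
    isPublished BOOLEAN NOT NULL DEFAULT FALSE
);
"""
```

`contentId` doubles as the primary key since the image files are named after it, and `isPublished` defaults to false so every painting starts out unposted.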
Posting on Bluesky
Luckily, Bluesky is developer friendly and provides detailed instructions on programmatically communicating with the platform, including loading media. I borrowed most of the code to interact with the platform from this blog post.
Here you can see the code for getting a session, loading an image to a blob and publishing a post with the image
import requests
import os
from dotenv import load_dotenv
import json
from datetime import datetime, timezone

load_dotenv()

def get_session():
    resp = requests.post(
        "https://bsky.social/xrpc/com.atproto.server.createSession",
        json={
            "identifier": os.environ.get("BLUESKY_HANDLE"),
            "password": os.environ.get("BLUESKY_APP_PASSWORD"),
        },
    )
    resp.raise_for_status()
    session = resp.json()
    return session

def load_image_to_blob(path, session):
    # derive the mimetype from the file extension (e.g. "JPG" -> "image/jpeg")
    *_, suffix = path.rsplit(".", maxsplit=1)
    IMAGE_MIMETYPE = "image/jpeg" if suffix.lower() in ("jpg", "jpeg") else f"image/{suffix.lower()}"
    with open(path, "rb") as f:
        img_bytes = f.read()
    # this size limit is specified in the app.bsky.embed.images lexicon
    if len(img_bytes) > 1000000:
        raise Exception(
            f"image file size too large. 1000000 bytes maximum, got: {len(img_bytes)}"
        )
    resp = requests.post(
        "https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
        headers={
            "Content-Type": IMAGE_MIMETYPE,
            "Authorization": "Bearer " + session["accessJwt"],
        },
        data=img_bytes,
    )
    resp.raise_for_status()
    blob = resp.json()["blob"]
    return blob

def post_image(path, text, session, alt=None):
    blob = load_image_to_blob(path, session)
    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    if alt is None:
        alt = text
    post_with_image = {
        "$type": "app.bsky.feed.post",
        "text": text,
        "createdAt": now,
        "embed": {
            "$type": "app.bsky.embed.images",
            "images": [{"alt": alt, "image": blob}],
        },
    }
    post_with_image_resp = requests.post(
        "https://bsky.social/xrpc/com.atproto.repo.createRecord",
        headers={"Authorization": "Bearer " + session["accessJwt"]},
        json={
            "repo": session["did"],
            "collection": "app.bsky.feed.post",
            "record": post_with_image,
        },
    )
    post_with_image_resp.raise_for_status()
    return json.dumps(post_with_image_resp.json(), indent=4)
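Since assembling the post record needs no network access, it can also be factored out and inspected on its own. Here is a small sketch of that idea (the `build_image_post_record` helper and its name are mine, not from the Bluesky docs; the dictionary it builds mirrors the one in `post_image` above):

```python
from datetime import datetime, timezone

def build_image_post_record(text, blob, alt=None):
    """Assemble the app.bsky.feed.post record with an image embed."""
    if alt is None:
        alt = text
    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "$type": "app.bsky.feed.post",
        "text": text,
        "createdAt": now,
        "embed": {
            "$type": "app.bsky.embed.images",
            "images": [{"alt": alt, "image": blob}],
        },
    }

# blob would normally come from uploadBlob; a placeholder works for inspection
record = build_image_post_record("Mona Lisa - Leonardo da Vinci (1519)", blob={"$type": "blob"})
```

This way the record structure can be unit-tested without hitting the API at all.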
Bot Helper functions
To avoid cramming everything into the main script, some simple helper functions are defined in a `utils.py` module: one formats the post text from the available information, one builds the path to retrieve the image from its `contentId`, and one checks that the provided token matches the one in the environment variables
from functools import wraps
import os
from flask import request, jsonify

def format_text(entry):
    # e.g. "Mona Lisa - Leonardo da Vinci (1519) 🇮🇹 #art"
    if entry["year"] is None:
        return f"{entry['title']} - {entry['artistName']} 🇮🇹 #art"
    else:
        return f"{entry['title']} - {entry['artistName']} ({int(entry['year'])}) 🇮🇹 #art"

def build_image_path(content_id, folder):
    return os.path.join(folder, f"{content_id}.JPG")

def verify_token(token):
    return token == os.environ.get("TOKEN")
The last function is used by a more complex one (written by ChatGPT): a decorator that makes the endpoint require authorization
def require_token(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        token = request.headers.get("Authorization")
        if not token or not token.startswith("Bearer "):
            return jsonify({"error": "Unauthorized"}), 401
        token = token.split(" ")[1]
        if not verify_token(token):
            return jsonify({"error": "Invalid token"}), 401
        return func(*args, **kwargs)
    return wrapper
It is not a perfect solution, but for this use case it is good enough.
The Bot
Having all the above, the bot consists of a Flask app with a single endpoint, `post_image`. Let’s look at the script.
First the imports and the app declaration
import os
from dotenv import load_dotenv
from flask import Flask, jsonify
import psycopg2
from post_artwork import get_session, post_image
from utils import format_text, build_image_path, require_token
app = Flask(__name__)
Then some functions that I refactored out of the main function (BDD, blog-post-driven development) for:
- instantiating the database connection
def instantiate_database_connection():
    connection = psycopg2.connect(
        database=os.environ.get("POSTGRES_DATABASE"),
        host=os.environ.get("POSTGRES_HOST"),
        user=os.environ.get("POSTGRES_USER"),
        password=os.environ.get("POSTGRES_PASSWORD"),
        port=os.environ.get("POSTGRES_PORT"),
    )
    return connection
- selecting an entry of the database among the ones not yet published
def select_painting(table_name, cursor):
    # Get the info of a random not-yet-published painting
    columns = ["title", "artistName", "year", "contentId", "isPublished"]
    cursor.execute(
        f"""SELECT {", ".join(columns)} FROM {table_name} WHERE isPublished = False ORDER BY RANDOM() LIMIT 1"""
    )
    entry = cursor.fetchone()
    entry = dict(zip(columns, entry))
    return entry
- posting the image with the description on Bluesky
def post_painting(folder_images, entry):
    # Post painting
    session = get_session()
    path = build_image_path(content_id=entry["contentId"], folder=folder_images)
    text = format_text(entry)
    post_image(path=path, text=text, session=session)
    return text
- updating the `isPublished` field for the selected entry
def update_is_published(table_name, cursor, entry):
    # Mark the entry as published (parameterized query to avoid SQL injection)
    cursor.execute(
        f"""UPDATE {table_name} SET isPublished = True WHERE contentId = %s""",
        (entry["contentId"],),
    )
the endpoint that is the core of the bot ends up being
@app.route("/post_image", methods=["GET"])
@require_token
def _post_image():
    load_dotenv()
    table_name = os.environ.get("TABLE_NAME")
    folder_images = os.environ.get("FOLDER_IMAGES")
    connection = instantiate_database_connection()
    # Create a cursor object to execute SQL queries
    cursor = connection.cursor()
    entry = select_painting(table_name, cursor)
    text = post_painting(folder_images, entry)
    update_is_published(table_name, cursor, entry)
    # Commit the changes and close the connection
    connection.commit()
    connection.close()
    return jsonify(f"Successfully posted {text}")
and to close the script, the indispensable
if __name__ == "__main__":
    app.run()
Deployment
Hosting
My default option when deploying a personal project is Vercel, which comes with an excellent free tier.
I use it for this very blog and While Model Trains.
Unfortunately, this time the size of the Serverless Function was too big, due to the images and the `.git` folder.
Head studies for the battle of Anghiari - Leonardo da Vinci (1504)
At first I tried to reduce the size of the function, but when I realized my efforts were unsuccessful I opted for a workaround.
I kept the Postgres database on Vercel and moved the Serverless Function to Render. It was my first time using it, but the experience was actually neat.
The repo is hosted on Github, and when a change is pushed to the main branch the Flask app on Render is updated.
Scheduling
Now I had to schedule the posts. It turned out that running cron jobs 3 times a day is a premium feature on both services, and given the requirement of spending as little as possible, I looked for another free option before pulling out my credit card.
Luckily I found cron-job.org, which allows scheduling cron-jobs for free.
The only problem is that Render spins down a free web service after 15 minutes without inbound traffic, and the first interaction when idle takes a little more than 30 seconds. Guess what the maximum timeout on cron-job.org is? Right, 30 seconds.
At first (up to the presentation at Pycon), I solved this with another cron job that pinged the service at regular intervals to keep it alive. That was done by adding a dummy endpoint `/ping` to the Flask app that would do nothing except keep the service awake. This implementation has the drawbacks of making unnecessary calls to the service and of basically wasting the 750 free instance hours provided by Render on just 3 posts a day.
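The keep-alive endpoint itself was trivial. The post does not show its code, so this is my own reconstruction of what it could look like as a standalone sketch:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Does nothing useful: its only purpose is to generate inbound
    # traffic so Render does not spin the free instance down
    return jsonify("pong")
```

In the real bot this route would of course live in the same Flask app as `post_image`.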
Therefore, I browsed Reddit for alternatives to cron-job.org and found FastCron, which has a cron-job timeout of 60 seconds.
If you want to see the final result you can see it live on Bluesky (it does not require registration)
Here is a preview if you don't want to click on the link and follow the bot 😭
What did I learn?
This project has been a cool learning experience. In particular I enjoyed:
- learning how to build a Bluesky Bot (obviously)
- integrating different services, some of which I used for the first time and could consider for my next side projects
- discovering a bunch of paintings and artists
- being a speaker at Pycon Italia thanks to this project!
(Possible) Next Steps
Some of the next steps I thought of for this project are:
- cleaning up the code a bit and making it open source
- retrieving and showing the location where the paintings are exhibited as one of the first followers of the bot suggested
- adding more meaningful alt text (maybe using an LLM to describe the image?)
- promoting the bot to have more people getting to know famous and less famous Italian paintings
Stay tuned for the new features!