I spent the last couple weeks enjoying Thanksgiving, time with my kids and family, and learning how to extract data from an API using Python (with the assistance of ChatGPT).
After writing my post on the music industry, I wanted to find data specifically on live music. I scoured the internet and didn’t find exactly what I wanted, so I circled back to a website I frequent quite a bit: setlist.fm. This site is like Wikipedia for concert set lists. Anyone can submit the set list for any show they attend, and can also input what time the headliner started playing (which is useful if you’re not really keen on seeing the opener). Setlist.fm also has a public API (an application programming interface, which lets two pieces of software communicate with each other). Here, users can apply for an API key and use it to pull data from the website for personal or other use.
Where Python and ChatGPT come in.
One of the easiest ways to extract data from an API is using Python. Now I know a little Python, but I am definitely not an expert yet. So I knew I would need a little assistance on this project in order to get the data I wanted. If you’re a programmer, data analyst, or anyone who works with code regularly, you are very familiar with Googling different ways to write code. I’ve spent hours searching the internet for solutions to bugs in my code. I only recently started using ChatGPT to correct my code, and it is life-changing. The time it saves when simplifying and debugging code is amazing. ChatGPT basically does the Googling for you, and presents the answers to all your coding questions almost exactly how you need them. Honestly, it’s not a bad way to learn to code either. I definitely learned a lot about Python doing this project.
Once I had reviewed the API documentation on setlist.fm and applied for an API key, I started coding. Or really, I started asking ChatGPT some questions and testing my code in a Jupyter notebook. I asked ChatGPT to revise the code several times to fit my needs, but in the end I had a code block that gave me what I needed: a dataset of all of Taylor Swift’s concerts and set lists from the start of her touring career through the end of November 2023. I’ve pasted the code I used at the bottom of this post.
Taylor Swift - The Eras Tour Visualization
After cleaning up the data a bit, which included splitting each show’s song list into one column per song, I was able to import the dataset into Tableau and start playing with it. I had a little fun with this visualization, and am happy with the result. Although I’ve called it Taylor Swift - The Eras Tour Dashboard, it’s more of a visualization, as it’s not a “plug and play” dashboard.
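For anyone who wants to do that song-splitting step in Python rather than by hand, here is a minimal sketch. It assumes the CSV has a single "song" column holding names joined by a comma and a space, as the script at the bottom of this post writes them. The function name split_songs, the song_1, song_2, ... column names, and the sample rows are made up for illustration. One caveat: a song title that itself contains a comma followed by a space would be split in two.

```python
import csv
import io


def split_songs(rows, song_column="song", separator=", "):
    """Spread a joined song string across song_1, song_2, ... columns.

    Returns (fieldnames, wide_rows). Shorter shows are padded with empty strings.
    """
    rows = list(rows)
    split_lists = [
        row[song_column].split(separator) if row.get(song_column) else []
        for row in rows
    ]
    widest = max((len(songs) for songs in split_lists), default=0)
    song_fields = [f"song_{i}" for i in range(1, widest + 1)]

    wide_rows = []
    for row, songs in zip(rows, split_lists):
        # Keep every original column except the joined song string
        wide = {key: value for key, value in row.items() if key != song_column}
        padded = songs + [""] * (widest - len(songs))
        wide.update(zip(song_fields, padded))
        wide_rows.append(wide)

    base_fields = [key for key in (rows[0] if rows else {}) if key != song_column]
    return base_fields + song_fields, wide_rows


# Placeholder data standing in for the real CSV file
sample = io.StringIO(
    "eventDate,city,song\n"
    '01-01-2020,Nashville,"Song A, Song B, Song C"\n'
    '02-01-2020,Chicago,"Song A, Song D"\n'
)
fieldnames, wide_rows = split_songs(csv.DictReader(sample))
print(fieldnames)  # ['eventDate', 'city', 'song_1', 'song_2', 'song_3']
```

The widened rows can then be written back out with csv.DictWriter using the returned fieldnames, which gives Tableau one column per song position.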
The visualization covers the entirety of Taylor’s performing career, with a focus on her five world tours. It should have been seven or eight tours by now, but Lover Fest was cancelled in summer 2020 due to the COVID-19 pandemic (you’ll see a spot saved on the timeline for that tour, but with no shows), and folklore never had a dedicated tour.
The timeline starts in May 2009 with her first world tour, for her album Fearless. The size of the circles represents the number of songs played at each show, and The Eras Tour is clearly the longest live show. Each Eras Tour concert had Taylor playing at least 48 songs! That’s twice the length of what a typical headliner plays at their concerts these days. She really is amazing. You can also clearly see that Taylor’s most played song is Love Story, which she has performed 546 times now (do you think she’s sick of it yet?). And despite distancing herself from her country roots through the years, Nashville is still the city where she’s performed the most.
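Tallies like "most played song" and "most visited city" can also be cross-checked in Python. The sketch below is only an illustration, not how the numbers in the viz were produced. It assumes rows shaped like the CSV from the script at the bottom of this post (a "city" column and a comma-joined "song" column), and the helper names and sample rows are hypothetical.

```python
from collections import Counter


def count_plays(rows, song_column="song", separator=", "):
    """Count how many times each song appears across all shows."""
    counts = Counter()
    for row in rows:
        songs = row.get(song_column) or ""
        counts.update(name for name in songs.split(separator) if name)
    return counts


def count_shows_by_city(rows, city_column="city"):
    """Count how many shows were played in each city."""
    return Counter(row[city_column] for row in rows if row.get(city_column))


# Placeholder rows standing in for the real dataset
rows = [
    {"city": "Nashville", "song": "Song A, Song B"},
    {"city": "Nashville", "song": "Song A"},
    {"city": "Chicago", "song": "Song A, Song C"},
]

print(count_plays(rows).most_common(1))          # [('Song A', 3)]
print(count_shows_by_city(rows).most_common(1))  # [('Nashville', 2)]
```

Note that a song played twice in one show is counted twice here, which may or may not match how a given chart counts performances.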
Please check out my viz on Tableau Public and let me know what you think!
My Python code used to extract data from the setlist.fm API
import requests
import csv
import os
import time

# Change the working directory to a new path
new_directory = "[insert directory path]"
os.chdir(new_directory)

# Replace 'your_api_key' with your own API key; swap artist_id for another artist's ID if needed
api_key = 'your_api_key'
artist_id = '20244d07-534f-4eff-b4d4-930878889970'
endpoint = f'https://api.setlist.fm/rest/1.0/artist/{artist_id}/setlists'

# Specify the initial parameters
params = {
    'p': 1,  # Start with the first page
}
all_results = []

while True:
    # Make the API request
    headers = {
        'Accept': 'application/json',
        'x-api-key': api_key,
    }
    response = requests.get(endpoint, params=params, headers=headers)

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Parse and work with the response data
        data = response.json()

        # Extract the setlists from the current page and add them to the overall results
        current_results = data.get('setlist', [])
        all_results.extend(current_results)

        # Check if there are more pages
        total = data.get('total', 0)
        items_per_page = data.get('itemsPerPage', 0)
        current_page = data.get('page', 0)
        if current_page * items_per_page < total:
            # Update the parameters for the next page
            params['p'] += 1
        else:
            # Break the loop if there are no more pages
            break
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        break

    # Introduce a delay between requests to avoid hitting rate limits
    time.sleep(2)  # Sleep for 2 seconds (adjust as needed)

# Write all results to a CSV file
csv_file_name = 'taylor_swift.csv'
csv_header = ["artist_name", "eventDate", "city", "state", "country", "tour", "song"]  # Replace with your desired columns

with open(csv_file_name, mode='w', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)

    # Write the header
    csv_writer.writerow(csv_header)

    # Write setlist data
    for result in all_results:
        # 'city' is nested under 'venue'; default to {} so a missing key does not raise an error
        city = result.get('venue', {}).get('city', {})

        # Extract data from the result and write to CSV
        row_data = [
            result.get('artist', {}).get('name', ''),
            result.get('eventDate', ''),
            city.get('name', ''),
            city.get('state', ''),
            city.get('country', {}).get('name', ''),
            result.get('tour', {}).get('name', ''),
            # Extracting song names from every set under the 'sets' key
            ', '.join([song.get('name', '') for set_item in result.get('sets', {}).get('set', []) for song in set_item.get('song', [])]),
        ]
        csv_writer.writerow(row_data)

print(f"CSV file '{csv_file_name}' created successfully.")
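One optional tweak if you plan to share a script like this: rather than pasting the API key into the code, it can be read from an environment variable. The sketch below is one way to do that. The variable name SETLISTFM_API_KEY and both helper functions are assumptions made for illustration, not something setlist.fm prescribes; the header names are the same ones used in the script above.

```python
import os


def load_api_key(env_var="SETLISTFM_API_KEY", environ=None):
    """Read the API key from an environment variable (the name is a made-up convention)."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your setlist.fm API key."
        )
    return key


def build_headers(api_key):
    """Build the request headers used for every API call in the script above."""
    return {
        "Accept": "application/json",
        "x-api-key": api_key,
    }


# Example with a fake environment so nothing real is required to try it
headers = build_headers(load_api_key(environ={"SETLISTFM_API_KEY": "example-key"}))
print(headers["x-api-key"])  # example-key
```

In the real script, calling load_api_key() with no arguments reads from the actual environment, so the key never has to appear in anything you publish.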