Cleaning Metal Archives Data Using SQL

Since we all know multi-subgenre Metal bands are cooler, I parsed the genre field so each band could represent multiple subgenres.
SQL
Data Cleaning
Author

Kevin B. Miller

Published

July 8, 2024

Database Connection

Metal bands often incorporate influences from multiple subgenres, or shift subgenres throughout their career. Accordingly, many bands in the Metal Archives have a genre field with multiple subgenres. Rather than forcing each band into a primary subgenre, I wanted each band to be characterized as having a presence or absence of each subgenre. The code in this document accomplished these steps using SQL code by connecting to the metal_archives_table from the metallum_bands.db via an RSQLite instance.

# import library from R
library("DBI")
library(RSQLite)
library(knitr)
# set options to print more rows (up to 100)
opts_chunk$set(echo=TRUE, sql.max.print=100)
# save connection to metallum_bands.db as con
con <- dbConnect(RSQLite::SQLite(), "metallum_bands.db")
SELECT * 
FROM metal_archives_table;
Displaying records 1 - 10
band_name country genre status band_id
A // Solution United States Crust Punk/Thrash Metal Split-up 0
A 12 Gauge Tragedy United States Deathcore Split-up 1
A Balance of Power United States Melodic Death Metal/Metalcore Active 2
A Band Named Jon United States Brutal Death Metal/Grindcore Active 3
A Band of Orcs United States Death/Thrash Metal Active 4
A Baptism by Fire Canada Progressive Power Metal Active 5
A Bastard’s Breath United States Raw Black Metal Active 6
A Belt Above Black United States Melodic Death Metal/Metalcore Active 7
A Billion Limbs United Kingdom Groove Metal/Deathcore Active 8
A Binding Spirit Germany Black Metal/Ambient Active 9

Steps for Subgenre Structure

1. Create TEMP Table for Processing Subgenres

To start, I created a TEMP table, processed_genres, to process the genre field in the data.

CREATE TEMP TABLE processed_genres AS
SELECT 
    band_id, 
    band_name,
    country,
    genre, 
    status
FROM metal_archives_table;

2. Create a New Column for Each Basic Metal Archives Subgenre

Once processed_genres was created, I added an integer column for each base subgenre from the Metal Archives.

ALTER TABLE processed_genres ADD COLUMN black INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN death INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN doom INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN electronic_industrial INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN experimental INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN folk INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN gothic INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN grindcore INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN groove INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN heavy INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN metalcore_deathcore INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN power INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN progressive INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN speed INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN symphonic INTEGER DEFAULT 0;
ALTER TABLE processed_genres ADD COLUMN thrash INTEGER DEFAULT 0;
SELECT * 
FROM processed_genres
LIMIT 10;
Displaying records 1 - 10
band_id band_name country genre status black death doom electronic_industrial experimental folk gothic grindcore groove heavy metalcore_deathcore power progressive speed symphonic thrash
0 A // Solution United States Crust Punk/Thrash Metal Split-up 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 A 12 Gauge Tragedy United States Deathcore Split-up 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 A Balance of Power United States Melodic Death Metal/Metalcore Active 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 A Band Named Jon United States Brutal Death Metal/Grindcore Active 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 A Band of Orcs United States Death/Thrash Metal Active 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 A Baptism by Fire Canada Progressive Power Metal Active 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 A Bastard’s Breath United States Raw Black Metal Active 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 A Belt Above Black United States Melodic Death Metal/Metalcore Active 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 A Billion Limbs United Kingdom Groove Metal/Deathcore Active 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 A Binding Spirit Germany Black Metal/Ambient Active 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

3. Identify Bands with Appropriate Subgenres

The code block below parsed the genre field in processed_genres and added a 1 to the value of a given subgenre when the band’s genre field included that subgenre. While I attempting to align with the Metal Archives’ original classifications, I made a few exceptions when an unclassified subgenre alone could be classified under an existing subgenre, without any additional phrasing or another subgenre. Examples of this are shown below, and reflected by inclusions to the code block below.

  • Southern alone would get put with both Doom/Stoner/Sludge and Groove

  • Goregrind alone would get put with Grindcore

  • Neoclassical alone would get put with Symphonic

Conversely, while Post-Metal typically went under Doom/Stoner/Sludge, anything with Post- in that subgenre was accompanied by Doom, Stoner, or Sludge. This suggests that Post-Metal alone was not enough to get grouped under Doom/Stoner/Sludge, and was therefore not classified under that subgenre here (Post-Metal and other unclassified subgenres are shown in section 3.A. Identify Records Not Belonging to a Subgenre). Since there was a vast amount of data to work with, no further processing was conducted for bands of unclassified subgenres.

UPDATE processed_genres SET
    black = CASE WHEN genre LIKE '%Black%' THEN 1 ELSE 0 END,
    death = CASE WHEN genre LIKE '%Death%' THEN 1 ELSE 0 END,
    doom = CASE WHEN genre LIKE '%Doom%' 
                OR genre LIKE '%Stoner%' 
                OR genre LIKE '%Sludge%' 
                OR genre LIKE '%Southern%' THEN 1 ELSE 0 END,
    electronic_industrial = CASE WHEN genre LIKE '%Electronic%' 
                                 OR genre LIKE '%Industrial%' THEN 1 ELSE 0 END,
    experimental = CASE WHEN genre LIKE '%Experimental%' 
                        OR genre LIKE '%Avant-garde%' THEN 1 ELSE 0 END,
    folk = CASE WHEN genre LIKE '%Folk%' 
                OR genre LIKE '%Viking%'
                OR genre LIKE '%Pagan%' THEN 1 ELSE 0 END,
    gothic = CASE WHEN genre LIKE '%Gothic%' THEN 1 ELSE 0 END,
    grindcore = CASE WHEN genre LIKE '%Grindcore%' 
                     OR genre LIKE '%Goregrind%' THEN 1 ELSE 0 END,
    groove = CASE WHEN genre LIKE '%Groove%' 
                  OR genre LIKE '%Southern%' THEN 1 ELSE 0 END,
    heavy = CASE WHEN genre LIKE '%Heavy%' THEN 1 ELSE 0 END,
    metalcore_deathcore = CASE WHEN genre LIKE '%Metalcore%' 
                               OR genre LIKE '%Deathcore%' THEN 1 ELSE 0 END,
    power = CASE WHEN genre LIKE '%Power%' THEN 1 ELSE 0 END,
    progressive = CASE WHEN genre LIKE '%Progressive%' THEN 1 ELSE 0 END,
    speed = CASE WHEN genre LIKE '%Speed%' THEN 1 ELSE 0 END,
    symphonic = CASE WHEN genre LIKE '%Symphonic%' 
                     OR genre LIKE '%Neoclassical%' THEN 1 ELSE 0 END,
    thrash = CASE WHEN genre LIKE '%Thrash%' THEN 1 ELSE 0 END;
SELECT * 
FROM processed_genres
LIMIT 10;
Displaying records 1 - 10
band_id band_name country genre status black death doom electronic_industrial experimental folk gothic grindcore groove heavy metalcore_deathcore power progressive speed symphonic thrash
0 A // Solution United States Crust Punk/Thrash Metal Split-up 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 A 12 Gauge Tragedy United States Deathcore Split-up 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
2 A Balance of Power United States Melodic Death Metal/Metalcore Active 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
3 A Band Named Jon United States Brutal Death Metal/Grindcore Active 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
4 A Band of Orcs United States Death/Thrash Metal Active 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
5 A Baptism by Fire Canada Progressive Power Metal Active 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
6 A Bastard’s Breath United States Raw Black Metal Active 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 A Belt Above Black United States Melodic Death Metal/Metalcore Active 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
8 A Billion Limbs United Kingdom Groove Metal/Deathcore Active 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0
9 A Binding Spirit Germany Black Metal/Ambient Active 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

3.A. Identify Records Not Belonging to a Subgenre

ALTER TABLE processed_genres ADD COLUMN total_subgenres INTEGER DEFAULT 0;
UPDATE processed_genres
SET total_subgenres = 
    black + death + doom + electronic_industrial + experimental + folk +  gothic + 
    grindcore + groove + heavy + metalcore_deathcore + power + progressive + speed + 
    symphonic + thrash;
SELECT COUNT(*) AS subgenreless_bands 
FROM processed_genres
WHERE total_subgenres = 0;
1 records
subgenreless_bands
403
SELECT 
    genre, 
    COUNT(*) AS number_of_bands
FROM processed_genres
GROUP BY genre
HAVING total_subgenres = 0
ORDER BY number_of_bands DESC
LIMIT 10;
Displaying records 1 - 10
genre number_of_bands
Post-Metal 118
Crossover 90
Various 73
Post-Metal/Rock 27
Crossover/Hardcore 8
Atmospheric Post-Metal 7
Grind ‘n’ Roll 6
Dark Ambient 6
Post-Metal/Shoegaze 3
Various, Shred 2

4. Save Table

The code blocks below save the TEMP table processed_genres as dc_processed_genres, with the dc_ prefix denoting the dummy coding-like structure of the subgenre columns. Dropping any pre-existing tables of this name ensures the new table is created, even if a table with the same name exists.

DROP TABLE IF EXISTS dc_processed_genres;
CREATE TABLE dc_processed_genres AS 
SELECT * 
FROM processed_genres;
dbDisconnect(con)