(logo for the 2001 thrash metal band python)

Pythonic Metal

PyData Amsterdam 2017-04-09

Iain Barr


  • Counting and Natural Language
  • Comparing documents
  • Generating Lyrics


During this talk I'm going to talk about "Metal" as a genre. People have strong opinions about this. For the duration of this talk I am going to use the term "Metal" to refer to the music whose lyrics I am analysing. I apologise in advance.

The Data

  • Lyrics for over 200,000 songs scraped from www.darklyrics.com
  • Mostly English, but with a whole range of other languages in there
  • No plans for release

Counting and Natural Language

How can we Quantify Natural Language?


  • Order:
    • "Man bites Dog" vs "Dog bites Man"
  • Context:
    • "I want to kill everyone" vs "Don't say things like 'I want to kill everyone'"
  • Rare words


  • Ignore it all
  • Just count words
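As a minimal sketch of what "just count words" can look like in Python: the tokeniser below (lowercase, letters and apostrophes only) and the names `tokenize` and `word_counts` are illustrative assumptions, not the code used for the talk.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(documents):
    """Count every token across an iterable of lyric strings."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts

# Two toy "songs" standing in for the scraped lyrics.
songs = ["Dark, War, Burn, Pain", "Burn the dark; burn it all"]
counts = word_counts(songs)
print(counts.most_common(2))  # [('burn', 3), ('dark', 2)]
```

Order and context are thrown away here on purpose: each song becomes a bag of words.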

Word Frequencies


Why Does this look like it makes sense?

  • Implicitly we have a feeling for the relative frequency of words in the English language
  • Words like "death", "die", "blood" appear more frequently than expected
  • Can we make this idea explicit?

Metal vs Brown

  • We want to compare the frequency of words we see in metal lyrics to those in "standard" English
  • Use the Brown Corpus
  • Define "Metalness" of a word

    $$M = \log\left(\frac{f_{metal}}{f_{english}}\right) $$
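The definition above can be sketched in a few lines. This assumes two `Counter` objects of word counts; the `min_count` cut-off and the restriction to words present in both corpora are my assumptions to keep the ratio finite and stable, and the toy numbers are made up. In practice the English counts could come from the Brown Corpus, which ships with NLTK as `nltk.corpus.brown`.

```python
import math
from collections import Counter

def relative_frequencies(counts):
    """Turn raw counts into fractions of the corpus."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def metalness(metal_counts, english_counts, min_count=5):
    """M = log(f_metal / f_english) for words seen often enough in both corpora."""
    f_metal = relative_frequencies(metal_counts)
    f_english = relative_frequencies(english_counts)
    scores = {}
    for word in metal_counts.keys() & english_counts.keys():
        if metal_counts[word] >= min_count and english_counts[word] >= min_count:
            scores[word] = math.log(f_metal[word] / f_english[word])
    return scores

# Made-up counts, purely for illustration.
metal = Counter({"burn": 40, "the": 50, "committee": 10})
english = Counter({"burn": 5, "the": 50, "committee": 45})
scores = metalness(metal, english)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # most metal first
```

A word that is equally common in both corpora scores 0; positive scores are "more metal", negative scores "less metal".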

Metal Words

Most Metal

Rank Word Metalness
0 burn 4.11
1 cries 3.93
2 veins 3.89
3 eternity 3.87
4 breathe 3.84
5 beast 3.84
6 gonna 3.84
7 demons 3.84
8 ashes 3.81
9 soul 3.71

Metal Words

Least Metal

Rank Word Metalness
10174 particularly -6.03
10173 indicated -6.01
10172 secretary -5.98
10171 university -5.85
10170 committee -5.85
10169 relatively -5.77
10168 approximately -5.59
10167 noted -5.42
10166 chairman -5.38

Most Metal Song - Infinite Darkness, Tormentor

After when the sun will die
After the lights
Evil souls of the dark will
Wake up and fight

Will be
Dark, War, Burn, Pain

After, when the wrong is right
Pain is nice
Total destructions winds
Blow, and blow

Will be
Dark, War, Burn, Pain

Storm, wild, thunder, infinite darkness...
Storm, wild, thunder, infinite darkness...
Storm, wild, thunder, infinite darkness...
Storm, wild, thunder, infinite darkness...

Metallica's Emotional Arc


Harry Potter Emotional Arc


Comparing Documents

Raw Counts





Identifying "Important" Words

  • Different, but not that different
  • Common words are still very common
  • How can we amplify the differences?
  • One possibility is the Likelihood Ratio: $$ L_{w} = N_{w}\log{\frac{N_{w}}{E_{w}}} + \bar{N}_{w}\log{\frac{\bar{N}_{w}}{\bar{E}_{w}}} $$
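The slide does not define the symbols, so the sketch below rests on one plausible reading: $N_{w}$ is the count of word $w$ in the document of interest, $\bar{N}_{w}$ its count in everything else, and $E_{w}$, $\bar{E}_{w}$ the counts we would expect if the word had the same relative frequency in both. The function names are hypothetical.

```python
import math
from collections import Counter

def _term(observed, expected):
    """observed * log(observed / expected), taking 0 * log(0) as 0."""
    return observed * math.log(observed / expected) if observed > 0 else 0.0

def likelihood_ratio(target_counts, rest_counts):
    """Score each word by how badly a shared frequency explains its two counts."""
    target_total = sum(target_counts.values())
    rest_total = sum(rest_counts.values())
    scores = {}
    for word in target_counts.keys() | rest_counts.keys():
        n, n_bar = target_counts[word], rest_counts[word]
        # Pooled relative frequency: the "no difference" hypothesis.
        pooled = (n + n_bar) / (target_total + rest_total)
        scores[word] = (_term(n, target_total * pooled)
                        + _term(n_bar, rest_total * pooled))
    return scores

# Made-up counts, purely for illustration.
target = Counter({"the": 50, "blood": 30, "life": 20})
rest = Counter({"the": 500, "blood": 30, "life": 470})
scores = likelihood_ratio(target, rest)
print(max(scores, key=scores.get))
```

Under this reading the score is never negative and says nothing about direction: a word that is unusually rare in the target also scores above zero, so the sign of $N_{w} - E_{w}$ has to be checked separately.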






  • TF-IDF: similar to the likelihood ratio, but implemented in scikit-learn
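To show what TF-IDF weighting does without depending on any library, here is a pure-Python sketch. It follows the formula that scikit-learn's `TfidfVectorizer` documents for its default settings (smoothed IDF, L2-normalised rows); the function name `tfidf` and the toy documents are mine, and this is not the code behind the talk.

```python
import math
from collections import Counter

def tfidf(tokenized_docs):
    """Return one {word: weight} dict per document, L2-normalised."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))
    # Smoothed inverse document frequency.
    idf = {w: math.log((1 + n_docs) / (1 + d)) + 1 for w, d in df.items()}
    vectors = []
    for doc in tokenized_docs:
        weights = {w: tf * idf[w] for w, tf in Counter(doc).items()}
        norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        vectors.append({w: v / norm for w, v in weights.items()})
    return vectors

docs = [
    "i am the god of war".split(),
    "i am the politician".split(),
    "the flesh the pus".split(),
]
vectors = tfidf(docs)
# Words unique to a document outweigh words shared with other documents.
print(sorted(vectors[0], key=vectors[0].get, reverse=True))
```

Common words such as "the" are not removed, only down-weighted relative to words that occur in few documents.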

TF-IDF Orgasmatron

i am the one orgasmatron
the oustreched grasping hand
my image is of agony
my servants rape the land
obsequious and arrogance
clandestine and pain
two thousend years of misery
of torture in my name
hypocrisy made paramount
paranoia the law
my name is called religion

i twist the truth
i rule the world
my crown is called deceit
i am the emperor of lies
you grovel at my feet
i rob you and i slaughter you
your downfall is my gain
and still you play the sycophant
and rebel in your pain
and all my promises are lies
all my love is hate
i am the politician
and i decide your fate

i march before a martiant world
an army for the fight
i speak of great heroic days
of victory and might
i hold a banner drenched in blood
i urge you to be brave
i lead you to your destiny
i lead you to your grave
your bones will build my palaces
your eyes will stud my crown
for i am mars the god of war
and i will cut you down


Band | Nearby Bands | Nearby Songs | Important words
motorhead | motorhead, alicecooper, helloween | Life's A Bitch, Desperate For You, Name In Vain | don't, know, ain't
slayer | slayer, hypocrisy, testament | Black Magic (Live), Black Magic, Black Magic | death, blood, life
carcass | carcass, cannibalcorpse, archenemy | Pyosisified (Rotten To The Gore), Pyosified (Still Rotten To The Gore), Malignant Defecation | flesh, pus, septic
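"Nearby" in a table like this is usually measured with cosine similarity between document vectors. The talk does not say which metric was used, so treat the following as an assumed sketch; the vectors, their weights and the names `a`, `b`, `c` are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest(name, vectors, k=2):
    """Names of the k vectors most similar to `name` (itself included)."""
    query = vectors[name]
    ranked = sorted(vectors, key=lambda other: cosine(query, vectors[other]),
                    reverse=True)
    return ranked[:k]

# Hypothetical band vectors with made-up weights.
vectors = {
    "a": {"death": 1.0, "blood": 1.0},
    "b": {"death": 1.0, "blood": 0.5, "life": 0.5},
    "c": {"flesh": 1.0, "pus": 1.0},
}
print(nearest("a", vectors))
```

Each item is its own nearest neighbour, which matches the table listing every band first among its nearby bands. Vectors with no words in common have similarity exactly 0.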


Where this Fails

  • Cover songs
  • Sparse vectors
  • No notion of Synonyms
  • Much more sophisticated (document -> vector) tools exist
    • LSA
    • LDA
    • Word2Vec/doc2vec/etc
    • Deep [whatever]

Generating Metal

How to Generate Natural language

How we might think humans "generate" lyrics:

  • Start with an idea we wish to communicate
  • Generate Natural Language from it
  • Impose song structure constraints
  • Iterate till it is "good"


  • What does this high level representation space even look like?


  • Ignore it all
  • Just count words

Language as Probability

  • Think of language not as a representation of some higher level space, but simply as a probability distribution over tokens:
$$ P(w_{1}w_{2}w_{3}...)$$
  • To "generate" language, we just need to create a representation for this distribution, and then sample from it
  • Problems:
    • Exponentially large space of possible documents
    • Sequences have variable length
    • Limited training data

Bayes Theorem to the rescue

Repeatedly apply Bayes' theorem (in its product-rule form) to write the probability of a sequence as the product of the probabilities of each token, given all previous tokens:

$$ P(w_{0}w_{1}...w_{n}) = P(w_{n}|w_{0}w_{1}...w_{n-1})P(w_{0}w_{1}...w_{n-1})$$
$$ = P(w_{n}|w_{0}w_{1}...w_{n-1})P(w_{n-1}|w_{0}w_{1}...w_{n-2})...P(w_{0})$$
$$ = \prod^{n}_{i=0} P(w_{i}|w_{0}...w_{i-1})$$

Markov Assumption

Assume history only matters for the $k$ previous tokens:

$$P(w_{n}|w_{0}...w_{n-1}) \approx P(w_{n}|w_{n-k}...w_{n-1})$$


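Estimate each conditional probability from n-gram counts in the training text, e.g. for $k=3$:

$$P(w_{n}|w_{n-3}...w_{n-1}) = \frac{\#(w_{n-3}w_{n-2}w_{n-1}w_{n})}{\#(w_{n-3}w_{n-2}w_{n-1})}$$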
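The counting estimate turns into a generator almost directly: store, for every $k$-token context, how often each token follows it, then sample in proportion to those counts. The sketch below is an assumed minimal version (word-level tokens, `k=2`, hypothetical `train` and `generate` functions, made-up training lines), not the machine used for the talk.

```python
import random
from collections import Counter, defaultdict

END = "[END]"  # marks the end of a training sequence

def train(token_sequences, k=2):
    """Count which token follows each k-token context."""
    model = defaultdict(Counter)
    for tokens in token_sequences:
        padded = list(tokens) + [END]
        for i in range(len(padded) - k):
            context = tuple(padded[i:i + k])
            model[context][padded[i + k]] += 1
    return model

def generate(model, seed, max_tokens=50, rng=random):
    """Extend `seed` (exactly k tokens) by sampling followers by count."""
    out = list(seed)
    k = len(seed)
    while len(out) < max_tokens:
        followers = model.get(tuple(out[-k:]))
        if not followers:
            break  # context never seen in training
        words, counts = zip(*followers.items())
        next_token = rng.choices(words, weights=counts)[0]
        if next_token == END:
            break
        out.append(next_token)
    return out

lines = [
    "now it's over now it's done".split(),
    "now it's over and i am gone".split(),
]
model = train(lines, k=2)
generated = generate(model, ("now", "it's"), rng=random.Random(0))
print(" ".join(generated))
```

Here `model[("now", "it's")]` holds `over: 2, done: 1`, so "over" is sampled with probability 2/3, which is the count ratio from the formula. A larger `k` copies longer stretches of training text verbatim; a smaller `k` is more inventive and less coherent.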

Markov Metal Machine

Markov Metal Machine - 4 word context example

(coloured by existence in training text)

legion of the damned

ow it's over
now it's over
and it just ain't right. it just ain't right hey man, look at me, look at me
you know it won't last too long hey kid, you're the toast of the town

what did i ever do to you
hat makes you fall much higher
you plant a demon seed
you raise a flower of fire.
we see them now in picture books
built by you on belfast docks
the greatest ships the world has ever seen.

sing this hymn of victory, it will be your last
arise the war cry-
like it will be your last [END]

Going Deeper

  • This is really good for unsupervised text generation
  • Compare with the output of recurrent neural networks trained on the same data

Going Deeper

(coloured by existence in training text)

far away from the sky
here we will be forgotten
the final beauty for a way
this enemy of sun
and the land of a blood

we are what it is a life of hope
the blood of the world we start to deceive
the world is what i feel in the wind

a shadow of the same road
the air of the walls
the sun was never a sin
babylon in the start
and the cult of the time

behind the voice of the fucking slaughters
with the things that we deny
the sun is the same
with the strength of lies

soul and the dea

The End

"Formulas to stop your heart
And eradicate your soul
I will raise you from the ground
Strenghtened by the Python God"

Deathwish, Usurper