Iain, the author of Degenerate State and an “ex-physicist currently working as a data scientist” who goes only by his first name, did a very cool thing. He scraped a lyrics site to discover the most and least metal words in the game, and the site has a lot to work with: the lyrics to 222,623 songs by 7,364 bands across 22,314 albums.

Do not underestimate the legitimacy of this whole thing just because it’s metal. Here’s an excerpt from his write-up of the process, which starts from the metal word cloud you see above.

As users of the English language, we can pick out “interesting” words just by eyeballing the word cloud. We get a feeling for the content of the documents just by prominence of the words “night”, “pain” and “death”. The word “see” also appears prominently in the word cloud, but isn’t as interesting as the word “hell” or “blood”, which appear less frequently in the lyrics. My feeling is that our prior experience of the English language guides our search. We know that the word “see” is usually common, but the word “death” is not, so we interpret it as important. Can we quantify this feeling?

One approach might be to look at how the relative frequency of words changes between the metal lyrics and the English language in general. To do this we need some sort of measure of what “standard” English looks like, and given I’m using NLTK for text processing, an easy comparison is to the Brown corpus, a collection of documents published in 1961 covering a range of different genres (although it should be pointed out, no lyrics).

From there, Iain filtered out “stop words” and devised a legitimate scientific process to figure out the “metalness” of words, summed up in a single equation.
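For a rough sense of how a score like this could be computed, here is a minimal Python sketch. It assumes metalness is the natural log of a word's relative frequency in the metal lyrics divided by its relative frequency in a reference corpus. That matches the comparison described in the excerpt, but it is not confirmed to be Iain's exact formula. The toy token lists, the `STOP_WORDS` set, the `min_count` cutoff, and the `metalness` function name are all hypothetical stand-ins.

```python
import math
from collections import Counter

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def metalness(metal_tokens, reference_tokens, min_count=1):
    """Score words by the log ratio of their relative frequencies.

    Assumed form: log((n_metal / total_metal) / (n_ref / total_ref)).
    Words that fall below min_count in either corpus are skipped.
    """
    metal = Counter(t.lower() for t in metal_tokens if t.lower() not in STOP_WORDS)
    ref = Counter(t.lower() for t in reference_tokens if t.lower() not in STOP_WORDS)
    metal_total = sum(metal.values())
    ref_total = sum(ref.values())

    scores = {}
    for word, n_metal in metal.items():
        n_ref = ref.get(word, 0)
        if n_metal < min_count or n_ref < min_count:
            continue  # too rare in one corpus to compare reliably
        scores[word] = math.log((n_metal / metal_total) / (n_ref / ref_total))
    return scores


# Toy stand-ins for the scraped lyrics and the reference corpus.
metal_tokens = "burn the night burn in pain and see the committee".split()
reference_tokens = "the committee noted the night and the committee could see a burn".split()

scores = metalness(metal_tokens, reference_tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], ranked[-1])
```

A positive score means the word is relatively more common in the lyrics than in the reference text, and a negative score means the opposite, which is the same sign pattern the two tables below show.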

Here is the beautiful result.

Most Metal Words

Rank  Word      Metalness
1     burn      3.81
2     cries     3.63
3     veins     3.59
4     eternity  3.56
5     breathe   3.54
6     beast     3.54
7     gonna     3.53
8     demons    3.53
9     ashes     3.51
10    soul      3.40
11    sorrow    3.40
12    sword     3.38
13    goodbye   3.28
14    dreams    3.28
15    gods      3.24
16    pray      3.22
17    reign     3.15
18    tear      3.12
19    flames    3.12
20    scream    3.11

Least Metal Words

Rank  Word            Metalness
1     particularly    -6.47
2     indicated       -6.32
3     secretary       -6.29
4     committee       -6.16
5     university      -6.09
6     relatively      -6.08
7     noted           -5.85
8     approximately   -5.75
9     chairman        -5.69
10    employees       -5.67
11    attorney        -5.66
12    membership      -5.64
13    administrative  -5.61
14    considerable    -5.60
15    academic        -5.51
16    literary        -5.49
17    agencies        -5.48
18    measurements    -5.47
19    fiscal          -5.45
20    residential     -5.45

Read Iain’s full post on Degenerate State for the fascinating methodology.

[H/T Boing Boing]