Within communication, certain methods of summarization and translation suffer less ambiguity, edge-case blurriness, and data loss than others. One form of translation especially vulnerable to harm is the translation of music to word, or word to music. Why? Because music is generally a medium with an extremely low level of literary content.
No musical configuration (barring words, which I will return to) can portray a human being in, say, the way a painting or drawing could. This is neither a bug nor a fault - it is built into the strength, foundation, and identity of the medium. This profound abstraction and disconnection from literal portrayal makes music the medium with perhaps the widest set of valid interpretations. This built-in ambiguity leads to many difficulties in speaking about music.
Even the most verbose music critic will find difficulty in accurately portraying the subjective emotional or artistic appeal of a song. Too often they fall into the same problems as the burgeoning academic: unable to contribute an original thought on the piece, they fill pages with references they’ve spotted, noted influences, or pure quotation. They dance all around the social conditions and biographical notes of the designer rather than addressing the work itself.
That’s the professional, the so-called “musically educated”! Forget the layman! Could you count on two random people agreeing on what counts as music? Could two random people agree on the genre of a certain song? Or, without recognition and pointing, describe what certain genres consist of? Or succinctly explain what it is they like and dislike about a piece? Ears certainly function, but not at the resolution that our eyes or mouths do. Have you ever read a review of a piece of music before listening to it yourself? Have you ever asked a person to describe their musical taste?
I say all this to contend that I am very skeptical that the use of an LLM (large language model) AI for the creation of complete songs will ever result in high-quality, emotional, or stylistically interesting music. That is, due to the low level of literary transmission in music, I’m skeptical that describing audio in human language can result in very interesting complete songs.
Before I continue on that assertion, I cannot ignore the human voice. The human voice (alongside synthesized equivalents, e.g. Vocaloids https://www.vocaloid.com/en/ ) has a unique instrumental quality through its ability to say words. No other instrument can better, if at all, transmit literary content.
Of course, not all music has lyrics. Additionally, to reduce lyrical content to pure prose is to do a great disservice to the specific sort of design process that lyric-writing entails. What sounds interesting and impactful as a lyric within the context of other textures and instruments might completely deflate when isolated.
There’s also world music: there are enough minutes to fill your life with albums and songs sung in tongues you cannot understand. Does your lack of understanding of every language kill the appeal of music in other languages? No - it simply changes your experience. (And if it does, perhaps you like the literary aspects of song more than the musical ones… that’s okay. Or perhaps you haven’t listened to enough world music to find something to your taste.)
Although, I don’t even need to go that far. Can you understand every word in every song you hear, even in your own language? Even among the artists you are most familiar with?
I’m quite interested in Sigur Rós’s album Ágætis byrjun. I won’t do the music a disservice by attempting to explain its emotional appeal, but I will tell you that although the lyrics were specifically written (they are memorized, transcribed, and repeated in live performances), some of them are not from any spoken language; they are gibberish, sung in what the band calls Vonlenska (“Hopelandic”). I reference this band to further hone the separation of musical content, emotional appeal, and literary content.
We are at a point at which we have mapped a great many of the qualities of audio to words. Several elements of audio (volume, texture, chords, instrument, distance from recording source, recording method, compression, and room reverb or size, for example) are capable of being recorded and accurately transmitted, but the totality of a song and its subjective experience, especially for an untrained ear, is likely impossible to communicate.
And I should hope that it remains this way, since it means the power and magic of music shall remain.
This is the entire purpose of an art form, no? It is a border of communication, the tipping point or moment when an idea becomes expressible only within the context of its own limitations. Thus, I am very skeptical that an LLM can be used to design or describe a song in any significant way. I do concede that AI might have some use in music production and sound design.
I also find that in all sorts of design, about half of the ideas that make it into the end product are found during implementation. Ideas arrive not as a result of preconception, but as a result of chance, possibility, and exploration. This entire opportunity to create something of higher quality is lost when prompting.
Even in the age of electronic music production and internet-based distribution, part of the joy of music is in its in-person creation and performance, and in its inherent indescribability through words. To ignore that is to amputate the medium. Enjoying and creating with others is not something that can be emulated or replaced any time soon!