
Melodies Are Just Math

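Two weeks ago I came across a fun video on YouTube about Damien Riehl, a middle-aged multi-hyphenate who used a computer program to brute-force every melody in the key of C in six days. Damien Riehl is a fascinating guy: he has been a lawyer since 2002 and has been writing code since 1985, and on top of that he is a musician. Wow, isn't that exactly who I have always wanted to be? So cool!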

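In this video, and in a TED talk he gave, Damien Riehl explores questions about music and copyright in the age of AI that are worth thinking through. This is issue 8 of 「朝花夕拾」. As a music novice and a programmer, I try to understand what they actually did. And as someone who knows nothing about law, I also try to say a little about the legal side of music copyright.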

Get Hands Dirty !

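Happily, Damien Riehl has already pushed their code to GitHub. A quick look shows it is written in Rust: besides a simple command-line tool, they also wrote a dedicated library, libatm. For a programmer, a new language is never the problem; missing code is. It's just Rust, so skim a tutorial and get to work (I have been meaning to learn Rust anyway :)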

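With the code in hand, we can see that the client is simple, just a thin wrapper around the library. (And why not? How complicated does a command-line tool need to be?)
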
houmin@cosmos:~/atm-cli/src$ ls
cli.rs directives.rs lib.rs main.rs utils.rs

The library itself is really just a single lib.rs. Its documentation shows that it is a library dedicated to MIDI files: it defines the MIDI file format, notes, note sequences, tracks…

What is MIDI, and why MIDI?

About MIDI

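MIDI, the Musical Instrument Digital Interface, was proposed in the early 1980s to solve the problem of communication between electronic instruments. It is the most widely used standard music format in arranging and can be thought of as "sheet music a computer can understand". It records music as digital control signals for notes.

A complete MIDI piece is only a few dozen KB, yet it can contain dozens of tracks. Much of modern music is produced with MIDI plus a sound library. MIDI does not transmit audio; it transmits instructions such as notes and control parameters, telling a MIDI device what to do and how to do it (which note to play, how loud, and so on). These instructions are uniformly represented as MIDI messages (MIDI Message).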

MIDI File Format

A MIDI file is made up of a series of chunks, each laid out as follows:

type      length    data
4 bytes   4 bytes   <length> bytes

There are two chunk types:

  • Header Chunk, with type MThd
  • Track Chunk, with type MTrk

So a MIDI file starts with one Header Chunk, followed by one or more Track Chunks:

Format |  type  |  length  |                Data                 |
------------------------------------------------------------------
REAL   |  MThd  |    6     | <format> | <tracks> | <division>   |
MIDI   |  MTrk  | <length> | <delta_time> <event> ...           |
FILE   |                         :                              |
DATA   |  MTrk  | <length> | <delta_time> <event> ...           |
------------------------------------------------------------------
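
To make the chunk layout concrete, here is a minimal sketch of my own (it is not code from libatm) that walks a byte buffer and splits it into chunks. The names `Chunk` and `parse_chunks` are hypothetical, and the sketch assumes the length field is a big-endian 32-bit integer, as the Standard MIDI File format specifies.

```rust
/// One raw chunk: a 4-byte ASCII type tag plus its payload.
/// (Hypothetical type for illustration, not part of libatm.)
#[derive(Debug, PartialEq)]
pub struct Chunk<'a> {
    pub chunk_type: [u8; 4],
    pub data: &'a [u8],
}

/// Split a whole MIDI file into chunks.
/// Returns `None` if a chunk header or payload is cut short.
pub fn parse_chunks<'a>(mut bytes: &'a [u8]) -> Option<Vec<Chunk<'a>>> {
    let mut chunks = Vec::new();
    while !bytes.is_empty() {
        // Every chunk starts with 4 bytes of type and 4 bytes of length.
        if bytes.len() < 8 {
            return None;
        }
        let chunk_type = [bytes[0], bytes[1], bytes[2], bytes[3]];
        let length = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]) as usize;
        let rest = &bytes[8..];
        if rest.len() < length {
            return None;
        }
        // The payload is exactly `length` bytes; the next chunk follows it.
        let (data, tail) = rest.split_at(length);
        chunks.push(Chunk { chunk_type, data });
        bytes = tail;
    }
    Some(chunks)
}
```

Given an MThd chunk followed by an empty MTrk chunk, `parse_chunks` returns two entries; a buffer that ends in the middle of a chunk returns `None`.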

Header Chunk

Here is the layout of the MIDI header:

/// MIDI file header
///
/// Unlike the [MIDITrackHeader](struct.MIDITrackHeader.html), this structure is
/// specified in the official MIDI spec (as "Header Chunk"), though the last three 16-bit
/// fields are simply referred to as "Data". For a more detailed discussion of the
/// Header Chunk, see section 2.1 of the document here:
/// <https://www.cs.cmu.edu/~music/cmsip/readings/Standard-MIDI-file-format-updated.pdf>.
#[derive(Clone, Debug, PartialEq)]
pub struct MIDIHeader {
    pub chunk_type: Vec<u8>,
    pub length: u32,
    pub format: u16,
    pub tracks: u16,
    pub division: u16,
}

Let's go through these fields one by one:

  • format

    /// MIDI file format
    ///
    /// MIDI files have three different formats: 0, 1, and 2. Format 0 means the MIDI file
    /// has a single track chunk, whereas formats 1 and 2 indicate one _or more_ track chunks.
    /// A longer discussion of these formats can be found in section 2.2 of the document here:
    /// <https://www.cs.cmu.edu/~music/cmsip/readings/Standard-MIDI-file-format-updated.pdf>.
    #[derive(Clone, Copy, Debug)]
    pub enum MIDIFormat {
        /// Single track.
        Format0,
        /// One or more simultaneous tracks.
        Format1,
        /// One or more independent tracks.
        Format2,
    }

  • tracks

    The number of Track Chunks in this MIDI file.

  • division

    Specifies the basic unit of time. These two bytes mean different things depending on the most significant bit.

    |  bit 15  |    bits 14 thru 8     |   bits 7 thru 0   |
    |    0     |          ticks per quarter-note           |
    |    1     | negative SMPTE format |  ticks per frame  |

    • bit 15 = 0:

      • bits 0-14
        number of delta-time units in each quarter-note.
    • bit 15 = 1:

      • bits 0-7
        number of delta-time units per SMPTE frame

      • bits 8-14
        form a negative number, representing the number of SMPTE frames per second. Valid values correspond to those in the MTC Quarter Frame message.

        -24 = 24 frames per second
        -25 = 25 frames per second
        -29 = 30 frames per second, drop frame
        -30 = 30 frames per second, non-drop frame
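
The two readings of division are easier to see in code. This is a small sketch of my own, with a hypothetical `Division` enum and `decode_division` function that are not part of libatm. It assumes the SMPTE format is stored as a two's-complement value in the whole upper byte, so that the upper byte `0xE7` reads as -25.

```rust
/// The two possible meanings of the 16-bit division field.
/// (Hypothetical type for illustration, not part of libatm.)
#[derive(Debug, PartialEq)]
pub enum Division {
    /// Bit 15 clear: delta-time units per quarter-note.
    TicksPerQuarterNote(u16),
    /// Bit 15 set: negative SMPTE format plus delta-time units per frame.
    Smpte { format: i8, ticks_per_frame: u8 },
}

pub fn decode_division(raw: u16) -> Division {
    if (raw & 0x8000) == 0 {
        // Bits 0-14 hold the tick count.
        Division::TicksPerQuarterNote(raw & 0x7FFF)
    } else {
        // Reinterpret the upper byte as a signed number (-24, -25, -29 or -30).
        let format = (raw >> 8) as u8 as i8;
        // The lower byte is the number of ticks per SMPTE frame.
        let ticks_per_frame = (raw & 0x00FF) as u8;
        Division::Smpte { format, ticks_per_frame }
    }
}
```

For example, `decode_division(0x01E0)` is 480 ticks per quarter-note, while `decode_division(0xE728)` is 25 frames per second with 40 ticks per frame.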

Track Chunk

Each Track Chunk works the same way: it starts with a type and a length just like the chunk above, which the library calls the Track Header.

/// MIDI track chunk header
///
/// Encapsulates the chunk type ('MTrk') and the length
/// of a MIDI track chunk. The official MIDI spec does
/// not refer to these data as the track chunk header, this
/// library simply makes the distinction for ease of use.
#[derive(Clone, Debug, PartialEq)]
pub struct MIDITrackHeader {
    pub chunk_type: Vec<u8>,
    pub length: u32,
}

Right after the Track Header comes the actual track data, a series of MTrk events. Written as a formula (+ means one or more):

<Track Chunk> = <chunk type><length><MTrk event>+

An MTrk event is simple: a delta-time followed by an event.

<MTrk event> = <delta-time><event>

<delta-time> is stored as a variable-length quantity. It represents the amount of time before the following event. If the first event in a track occurs at the very beginning of a track, or if two events occur simultaneously, a delta-time of zero is used. Delta-times are always present. (Not storing delta-times of 0 requires at least two bytes for any other value, and most delta-times aren’t zero.) Delta-time is in some fraction of a beat (or a second, for recording a track with SMPTE times), as specified in the header chunk.
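
A variable-length quantity packs 7 bits of the value into each byte and sets the top bit on every byte except the last. Here is a minimal sketch of my own (the names `encode_vlq` and `decode_vlq` are hypothetical, not libatm functions). It assumes values stay within the four-byte limit of `0x0FFFFFFF` that the MIDI file format allows.

```rust
/// Encode a delta-time as a MIDI variable-length quantity.
/// Assumes `value <= 0x0FFF_FFFF`, the largest value the format allows.
pub fn encode_vlq(mut value: u32) -> Vec<u8> {
    // The last byte carries the lowest 7 bits with the top bit clear.
    let mut bytes = vec![(value & 0x7F) as u8];
    value >>= 7;
    // Every earlier byte carries 7 more bits with the top bit set.
    while value > 0 {
        bytes.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    // Groups were collected least significant first; the file wants the reverse.
    bytes.reverse();
    bytes
}

/// Decode a variable-length quantity from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None`
/// if no terminating byte is found within four bytes.
pub fn decode_vlq(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &b) in bytes.iter().enumerate().take(4) {
        value = (value << 7) | u32::from(b & 0x7F);
        if (b & 0x80) == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

A delta-time of 0 is the single byte `00`, while 128 becomes the two bytes `81 00`.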

<event> = <MIDI event> | <sysex event> | <meta-event>

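There are three kinds of event:

  • MIDI event: a MIDI channel message, the main part of what describes the music; see the next subsection
  • Sysex event: a system exclusive message, carrying system control information
  • Meta event: mostly metadata such as copyright notices, lyrics, and instrument names

Put Things Together

Figure: MIDI file format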

MIDI Messages

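MIDI messages fall into two broad classes. Messages addressed to one specific MIDI channel are called Channel Messages, while messages that affect the whole MIDI system (or at least a whole MIDI device) are called System Messages and are not tied to any particular channel. Each class can be subdivided further.

Whatever the category, every MIDI message has the same structure:

  • A Status Byte
    range: 0x80..0xFF
  • Zero or more Data Bytes
    range: 0x00..0x7F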

/// MIDI channel voice message
///
/// MIDI supports two main types of messages: Channel and System.
/// Channel messages are tied to a specific MIDI channel, whereas
/// System messages are not (and thus don't contain a channel number).
/// This library only supports channel messages, and more specifically
/// the `NoteOn` and `NoteOff` channel _voice_ messages,
/// which actually produce sounds. For a detailed explanation of
/// MIDI messages, see appendix 1.1 of the document here:
/// <https://www.cs.cmu.edu/~music/cmsip/readings/Standard-MIDI-file-format-updated.pdf>.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MIDIChannelVoiceMessage {
    pub delta_time: u8,
    pub status: u8,
    pub note: u8,
    pub velocity: u8,
}

The figure below shows typical Channel Voice Messages.

Figure: MIDI Channel Voice Messages

We can match these against the code one by one. For example, NoteOn here is 0b1001 (0b marks a binary literal):

/// MIDI message status
///
/// Each MIDI event (message) has a status, which sets the message type and thus the meaning
/// of the associated message data. Technically the status bits also include the channel number,
/// but this library currently only supports single track, single channel MIDI files (and thus
/// defaults to channel 0). For a detailed description of each status type, see Appendix 1.1 of the document here:
/// <https://www.cs.cmu.edu/~music/cmsip/readings/Standard-MIDI-file-format-updated.pdf>.
#[derive(Clone, Copy, Debug)]
pub enum MIDIStatus {
    /// Assume status bytes of previous MIDI channel message
    RunningStatus = 0b0000,
    /// Note released
    NoteOff = 0b1000,
    /// Note pressed
    NoteOn = 0b1001,
    /// Pressure on key after pressed down
    PolyphonicAftertouch = 0b1010,
    /// Controller value change
    ControlChange = 0b1011,
    /// Change program (patch) number
    ProgramChange = 0b1100,
    /// Greatest pressure on key after pressed down
    Aftertouch = 0b1101,
    /// Change pitch wheel
    PitchWheelChange = 0b1110,
}
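
The enum values above are only the upper four bits of the status byte; in a channel message the lower four bits carry the channel number. The sketch below is my own illustration with hypothetical helper names (`split_status`, `note_on`), not functions from libatm.

```rust
/// Split a channel-message status byte into (message type, channel).
/// Returns `None` for data bytes (below 0x80) and for system messages
/// (0xF0 and above), whose low nibble is not a channel number.
pub fn split_status(status: u8) -> Option<(u8, u8)> {
    if (0x80..=0xEF).contains(&status) {
        Some((status >> 4, status & 0x0F))
    } else {
        None
    }
}

/// Build the three bytes of a NoteOn message: status, key, velocity.
/// Data bytes are masked to 7 bits so they stay in 0x00..0x7F.
pub fn note_on(channel: u8, note: u8, velocity: u8) -> [u8; 3] {
    [0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F]
}
```

So the status byte `0x93` splits into `(0b1001, 3)`, a NoteOn on channel 3, and `note_on(0, 60, 64)` gives the bytes `90 3C 40`.
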
  • MIDI Note

In the figure above, the data of a MIDI message contains a kk field: the key being pressed, which is what we usually call the note. Notes are grouped by octave, and Middle C, i.e. C4, is number 60 (decimal).

Octave     C   C#    D   D#    E    F   F#    G   G#    A   A#    B
    -1     0    1    2    3    4    5    6    7    8    9   10   11
     0    12   13   14   15   16   17   18   19   20   21   22   23
     1    24   25   26   27   28   29   30   31   32   33   34   35
     2    36   37   38   39   40   41   42   43   44   45   46   47
     3    48   49   50   51   52   53   54   55   56   57   58   59
     4    60   61   62   63   64   65   66   67   68   69   70   71
     5    72   73   74   75   76   77   78   79   80   81   82   83
     6    84   85   86   87   88   89   90   91   92   93   94   95
     7    96   97   98   99  100  101  102  103  104  105  106  107
     8   108  109  110  111  112  113  114  115  116  117  118  119
     9   120  121  122  123  124  125  126  127

In code this corresponds to:

/// MIDI note
///
/// Represents key on a piano, combining a [note type](enum.MIDINoteType.html)
/// with an octave. For example, middle C would be represented as
/// `MIDINote { note_type: MIDINoteType::C, octave: 4 }`. For a detailed table
/// of MIDI notes and octave numbers, see document here:
/// <https://www.cs.cmu.edu/~music/cmsip/readings/Standard-MIDI-file-format-updated.pdf>.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MIDINote {
    pub note_type: MIDINoteType,
    pub octave: u32,
}
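
The note table follows a simple rule: number = (octave + 1) × 12 + semitone, where C is semitone 0 and B is semitone 11. Below is a minimal sketch of that rule as read off the table. The function names `note_number` and `note_name` are my own and hypothetical; I have not checked how libatm implements the conversion internally.

```rust
/// Note names within one octave, indexed by semitone (C = 0 ... B = 11).
const NAMES: [&str; 12] = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

/// MIDI note number for a semitone (0-11) in an octave (-1 to 9).
/// Returns `None` when the inputs fall outside the 0-127 range of the table.
pub fn note_number(semitone: u8, octave: i8) -> Option<u8> {
    if semitone > 11 || !(-1..=9).contains(&octave) {
        return None;
    }
    let number = (i16::from(octave) + 1) * 12 + i16::from(semitone);
    // Octave 9 stops at G (127), so G#9 and above are rejected.
    if number > 127 {
        None
    } else {
        Some(number as u8)
    }
}

/// Inverse lookup: note name and octave for a MIDI note number.
pub fn note_name(number: u8) -> Option<(&'static str, i8)> {
    if number > 127 {
        return None;
    }
    Some((NAMES[usize::from(number % 12)], (number / 12) as i8 - 1))
}
```

`note_number(0, 4)` gives 60 for Middle C, and `note_name(61)` gives `("C#", 4)`, matching the row for octave 4.
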
  • Program Change

The status values above include Program Change, which switches the instrument.

Figure: Program Change

MIDI assigns a number to each instrument as well. Here are the program-number ranges of the instrument families.

PC#       Family                  PC#       Family
1-8       Piano                   65-72     Reed
9-16      Chromatic Percussion    73-80     Pipe
17-24     Organ                   81-88     Synth Lead
25-32     Guitar                  89-96     Synth Pad
33-40     Bass                    97-104    Synth Effects
41-48     Strings                 105-112   Ethnic
49-56     Ensemble                113-120   Percussive
57-64     Brass                   121-128   Sound Effects
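
Since every family spans exactly eight program numbers, the table reduces to an integer division. A small sketch of my own, with a hypothetical `family` function, that takes the 1-based PC# used in the table. Keep in mind that the data byte actually sent in a Program Change message is 0-based, i.e. PC# minus 1.

```rust
/// Instrument families in table order, eight program numbers each.
const FAMILIES: [&str; 16] = [
    "Piano", "Chromatic Percussion", "Organ", "Guitar",
    "Bass", "Strings", "Ensemble", "Brass",
    "Reed", "Pipe", "Synth Lead", "Synth Pad",
    "Synth Effects", "Ethnic", "Percussive", "Sound Effects",
];

/// Family name for a 1-based program number (1-128), as listed in the table.
pub fn family(program: u8) -> Option<&'static str> {
    if (1..=128).contains(&program) {
        // Shift to 0-based, then each block of 8 maps to one family.
        Some(FAMILIES[usize::from(program - 1) / 8])
    } else {
        None
    }
}
```

For instance, `family(9)` is "Chromatic Percussion" and `family(128)` is "Sound Effects".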

MIDI File

Here the library defines the MIDIFile struct:

/// MIDI file representation
///
/// MIDI files can be complex, allowing for any number of tracks with
/// different notes and instruments playing simultaneously. This library
/// was created for the express purpose of brute-forcing melodies, and thus
/// only supports a subset of the official MIDI standard. More specifically,
/// this class is optimized for creating the smallest possible single track MIDI
/// files.
#[derive(Clone, Debug)]
pub struct MIDIFile {
    /// Sequence of notes ([MIDINoteSequence](struct.MIDINoteSequence.html)) from which the track chunk is generated
    pub sequence: MIDINoteSequence,
    /// Format specification (should always be [MIDIFormat::0](enum.MIDIFormat.html#variant.Format0))
    pub format: MIDIFormat,
    /// Number of tracks in MIDI file (should always be `1`)
    pub tracks: u16,
    /// Number of ticks to represent a quarter-note (recommended to use `1`)
    pub division: u16,
}

Given a note sequence, we can generate the corresponding MIDI file. For example, with the input sequence C:4,D:5,CSharp:8,DSharp:3:

let mfile = libatm::MIDIFile::new(
    libatm::MIDINoteSequence::new(vec![
        libatm::MIDINote::new(libatm::MIDINoteType::C, 4),
        libatm::MIDINote::new(libatm::MIDINoteType::D, 5),
        libatm::MIDINote::new(libatm::MIDINoteType::CSharp, 8),
        libatm::MIDINote::new(libatm::MIDINoteType::DSharp, 3),
    ]),
    libatm::MIDIFormat::Format0,
    1,
    1,
);
assert_eq!("607410951", mfile.gen_hash());
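
To see how the pieces fit together, here is a rough sketch of my own that writes a minimal Format 0 file from a list of note numbers. It is not libatm's implementation and probably does not match its output byte for byte. The function name `build_format0_file`, the fixed velocity `0x40`, and the one-tick note length are all my assumptions. As a side observation, the asserted hash `607410951` reads like the note numbers 60, 74, 109, 51 written back to back, which agrees with the note table; I take that as a hint, not as documented behaviour.

```rust
/// Build a single-track (Format 0) MIDI file that plays `notes` one after another.
/// Velocity 0x40 and a length of one tick per note are illustrative choices.
pub fn build_format0_file(notes: &[u8], division: u16) -> Vec<u8> {
    // Track data: one NoteOn / NoteOff pair per note, all on channel 0.
    let mut track = Vec::new();
    for &note in notes {
        // delta-time 0, NoteOn (0x90), key, velocity
        track.extend_from_slice(&[0x00, 0x90, note & 0x7F, 0x40]);
        // delta-time 1, NoteOff (0x80), key, velocity
        track.extend_from_slice(&[0x01, 0x80, note & 0x7F, 0x40]);
    }
    // End-of-track meta event, which closes every track chunk.
    track.extend_from_slice(&[0x00, 0xFF, 0x2F, 0x00]);

    let mut file = Vec::new();
    // Header chunk: type, length 6, then format, tracks, division.
    file.extend_from_slice(b"MThd");
    file.extend_from_slice(&6u32.to_be_bytes());
    file.extend_from_slice(&0u16.to_be_bytes()); // format 0
    file.extend_from_slice(&1u16.to_be_bytes()); // one track
    file.extend_from_slice(&division.to_be_bytes());
    // Track chunk: type, length of the track data, then the data itself.
    file.extend_from_slice(b"MTrk");
    file.extend_from_slice(&(track.len() as u32).to_be_bytes());
    file.extend_from_slice(&track);
    file
}
```

With the four notes from the example, `build_format0_file(&[60, 74, 109, 51], 1)` produces a 58-byte file: 14 bytes of header chunk, 8 bytes of track header, and 36 bytes of events.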

Brute Force

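By now we have a fairly good understanding of MIDI files. The timing side may still be a little hazy, but we know these parameters are enough to describe a melody completely. Next, let's see how to enumerate every melody.

To brute-force, we first need to know how many possibilities there are. With $n$ choices for each note and $k$ notes per melody, there are $n^k$ melodies.

A piano has 88 pitches, so a 12-note melody has $88^{12}$ possibilities. What does that mean? Even if each melody took only 1 byte, storing them all would take about $2.16 \times 10^{23}$ bytes, roughly 216 ZB. There is no way to keep all of those melodies on a single hard drive.

So let's see how Damien Riehl pulled it off: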

Now you might say what constitutes a melody ?

We were initially going to take the entire piano keyboard and to do the all at the entire piano keyboard. But let’s focus on the vocal range which is actually 2 octaves and we thought actually we’re thinking about pop music, which is the only thing that makes money that people sue over, doesn’t go two octaves. It goes a single octave so that’s what we landed on eight notes.

Now you might say how many notes constitutes a melody ?

So we looked at musicologists…and we landed at twelve notes.

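Damien Riehl narrows the scope neatly. By restricting the target to pop music that people can actually sing, we arrive at the following:

  • Each note of a melody usually has only 8 choices, e.g. in C major: do re mi fa so la ti do
  • 12 notes are enough to make up almost any melody.
  • That leaves $8^{12} = 68{,}719{,}476{,}736 \approx 68.7$ billion possibilities.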
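
The arithmetic is easy to check. A quick sketch of my own, with hypothetical helper names `melody_count` and `zettabytes`; it uses `u128` because $88^{12}$ does not fit in 64 bits, and it takes 1 ZB to mean $10^{21}$ bytes.

```rust
/// Number of melodies with `length` notes when each note has `choices` options.
/// Returns `None` if the result overflows a u128.
pub fn melody_count(choices: u128, length: u32) -> Option<u128> {
    choices.checked_pow(length)
}

/// Whole zettabytes (10^21 bytes) needed for `bytes` bytes, rounded down.
pub fn zettabytes(bytes: u128) -> u128 {
    bytes / 1_000_000_000_000_000_000_000
}
```

`melody_count(8, 12)` is 68,719,476,736, whereas `melody_count(88, 12)` at one byte per melody comes to 215 whole zettabytes.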

With that insight, we can go ahead and brute-force.

The code below is simple: it iterates over all sequences, generates a MIDI file for each one, and writes it into an archive.

pub fn atm_batch(args: BatchDirectiveArgs) {
    // Initialize progress bar and set refresh rate
    let mut pb = pbr::ProgressBar::new(args.max_count as u64);
    pb.set_max_refresh_rate(Some(std::time::Duration::from_millis(args.update)));
    // Initialize output archive
    let mut archive = crate::utils::BatchedMIDIArchive::new(
        &args.target,
        args.partition_depth,
        args.max_files,
        args.partition_size,
        args.batch_size,
    );
    // For each generated sequence
    for (idx, notes) in crate::utils::gen_sequences(&args.sequence.notes, args.length).enumerate() {
        // if reached max count, finish
        if idx == args.max_count {
            archive.finish().unwrap();
            break;
        }
        // Clone libatm::MIDINoteSequence from Vec<&libatm::MIDINote>
        let seq = libatm::MIDINoteSequence::new(
            notes
                .iter()
                .map(|note| *note.clone())
                .collect::<Vec<libatm::MIDINote>>(),
        );
        // Create MIDIFile from libatm::MIDINoteSequence
        let mfile = libatm::MIDIFile::new(seq, libatm::MIDIFormat::Format0, 1, 1);
        // Add MIDIFile to archive
        archive.push(mfile).unwrap();
        // Increment progress bar
        pb.inc();
    }
    // Stop progress bar
    pb.finish_println("");
    // Finish archive if not already finished
    if let crate::utils::BatchedMIDIArchiveState::Open = archive.state {
        archive.finish().unwrap();
    }
}

So gen_sequences is the key. The code below shows how it works:

/// Generate all permutations (with replacement) of given length
/// from the given sequence of MIDI notes.
///
/// # Arguments:
///
/// * `notes`: sequence of MIDI notes (see: [libatm::MIDINote](../../libatm/struct.MIDINote.html))
/// * `length`: length of sequences to generate
///
/// # Examples
///
///
/// // Create MIDI note sequence
/// let sequence = "C:4,C:4,D:4,E:4,F:4,G:5".parse::<libatm::MIDINoteSequence>().unwrap();
/// // Create iterable over all permutations, which in this example would be
/// // 6^8 = 1,679,616 instances of `Vec<&libatm::MIDINote>`.
/// let permutations = atm::utils::gen_sequences(&sequence.notes, 8);
///
pub fn gen_sequences(
    notes: &[libatm::MIDINote],
    length: u32,
) -> itertools::MultiProduct<std::slice::Iter<libatm::MIDINote>> {
    (0..(length))
        .map(|_| notes.iter())
        .multi_cartesian_product()
}

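  • The input is the sequence C:4,C:4,D:4,E:4,F:4,G:5. It has 6 entries, which are all the choices available at each position during enumeration (C:4 is listed twice, so some of the generated melodies coincide)
  • We then pass a length parameter, 8 in this example
  • Each output is a melody of 8 notes, where every note is one of the 6 entries of C:4,C:4,D:4,E:4,F:4,G:5
  • In total this yields $6^8 = 1{,}679{,}616$ sequences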
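
The real code leans on `multi_cartesian_product` from the itertools crate. To show what happens underneath, here is a standard-library-only sketch of the same idea, written by me as an odometer over indices. The `Sequences` type is hypothetical and is not how itertools implements it, though the order (last position changes fastest) is the same.

```rust
/// Lazily yields every sequence of `length` items drawn (with repetition)
/// from `choices`. Hypothetical stand-in for itertools' multi_cartesian_product.
pub struct Sequences<'a, T: 'a> {
    choices: &'a [T],
    indices: Vec<usize>,
    done: bool,
}

impl<'a, T> Sequences<'a, T> {
    pub fn new(choices: &'a [T], length: usize) -> Self {
        Sequences {
            choices,
            indices: vec![0; length],
            // With no choices there is nothing to emit (unless length is 0).
            done: choices.is_empty() && length > 0,
        }
    }
}

impl<'a, T: Copy> Iterator for Sequences<'a, T> {
    type Item = Vec<T>;

    fn next(&mut self) -> Option<Vec<T>> {
        if self.done {
            return None;
        }
        // Materialize the sequence the odometer currently points at.
        let current: Vec<T> = self.indices.iter().map(|&i| self.choices[i]).collect();
        // Advance the odometer: bump the last position, carrying leftwards.
        let mut pos = self.indices.len();
        loop {
            if pos == 0 {
                // Carried past the first position: everything has been emitted.
                self.done = true;
                break;
            }
            pos -= 1;
            self.indices[pos] += 1;
            if self.indices[pos] < self.choices.len() {
                break;
            }
            self.indices[pos] = 0;
        }
        Some(current)
    }
}
```

Three choices and length 2 give 9 sequences, starting with `[60, 60]`, `[60, 62]`; six choices and length 8 give the 1,679,616 sequences mentioned in the doc comment.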

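At this point we understand what Damien Riehl actually did. The approach is simple: write a MIDI library in Rust, then enumerate every melody a pop song could use. Yet this simple process reveals something: seen this way, writing a melody is just a math problem.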

Maybe melodies are just math which is just facts, which maybe are not copyrightable. If somebody is suing over a melody alone, not lyrics, not recordings, but just melody alone, maybe those cases go away.

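So far we have only crudely enumerated melodies, without touching arrangement or any further creative work. With the progress of deep learning in recent years, many people have turned their attention to music generation. For example, the article 「KDD 2018 Research Track 最佳学生论文详解:流行音乐的旋律与编曲生成」 (a walkthrough of the KDD 2018 Research Track best student paper, on melody and arrangement generation for pop music) applies deep learning to exactly this area. I'll leave the topic here for now; maybe I'll come back to it someday :)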

Post Script

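When I saw Damien Riehl's video two weeks ago, I was immediately hooked on the topic. Leaving copyright aside (that is another hole to fill later), just arranging music with a computer is enough to get me excited. Add AI composition and that is yet another deep rabbit hole. In my freshman year I attended a talk at school where a dashing young American told the story of how he made music with AI (so long ago now; I think it was the undergraduate research showcase of 信科学院? Back then I had no idea about any of this).

Over the past two weeks I tried to learn arranging, playing with GarageBand while following a GarageBand tutorial by a Taiwanese guy on Bilibili. The further I went, the more I realized how deep this rabbit hole is :)

Figure: the arranging skill tree

This is a picture I saw on Zhihu. What else can I do :)

One step at a time. Composing, arranging: maybe one day I will be able to write songs of my own 👀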
