Generating realistic music with AI: "Jukebox"

Link to the paper

Jukebox: A Generative Model for Music

Authors and affiliations

Prafulla Dhariwal * 1, Heewoo Jun * 1, Christine Payne * 1, Jong Wook Kim * 1, Alec Radford * 1, Ilya Sutskever * 1

1 OpenAI, San Francisco. Correspondence

Submission date

2020/04/30

Overview (one-line summary)

Combines VQ-VAE with several SOTA techniques and throws an enormous amount of compute at the problem to generate realistic music with AI.

Method overview

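The audio is compressed at three different resolutions, and a VQ-VAE at each resolution yields an intermediate representation (a sequence of discrete codes).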

f:id:karaage:20200503215905p:plain
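
To make the "three resolutions, each through a VQ-VAE" idea a bit more concrete, here is a minimal toy sketch in plain Python. It is not the Jukebox implementation. The encoder and decoder are stand-ins (window averaging and value repetition) rather than learned networks, the 16-entry scalar codebook is made up, and the hop lengths 8, 32 and 128 are what I understand the paper to use, so treat every name and number here as an assumption.

```python
import math

def encode(wave, hop):
    """Stand-in encoder (assumption): average each non-overlapping window of `hop` samples."""
    return [sum(wave[i:i + hop]) / hop for i in range(0, len(wave) - hop + 1, hop)]

def quantize(latents, codebook):
    """Vector-quantization step: map each latent to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda k: abs(codebook[k] - z)) for z in latents]

def decode(codes, codebook, hop):
    """Stand-in decoder (assumption): repeat each looked-up codebook value `hop` times."""
    out = []
    for c in codes:
        out.extend([codebook[c]] * hop)
    return out

# Toy 1-D "audio": 1024 samples of a sine wave with a period of 256 samples.
wave = [math.sin(2 * math.pi * t / 256) for t in range(1024)]

# Hypothetical 16-entry scalar codebook spread evenly over [-1, 1].
codebook = [-1 + 2 * k / 15 for k in range(16)]

# Compression factor per level (assumed values, see the note above).
HOPS = {"bottom": 8, "middle": 32, "top": 128}

codes_by_level = {}
mse_by_level = {}
for level, hop in HOPS.items():
    codes = quantize(encode(wave, hop), codebook)
    recon = decode(codes, codebook, hop)
    codes_by_level[level] = codes
    mse_by_level[level] = sum((a - b) ** 2 for a, b in zip(wave, recon)) / len(wave)
    print(level, "codes:", len(codes), "reconstruction MSE:", round(mse_by_level[level], 4))
```

The point of the toy is only the trade-off: the coarser the level, the shorter the code sequence and the worse the reconstruction, which is why a coarse level is convenient for modelling long-range structure while the finer levels carry the detail.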

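Several SOTA techniques are combined to handle things like lyrics extraction and lyrics alignment (locating where each lyric falls in the audio). On top of that, the models are trained with enormous compute resources.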

As for training, some staggering numbers come up: billions of parameters, and training runs of 2 weeks or 4 weeks.

The upsamplers have one billion parameters and are trained on 128 V100s for 2 weeks, and the top-level prior has 5 billion parameters and is trained on 512 V100s for 4 weeks. We use Adam with learning rate 0.00015 and weight decay of 0.002. For lyrics conditioning, we reuse the prior and add a small encoder, after which we train the model on 512 V100s for 2 weeks.
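
To get a feel for the scale, the numbers in the quote can be turned into rough V100-hours. This is a back-of-the-envelope sketch only. It assumes each run kept all of its GPUs busy around the clock for the full stated duration, and that the "128 V100s for 2 weeks" figure for the upsamplers counts once (the quote does not say whether it is per upsampler).

```python
HOURS_PER_WEEK = 7 * 24

# (number of V100s, weeks) as stated in the quoted passage.
runs = {
    "upsamplers": (128, 2),
    "top-level prior": (512, 4),
    "lyrics conditioning": (512, 2),
}

# GPU-hours = GPUs x weeks x hours per week (assumes continuous, full utilisation).
gpu_hours = {name: gpus * weeks * HOURS_PER_WEEK for name, (gpus, weeks) in runs.items()}

for name, hours in gpu_hours.items():
    print(f"{name}: {hours:,} V100-hours")

total_hours = sum(gpu_hours.values())
print(f"total: {total_hours:,} V100-hours")
```

Under those assumptions that comes to 43,008 V100-hours for the upsamplers, 344,064 for the top-level prior and 172,032 for lyrics conditioning, about 559,000 V100-hours in total.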

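I found out about this through 白金興業FM. All I can say is: wow, this is amazing... A while back there was a piece of software called Deep Jazz that used an RNN trained on jazz MIDI to auto-generate jazz, but this is on a completely different level.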