MelGAN (Mel-spectrogram Generative Adversarial Network)
MelGAN is a Generative Adversarial Network (GAN) designed for speech synthesis. It was proposed by Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brébisson, Yoshua Bengio, and Aaron Courville in the 2019 paper “MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis.”
Unlike speech synthesis models whose output is a spectrogram, MelGAN produces the raw audio waveform directly. It takes as input a mel spectrogram, a compact time-frequency representation of a speech signal with frequency bands spaced on the perceptual mel scale, and generates a waveform whose mel spectrogram closely matches the input. In a text-to-speech pipeline it therefore plays the role of the vocoder, the stage that turns spectrograms into audible sound.
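The relationship between the input and output sizes can be sketched with a little arithmetic. A mel spectrogram has one frame per "hop" of audio, so the generator has to upsample the time axis by the hop length. The numbers below (a 22,050 Hz sample rate, a hop of 256 samples, and four upsampling stages of 8, 8, 2 and 2) are assumptions chosen for illustration and are not stated in the text above; the function names are hypothetical.

```python
from math import prod

# Assumed values for illustration only; they are not given in the text above.
SAMPLE_RATE = 22050              # audio samples per second
HOP_LENGTH = 256                 # audio samples between consecutive mel frames
UPSAMPLE_FACTORS = (8, 8, 2, 2)  # per-stage time upsampling in the generator


def waveform_length(n_frames, factors=UPSAMPLE_FACTORS):
    """Number of audio samples produced from n_frames mel frames."""
    return n_frames * prod(factors)


def factors_match_hop(factors=UPSAMPLE_FACTORS, hop_length=HOP_LENGTH):
    """The total upsampling must equal the hop length so that each
    mel frame expands back into exactly one hop of audio."""
    return prod(factors) == hop_length


n_frames = 100
n_samples = waveform_length(n_frames)   # 100 frames * 256 samples per frame
duration_s = n_samples / SAMPLE_RATE    # length of the generated audio in seconds
```

With these assumed settings, 100 mel frames expand to 25,600 samples, a little over one second of audio. If the hop length used to compute the spectrogram changes, the upsampling factors have to change with it.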
MelGAN is a conditional GAN: the generator produces waveforms conditioned on a given input mel spectrogram. Training is a two-player adversarial game in which the generator network learns to synthesize waveforms that are hard to tell apart from real audio, while a discriminator network learns to distinguish generated waveforms from real recordings in the training data.
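The two-player game can be made concrete with a small sketch of the two objectives. The text above does not say which loss is used; the sketch assumes a hinge-style formulation, which is the variant the MelGAN paper reports, and it leaves out any auxiliary terms. The discriminator scores here are plain Python lists of hypothetical values standing in for network outputs, and the function names are illustrative.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def discriminator_hinge_loss(real_scores, fake_scores):
    """Hinge loss for the discriminator (assumed formulation).

    It is zero once real audio scores at least +1 and generated
    audio scores at most -1; anything inside that margin is penalized.
    """
    real_term = mean([max(0.0, 1.0 - s) for s in real_scores])
    fake_term = mean([max(0.0, 1.0 + s) for s in fake_scores])
    return real_term + fake_term


def generator_hinge_loss(fake_scores):
    """The generator lowers its loss by raising the scores the
    discriminator assigns to generated audio."""
    return -mean(fake_scores)


# Hypothetical scores: a discriminator that separates the two classes well.
d_loss = discriminator_hinge_loss([2.0, 1.5], [-2.0, -1.5])
g_loss = generator_hinge_loss([-2.0, -1.5])
```

In an actual training loop the two losses would be minimized in alternation, each with respect to its own network's parameters, so that improvements in the discriminator give the generator a sharper signal to learn from.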