[Technology and products] Music AI technology and digital singer Vivizen from Metabuild
AI development is currently shifting from discriminative models, which
distinguish data by finding and learning its key patterns, toward
Generative AI, which captures and learns many different types of patterns
from data.
Generative AI is a technology that
generates new data with features similar to the original by learning the
characteristics of the content in its training data through unsupervised
learning. It is drawing particular attention because it can produce
digital creations in art fields such as literature, visual art, and
music.
In its "Top Strategic Technology Trends for 2022," Gartner introduced
12 strategic technology trends for growth, digitalization, and
efficiency. Among them, Generative AI was selected as a major technology
for expanding digitalization over the next three to five years. Gartner
predicts that the share of digital data generated by Generative AI will
grow from less than 1% at present to more than 10% by 2025.
In line with this AI paradigm shift and strategic technology trend,
Metabuild made significant progress in the field of Generative AI in
2021. Metabuild recently developed an AI multi-tonal vocal technology
that can be used in music fields such as K-Pop, which has become popular
worldwide.
The technology, called "MAI VOCAL" and developed by Metabuild, is a
Singing Voice Synthesis (SVS) system. It consists of two models, an
Acoustic Model and a Vocoder Model, which together imitate the vocal
tones of various singers and generate natural, high-quality singing
voices.
The Acoustic Model takes lyrics, notes, and note durations as input and
generates a Mel-Spectrogram, which represents the magnitude of the
frequency components over time. It was developed based on the FastSpeech
AI model, which is robust against repeated and skipped words and allows
control of voice speed and prosody.
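
As a rough illustration of how note durations can reach an acoustic model, FastSpeech-style systems expand each input symbol to one entry per spectrogram frame, a step often called length regulation. The sketch below assumes a 22,050 Hz sampling rate and a hop of 256 samples per frame. Those values, the `Note` class, and the function names are illustrative assumptions, not details of MAI VOCAL.

```python
from dataclasses import dataclass


@dataclass
class Note:
    syllable: str      # lyric syllable sung on this note
    midi_pitch: int    # MIDI note number (60 = middle C)
    duration_s: float  # note duration in seconds


def frames_for(duration_s, sample_rate=22050, hop_length=256):
    """Number of spectrogram frames covering a duration (at least one)."""
    return max(1, round(duration_s * sample_rate / hop_length))


def length_regulate(notes, sample_rate=22050, hop_length=256):
    """Repeat each (syllable, pitch) pair once per frame of its note."""
    frames = []
    for note in notes:
        count = frames_for(note.duration_s, sample_rate, hop_length)
        frames.extend([(note.syllable, note.midi_pitch)] * count)
    return frames


# Hypothetical two-note phrase: a half-second note, then a quarter-second note.
phrase = [Note("la", 60, 0.5), Note("li", 64, 0.25)]
frames = length_regulate(phrase)
print(len(frames), frames[0], frames[-1])
```

The frame-level sequence is what a trained network would then map to one Mel-Spectrogram column per frame. Changing the durations before this step is how such a design can control singing speed.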
The Vocoder Model takes the Mel-Spectrogram output by the Acoustic Model
as input and generates the waveform of the singing audio. It was
developed based on the HiFi-GAN AI model, which offers faster synthesis
and training than other approaches while producing good-quality
synthetic speech.
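
The two-stage data flow described above can be sketched with toy stand-ins. The code below is not the trained networks the article describes. It only shows the interface between the stages: the first stage emits one spectrogram-like vector per frame, and the second turns each frame into a fixed number of waveform samples. The constants, the one-bin-per-MIDI-pitch encoding, and all function names are assumptions made for illustration.

```python
import math

SAMPLE_RATE = 22050  # assumed sampling rate in Hz
HOP_LENGTH = 256     # assumed waveform samples per spectrogram frame
N_BINS = 128         # toy choice: one bin per MIDI pitch, not a real mel filterbank


def midi_to_hz(midi_pitch):
    """Equal-temperament frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)


def toy_acoustic_model(frame_pitches):
    """Stand-in for stage 1: emit one N_BINS-long vector per frame."""
    spectrogram = []
    for pitch in frame_pitches:
        frame = [0.0] * N_BINS
        frame[pitch] = 1.0  # all energy in the bin for this pitch
        spectrogram.append(frame)
    return spectrogram


def toy_vocoder(spectrogram):
    """Stand-in for stage 2: render HOP_LENGTH sine samples per frame."""
    waveform = []
    phase = 0.0
    for frame in spectrogram:
        pitch = max(range(N_BINS), key=lambda i: frame[i])  # strongest bin
        step = 2.0 * math.pi * midi_to_hz(pitch) / SAMPLE_RATE
        for _ in range(HOP_LENGTH):
            waveform.append(math.sin(phase))
            phase += step  # phase stays continuous across frame boundaries
    return waveform


# Four frames of middle C followed by four frames of the E above it.
frame_pitches = [60] * 4 + [64] * 4
spectrogram = toy_acoustic_model(frame_pitches)
waveform = toy_vocoder(spectrogram)
print(len(spectrogram), len(waveform))
```

Keeping the spectrogram as the only thing passed between the stages is what lets each model be trained and replaced separately.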
To acquire the singers' voice data needed to train MAI VOCAL, Metabuild
recorded the voices of 92 singers, grouped by voice characteristics such
as age, gender, tone, and genre, through the Artificial Intelligence
Learning Data Construction Project of Korea's National Information
Society Agency. From these recordings, 4,000 songs were collected and
processed, each labeled with its lyrics and with MIDI notes marking the
start and duration of the singer's pronunciation, to construct vocal
data for AI training. Based on this vocal training data, Metabuild
developed an AI multi-tonal vocal system that can synthesize 100 vocal
voices.
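
A label of the kind described above pairs each sung syllable with a MIDI note, a start time, and a duration. The sketch below shows one possible shape for such a record, together with a basic consistency check. The field names, the check itself, and the assumption that a solo vocal line has no overlapping notes are illustrative and are not taken from Metabuild's data format.

```python
from dataclasses import dataclass


@dataclass
class LabeledNote:
    syllable: str      # lyric syllable aligned to this note
    midi_pitch: int    # MIDI note number, 0 to 127
    start_s: float     # onset of the pronunciation, in seconds
    duration_s: float  # how long the syllable is held, in seconds


def validate_labels(notes, tolerance_s=1e-9):
    """Return a list of human-readable problems found in a labeled song."""
    problems = []
    previous_end = 0.0
    for index, note in enumerate(notes):
        if not 0 <= note.midi_pitch <= 127:
            problems.append(f"note {index}: MIDI pitch out of range")
        if note.duration_s <= 0:
            problems.append(f"note {index}: non-positive duration")
        # Assumed rule: a single singer's notes should not overlap.
        if note.start_s + tolerance_s < previous_end:
            problems.append(f"note {index}: starts before previous note ends")
        previous_end = max(previous_end, note.start_s + note.duration_s)
    return problems


# Hypothetical labels: the second list contains one overlapping note.
clean = [LabeledNote("do", 60, 0.0, 0.5), LabeledNote("re", 62, 0.5, 0.5)]
overlapping = [LabeledNote("do", 60, 0.0, 0.5), LabeledNote("re", 62, 0.25, 0.5)]
print(validate_labels(clean))
print(validate_labels(overlapping))
```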
The AI multi-tonal vocal system developed by Metabuild can perform in a
range of styles, such as K-Pop dance, ballads, and children's songs, in
the tones of men and women from their teens to their 50s.
Metabuild's MAI VOCAL system continues to evolve by learning songs and
tones from various genres around the clock on its cloud-based music
platform.
In addition, Metabuild and Chilloen created the virtual AI singer
"Vivizen," built around one of the singing tones the MAI VOCAL system can
synthesize: the clear voice of a woman in her early 20s. Vivizen's
appearance was produced through a sophisticated 3D modeling process
based on collecting and analyzing various 2D images of women in their
early 20s. From the initial planning stage, with the characteristics of
virtual digital singers in mind, she was designed to express mouth
shapes, emotions, and body movements naturally when singing. Her whole
body was also produced through 3D modeling, so that choreographed dance
can be expressed using rigging for joint movement and motion capture
technology.
Through Chilloen, Metabuild plans to develop AI singer Vivizen, which uses its AI vocal technology, into an AI digital human active in the metaverse, including social media activities, virtual singer activities, and advertising. While continuing to develop its AI multi-tonal vocal technology, it also plans to develop additional AI models, such as AI composition and arrangement, to lead Generative AI technology that can be offered as a service in the music field.
Published: 31 Dec, 2021