This article discusses TERA (Transformer Encoder Representations from Alteration), a self-supervised speech pre-training method. Recent approaches often learn through a single auxiliary task, such as contrastive prediction, autoregressive prediction, or masked reconstruction. TERA instead uses a multi-target auxiliary task to pre-train Transformer Encoders on a large amount of unlabeled speech.
The model learns by reconstructing acoustic frames from their altered counterparts, using a stochastic policy to alter the input along three dimensions: temporal, channel, and magnitude. TERA can be used to extract speech representations or fine-tuned together with downstream models. Evaluated on several downstream tasks, including phoneme classification, speaker recognition, and speech recognition, TERA achieved strong performance, improving upon surface features and outperforming previous methods.
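To make the alteration policy concrete, here is a minimal sketch of how a (time, channel) spectrogram could be stochastically altered along the three dimensions before reconstruction. The function name, probabilities, and mask widths below are illustrative assumptions, not the paper's exact hyperparameters.

```python
import numpy as np

def alter(frames, rng, time_mask_prob=0.15, chan_mask_width=8, noise_prob=0.1):
    """Apply TERA-style stochastic alterations to a (time, channel)
    log-mel spectrogram. Hyperparameters here are illustrative only."""
    x = frames.copy()
    T, C = x.shape

    # Temporal alteration: zero out randomly chosen time steps.
    time_mask = rng.random(T) < time_mask_prob
    x[time_mask, :] = 0.0

    # Channel alteration: zero out a random contiguous block of channels.
    width = int(rng.integers(1, chan_mask_width + 1))
    start = int(rng.integers(0, C - width + 1))
    x[:, start:start + width] = 0.0

    # Magnitude alteration: add Gaussian noise to a random subset of frames.
    noise_mask = rng.random(T) < noise_prob
    x[noise_mask, :] += rng.normal(0.0, 0.2, size=(int(noise_mask.sum()), C))

    return x, time_mask  # mask marks frames scored by the reconstruction loss

rng = np.random.default_rng(0)
spec = rng.normal(size=(100, 80))      # toy 100-frame, 80-mel spectrogram
altered, masked = alter(spec, rng)
```

During pre-training, the Transformer Encoder would receive `altered` and be trained to reconstruct the original `spec`, so the model must infer the missing temporal, channel, and magnitude information.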
Through alteration along different dimensions, the model learns to encode distinct aspects of speech. So far, transformers have proven their worth in dealing with discrete data such as words and mathematical symbols. “It’s easy to train a system like this because there is some uncertainty about which word could be missing but we can represent this uncertainty with a giant vector of probabilities over the entire dictionary, and so it’s not a problem,” LeCun says.
LeCun’s favored approach to self-supervised learning is what he calls “latent variable energy-based models.” The key idea is to introduce a latent variable Z: the model computes a compatibility (energy) score between a variable X (the current frame in a video) and a prediction Y (the future of the video), minimizing over Z, and selects the outcome with the best compatibility score. In his talk, LeCun further elaborates on energy-based models and other approaches to self-supervised learning.
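The idea above can be sketched numerically. The toy decoder, the quadratic energy, and all values below are assumptions for illustration, not LeCun's actual formulation: the energy E(x, y) is the minimum over latents z of a reconstruction error, and the prediction is the candidate y with the lowest energy.

```python
import numpy as np

def decoder(x, z):
    # Hypothetical decoder: shifts the current frame x by a latent offset z.
    return x + z

def energy(x, y, z_candidates):
    # E(x, y) = min over z of ||decoder(x, z) - y||^2.
    # Minimizing over the latent z absorbs the uncertainty about the future.
    return min(float(np.sum((decoder(x, z) - y) ** 2)) for z in z_candidates)

x = np.array([0.0, 1.0])                                   # current frame (toy)
candidates = [np.array([1.0, 2.0]), np.array([5.0, 5.0])]  # possible futures
latents = [np.array([1.0, 1.0]), np.array([0.0, 0.0])]     # latent offsets

scores = [energy(x, y, latents) for y in candidates]
best = candidates[int(np.argmin(scores))]  # the most compatible future
```

Here the first candidate is reachable exactly with the latent offset `[1, 1]`, so it gets energy 0 and is selected; this is the "select the outcome with the best compatibility score" step in miniature.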