Description: This paper describes a novel speech coding concept created by introducing sparsity constraints in a linear prediction scheme, on both the residual and the prediction vector. Because the residual is sparse, it can be encoded efficiently using well-known multi-pulse excitation procedures. Exploiting the sparse characteristics of the predictor also yields a robust statistical method for the joint estimation of the short-term and long-term predictors. The main purpose of this work is thus to show that better statistical modeling in the context of speech analysis produces an output with better coding properties. The proposed estimation method leads to a convex optimization problem, which can be solved efficiently using interior-point methods. Its simplicity makes it an attractive alternative to common speech coders based on minimum variance linear prediction.
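One way such a doubly sparse estimation problem is often written is sketched below. This is an illustrative formulation only: the abstract does not state the cost function, so the use of the 1-norm as the convex surrogate for sparsity, and the symbols x, X, a and \gamma, are assumptions.

```latex
% Assumed notation (not given in the abstract):
%   x      = vector of speech samples in the analysis frame
%   X      = matrix whose columns are delayed copies of the frame
%   a      = high-order prediction vector covering short- and long-term lags
%   \gamma = regularization weight (\gamma > 0)
\hat{a} = \arg\min_{a} \; \| x - X a \|_1 + \gamma \, \| a \|_1
```

Under these assumptions the first term promotes a sparse residual, the second promotes a sparse predictor, and the sum is convex in a, which is consistent with the abstract's remark that interior-point methods apply.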
Description: This paper proposes a new variant of the least squares autoregressive (LSAR) method for speech reconstruction, which estimates a segment of missing samples via least squares by applying the linear prediction (LP) model of speech. First, we show that a single high-order linear predictor can provide better results than the classic LSAR techniques based on short- and long-term predictors, without the need for a pitch detector. However, this high-order predictor may reduce the reconstruction performance due to estimation errors, especially in the case of short pitch periods, and to non-stationarity. To overcome these problems, we propose the use of a sparse linear predictor which resembles the classical speech model, based on short- and long-term correlations, where many LP coefficients are zero. The experimental results show the superiority of the proposed approach in both signal-to-noise ratio and perceptual performance.
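The basic least squares step behind LSAR-style reconstruction can be sketched as follows. This is a minimal illustration and not the paper's method: it assumes the LP coefficients are already known, whereas the paper is concerned with how to estimate them (high-order or sparse). The function name `lsar_interpolate` and its arguments are hypothetical.

```python
import numpy as np

def lsar_interpolate(x, missing, a):
    """Fill x[missing] by minimizing the squared LP residual.

    Assumes the model x[t] = sum_k a[k-1] * x[t-k] + e[t] with known
    coefficients a (an assumption; in practice they must be estimated).
    """
    x = np.asarray(x, dtype=float).copy()
    a = np.asarray(a, dtype=float)
    n, p = len(x), len(a)
    # Prediction-error filter taps: [1, -a_1, ..., -a_p]
    h = np.concatenate(([1.0], -a))
    # Each row of A computes one residual sample e[t], for t = p .. n-1
    A = np.zeros((n - p, n))
    for row, t in enumerate(range(p, n)):
        A[row, t - p:t + 1] = h[::-1]
    mask = np.zeros(n, dtype=bool)
    mask[missing] = True
    A_u, A_k = A[:, mask], A[:, ~mask]
    # e = A_u x_u + A_k x_k; choose x_u to minimize ||e||^2
    x_u, *_ = np.linalg.lstsq(A_u, -A_k @ x[~mask], rcond=None)
    x[mask] = x_u
    return x

# Usage: a pure sinusoid obeys an exact order-2 recursion, so a gap can be refilled.
n = np.arange(64)
w = 0.3
clean = np.sin(w * n)
damaged = clean.copy()
gap = np.arange(20, 30)
damaged[gap] = 0.0
restored = lsar_interpolate(damaged, gap, [2 * np.cos(w), -1.0])
```

The sinusoid is used only because its predictor is known in closed form; real speech would need the predictor estimated from the surrounding samples.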
Description: Compressive sensing (CS) has been proposed for signals with sparsity in a linear transform domain. We explore a signal-dependent, unknown linear transform, namely the impulse response matrix operating on a sparse excitation, as in the linear model of speech production, for recovering compressively sensed speech. Since the linear transform is signal dependent and unknown, unlike in the standard CS formulation, a codebook of transfer functions is proposed in a matching pursuit (MP) framework for CS recovery. MP is found to be efficient and effective at recovering CS-encoded speech as well as jointly estimating the linear model. With a moderate number of CS measurements and a low-order sparsity estimate, MP converges to the same linear transform as direct vector quantization (VQ) of the LP vector derived from the original signal. There is also a high positive correlation between the signal-domain approximation and the CS-measurement-domain approximation for a large variety of speech spectra.
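For orientation, the generic matching pursuit iteration can be sketched as below. This is plain MP over a single fixed dictionary, not the codebook search over transfer functions that the abstract describes. The name `matching_pursuit` is hypothetical, and `D` is assumed to stand in for the product of a measurement matrix and one candidate impulse response matrix.

```python
import numpy as np

def matching_pursuit(y, D, n_iter):
    """Greedy MP: approximate y as D @ s using n_iter single-atom updates."""
    norms = np.linalg.norm(D, axis=0)
    Dn = D / norms                      # unit-norm atoms
    s = np.zeros(D.shape[1])
    r = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        corr = Dn.T @ r                 # correlation of residual with each atom
        k = int(np.argmax(np.abs(corr)))
        s[k] += corr[k] / norms[k]      # coefficient on the original column of D
        r -= corr[k] * Dn[:, k]         # remove that atom's contribution
    return s, r

# Usage with a random dictionary and a 3-sparse excitation (synthetic data).
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
s_true = np.zeros(50)
s_true[[3, 17, 41]] = [1.0, -2.0, 0.5]
y = D @ s_true
s_hat, residual = matching_pursuit(y, D, 10)
```

Each iteration keeps `y == D @ s + r` and cannot increase the residual norm. A codebook-based variant would presumably repeat this search for each candidate transfer function and keep the one with the smallest residual.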
