Example: Signal separation using dual basis pursuit (Example 2)
Decompose a speech waveform into two distinct signal components using short-time Fourier transforms (STFT) with short and long windows. The two components accentuate the pitch harmonics and the formants, respectively.
Ivan Selesnick NYU-Poly selesi@poly.edu 2011
Contents
Start
clear close all printme = @(filename) print('-dpdf', sprintf('figures/Example_dualBP_2_%s', filename) );
Load data
Load speech waveform data
[y, fs] = wavread('data/arctic_a0001.wav' ); % fs : sampling rate (samples/second) y = y(18000:26000); y = y(:)'; M = length(y); % M : length of signal m = (0:M-1)'; LW = 0.1; % LW : linewidth figure(1) clf subplot(2,1,1) plot(m/fs, y, 'black', 'linewidth', LW) xlim([0 M/fs]) ymax = max(abs(y)); ylim(1.2*[-ymax 1.2*ymax]) box off title('Speech waveform'); xlabel('Time (seconds)') printme('speech')
Define transform 1
% A1 : STFT R1 = 32; A1T = @(x) pSTFT2(x, R1, 4, 3, R1); A1 = @(X) ipSTFT2(X, R1, 4, 3, M); p1 = 1; % Check perfect reconstruction property c1 = A1T(y); err = p1*y - A1(c1); re = max(abs(err(:))); fprintf('Transform 1 reconstruction error = %f\n', re) % Check Parseval energy identity E = sum(abs(y(:)).^2); E1 = sum(abs(c1(:)).^2); fprintf('Signal energy = %.3e\n', E) fprintf('Transform 1 energy = %.3e\n', E1/p1)
Transform 1 reconstruction error = 0.000000 Signal energy = 3.034e+01 Transform 1 energy = 3.034e+01
Define transform 2
% A2 : STFT R2 = 512; A2T = @(x) pSTFT2(x, R2, 4, 3, R2); A2 = @(X) ipSTFT2(X, R2, 4, 3, M); p2 = 1; % Check perfect reconstruction property c2 = A2T(y); err = p2*y - A2(c2); re = max(abs(err(:))); fprintf('Transform 2 reconstruction error = %f\n', re) % Check Parseval energy identity E = sum(abs(y(:)).^2); E2 = sum(abs(c2(:)).^2); fprintf('Signal energy = %.3e\n', E) fprintf('Transform 2 energy = %.3e\n', E2/p2)
Transform 2 reconstruction error = 0.000000 Signal energy = 3.034e+01 Transform 2 energy = 3.034e+01
Display spectrogram of speech signal
In the spectrogram, both the pitch harmonics (fine ridges) and the formatns (darker broad ridges) are clearly visible.
Y = pSTFT2(y, R2, 4, 3, R2); figure(1) clf dblim = [-50 0]; displaySTFT( Y, fs, M/fs, dblim) title(sprintf('Speech, STFT with %d-point window', R2)) orient portrait printme('STFT_y')
Peform signal separation using dual BP
Use the command 'dualBP' (dual basis pursuit) to separate the signal into two distinct components.
% Algorithm parameters theta = 0.5; % theta : trade-off parameter Nit = 200; % Nit : number of iterations mu1 = 10.0; % mu1, mu2 : ADMM parameters mu2 = 10.0; [y1,y2,c1,c2,costfn] = dualBP(y, A1, A1T, p1, A2, A2T, p2, theta, 1-theta, mu1, mu2, Nit);
Display cost function
figure(2) clf plot(1:Nit, costfn) xlim([0 Nit]) box off title('Cost function') xlabel('Iteration')
Calculate signal components
The two components y1 and y2 can be found by applying the transforms to the coefficients c1 and c2 produced by the dual basis pursuit algorithhm.
y1 = A1(c1); y2 = A2(c2); % Verify that y = y1 + y2 fprintf('Maximum of residual = %g\n', max(abs(y - y1 - y2)))
Maximum of residual = 1.52656e-16
Display signal components obtained using dual BP
figure(1) clf subplot(3,1,[1 2]) plot(m/fs, y, 'black', 'linewidth', LW) text(M/fs, 0.1, 'Speech', 'horizontalalignment','right', 'fontsize',14) vo = 0.4; line(m/fs, y1-vo, 'color', 'black', 'linewidth', LW) text(M/fs, -vo+0.1, 'Component 1', 'horizontalalignment','right', 'fontsize',14) line(m/fs, y2-2*vo, 'color', 'black', 'linewidth', LW) text(M/fs, -2*vo+0.1, 'Component 2', 'horizontalalignment','right', 'fontsize',14) set(gca, 'ytick', [-2 -1 0]*vo) set(gca, 'yticklabel', {}) t1 = 0.20; t2 = 0.24; line([t1 t2 t2 t1 t1],[-2.7 -2.7 1 1 -2.7]*vo) % , 'linestyle','--') ylim([-2.8 1.5]*vo) box off xlabel('Time (seconds)') title('Waveforms (0.5 s)'); orient portrait printme('components')
Display a short segment of the waveform to make their difference more visible.
k = (m/fs >= t1) & (m/fs < t2); figure(2) clf subplot(3,1,[1 2]) plot(m(k)/fs, y(k), 'black') % text(0.4, 0.1, 'Speech') vo = 0.4; line(m(k)/fs, y1(k)-vo, 'color', 'black') % text(0.4, -vo+0.1, 'Component 1') line(m(k)/fs, y2(k)-2*vo, 'color', 'black') % text(0.4, -2*vo+0.1, 'Component 2') set(gca, 'ytick', [-2 -1 0]*vo) set(gca, 'yticklabel', {}) box off xlabel('Time (seconds)') title('Waveforms (40 ms)'); orient portrait xlim([0.2 0.24]) set(gca, 'xtick', 0.2:0.01:0.24) ylim([-2.8 1.5]*vo) orient portrait printme('components_crop')
Note that omponent 1 consists of brief waveforms, each shorter than the pitch period. Component 2 consists of sustained oscillations that are of longer duration. The sustained oscillations visible in component 2 are resonant frequencies of the vocal tract (formants).
Display spectrograms of component signals
Note that in the spectrogram of component 1, the pitch harmonics (fine ridges) are clear and the formants are de-emphasized. While in the spectrogram of component 2, the formants are clear and the pitch harmonics are de-emphasized.
s1 = pSTFT2(y1, R2, 4, 3, R2); s2 = pSTFT2(y2, R2, 4, 3, R2); figure(1) clf displaySTFT( s1, fs, M/fs, dblim) title(sprintf('Component 1, STFT with %d-point window', R2)) orient portrait printme('STFT_y1') figure(2) clf displaySTFT( s2, fs, M/fs, dblim) title(sprintf('Component 2, STFT with %d-point window', R2)) orient portrait printme('STFT_y2')
Play sounds signals
sound(y1, fs) pause(.2) sound(y2, fs) pause(.2) sound(y, fs)