Estimation of missing data

Estimate missing data by least squares: Minimize the energy of second-order derivative subject to the data consistency constraint.

Ivan Selesnick
selesi@poly.edu

Contents

Start

clear
close all

Load data

load data.txt;
whos

y = data;                   % y : data value

N = length(y);
n = 1:N;
  Name        Size            Bytes  Class     Attributes

  data      200x1              1600  double              

Missing data appear as NaN's

y(1:10)                     % The first 10 data values
ans =

   -0.0144
       NaN
   -0.0126
       NaN
   -0.0108
   -0.0099
       NaN
       NaN
   -0.0065
       NaN

Display data

The NaN's appear as gaps in the plot

figure(1)
clf
plot(n, y)
title('Data')

Define matrix D

D represents the second-order derivitive (2nd-order difference). D is defined as a sparse matrix so that Matlab subsequently uses fast solvers for banded systems.

e = ones(N, 1);
D = spdiags([e -2*e e], 0:2, N-2, N);

Fist corner of D:

full(D(1:5, 1:5))
ans =

     1    -2     1     0     0
     0     1    -2     1     0
     0     0     1    -2     1
     0     0     0     1    -2
     0     0     0     0     1

Last corner of D:

full(D(end-4:end, end-4:end))
ans =

     1     0     0     0     0
    -2     1     0     0     0
     1    -2     1     0     0
     0     1    -2     1     0
     0     0     1    -2     1

Define matrices S and Sc

k = isfinite(y);                    % k : logical vector, indexes known values

S = speye(N);
S(~k, :) = [];                      % S : sampling matrix

Sc = speye(N);                      % Sc : complement of S
Sc(k, :) = [];

L = sum(~k)                         % L : number of missing values
L =

   100

Estimate missing data

Least square estimation of missing data. Note that the system matrix is banded so the system equations can be solved very efficiently with a fast banded system solver. By defining S and D as sparse matrices, Matlab calls a fast banded system solver by default.

v = -(Sc * (D' * D) * Sc') \ ( Sc * D' * D * S' * y(k));    % v : estimated samples

Fill in unknown values

Place the estimated samples into the signal.

x = zeros(N,1);
x(k) = y(k);
x(~k) = v;

% The above 3 lines is a more direct way to implement:
% x = Sc' * v + S'*y(k);

figure(1)
clf
plot(n, y, 'k', n(~k), x(~k) ,'k.')
legend('Known data', 'Estiamted data')
figure(1)
clf
plot(n, x )
title('Final signal')

Banded matrix visualization

As noted above, the system matrix is banded. This can be visualized with the 'spy' command in Matlab:

G = Sc * (D' * D) * Sc';

figure(1)
clf
spy(G)
title('Visualization of banded matrix')

% It can be seen that all the non-zero elements of G lie near the diagonal.