
sig.autocor

Olivier Lartillot edited this page Aug 22, 2018 · 7 revisions

Analysis of periodicity in signals by looking at local correlation between samples.

Consider a signal x, for instance this trumpet sound:


the autocorrelation function is computed as follows:

For a given lag j, the autocorrelation Rxx(j) is computed by multiplying the signal, point by point, with a version of itself shifted by j samples, and summing the products. We obtain this curve:

When the lag j corresponds to a period of the signal, the signal is shifted one period ahead and is therefore exactly superposed on the original signal. The summation then gives a very high value, as the two signals are highly correlated.
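The shift, multiply and sum procedure described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the MiningSuite implementation; the test signal (a sine wave with a 20-sample period) and the helper name `autocorrelation` are assumptions made for the example.

```python
import math

def autocorrelation(x, max_lag):
    """Raw autocorrelation: for each lag j, sum x[i] * x[i + j] over the overlap."""
    n = len(x)
    return [sum(x[i] * x[i + j] for i in range(n - j)) for j in range(max_lag + 1)]

# Hypothetical test signal: a sine wave whose period is exactly 20 samples.
period = 20
max_lag = 30
x = [math.sin(2 * math.pi * i / period) for i in range(200)]

r = autocorrelation(x, max_lag)

# Local maxima of the curve, excluding lag 0 (where the value is trivially high).
peaks = [j for j in range(1, max_lag) if r[j - 1] < r[j] >= r[j + 1]]
print("first peak at lag", peaks[0])
```

The first local maximum after lag 0 falls at the lag equal to the period of the sine wave, which is the behaviour described in the paragraph above.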

Flowchart Interconnections

sig.autocor accepts as input data type either:

  • sig.Signal objects, where the waveform can be segmented (using sig.segment), decomposed into channels (using sig.filterbank), and/or decomposed into frames (using sig.frame);
  • file name(s) or the 'Folder' keyword
  • data in the onset detection curve category (cf. aud.events):
    • sig.Envelope objects, frame-decomposed or not
    • fluxes (cf. sig.flux), frame-decomposed or not
  • sig.Spectrum objects
  • sig.Autocor objects, for further processing

Frame decomposition

sig.autocor(…,'Frame',…) first performs a frame decomposition, by default with a frame length of 50 ms and half overlapping. For the specification of other frame configurations using additional parameters, cf. this page.
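To make the default configuration concrete, the sketch below computes frame boundaries for a 50 ms frame length and half overlapping. It is only an illustration in plain Python: the function name `frame_bounds`, the 8000 Hz sampling rate and the choice to drop a trailing incomplete frame are assumptions, not documented MiningSuite behaviour.

```python
def frame_bounds(n_samples, sample_rate, frame_length_s=0.05, overlap=0.5):
    """Return (start, end) sample indices of successive frames.

    Assumption: a trailing frame that would run past the end of the signal is dropped.
    """
    frame_len = round(frame_length_s * sample_rate)
    hop = max(1, round(frame_len * (1 - overlap)))
    return [(start, start + frame_len)
            for start in range(0, n_samples - frame_len + 1, hop)]

# One second of signal sampled at 8000 Hz (hypothetical values).
bounds = frame_bounds(n_samples=8000, sample_rate=8000)
print(len(bounds), bounds[:3])
```

With these values each frame is 400 samples long and starts 200 samples after the previous one; the autocorrelation would then be computed separately on each frame.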

Parameters specification

  • sig.autocor(…,'Min',mi) indicates the lowest delay taken into consideration. Default value: 0 s. The unit can be specified:
    • sig.autocor(…,'Min',mi,'s') (default unit)
    • sig.autocor(…,'Min',mi,'Hz')
  • sig.autocor(…,'Max',ma) indicates the highest delay taken into consideration. The unit can be specified as for 'Min'. Default value:
    • if the input is an audio waveform, the highest delay is by default 0.05 s (corresponding to a minimum frequency of 20 Hz).
    • if the input has a resolution lower than 1000 Hz, there is no highest delay by default.
    • if the input is an envelope, the highest delay is by default 2 s.
  • sig.autocor(…,n) specifies a normalization option for the cross-correlation ('biased', 'unbiased', 'coeff', 'none', 'coeffXchannels'). The first four correspond exactly to the normalization options of the Matlab xcorr function, as sig.autocor calls xcorr for the actual computation. The default value is 'coeff', which normalizes the autocorrelation so that its value at zero lag is identically 1. The additional option 'coeffXchannels' behaves like 'coeff', except that for multi-channel data the normalization is such that the sum over channels at zero lag is identically 1. Note however that the 'coeff' routine is not used when the compression ('Compres') factor k is not equal to 2 (see below).
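The effect of the single-channel normalization options can be sketched as follows, using the usual xcorr scaling conventions ('biased' divides by the signal length N, 'unbiased' by N minus the lag, 'coeff' by the zero-lag value). This is a plain Python illustration rather than a call to xcorr; the helper names are hypothetical and 'coeffXchannels' is not covered.

```python
def raw_autocor(x, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + j] for i in range(n - j)) for j in range(max_lag + 1)]

def normalize(r, n, option="coeff"):
    """Scale a raw autocorrelation r (lags 0..len(r)-1) of a signal of length n."""
    if option == "none":
        return list(r)
    if option == "biased":
        return [v / n for v in r]
    if option == "unbiased":
        return [v / (n - j) for j, v in enumerate(r)]
    if option == "coeff":
        return [v / r[0] for v in r]
    raise ValueError("unknown normalization option: " + option)

x = [1.0, 2.0, 3.0, 4.0]
raw = raw_autocor(x, 3)
for option in ("none", "biased", "unbiased", "coeff"):
    print(option, normalize(raw, len(x), option))
```

Note how 'unbiased' compensates for the shrinking overlap at larger lags, whereas 'coeff' only rescales the whole curve so that it starts at 1.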

Post-processing options

  • sig.autocor(…,'Freq') represents the autocorrelation function in the frequency domain: the periods are expressed in Hz instead of seconds (see the last curve in the figure below for an illustration).
  • sig.autocor(…,'NormalWindow') divides the autocorrelation by the autocorrelation of the window. Boersma (1993) shows that, by default, the autocorrelation function gives higher coefficients for small lags, since the summation is done over more samples. Dividing by the autocorrelation of the window normalizes all coefficients in such a way that this bias is corrected. At first sight, a rectangular window would seem sufficient, but Boersma (1993) shows that it is better to use a 'hanning' window in particular, in order to obtain a better harmonic-to-noise ratio.
    • sig.autocor(…,'NormalWindow',w) specifies the window to be used, which can be any window available in the Signal Processing Toolbox. In addition, w = 'rectangular' does not perform any particular windowing (corresponding to a rectangular (“invisible”) window), but the autocorrelation is still normalized by the autocorrelation of this invisible window. The default value is w = 'hanning', except when the input is an envelope or when the resolution is below 1000 Hz, in which case there is no windowing.
    • sig.autocor(…,'NormalWindow','off') toggles off this normalization (which is 'on' by default).
  • sig.autocor(…,'Halfwave') performs a half-wave rectification on the result, in order to show only the positive autocorrelation coefficients.
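The window normalization and the half-wave rectification listed above can be illustrated with a small plain Python sketch. It is not the MiningSuite code: the Hann-type window formula, the sine test signal and all variable names are assumptions chosen for the example.

```python
import math

def autocor(x, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + j] for i in range(n - j)) for j in range(max_lag + 1)]

def coeff(r):
    """Scale a curve so that its value at zero lag is 1."""
    return [v / r[0] for v in r]

n, period, max_lag = 200, 20, 40

# Hann-type window with non-zero end points (assumed formula for this sketch).
window = [0.5 * (1 - math.cos(2 * math.pi * (i + 1) / (n + 1))) for i in range(n)]
x = [math.sin(2 * math.pi * i / period) for i in range(n)]
windowed = [xi * wi for xi, wi in zip(x, window)]

r_signal = coeff(autocor(windowed, max_lag))  # biased toward small lags
r_window = coeff(autocor(window, max_lag))    # autocorrelation of the window itself

# 'NormalWindow': divide by the autocorrelation of the window.
corrected = [rs / rw for rs, rw in zip(r_signal, r_window)]

# 'Halfwave': keep only the positive coefficients.
halfwave = [max(v, 0.0) for v in corrected]

print(r_signal[period], corrected[period])
```

Before the division, the coefficient at the period lag is noticeably below 1 because fewer (and more attenuated) samples contribute; after the division it is close to 1, as expected for a perfectly periodic signal.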

Generalized autocorrelation

sig.autocor(…,'Compres',k) – or equivalently sig.autocor(…,'Generalized',k) – computes the autocorrelation in the frequency domain and includes a magnitude compression of the spectral representation. Indeed an autocorrelation can be expressed using Discrete Fourier Transform as:

y = IDFT(|DFT(x)|^2),

which can be generalized as:

y = IDFT(|DFT(x)|^k)

Compression of the autocorrelation (i.e., setting a value of k lower than 2) is recommended in (Tolonen & Karjalainen, 2000) because it decreases the width of the peaks in the autocorrelation curve, at the risk however of increasing the sensitivity to noise. According to this study, a good compromise seems to be achieved with k = .67. By default, no compression is performed (hence k = 2), whereas if the 'Compres' keyword is used without a value, k = .67 is set by default.
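The generalized formula can be sketched with a naive DFT in plain Python. This is an illustration under stated assumptions, not the MiningSuite implementation: the function names are hypothetical, and the zero-padding to twice the signal length (used here to avoid circular wrap-around) is a choice made for the sketch.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(spectrum):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def generalized_autocor(x, k=2.0):
    """y = IDFT(|DFT(x)|^k), returned for lags 0..len(x)-1."""
    padded = list(x) + [0.0] * len(x)  # zero-padding avoids circular wrap-around
    compressed = [abs(c) ** k for c in dft(padded)]
    return [v.real for v in idft(compressed)[:len(x)]]

x = [1.0, 2.0, 3.0, 4.0]
print(generalized_autocor(x, k=2.0))   # matches the time-domain shift-and-sum result
print(generalized_autocor(x, k=0.67))  # compressed version
```

With k = 2 the result coincides with the raw time-domain autocorrelation, which is a convenient sanity check before trying smaller values of k.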

Enhanced autocorrelation

In the autocorrelation function, each periodicity in the signal produces peaks not only at the lag corresponding to that periodicity, but also at all the multiples of that lag. In order to avoid such redundancy of information, techniques have been proposed that automatically remove these harmonics. In the frequency domain, they correspond to sub-harmonics of the peaks.

sig.autocor(…,'Enhanced',a): The original autocorrelation function is half-wave rectified, time-scaled by factor a (which can be a factor list as well), and subtracted from the original clipped function (Tolonen & Karjalainen, 2000). If the 'Enhanced' option is not followed by any value, the default value is a = 2:10, i.e., from 2 to 10.
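The enhancement step can be sketched as below, in plain Python. This is a simplified reading of the description above, not the MiningSuite code: the linear interpolation used for time-scaling, the choice to scale the current result at each successive factor, and the synthetic curve with triangular peaks are all assumptions.

```python
def stretch(curve, factor):
    """Time-scale a curve so that the value at lag j moves to lag factor * j.

    Values between bins are obtained by linear interpolation (an assumption).
    """
    out = []
    for j in range(len(curve)):
        pos = j / factor
        lo = int(pos)
        hi = min(lo + 1, len(curve) - 1)
        frac = pos - lo
        out.append((1 - frac) * curve[lo] + frac * curve[hi])
    return out

def enhance(curve, factors=range(2, 11)):
    """Half-wave rectify, then subtract each time-scaled version and clip again."""
    result = [max(v, 0.0) for v in curve]
    for a in factors:
        scaled = stretch(result, a)
        result = [max(r - s, 0.0) for r, s in zip(result, scaled)]
    return result

# Synthetic autocorrelation curve: triangular peaks at lag 10 and at its multiples.
curve = [max(0.0, max(1 - abs(j - p) / 2 for p in (10, 20, 30))) for j in range(40)]

enhanced = enhance(curve)
print([j for j, v in enumerate(enhanced) if v > 0])
```

In this synthetic case the peak at lag 10 is kept while the redundant peaks at lags 20 and 30 are cancelled by the versions scaled by factors 2 and 3.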

If the curve does not start from zero at low lags but begins instead with strictly positive values, the initial discontinuity would be propagated throughout all the scaled versions of the curve. In order to avoid this phenomenon, the curve is modified in two successive ways:

  • if the curve starts with a descending slope, the whole part before the first local minimum is removed from the curve,

  • if the curve starts with an ascending slope, the curve is prolonged to the left following the same slope, which is however increased by a factor of 1.1 at each successive bin, until the curve reaches the x-axis.

See the figure below for an example of enhanced autocorrelation when computing the pitch content of a piano Amin3 chord, showing the successive steps of the default enhancement, as used by default in sig.pitch (cf. description of SigPitch).

  • Fig 1: Waveform autocorrelation of a piano chord Amaj3 (blue), and scaled autocorrelation of factor 2 (red);
  • Fig 2: Subtraction of the autocorrelation by the previous scaled autocorrelation (blue), scaled autocorrelation of factor 3 (red);
  • Fig 3: Resulting subtraction (blue), scaled autocorrelation of factor 4 (red);
  • Fig 4: Idem for factor 5;
  • Fig 5: Idem for factor 6;
  • Fig 6: Idem for factor 7;
  • Fig 7: Resulting autocorrelation curve in the frequency domain and peak picking

Music-theory based model

A music-theory representation of the autocorrelation function is available in mus.autocor.

Accessible Output

Accessible using the get method.

  • To obtain the lag or frequency values associated with each bin, we recommend using 'Xdata', together with 'FreqDomain' to check whether those values are lags ('FreqDomain' = 0) or frequencies ('FreqDomain' = 1).

  • 'OfSpectrum': whether the input is a temporal signal (0), or a spectrum (1)

  • 'Window': contains the complete envelope signal used for the windowing
