
aud.pitch

Olivier Lartillot edited this page Feb 4, 2018 · 4 revisions

Extracts stable fundamental frequencies

The pitch content can be estimated in various ways:

  • aud.pitch(..., 'Autocor') computes an autocorrelation function of the audio waveform, using sig.autocor. This is the default strategy. Options related to sig.autocor can be specified:
    • 'Enhanced', toggled on by default here,
    • 'Compress', set by default to .5,
    • the filterbank configuration: either '2Channels' (the default), 'Gammatone' or 'NoFilterbank',
    • if a filterbank is used, 'Sum' specifies whether the channels are recombined once the autocorrelation function is computed ('Sum', 1, which is the default) or kept separate, in which case pitch content is extracted in each channel separately ('Sum', 0).
  • aud.pitch(..., 'Spectrum') computes the FFT spectrum (sig.spectrum). Options related to sig.spectrum can be specified: 'Res' and 'dB'.
  • aud.pitch(..., 'AutocorSpectrum') computes the autocorrelation (sig.autocor) of the FFT spectrum (sig.spectrum).
  • aud.pitch(..., 'Cepstrum') computes the cepstrum (sig.cepstrum).

Peak picking is then applied to the autocorrelation function or to the cepstral representation. The parameters of the peak picking can be tuned.

  • aud.pitch(..., 'Total', m) selects only the m best pitches.
  • aud.pitch(..., 'Mono') selects only the best pitch, and is therefore equivalent to aud.pitch(..., 'Total', 1).
  • aud.pitch(..., 'Min', mi) indicates the lowest pitch taken into consideration, in Hz. Default value: 75 Hz, following a convention in the Praat software (Boersma & Weenink, 2005).
  • aud.pitch(..., 'Max', ma) indicates the highest pitch taken into consideration, in Hz. Default value: 2400 Hz, because higher frequencies seem to cause problems, probably due to the absence of pre-whitening in our implementation of the Tolonen and Karjalainen autocorrelation approach (used by default).
  • aud.pitch(..., 'Threshold', t) specifies the threshold factor for the peak picking. Default value: t = 0.4.
  • aud.pitch(..., 'Contrast', c) specifies the contrast factor for the peak picking. Default value: c = 0.1.
  • aud.pitch(..., 'Order', o) specifies the ordering for the peak picking. Default value: o = 'Amplitude'.
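To make the roles of 'Min', 'Max', 'Total' and 'Threshold' more concrete, here is a minimal Python sketch of autocorrelation-based pitch estimation followed by peak picking. It is not the MiningSuite implementation: the function names `autocorrelation` and `pick_pitches` are hypothetical, there is no filterbank, enhancement or compression, 'Contrast' is not modelled, and the threshold is interpreted (as an assumption) as a fraction of the zero-lag autocorrelation value.

```python
import math

def autocorrelation(x, max_lag):
    """Return the unnormalised autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def pick_pitches(x, sr, fmin=75.0, fmax=2400.0, total=1, threshold=0.4):
    """Estimate up to `total` pitches (in Hz) from autocorrelation peaks.

    Only lags corresponding to frequencies between fmin and fmax are scanned.
    Assumption: a peak is kept if r[lag] / r[0] >= threshold.
    """
    min_lag = max(1, int(math.ceil(sr / fmax)))    # shortest period considered
    max_lag = min(len(x) - 2, int(sr // fmin))     # longest period considered
    r = autocorrelation(x, max_lag + 1)
    if r[0] <= 0:                                  # silent input: no pitch
        return []
    peaks = []
    for lag in range(min_lag, max_lag + 1):
        is_local_max = r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]
        if is_local_max and r[lag] / r[0] >= threshold:
            peaks.append((r[lag], lag))
    peaks.sort(reverse=True)                       # strongest peaks first
    return [sr / lag for _, lag in peaks[:total]]

# Usage: a 200 Hz sine sampled at 8000 Hz for 50 ms.
sr = 8000
signal = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]
best = pick_pitches(signal, sr, total=1)
```

With `total=1` the sketch behaves like the 'Mono' option and keeps only the strongest peak. Asking for more peaks on a pure tone also returns sub-harmonics of the true pitch, which is one reason the enhanced autocorrelation is used by default in the real function.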

aud.pitch accepts as input data type either:

  • output of sig.peaks computation,
  • sig.Autocor objects,
  • sig.Cepstrum objects,
  • sig.Spectrum objects,
  • sig.signal objects, where the audio waveform can be:
    • segmented (using sig.segment),
    • when pitch is estimated by autocorrelating the audio waveform ('Autocor' strategy), the audio waveform is by default first decomposed into channels (cf. the filterbank configuration above),
    • decomposed into frames or not (using sig.frame);
  • file name or the 'Folder' keyword: same behavior as for miraudio objects.

aud.pitch can return several outputs:

  • the pitch frequencies themselves, and
  • the sig.Autocor or sig.Cepstrum data, in which the peak (or set of peaks) corresponding to the estimated pitch (or set of pitches) is highlighted.

Frame decomposition

aud.pitch(..., 'Frame', ...) first performs a frame decomposition, by default with a frame length of 46.4 ms and a hop of 10 ms (Tolonen & Karjalainen, 2000). To specify another frame configuration using additional parameters, cf. sig.frame.
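As an illustration of what these default values imply, the following Python sketch lists the sample ranges of successive frames for a given signal length. The helper `frame_bounds` is hypothetical, and the rounding to whole samples and the dropping of a final incomplete frame are assumptions, not documented behaviour of sig.frame.

```python
def frame_bounds(n_samples, sr, frame_length=0.0464, hop=0.010):
    """Return (start, end) sample indices of each full frame.

    frame_length and hop are in seconds; both are rounded to whole samples
    (an assumption), and a trailing incomplete frame is dropped.
    """
    frame_len = int(round(frame_length * sr))
    hop_len = int(round(hop * sr))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame length and hop must be at least one sample")
    return [(start, start + frame_len)
            for start in range(0, n_samples - frame_len + 1, hop_len)]

# Usage: one second of audio at 44100 Hz.
frames = frame_bounds(44100, 44100)
```

At 44100 Hz this gives frames of 2046 samples that start every 441 samples, so consecutive frames overlap heavily and the pitch curve gets one value every 10 ms.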

Post-Processing Options

  • aud.pitch(..., 'Median', l) performs a median filtering of the pitch curve. The length of the median filter is given by l, in seconds. Its default value is .1 s. The median filtering can only be applied to a mono-pitch curve. If several pitches were extracted in each frame, a mono-pitch curve is first computed by selecting the best peak of each successive frame.
  • aud.pitch(..., 'Stable', th, n) removes pitch values when the difference (or more precisely the absolute logarithmic quotient) with the n preceding frames exceeds the threshold th. If th is not specified, the default value .1 is used. If n is not specified, the default value 3 is used.
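The two post-processing steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the MiningSuite code: the function names are hypothetical, the median window is given directly in frames (with a 10 ms hop, l = .1 s corresponds to roughly 10 frames), the logarithm is taken as the natural logarithm, a value is removed as soon as it deviates from any one of the n preceding frames, and frames with fewer than n predecessors are compared against those that exist.

```python
import math
from statistics import median

def median_filter(pitches, width):
    """Sliding median over `width` frames; the window is truncated at the edges."""
    half = width // 2
    return [median(pitches[max(0, i - half):i + half + 1])
            for i in range(len(pitches))]

def stable(pitches, th=0.1, n=3):
    """Replace unstable pitch values with None.

    Assumption: a value is unstable if |log(p / q)| > th for any q among
    the (up to) n preceding frames of the original curve.
    """
    out = []
    for i, p in enumerate(pitches):
        previous = pitches[max(0, i - n):i]
        is_stable = all(abs(math.log(p / q)) <= th for q in previous)
        out.append(p if is_stable else None)
    return out

# Usage: a mono-pitch curve in Hz with one octave jump.
curve = [200, 201, 199, 400, 200, 202, 201, 200]
smoothed = median_filter([200, 200, 400, 200, 200], 3)
kept = stable(curve)
```

Under these assumptions, a single outlier also invalidates the n frames that follow it, because each of them still has the outlier among its predecessors. The median filter, by contrast, replaces the outlier with a neighbouring value and keeps every frame.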

Accessible Output

Accessible using the get method.

  • 'Amplitude': the amplitude associated with each pitch component

Music-theory based model

A music-theory representation of pitch is available in mus.pitch.
