MPEG/Audio Compression Tutorial - DCU

MPEG/Audio Compression Tutorial
Mike Blackstock
CPSC 538a
January 11, 2004

Overview
  • Digital Sound
  • Psychoacoustics
  • Time to Frequency Domain Transformation
  • MPEG/Audio Basic Algorithm
  • Related Work
  • Web references

CPSC 538a MPEG Audio Tutorial, January 12, 2004

Digital Sound Basics
  • Sound is a continuous wave through the air.
  • It is made up of pressure differences, detected by measuring pressure levels at a location.
  • A microphone changes analog sound pressure to analog voltage levels.
  • To digitize sound, the signal must be sampled in time and encoded into numbers.
  • Quantization divides signal strength into levels, linearly or logarithmically.
  • 8 bits give 256 levels; 16 bits give 65,536 levels.

Digital Audio Questions
  • How often should sound be sampled?
      • Sample at a rate at least twice as high as the highest frequency; otherwise that frequency is lost (Nyquist theorem).
  • What quality is required?
      • Telephone, radio, and CD have different quality requirements.
      • Signal-to-noise ratio (SNR) is a measure of the quality of a signal.
      • Noise may be introduced during conversion from sound to voltage and by sampling/quantization.
  • Which format to use?
      • .au, .aiff, .wav, and of course .mp3

Psychoacoustics
  • Principles of the human perception of sound.
  • The MPEG compression algorithm uses a model of human hearing to remove data (a perceptual coding algorithm).
  • Frequency range is about 20 Hz to 20 kHz; hearing is most sensitive at 2 to 4 kHz.
  • Dynamic range (quietest to loudest) is about 96 dB.
  • Normal voice range is about 500 Hz to 2 kHz.
      • Low frequencies -> vowels, bass; high frequencies -> consonants

Human Hearing Sensitivity
  • Experiment: Put a person in a quiet room. Raise the level of a 1 kHz tone until it is just barely audible. Vary the frequency and plot the threshold.
  [Figure: threshold of audibility versus frequency]

Human Frequency Masking
  • Experiment: Play a 1 kHz tone (the masking tone) at a fixed level (60 dB). Play a test tone at a different frequency (e.g., 1.1 kHz), and raise its level until it is just distinguishable.
  • Vary the frequency of the test tone and plot the threshold at which it becomes audible.

Frequency Masking
  [Figure: frequency masking plot]

Temporal Masking
  • If we hear a loud sound that then stops, it takes a little while until we can hear a soft tone nearby.
  • Experiment: Play a 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. The test tone can't be heard (it's masked). Stop the masking tone, then stop the test tone after a short delay. Adjust the delay to the shortest time at which the test tone can be heard (e.g., 5 ms). Repeat with different levels of the test tone and plot the results.

Combination
  [Figure: combined effect of frequency and temporal masking]
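As a rough illustration of the curve that the Human Hearing Sensitivity experiment above produces, the sketch below evaluates a widely used closed-form approximation of the threshold in quiet (Terhardt's formula). The formula and the function name `threshold_in_quiet_db` are not from the slides; they are assumptions brought in only to show the shape of the curve, which dips in the region the slides call most sensitive.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate threshold in quiet, in dB SPL, for a pure tone.

    Assumption: uses Terhardt's approximation, which is not part of the
    original slides. It is a smooth fit to measured data, not a measurement.
    """
    khz = freq_hz / 1000.0
    return (
        3.64 * khz ** -0.8                          # rise toward low frequencies
        - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)   # dip of best sensitivity
        + 1e-3 * khz ** 4                           # steep rise toward high frequencies
    )


if __name__ == "__main__":
    # Tabulate the curve at a few frequencies across the audible range.
    for freq in (50, 100, 500, 1000, 2000, 3300, 4000, 10000, 16000):
        print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

Under this approximation the lowest threshold falls between 2 and 4 kHz, and the threshold rises sharply toward both ends of the 20 Hz to 20 kHz range.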

Time to Frequency Transform
  • Transforms time/level input signals to frequency/power.
  • FFT: the most popular choice; fast, easy, and covered in most numerical methods texts. Used by the psychoacoustic model.
  • DCT: often used for spatial frequency because it represents linear signals better. Something similar is used by the filter bank.
  • Wavelets: use non-sine/cosine functions for better performance on data with sharp discontinuities.
  • Demo: http://www.jhu.edu/~signals/fourier2/index.html

MPEG Basics
  [Figure: MPEG/Audio encoder block diagram. Blocks: PCM audio input; time to frequency mapping filter bank; psychoacoustic model; bit/noise allocation, quantizer, and coding; bitstream formatting; ancillary data (optional); encoded bitstream]

Algorithm overview
  1. Use convolution filters to divide the audio signal (e.g., 48 kHz sound) into 32 frequency subbands (subband filtering). A 512-sample FIFO buffer is used.
  2. Determine the amount of masking for each band caused by nearby bands, using the psychoacoustic model shown above.
  3. If the power in a band is below the masking threshold, don't encode it.
  4. Otherwise, determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect. (Recall that one fewer bit of quantization introduces about 6 dB of noise.)
  5. Format the bitstream.
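The "about 6 dB per bit" rule in step 4 can be checked numerically. The sketch below quantizes a full-scale sine wave with a uniform quantizer and measures the resulting signal-to-noise ratio. The quantizer design (mid-rise, range -1.0 to 1.0), the 997 Hz test tone, and the function names are illustrative assumptions, not part of the MPEG algorithm or the slides.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Uniform mid-rise quantizer for samples in [-1.0, 1.0] (illustrative)."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor(sample / step)
    # Clamp so that a sample of exactly 1.0 maps to the top level.
    index = max(-(levels // 2), min(levels // 2 - 1, index))
    return (index + 0.5) * step


def quantization_snr_db(bits: int, num_samples: int = 48000) -> float:
    """Measure the SNR in dB of a full-scale 997 Hz sine sampled at 48 kHz."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = math.sin(2.0 * math.pi * 997.0 * n / 48000.0)
        error = x - quantize(x, bits)
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for bits in (8, 9, 12, 16):
        print(f"{bits:2d} bits: {quantization_snr_db(bits):6.2f} dB")
```

For a full-scale sine the measured values should land close to the textbook estimate of 6.02 × bits + 1.76 dB, so removing one bit costs roughly 6 dB of SNR.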

Example
After analysis, the first levels of 16 of the 32 bands are:

  Band:        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
  Level (dB):  0   8  12  10   6   2  10  60  35  20  15   2   3   5   3   1

  • If the level of the 8th band is 60 dB, it gives a masking of 12 dB in the 7th band and 15 dB in the 9th.
  • Level in the 7th band is 10 dB (< 12 dB), so ignore it.
  • Level in the 9th band is 35 dB (> 15 dB), so send it.
  • Only the amount above the masking level needs to be sent, so instead of using 6 bits to encode it, we can use 4 bits, saving 2 bits (= 12 dB).

MPEG layers
  • Layer 1
      • DCT-type filter with one frame and equal frequency spread per band.
      • Psychoacoustic model only uses frequency masking.
  • Layer 2
      • Uses three frames in the filter (before, current, next), a total of 1152 samples.
      • Models a bit of temporal masking.
  • Layer 3 (mp3)
      • A better critical band filter is used (non-equal frequencies).
      • Psychoacoustic model includes temporal masking effects.
      • Takes into account stereo redundancy.
      • Uses a Huffman coder.
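The arithmetic of the Example slide above can be sketched in a few lines. The band levels and the two masking thresholds come from the slide; the rounding rule (round the level above the mask up to a whole number of bits at 6 dB per bit) and the function name `bits_to_send` are assumptions chosen because they reproduce the slide's 6-bit and 4-bit figures. A real encoder's bit allocation is more involved.

```python
import math

DB_PER_BIT = 6.0  # the slides' "about 6 dB" of noise per bit of quantization

# Levels in dB for bands 1..16, as listed on the Example slide.
LEVELS_DB = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]

# Masking caused by the 60 dB signal in band 8 (only these two are given).
MASKING_DB = {7: 12, 9: 15}


def bits_to_send(level_db: float, mask_db: float) -> int:
    """Bits needed to code a band whose masking threshold is mask_db.

    Assumption: a band at or below its threshold is dropped; otherwise the
    part of the level above the threshold is rounded up to whole bits.
    """
    if level_db <= mask_db:
        return 0
    return math.ceil((level_db - mask_db) / DB_PER_BIT)


if __name__ == "__main__":
    for band, mask_db in sorted(MASKING_DB.items()):
        level_db = LEVELS_DB[band - 1]
        unmasked = bits_to_send(level_db, 0.0)
        masked = bits_to_send(level_db, mask_db)
        print(f"band {band}: level {level_db} dB, mask {mask_db} dB, "
              f"{unmasked} bits without masking, {masked} bits with masking")
```

With these inputs band 7 needs no bits at all, while band 9 drops from 6 bits to 4, matching the 2-bit (12 dB) saving stated on the slide.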

Related Work
  • MPEG phase 2
      • Multichannel (5.1) audio support
      • Significant in driving DVD sales
  • MPEG-4 Structured Audio
      • Efficient, flexible description of synthetic music
  • Copy protection and copyright
  • Speech processing
      • Uses many similar techniques

References
  • SFU CMPT 365 Course Contents, Spring 2003
      • Basics of Digital Audio, http://www.cs.sfu.ca/CC/365/mark/material/notes/Chap3/Chap3.1/Chap3.1.html, retrieved January 7, 2004
      • Audio Compression, http://www.cs.sfu.ca/CC/365/mark/material/notes/Chap4/Chap4.4/Chap4.4.html, retrieved January 7, 2004
  • Audio and Multimedia Layer 3, http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html, January 8, 2004
  • MP3 Backgrounder, http://www.audioactive.com/intro/papers/backbone.html, January 8, 2004
  • Scheirer, E. D., The MPEG-4 Structured Audio, Proceedings of ICASSP98
  • Scheirer, E. D., SAOL / MPEG-4 Structured Audio homepage, http://web.media.mit.edu/~eds/mpeg4-old/
  • Perkowski, M. A., Speech Signals in Time and Frequency Domain, http://www.ee.pdx.edu/~mperkows/CLASS_480/transmit1/A003.time-andfrequency-domain.pdf, January 11, 2004
  • Graps, A., "An Introduction to Wavelets", IEEE Computational Science and Engineering, Volume 2, Number 2, Summer 1995, pp. 50-61. Also available at http://www.amara.com/IEEEwave/IEEEwavelet.html
  • Signals Demonstrations, http://www.jhu.edu/~signals/index.html
