ATRAC: Adaptive Transform Acoustic Coding for MiniDisc

Sony Corporate Research Laboratories
6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo 141 Japan
 

Reprinted from the 93rd Audio Engineering Society Convention in San Francisco, 1992 October 1-4

Abstract

ATRAC is an audio coding system based on psychoacoustic principles. The input signal is divided into three subbands which are then transformed into the frequency domain using a variable block length. Transform coefficients are grouped into nonuniform bands to reflect the human auditory system, and then quantized on the basis of dynamic sensitivity and masking characteristics. ATRAC compresses compact disc audio to approximately 1/5 of the original data rate with virtually no loss in sound quality.

1 Introduction

Recently, there has been an increasing consumer demand for a portable, recordable, high-quality digital audio medium. The MiniDisc system was developed to meet this demand. The MiniDisc is based on a 64 mm optical or magneto-optical disc which has approximately 1/5 of the data storage capacity of a standard compact disc. Despite the reduced storage capacity, it was necessary that the MiniDisc maintain high sound quality and a playing time of 74 minutes. The ATRAC (Adaptive Transform Acoustic Coding) data compression system was therefore designed to meet the following criteria:

  • Compression of 16-bit 44.1 kHz stereo audio into less than 1/5 of the original data rate with minimal reduction in sound quality.
  • Simple and inexpensive hardware implementation suitable for portable players and recorders.
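
As a rough check on the first criterion, the data rates implied by the figures above can be worked out directly. This is only an arithmetic sketch: the function and variable names are illustrative and do not come from the paper, and the result is the budget implied by the stated 1/5 ratio, not a statement of the bit rate ATRAC actually uses.

```python
def pcm_bit_rate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


# 16-bit, 44.1 kHz, stereo: the compact disc format named in the criterion.
cd_rate = pcm_bit_rate(16, 44_100, 2)

# Budget implied by the 1/5 ratio (an upper bound derived from the text).
target_rate = cd_rate / 5

print(f"CD audio:   {cd_rate / 1000:.1f} kbit/s")
print(f"1/5 budget: {target_rate / 1000:.2f} kbit/s")
```

The uncompressed stream runs at about 1.41 Mbit/s, so the coder has roughly 282 kbit/s for both channels together.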

When digital audio data is compressed, there is normally a certain amount of quantization noise introduced into the signal. The goal of many audio coding systems [1-6] is to control the time-frequency distribution of this noise in such a way as to render it inaudible to the human ear. If this is completely successful, the reconstructed signal will be indistinguishable from the original.

In general, audio coders operate by decomposing the signal into a set of units, each corresponding to a certain range in time and frequency. Using this time-frequency distribution, the signal is analyzed according to psychoacoustic principles. This analysis indicates which units are critical and must be coded with high precision, and which units are less sensitive and can tolerate some quantization noise without degrading the perceived sound quality. Based on this information, the available bits are allocated to the time-frequency units. The spectral coefficients in each unit are then quantized using the allocated bits. In the decoder, the quantized spectra are reconstructed according to the bit allocation and then synthesized into an audio signal.
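
The link between allocated bits and quantization noise can be illustrated with a toy example. The sketch below is not the ATRAC quantizer: it assumes a simple symmetric uniform quantizer with one scale value per unit, and the helper names (`quantize_unit`, `dequantize_unit`, `max_error`) and the sample coefficients are hypothetical. It only shows that a unit given more bits is reconstructed with a smaller error.

```python
def max_level_for(bits):
    """Largest integer code magnitude for a symmetric quantizer (assumption)."""
    return (1 << (bits - 1)) - 1 if bits >= 2 else 0


def quantize_unit(coeffs, bits):
    """Quantize one unit's coefficients; returns (scale, integer codes)."""
    max_level = max_level_for(bits)
    scale = max((abs(c) for c in coeffs), default=0.0)
    if max_level == 0 or scale == 0.0:
        # No usable bits (or a silent unit): every coefficient codes as zero.
        return scale, [0] * len(coeffs)
    return scale, [round(c / scale * max_level) for c in coeffs]


def dequantize_unit(scale, codes, bits):
    """Decoder side: rebuild coefficients from the scale and the codes."""
    max_level = max_level_for(bits)
    if max_level == 0:
        return [0.0] * len(codes)
    return [q / max_level * scale for q in codes]


def max_error(coeffs, bits):
    """Largest absolute reconstruction error for a given bit allocation."""
    scale, codes = quantize_unit(coeffs, bits)
    decoded = dequantize_unit(scale, codes, bits)
    return max(abs(c - d) for c, d in zip(coeffs, decoded))


unit = [0.9, -0.35, 0.12, -0.6]   # made-up spectral coefficients
coarse = max_error(unit, 3)       # a unit judged able to tolerate noise
fine = max_error(unit, 8)         # a unit judged critical
print(coarse, fine)
```

In this sketch the error is bounded by half a quantizer step, so each additional bit roughly halves the worst-case noise in that unit. Deciding which units receive the extra bits is the job of the psychoacoustic analysis, which is not modelled here.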

The ATRAC system operates as above, with several enhancements. ATRAC uses psychoacoustics not only in the bit allocation algorithm, but also in the time-frequency splitting. Using a combination of subband coding and transform coding techniques, the input signal is analyzed in nonuniform frequency divisions which emphasize the important low-frequency regions. In addition, ATRAC uses a transform block length which adapts to the input signal. This ensures efficient coding of stationary passages without sacrificing time resolution during transient passages.

This paper begins with a review of the relevant psychoacoustic principles. The ATRAC encoder is then described in terms of time-frequency splitting, quantization of spectral coefficients, and bit allocation. Finally, the ATRAC decoder is described.

2 Psychoacoustics

2.1 Equi-loudness Curves

The sensitivity of the ear varies with frequency. The ear is most sensitive to frequencies in the neighbourhood of 4 kHz; sound pressure levels which are just detectable at 4 kHz are not detectable at other frequencies. In general, two tones of equal power but different frequency will not sound equally loud. The perceived loudness of a sound may ...
