ATRAC: Adaptive Transform Acoustic Coding for MiniDisc

Sony Corporate Research Laboratories
6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo 141 Japan

Reprinted from the 93rd Audio Engineering Society Convention in San Francisco, 1992 October 1-4


ATRAC is an audio coding system based on psychoacoustic principles. The input signal is divided into three subbands which are then transformed into the frequency domain using a variable block length. Transform coefficients are grouped into nonuniform bands to reflect the human auditory system, and then quantized on the basis of dynamic sensitivity and masking characteristics. ATRAC compresses compact disc audio to approximately 1/5 of the original data rate with virtually no loss in sound quality.

1 Introduction

Recently, there has been increasing consumer demand for a portable, recordable, high-quality digital audio medium. The MiniDisc system was developed to meet this demand. The MiniDisc is based on a 64 mm optical or magneto-optical disc which has approximately 1/5 of the data storage capacity of a standard compact disc. Despite the reduced storage capacity, it was necessary that the MiniDisc maintain high sound quality and a playing time of 74 minutes. The ATRAC (Adaptive Transform Acoustic Coding) data compression system was therefore designed to meet the following criteria:

  • Compression of 16-bit 44.1 kHz stereo audio into less than 1/5 of the original data rate with minimal reduction in sound quality.
  • Simple and inexpensive hardware implementation suitable for portable players and recorders.
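
The first criterion can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it takes "1/5" literally as an upper bound on the coded data rate rather than as the exact rate used by the system.

```python
# Data-rate arithmetic for 16-bit, 44.1 kHz stereo audio.
# Function names are illustrative and do not come from the paper.
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
PLAYING_TIME_S = 74 * 60  # 74 minutes, as required for the MiniDisc


def cd_bit_rate():
    """Uncompressed data rate in bits per second."""
    return BITS_PER_SAMPLE * SAMPLE_RATE_HZ * CHANNELS


def target_bit_rate(divisor=5):
    """Upper bound on the coded rate if it must stay below 1/divisor."""
    return cd_bit_rate() / divisor


def storage_bytes(bit_rate, seconds=PLAYING_TIME_S):
    """Bytes needed to hold `seconds` of audio at `bit_rate` bits/s."""
    return bit_rate * seconds / 8


print(cd_bit_rate())      # 1411200
print(target_bit_rate())  # 282240.0
print(storage_bytes(cd_bit_rate()), storage_bytes(target_bit_rate()))
```

The last line compares the audio payload for 74 minutes at the uncompressed rate with the payload at one fifth of that rate, which is the ratio the reduced disc capacity imposes.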

Important conclusions may be drawn from these graphs. First, simultaneous masking is more effective when the frequency of the masked signal is equal to or higher than that of the masker. Second, while forward masking is effective for a considerable time after the masker has stopped, backward masking may be effective only for less than 2 or 3 ms before the onset of the masker.

2.3 Critical Bands

Critical bands [7] arose from the idea that the ear analyzes the audible frequency range using a set of subbands. The frequencies within a critical band are similar in terms of the ear's perception, and are processed separately from other critical bands. Critical bands arose naturally from experiments in human hearing and can also be derived from the distribution of sensory cells in the inner ear. Critical bands can be thought of as the frequency scale used by the ear [8].

The critical band scale is shown in Table 1. It is clear that the critical bands are much narrower at lower frequencies than at high frequencies; in fact, three quarters of the critical bands are located below 5 kHz. This indicates that the ear receives more information from the low frequencies and less from higher frequencies.
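
The "three quarters below 5 kHz" observation can be checked numerically. The sketch below does not use Table 1; as an assumption, it substitutes the well-known analytic approximation of the critical-band rate (Bark scale) by Zwicker and Terhardt, and it assumes an upper audible limit of 20 kHz. The function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (in Bark) for a frequency in Hz.

    Assumption: Zwicker-Terhardt analytic approximation, used here in
    place of the tabulated band edges.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def fraction_of_bands_below(f_hz, f_max_hz=20_000.0):
    """Fraction of the critical-band scale lying below f_hz."""
    return hz_to_bark(f_hz) / hz_to_bark(f_max_hz)


print(round(fraction_of_bands_below(5_000.0), 2))
```

Under these assumptions the fraction comes out close to three quarters, in line with the statement above.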

3 The ATRAC Encoder

A block diagram of the encoder structure is shown in Figure 4. The encoder has three components.

The above equation is not concerned with the overall bit rate, and will in general allocate more bits than are available. To ensure a fixed data rate, an offset boff (the same for all BFUs) is calculated. This value is subtracted from btot(k) for each unit, giving the final bit allocation b(k):

b(k) = integer{btot(k)-boff}

If the subtraction generates a negative wordlength, that BFU is allocated 0 bits. This algorithm is illustrated in Figure 10.
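
The offset step can be sketched as follows. The text does not say how boff is found, so the bisection search here is an assumption, as are the reading of integer{} as rounding down, the per-BFU coefficient counts in `sizes`, and the `budget` expressed in bits; all names are hypothetical.

```python
import math


def allocate(btot, sizes, budget, iters=40):
    """Return (boff, b) with b[k] = max(0, floor(btot[k] - boff)).

    btot   : initial (real-valued) wordlength per BFU
    sizes  : number of coefficients in each BFU (assumed cost model)
    budget : bits available for the quantized coefficients
    """
    def wordlengths(boff):
        # Negative results are clamped to 0 bits, as the text specifies.
        return [max(0, math.floor(x - boff)) for x in btot]

    def cost(boff):
        return sum(w * n for w, n in zip(wordlengths(boff), sizes))

    lo, hi = 0.0, max(btot)  # boff = max(btot) gives every BFU 0 bits
    if cost(lo) <= budget:
        return lo, wordlengths(lo)
    # Invariant: cost(lo) > budget and cost(hi) <= budget.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cost(mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi, wordlengths(hi)


boff, b = allocate([6.3, 4.8, 2.1, 0.4], sizes=[4, 4, 8, 8], budget=40)
print(boff, b)
```

Because the cost is a non-increasing step function of boff, bisection converges on the smallest offset that fits the budget. A real encoder might use a cheaper iterative rule instead.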

4 The ATRAC Decoder

A block diagram of the decoder structure is shown in Figure 5. The decoder first reconstructs the MDCT spectral coefficients from the quantized values, using the wordlength and scale factor parameters. These spectral coefficients are then used to reconstruct the original audio signal (Figure 7). The coefficients are first transformed back into the time domain by the inverse MDCT (IMDCT) using either long mode or short mode as specified in the parameters. Finally, the three time-domain signals are synthesized into the output signal by QMF synthesis filters.


5 Conclusions

Through a combination of techniques including psychoacoustics, subband coding and transform coding, ATRAC succeeds in coding digital audio with virtually no perceptual degradation in sound quality. Listening tests indicate that the difference between ATRAC sound and the original source is neither perceptually annoying nor detrimental to sound quality. Furthermore, the system is sufficiently compact to be installed in portable consumer products. Using ATRAC, the MiniDisc provides a practical solution for portable digital audio.

