On the perception of sound and music: sound perception and compression

Simple compression methods

Traditional lossless compression methods (Huffman, LZW, etc.) usually perform poorly on audio data, for the same reasons that they perform poorly on images.

Some lossy compression methods are listed below:
Compression of silence (pauses): detects periods of "silence" and encodes them compactly, much like run-length coding.
ADPCM: Adaptive Differential Pulse Code Modulation (also called adaptive delta pulse-code modulation). An example is the CCITT G.721 standard (32 kbit/s; the related G.726 covers 16 to 40 kbit/s).
ADPCM encodes the difference between two or more consecutive samples. The difference is then quantized, so part of the information is lost. The quantization is adaptive (its parameters change with the signal), so fewer bits are needed for a given SNR. The coder has to predict how the sound will change, which is difficult.

ACE / MACE: a proprietary system developed by Apple. It is a lossy scheme that tries to predict the value of the next sample. Compression is of the order of 2:1.
Linear Predictive Coding (LPC): fits the signal to a "speech model" and transmits the parameters of the model. The result sounds like computer-synthesized speech, at 2.4 kbit/s.
Code Excited Linear Predictor (CELP): works like LPC but additionally transmits the error term, encoded with a predefined set of "code words". It gives telephone quality at 4.8 kbit/s.
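
The differential idea behind the ADPCM item above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not G.721 or any other standard: the prediction is simply the previous reconstructed sample, and the 4-bit code width, the initial step of 16, the adaptation factors (1.5 and 0.9) and the function names are all hypothetical choices.

```python
def _adapt(step, code, max_code):
    """Hypothetical adaptation rule: grow the step after large codes, shrink otherwise."""
    if abs(code) >= max_code // 2 + 1:
        return min(step * 1.5, 4096.0)
    return max(step * 0.9, 1.0)


def adpcm_encode(samples, bits=4, step=16.0):
    """Toy adaptive delta coder: one small signed integer code per sample."""
    max_code = 2 ** (bits - 1) - 1          # 7 for 4-bit codes
    predicted = 0.0                         # prediction = last reconstructed sample
    codes = []
    for s in samples:
        diff = s - predicted
        # Quantize the difference; this is where information is lost.
        code = max(-max_code, min(max_code, round(diff / step)))
        codes.append(code)
        predicted += code * step            # mirror what the decoder will compute
        step = _adapt(step, code, max_code)
    return codes


def adpcm_decode(codes, bits=4, step=16.0):
    """Rebuild the signal from the codes, adapting the step exactly as the encoder did."""
    max_code = 2 ** (bits - 1) - 1
    predicted = 0.0
    out = []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code, max_code)
    return out
```

Because the step size is derived only from the codes already sent, the decoder can follow the encoder without any side information. Real ADPCM coders use a more elaborate adaptive predictor and quantizer.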
Psychoacoustic Based Compression Techniques

Representatives: MPEG Layer 2, MPEG Layer 3 (MP3), AAC (Advanced Audio Coding), TwinVQ, Ogg Vorbis, etc.

A codec algorithm using psychoacoustics usually consists of the following steps:
Calculation of the psychoacoustic model (masking).
Division of the signal into frequency subbands (FFT, DCT / MDCT, filter banks, etc.).
Quantization of the signal in the subbands according to the results of the psychoacoustic model. One quantization level can be used for several input values at once (vector quantization), as in TwinVQ.
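
The three steps above can be illustrated with a deliberately crude Python sketch. Everything specific in it is an assumption made for illustration: a naive DFT stands in for a real filter bank or MDCT, the "psychoacoustic model" is just a rule that anything about 30 dB below the strongest component in the same band is treated as masked, and the band size, the 60 dB floor and the function names are hypothetical. Here the model is computed from the spectrum itself, so the subband split happens first.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..N/2."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]


def masking_thresholds(spectrum, band_size, mask_db=-30.0, floor_db=-60.0):
    """Toy masking model: one threshold per band, set relative to the band's peak.

    The floor (relative to the loudest bin of the frame) is a crude stand-in
    for the absolute threshold of hearing.
    """
    global_peak = max(abs(c) for c in spectrum)
    floor = global_peak * 10 ** (floor_db / 20)
    thresholds = []
    for start in range(0, len(spectrum), band_size):
        band_peak = max(abs(c) for c in spectrum[start:start + band_size])
        thresholds.append(max(band_peak, floor) * 10 ** (mask_db / 20))
    return thresholds


def encode_frame(frame, band_size=8):
    """Quantize each bin with a step equal to its band's masking threshold."""
    spectrum = dft(frame)
    thresholds = masking_thresholds(spectrum, band_size)
    quantized = []
    for i, c in enumerate(spectrum):
        step = thresholds[i // band_size]
        if step == 0:                       # silent frame: nothing to encode
            quantized.append((0, 0))
        else:
            quantized.append((round(c.real / step), round(c.imag / step)))
    return quantized
```

With this rule, a weak tone that shares a band with a strong one is quantized to zero, while the same weak tone in an otherwise quiet band survives. That is the essence of using masking to decide where precision can be thrown away.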
Some facts about sound perception
The frequency range perceived by a person is approximately 20 Hz to 20 kHz, with the highest sensitivity between 2 and 4 kHz.
The dynamic range (from the quietest perceived sounds to the loudest) is about 96 dB (an amplitude ratio of roughly 1:63,000 on a linear scale).
A person can distinguish a frequency change of about 0.3% at frequencies of the order of 1 kHz.
If two signals differ by less than 1 dB in amplitude, they are difficult to distinguish. The amplitude resolution depends on frequency, and the highest sensitivity is observed between 2 and 4 kHz.
Spatial resolution (ability to localize the sound source) – up to 1 degree.
Sounds of different frequencies travel through the air at different speeds. As a result, the high-frequency part of the spectrum from the source located at a distance from the listener is somewhat delayed.
A person does not notice a sudden dropout of high frequencies if it lasts no longer than about 2 ms.
Some studies show that a person is able to sense frequencies above 20 kHz. With age, the frequency range narrows.
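
The decibel figures in the list above are easier to interpret when converted to linear ratios. The short Python sketch below does the arithmetic; the function names are hypothetical, and the comparison with 16-bit audio is added only as a familiar reference point.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to a level difference in dB."""
    return 20 * math.log10(ratio)


# 96 dB of dynamic range as an amplitude ratio (about 63,000:1).
print(round(db_to_amplitude_ratio(96)))

# 16-bit samples have 65,536 levels, which is also about 96 dB.
print(round(amplitude_ratio_to_db(2 ** 16), 2))

# A 1 dB difference is an amplitude change of roughly 12%.
print(round(db_to_amplitude_ratio(1), 3))

# A 0.3% change at 1 kHz is a shift of about 3 Hz.
print(1000 * 0.003)
```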

Speech
The frequency range that carries information in human speech is from 500 Hz to 2 kHz.
Low frequencies: bass and vowels.
High frequencies: consonants.
The best compression of speech is achieved with parametric encoders (LPC, CELP, etc.), which try to represent speech as a set of parameters of a speech model. General-purpose codecs (MPEG, etc.) tend to compress speech less efficiently.

How the ear works


In general, the ear is a non-linear system and cannot be accurately described using only linear elements (such as filters and delay lines). One by-product of this non-linearity is the following effect: when two tones with frequencies of 1000 and 1200 Hz are presented together, a third tone at 800 Hz can also be heard. However, in the range of amplitudes of interest here, the non-linearity is rather weak and is usually neglected.
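
The 800 Hz tone is the combination 2 x 1000 - 1200 Hz, which appears as soon as a cubic term is added to an otherwise linear system. The Python sketch below demonstrates this numerically. It is not a model of the ear: the cubic form, the coefficient 0.1, the 8 kHz sample rate and the function name are assumptions chosen only to make the effect visible.

```python
import math

SAMPLE_RATE = 8000  # Hz; one second of signal gives whole cycles of every tone


def tone_amplitude(signal, freq, sample_rate=SAMPLE_RATE):
    """Amplitude of the component at `freq`, found by correlating with a sinusoid."""
    re = sum(x * math.cos(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# Two pure tones of unit amplitude at 1000 Hz and 1200 Hz.
two_tones = [math.cos(2 * math.pi * 1000 * n / SAMPLE_RATE)
             + math.cos(2 * math.pi * 1200 * n / SAMPLE_RATE)
             for n in range(SAMPLE_RATE)]

# The same signal passed through a weakly non-linear system.
distorted = [x + 0.1 * x ** 3 for x in two_tones]

print(tone_amplitude(two_tones, 800))   # essentially zero: no 800 Hz component
print(tone_amplitude(distorted, 800))   # about 0.075: a new tone at 2*1000 - 1200 Hz
```

For unit-amplitude tones and a cubic coefficient a, the new component has amplitude 3a/4, so it fades quickly as the non-linearity becomes weaker.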

Structure

The ear consists of three parts: the auricle (also called the outer ear), the middle ear, and the inner ear (the cochlea). Sound is modified as it passes through each part.
One function of the outer ear (auricle) is to improve localization of the sound source in space. Because of its asymmetric shape, it alters the frequency response differently for signals arriving from different points in space. The auricle can only affect signals whose wavelength is comparable to the size of the ear or shorter (frequencies above about 3 kHz). The external ear canal resonates at about 2 kHz, which increases sensitivity in this range.
The middle ear acts as a hydraulic booster. Since the cochlea is filled with liquid and there is air outside, the acoustic impedances of the two media have to be matched. The middle ear also protects against low-frequency sounds of excessive amplitude.
The inner ear is the cochlea. Unrolled, it would be a tube whose diameter gradually decreases toward one end.
