Voice chip definition: a voice chip converts voice signals into digital data by sampling, stores the data in the IC's ROM, and then restores the stored data to a voice signal through its playback circuitry.
Voice chips are divided into two categories by output mode: PWM output and DAC output. With PWM output, the volume is not continuously adjustable and the chip cannot be connected to an ordinary power amplifier; most voice chips currently on the market use PWM output. With DAC output, amplified through an internal EQ stage, the volume is continuously adjustable, can be controlled and adjusted digitally, and the chip can be connected to an external power amplifier.
The playback function of an ordinary voice chip is essentially a DAC process; the ADC side is completed on a computer, including sampling, compression, EQ, and other processing of the voice signal.
A recording chip performs both processes, ADC and DAC, itself, covering voice data collection, analysis, compression, storage, and playback.
Sound quality depends on the number of ADC and DAC bits (available playback lengths typically range from 10 or 20 seconds up to about 340 seconds). Intuitively, from the name, a voice chip is a chip related to voice: voice here means stored electronic sound, and any chip that can emit sound is a voice chip, commonly called a sound chip; more accurately, in English it is a Voice IC. Within the large family of voice chips, chips are divided by the type of sound produced into speech ICs (Speech IC) and music ICs (Music IC); this is the professional way of distinguishing voice chips.
Mask production: in layman's terms, the sound is burned into the chip first and the chip is packaged afterward. There is generally a minimum-quantity requirement.
OTP production: OTP means one-time programmable. The chip is packaged first, and the sound is then burned in with software.
Based on the physical structure of the IC itself, voice chips can have multiple channels (outputting several channels of sound at the same time) and can be divided into various types:
The often-mentioned monophonic chip is the most basic music IC. Its effect is determined by the number of notes it can output within a certain period: 64 notes, 128 notes, and so on. Monophonic chips are widely used and extremely cheap; the most common examples are Happy Birthday greeting-card chips. A typical part is the GY20S. Strictly speaking, a single-channel music IC and a monophonic chip have different structures.
This structure is determined by the actual application field and price point of the product or solution. Voice chips generally output single-channel (mono) sound; very few products support stereo. For high-end stereo products, an MP3 main-control chip solution must be chosen instead.
A few words should be added here: there is also a music chip on the market called a melody chip. How is it defined? Simply put, it is a music chip better than a monophonic chip but not as good as a polyphonic chip, so dual-tone chips are also called melody music chips. Structurally, a melody chip is a more advanced monophonic chip, or a monophonic chip with twice the effect.
Sound with three or more channels is called polyphonic music. The often-mentioned 4-chord music IC is a 4-channel music IC, such as the GY040.
Generally, multi-channel voice chips support both music IC (Music IC) and speech IC (Speech IC) functions.
(1) Quantization of the speech signal
Sampling rate (f), number of bits (n), baud rate (T)
Sampling: Convert voice analog signals into digital signals.
Sampling rate: the number of samples taken per second (in Hz).
Baud rate: the number of bits sampled per second (bps, bits per second). The baud rate directly determines the sound quality.
The number of sampling bits refers to the number of bits in binary. Unless otherwise specified, the sound sampling depth is 8 bits, ranging from 00H to FFH, with mute (silence) set to 80H.
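The 8-bit convention above can be sketched in a few lines of Python (the function name and normalized input range are illustrative assumptions, not part of any chip's API):

```python
def quantize_8bit(x: float) -> int:
    """Map a normalized analog sample in [-1.0, 1.0] to unsigned 8-bit
    PCM: 00H for full negative, 80H for silence, FFH for full positive,
    matching the 00H..FFH range described above."""
    x = max(-1.0, min(1.0, x))       # clamp to the valid analog range
    return min(255, round((x + 1.0) * 127.5))
```

A silent input (0.0) lands exactly on 80H, which is why 80H is the mute value for 8-bit unsigned PCM.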
(2) Sampling rate
Nyquist sampling theorem (Nyquist law): to restore the original signal from its samples without distortion, the sampling frequency must be greater than twice the highest frequency in the signal. When the sampling frequency is less than twice the highest frequency, the signal spectrum aliases; when it is greater than twice the highest frequency, there is no aliasing.
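A quick numerical illustration of aliasing (the helper name and the chosen frequencies are arbitrary examples): sampling a 7 kHz tone at 8 kHz, where the Nyquist limit is only 4 kHz, produces samples identical to a phase-inverted 1 kHz tone.

```python
import math

def sample_tone(freq_hz: float, fs_hz: float, n: int) -> list:
    """Sample a unit-amplitude sine of freq_hz at rate fs_hz, n samples."""
    return [math.sin(2 * math.pi * freq_hz * k / fs_hz) for k in range(n)]

fs = 8000                             # sampling rate; Nyquist limit is 4 kHz
tone_7k = sample_tone(7000, fs, 16)   # 7 kHz: above the Nyquist limit
tone_1k = sample_tone(1000, fs, 16)   # 1 kHz: its alias (8 kHz - 7 kHz)
# Each 7 kHz sample equals the negative of the 1 kHz sample: once sampled,
# the two tones are indistinguishable and the aliasing cannot be undone.
```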
The bandwidth of audible sound is roughly 20 Hz to 20 kHz, and ordinary speech sits below about 3 kHz. CD sound quality is therefore generally 44.1 kHz at 16 bits; for some special sounds, such as musical instruments, 48 kHz at 24 bits is also used, but it is not mainstream.
Generally, ordinary voice ICs sample at up to 16 kHz. Speech is usually sampled at about 8 kHz (telephone sound quality) or 6 kHz; below 6 kHz the result is poor. GY-series voice chips can sample at up to 22 kHz.
When a microcontroller handles playback, a higher sampling rate means faster timer interrupts, which can interfere with monitoring and detecting other signals, so the rate must be chosen with the whole system in mind.
(3)Voice compression technology
Because voice data is voluminous, it must be compressed effectively so that more voice content fits in the limited ROM space. There are several approaches:
Voice segmentation: cut out the repeated parts of the speech and reconstruct the full content at playback time by arranging and combining the segments.
Voice resampling: the frequency response of the small speakers typically used is concentrated in the mid band, and high frequencies are rarely reproduced. As long as the speaker's sound quality remains acceptable, the sampling frequency can be reduced to achieve compression. This process is irreversible: the original cannot be restored, which is called lossy compression.
Mathematical compression: mainly reduces the number of bits per sample; this is also lossy. For example, the commonly used ADPCM format compresses voice data from 16 bits to 4 bits, a 4:1 compression ratio. MP3 compresses the data stream and involves data prediction; its bit-rate compression ratio is roughly 10:1.
Usually, the above compression methods are used in combination.
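The resampling method above can be sketched as simple decimation (a deliberately naive illustration; the function name and the 16 kHz → 8 kHz figures are assumptions):

```python
def decimate(samples: list, factor: int) -> list:
    """Naive downsampling: keep every `factor`-th sample. Halving the
    sampling rate halves the stored data, but the discarded samples
    (and any content above the new Nyquist limit) are gone for good,
    i.e. lossy compression. A real design would low-pass filter first
    to prevent aliasing."""
    return samples[::factor]

pcm_16k = list(range(100))      # stand-in for 100 samples captured at 16 kHz
pcm_8k = decimate(pcm_16k, 2)   # now effectively 8 kHz: half the ROM usage
```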
(4)Commonly used voice formats
PCM format: Pulse Code Modulation samples the analog sound signal to obtain quantized voice data; it is the most basic, raw voice format. The RAW and SND formats are very similar; all of them store only the sample data.
WAV format: Wave Audio File is a sound file format developed by Microsoft, also called a waveform sound file, and is widely supported by Windows and its applications. WAV supports many compression algorithms and a variety of bit depths, sampling frequencies, and channel counts. However, it requires a great deal of storage space, which makes it inconvenient to transfer and distribute. Each piece of data stored in a WAV file carries its own identifier, which tells the reader what the data is: sampling frequency, bit depth, mono or stereo, and so on.
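Python's standard-library `wave` module exposes exactly the parameters described above (the filename and the one-second-of-silence content are arbitrary examples):

```python
import wave

# Write one second of 8-bit mono silence at 8 kHz. 8-bit WAV data is
# unsigned, so silence is the byte 0x80, matching the PCM convention.
fs = 8000
with wave.open("silence.wav", "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(1)      # 1 byte per sample = 8-bit
    w.setframerate(fs)     # sampling rate in Hz
    w.writeframes(bytes([0x80] * fs))

# Reading the file back recovers the parameters from the header,
# illustrating the self-describing identifiers mentioned above.
with wave.open("silence.wav", "rb") as r:
    params = r.getparams()
```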
ADPCM format: it predicts the current input sample from several previous sample values, compares the prediction with the actual value, and adaptively quantizes the difference at each step, so that quantization always tracks the signal. It suits situations where the rate of change of the voice is moderate and playback is brief. Its strength is that human voices are rendered quite realistically, generally above 90% fidelity, and it has been widely used in telephone communications.
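The prediction-and-difference idea can be shown with a toy differential codec (a simplified sketch, not the standard IMA ADPCM algorithm: real ADPCM also adapts the step size per sample via a step table, whereas here it is fixed for clarity, and all names are illustrative):

```python
def dpcm_encode(samples: list, step: int = 2048) -> list:
    """Quantize the difference between each 16-bit sample and a running
    prediction into a signed 4-bit code (-8..7): 16-bit -> 4-bit,
    i.e. the 4x compression mentioned above."""
    pred, codes = 0, []
    for s in samples:
        code = max(-8, min(7, round((s - pred) / step)))
        # Track the decoder's reconstruction so errors do not accumulate.
        pred = max(-32768, min(32767, pred + code * step))
        codes.append(code)
    return codes

def dpcm_decode(codes: list, step: int = 2048) -> list:
    """Rebuild samples by accumulating the quantized differences."""
    pred, out = 0, []
    for code in codes:
        pred = max(-32768, min(32767, pred + code * step))
        out.append(pred)
    return out

original = [0, 2000, 4000, 2000, 0, -3000]
decoded = dpcm_decode(dpcm_encode(original))
```

Each reconstructed sample lands within one step size of the original: the loss is bounded, which is why ADPCM speech still sounds realistic.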
MP3 format: Moving Picture Experts Group Audio Layer III, MP3 for short. It uses MPEG Audio Layer 3 technology and an encoding algorithm known as "perceptual coding": during encoding, the audio is first analyzed in the frequency domain, a filter model removes components below the audibility (noise) threshold, the remaining audio is quantized, and the bits are packed to form an MP3 file with a high compression ratio that still sounds close to the original source during playback. Its essence is VBR (variable bit rate): the encoder dynamically selects an appropriate bit rate according to the content being encoded, so the result preserves sound quality while keeping the file size down. MP3's compression ratio is 10:1 or even 12:1; it was the first high-compression-ratio audio format.
Linear Scale format: the sound is divided into segments according to its rate of change, and each segment is compressed with a linear ratio, though the ratio can vary from segment to segment.
LogPCM format: applies a simple compression across the whole sound, dropping the last few bits of each sample. This method is easy to implement in hardware, but the sound quality is worse than Linear Scale, especially for quiet, delicate sounds; it is mainly used for pure speech.
MID format: MID-format music occupies very little space; sometimes more than ten MID-format pieces can be loaded into a chip with only about 20 seconds of capacity.
(1) Music channels and timbre:
Envelope square-wave (patch) channel (channel 1)
(2)Compression of music:
Because music data is voluminous, it must be compressed effectively so that more music content fits in the limited ROM space. There are several approaches:
A voice chip's capacity is expressed intuitively as playback length, in voice seconds:
That is, the length the chip can play at a 6 kHz sampling rate.
For chips of the same family, cost is directly proportional to die size.
Reducing sound quality (lowering the sampling rate or bit depth) lengthens the voice seconds.
Calculation of voice seconds: M / (n × f)
where M is the ROM size in bits and n × f is the baud rate (bits per sample × sampling rate in Hz).
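The M / (n × f) formula is a one-liner (the 1 Mbit / 4-bit / 6 kHz figures are example assumptions, not a specific part's data sheet):

```python
def voice_seconds(rom_bits: int, bits_per_sample: int, sample_rate_hz: int) -> float:
    """Playback length = M / (n * f): ROM size in bits divided by the
    baud rate (bits per sample times samples per second)."""
    return rom_bits / (bits_per_sample * sample_rate_hz)

# Example: a 1 Mbit ROM holding 4-bit ADPCM data sampled at 6 kHz
# yields roughly 43.7 seconds of playback.
secs = voice_seconds(1024 * 1024, 4, 6000)
```

Note how the formula captures the trade-off stated above: halving the sampling rate or the bit depth doubles the voice seconds.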
Introduction to sound processing software:
Classified by integrated-circuit type, all sound-related integrated circuits are collectively called voice chips (voice ICs). Within this broad category, however, there are two types: speech ICs (Speech IC) and music ICs (Music IC).
Monophonic chip: the most basic music IC, with a single music channel. The number of notes it can output at once determines its effect: 70-odd notes, 100-odd notes, and so on.
Music channels: 2 channels, 3 channels, 4 channels, 8 channels, 12 channels and more.
Control modes: button control, one-wire serial control, two-wire serial control, three-wire serial control, parallel-port control, microcontroller control, etc.
Mainstream voice chips are roughly divided into 20-second, 40-second, 80-second, 170-second, and similar capacities. Compared with traditional chips, most of these ICs use an 8-pin package, making them easier to work with.
©2023. Geyuan Electronics Co., Ltd., All Rights Reserved.