Basic Knowledge of Voice Chips

Voice basics

1. What is a voice chip?

Voice chip definition: A voice chip holds voice signals that have been converted into digital data through sampling and stored in the ROM of the IC, and it restores the data in the ROM to voice signals through its circuitry.
Voice chips fall into two categories according to their output mode: PWM output and DAC output. With PWM output, the volume is not continuously adjustable and the output cannot be connected to an ordinary power amplifier; most voice chips currently on the market use PWM output. With DAC output, the signal is amplified by the internal EQ, the volume is continuously adjustable under digital control, and an external power amplifier can be connected.
Playback on an ordinary voice chip is essentially a DAC process. The ADC side, including sampling, compression, EQ, and other processing of the voice signal, is done on a computer.
A recording chip performs both the ADC and the DAC process itself, covering voice data collection, analysis, compression, storage, and playback.

  • ADC = Analog-to-Digital Converter (analog-to-digital conversion)
  • DAC = Digital-to-Analog Converter (digital-to-analog conversion)

The sound quality depends on the number of ADC and DAC bits. Playback lengths run, for example, from 20 seconds to 340 seconds, with the shortest parts starting at 10 seconds. As the name suggests, a voice chip is a chip related to voice, where voice means stored electronic sound; any chip that can emit sound is a voice chip. It is commonly known as a sound chip, and the more accurate English term is Voice IC. Within the large family of voice chips, two types are distinguished by the type of sound: speech ICs (Speech IC) and music ICs (Music IC). This is the professional way to classify voice chips.

2. How voice chips are produced

Mask production: In layman’s terms, mask production means burning the sound into the chip first and then packaging it. There is generally a minimum quantity requirement.

OTP production: OTP means one-time programmable. The chip is packaged first, and the sound is then burned in with software.

Based on the physical structure of the IC itself, voice chips can have multiple channels (emitting several channels of sound at the same time) and can be divided into the following types:

1. Single channel:

  1. Single-channel speech IC (Speech IC). This kind of chip does not support the music IC storage method. Common speech ICs are single-channel; the GYO20-OTP (20 seconds) and the GY010 (animal sounds) are typical single-channel speech chips.
  2. Single-channel music IC (Music IC). A music IC that can emit only one line of music in the same unit of time. Its electronic sound file is a .mid file with only one channel.

The often-mentioned monophonic chip is the most basic music IC. Its effect is determined by the number of notes output within a given period of time; there are versions with 64 notes, 128 notes, and so on. Monophonic chips have a wide range of applications and an extremely low price. The most common example is the Happy Birthday card chip; a typical part is the GY20S. Strictly speaking, the structures of single-channel music ICs and monophonic chips differ.

2. 2 channels:

  1. 2-channel voice IC. In actual applications, 2-channel and multi-channel voice chips generally play voice on one fixed channel (equivalent to single channel). However, this type of product costs more than a single-channel speech IC (Speech IC), so its price is higher. To balance price against application, manufacturers generally give these parts more complete functional support and better sound.

This structure is likely determined by the application fields and prices of the products and solutions. Voice chip output is generally single-channel, and very few products support stereo. High-end products must use a solution based on an MP3 main control chip.

  2. 2-channel music chip, commonly known as a dual-tone chip (Music with Dual Tone IC). As the name suggests, this is a music IC that can emit music on two channels within the same unit of time. Its electronic sound source file is generally a two-channel .mid file. Common examples are Christmas-series music ICs such as the GYM3S16 and GYM3S16-A.

A few words should be added here. There is also a music chip on the market called a melody chip. What is its definition? Simply put, it is a music chip that is better than a monophonic chip and worse than a polyphonic chip. Dual-tone chips are therefore also called melody music chips. The melody structure can be described as a more advanced single-tone chip, or a single-tone chip with twice the effect.

3. 4 channels, 8 channels or above:

Music with more than three channels is also called polyphonic music. The often-mentioned 4-chord music IC refers to a 4-channel music IC, such as the GYO40.

Generally, multi-channel voice chips support both music IC (Music IC) and voice IC (Speech IC) functions.

(a) Introduction to “Voice Chip”:

(1) Quantization of the speech signal

Sampling rate (f), number of bits (n), baud rate (T)

Sampling: Convert voice analog signals into digital signals.

Sampling rate: the number of samples taken per second (one byte per sample at 8 bits).

Baud rate: the number of bits sampled per second, that is, the bit rate in bps (bits per second). It directly determines the sound quality.

The number of sampling bits is the number of binary digits used for each sample. Unless otherwise specified, sound is sampled at 8 bits, with values ranging from 00H to FFH and silence set to 80H.
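
The 8-bit convention above can be illustrated with a minimal sketch. It assumes the input sample is already normalized to the range -1.0 to 1.0; the function name `quantize_u8` is hypothetical and does not come from any chip vendor's tools.

```python
def quantize_u8(x: float) -> int:
    """Map a normalized sample in [-1.0, 1.0] to an unsigned 8-bit code."""
    code = round(x * 128.0) + 0x80     # centre the waveform on 80H (silence)
    return max(0x00, min(0xFF, code))  # clamp into the range 00H..FFH


for x in (-1.0, 0.0, 0.5, 1.0):
    print(f"{x:+.1f} -> {quantize_u8(x):02X}H")
```

Silence (0.0) maps to 80H, and the extremes map to 00H and FFH, which matches the range given above.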

(2) Sampling rate

Nyquist Sampling Theorem (Nyquist Law): To restore the original signal from the sampled signal without distortion, the sampling frequency should be greater than 2 times the highest frequency of the signal. When the sampling frequency is less than 2 times the highest harmonic frequency, the signal spectrum has aliasing. When the sampling frequency is greater than 2 times the highest frequency of the spectrum, the spectrum of the signal has no aliasing.
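
As a small illustration of the theorem, the sketch below checks the sampling condition and computes the frequency at which an under-sampled tone would appear after folding. The function names and the example frequencies are chosen only for illustration.

```python
def satisfies_nyquist(f_max_hz: float, f_sample_hz: float) -> bool:
    """True when the sampling frequency exceeds twice the highest signal frequency."""
    return f_sample_hz > 2 * f_max_hz


def alias_frequency(f_signal_hz: float, f_sample_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling (spectrum folding)."""
    r = f_signal_hz % f_sample_hz
    return min(r, f_sample_hz - r)


# A 3 kHz tone sampled at 8 kHz is preserved; a 5 kHz tone folds back to 3 kHz.
print(satisfies_nyquist(3000, 8000), alias_frequency(3000, 8000))
print(satisfies_nyquist(5000, 8000), alias_frequency(5000, 8000))
```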

The frequency band of audible sound is about 20 Hz to 20 kHz, and ordinary speech lies mostly below 3 kHz. CD quality is therefore generally 44.1 kHz at 16 bits. For some special sounds, such as musical instruments, 48 kHz at 24 bits is also used, but it is not mainstream.

For ordinary voice ICs, the sampling rate generally goes up to 16K. Speech is generally sampled at about 8K (telephone sound quality) or 6K, and the effect is poor below 6K. The GY series voice chips can sample at up to 22K.

When a microcontroller drives playback, a higher sampling rate means more frequent timer interrupts, which affects the monitoring and detection of other signals, so the sampling rate must be chosen with the whole system in mind.
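
To make the trade-off concrete, here is a rough sketch of how the time between timer interrupts shrinks as the sampling rate rises. It assumes one interrupt per output sample and a hypothetical 8 MHz instruction clock; neither figure comes from a specific chip.

```python
def interrupt_period_us(sample_rate_hz: int) -> float:
    """Time between sample interrupts, in microseconds."""
    return 1_000_000 / sample_rate_hz


def cycles_per_sample(cpu_clock_hz: int, sample_rate_hz: int) -> int:
    """Whole CPU cycles available between two consecutive sample interrupts."""
    return cpu_clock_hz // sample_rate_hz


CPU_CLOCK_HZ = 8_000_000  # hypothetical clock, for illustration only

for rate in (6_000, 8_000, 16_000, 22_000):
    print(rate, round(interrupt_period_us(rate), 1),
          cycles_per_sample(CPU_CLOCK_HZ, rate))
```

Under these assumptions, going from 6K to 22K cuts the cycle budget between interrupts from 1333 to 363, which is the time left for everything else the microcontroller has to do.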

(3) Voice compression technology

Due to the huge amount of voice data, it is necessary to effectively compress the voice data, which allows us to record more voice content in the limited ROM space. There are several ways:

Voice segmentation: Cut out the repeatable parts of the speech, and play back the content completely through arrangement and combination.
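
Segmentation can be sketched as a table of stored segments plus a playlist that names them in order. The segment names and the few placeholder bytes below are invented for illustration; real segments would hold the sampled voice data.

```python
# Each reusable phrase is stored once in ROM (placeholder bytes, not real audio).
SEGMENTS = {
    "temperature_is": b"\x80\x90",
    "twenty": b"\xa0",
    "five": b"\xb0",
    "degrees": b"\xc0\xd0",
}


def assemble(playlist):
    """Concatenate stored segments in playlist order to form the full message."""
    return b"".join(SEGMENTS[name] for name in playlist)


message = assemble(["temperature_is", "twenty", "five", "degrees"])
print(message.hex())
```

Because "twenty" and "five" are stored once, any number built from them can be announced without recording each complete sentence separately.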

Voice sampling: The frequency response of the speakers we use is generally concentrated in the mid-frequency range, and high frequencies are rarely used. Therefore, as long as the sound quality from the speaker remains acceptable, the sampling frequency can be reduced to achieve compression. This process is irreversible: the original cannot be restored, so it is called lossy compression.

Mathematical compression: mainly compresses the number of sampling bits. This method is also lossy compression. For example, the commonly used ADPCM format compresses voice data from 16 bits to 4 bits, a compression ratio of 4. MP3 compresses the data stream and involves data prediction; its compression ratio is about 10.

Usually, the above compression methods are used in combination.
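
The combined effect can be estimated with simple arithmetic. The sketch below assumes a hypothetical 10-second clip that starts at 16 kHz and 16 bits, is downsampled to 8 kHz, and is then reduced to 4 bits per sample as with ADPCM.

```python
def stored_bytes(duration_s: float, sample_rate_hz: int, bits_per_sample: int) -> float:
    """Uncompressed storage needed for a mono clip, in bytes."""
    return duration_s * sample_rate_hz * bits_per_sample / 8


original = stored_bytes(10, 16_000, 16)    # starting point
downsampled = stored_bytes(10, 8_000, 16)  # sampling rate halved
four_bit = stored_bytes(10, 8_000, 4)      # then 16 bits -> 4 bits per sample

print(original, downsampled, four_bit, original / four_bit)
```

Halving the sampling rate gives a factor of 2 and the 16-bit to 4-bit step gives a factor of 4, so the two together reduce the data by a factor of 8 in this example.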

(4) Commonly used voice formats

PCM format: Pulse Code Modulation samples the analog sound signal to obtain quantized voice data. It is the most basic and original voice format. The RAW and SND formats are very similar to it. They are all speech-only formats.

WAV format: Waveform Audio File Format is a sound file format developed by Microsoft, also called a waveform sound file, and is widely supported by the Windows platform and its applications. WAV supports many compression algorithms and a variety of bit depths, sampling frequencies, and channel counts. However, it requires a lot of storage space and is inconvenient to exchange and distribute. Each piece of data stored in a WAV file has its own identifier, which tells the reader what kind of data it is, such as the sampling frequency, the number of bits, and mono or stereo.

ADPCM format: It uses several previous sample values to predict the current input sample, compares the adaptive prediction with the actual measured value, and continuously quantizes the difference, adjusting the quantizer so that it always tracks the signal. It is suitable for situations where the voice changes at a moderate rate and the playback is brief. Its advantage is that its reproduction of the human voice is relatively realistic, generally reaching more than 90%, and it has been widely used in telephone communications.
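
The idea of predicting a sample, coding only the difference, and adapting the step size can be shown with a toy coder. This is a simplified illustration and not the IMA or ITU ADPCM standard: it predicts from a single previous reconstructed sample, uses 4-bit codes from -8 to 7, and applies a made-up adaptation rule.

```python
def _adapt(step: int, code: int) -> int:
    """Grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        return min(step * 2, 2048)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step


def encode(samples, step=4):
    """Code each sample as a 4-bit difference from the predicted value."""
    codes, predicted = [], 0
    for s in samples:
        code = max(-8, min(7, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step   # track what the decoder will reconstruct
        step = _adapt(step, code)
    return codes


def decode(codes, step=4):
    """Rebuild the samples by accumulating the scaled differences."""
    out, predicted = [], 0
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code)
    return out


samples = [0, 10, 20, 30, 20, 10, 0]
codes = encode(samples)
print(codes, decode(codes))
```

The encoder and decoder run the same adaptation rule on the same codes, so they stay in step without the step size ever being transmitted.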

MP3 format: Moving Picture Experts Group Audio Layer III, MP3 for short. It uses MPEG Audio Layer 3 technology and an encoding approach known as perceptual coding: the encoder first analyzes the spectrum of the audio, then uses a filter to remove content at the noise level, then quantizes the remaining audio and packs the resulting bits into an MP3 file with a high compression ratio, so that the compressed file sounds close to the original source on playback. With VBR (variable bit rate), the encoder dynamically selects an appropriate bit rate according to the content, so the result preserves sound quality while keeping the file size down. The MP3 compression ratio is 10 or even 12. It was the first speech format with a high compression ratio.

Linear Scale format: According to the rate of change of the sound, the sound is divided into several segments, and each segment is compressed using a linear ratio, though the ratio varies from segment to segment.

Logpcm format: Essentially performs linear compression on the entire sound and removes the last few bits. This compression method is easy to implement in hardware, but the sound quality is worse than Linear Scale, especially when the volume is low and the sound is delicate. It is mainly used for pure speech.

MID format: MID-format sound occupies relatively little space, and a 20-second chip can sometimes hold more than ten pieces of MID-format music.

(b) Introduction to “Music Chip”:

(1) Music channels and timbre:

Envelope, square wave (patch), channel (channel)

  • Envelope: part of a synthesized timbre, the change in note output per unit time, commonly known as “ADSR”
  • Square wave: Part of the synthesized timbre, the square-wave variation of the note's signal per unit time. (See also triangle wave, etc.)
  • Channel: The number of notes output by the chip at the same time, that is, the number of “monophonic instruments”.
  • PCT: A type of analog timbre, which simulates the pitch of each note by sampling 256 points of musical instrument sounds. (The sound is soft, takes up little space, but is not realistic enough)
  • FULL WAVE: Simulate the pitch of each note by collecting the sound of an instrument. (The sound of musical instruments is realistic, but it takes up a lot of space and requires high quality of timbre collection)
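
How an envelope shapes a square-wave note can be sketched as follows. The segment lengths, the sustain level, and the piecewise-linear shape are assumptions made for illustration; actual music ICs define their own envelope tables.

```python
def square_wave(freq_hz: float, sample_rate_hz: float, n: int):
    """Return n samples of a square wave alternating between +1.0 and -1.0."""
    period = sample_rate_hz / freq_hz
    return [1.0 if (i % period) < period / 2 else -1.0 for i in range(n)]


def adsr(n: int, attack: int, decay: int, sustain: float, release: int):
    """Piecewise-linear ADSR gain curve of n samples (segment lengths in samples)."""
    hold = n - attack - decay - release
    if hold < 0:
        raise ValueError("segments are longer than the note")
    env = [i / attack for i in range(attack)]                        # 0 -> 1
    env += [1.0 - (1.0 - sustain) * i / decay for i in range(decay)]  # 1 -> sustain
    env += [sustain] * hold                                          # hold level
    env += [sustain * (1.0 - i / release) for i in range(release)]   # sustain -> 0
    return env


wave = square_wave(1000, 8000, 100)
env = adsr(100, attack=10, decay=10, sustain=0.5, release=20)
note = [g * s for g, s in zip(env, wave)]  # envelope applied to the raw tone
print(len(note), max(note), min(note))
```

One such shaped tone corresponds to one channel; a multi-channel chip mixes several of them at the same time.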

(2) Compression of music:

Due to the huge amount of music data, it is necessary to effectively compress the music data, which allows us to record more music content in the limited ROM space. There are several ways:

  • Music segmentation: Cut out the repeatable parts of the music, and play back the content completely through arrangement and combination.
  • Tone: The choice among full wave, PCT, and dual tone depends on how full the music should sound and on the requirements. The tone types differ in the space they occupy and in tone quality.
  • Mathematical compression: It mainly compresses the sampled timbre (full wave). This method is also lossy compression. The timbre to be collected is downsampled and processed to reduce its size (the same as for voice).

Representation of speech ROM space

For ease of visualization, the ROM capacity of a speech chip is expressed as a length of speech:

  • Ordinary voice chips use a 6K sampling rate to calculate the voice length, and the maximum sampling rate is 22K.
  • The recording IC calculates the voice length based on the 6K sampling rate.

That is: the length that the chip can play with a 6k sampling rate.

Elements of voice chip

The cost of chips of the same variety is directly proportional to the size of the chip.

  • The number of I/O ports and the size of the ROM (voice seconds) determine the chip cost. Chips with fewer voice seconds have fewer I/O ports.
  • Raising the sound quality means raising the sampling rate, which shortens the voice seconds.
  • Lowering the sound quality means lowering the sampling rate, which lengthens the voice seconds.

Calculation method of voice seconds: M/(n*f)

M = ROM size (bits)
n*f = baud rate (number of bits n times sampling rate f, in bits per second)
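
The formula can be tried out with a short sketch. The ROM size of 960,000 bits is a hypothetical figure chosen so that the result at 8 bits and 6K comes out to a round 20 seconds.

```python
def voice_seconds(rom_bits: int, bits_per_sample: int, sample_rate_hz: int) -> float:
    """Playable length in seconds: M / (n * f)."""
    return rom_bits / (bits_per_sample * sample_rate_hz)


ROM_BITS = 960_000  # hypothetical ROM size, for illustration only

print(voice_seconds(ROM_BITS, 8, 6_000))   # rated length at the 6K reference rate
print(voice_seconds(ROM_BITS, 8, 22_000))  # same ROM at the 22K maximum rate
```

The same ROM that is rated at 20 seconds holds only about 5.5 seconds at 22K, which is why higher sound quality shortens the voice seconds.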

Introduction to sound processing software:

  • Sound Forge
  • Cool Edit
  • GoldWave
  • Cakewalk

Voice chip classification

Classified by type of integrated circuit, all integrated circuits related to sound are collectively called voice chips (also called voice ICs; the proper term here is Voice IC). Within this broad category there are two types: speech ICs (properly Speech IC) and music ICs (properly Music IC).

(a) Classification of common voice chips on the market today:

  1. Short-time chips include 10 seconds, 20 seconds, 40 seconds, 80 seconds, and 170 seconds.
  2. Commonly used modules include: 6 minutes, 8 minutes, 16 minutes, 1 hour, etc.
  3. Common chips include: 3 seconds to 340 seconds

(b) Classification of common music chips on the market today:

Monophonic chip: the most basic music IC, with a single channel of music. The number of notes it can output determines its effect; there are versions with more than 70 notes, more than 100 notes, and so on.

Music channels: 2 channels, 3 channels, 4 channels, 8 channels, 12 channels, and more.

Control mode: button control, one-wire serial control, two-wire serial control, three-wire serial control, parallel port control, microcontroller control, etc.

(c) The current voice chips are mainly developed and produced in Guangzhou and Shenzhen.

The main voice chips are roughly divided into 20-second, 40-second, 80-second, and 170-second parts, among others. Compared with traditional chips, most of these ICs use an 8-pin package, which makes them easier to work with.

Application scope

  1. Home appliance industry: induction cookers, rice cookers, refrigerators, washing machines, air conditioners, fans, etc.
  2. Security alarm: reversing radar, forklift alarm, home anti-theft, access control system, etc.
  3. Medical equipment: amblyopia treatment device, blood pressure monitor, ozone therapy device, blood glucose meter, etc.
  4. Advertising media: voice billboards, mirror advertising machines, welcome machines, advertising machines, etc.
  5. Toy series: speech recognition, cars, rag dolls, etc.
  6. Intelligent transportation: all-in-one card equipment, traffic light reminders, etc.
  7. Transportation: voice electric vehicles, voice bus stations, etc.
  8. Gift cards: greeting cards, birthday cake wishes
  9. Ringtones, etc.
