语音信号处理之时域分析-音量及其Python实现

1.概述（Introduction）

本系列文主要介绍语音信号时域的4个基本特征及其Python实现，这4个基本特征是：
(1)音量（Volume）；
(2)过零率（Zero-Crossing-Rate）；
(3)音高（Pitch）；
(4)音色（Timbre）。

2.音量（Volume）

音量代表声音的强度，可由一个窗口或一帧内信号振幅的大小来衡量，一般有两种度量方法：
（1）每个帧的振幅的绝对值的总和：

$volume = \sum_{i=1}^{n}|s_{i}|$. 其中$s_{i}$为第该帧的$i$个采样点，$n$为该帧总的采样点数。这种度量方法的计算量小，但不太符合人的听觉感受。
（2）幅值平方和的常数对数的10倍： $volume = 10 * log_{10}\sum_{i=1}^{n}s_{i}^{2}$. 它的单位是分贝（Decibels），是一个对数强度值，比较符合人耳对声音大小的感觉，但计算量稍复杂。

音量计算的Python实现如下：

import math
import numpy as np

# method 1: absSum
def calVolume(waveData, frameSize, overLap):
    wlen = len(waveData)
    step = frameSize - overLap
    frameNum = int(math.ceil(wlen*1.0/step))
    volume = np.zeros((frameNum,1))
    for i in range(frameNum):
        curFrame = waveData[np.arange(i*step,min(i*step+frameSize,wlen))]
        curFrame = curFrame - np.median(curFrame) # zero-justified
        volume[i] = np.sum(np.abs(curFrame))
    return volume

# method 2: 10 times log10 of square sum
def calVolumeDB(waveData, frameSize, overLap):
    wlen = len(waveData)
    step = frameSize - overLap
    frameNum = int(math.ceil(wlen*1.0/step))
    volume = np.zeros((frameNum,1))
    for i in range(frameNum):
        curFrame = waveData[np.arange(i*step,min(i*step+frameSize,wlen))]
        curFrame = curFrame - np.mean(curFrame) # zero-justified
        volume[i] = 10*np.log10(np.sum(curFrame*curFrame))
    return volume

对于给定语音文件aeiou.wav，利用上面的函数计算音量曲线的代码如下：

import wave
import pylab as pl
import numpy as np
import Volume as vp

# ============ test the algorithm =============
# read wave file and get parameters.
fw = wave.open('aeiou.wav','r')
params = fw.getparams()
print(params)
nchannels, sampwidth, framerate, nframes = params[:4]
strData = fw.readframes(nframes)
waveData = np.fromstring(strData, dtype=np.int16)
waveData = waveData*1.0/max(abs(waveData))  # normalization
fw.close()

# calculate volume
frameSize = 256
overLap = 128
volume11 = vp.calVolume(waveData,frameSize,overLap)
volume12 = vp.calVolumeDB(waveData,frameSize,overLap)

# plot the wave
time = np.arange(0, nframes)*(1.0/framerate)
time2 = np.arange(0, len(volume11))*(frameSize-overLap)*1.0/framerate
pl.subplot(311)
pl.plot(time, waveData)
pl.ylabel("Amplitude")
pl.subplot(312)
pl.plot(time2, volume11)
pl.ylabel("absSum")
pl.subplot(313)
pl.plot(time2, volume12, c="g")
pl.ylabel("Decibel(dB)")
pl.xlabel("time (seconds)")
pl.show()

运行以上程序得到下图：

参考（References）

[1]Volume (音量):http://neural.cs.nthu.edu.tw/jang/books/audiosignalprocessing/basicFeatureVolume.asp?title=5-2%20Volume%20(%AD%B5%B6q)
[2]用Python做科学计算-声音的输入输出:http://hyry.dip.jp:8000/pydoc/wave_pyaudio.html