曦曦のblog

Experiment: building a five-word recognition system (steps and results)

by tracylling on Apr. 14, 2010, under Speech Recognition

(PS: this is only an attempt; correctness is not guaranteed)

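System overview:

HTK is the Hidden Markov Model Toolkit, developed by the Cambridge University Engineering Department. The toolkit is intended for building and using hidden Markov models.

See: http://htk.eng.cam.ac.uk/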

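Build steps:

a) Training corpus creation: each element of the vocabulary is recorded several times and labelled with the corresponding word;

b) Acoustic analysis: the waveform files are converted into sequences of coefficient vectors;

c) Model definition: an HMM prototype is defined for each element of the vocabulary;

d) Model training: each HMM is initialized and trained with the training data;

e) Task definition: the grammar of the recognizer (what can be recognized) is defined;

f) Recognition of an unknown input signal;

g) Evaluation: the recognizer's performance can be evaluated on test data.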

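Setting up the working environment:

Create the following directory structure:

a) data/: stores the training and test data (speech signals, labels, etc.), with two subdirectories, data/train/ and data/test/, which separate the recognizer's training data from its evaluation data;

b) analysis/: stores the files for the acoustic analysis step;

c) training/: stores the files for the initialization and training steps;

d) model/: stores the files for the recognizer's models (HMMs);

e) def/: stores the files for the task definition;

f) test/: stores the files for testing.

Files to be created later: analysis.conf   targetlist.txt  hmmlist.txt   trainlist.txt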
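The directory layout above can be created by hand; as a minimal sketch, the Python below does the same thing. The function name `make_workspace` is made up for illustration, and the sig/, mfcc/ and lab/ subfolders are an assumption inferred from the paths used later in this post.

```python
from pathlib import Path

# Top-level names come from the list above; the sig/, mfcc/ and lab/
# subfolders are inferred from paths that appear later in the post.
SUBDIRS = [
    "data/train/sig", "data/train/mfcc", "data/train/lab",
    "data/test",
    "analysis", "training", "model", "def", "test",
]


def make_workspace(root):
    """Create the working directories under root and return their paths."""
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created
```

Calling `make_workspace(".")` from the project folder would build the tree in place.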

Procedure:

1. Creating the training data

a. Record the audio
HSLab name.sig
b. Label the signal
Mark the signal segments in HSLab

2. Acoustic analysis

a. Configuration parameters (analysis.conf)
#
# Example of an acoustical analysis configuration file
#
SOURCEFORMAT = HTK # Gives the format of the speech files
TARGETKIND = MFCC_0_D_A # Identifier of the coefficients to use
# Unit = 0.1 micro-second :
WINDOWSIZE = 250000.0 # = 25 ms = length of a time frame
TARGETRATE = 100000.0 # = 10 ms = frame periodicity
NUMCEPS = 12 # Number of MFCC coeffs (here from c1 to c12)
USEHAMMING = T # Use of Hamming function for windowing frames
PREEMCOEF = 0.97 # Pre-emphasis coefficient
NUMCHANS = 26 # Number of filterbank channels
CEPLIFTER = 22 # Length of cepstral liftering
# The End
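
As a rough check of the numbers in this configuration, the sketch below converts the HTK time values (units of 100 ns) to milliseconds and works out the feature vector size implied by TARGETKIND and NUMCEPS. The function names are hypothetical, and the sketch only covers MFCC kinds with the _0, _E, _D and _A qualifiers.

```python
def htk_time_to_ms(value):
    """Convert an HTK time value (units of 100 ns) to milliseconds."""
    return value / 10000.0


def vector_size(target_kind, num_ceps):
    """Feature dimension for an MFCC kind with _0/_E, _D and _A qualifiers."""
    base, *qualifiers = target_kind.split("_")
    if base != "MFCC":
        raise ValueError("this sketch only handles MFCC target kinds")
    static = num_ceps
    if "0" in qualifiers:  # c0 is appended to the cepstral coefficients
        static += 1
    if "E" in qualifiers:  # log energy is appended
        static += 1
    # One copy of the static block, plus one each for deltas and accelerations.
    blocks = 1 + ("D" in qualifiers) + ("A" in qualifiers)
    return static * blocks


print(htk_time_to_ms(250000.0))       # window length in ms
print(htk_time_to_ms(100000.0))       # frame period in ms
print(vector_size("MFCC_0_D_A", 12))  # should agree with <VecSize> in the prototype
```

With the settings above this gives a 25 ms window, a 10 ms frame period and (12 + 1) x 3 = 39 coefficients per frame, which is the vector size the HMM prototype has to declare.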
b. Source/target list (targetlist.txt)
data/train/sig/name.sig data/train/mfcc/name.mfcc
etc...
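
Typing one line per recording gets tedious, so here is a minimal sketch that builds the list from the files on disk. It assumes every recording in the source folder ends in .sig and that each target keeps the same base name; `build_targetlist` is a name invented for this example.

```python
from pathlib import Path


def build_targetlist(sig_dir, mfcc_dir):
    """Return one 'source target' line per .sig file found in sig_dir."""
    lines = []
    for sig in sorted(Path(sig_dir).glob("*.sig")):
        target = Path(mfcc_dir) / (sig.stem + ".mfcc")
        # Forward slashes match the path style used in the list above.
        lines.append(f"{sig.as_posix()} {target.as_posix()}")
    return lines
```

Writing `"\n".join(build_targetlist("data/train/sig", "data/train/mfcc")) + "\n"` to targetlist.txt would produce a file in the format shown above.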

c. Run the acoustic analysis with HCopy
>>>HCopy -A -D -C analysis.conf -S targetlist.txt

HCopy -A -D -C analysis.conf -S targetlist.txt

HTK Configuration Parameters[9]
  Module/Tool     Parameter                  Value
#                 CEPLIFTER                     22
#                 NUMCHANS                      26
#                 PREEMCOEF               0.970000
#                 USEHAMMING                  TRUE
#                 NUMCEPS                       12
#                 TARGETRATE         100000.000000
#                 WINDOWSIZE         250000.000000
#                 TARGETKIND            MFCC_0_D_A
#                 SOURCEFORMAT                 HTK

HTK Configuration Parameters[9]
  Module/Tool     Parameter                  Value
                  CEPLIFTER                     22
                  NUMCHANS                      26
                  PREEMCOEF               0.970000
                  USEHAMMING                  TRUE
                  NUMCEPS                       12
                  TARGETRATE         100000.000000
                  WINDOWSIZE         250000.000000
                  TARGETKIND            MFCC_0_D_A
                  SOURCEFORMAT                 HTK

3. Defining the model
~o <VecSize> 39 <MFCC_0_D_A>
~h "label"
<BeginHMM>
<NumStates> 6
<State> 2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 3
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 4
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 5
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<TransP> 6
0.0 0.5 0.5 0.0 0.0 0.0
0.0 0.4 0.3 0.3 0.0 0.0
0.0 0.0 0.4 0.3 0.3 0.0
0.0 0.0 0.0 0.4 0.3 0.3
0.0 0.0 0.0 0.0 0.5 0.5
0.0 0.0 0.0 0.0 0.0 0.0
<EndHMM>
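
Since one such prototype is needed per word, a small generator can save typing. The sketch below reproduces the structure above (zero means, unit variances, and the same transition pattern); `make_prototype` and `transition_rows` are hypothetical helper names. It writes each mean and variance vector on a single line instead of three, on the assumption that the values are read as a whitespace-separated list.

```python
def transition_rows(n):
    """Left-to-right transition matrix with the same pattern as above."""
    rows = [[0.0] * n for _ in range(n)]
    rows[0][1] = rows[0][2] = 0.5      # entry state: go to state 2 or 3
    for i in range(1, n - 2):          # states 2 .. N-2: stay, next, skip
        rows[i][i], rows[i][i + 1], rows[i][i + 2] = 0.4, 0.3, 0.3
    rows[n - 2][n - 2] = rows[n - 2][n - 1] = 0.5  # last emitting state
    return rows                        # the exit state row stays all zero


def make_prototype(name, num_states=6, vec_size=39, kind="MFCC_0_D_A"):
    """Return HMM prototype text with zero means and unit variances."""
    if num_states < 4:
        raise ValueError("this sketch expects at least 4 states")
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [f"~o <VecSize> {vec_size} <{kind}>", f'~h "{name}"',
             "<BeginHMM>", f"<NumStates> {num_states}"]
    for state in range(2, num_states):  # emitting states 2 .. N-1
        lines += [f"<State> {state}", f"<Mean> {vec_size}", zeros,
                  f"<Variance> {vec_size}", ones]
    lines.append(f"<TransP> {num_states}")
    for row in transition_rows(num_states):
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"
```

For example, `make_prototype("name")` returns text that could be saved as model/proto/hmm_name, the path used in the HInit command.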

4. Training the models
a. Initialization
>HInit -A -D -T 1 -S trainlist.txt -M model/hmm0 -H model/proto/hmm_name -l name -L data/train/lab/ name

……

No HTK Configuration Parameters Set
Initialising  HMM name . . .
States   :   2  3  4  5 (width)
Mixes  s1:   1  1  1  1 ( 39  )
Num Using:   0  0  0  0
Parm Kind:  MFCC_D_A_0
Number of owners = 1
SegLab   : name
maxIter  :  20
epsilon  :  0.000100
minSeg   :  3
Updating :  Means Variances MixWeights/DProbs TransProbs
- system is PLAIN
10 Observation Sequences Loaded
Starting Estimation Process
Iteration 1: Average LogP = -4510.80420
Iteration 2: Average LogP = -4420.17188  Change =    90.63232
Iteration 3: Average LogP = -4408.49854  Change =    11.67334
Iteration 4: Average LogP = -4403.09863  Change =     5.39990
Iteration 5: Average LogP = -4400.53662  Change =     2.56201
Iteration 6: Average LogP = -4398.79785  Change =     1.73877
Iteration 7: Average LogP = -4398.36572  Change =     0.43213
Iteration 8: Average LogP = -4398.14648  Change =     0.21924
Iteration 9: Average LogP = -4397.99072  Change =     0.15576
Iteration 10: Average LogP = -4397.79785  Change =     0.19287
Iteration 11: Average LogP = -4397.42090  Change =     0.37695
Iteration 12: Average LogP = -4397.29492  Change =     0.12598
Iteration 13: Average LogP = -4397.29492  Change =     0.00000
Estimation converged at iteration 14
Output written to directory model/hmm0
No HTK Configuration Parameters Set
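
The HInit call has to be repeated for every word in the vocabulary. As a sketch, the Python below only builds and prints the five command lines, using the same flags as the command above. The word labels are placeholders, and the prototype file name hmm_<word> is an assumption based on the hmm_name pattern used in this post.

```python
import shlex


def hinit_command(word):
    """Build the HInit argument list for one word, mirroring the flags above."""
    return ["HInit", "-A", "-D", "-T", "1",
            "-S", "trainlist.txt",
            "-M", "model/hmm0",
            "-H", f"model/proto/hmm_{word}",  # assumed prototype file name
            "-l", word,
            "-L", "data/train/lab/",
            word]


WORDS = ["word1", "word2", "word3", "word4", "word5"]  # placeholder labels

for w in WORDS:
    print(shlex.join(hinit_command(w)))
```

The same argument lists could be handed to `subprocess.run` once the printed commands look right.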

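b. Training

The HRest iterations (that is, the iterations within the current re-estimation pass) are displayed on screen, and the change measure indicates convergence. Once this measure stops decreasing (in absolute value) from one HRest iteration to the next, the process should be stopped.
???Questions: how does HRest training choose the best converged model, and how should the number of iterations be determined?

In the experiment, the model still did not converge well after 12 iterations; I suspect this is related to the HMM model definition.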
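
One possible way to apply the stopping rule described above is to read the "Ave LogProb" lines from the HRest output and report the first iteration whose absolute change falls below a threshold. This is only a sketch of that idea, not a description of how HRest decides internally; the function names and the default threshold of 1e-4 are assumptions chosen for illustration.

```python
import re

# Matches lines such as "Ave LogProb at iter 2 = -4396.95020 using 10 examples".
ITER_RE = re.compile(r"Ave LogProb at iter\s+(\d+)\s*=\s*(-?\d+\.\d+)")


def log_probs(log_text):
    """Return [(iteration, average log probability), ...] from HRest output."""
    return [(int(i), float(p)) for i, p in ITER_RE.findall(log_text)]


def first_stable_iteration(log_text, threshold=1e-4):
    """First iteration whose absolute change from the previous one is below threshold."""
    points = log_probs(log_text)
    for (_, prev), (it, cur) in zip(points, points[1:]):
        if abs(cur - prev) < threshold:
            return it
    return None  # never settled within the logged iterations
```

Saving the console output of each HRest run to a file and passing its text to `first_stable_iteration` gives a quick way to compare runs.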

>>>HRest -A -D -T 1 -S trainlist.txt -M model/hmm1 -H model/hmm0/hmm_name -l name -L data/train/lab/ name

……

No HTK Configuration Parameters Set
Reestimating HMM name . . .
States   :   2  3  4  5 (width)
Mixes  s1:   1  1  1  1 ( 39  )
Num Using:   0  0  0  0
Parm Kind:  MFCC_D_A_0
Number of owners = 1
SegLab   :  name
MaxIter  :  20
Epsilon  :  0.000100
Updating :  Transitions Means Variances
- system is PLAIN
10 Examples loaded, Max length = 69, Min length = 43
Ave LogProb at iter 1 = -4397.05420 using 10 examples
Ave LogProb at iter 2 = -4396.95020 using 10 examples  change =    0.10400
Ave LogProb at iter 3 = -4396.83057 using 10 examples  change =    0.11963
Ave LogProb at iter 4 = -4396.80225 using 10 examples  change =    0.02832
Ave LogProb at iter 5 = -4396.80127 using 10 examples  change =    0.00098
Ave LogProb at iter 6 = -4396.80029 using 10 examples  change =    0.00098
Ave LogProb at iter 7 = -4396.80078 using 10 examples  change =   -0.00049
Ave LogProb at iter 8 = -4396.79980 using 10 examples  change =    0.00098
Ave LogProb at iter 9 = -4396.79980 using 10 examples  change =    0.00000
Estimation converged at iteration 9
No HTK Configuration Parameters Set

5. Defining the task
a. Grammar (gram.txt)
/*
* Task grammar
*/
$WORD = NAME1|NAME2|……;
( { START_SIL } [ $WORD ] { END_SIL } )

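QQQ: To recognize several words spoken in a row, the grammar network has to be modified. I do not know how to change it, so only one word can be recognized at a time.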
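
A possible direction for the question above: in the HParse grammar notation, square brackets mark an optional item and angle brackets mark one or more repetitions, so wrapping $WORD in < > instead of [ ] should allow word sequences. The sketch below generates both variants; `make_grammar` is a hypothetical helper, and the repeated form has not been tried on this system (pauses between words may also need an optional silence inside the loop).

```python
def make_grammar(words, allow_sequence=False):
    """Return HParse grammar text for the given word list."""
    # [ ] marks an optional item; < > marks one or more repetitions.
    open_b, close_b = ("<", ">") if allow_sequence else ("[", "]")
    return (
        f"$WORD = {' | '.join(words)};\n"
        f"( {{ START_SIL }} {open_b} $WORD {close_b} {{ END_SIL }} )\n"
    )


print(make_grammar(["NAME1", "NAME2"]))                       # single optional word
print(make_grammar(["NAME1", "NAME2"], allow_sequence=True))  # one or more words
```

The second form would then go through HParse in the same way as gram.txt.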
b. Dictionary (dict.txt)
NAME  [name]  name

……
START_SIL [sil] sil
END_SIL [sil] sil

c. Build the network with HParse and check it with HSGen (HSGen prints random sentences generated from the network)
HParse -A -D -T 1 def/gram.txt def/net.slf
HSGen -A -D -n 10 -s def/net.slf def/dict.txt

!! dict.txt must end with a newline character!!!
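
To avoid the missing-newline problem, the dictionary text can be generated rather than typed. This is a minimal sketch that follows the entry layout shown in the dictionary above (word, bracketed output symbol, model name); `make_dict` is a made-up name and the uppercase word convention is taken from the example entries.

```python
def make_dict(words, silence_model="sil"):
    """Return dictionary text with one 'WORD [output] model' entry per line."""
    entries = [f"{w.upper()} [{w}] {w}" for w in words]
    entries.append(f"START_SIL [{silence_model}] {silence_model}")
    entries.append(f"END_SIL [{silence_model}] {silence_model}")
    # Joining and then appending "\n" guarantees the file ends with a newline.
    return "\n".join(entries) + "\n"


print(make_dict(["name"]), end="")
```

Writing the returned string to def/dict.txt as-is keeps the final newline in place.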
6. Recognizing an unknown signal with HVite
>>>HSLab test.sig

>>>HCopy -A -D -C analysis.conf -S test_targetlist.txt

HCopy -A -D -C analysis.conf -S test_targetlist.txt

HTK Configuration Parameters[9]
  Module/Tool     Parameter                  Value
#                 CEPLIFTER                     22
#                 NUMCHANS                      26
#                 PREEMCOEF               0.970000
#                 USEHAMMING                  TRUE
#                 NUMCEPS                       12
#                 TARGETRATE         100000.000000
#                 WINDOWSIZE         250000.000000
#                 TARGETKIND            MFCC_0_D_A
#                 SOURCEFORMAT                 HTK

HTK Configuration Parameters[9]
  Module/Tool     Parameter                  Value
                  CEPLIFTER                     22
                  NUMCHANS                      26
                  PREEMCOEF               0.970000
                  USEHAMMING                  TRUE
                  NUMCEPS                       12
                  TARGETRATE         100000.000000
                  WINDOWSIZE         250000.000000
                  TARGETKIND            MFCC_0_D_A
                  SOURCEFORMAT                 HTK

D:\My_Graduation_Project\Demo>HVite -A -D -T 1 -H model/hmm6/hmm_name1 -H model/hmm6/hmm_name2 …… -i reco_test_4.mlf -w def/net.slf def/dict.txt hmmlist.txt data/test/test.mfcc

……

No HTK Configuration Parameters Set
Read 6 physical / 6 logical HMMs
Read lattice with 11 nodes / 18 arcs
Created network with 20 nodes / 27 links
File: data/test/test_4.mfcc
START_SIL TAIWAN END_SIL END_SIL END_SIL END_SIL  ==  [83 frames] -93.5447 [Ac=-7764.2 LM=0.0] (Act=17.7)
No HTK Configuration Parameters Set
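
To pull the recognized word out of a result line like the one printed by HVite above, a small parser is enough. The sketch below is based only on the layout of that single line, so the regular expression and the field names are assumptions. In this output the score after the frame count multiplied by the number of frames comes out close to the Ac value, which suggests it is a per-frame average.

```python
import re

LINE_RE = re.compile(
    r"^(?P<labels>.*?)\s+==\s+\[(?P<frames>\d+) frames\]\s+(?P<avg>-?\d+\.\d+)"
    r"\s+\[Ac=(?P<ac>-?\d+\.\d+)\s+LM=(?P<lm>-?\d+\.\d+)\]"
)


def parse_hvite_line(line, silence=("START_SIL", "END_SIL")):
    """Split an HVite result line into words (silence removed) and scores."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("not an HVite result line")
    labels = m.group("labels").split()
    return {
        "words": [w for w in labels if w not in silence],
        "frames": int(m.group("frames")),
        "avg_logp": float(m.group("avg")),
        "acoustic": float(m.group("ac")),
    }


line = ("START_SIL TAIWAN END_SIL END_SIL END_SIL END_SIL  ==  "
        "[83 frames] -93.5447 [Ac=-7764.2 LM=0.0] (Act=17.7)")
result = parse_hvite_line(line)
print(result["words"])
print(result["avg_logp"] * result["frames"], result["acoustic"])
```

Dropping the START_SIL and END_SIL labels leaves just the recognized word, which is the part needed when comparing against the test labels.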

