Johannes Kiehl - sbrecog FAQ

Frequently asked questions on SBRecog
Last updated: 03-19-2001
Bill Browne
> 
> I am interested in the speech recognition software you wrote and left on the
> net (cambridge, I think it was).  Have you advanced the system any further?

It's something to get started, but it's old. Don't trust it any further
than you can throw a $3.95 DTMF recognizer. This is the technology.

*****

Liau Chu Yee
> 	Second question is Do you have the paper that you stated in your 
> explanation. I can't find any of it in my library, it will be nice if you 
> can send me all the papers.

Yes, I do, but would you want to read it in German? I admit: this is
a flaw of my literature list. I should have tried to track down
Kirstein publications in English. I've tried this for one guy and
was not very successful.

> 	Thirdly, do you have any references in speech recognition 
> especially the basic one.

Try to get early articles by the IKP in Bonn, if you're interested
in this kind of very primitive recognition. A name to look for would 
be Tillmann. They published a lot, with English abstracts of course,
in Phonetics magazines of 1965-1975. Here is one of the "initial"
texts on the IKP's DAWID speech recognizer:

H. G. Tillmann, G. Heike, H. Schnelle, G. Ungeheuer (1965): DAWID I - 
ein Beitrag zur automatischen 'Spracherkennung'. In: Proc. 5th Int. 
Congr. Acoustics. Liège 1965.

*****

Juan Jose Suarez
> I would like to ask you, if you have any idea of some program for a more
> precise voice recognition. For example, to distinguish individuals letters
> inside a word. I am thinking in an algorithm that should give a better
> reliability in the process. This as you know means a harder step further
> instead of complete words or composites sounds that are distintives among
> each others. 

So you're pretty familiar with the basics of recognition? But then you must
have heard that going into individual sounds is very hard, if not impossible
to do. They vary too much, they might even be omitted entirely depending on
how "important" a word is in the sentence, aka how much stress it receives. 
There are "phoneme recognizers" on the market (Voice Processing Corp has one).
But this kind of hardware, to my knowledge, does a lot of guesswork (the
fashionable term would be "Hidden Markov Models"), then checks the guess
against "what COULD the speaker possibly have said at this point in
his/her sentence". So what you do is you leave the area of phonetics and 
enter the fields of grammar, linguistics, semantics. 

I for one would be VERY happy with a good single-word recognizer. With this
and well-structured dialogues in the application you can get almost anywhere
you want to.
It's just that this level of quality, in my opinion, can never be reached
with a technique as simple as the zero-crossing counts done by Kirstein (and
SBRECOG).

*****

Luca Carozza

> First I cannot open the project (I don't know why) so I included it in the main.
> But I still have one error:
> 
> Compiling 1.CPP:
> Error 1.CPP 202: Cannot convert 'void *' to 'char *'

No idea! I have never used a C++ compiler. What if you replace void * by
char * in the code?
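
For what it's worth, this message usually means that a void * value
(for example the result of malloc) is assigned to a char * without a
cast. C allows that, C++ does not. What line 202 really looks like in
your copy is not known here, so the following is only a sketch with a
made-up helper, alloc_buffer, that is not part of sbrecog:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from sbrecog: allocate a zeroed
   sample buffer. The explicit (char *) cast is redundant in C but
   required in C++, where void * does not convert implicitly. */
char *alloc_buffer(size_t n)
{
    char *buf = (char *)malloc(n);
    if (buf != NULL)
        memset(buf, 0, n);
    return buf;
}
```

The same explicit cast compiles under both a C and a C++ compiler.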
 
> What is "zero crossings of a speech signal" ?

Like in trigonometry: each time f(x) crosses the x axis, you get a
zero crossing. For a pure tone, the number of zero crossings during one
second equals twice the frequency, because each cycle crosses the axis
twice. sbrecog reduces the signal to ONLY zero crossings, i.e.

 **                       ****    *      (>0)    
*  *    *      becomes        
    *  *                      ****       (<=0)
     **

(this is called clipping). The funny thing about this is: speech 
remains (barely) understandable.
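
A minimal sketch of clipping and crossing counting in C. It assumes
unsigned 8-bit samples centred on 128, which is an assumption made here
and not taken from the sbrecog source; the function names are made up
for illustration as well.

```c
#include <stddef.h>

/* Assumption: unsigned 8-bit samples with silence at 128. */
#define MIDPOINT 128

/* Reduce each sample to 1 (above the axis) or 0 (on or below it). */
void clip_signal(const unsigned char *in, unsigned char *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = (in[i] > MIDPOINT) ? 1 : 0;
}

/* Count the places where the clipped signal changes sign. */
size_t count_zero_crossings(const unsigned char *clipped, size_t n)
{
    size_t i, crossings = 0;
    for (i = 1; i < n; i++)
        if (clipped[i] != clipped[i - 1])
            crossings++;
    return crossings;
}
```

After clipping, all amplitude information is gone; only the positions
of the sign changes are left.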

> What do you mean with the expression "It is easy to see why the number of possible interval sizes 
> is reduced to 64 instead of Kirstein's 200" ? What 64?

64 (or 200) possible interval sizes. "interval" means: how many bytes pass
between two zero crossings? In the functions I have asciipainted above,
there are two zero crossings with an interval of 5 bytes in between.
sbrecog counts these intervals, up to a size of 64 bytes (so, if e.g. 122
bytes "pass" between a zero crossing and the next one, this would be counted
as "64 bytes or greater"). You have to understand that these interval sizes
reflect the signal frequency. If a signal has been sampled 11000 times per
second, and it crosses the x axis only once every 64 samples, its frequency
equals 11000/64/2 Hz == 85.9 Hz. 
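
The interval counting described above could be sketched like this. It
is not the sbrecog code: the names are invented, and it expects an
already clipped signal holding one 0 or 1 per sample.

```c
#include <stddef.h>

#define MAX_INTERVAL 64   /* sizes above this share the last bin */

/* hist[k] counts intervals of k samples for k = 1..MAX_INTERVAL;
   hist[0] is unused. `clipped` holds 0/1 values. */
void interval_histogram(const unsigned char *clipped, size_t n,
                        unsigned long hist[MAX_INTERVAL + 1])
{
    size_t i, last = 0;
    int seen = 0;   /* becomes 1 once the first crossing is found */

    for (i = 0; i <= MAX_INTERVAL; i++)
        hist[i] = 0;

    for (i = 1; i < n; i++) {
        if (clipped[i] == clipped[i - 1])
            continue;               /* no crossing at this sample */
        if (seen) {
            size_t len = i - last;  /* samples since the last crossing */
            if (len > MAX_INTERVAL)
                len = MAX_INTERVAL; /* "64 or greater" */
            hist[len]++;
        }
        last = i;
        seen = 1;
    }
}

/* Frequency of a tone that crosses the axis once every `interval`
   samples: two crossings make one full cycle. */
double interval_to_hz(double sample_rate, double interval)
{
    return sample_rate / interval / 2.0;
}
```

With these assumptions, interval_to_hz(11000.0, 64.0) works out to
11000/64/2, the 85.9 Hz mentioned above.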

A bit of phonetics: The basic frequency of the human voice is normally
between 60 Hz (male, deep) and 240 Hz (female, high). The resonance
frequencies added by the "hollow" space above the vocal cords start at 200
Hz and go up to 4000 Hz. Those are called "formant frequencies", and that's
where the interesting stuff for voice recognition happens. The formants make
an /a/ sound different from an /o/ while the basic frequency depends on the
speaker.

That's why 64 byte intervals are "low enough". The original implementation
used a higher sampling frequency, so bigger intervals made sense.

> "My main interest was voice recognition in the telephone network..."
> What did you make?

I toyed with an "intelligent" answering machine that recognized
callers by their voices and had a simple dialogue with them ("do 
you want me to tell you a joke? Please say yes or no"). The result
was not extremely satisfying.

> Could you advice me some Internet places abut that argument?
> Or some books? I am really interested about it.
> I am studying at the Milan's university of information. I am just at the fisrt year but I hope to 
> know more about computer sound.

Try to find Tony Robinson's WWW site. He's been moderating the comp.speech
newsgroup for years, and he collected a lot of interesting stuff on the
ftp server of his institute.

*****

Many users ask this one:
> Where can I find the direct.c program, so I can learn the code, and make a
> few changes if necessary. Which part of the Blast software. I've search
> in the www, and all I got is a confusion. Cause there is so much about
> Blast software, but there is nothing about direct.c program.

1. As of today, April 13 2000, BLAST was still at this address:

: File: Nov 13  1992 sb.libraries
: ---------              ----------            -----------
: BLAST13.ZOO       garbo.uwasa.fi               pc/sb

  The filename has been changed to BLAST13.ZOO. Direct.C is 
  part of the archive.

2. I'm sorry not to be able to name any -- but there is certainly much
   better software for soundcard programming out there by now. BLAST
   is just very, very outdated!