Hi,

I am working on a project about noise suppression. Part of the algorithm that I am designing is based on vector quantization, and I am wondering which training algorithm I should use.

Here is an example: if I have a 3D vector (x, y, z) and each coordinate of this vector is random but limited to an interval from, say, 0 to C, where C is a constant, the directional variance will not be more or less the same when choosing arbitrary directions.

This is a problem if I know that the "true" centroid (codeword) is located on the edge of the 3D box. If I use the k-means algorithm to calculate the centroids, I won't get the optimal result, because the mass of points within the box will "pull" on the centroids while there is nothing outside the box to "pull" in the other direction. So I am wondering: how do I solve this problem?

Thank you.
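The "pull" effect described in the post can be reproduced with a small simulation. The sketch below is only an illustration under assumed conditions that are not in the original post: the true codeword sits on the face x = 0 of the box, the observations are Gaussian around it with a hypothetical spread `sigma`, and any sample falling outside the box is discarded. The centroid is the plain sample mean, which is what k-means computes for a single cluster.

```python
import random

def sample_in_box(center, sigma, c, n, rng):
    """Draw Gaussian points around `center`, keeping only those inside [0, c]^3."""
    points = []
    while len(points) < n:
        p = [rng.gauss(m, sigma) for m in center]
        if all(0.0 <= v <= c for v in p):
            points.append(p)
    return points

def centroid(points):
    """Sample mean of the points: the k-means update for one cluster."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

C = 1.0                          # box edge length (assumed value)
TRUE_CODEWORD = [0.0, 0.5, 0.5]  # assumed: lies on the face x = 0
rng = random.Random(0)

points = sample_in_box(TRUE_CODEWORD, sigma=0.1, c=C, n=20000, rng=rng)
estimate = centroid(points)

print("true codeword:", TRUE_CODEWORD)
print("k-means style centroid:", [round(v, 3) for v in estimate])
```

Under these assumptions the y and z coordinates of the centroid stay close to 0.5, while the x coordinate is shifted into the box: only the half of the noise that points inward survives, so its mean is near sigma * sqrt(2/pi) instead of 0.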

# vector quantization problem/question

Started by ●December 1, 2005

Reply by ●December 1, 2005

"John" <joehatesspam@nospam.spamshit> writes:

> I am working on a project about noise suppression. Part of the algorithm
> that I am designing is based on vector quantization. I am wondering which
> training algorithm I should use?
>
> Here is an example:
>
> If I have a 3D-vector (x,y,z) and each coordinate of this vector is random
> but limited to an interval from say 0 to C where C is a constant, the
> directional variance will not be more or less the same when choosing
> arbitrary directions.

Careful! Unless you're choosing uniformly distributed _angles_, the directions you choose will not be uniform (arbitrary). The way you've described it, the directions of the vectors will not be arbitrary (for a start, they're all in the positive [x>0, y>0, z>0] octant).

> This is a problem if I know that the "true" centroid (codeword) is located
> on the edge of the 3D-box. If I use the k-means algorithm to calculate the
> centroids I won't get the optimal result, because the mass of points within
> the box will "pull" on the centroids while there is nothing outside the box
> to "pull" in the other direction.....So I am wondering how do I solve this
> problem?

If you know something about the distribution of the way the points are pulled away from the true mean, then you might be able to use that information.

Ciao,

Peter K.

Reply by ●December 1, 2005

Hi, thanks for the answer...

I have uploaded the estimated probability density function for each coefficient in the 10-dimensional observation vector. The coefficients are LSF coefficients and, as you can see, the distribution looks like neither a Gaussian nor a uniform distribution. Is it possible to find the most probable vectors based on the estimated distributions?

Based on the peaks in the 10 distributions, I would say that the most probable vector is

[0.29,0.58,0.86,1.14,1.43,1.7,2,2.28,2.57,2.85]

but the time information is lost when making these probability density functions, so how would I know that the coefficients of the vector occur at the same time?

Lots of questions.... :o) Hope you can help me.

Here is the MATLAB figure:

http://users.cybercity.dk/~dsl159353/pdfLSF.fig

Or the JPG equivalent if you want to see that:

http://users.cybercity.dk/~dsl159353/pdfLSF.jpg

Thank you...
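The concern about lost time information can be made concrete with a toy example. The sketch below uses made-up, already-quantized 2-D frames (not LSF data, and not anything from the uploaded figures) to show that the vector assembled from the per-coefficient peaks need not be the most frequent vector, and may never occur as a frame at all. The function names are hypothetical.

```python
from collections import Counter

# Toy frames: each tuple is one observation vector at one time instant.
frames = [(1, 5)] * 3 + [(1, 6)] * 3 + [(2, 4)] * 5 + [(3, 4)] * 4

def marginal_modes(frames):
    """Most frequent value of each coordinate, taken independently."""
    dims = len(frames[0])
    return tuple(
        Counter(f[d] for f in frames).most_common(1)[0][0] for d in range(dims)
    )

def joint_mode(frames):
    """Most frequent whole vector, so co-occurrence in time is preserved."""
    return Counter(frames).most_common(1)[0][0]

marginal = marginal_modes(frames)
joint = joint_mode(frames)
occurrences = frames.count(marginal)

print("vector of per-coordinate modes:", marginal)
print("most frequent actual vector:   ", joint)
print("times the per-coordinate mode vector was observed:", occurrences)
```

For continuous coefficients such as LSFs, the same comparison would need some form of binning or clustering of whole vectors first, which is an extra design choice this sketch leaves out.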

Reply by ●December 1, 2005

John wrote:

> thanks for the answer...

You're welcome.

> I have uploaded the estimated probability density function for each
> coefficient in the 10 dimensional observation vector. The coefficients are
> LSF-coefficients and as you can see the distribution neither looks like a
> gaussian or uniform distribution...Is it possible to find the most probable
> vectors based on the estimated distributions?

As you say below, the mode (or some other measure of central tendency) might be a useful measure. However, generally it doesn't contain much information. One way to proceed would be to remove (subtract) it from the coefficients you have and look at the variability remaining. However, this doesn't tackle the main problem: that the LSFs will vary over time, and you probably want to figure out what that variability is.

> Based on the peaks in the 10 distributions I would say that the most
> probable vector is
>
> [0.29,0.58,0.86,1.14,1.43,1.7,2,2.28,2.57,2.85]

That vector is just the mode (a form of average, though not the usual one http://en.wikipedia.org/wiki/Average).

> but the time-information is lost when making these probability density
> functions....so how would I know that the coefficients of the vector occur
> at the same time ?

Well, you wouldn't. :-) The only way to see this is to partition the data you have and only look at the LSFs in a particular time period (in the same way you are now; the MATLAB plots below) and see if that tells you anything.

> Lots of questions.... :o) hope you can help me.....
>
> Here is the matlab-figure:
>
> http://users.cybercity.dk/~dsl159353/pdfLSF.fig
>
> Or the JPG-equivalent if you want to see that:
>
> http://users.cybercity.dk/~dsl159353/pdfLSF.jpg
>
> Thank you...

No worries. Keep asking.

Ciao,

Peter K.

Reply by ●December 1, 2005

Hi again Peter...

I think I get what you are saying: you are saying that I should "plot" a time-dependent mode vector, right? If the time window is narrow enough, then statistics performed on a sliding time window will reveal what the state vectors look like, and it will also show state changes, right?

Thanks again...

---------------------

> The only way to see this is to partition the data you have and only
> look at the LSFs in a particular time period (in the same way you are
> now; the matlab plots below) and see if that tells you anything.
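One possible reading of the sliding-window idea is sketched below. It is an assumption-laden illustration, not a method confirmed in the thread: the frames are toy, already-quantized 2-D vectors, the statistic is the most frequent whole vector per window, and `sliding_modes`, `window`, and `hop` are hypothetical names chosen for this example.

```python
from collections import Counter

def sliding_modes(frames, window, hop):
    """Return the most frequent vector in each window of `window` frames."""
    modes = []
    for start in range(0, len(frames) - window + 1, hop):
        chunk = frames[start:start + window]
        modes.append(Counter(chunk).most_common(1)[0][0])
    return modes

# Toy sequence: state A (with one outlier frame), then a switch to state B.
A, B, OUTLIER = (1, 4), (3, 2), (2, 9)
frames = [A] * 3 + [OUTLIER] + [A] * 3 + [B] * 8

modes = sliding_modes(frames, window=5, hop=5)

# Window indices at which the dominant vector differs from the previous window.
changes = [i for i in range(1, len(modes)) if modes[i] != modes[i - 1]]

print("mode per window:", modes)
print("state change detected at window index:", changes)
```

The window length is the trade-off mentioned in the post: a short window tracks state changes more closely but gives noisier statistics, while a long window smooths out outliers and can blur two states together.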

Reply by ●December 1, 2005