How can I display all CPTs of a DBN in the Bayes Net Toolbox (MATLAB BNT)?
In the Bayes Net Toolbox, what is called a Dynamic Bayesian Network is in
fact just a Temporal Bayesian Network in which we can specify a different
structure for the first time slice. From
http://bnt.googlecode.com/svn/trunk/docs/usage_dbn.html:
Note that "temporal Bayesian network" would be a better name than
"dynamic Bayesian network", since it is assumed that the model
structure does not change, but the term DBN has become entrenched. We
also normally assume that the parameters do not change, i.e., the
model is time-invariant. However, we can always add extra hidden nodes
to represent the current "regime", thereby creating mixtures of models
to capture periodic non-stationarities.
[...]
To specify a DBN, we need to define the intra-slice topology (within
a slice),

Bayes Network for classification in Matlab (BNT) 
The conditional probability table (CPT) for 'class' should have 8 (2*2*2)
elements in this case. The posterior output (marg.T) of the inference
engine seems right for a binary variable.
It reads as: "with 0.8 probability the 'class' node is in state 1, and with
0.2 probability it is in state 2". From this point on, it is up to the user
to decide whether to appoint 'class' to state 1 or 2.
When it comes to classification, in the simplest (and not very advisable)
case, you can define a posterior probability threshold of 0.5 and say:
if P(class=1)> 0.5
class = 1
else
class = 2
end
In assessing the performance of your binary classification, you can look
into predictive accuracy or Area Under the ROC curve (AUC) or do more
intelligent things that take into account the prior probabilitie

gaussian mixture model probability matlab 
In one dimension, the maximum value of the pdf of the standard
(unit-variance) Gaussian distribution is 1/sqrt(2*PI). So in 50 dimensions,
the maximum value is going to be 1/(sqrt(2*PI)^50), which is around 1E-20.
So the values of the pdf are all going to be of that order of magnitude, or
smaller.
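A quick numerical check of that order of magnitude, as a minimal sketch. It assumes a standard Gaussian (zero mean, identity covariance); the function name max_standard_gaussian_pdf is made up for this illustration.

```python
import math

def max_standard_gaussian_pdf(dim):
    """Peak density of a zero-mean, identity-covariance Gaussian in `dim` dimensions."""
    # The peak is at the mean, where the exponential term equals 1,
    # so only the normalising constant (2*pi)^(-dim/2) remains.
    return (2.0 * math.pi) ** (-dim / 2.0)

peak_1d = max_standard_gaussian_pdf(1)    # equals 1/sqrt(2*pi)
peak_50d = max_standard_gaussian_pdf(50)  # equals 1/(sqrt(2*pi)^50)
print(peak_1d, peak_50d)
```

With a non-identity covariance the peak also depends on the determinant of the covariance matrix, so the exact value will differ for a fitted mixture component.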

How to get the standard deviation from gaussian fitted curve in Matlab 
The output of fy says that you are fitting a model that consists of a linear
combination of two Gaussian functions. The functional form of the model is:
fy(x) = a1*exp(-((x-b1)/c1)^2) + a2*exp(-((x-b2)/c2)^2)
Remembering that a Gaussian is defined as:
f(x) = exp(-(x-x0)^2/(2*s^2)) where: x0 is the mean, s is the std.dev.
then the standard deviation of each Gaussian in your model can be computed
as (respectively):
s1 = c1/sqrt(2)
s2 = c2/sqrt(2)
See http://en.wikipedia.org/wiki/Gaussian_function for more information.
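The relation s = c/sqrt(2) can be verified numerically. The sketch below is in Python rather than MATLAB, and the coefficient values b1 and c1 are hypothetical stand-ins for whatever the fit returns.

```python
import math

def std_from_fit_width(c):
    """Convert the width coefficient c of exp(-((x-b)/c)^2) to a standard deviation."""
    return c / math.sqrt(2.0)

# Hypothetical fitted coefficients, for illustration only.
b1, c1 = 0.3, 1.5
s1 = std_from_fit_width(c1)

# Both parameterisations should give the same value at any x.
x = 1.0
fit_form = math.exp(-((x - b1) / c1) ** 2)
textbook_form = math.exp(-((x - b1) ** 2) / (2.0 * s1 ** 2))
print(s1, fit_form, textbook_form)
```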

Randomly generating numbers within a fixed non-Gaussian distribution in matlab 
Not quite sure what you are asking precisely, but I guess you could take a
look at the random() function in the Statistics Toolbox:
>> help random
RANDOM Generate random arrays from a specified distribution.
R = RANDOM(NAME,A) returns an array of random numbers chosen from the
one-parameter probability distribution specified by NAME with parameter
values A.
R = RANDOM(NAME,A,B) or R = RANDOM(NAME,A,B,C) returns an array of random
numbers chosen from a two- or three-parameter probability distribution
with parameter values A, B (and C).
The size of R is the common size of the input arguments. A scalar input
functions as a constant matrix of the same size as the other inputs.
R = RANDOM(NAME,A,M,N,...) or R = RANDOM(NAME,A,[M,N,...]) returns an
M-by-N-by-...

Naive Bayes probability always 1 
"...the probability outputs from predict_proba are not to be taken too
seriously"
I'm the guy who wrote that. The point is that naive Bayes tends to predict
probabilities that are almost always either very close to zero or very
close to one; exactly the behavior you observe. Logistic regression
(sklearn.linear_model.LogisticRegression or
sklearn.linear_model.SGDClassifier(loss="log"), spelled loss="log_loss" in
recent scikit-learn releases) produces more realistic probabilities.
The resulting GaussianNB object is very big (~300MB), and prediction is
rather slow: around 1 second for one text.
That's because GaussianNB is a non-linear model and does not support sparse
matrices (which you found out already, since you're using toarray). Use
MultinomialNB, BernoulliNB or logistic regression, which are much faster at
predict time and also s

Bayes network classification 
Assuming all variables you mention are categorical and the edge directions
are from up to down:
Priors:
In the first Naive Bayes example, the conditional probability table (CPT)
of 'class' consists solely of its prior distribution because it is a root
node, i.e. does not have any parents. If 'class' can take on 2 states (e.g.
black and white), its CPT will consist of 2 values.
In the second Bayesian Network (BN) example, the CPT of 'class' is
dependent on 'cause1' and 'consequence'. Let's say 'consequence' has 3
states, 'cause1' has 4 states and as before 'class' has 2 states. In this
case, the CPT of 'class' would contain 3*4*2 = 24 values. When you are
learning this CPT, you can incorporate your prior beliefs as a Dirichlet
distribution (if all variables are categorical). For an example of

Bayes Learning - MAP hypothesis 
Such a question should be asked (and will now probably be migrated) on
math.stackexchange.com or stats.stackexchange.com.
Your question is a basic application of Bayes' theorem:
P(h1|Y=0) = P(Y=0|h1)P(h1) / P(Y=0) = 0.2*0.2 / P(Y=0) = 0.04 / P(Y=0)
P(h2|Y=0) = P(Y=0|h2)P(h2) / P(Y=0) = 0.3*0.4 / P(Y=0) = 0.12 / P(Y=0)
So h2 is the more probable hypothesis, since both posteriors share the same
denominator P(Y=0) > 0.
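The same comparison can be sketched in a few lines of Python. The numbers are the ones used in the answer; the dictionary names are made up for this illustration. Because P(Y=0) is a common positive factor, comparing the unnormalised products is enough to pick the MAP hypothesis.

```python
# Likelihoods P(Y=0 | h) and priors P(h), taken from the worked example.
likelihood = {"h1": 0.2, "h2": 0.3}
prior = {"h1": 0.2, "h2": 0.4}

# Unnormalised posteriors: P(Y=0 | h) * P(h). Dividing by P(Y=0) would not
# change which hypothesis is largest, so the division is skipped.
unnormalized = {h: likelihood[h] * prior[h] for h in prior}

map_hypothesis = max(unnormalized, key=unnormalized.get)
print(unnormalized, map_hypothesis)
```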

Model in Naive Bayes 
Naive Bayes constructs estimates of the conditional probabilities
P(f_1,...,f_n|C_j), where f_i are features and C_j are classes, which,
using Bayes' rule and estimates of the priors (P(C_j)) and evidence (P(f_i)),
can be translated into x=P(C_j|f_1,...,f_n), which can be roughly read as
"Given features f_i, I think that they describe an object of class C_j, and
my certainty is x". In fact, NB assumes that the features are independent
given the class, and so it actually uses simple probabilities of the form
x=P(f_i|C_j), read as "given class C_j, feature f_i occurs with
probability x".
So the form of the model is a set of probabilities:
Conditional probabilities P(f_i|C_j) for each feature f_i and each class
C_j
priors P(C_j) for each class
KNN on the other hand is something completely different. It actually is not
a "learne

naive bayes classifier error in r 
It seems that building the model is failing (and as a result the classifier
is not constructed). Without looking at your data, my best guess would be
that you have incomplete cases.
You could try removing cases with missing data using complete.cases as
follows.
d <- read.table("Modeling_Data.txt", header=FALSE, sep=" ", comment.char="", quote="")
# remove incomplete cases (assign the result back, otherwise d is unchanged)
d <- d[complete.cases(d),]
# divide into training and test data 70:30
# (createDataPartition comes from the caret package)
trainingIndex <- createDataPartition(d$V32, p=.7, list=F)

Calculation of probabilities in Naive Bayes in C# 
This code is called over and over again for each particular word w in the
text (e.g. tweet) being analyzed. All the variables are conditional
probabilities estimated using frequencies.
bw is the probability that the word w is seen given that it occurs in a
category 1 text
gw is the probability that the word w is seen given that it occurs in a
category 2 text
pw rescales the probability bw so that rarely seen words are on a similar
scale to frequently seen words (mathematically, the division indicates that
pw is a conditional probability)
fw simply shifts the scale so that pw can't be zero (or one). So if, for
example, pw=0 and n=10, fw = ((1 * 0.5) + (10 * 0)) / (1 + 10) = 0.045. (In
general, a good way to understand this code is to play around with some
different numbers and see what ha

Understanding this application of a Naive Bayes Classifier 
First of all, the formula
P(Terrorism | W) = P(Terrorism) x P(kill | Terrorism) x P(bomb | Terrorism)
x P(kidnap | Terrorism) x P(music | Terrorism) x P(movie | Terrorism) x
P(TV | Terrorism)
isn't quite right. You need to divide that by P(W). But you hint that this
is taken care of later when it says that "they do a few sums", so we can
move on to your main question.
Traditionally when doing Naive Bayes on text classification, you only look
at the existence of words, not their counts. Of course you need the counts
to estimate P(word | class) at train time, but at test time P("music" |
Terrorism) typically means the probability that the word "music" is present
at least once in a Terrorism document.
It looks like what the implementation you are dealing with is doing is it's
trying to

Normal Bayes Classifier Negative Samples 
First of all, you need to decide whether you are going to do recognition.
Recognition and detection are different processes.
Are you going to have 3 systems detecting cars, trucks and animals
respectively,
or are you going to have 1 system detecting all of these, but also
classifying somehow with a recognition step?
Second, "animal" detection is a hard process, where "cat" detection is
easier. Please narrow down your range and make the positives similar. Check
this link for a similar problem.
Third, as you already noticed, you actually need more negatives than
positives for a proper training.

Naive Bayes Classification for Categorical Data 
Because the labels of your dataset are in numeric format, R decides to use
regression instead of classification.
Change the labels of the data set to characters (or a factor) instead of
numbers, so that R does not get confused.

Document Classification using Naive Bayes classifier 
A lot of this gets into how good "accuracy" is as a measure of performance,
and that depends on your problem. If misclassifying "A" as "B" is just as
bad/ok as misclassifying "B" as "A", then there is little reason to do
anything other than just mark everything as "A", since you know it will
reliably get you a 98% accuracy (so long as that unbalanced distribution is
representative of the true distribution).
Without knowing your problem (and if accuracy is the measure you should
use), the best answer I could give is "it depends on the data set". It is
possible that you could get past 99% accuracy with standard naive Bayes,
though it may be unlikely. For Naive Bayes in particular, one thing you
could do is to disable the use of priors (the prior is essentially the
proportion of each class).

How to improve the accuracy of a Naive Bayes Classifier? 
This appears to be the classic problem of "overfitting"... where you get a
very high % accuracy on the training set, but a low % in real situations.
You probably need more training instances. Also, there is the possibility
that the 26 categories don't correlate to the features you have. Machine
Learning isn't magical and needs some sort of statistical relationship
between the variables and the outcomes. What NBC might be doing here is
effectively "memorizing" the training set, which is completely useless for
questions outside of memory.

how to Load CSV Data in scikit and using it for Naive Bayes Classification 
The following should get you started; you will need pandas and numpy. You
can load your .csv into a data frame and use that as input to the model.
You also need to define targets (0 for negatives and 1 for positives,
assuming binary classification) depending on what you are trying to
separate.
from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
# create data frame containing your data; each column can be accessed
# by df['column name']
df = pd.read_csv('/your/path/yourFile.csv')
# index 0 = negatives, index 1 = positives, matching the 0/1 target codes
target_names = np.array(['Negatives','Positives'])
# targets must be defined by you: an integer array with one 0/1 code per row
# add columns to your data frame
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75
# pd.Factor no longer exists in pandas; Categorical.from_codes replaces it
df['Type'] = pd.Categorical.from_codes(targets, categories=target_names)
df['Targets'] = targets
# define training and test sets
train = df[df['is_train']

ChoiceModelR  Hierarchical Bayes Multinomial Logit Model 
I know that this may not be helpful since you posted so long ago, but if it
comes up again in the future, this could prove useful.
One of the most common reasons for this error (in my experience) has been
that either the scenario variable or the alternative variable is not in
ascending order within your data.
id  scenario  alt  x1  ...  y
1   1         1    4        1
1   1         2    1        0
1   3         1    4        2
1   3         2    5        0
2   1         4    3        1
2   1         5    1        0
2   2         1    4        2
2   2         2    3        0
This dataset will give you errors since the scenario and alternative
variables must be ascending, and they must not skip any values.

naive bayes, 15 features on 35 instances dataset - 7 classes 
Naive Bayes works well when your features don't affect each other,
and it works very well when you have a small dataset.
On the other hand, you can try logistic regression or an SVM.
But in real-world scenarios your algorithm doesn't matter that much; your
dataset (instances, features) is more important.

Classifying Multinomial Naive Bayes Classifier with Python Example 
The original code trains on the first 100 examples of positive and negative
and then classifies the remainder. You have removed the boundary and used
each example in both the training and classification phase, in other words,
you have duplicated features. To fix this, split the data set into two
sets, train and test.
The confusion matrix is higher (or different) because you are training on
different data.
The confusion matrix is a measure of accuracy and shows the number of false
positives etc. Read more here:
http://en.wikipedia.org/wiki/Confusion_matrix

Naive Bayes Classifier in e1071 package [R]  Editing Data 
Just figured it out
Essentially, the classifier is a set of 4 values, the apriori
probabilities, the mean and standard deviations of each of the
probabilities, the different classes, and the original call.
Each of those values is a nested list with one item, and if you keep on
delving into the individual lists you can get at the individual items,
including the individual probability matrices, and work from there. The
first value of each is the mean, and the second is the standard deviation.
From there you can pull whatever data you want, and edit to your heart's
content.

Sklearn naive bayes classifier for data belonging to the same class 
The naive Bayes classifier classifies each input individually (not as a
group). If you know that all of the inputs belong to the same (but unknown)
class, then you need to do some additional work to get your result. One
option is to select the class with the greatest count in the result from
clf.predict, but that might not work well if you only have two instances
in the group.
Another option would be to call predict_proba for the GaussianNB
classifier, which will return the probabilities of all classes for each of
the inputs. You can then use the individual probabilities (e.g., you could
just sum them for each class) to decide how you want to classify the group.
You could even combine the two approaches: use predict and select the
class with the highest count but use predict_proba to
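The two options described above (majority vote over per-input predictions, and summing per-class probabilities) can be sketched as follows. This is an illustration only: proba_rows stands in for the output of clf.predict_proba and classes for clf.classes_, the numbers are invented, and both helper names are hypothetical.

```python
from collections import Counter

def classify_group_by_vote(predicted_labels):
    """Pick the label that was predicted most often for the group."""
    return Counter(predicted_labels).most_common(1)[0][0]

def classify_group_by_summed_proba(proba_rows, classes):
    """Sum the per-class probabilities over all inputs and pick the largest total."""
    totals = [sum(column) for column in zip(*proba_rows)]
    best_index = max(range(len(classes)), key=lambda k: totals[k])
    return classes[best_index]

# Invented example: three inputs known to share one (unknown) class.
classes = ["a", "b"]
proba_rows = [
    [0.55, 0.45],
    [0.60, 0.40],
    [0.05, 0.95],
]
# Per-input prediction = class with the highest probability in each row.
labels = [classes[max(range(len(row)), key=row.__getitem__)] for row in proba_rows]

vote_result = classify_group_by_vote(labels)
proba_result = classify_group_by_summed_proba(proba_rows, classes)
print(labels, vote_result, proba_result)
```

In this made-up case the two rules disagree: two weak votes for "a" win the majority, while one confident prediction for "b" dominates the summed probabilities. Since naive Bayes probabilities tend to be extreme, which rule is preferable depends on the data.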

R: Naives Bayes classifier bases decision only on apriori probabilities 
It looks like you trained the model using whole sentences as inputs, while
it seems that you want to use words as your input features.
Usage:
## S3 method for class 'formula'
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
## Default S3 method:
naiveBayes(x, y, laplace = 0, ...)
## S3 method for class 'naiveBayes'
predict(object, newdata,
type = c("class", "raw"), threshold = 0.001, ...)
Arguments:
x: A numeric matrix, or a data frame of categorical and/or
numeric variables.
y: Class vector.
In particular, if you train naiveBayes this way:
x <- c("john likes cake", "marry likes cats and john")
y <- as.factor(c("good", "bad"))
bayes <- naiveBayes(x, y)
you get a classifier able to recognize just these two sentences:
Naive Ba

scikit learn use multinomial naive bayes for a trigram classifier? 
CountVectorizer will extract trigrams for you (using ngram_range=(3, 3)).
The text feature extraction documentation introduces this. Then, just use
MultinomialNB exactly like before with the transformed feature matrix.
Note that this is actually modeling:
P(document | label) = P(wordX, wordX-1, wordX-2 | label) * P(wordX-1,
wordX-2, wordX-3 | label) * ...
How different is that? Well, that first term can be written as
P(wordX, wordX-1, wordX-2 | label) = P(wordX | wordX-1, wordX-2, label) *
P(wordX-1, wordX-2 | label)
Of course, all the other terms can be written that way too, so you end up
with (dropping the subscripts and the conditioning on the label for
brevity):
P(X | X-1, X-2) P(X-1 | X-2, X-3) ... P(3 | 2, 1) P(X-1, X-2) P(X-2, X-3)
... P(2, 1)
Now, P(X-1, X-2) can be written

Predicting Classifications with Naive Bayes and dealing with Features/Words not in the training set 
From a practical perspective (keeping in mind this is not all you're
asking), I'd suggest the following framework:
Train a model using an initial train set, and start using it for
classification
Whenever a new word (with respect to your current model) appears, use some
smoothing method to account for it. e.g. Laplace smoothing, as suggested in
the question, might be a good start.
Periodically retrain your model using new data (usually in addition to the
original train set), to account for changes in the problem domain, e.g. new
terms. This can be done on preset intervals, e.g once a month; after some
number of unknown words was encountered, or in an online manner, i.e. after
each input document.
This retrain step can be done manually, e.g. collect all documents
containing unknown terms

Naive Bayes Classifier, Explain model fitting and prediction algorithms 
Please see below. If you want more details of the mathematics involved you
might be better off posting on Cross Validated.
Could someone outline why the logsumexp trick is/needs to be done?
This is for numerical stability. If you search for "logsumexp" you will see
several useful explanations. E.g.,
https://hips.seas.harvard.edu/blog/2013/01/09/computinglogsumexp, and
logsumexp trick why not recursive. Essentially, the procedure avoids
numerical error that can occur with numbers that are too big / too small.
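A minimal sketch of the trick in Python, to show where the instability comes from. The function below is a plain re-implementation for illustration (not the code from the question), and the log-probability values are invented.

```python
import math

def logsumexp(log_values):
    """Compute log(sum(exp(v) for v in log_values)) without overflow or underflow."""
    m = max(log_values)
    if math.isinf(m):
        # All inputs are -inf (sum is 0) or some input is +inf; shifting by m
        # would produce NaN, so return m directly.
        return m
    # Shifting by the maximum makes the largest exponent exp(0) = 1.
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Invented log-probabilities that are far too small to exponentiate directly.
log_probs = [-1000.0, -1001.0, -1002.0]

underflowed = math.exp(log_probs[0])  # underflows to 0.0, so log(sum) would fail
stable = logsumexp(log_probs)
print(underflowed, stable)
```

The naive route would call math.log on a sum of zeros and fail, while the shifted version returns a finite value close to the largest input.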
specifically what the argument L_{i,:} reads as
The i means take the ith row, and the : means take all values from that
row. So, overall, L_{i,:} means the ith row of L. The colon : is used in
Matlab (and its open source derivative Octave) to mean "all indices" when
subscr

How to use WEKA Machine Learning for a Bayes Neural Network and J48 Decision Tree 
Here is one way to do it from the command line. This information is found
in Chapter 1 ("A command-line primer") of the Weka manual that comes with
the software.
java weka.classifiers.trees.J48 -t training_data.arff -T test_data.arff -p 1-N
where:
-t <training_data.arff> specifies the training data in ARFF format
-T <test_data.arff> specifies the test data in ARFF format
-p 1-N specifies that you want to output the feature vector and the
prediction, where N is the number of features in your feature vector.
For example, here I am using soybean.arff for both training and testing.
There are 35 features in the feature vector:
java weka.classifiers.trees.J48 -t soybean.arff -T soybean.arff -p 1-35
The first few lines of the output look like:
=== Predictions on test dat

Save and Load testing classify Naive Bayes Classifier in NLTK in another method 
I don't have the environment setup to test out your code, but I have the
feeling it's not right in the part where you save/load the pickle.
Referring to the Storing Taggers section of the NLTK book, I would change
your code and do it like this:
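import pickle

def save_classifier(classifier):
    # write the classifier to disk in binary mode (pickle protocol 1)
    with open('my_classifier.pickle', 'wb') as f:
        pickle.dump(classifier, f, 1)

def load_classifier():
    # read the classifier back; the file must be opened in binary mode
    with open('my_classifier.pickle', 'rb') as f:
        return pickle.load(f)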
Hope it helps.


Blur.js is probably what you are looking for!
http://blurjs.com/
It works for moving DIVs as well, which satisfies your presupposition!
Have a look at the draggable example
$('.target').blurjs({
draggable: true,
overlay: 'rgba(255,255,255,0.4)'
});

gaussian binning of data 
It sounds like you have a single measurement at a single point (X=100,
e=5000), and also know the value of the FWHM (FWHM = 4).
If this is indeed the case, you can compute the standard deviation sigma
like so:
sigma = FWHM/ 2/sqrt(2*log(2));
and you can make bins like so:
[N, binCtr] = hist(sigma*randn(e,1) + X, Nbins);
where N is the amount of electrons in each bin, binCtr are the bin centers,
Nbins is the amount of bins you want to use.
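For comparison, here is a Python sketch of the same idea that avoids random sampling altogether: instead of drawing e samples and histogramming them, it computes the expected count per bin from the normal CDF. This is a swapped-in technique, not the approach of the answer. The values X=100, e=5000 and FWHM=4 come from the question as quoted above; the bin edges and function names are assumptions made for the example.

```python
import math

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_bin_counts(total, mu, sigma, edges):
    """Expected number of counts in each bin [edges[k], edges[k+1])."""
    return [
        total * (normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma))
        for lo, hi in zip(edges, edges[1:])
    ]

fwhm, electrons, center = 4.0, 5000, 100.0
sigma = fwhm_to_sigma(fwhm)

# Assumed binning: 16 unit-wide bins spanning center-8 .. center+8.
edges = [center + k for k in range(-8, 9)]
counts = expected_bin_counts(electrons, center, sigma, edges)
print(sigma, [round(c, 1) for c in counts])
```

The expected counts are smooth and use no memory proportional to the number of electrons; if realistic shot noise is wanted, sampling as in the answer is the more direct route.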
If the number of electrons gets large, you could run out of memory. In such
cases, it's better to do the same thing but in smaller batches, like so:
% Example data
FWHM = 4;
e = 100000;
X = 100;
Nbins = 100;
% your std. dev.
sigma = FWHM/ 2/sqrt(2*log(2));
% Find where to start with the bin edges. That is, find the point
% where the PDF i

Gaussian fit to plot, plotted using bar(x,y) 
A trick, which however does NOT recover the original raw data (therefore
the fit also suffers from the approximations introduced by the binning), is
to decode your x with the run-length algorithm, and run histfit() on that
data:
% data from your previous question
x = [0 0.0278 0.0556 0.0833 0.1111 0.1389 0.1667 0.1945 0.2222];
y = [1 3 10 13 28 53 66 91 137];
% histfit
histfit(rude(y,x),9,'normal')
where you can find the run-length encoding/decoding function on FEX:
rude().
The result:

Not sure how to fit data with a gaussian python 
It seems like you can just use the definitions of the mean and covariance
parameters to fit them using maximum likelihood estimates from your data,
right ?
import numpy as np
data = np.loadtxt("/home/***/****/***", usecols=(2, 3))
mu = data.mean(axis=0)
sigma = np.cov(data, rowvar=0)
In my understanding, this is what "fitting" means. However, it sounds like
you're then looking to show these quantities on a plot. This can be a
little trickier, but only because you need to use some linear algebra to
find the eigenvectors of the estimated covariance matrix, and then use
those to draw the covariance ellipse.
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
# compute eigenvalues and associated eigenvectors
vals, vecs = np.linalg.eigh(sigma)
# compute "tilt" of ellip

How do you implement a calculated Gaussian kernel? 
It sounds like you want to compute a convolution of the original image with
a Gaussian kernel, something like this:
blurred[x][y] = Integral (kernel[s][t] * original[x-s][y-t]) ds dt
There are a number of techniques for that:
Direct convolution: go through the grid and compute the above integral at
each point. This works well for kernels with very small support, on the
order of 5 grid points in each direction, but for kernels with larger
support becomes too slow. For Gaussian kernels a rule of thumb for
truncating support is about 3*sigma, so it's not unreasonable to do direct
convolution with sigma under 2 grid points.
Fast Fourier Transform (FFT). This works reasonably fast for any kernel.
Therefore FFT became the standard way to compute convolution of nearly
anything with nearly an

Gaussian filtering a image with Nan in Python 
The simplest thing would be to turn nans into zeros via nan_to_num. Whether
this is meaningful or not is a separate question.

How to obtain a gaussian filter in python 
In general terms, if you really care about getting the exact same result
as MATLAB, the easiest way to achieve this is often by looking directly at
the source of the MATLAB function.
In this case, edit fspecial:
...
case 'gaussian' % Gaussian filter
siz = (p2-1)/2;
std = p3;
[x,y] = meshgrid(-siz(2):siz(2),-siz(1):siz(1));
arg = -(x.*x + y.*y)/(2*std*std);
h = exp(arg);
h(h<eps*max(h(:))) = 0;
sumh = sum(h(:));
if sumh ~= 0,
h = h/sumh;
end;
...
Pretty simple, eh? It's <10mins work to port this to Python:
import numpy as np
def matlab_style_gauss2D(shape=(3,3),sigma=0.5):
    """
    2D gaussian mask - should give the same result as MATLAB's
    fspecial('gaussian',[shape],[sigma])
    """
    m,n = [(ss-1.
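A self-contained sketch of such a port, following the MATLAB excerpt quoted above step by step. It is written with plain Python lists rather than numpy (an assumption made to keep the example dependency-free), and the name gaussian_kernel_2d is hypothetical; it is not the function from the answer.

```python
import math
import sys

def gaussian_kernel_2d(shape=(3, 3), sigma=0.5):
    """Normalised 2D Gaussian mask, mirroring the fspecial('gaussian') steps."""
    rows, cols = shape
    half_r, half_c = (rows - 1) / 2.0, (cols - 1) / 2.0
    # h = exp(-(x^2 + y^2) / (2*sigma^2)) on a grid centred at zero
    h = [
        [
            math.exp(-((r - half_r) ** 2 + (c - half_c) ** 2) / (2.0 * sigma * sigma))
            for c in range(cols)
        ]
        for r in range(rows)
    ]
    # Zero out entries below machine epsilon relative to the peak
    peak = max(max(row) for row in h)
    eps = sys.float_info.epsilon
    h = [[v if v >= eps * peak else 0.0 for v in row] for row in h]
    # Normalise so the mask sums to one
    total = sum(sum(row) for row in h)
    if total != 0:
        h = [[v / total for v in row] for row in h]
    return h

kernel = gaussian_kernel_2d((3, 3), 0.5)
for row in kernel:
    print([round(v, 4) for v in row])
```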

Fitting gaussian to data in python 
You currently draw a scatter plot. The docs have a demo for plotting a
histogram, which might be Gaussian, depending on the distribution of your
data.
You need something like
plt.hist(x, 50, density=True, histtype='stepfilled')
(older Matplotlib versions spelled this option normed=1).
There are further demos here

RenderScript Intrinsics Gaussian blur 
I'm guessing you've got some issue with the UI parts rather than the RS
parts. The RS parts look fine; maybe try an outputBitmap.prepareToDraw()
after the RS bits have finished?
Note that in general it's not a great idea to create and destroy RS
contexts in the critical path like that. There's potentially a non-trivial
startup/teardown cost depending on the hardware resources that have to be
allocated, so it would be much better to allocate it at startup and use it
for the lifetime of the application.

SVG Gaussian Blur not working in Firefox 
It is just that you have a crazy huge value.
Try this:
<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
<defs>
<filter id="f1" x="0" y="0">
<feGaussianBlur stdDeviation="5" />
</filter>
</defs>
</svg>

1D gaussian filter over non equidistant data 
use this:
function yy = smooth1D(x,y,delta)
% x and y are column vectors; delta sets the kernel width
n = length(y);
yy = zeros(n,1);
for i=1:n
    ker = sqrt(6.0/(pi*delta^2))*exp(-6.0*(x-x(i)).^2/delta^2);
    % the gaussian should be normalized (don't forget dx), but if you
    % don't want to lose (signal) energy, uncomment the next line
    %ker = ker/sum(ker);
    yy(i) = y'*ker;
end
end
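For readers working outside MATLAB, a rough Python sketch of the variant with the kernel rescaled to sum to one (the ker/sum(ker) line that is commented out above). It uses the same exp(-6*(x-x(i))^2/delta^2) kernel shape; the function name smooth_1d and the sample data are made up for the illustration.

```python
import math

def smooth_1d(x, y, delta):
    """Gaussian-weighted average of y at each sample position in x.

    Works for non-equidistant x because the weights depend only on the
    distances between sample positions, not on their indices.
    """
    smoothed = []
    for xi in x:
        weights = [math.exp(-6.0 * (xj - xi) ** 2 / delta ** 2) for xj in x]
        total = sum(weights)  # always >= 1, since the weight at xj == xi is 1
        smoothed.append(sum(w * yj for w, yj in zip(weights, y)) / total)
    return smoothed

# Invented, unevenly spaced sample positions.
x = [0.0, 0.1, 0.5, 2.0, 2.2]
constant = smooth_1d(x, [3.0] * len(x), 1.0)
noisy = smooth_1d(x, [1.0, 5.0, 1.0, 5.0, 1.0], 1.0)
print(constant, noisy)
```

Rescaling the weights this way keeps a constant signal constant, but it ignores the local sample spacing (the dx mentioned in the comment), so densely sampled regions carry more weight than sparse ones.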

Normal(Gaussian) Distribution Function in C++ 
If I understood your question correctly, you are looking for the estimated
normal distribution,
that is, for the sample mean and the sample variance.
The former is calculated as:
and the latter as:
The sample mean can be used as the expected value and the sample variance as
the variance in the Gaussian distribution:
If you want more information check out:
http://mathworld.wolfram.com/SampleVariance.html
http://en.wikipedia.org/wiki/Sample_mean_and_sample_covariance
I hope that answered your question ;)
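A short sketch of the estimation described in this answer, written in Python rather than C++ to keep it brief. It assumes the unbiased sample variance (division by n-1), which is what statistics.variance computes; some references divide by n instead. The sample data and function names are invented.

```python
import math
import statistics

def fit_normal(samples):
    """Estimate the mean and variance of a normal distribution from samples."""
    mu = statistics.mean(samples)
    var = statistics.variance(samples)  # unbiased estimate: divides by n - 1
    return mu, var

def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Invented sample data.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, var = fit_normal(samples)
peak = normal_pdf(mu, mu, var)
print(mu, var, peak)
```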
