Neural Nets: Mixed Real-valued and Categorical Input Features
I can answer question #2; I'm not really prepared on RFs, so I will
just leave that one to more skilled people.
As far as point 2 goes, if you transform each of your categorical
inputs into a k-vector (with k = number of classes) you are just
introducing k new inputs, each taking values in the range [0, 1], so if
your real-valued input features are themselves scaled into that range
you're pretty much done.
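A minimal sketch of that encoding, assuming the data sits in a pandas DataFrame; the column names "color" and "age" are made up purely for illustration:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# hypothetical toy data: one categorical and one real-valued feature
df = pd.DataFrame({"color": ["red", "green", "blue", "red"],
                   "age": [12.0, 45.0, 33.0, 60.0]})

# one-hot encode the categorical column: k new 0/1 inputs, one per class
X_cat = pd.get_dummies(df["color"], prefix="color")

# scale the real-valued column into [0, 1] so it matches the 0/1 dummies
X_num = MinMaxScaler().fit_transform(df[["age"]])

X = pd.concat([X_cat, pd.DataFrame(X_num, columns=["age"])], axis=1)
print(X)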

Vowpal Wabbit training and testing data formats 
The bar symbol (|) must also be in the format for predictions:
| price:.23 sqft:.25 age:.05 2006
| price:.18 sqft:.15 age:.35 1976
| price:.53 sqft:.32 age:.87 1924
If you don't include the correct labels, vw cannot compute the test
loss, of course.
To get the predictions use vw -d test_set.vw -t -p predictions.txt.
The training set in the tutorial (with three examples only) is too
small to train a useful model.
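For example, if you also know the true labels for the test examples, you can put them in front of the bar (the label values below are made up, purely to illustrate the format):

0 | price:.23 sqft:.25 age:.05 2006
1 | price:.18 sqft:.15 age:.35 1976
0 | price:.53 sqft:.32 age:.87 1924

With labels present, the same vw -d test_set.vw -t -p predictions.txt call writes the predictions and also reports the average test loss.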

Can I use a Naive Bayesian Classifier with enumerated data? 
Yes. In Bayesian classification you just need to estimate the class-specific
distribution over its support, which you can easily do from the data. You can
then compute the posterior for each class and take the MAP estimate. For
documents, the distribution is defined over each word of a dictionary, given
the document class (spam or not spam). For details, refer to Andrew Ng's notes
on Naive Bayes.
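A minimal sketch with scikit-learn, assuming the enumerated features have already been mapped to integer codes (the data below is made up):

from sklearn.naive_bayes import CategoricalNB

# each column is an enumerated feature encoded as integer codes 0..k-1
X = [[0, 2], [1, 0], [0, 1], [2, 2], [1, 1]]
y = [0, 1, 0, 1, 1]   # class labels, e.g. spam / not spam

clf = CategoricalNB()
clf.fit(X, y)                        # estimates per-class categorical distributions
print(clf.predict_proba([[0, 2]]))   # posterior over the classes
print(clf.predict([[0, 2]]))         # MAP class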

Ranking algorithm with missing values and bias 
In this case, two imputation methods can be used:
As everyone would try at first, fill with the most likely value, i.e. the mean.
Predict the missing value from the other attributes, which is called imputation
by regression.
Actually, I think the second method is better for this dataset, where users
mostly rank more than one product.
Also, if you have other datasets about the users, you may use them too.
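A quick sketch of both options with scikit-learn; the user-by-product ranking matrix below is made up and NaN marks a missing value:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# made-up user x product ranking matrix with missing entries
X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 4.0, 1.0]])

# option 1: fill with the column mean
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# option 2: imputation by regression - each column is predicted from the others
X_reg = IterativeImputer(random_state=0).fit_transform(X)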

Getting filename when using TextDirectoryLoader - Weka
In Weka, those text files and classes become Instances, and the
filenames are not saved in the Instance class.
Instead, you can get the text content of the file that got
classified.
double pred = 0d;
Instance current = getInstance(); // however you obtain the instance that was classified
pred = classifier.classifyInstance(current);
// attribute 0 holds the document text when using TextDirectoryLoader;
// change the index according to your dataset
System.out.println("Text: " + current.stringValue(0));
System.out.println("Predicted class index: " + pred);

Plotting the Kohonen map - Understanding the visualization
The SOM is an unsupervised clustering algorithm. As such, it represents
similar samples close together on the feature map (that is, similar samples
will fire nodes that are close together).
So let's assume you have 10000 samples with 10 features each, and a
2-d SOM of 20x20x10 (400 nodes with 10 features). After training you have
therefore clustered 10000 samples into 400 nodes. Further, you can try
to identify what each node represents by looking at the samples mapped to it.
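A small sketch of that setup; the MiniSom library is just one convenient choice (not something from the answer above), and the data here is random purely to show the shapes:

import numpy as np
from minisom import MiniSom

data = np.random.rand(10000, 10)          # 10000 samples, 10 features

som = MiniSom(20, 20, 10, sigma=1.0, learning_rate=0.5)  # 20x20 map, 10-dim weights
som.random_weights_init(data)
som.train_random(data, 5000)              # 5000 training iterations

# each sample is "clustered" into the node that fires for it
winners = [som.winner(x) for x in data]   # (row, col) of the winning node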

Why would my neural network give different values for the same input? 
The XOR function is not trivial for a neural network. With very few
training samples you will be able to learn the OR or AND functions;
however, for XOR you may need more training data, more
neurons, or more layers. If you are just testing your learning system,
I suggest starting with a simple function like OR. If that works, then
give it more training data and try to adjust the hyperparameters.
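For illustration, a tiny scikit-learn sketch: XOR with only the four possible samples usually needs a hidden layer and may need a few random restarts, whereas OR is learned immediately.

from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_or = [0, 1, 1, 1]    # OR: linearly separable, easy
y_xor = [0, 1, 1, 0]   # XOR: needs at least one hidden layer

clf = MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
clf.fit(X, y_xor)
print(clf.predict(X))  # may need a different random_state to get all four right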

How does Support Vector Machine compare to Logistic Regression? 
"Why does an SVM sometimes perform better than LR, and sometimes not?"
You could pose this question for any two statistical methods x and y.
There will always exist certain cases where one performs better than
the other. This behaviour is often summarized by the words "there is
no free lunch".
Now, your particular question about support vector machines and logistic
regression is very broad, such that a general answer is hard to give.

Measuring success of Restricted Boltzmann Machine 
An RBM is an unsupervised learning paradigm, and it is therefore difficult
to assess whether one is better than another.
Nevertheless, they are usually used for pre-training more recent and
more exciting networks such as DBNs. So my suggestion would be to train
as many RBMs as you want to compare (unsupervised learning) and then
feed each into a feed-forward layer for learning (supervised learning).
From there, compare the supervised performance you get with each RBM.
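A rough sketch of that recipe in scikit-learn; the digits dataset and the hyperparameters below are just placeholders:

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X = X / 16.0                                  # BernoulliRBM expects values in [0, 1]

# unsupervised RBM pre-training followed by a supervised classifier
model = Pipeline([("rbm", BernoulliRBM(n_components=64, random_state=0)),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(X, y)
print(model.score(X, y))   # compare this score across differently-trained RBMs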

Classification algorithm used as Regression algorithm 
In general, no. Classification is not directly convertible to
regression (the opposite direction is much easier). You could
obviously create some finite set of "buckets" of values and treat them
as labels, but in general I have never seen this perform better than even
the simplest regressors. Why would you want to do something like this?
Why not use regressors for regression tasks?

How are vector operations possible on a matrix that does not fit in memory?
Yes, the famous Hadoop is an open source computing platform, which can
be used for operations on pretty big matrices (and not only for that).
For examples, please read this page.

How to predict several unlabelled attributes at once using WEKA and NaiveBayes? 
I can't tell if there's something wrong with your arff file.
However, here's one idea: you can add a NominalToBinary
unsupervised attribute filter to make sure that the attributes
slot1-slot96 are recognized as binary.

How to classify text with scikit's SVM? 
As others have pointed out, your matrix is just a list of feature
vectors for the documents in your corpus. Use these vectors as
features for classification. You just need classification labels y
and then you can use SVC().fit(X, y).
But... the way that you have asked this makes me think that maybe you
don't have any classification labels. In this case, I think you want
to be doing clustering instead (for example with KMeans).
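A minimal sketch of both routes, assuming X is a TF-IDF (or similar) document-term matrix and y are your labels when you have them; the documents and labels below are made up:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.cluster import KMeans

docs = ["cheap pills now", "meeting at noon", "win money fast", "lunch tomorrow?"]
y = [1, 0, 1, 0]                        # e.g. spam / not spam labels

X = TfidfVectorizer().fit_transform(docs)

SVC(kernel="linear").fit(X, y)          # supervised: you have labels

KMeans(n_clusters=2, n_init=10).fit(X)  # unsupervised: no labels, just group documents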

Stanford NER classification with additional classes
The major hassle when training the model on additional classes is the
training data.
Models require highly accurate training data, e.g. I bought a
<START:product> Mac Book Pro <END> in September and synced
it with my <START:device> IPhone <END>. Observe that
IPhone could be annotated as either device or product.
If you can generate or annotate at least 15,000 sentences annotated in this
way, you have a reasonable starting point.

Vowpal Wabbit Logistic Regression 
Predictions are in the range [-50, +50] (theoretically any real
number, but Vowpal Wabbit truncates them to [-50, +50]).
To convert them to {-1, +1}, use --binary. Positive predictions are
simply mapped to +1, negative to -1.
To convert them to [0, +1], use --link=logistic.
This uses the logistic function 1/(1 + exp(-x)).
You should also use --loss_function=logistic if you want to interpret
the numbers as probabilities.
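For reference, converting a raw prediction yourself is just the logistic function (a quick sketch, not VW code):

import math

def to_probability(raw):            # raw VW prediction in [-50, +50]
    return 1.0 / (1.0 + math.exp(-raw))

def to_binary(raw):                 # same mapping as --binary
    return 1 if raw > 0 else -1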

Using RapidMiner to train a model from multiple files 
It is possible to use the Update Model operator to update a previously
created model with new example set data. Not all model operators can
be used this way; Naive Bayes and k-NN do work, as does Weka's W-IBk.
It would also be possible to create a process within RapidMiner that splits
the files into smaller pieces, reads them one by one, and builds a model
from these.

Liblinear vs Pegasos 
Both LIBLINEAR and Pegasos are linear classification techniques that
were specifically developed to deal with large sparse data with a huge
number of instances and features; it is on this kind of data that they
are faster than the traditional SVM.
I have never used Pegasos, but I can assure you that LIBLINEAR is
very fast on this kind of data, and the authors say that it is
competitive with, or even faster than, other state-of-the-art solvers.

Why do we use gradient descent in linear regression? 
The example you gave is one-dimensional, which is not usually the case
in machine learning, where you have multiple input features.
In that case, you need to invert a matrix to use the simple closed-form
approach, which can be hard or ill-conditioned.
Usually the problem is formulated as a least-squares problem, which is
slightly easier. There are standard least-squares solvers which could
be used instead of gradient descent.
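As a quick sketch of that direct route for comparison (the data below is made up), numpy's standard least-squares solver handles the normal equations for you:

import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column = intercept term
y = np.array([2.0, 2.9, 4.1])

# closed form would be theta = (X^T X)^(-1) X^T y, but lstsq is better conditioned
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)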

Clustering before classification in Weka 
One way that you could add cluster information to your data is using
the method below (in the Weka Explorer):
Load your favourite dataset.
Choose your cluster model (in my case, I used SimpleKMeans).
Modify the parameters of the clusterer as required.
In the Cluster mode panel, select 'Use training set'.
Start the clustering process.
Once the clusters have been generated, right-click on the result list
and select 'Visualize cluster assignments'.

Classify new instance that have new value in some features with existing model 
As a short-term solution, perhaps what you could do is set the value
of the attribute to 0 or 1 (within the range of the original dataset),
depending on the value of the attribute.
A longer-term solution would be to include such cases in future
training of the neural network. Such values may cause the values of
other instances to be skewed to the left or right, so some attention
may be required for those cases.

Confusion regarding difference of machine learning and statistical learning algorithms 
The authors seem to distinguish probabilistic vs non-probabilistic
models, that is, models that produce a distribution p(output | data) vs
those that just produce an output, output = f(data).
The description of the non-probabilistic algorithms is a bit odd to
my taste, though. The difference between a (linear) support vector
machine, a perceptron and logistic regression, from the model and
algorithmic point of view, is actually quite small.

Multilabel classification involving range of numbers as labels 
You can preprocess your data with OneHotEncoder to convert your single
1-to-100 feature into 100 binary features corresponding to each value
in the interval [1..100]. Then you'll have 100 labels and can learn a
multiclass classifier.
Though, I suggest using regression instead.

In DBSCAN, how to determine border points? 
This largely depends on the implementation. The best way is to just
play with the implementation yourself.
In the original DBSCAN paper [1], the core point condition is given as
|N_Eps(p)| >= MinPts, where N_Eps(p) is the Eps-neighborhood of a
data point p, which excludes p itself.
Following your example, if MinPts = 4 and |N_Eps| = 3 (or 4 including
the point itself, as you say), then such points do not satisfy the core
point condition under the original definition.
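As an example of how one implementation settles this: scikit-learn's DBSCAN counts the point itself, so min_samples = 4 means a core point needs 3 other neighbours within eps. A small sketch (made-up 1-d data, not the original DBSCAN code):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0]])

db = DBSCAN(eps=0.15, min_samples=3).fit(X)
print(db.core_sample_indices_)  # indices of the core points
print(db.labels_)               # -1 marks noise; border points get the cluster id of a nearby core point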

Text Feature Representation As Vectors for SVM 
It doesn't matter which classifier you use (SVM or not); the feature
generation for text is the same.
I suggest you take a look at this:
Binary Feature Extraction
Also this library would make your life much easier:
http://cogcomp.cs.illinois.edu/page/software_view/LBJ
A tutorial is here:
http://cogcomp.cs.illinois.edu/page/tutorial.201310

Apply machine learning to analyse mixed language
Natural language processing is a large and diverse field. You can
think about your example in a number of ways.
The first is character sets and symbol encoding. Most non-Romance
languages will have characters outside the standard 26-letter
alphabet. If you see characters from both inside and outside the core
character ranges for a language, that by itself flags mixed text and
works around needing a lot of dictionaries.
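A crude sketch of that idea in Python, flagging text that mixes Latin letters with characters from another script (purely illustrative, not a full language detector):

import unicodedata

def scripts_used(text):
    # collect the script prefix of each letter's Unicode name, e.g. LATIN, CYRILLIC
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

print(scripts_used("hello"))           # {'LATIN'}
print(scripts_used("hello привет"))    # {'LATIN', 'CYRILLIC'}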
The second is

Feature scaling (normalization) for clustering algorithms (such as k-means & EM)
k-means and EM are for numeric data only.
It does not make much sense to apply them to name/date/price-typed
data.
As the name indicates, the algorithm needs to compute means. How would
you compute a mean of your "name" column? You can hack something together
for the date, but not for the name.
Wrong tool for the job.

Machine Learning - Feature selection and training data
Simply put, feature selection essentially says (for example): "Of the
5 attributes of the input vector, only features 1, 3, 4 are useful.
Features 2, 5 are junk. Don't use them at all." This goes for both the
training and the test patterns, since they come from the same
distribution. So you drop features 2 and 5 from both the training and
test patterns, and then you train and test your classifier in the
reduced feature space.
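A tiny sketch of what that looks like in practice; the indices follow the example above and the arrays are made up:

import numpy as np

X_train = np.random.rand(100, 5)
X_test = np.random.rand(20, 5)

keep = [0, 2, 3]                 # features 1, 3, 4 in 1-based numbering
X_train_sel = X_train[:, keep]   # drop features 2 and 5 from the training patterns
X_test_sel = X_test[:, keep]     # apply exactly the same selection to the test patterns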

Multilabel model scores better than the same model with binary labels in scikit-learn
It seems like when you binarize the labels, the random forest can predict
multiple labels at once, while it predicts only the most probable label
in the initial case. The F1 score is sensitive to that.
UPD: I'm wrong. I've tested it and in my case it always returns only
one label, but the score is still bad.
UPD2: I'm not as wrong as I thought. sum(sum(prediction2)) appears to
be less than len(prediction), so some samples apparently get no label at all.

How to predict a continuous dependent variable that expresses target class probabilities? 
You might be able to approximate this using sample weighting: assign
a sample to the class which has the highest probability, but weight
that sample by the probability of it actually belonging to that class.
Many of the scikit-learn estimators allow for this.
Example:
X = [1, 2, 3, 4] -> class 0 with probability .7 would become
X = [1, 2, 3, 4], y = [0] with a sample weight of .7.
You might also normalize so that the weights sum to one.
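A short sketch with a scikit-learn estimator that accepts sample_weight; the first row follows the example above, the second is made up so the model has two classes:

from sklearn.linear_model import LogisticRegression

X = [[1, 2, 3, 4], [2, 3, 4, 5]]
y = [0, 1]                        # each sample assigned to its most probable class
weights = [0.7, 0.9]              # the probability of actually belonging to that class

clf = LogisticRegression()
clf.fit(X, y, sample_weight=weights)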

Implementations of Hierarchical Reinforcement Learning 
Your actual state is the robot's position and orientation in the
world. Using these sensor readings is an approximation, since it is
likely to render many states indistinguishable.
Now, if you go down this road, you could use linear function
approximation. Then this is just 24 binary features (12 0/1 readings
plus 6*2 for near/far/very_far). This is such a small number that you
could even use all pairs of features.

SVM as a type of instance-based learning?
I think the best would be to ask Prof. Domingos directly.
SVMs do indeed employ a hyperplane; both are binary after all. However,
comparing the SVM formulation with LR: unlike LR, the SVM is not
probabilistic. HTH, although surely one could argue that all ML is
instance-based.

What's the low dimensional? 
There was a question about the difference between PCA and SVD on the Math
section. You can check it out:
http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca

How does Prolog work as an intelligent language?
Prolog's power derives from logical variables, coupled with an embedded
search algorithm and expressive, uniform data-structuring
facilities.
It implements a relational data model, but over a more structured value
domain than SQL.
In a sense, I think of it as a forerunner of the 'NoSQL' languages.
So we can code, with care, relations among complex data structures,
like those used in early NLP research.

Backpropagation algorithm converging too quickly to poor results 
There are many parameters that need to be tuned to get a multilayer
neural net to work. Based on my experience, my first suggestions are:
1) Give it a small set of synthesized data and run a baby project to
see if the framework works.
2) Use a more convex cost function. There is no function that
guarantees convexity, but there are many functions that are more
convex than RMS.
3) Try scaling your input data.

Naive Bayes classifier in Data Mining
Evaluation is (usually) not part of the classifier.
It's something you do separately, to evaluate whether you did a good job or
not.
If you classify your test data using Naive Bayes, you can perform
exactly the same kind of evaluation as with any other classifier!
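For instance, with scikit-learn the evaluation step is identical whatever the classifier; a small sketch on a toy dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# the same cross-validated accuracy estimate works for Naive Bayes and any other classifier
print(cross_val_score(GaussianNB(), X, y, cv=5).mean())
print(cross_val_score(DecisionTreeClassifier(), X, y, cv=5).mean())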

How do we get/define filters in convolutional neural networks? 
You can follow this tutorial:
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
It is like a lecture on both autoencoders and some simple aspects of
CNNs (convolution and pooling). When you complete this tutorial
you will have both an autoencoder implementation and a
stacked-autoencoder (in your words, deep autoencoder) implementation
ready.
This tutorial will have exactly what you ask for:

How is the desired output of a neural network represented so as to be compared with the actual output? 
The main idea is that you don't create one single output for
everything and ask it "what digit is this?". You create one output
for each digit, and you ask each one "is this digit x?".
So the desired output must be encoded as a 1 x n vector, where n is
the number of classes. All values will be 0, and the value
corresponding to the desired class will be 1. In your case, for
example, create a 1 x 10 vector.
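A small sketch of that encoding, assuming the 10 digit classes mentioned above:

import numpy as np

def one_hot(label, n_classes=10):
    target = np.zeros(n_classes)   # all outputs 0
    target[label] = 1.0            # except the one for the desired class
    return target

print(one_hot(3))   # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]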

Implementation advice on semi-supervised automated tagging
For this exact problem I have written a PhD thesis, which I called
Generative AI. Since you probably are not going to read the thesis,
here is the general algorithm for this kind of problem:
1) Normalize the data: make certain that the range is between 0 and 1, or
-1 and 1, if you have numbers; if you have words/names, use only
lowercase (or only uppercase); if you have both, split the data into
numbers and text parts.

Not able to Compute cost for 1 variable in Cost Function 
Is it possible that you made an error when calling computeCost? You
mention you are running the script from computeCost.m. (I think it would be
best if you described which piece of code is in which file and how
you call the functions.)
The rule is: if your function is named "computeCost", the function
should be implemented (function until endfunction) in a file called
"computeCost.m". (There are some exceptions to this rule.)
