w3hello.com logo
Home PHP C# C++ Android Java Javascript Python IOS SQL HTML videos Categories
Use NLP / Machine Learning to teach a machine how to detect if a string is mathematical?
Parsing is a better tool than machine learning for this problem. What you have described is a relatively simple grammar for arithmetic, with some aliases for numbers, and a touch of syntax for those aliases. A tokenizer and some basic syntactic analysis, which you could code directly, will produce better more reliable results with significantly less computational effort than machine learning and optimization will. One reason why parsing is sufficient is that you don't need to worry about misspellings as often as you do, say, with people's names. If you want to get fancy about that, then use your Jaro-Winkler-based things for lexical analysis and then use syntatic analysis on what you think are your tokens. That is still much cheaper and less complex than machine learning. I don't know mu

Categories : PHP

One Hot Encoding for Machine learning
Many learning algorithms either learn a single weight per feature, or they use distances between samples. The former is the case for linear models such as logistic regression, which are easy to explain. Suppose you have a dataset having only a single categorical feature "nationality", with values "UK", "French" and "US". Assume, without loss of generality, that these are encoded as 0, 1 and 2. You then have a weight w for this feature in a linear classifier, which will make some kind of decision based on the constraint w×x + b > 0, or equivalently w×x < b. The problem now is that the weight w cannot encode a three-way choice. The three possible values of w×x are 0, w and 2×w. Either these three all lead to the same decision (they're all < b or ≥b) or "UK" and "French" lead to

Categories : Machine Learning

Conditions in formula in machine learning
Simply add a feature to the dataset that returns 1 if the condition is true and 0 (or -1 depending on your activation function) if false, or vice versa. If you ever want to force a ML algorithm to consider something, simply add it as a feature. It will get ignored if it is not predictive by most algorithms.

Categories : R

Right database for machine learning on 100 TB of data
The simplest and most direct answer would be to just put the files directly in HDFS or S3 (since you mentioned AWS) and point Hadoop/Mahout directly at them. Other databases have different purposes, but Hadoop/HDFS is designed for exactly this kind of high-volume, batch-style analytics. If you want a more database-style access layer, then you can add Hive without too much trouble. The underlying storage layer would still be HDFS or S3, but Hive can give you SQL-like access to the data stored there, if that's what you're after. Just to address the two other options you brought up: MongoDB is good for low-latency reads and writes, but you probably don't need that. And I'm not up on all the advanced features of MySQL, but I'm guessing 100TB is going to be pretty tough for it to deal with, es

Categories : Mysql

A data set for machine learning that has something to do with education
UCI is a great source of machine learning datasets There is a publicaly avaliable dataset for Teaching Assistant's evaluation which could suit your needs: http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation Collector: Wei-Yin Loh (Department of Statistics, UW-Madison) Donor: Tjen-Sien Lim (limt '@' stat.wisc.edu) Data Set Information: The data consist of evaluations of teaching performance over three regular semesters and two summer semesters of 151 teaching assistant (TA) assignments at the Statistics Department of the University of Wisconsin-Madison. The scores were divided into 3 roughly equal-sized categories ("low", "medium", and "high") to form the class variable. Attribute Information: Whether of not the TA is

Categories : Machine Learning

Machine Learning using R linear regression
The newdata parameter of predict needs to contain the variables used in the fit: new <- data.frame(memory = 1377678051, Date=as.Date("2013-08-28)) Only then it is actually used for prediction, otherwise you get the fitted values. You can then cbind the predicted values with new.

Categories : R

complex machine learning application
Depends on the model. A linear model such as linear regression cannot reliably learn the distance formula since it's a cubic function of the given variables. You'd need to add v×t and a×t² as a feature to get a good prediction of the distance. A non-linear model such as a cubic-kernel SVM regression or a multi-layer ANN should be able to learn this from the given features, though, given enough data. More generally, predicting multiple values with a single model sometimes works and sometimes doesn't -- when in doubt, just fit several models. You can try. Whether it'll work depends on the relation between the variables and the model.

Categories : Machine Learning

Machine Learning with Google Data
As was noted in the comments the link is: Intuition & Data-Driven Machine Learning He particularly piqued my interest with this quote: "... in certain cases, you are simply better off working on getting more data, then spending your time on improving the algorithm..." Excellent presentation and presenter (Ilya Grigorik)! Highly recommended for anyone wanting to start down the path of machine learning.

Categories : Ruby

NLP/Machine Learning text comparison
It does not seem to be a machine learning problem, you are simply looking for some text similarity measure. Once you select one, you just sort your data according to achieved "scores". Depending on your texts, you can use one of the following metrics (list from the wiki) or define your own: Hamming distance Levenshtein distance and Damerau–Levenshtein distance Needleman–Wunsch distance or Sellers' algorithm Smith–Waterman distance Gotoh distance or Smith-Waterman-Gotoh distance Monge Elkan distance Block distance or L1 distance or City block distance Jaro–Winkler distance Soundex distance metric Simple matching coefficient (SMC) Dice's coefficient Jaccard similarity or Jaccard coefficient or Tanimoto coefficient Tversky index Overlap coefficient Euclidean distance or L2 distance

Categories : Machine Learning

a summary of frequentist view in machine learning
It is a bit hard, in my opinion, to make it brief. It's like banalizing two different visions of a world. Nevertheless, a very short reduction to a single argument can be this: frequentists do not treat hidden parameters as random variables and do not seek to describe them by subjective priors. Here is a good discussion that goes in depth touching arguments as parameter estimation algorithmic complexty in the two approaches. This is an old discussion, arisen in the famous Langford's blog. And, to confirm my point about the requirment of a bit of statistics at least to understand the different perspectives, here is a video.

Categories : Machine Learning

How to approach a machine learning programming competition
So, I had never heard of Kaggle until reading your post--thank you so much, it looks awesome. Upon exploring their site, I found a portion that will guide you well. On the competitions page (click all competitions), you see Digit Recognizer and Facial Keypoints Detection, both of which are competitions, but are there for educational purposes, tutorials are provided (tutorial isn't available for the facial keypoints detection yet, as the competition is in its infancy. In addition to the general forums, competitions have forums also, which I imagine is very helpful. If you're interesting in the mathematical foundations of machine learning, and are relatively new to it, may I suggest Bayesian Reasoning and Machine Learning. It's no cakewalk, but it's much friendlier than its counterparts

Categories : Machine Learning

Combining different machine learning algorithms with boosting in R
take a look at the packages caret http://cran.r-project.org/web/packages/caret/index.html C50 http://cran.r-project.org/web/packages/C50/index.html GAMBoost http://cran.r-project.org/web/packages/GAMBoost/index.html mboost http://cran.r-project.org/web/packages/mboost/index.html

Categories : R

Machine Learning - Support Vector Machines
I think that the issue is a semantic one: you refer to the set of 4000 samples as being both "unknown" and "negative" -- which of these apply is the critical difference. If the labels for the 4000 samples are truly unknown, then I'd do a 1-class SVM using the 6000 labelled samples [c.f. validation below]. And then the predictions would be generated by testing the N=4000 set to assess whether or not they belong to the setosa class. If instead, we have 6000 setosa, and 4000 (known) non-setosa, we could construct a binary classifier on the basis of this data [c.f. validation below], and then use it to predict setosa vs. non on any other available non-labelled data. Validation: Usually as part of the model construction process you will take only a subset of your labelled training data

Categories : Machine Learning

PyDev can't find machine learning data
The file mnist.pkl.gz is probably not in the same directory as the script that you are trying to run. You would be better off receiving the actual location of the file as a command line parameter to the script and then load the file using that path

Categories : Python

Best techinique to approximate a 32-bit function using machine learning?
Multilayer perceptron neural networks would be worth taking a look at. Though you'll need to process the inputs to a floating point number between 0 and 1, and then map the outputs back to the original range.

Categories : Function

Where does map-reduce/hadoop come in in machine learning training?
Yes. There are many MapReduce implementations such as hadoop streaming and even some easy tools like Pig, which can be used for learning. In addition, there are distributed learning toolset built upon Map/Reduce such as vowpal wabbit (https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial). The big idea of this kind of methods is to do training on small portion of data (split by HDFS) and then averaging the models and commutation with each nodes. So the model get updates directly from submodels built on part of the data.

Categories : Hadoop

how to handle large number of features machine learning
try to use some dimensionality reduction first (PCA, kernel PCA, or LDA if you are classifying the images) vectorize your gradient descent - with most math libraries or in matlab etc. it will run much faster parallelize the algorithm and then run in on multiple CPUs (but maybe your library for multiplying vectors already supports parallel computations)

Categories : Machine Learning

Machine Learning in practice: Writing algorithms yourself or using Weka?
This is not a good question for Stackoverflow. It's an opinion question, not a programming problem. Nevertheless, here is my take: It depends on what you want to do. If you want to find which algorithm works best for your data problem at hand, try ELKI, Weka, R, Matlab, SciPy, whatever. Try out all the algorithms you can find, and spend even more time on preprocessing your data. If you know which algorithm you need and need to get it into production, many of these tools will not perform good enough or be easy enough to integrate. Instead, check if you can find low level libraries such as libSVM that provide the functionality you need. If these don't exist, roll your own optimized code. If you want to do research in this domain, you are best off with extending the existing tools. ELKI a

Categories : Machine Learning

Implementing scikit-learn machine learning algorithm
To do this in TextBlob (as alluded to in the comments), you would do from text.blob import TextBlob tweets = ['This is tweet one, and I am happy.', 'This is tweet two and I am sad'] for tweet in tweets: blob = TextBlob(tweet) print blob.sentiment #Will return (Polarity, Subjectivity)

Categories : Python

For a machine learning or NLP application -- Google App Engine vs Heroku vs App-fog
How are you using sklearn, nltk, or gensim? You may want to consider a platform designed for high performance scientific computation, such as PiCloud.

Categories : Google App Engine

The intersection of Machine Learning and Programming Languages fields
If you are interested in NLP, then I would focus on two aspects of listed PL disciplines: Syntax & Semantics - as this is incredibly closely realted to the NLP field, where in most cases the understanding is based on the various language grammars. Searching for papers regarding language modeling, information extraction, deep parsing would yield dozens of great research topics which are heavil related to the sytax/semantics problems. logic programming -"in good old years" people believed that this is a future of AI, even though it is not (currently) true, it is still quite widely used forreasoning in some fields. In particular, prolog is a good example of language that can be used to reson (for example spatial-temporal reasoning) or even parse language (due to its "grammar like" produ

Categories : Machine Learning

Applying AI, recommendation or machine learning techniques to search feature
This looks more like an optimization problem. You have some hard constraints and some preferences. Look at Linear Programming. Also google Constraint based Scheduling, there are several tutorials. Just a warning: This is in general an NP-hard problem, so unless you are trying to solve it for a small number participants, you will need to use some heuristics and approximations. If you want to go a little bit overboard, there is a coursera class on optimization running right now.

Categories : Machine Learning

How should I teach machine learning algorithm using data with big disproportion of classes? (SVM)
The most basic approach here is to use so called "class weighting scheme" - in classical SVM formulation there is a C parameter used to control the missclassification count. It can be changed into C1 and C2 parameters used for class 1 and 2 respectively. The most common choice of C1 and C2 for a given C is to put C1 = C / n1 C2 = C / n2 where n1 and n2 are sizes of class 1 and 2 respectively. So you "punish" SVM for missclassifing the less frequent class much harder then for missclassification the most common one. Many existing libraries (like libSVM) supports this mechanism with class_weight parameters. Example using python and sklearn print __doc__ import numpy as np import pylab as pl from sklearn import svm # we create 40 separable points rng = np.random.RandomState(0) n_sample

Categories : Machine Learning

Sentimental analysis of tweets in python using a machine learning algorithm
You are describing a standar text classification problem. In this setting, the set of features is a (finite) set of words instead of the Sepal length, width, ... As a result, each document is represented with respect to all such features (all documents have the same number of features) but most of the values will be zero, creating a very sparse vector. This is the best way to predict polarity/sentiment but you should improve your knowledge of the topic a bit more. I would suggest a read of Sebastiani's survey on Text Classification. Regards,

Categories : Misc

Some doubts related to statistic entropy concept in ID3 machine learning algorithm
When calculating entropy you do a summation by iterating over the unique classification values at the node in question. You do this on each iteration by counting how many members of the set have the value, and then use the log formula. In your problem case, the only classification value that occurs is YES, meaning that entropy is zero based on the single iteration. You cannot iterate on a NO value because none of the examples have that value.

Categories : Algorithm

Machine Learning Model for Multi-Label Classification where we know relationship between the labels
Some doubts Your question is far from being clear, for example: We want to optimize that most X reaches S3, so based on input data we decide whether to allow X to go through S1 or not Actually suggest, that the best model would be "always answer yes" ,as it maximized number of objects reaching S3 (as it simply lets any object reach this point) General ideas I assume two possible interpretations: You have a labels "pipeline", which simply means, that object cannot be labelled S_n if it has not been already labelled with all S_i for i < n This does not seem to be the problem for one single model, you can pipeline models in a natural way, ie. train a model 1 which regognizes, if object x should have label S_1. Next, you train a model 2 on all data that has label S_1 in the trai

Categories : Machine Learning

How to use WEKA Machine Learning for a Bayes Neural Network and J48 Decision Tree
Here is one way to do it with the command-line. This information is found in Chapter 1 ("A command-line primer") of the Weka manual that comes with the software. java weka.classifiers.trees.J48 -t training_data.arff -T test_data.arff -p 1-N where: -t <training_data.arff> specifies the training data in ARFF format -T <test_data.arff> specifies the test data in ARFF format -p 1-N specifies that you want to output the feature vector and the prediction, where N is the number of features in your feature vector. For example, here I am using soybean.arff for both training and testing. There are 35 features in the feature vector: java weka.classifiers.trees.J48 -t soybean.arff -T soybean.arff -p 1-35 The first few lines of the output look like: === Predictions on test dat

Categories : Machine Learning

Supervised Machine Learning: Classify types of clusters of data based on shape and density (Python)
Could Neural networks help , the "pybrain" library might be the best for it. You could set up the neural net as a feed forward network. set it so that there is an output for each class of object you expect the data to contain. Edit :sorry if I have completely misinterpreted the question. I'm assuming you have preexisting data you can feed to train the networks to differentiate clusters. If there are 3 categories you could have 3 outputs to the NN or perhaps a single NN for each one that simply outputs a true or false value.

Categories : Python

What is the difference between point-wise and pair-wise ranking in machine learning
Point wise ranking is analogous to regression. Each point has an associated rank score, and you want to predict that rank score. So your labeled data set will have a feature vector and associated rank score given a query IE: {d1, r1} {d2, r2} {d3, r3} {d4, r4} where r1 > r2 > r3 >r4 Pairwise ranking is analogous to classification. Each data point is associated with another data point, and the goal is to learn a classifier which will predict which of the two is "more" relevant to a given query. IE: {d1 > d2} {d2 > d3} {d3 > d4}

Categories : Machine Learning

MATLAB dll integration with asp.net website in 64 bit machine giving exception
try to use this Applications created from C# are compiled as managed code, which makes them platform independent (like Java, for example). Thus, when you compile a C# application on a 32-bit machine and then deploy it on a 64-bit machine, it will, by default, try to run as a 64-bit application. Then, it will try to find the 64-bit version of the MWArray.dll, and if it fails, the mentioned error will be shown. To work around this issue, set the option "Properties -> Build -> Platform target" to "x86" instead of "Any CPU" before compiling your C# application. This will have the effect that the application will start in 32-bit mode on the 64-bit machine.

Categories : C#

Execute a program compiled using intel-fortran from MATLAB on Linux machine
Ideally, this should go as a comment, but I dont have enough reputation. But nonetheless, the error with the creating the child process is unrelated to MATLAB. The shell is erroring out. Are you able to run the program from the terminal? Secondly, you are using: !/home/atrac/code case172.jcl but you should be using !./home/atrac/code case172.jcl

Categories : Linux

How to copy only file permissions and user:group from one machine and apply them on another machine in linux?
How about this? #!/bin/bash user="user" host="remote_host" while read file do permission=$(stat -c %a $file) # retrieve permission owner=$(stat -c %U $file) # retrieve owner group=$(stat -c %G $file) # retrieve group # just for debugging echo "$file@local: p = $permission, o = $owner, g = $group" # copy the permission ssh $user@$host "chmod $permission $file" < /dev/null # copy both owner and group ssh $user@$host "chown $owner:$group $file" < /dev/null done < list.txt I am assuming that the list of the files is saved in "list.txt". Moreover you should set the variables "user" and "host" accordingly to your setup. I would suggest to configure ssh to have "automatic login". Otherwise you should insert the password twice per loop. Here

Categories : Shell

ModelState.isValid = false on production machine, true on development machine
As I mentioned in the edit, I found the cause of and solution to my problem. It was caused by an error being thrown by the automatic binding of parameters, and the reason it was having issues with dates on my production machine was due to the locale settings (dd/mm/yy vs mm/dd/yy) on that machine. To fix, I set the computers region settings (in the control panel), and also added a line into my Web.config.

Categories : Jquery

MySQL statement failing on local machine but not deployment machine?
The failure is referring to the d.id in the nested subquery: ON dol.id=( SELECT MIN(dol2.id) FROM DOLineEJB dol2 WHERE d.id=dol2.DeliveryOrderEJB_lines ) You can fix the query: SELECT d.id, ..., dol.weight, d.status FROM DeliveryOrderEJB d join InbondEJB_DeliveryOrderEJB_link lnk on d.id=lnk.DeliveryOrderEJB_id left join DOLineEJB dol ON dol.id=( SELECT MIN(dol2.id) FROM DOLineEJB dol2 WHERE d.id=dol2.DeliveryOrderEJB_lines ) WHERE lnk.InbondEJB_itNo='...' ORDER BY d.id I believe the problem is because you have different versions of MySQL on the machines. MySQL changed the semantics of the , around version 5.0.

Categories : Mysql

PyDev: Running Code in local machine to remote machine
One solution would be: Install python on the remote machine Package your code into a python package using distutils (see http://wiki.python.org/moin/Distutils/Tutorial). Basically the process ends when you run the command python setup sdist in the root dir of your project, and get a tar.gz file in the dist/ subfolder. Copy your package to the remote server using scp, for example, if it is an amazon machine: scp -i myPemFile.pem local-python-package.tar.gz remote_user_name@remote_ip:remote_folder Run sudo pip install local-python-package.tar.gz on the remote server Now you can either SSH to the remote machine and run your code or use some remote enabler such as fabric to start commands on the remote server (works for any shell command, specifically python scripts) Alternatively, you c

Categories : Python

Is it possible to remotely debug a 64 bit machine from 32 bit Host Machine(which has visual studio)
According to MSDN, the answer in the other article is incorrect, and you can debug x64 from an x86 host: http://msdn.microsoft.com/en-us/library/vstudio/ms184681%28v=vs.100%29.aspx If you are debugging remotely, Visual Studio can run under WOW64 or on a 32-bit computer. You can debug both IA64 and x64 applications, in addition to 32-bit applications that are running under x64 WOW mode or on 32-bit operating systems.

Categories : Dotnet

How to calculate the offset of machine instructions using machine code itself?
In general, a machine instruction set uses three types of storage addresses: Absolute addresses which refer to the exact storage location of interest. Base-relative addresses which, when added to the contents of a "base register", refer to the storage location of interest. Instruction-relative addresses which, when added to the address of the instruction containing them, refer to the storage location of interest. Usually the 3rd type does not need to be "relocated", since the location being referred to will generally be in the same code segment as the referring instruction, so they move together. The second type may or may not require change upon binding/loading, depending on details of the compiled code binding scheme. The first type will almost always require changing, but it's u

Categories : Assembly

Powershell tasks from local machine to remote machine
Reading from docs on MSDN: To run a single command on a remote computer, use the ComputerName parameter. To run a series of related commands that share data, use the New-PSSession cmdlet to create a PSSession (a persistent connection) on the remote computer, and then use the Session parameter of Invoke-Command to run the command in the PSSession. To run a command in a disconnected session, use the InDisconnectedSession parameter. To run a command in a background job, use the AsJob parameter. So basically you should do something like: $session = New-PSSession Invoke-Command -Session $session -FilePath <PathToScript>

Categories : Powershell

host webapp in my machine trough virtual machine
Yes to both questions, if your browser machine can hit the port the server machine is serving the app on. On WAN, you would need to make sure your router didn't block requests to the server's port, and the same for any modem/firewall that connects you to the internet at large. Also, if your public ip is not static, it may change.

Categories : Iis

Connecting Clients machine to MySQL Server machine
You can use other tool with good looking GUI. For example: HeidiSQL. Then, you need to enable remote access. If you use windows, there is nice wizard for you to enable remote access with a few clicks. The wizard is located it at MySQL Server 5.5inMySQLInstanceConfig.exe. Then Reconfigure Instance -> Next -> Standard Configuration -> Next -> Next. You will see this screen. Tick on "Enable root access from remote machines". Or you can manually configure it to allow remote access from my.ini file. See this link for how to do it. If having done above still does not allow you to connect to your MySQL. Please make sure that Firewall does not block MySQL port.

Categories : Mysql



© Copyright 2017 w3hello.com Publishing Limited. All rights reserved.