oracle sql navigator data-mining text-mining
You can use REGEXP_SUBSTR:

SQL> SELECT txt, regex, regexp_substr(txt, regex, 1, 1, 'c', '3') result
  2    FROM (SELECT q'¤@A'123'HEY'345''@B'K¤' txt,
  3                 q'¤^([^']*|('[^']*')*?)*?'([^']*[@#][^']*)'¤' regex
  4            FROM dual);

TXT                  REGEX                                     RESULT
-------------------- ----------------------------------------  ---------
@A'123'HEY'345''@B'K ^([^']*|('[^']*')*?)*?'([^']*[@#][^']*)'  @B

Categories : C#

Topic mining algorithm in c/c++
If you wish to count the occurrences of each word in an array, then you can do no better than O(n) (i.e. one pass over the array). However, if you try to store the word counts in a two-dimensional array, then you must also do a lookup each time to see whether the word is already there, and this can quickly become O(n^2). The trick is to use a hash table for the lookup. As you step through your word list you increment the right entry in the hash table. Each lookup should be O(1), so the approach is efficient as long as there are enough words to offset the overhead of the hashing algorithm and the extra memory (i.e. don't bother if you're dealing with fewer than, say, 10 words). Then, when you're done, you just iterate over the entries in the hash table to find the maximum.
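A minimal sketch of that idea in Python, using a dict-backed Counter as the hash table (the word list is made up for illustration):

from collections import Counter

words = ["data", "mining", "data", "text", "data"]  # example input, assumed

# One pass over the words: each hash-table lookup and increment is O(1) on average
counts = Counter()
for w in words:
    counts[w] += 1

# Iterate over the table once to find the most frequent word
top_word, top_count = max(counts.items(), key=lambda kv: kv[1])
print(top_word, top_count)  # -> data 3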

Categories : C++

Data Mining from HTML
Typically, whenever you're retrieving data from a database in order to display information in the UI, it's best to avoid copy-and-paste "inheritance". Instead you might want to look into template-based data binding. The specific approach to use depends on the technology you're working with. In the case above it looks like it would make sense to bind your dropdown to a data source.

Categories : PHP

Web usage mining with rapidminer
This was asked on the RapidMiner forum: yes, RapidMiner 4.6 Community Edition together with its text mining plugin is suitable for web usage mining. The RapidMiner 4.6 operator LogFileSource allows you to import web server log files directly. RapidMiner supports aggregation of web usage statistics, automated extraction of web page visitor sessions, search robot filtering, mash-ups with web services to map IP addresses to countries, cities, and map coordinates, automated clustering of visits and/or click paths, frequent path item set mining and association rule generation, 2D and 3D visualization of web usage statistics, click path sequence analysis, personalized product recommendations for cross-selling, and many other things.

Categories : Misc

Sequential Pattern - Data Mining
In general, and keep in mind this is inherently opinion-based, data mining refers to the process of taking data that is in a relatively unusable format and converting it into a format that is more usable. For instance, if I have a huge .txt dump of unstructured text and I then extract the relevant portions (according to some formal definition of relevant) and place them into a .bson store or something similar, that would be data mining, regardless of exactly how I do the extraction. However, since your data is already in a SQL database, I wouldn't consider this data mining. I would consider it SQL development, though again, this is largely opinion-based. A SQL database is already a highly useful way of storing data, so accessing that data isn't introducing a level of functionality that wasn't already there.
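As a toy illustration of that extraction step, here is a minimal Python sketch; it assumes "relevant" simply means lines matching a keyword, and the file names and keyword are hypothetical:

import json
import re

# Hypothetical inputs: a raw text dump and a keyword that defines "relevant"
RAW_FILE = "dump.txt"
KEYWORD = re.compile(r"\border\b", re.IGNORECASE)

records = []
with open(RAW_FILE, encoding="utf-8") as fh:
    for line_no, line in enumerate(fh, start=1):
        if KEYWORD.search(line):
            # Turn the unstructured line into a structured record
            records.append({"line": line_no, "text": line.strip()})

# Store the structured result (JSON here; a .bson store would be analogous)
with open("relevant.json", "w", encoding="utf-8") as out:
    json.dump(records, out, indent=2)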

Categories : SQL

Use data mining in SQL Server 2008 R2
First you should get SQL Server Data Tools, which runs in Visual Studio. You will need Analysis Services installed; if you don't have it, just run the SQL Server installer again and look for the option to install it. After that you can take a look at this post I wrote a few months ago: http://www.sqlservercentral.com/Forums/Topic480010-147-1.aspx I wrote it specifically targeting the Neural Network models, but it contains details on several background steps you will need to do. Finally, since you're using an evaluation version, you may want to just go for SQL Server 2012 (that's what I use, so I know it works).

Categories : Sql Server

Data mining library for hadoop
Why not use Spark? It's a very efficient open source cluster computing system that is both fast to run and fast to write. For distributed data mining, Spark is a very good tool. Hope this helps!
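For a sense of what distributed mining code looks like, here is a minimal PySpark word-count sketch; it assumes a local Spark installation, and the input path is hypothetical:

from pyspark import SparkContext

# Hypothetical input path; replace with your own dataset
sc = SparkContext("local[*]", "word-count-example")
lines = sc.textFile("corpus.txt")

counts = (lines
          .flatMap(lambda line: line.split())   # split each line into words
          .map(lambda word: (word, 1))          # emit (word, 1) pairs
          .reduceByKey(lambda a, b: a + b))     # sum counts per word across the cluster

# Print the ten most frequent words
for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, n)

sc.stop()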

Categories : Hadoop

How to use a regular expression inside TermDocumentMatrix for text mining?
I'm not sure that you can put a regex in the dictionary function, as it only accepts a character vector or a term-document matrix. The work-around I'd suggest is to use a regex to subset the terms in the term-document matrix, then do the word counts:

# What I would do instead
tdm <- TermDocumentMatrix(crude, control = list(removePunctuation = TRUE))

# subset the tdm according to the criteria
# this is where you can use regex
crit <- grep("cru", tdm$dimnames$Terms)

# have a look to see what you got
inspect(tdm[crit])

A term-document matrix (2 terms, 20 documents)

Non-/sparse entries: 10/30
Sparsity           : 75%
Maximal term length: 7
Weighting          : term frequency (tf)

       Docs
Terms   127 144 191 194 211 236 237 242 246 248 273 349 352 353 368

Categories : Regex

stratum-mining-proxy error - Can't decode message
I am a little curious; I don't know this as a fact, but I was under the impression that the mining proxy was for BTC, not LTC. In any case, I believe I got a similar message when I first installed it as well. To fix it, or rather to actually get it running, I had to use the Git installation method instead of installing manually.

Installation on Linux using Git. This is an advanced option for experienced users, but it gives you the easiest way of updating the proxy.

1. git clone git://github.com/slush0/stratum-mining-proxy.git
2. cd stratum-mining-proxy
3. sudo apt-get install python-dev    # Development package of Python is necessary
4. sudo python distribute_setup.py    # This will upgrade the setuptools package
5. sudo python setup.py develop       # This will install required dependencies (namely Twisted and Stratum)

Categories : Python

How to convert a termDocumentMatrix which I have got from text mining in R into excel or CSV file?
I assume you have a list of string elements separated by commas, with different numbers of elements.

Names <- c("aaron, matt, patrick", "jiah, ron, melissa, john, patrick")

## get max number of elements
mm <- max(unlist(lapply(strsplit(Names, ','), length)))

## set all rows to the same length
lapply(strsplit(Names, ','), function(x) {length(x) <- mm; x})

## create a data frame with the data well formatted
res <- do.call(rbind, lapply(strsplit(Names, ','), function(x) {length(x) <- mm; x}))

## save the file
write.csv(res, 'output.csv')

I think you could also use rbind.fill from the plyr package, but you would have to coerce each row to a data.frame (at a certain cost).

Categories : R

Pairing qualitative user data with text-mining results
... as Ben mentioned:

vec <- as.character(x[, "place of comments"])
Corpus(VectorSource(vec))

Perhaps some customer id as metadata would be nice... hth

Categories : R

Data Mining: grouping based on two text values (IDs) and one numeric (ratio)
This sounds like a classic matrix factorization task to me, with a weighted matrix instead of a binary one. Some fast algorithms may therefore not be applicable, because they support binary matrices only. Don't ask for source code on Stack Overflow: asking for off-site resources (tools, libraries, ...) is off-topic.
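A minimal sketch of weighted matrix factorization via gradient descent, in Python with NumPy; the data, dimensions, and hyperparameters are made up for illustration:

import numpy as np

# Toy data: ratio[i, j] for pairs of IDs, with weight[i, j] marking observed cells
rng = np.random.default_rng(0)
n_rows, n_cols, k = 6, 5, 2          # k = number of latent factors (assumed)
ratio = rng.random((n_rows, n_cols))
weight = (rng.random((n_rows, n_cols)) > 0.4).astype(float)  # observed mask / weights

P = rng.normal(scale=0.1, size=(n_rows, k))
Q = rng.normal(scale=0.1, size=(n_cols, k))
lr, reg = 0.05, 0.01

for _ in range(500):
    err = weight * (ratio - P @ Q.T)      # weighted reconstruction error
    P += lr * (err @ Q - reg * P)         # gradient step on row factors
    Q += lr * (err.T @ P - reg * Q)       # gradient step on column factors

# The row/column factor vectors can now be clustered or inspected for grouping
print(np.round(P @ Q.T, 2))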

Categories : Python

can gcc do loop optimizations (strip-mining/blocking) on unknown iteration count?
Definitely read the GCC 4.5.0 Optimize Options docs (search for -floop-strip-mine, about 1/3 of the way down the page). Also, make sure GCC is getting the --with-ppl and --with-cloog options (as noted in the docs about using Graphite with -floop-strip-mine). Without those, GCC probably won't even try to perform strip mining on your code. Based on the behavior description and pseudocode examples in the docs, which show pseudocode loops with finite strip lengths and iteration counts, I'd say that GCC probably does not do strip mining on unknown iteration counts. From the docs:

Pseudocode original loop:

DO I = 1, N
  A(I) = A(I) + C
ENDDO

Pseudocode strip-mined loop:

DO II = 1, N, 51
  DO I = II, min (II + 50, N)
    A(I) = A(I) + C
  ENDDO
ENDDO

Categories : C

Create web services for Data Mining using Business Intelligence Development Studio (BIDS)
I assume here that you know how to write a web service. You should use ADOMD.NET to fetch your cube data. Refer to: ADOMD.NET Client Programming and Example: Displaying a grid using ADOMD.NET and MDX.

Code:

AdomdConnection conn = new AdomdConnection(strConn);
conn.Open();
AdomdCommand cmd = new AdomdCommand(MDX_QUERY, conn);
CellSet cst = cmd.ExecuteCellSet();

Categories : C#

Will Twitter's rate limits allow me to do the data mining necessary to construct a complete social network graph of about 600K users?
I'll answer these questions in reverse order, starting with David Marx first: Well, I do have access to a pretty robust computer research center with a ton of storage capacity, so that should not be an issue. I don't know if the software can handle it, however. Chances are that I will have to scale down the project, which is OK. The idea for me is to start out with a bigger idea, figure out how big it can be, and then pare down accordingly. Following up on Anony-Mousse's question: part of my problem is that I am not sure I am interpreting the Twitter rate limits correctly. I'm not sure whether it's 15 requests per 15 minutes or 30 requests per 15 minutes. I think one request will return up to 5,000 followers/friends, so you could presumably collect 75,000 friends or followers every 15 minutes.
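As a rough back-of-the-envelope check in Python (a sketch only; it assumes the 15-requests-per-15-minutes figure and the 5,000 IDs per request quoted above):

# Rough estimate of how long crawling follower lists for 600K users would take,
# assuming 15 requests per 15-minute window and 5,000 follower IDs per request.
USERS = 600_000
IDS_PER_REQUEST = 5_000
REQUESTS_PER_WINDOW = 15
WINDOW_MINUTES = 15

# Best case: one request per user (i.e. every user has <= 5,000 followers)
requests_needed = USERS
windows_needed = requests_needed / REQUESTS_PER_WINDOW
days = windows_needed * WINDOW_MINUTES / (60 * 24)
print(f"At least {days:.0f} days of continuous crawling")  # ~417 days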

Categories : Twitter

Any specific rules for converting MySQL data to Prolog rules for exploratory mining?
You could maybe use nth1/3 and the "univ" operator, doing something like this:

fieldnames(t, [id, this, that]).

get_field(Field, Tuple, Value) :-
    Tuple =.. [Table|Fields],
    fieldnames(Table, Names),
    nth1(Idx, Names, Field),
    nth1(Idx, Fields, Value).

You'd need to create fieldnames/2 records for each table structure, and you'd have to pass the table structure along to this query. It wouldn't be terrifically efficient, but it would work.

?- get_field(this, t(testId, testThis, testThat), Value).
Value = testThis

You could then build your accessors on top of this pretty easily:

findThisById(X, This) :- get_field(this, X, This).

Edit: Boris rightly points out that arg/3 will do this with even less work:

get_field(Field, Tuple, Value) :-
    functor(Tuple, Table, _),
    fieldnames(Table, Names),
    nth1(Idx, Names, Field),
    arg(Idx, Tuple, Value).

Categories : Mysql

Mining pdf Data with python through clipboard - Python Scripting the OS
I have settled on using pyPdf. It has a simple method that just extracts the text from the pdf. I have written simple functions to find the relevant information I need in this text, splitting the text into a list for easy data identification. I have also written a loop to pick up the relevant files using a glob search and feed them into the parser.

import pyPdf

pdf = pyPdf.PdfFileReader(open(filename, "rb"))
data = ''
for page in pdf.pages:
    data += page.extractText()
data2 = data.split(' ')

Categories : Python


