Data mining is the process of analyzing large amounts of data in order to find patterns and commonalities.

0 votes | 1 answer | 64 views

Code for finding repeated entries with different data

project is the data frame. For the purpose of the code, HOUSE.NO is a column of the type character, and ...

3 votes | 1 answer | 100 views

Mining association rules in Java

Let \$I = \{ i_1, i_2, \dots, i_d \}\$ be the set of all possible items, and \$T = \{ t_1, t_2, \dots, t_N \}\$ be the (multi)set of all given transactions, where \$t_i \subseteq I\$ for all \$i \in ...

2 votes | 1 answer | 48 views

Nested loops - Random Forest, multiple parameters

I'm writing code whose task is to grow Random Forest trees based on multiple parameters. In short: first, I declare a data frame in which model parameters and some stats will be saved. Second, ...

5 votes | 2 answers | 95 views

Document term matrix in Clojure

This is my very first foray into Clojure (I'm normally a Python-pushing data-type). I'm trying to create a simple term-document matrix as a vector of vectors, out of a vector of strings. For those ...

4 votes | 1 answer | 81 views

Data analytics on static file of 50,000+ tweets

I'm trying to optimize the main loop portion of this code, as well as learn any "best practices" insights I can for all of the code. This script currently reads in one large file full of tweets (50MB ...

1 vote | 1 answer | 4k views

Apriori algorithm for frequent itemset generation in Java

I have this algorithm for mining frequent itemsets from a database. In that problem, a person may acquire a list of products bought in a grocery store, and he/she wishes to find out which product ...

4 votes | 3 answers | 111 views

Analyze very large sets of engineering data from Excel files

I am an electrical power engineer with some programming skills. My boss asked me to make a program which could analyze very large data sets, make some calculations and give the result. The task looks like ...

2 votes | 1 answer | 139 views

File parser to extract data from text file

I am trying to extract data from an input file and store it for plotting. I have tested this code on a few files of the same format. I am not sure if the code works correctly with a little change in ...

2 votes | 0 answers | 141 views

C# port of data mining algorithm much slower than reference implementation

I was trying to implement the algorithm specified in this research paper (please ignore the math, since it's irrelevant to the question). This algorithm is very basic in formal concept analysis. The ...

3 votes | 0 answers | 88 views

Frequent subgraph mining program

I'm trying to make a programme that reads graphs from a .txt file, puts them in a vector, and finally puts the frequent closed graphs in another resulting file. ...

4 votes | 1 answer | 1k views

Implementation of KNN in R

I have implemented the K-Nearest Neighbor algorithm with Euclidean distance in R. It works fine but takes far longer than the library function (get.knn). Please point out the possibility ...

2 votes | 1 answer | 158 views

CSMR for large-scale text-processing

I'm working on a project for large-scale text-processing, which is a first implementation of the basic idea of CSMR. CSMR is an algorithm that measures the similarity between documents by calculating ...

5 votes | 2 answers | 3k views

AutoComplete program using the n-gram model

For my Advanced Data Mining class (undergrad) we were to design a program that would predict the next word a user is likely to type via automatic text classification using the n-gram model. The ...

3 votes | 1 answer | 6k views

Apriori algorithm using Pandas

I want to optimize my Apriori algorithm for speed: ...

5 votes | 1 answer | 4k views

Alternative to Python's Naive Bayes Classifier for Twitter Sentiment Mining

I am doing sentiment analysis on tweets. I have code that I developed from following an online tutorial (found here) and adding in some parts myself, which looks like this: ...