All Questions
Tagged with data-mining java
12 questions
8
votes
1
answer
254
views
TF-IDF Implementation in Java
I have tried following the formulas for Term frequency–Inverse document frequency (TF-IDF) calculation and Cosine similarity calculation, and translated them into code. The results I get seem to be ...
5
votes
1
answer
229
views
A simple probabilistic AI for generating random sentences in Java
Motivation
I have this repository. It contains a program that analyzes an input text file and builds a word graph: in the graph, each node represents a word in the analyzed text. Now, if there are two ...
1
vote
1
answer
129
views
Data mining in Java: finding undrawn lottery rows - follow-up
(See the previous (initial) iteration.)
This time, I have substantially reduced the usage of the final and this keywords. Also, ...
2
votes
2
answers
145
views
Data mining in Java: finding undrawn lottery rows
(See the next iteration.)
Introduction
Suppose Evil Lottery Inc is interested in not paying millions of dollars back to players. They gather the drawn lottery rows first, after which they mine rows ...
4
votes
1
answer
269
views
General class for basic statistical measures: mode, arithmetic mean, geometric mean, median, variance, and standard deviation functions
I am trying to write a general class for basic statistical measures: mode, arithmetic mean, geometric mean, median, variance, and standard deviation functions.
I am looking for some general feedback on how I ...
1
vote
3
answers
850
views
General Java class to find mode
I am trying to make a general class to find the mode. I am looking for some general feedback on how I can improve the structure and efficiency of my code.
...
4
votes
2
answers
1k
views
Calculate Geometric and Arithmetic Mean
I am trying to make a general class to calculate the geometric mean, using the exponential of the arithmetic mean of the logarithms, and the arithmetic mean.
I am looking for some general feedback on how I can improve ...
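For readers skimming the listing, the log-based geometric mean mentioned in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the class name `Means` and both method names are hypothetical, and the sketch assumes all inputs are strictly positive, since the logarithm is undefined otherwise.

```java
// Hypothetical helper class; names are illustrative, not taken from the question.
final class Means {
    private Means() {}

    /** Arithmetic mean: the sum of the values divided by their count. */
    static double arithmeticMean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /**
     * Geometric mean computed as exp(mean(log(x_i))).
     * Summing logarithms avoids the overflow that multiplying
     * many large values directly could cause.
     */
    static double geometricMean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double logSum = 0.0;
        for (double v : values) {
            if (v <= 0.0) {
                // Assumption of this sketch: only strictly positive inputs are accepted.
                throw new IllegalArgumentException("values must be positive");
            }
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }
}
```

For example, `Means.geometricMean(new double[] {1, 4, 16})` should come out at approximately 4, since the product 64 has cube root 4.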
9
votes
1
answer
1k
views
Cosine Similarity on Huge Dataset
I have a very large data file full of movie ratings that I am looking at for work. I wanted to process it in a clean and efficient manner. The ratings file contains, column by column:
...
2
votes
2
answers
7k
views
Mining association rules in Java
Let \$I = \{ i_1, i_2, \dots, i_d \}\$ be the set of all possible items, and \$T = \{ t_1, t_2, \dots, t_N \}\$ be the (multi)set of all given transactions, where \$t_i \subseteq I\$ for all \$i \in \{...
2
votes
1
answer
23k
views
Apriori algorithm for frequent itemset generation in Java
I have this algorithm for mining frequent itemsets from a database. In that problem, a person may acquire a list of products bought in a grocery store, and he/she wishes to find out which product ...
2
votes
1
answer
222
views
CSMR for large-scale text-processing
I'm working on a project for large-scale text-processing, which is a first implementation of the basic idea of CSMR. CSMR is an algorithm that measures the similarity between documents by calculating ...
5
votes
2
answers
6k
views
AutoComplete program using the n-gram model
For my Advanced Data Mining class (undergrad) we were to design a program that would predict the next word a user is likely to type via automatic text classification using the n-gram model.
The ...
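As a rough illustration of next-word prediction with an n-gram model, here is a minimal sketch for the simplest case, n = 2 (bigrams). It is not the asker's program: the class `BigramPredictor` and its methods are hypothetical, tokenization is assumed to be plain whitespace splitting, and the prediction is simply the most frequent follower seen in training.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bigram model; names and tokenization are assumptions of this sketch.
final class BigramPredictor {
    // counts.get(w1).get(w2) = how often w2 directly followed w1 in the training text
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Counts every adjacent word pair in the given text. */
    void train(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new HashMap<>())
                  .merge(words[i + 1], 1, Integer::sum);
        }
    }

    /** Returns the most frequent follower of the word, or null if the word was never seen. */
    String predictNext(String word) {
        Map<String, Integer> followers = counts.get(word.toLowerCase());
        if (followers == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : followers.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

After training on `"the cat sat on the mat the cat ran"`, calling `predictNext("the")` should return `"cat"`, because "cat" follows "the" twice and "mat" only once. Ties between equally frequent followers are broken arbitrarily in this sketch.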