All Questions

8 votes
1 answer
254 views

TF-IDF Implementation in Java

I have tried following the formulas for Term frequency–Inverse document frequency (TF-IDF) calculation and Cosine similarity calculation, and translated them into code. The results I get seem to be ...
Malde
  • 81
5 votes
1 answer
229 views

A simple probabilistic AI for generating random sentences in Java

Motivation: I have this repository. It contains a program that analyzes an input text file and builds a word graph: in the graph, each node represents a word in the analyzed text. Now, if there are two ...
coderodde
  • 31.7k
1 vote
1 answer
129 views

Data mining in Java: finding undrawn lottery rows - follow-up

(See the previous (initial) iteration.) This time, I have substantially reduced the usage of the final and this keywords. Also, ...
coderodde
  • 31.7k
2 votes
2 answers
145 views

Data mining in Java: finding undrawn lottery rows

(See the next iteration.) Introduction: Suppose Evil Lottery Inc is interested in not paying millions of dollars back to players. They gather the drawn lottery rows first, after which they mine rows ...
coderodde
  • 31.7k
4 votes
1 answer
269 views

General class for basic statistical measures: mode, arithmetic mean, geometric mean, median, variance, and standard deviation functions

I am trying to write a general class for basic statistical measures: mode, arithmetic mean, geometric mean, median, variance, and standard deviation functions. I am looking for some general feedback on how I ...
Eslam Ali
  • 423
1 vote
3 answers
850 views

General Java class to find mode

I am trying to make a general class to find the mode. I am looking for some general feedback on how I can improve the structure and efficiency of my code. ...
Eslam Ali
  • 423
4 votes
2 answers
1k views

Calculate Geometric and Arithmetic Mean

I am trying to make a general class to calculate the geometric mean, using the exponential of the arithmetic mean of logarithms, and the arithmetic mean. I am looking for some general feedback on how I can improve ...
Eslam Ali
  • 423
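
The log-based formulation named in the excerpt above, the geometric mean as the exponential of the arithmetic mean of logarithms, can be sketched in Java roughly as follows. This is a minimal illustration and not the code under review: the class name `MeanSketch` and its method names are hypothetical, and the sketch assumes strictly positive inputs for the geometric mean.

```java
// Hypothetical sketch: MeanSketch and its method names are illustrative only
// and are not taken from the question listed above.
final class MeanSketch {

    private MeanSketch() {
        // Utility class, not meant to be instantiated.
    }

    // Arithmetic mean: sum of the values divided by their count.
    static double arithmeticMean(double[] values) {
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must be non-empty");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Geometric mean computed as exp of the arithmetic mean of the natural logs.
    // Assumption: every value is strictly positive (log is undefined otherwise).
    static double geometricMean(double[] values) {
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must be non-empty");
        }
        double logSum = 0.0;
        for (double v : values) {
            if (!(v > 0.0)) { // also rejects NaN
                throw new IllegalArgumentException("values must be strictly positive");
            }
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }
}
```

Summing logarithms instead of multiplying the raw values avoids the overflow that a running product of many large numbers could hit; for example, `MeanSketch.geometricMean(new double[] {1.0, 4.0, 16.0})` evaluates to approximately 4.0.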
9 votes
1 answer
1k views

Cosine Similarity on Huge Dataset

I have a very large data file full of movie ratings that I am looking at for work. I wanted to do this in a clean and very effective manner. The ratings file contains, column by column: ...
Al-geBra
2 votes
2 answers
7k views

Mining association rules in Java

Let \$I = \{ i_1, i_2, \dots, i_d \}\$ be the set of all possible items, and \$T = \{ t_1, t_2, \dots, t_N \}\$ be the (multi)set of all given transactions, where \$t_i \subseteq I\$ for all \$i \in \{...
coderodde
  • 31.7k
2 votes
1 answer
23k views

Apriori algorithm for frequent itemset generation in Java

I have this algorithm for mining frequent itemsets from a database. In that problem, a person may acquire a list of products bought in a grocery store, and he/she wishes to find out which product ...
coderodde
  • 31.7k
2 votes
1 answer
222 views

CSMR for large-scale text-processing

I'm working on a project for large-scale text-processing, which is a first implementation of the basic idea of CSMR. CSMR is an algorithm that measures the similarity between documents by calculating ...
IrishDog
  • 131
5 votes
2 answers
6k views

AutoComplete program using the n-gram model

For my Advanced Data Mining class (undergrad) we were to design a program that would predict the next word a user is likely to type via automatic text classification using the n-gram model. The ...
user40915