I am working with Lucene in Java.
We have 100K stop names in a MySQL database. The stop names look like this:
NEW YORK PENN STATION,
NEWARK PENN STATION,
NEWARK BROAD ST,
NEW PROVIDENCE
etc.
When the user searches for NEW YORK, the result contains the NEW YORK PENN STATION stop, but when the user enters the exact name NEW YORK PENN STATION, the search returns zero results.
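To make the expected behaviour concrete, the following plain-Java sketch (no Lucene involved) shows the result we would like to get, assuming a case-insensitive prefix match is the goal. The class name `ExpectedStopMatches` and the method `expectedMatches` are hypothetical and exist only for this illustration; they are not part of our code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ExpectedStopMatches
{
    // Hypothetical reference helper: returns every stop name that starts with
    // the query, ignoring case and leading/trailing whitespace in the query.
    static List<String> expectedMatches(List<String> stops, String query)
    {
        String prefix = query.trim().toUpperCase(Locale.ROOT);
        List<String> result = new ArrayList<String>();
        for (String stop : stops)
        {
            if (stop.toUpperCase(Locale.ROOT).startsWith(prefix))
            {
                result.add(stop);
            }
        }
        return result;
    }
}
```

With the four sample names above, both NEW YORK and NEW YORK PENN STATION should yield only NEW YORK PENN STATION under this reading, which is the behaviour we expect from the Lucene search as well.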
My code is:
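// Imports used by the methods below (Lucene 4.0)
import java.io.IOException;
import java.util.ArrayList;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public ArrayList<String> getSimilarString(ArrayList<String> source, String querystr)
{
    ArrayList<String> arResult = new ArrayList<String>();
    try
    {
        // 0. Specify the analyzer for tokenizing text.
        //    The same analyzer should be used for indexing and searching.
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);

        // 1. Create the index.
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
        IndexWriter w = new IndexWriter(index, config);
        for (int i = 0; i < source.size(); i++)
        {
            addDoc(w, source.get(i), "1933988" + (i + 1) + "z");
        }
        w.close();

        // 2. Build the query.
        //    The "title" arg specifies the default field to use
        //    when no field is explicitly specified in the query.
        Query q = new QueryParser(Version.LUCENE_40, "title", analyzer).parse(querystr + "*");

        // 3. Search.
        int hitsPerPage = 20;
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // 4. Collect the results.
        for (int i = 0; i < hits.length; ++i)
        {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            arResult.add(d.get("title"));
        }

        // The reader can only be closed when there
        // is no need to access the documents any more.
        reader.close();
    }
    catch (Exception e)
    {
        System.out.println("Exception (LuceneAlgo.getSimilarString()) : " + e);
    }
    return arResult;
}

private static void addDoc(IndexWriter w, String title, String isbn) throws IOException
{
    Document doc = new Document();
    doc.add(new TextField("title", title, Field.Store.YES));
    // Use a StringField for isbn because we don't want it tokenized.
    doc.add(new StringField("isbn", isbn, Field.Store.YES));
    w.addDocument(doc);
}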
In this code, source is the list of stop names and querystr is the user's search input.
Does Lucene work on long strings?
Why does Lucene return no results for an exact string?
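One thing not yet ruled out is whether the raw input string itself plays a role: a trailing space in the input means the appended `*` ends up as a separate term, and characters such as `:` or `/` have a special meaning in the query parser syntax. Below is a minimal sketch of preparing the string before it is parsed, assuming this is a factor at all (unconfirmed). The class `StopQueryStrings` and the method `toPrefixQuery` are hypothetical names, and the list of special characters reflects my understanding of the classic query parser syntax, so it should be checked against the Lucene version in use.

```java
class StopQueryStrings
{
    // Characters assumed to be query syntax for the classic QueryParser
    // (assumption: verify against the Lucene version you are running).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: trims the input, collapses runs of whitespace,
    // backslash-escapes special characters and appends one trailing '*'.
    // Returns an empty string when there is nothing to search for.
    static String toPrefixQuery(String input)
    {
        String trimmed = input.trim().replaceAll("\\s+", " ");
        if (trimmed.isEmpty())
        {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < trimmed.length(); i++)
        {
            char c = trimmed.charAt(i);
            if (SPECIAL.indexOf(c) >= 0)
            {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('*').toString();
    }
}
```

The returned string would be passed to `parse(...)` in place of `querystr + "*"`, with the empty-string case handled by the caller before parsing. This only normalizes the input; it does not change how the analyzer or the parser splits the name into terms.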