The first question was which libraries a client needs in order to use a remote Katta cluster. Please note that I am referring here to a "canonical" setup with a distributed Lucene index (which I created on Hadoop from data in HBase using a MapReduce job). I found that the following libraries need to be added on the client; the rest are only needed on the server:
katta-core-0.6.rc1.jar
lucene-core-3.0.0.jar
zookeeper-3.2.2.jar
zkclient-0.1-dev.jar
hadoop-core-0.20.1.jar
log4j-1.2.15.jar
commons-logging-1.0.4.jar
Here is the code for the client. Please note that this is a simple test app that expects the name of the index, the default Lucene search field, and the query on the command line, in that order. I did not add usage info as this is just a proof of concept.
package com.worldlingo.test;

import net.sf.katta.lib.lucene.Hit;
import net.sf.katta.lib.lucene.Hits;
import net.sf.katta.lib.lucene.LuceneClient;
import net.sf.katta.util.ZkConfiguration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Writable;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

import java.util.Arrays;
import java.util.Map;

public class KattaLuceneClient {

  public static void main(String[] args) {
    try {
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
      Query query = new QueryParser(Version.LUCENE_CURRENT, args[1], analyzer).parse(args[2]);

      // assumes "/katta.zk.properties" available on classpath!
      ZkConfiguration conf = new ZkConfiguration();
      LuceneClient luceneClient = new LuceneClient(conf);
      Hits hits = luceneClient.search(query, Arrays.asList(args[0]).toArray(new String[1]), 99);

      int num = 0;
      for (Hit hit : hits.getHits()) {
        MapWritable mw = luceneClient.getDetails(hit);
        for (Map.Entry<Writable, Writable> entry : mw.entrySet()) {
          System.out.println("[" + (num++) + "] key -> " + entry.getKey() + ", value -> " + entry.getValue());
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
The first part is standard Lucene code where we parse the query string with an analyzer. The second part is Katta related: it creates a configuration object, which assumes we have a ZooKeeper configuration on the classpath. That config only needs to have these lines set:
zookeeper.embedded=false
zookeeper.servers=server-1:2181,server-2:2181
The first line is really only used on the server, so it can be left out on the client. I simply copied the server's katta.zk.properties to match my setup. The important line is the second one, which tells the client where the ZooKeeper servers responsible for managing the Katta cluster are running. With this info the client is able to distribute the search calls to the correct Katta slaves.

Further along we create a LuceneClient instance and start the actual search. Here I simply used no sorting and set the maximum number of hits returned to 99. These two values could optionally be added to the command line parameters, but they are trivial and not required here - this is a minimal test client after all ;)

The last part of the app simply prints out the fields and their values for each document found. Please note that Katta uses the low-level Writable class as part of its response, which is not "too" intuitive for the uninitiated. These are actually Text instances, so they can safely be converted to text using ".toString()".

Finally, I also checked the test project into my GitHub account for your perusal. Have fun!
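As a footnote to the Writable remark above, the conversion to plain text could be wrapped in a small helper. This is only a minimal sketch and not part of the test project: the class name HitFields and the method toStringMap are made up for illustration. It relies solely on MapWritable implementing java.util.Map, so it needs no Hadoop imports, and the sample in main uses StringBuilder objects as stand-ins for the Text instances Katta returns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HitFields {

  // Hypothetical helper: turns any map of field names and values into plain
  // strings by calling toString() on each key and value (via String.valueOf).
  // A MapWritable returned by getDetails() could be passed in directly,
  // since it implements Map<Writable, Writable>.
  public static Map<String, String> toStringMap(Map<?, ?> fields) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    for (Map.Entry<?, ?> entry : fields.entrySet()) {
      result.put(String.valueOf(entry.getKey()), String.valueOf(entry.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    // StringBuilder stands in for Text here; both convert cleanly via toString().
    Map<Object, Object> sample = new LinkedHashMap<Object, Object>();
    sample.put(new StringBuilder("title"), new StringBuilder("Hello Katta"));
    System.out.println(toStringMap(sample)); // prints {title=Hello Katta}
  }
}
```

In the client above, this would presumably be called as HitFields.toStringMap(luceneClient.getDetails(hit)), leaving the rest of the loop to work with ordinary strings.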
Nice Tutorial
But I am getting an error with ZooKeeper.
The following is the log:
11/05/18 11:10:34 INFO zookeeper.ClientCnxn:937 - Server connection successful
11/05/18 11:10:34 INFO zkclient.ZkClient:434 - zookeeper state changed (SyncConnected)
11/05/18 11:10:34 WARN zookeeper.ClientCnxn:967 - Exception closing session 0x12ffce416be008e to sun.nio.ch.SelectionKeyImpl@bfea1d
java.io.IOException: Xid out of order. Got 198 expected -8
at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:663)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:719)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
11/05/18 11: INFO zkclient.ZkClient:434 - zookeeper state changed (Disconnected)
Please suggest something...