Building and Using a Solr Distributed Search System


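Today I stumbled upon an article I wrote four years ago about setting up and using Solr. At the time I needed a search engine at work, found that Solr was a good fit, and adopted it. I wrote up the setup and usage process as a document and later sent it to a few colleagues for reference.

Since I have come across it again, I figured I would post it on the blog. The document is four years old, but I think it has not gone stale and should still have some reference value.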

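1.     Overview

Solr is an open-source subproject of Lucene written in Java. It is a powerful enterprise search platform whose main features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document handling (for example Word and PDF), and geospatial search. Solr is highly scalable, provides distributed search and index replication, and powers search for many of the world's large websites, such as CNet and Zappos.

Here I focus on Solr's distributed search features: how to set up the distributed framework, and how to call its API to add, query, and delete documents.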

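2.     Setup

Ø   Notes

SolrCloud is the name of the distributed capabilities newly introduced in solr-4.0.0. With it you can build a highly available, fault-tolerant distributed Solr search system.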

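In the SolrCloud diagram on the official Solr site, SolrCloud uses ZooKeeper as the coordinator between nodes; in other words, the state of every node is maintained by ZooKeeper. The cluster in that diagram consists of two shards, and each shard consists of two nodes. If we start only one SolrCloud service per server, one node corresponds to one server (we can also start several nodes on a single server). Of the two nodes in a shard, one is the Leader and the other is a Replica. The Replica is effectively a copy of the Leader, and index update requests can also be sent directly to a Replica. If all the nodes of any one shard go down, the whole cluster can no longer serve searches.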
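The availability rule above (a shard survives as long as one of its nodes is up, and the cluster needs every shard) can be written down as a small model. This is only an illustrative sketch: the class and method names are hypothetical and are not part of Solr's API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; not part of Solr's API.
final class ClusterAvailability {

    // A shard is searchable when at least one of its nodes (Leader or Replica) is up.
    static boolean shardAvailable(List<Boolean> nodesUp) {
        for (boolean up : nodesUp) {
            if (up) {
                return true;
            }
        }
        return false;
    }

    // The cluster can serve searches only if every shard is searchable.
    static boolean clusterAvailable(List<List<Boolean>> shards) {
        for (List<Boolean> shard : shards) {
            if (!shardAvailable(shard)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two shards with two nodes each; true means the node is up.
        List<List<Boolean>> oneReplicaDown = Arrays.asList(
                Arrays.asList(true, false),
                Arrays.asList(true, true));
        List<List<Boolean>> wholeShardDown = Arrays.asList(
                Arrays.asList(false, false),
                Arrays.asList(true, true));
        System.out.println("one replica down: " + clusterAvailable(oneReplicaDown));
        System.out.println("whole shard down: " + clusterAvailable(wholeShardDown));
    }
}
```

Losing a single Replica leaves the cluster searchable, while losing both nodes of one shard does not.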

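The official SolrCloud documentation uses the embedded Jetty as the web container and creates all the nodes on a single server. That does not match our real-world deployment, so here I use Tomcat as the web container and build the cluster from nodes on multiple servers.

Below I set up a SolrCloud cluster on three servers. Because the number of servers is limited, there can only be two shards; the procedure is the same with more servers.

The three servers are 218.241.108.84, 218.241.108.83, and 218.241.108.82. The ZooKeeper service will be started on 218.241.108.84.

Each of the three servers runs one Tomcat instance (version apache-tomcat-7.0.27); the port numbers are 8080, 8081, and 8081 respectively.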

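Ø   Steps

Step 1

Download the latest version of Solr, no lower than 4.0, from the official Solr site (http://lucene.apache.org/solr/); at the time of writing this is 4.0.0-BETA. Upload it to the /usr/local directory on all three servers, extract it, and go into the directory apache-solr-4.0.0-BETA/example/webapps. Extract solr.war into a directory named solr, then copy the whole solr directory to /home/program (another directory works too) on each of the three servers.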

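Step 2

Upload the solr directory under apache-solr-4.0.0-BETA/example/, together with its contents, to the /home directory on all three servers. Then edit the solr.xml file in that solr directory. The changes on the three servers are as follows: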

218.241.108.84:

  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="master"
         hostPort="8080" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="." dataDir="/home/solr/collection1/data"/>
  </cores>

218.241.108.83:

  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="218.241.108.83"
         hostPort="8081" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="." dataDir="/home/solr/collection1/data"/>
  </cores>

218.241.108.82:

  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="218.241.108.82"
         hostPort="8081" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="." dataDir="/home/solr/collection1/data"/>
  </cores>

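Each configuration above defines one core named collection1. host is the address of the respective server (a hostname also works), hostPort is the port of the node (Tomcat), zkClientTimeout is the ZooKeeper client timeout, which defaults to 15 seconds (15000 ms), and dataDir is the path where the node stores its index.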
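The value ${zkClientTimeout:15000} reads as "use the system property zkClientTimeout if it is set, otherwise fall back to 15000". The sketch below imitates that lookup in plain Java to make the rule concrete. It is not Solr's own implementation: the class name, the regular expression, and the choice to substitute an empty string when neither a property nor a default exists are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; this is not Solr's own substitution code.
final class PlaceholderDemo {

    // Matches ${name} or ${name:default}.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    // Replaces each placeholder with the system property value, or with its default.
    static String resolve(String value) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Simplification: an absent default becomes an empty string.
            String fallback = m.group(2) == null ? "" : m.group(2);
            String resolved = System.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Without -DzkClientTimeout=... on the command line, the default applies.
        System.out.println(resolve("${zkClientTimeout:15000}"));
        // Setting the property overrides the default.
        System.setProperty("zkClientTimeout", "30000");
        System.out.println(resolve("${zkClientTimeout:15000}"));
    }
}
```

In practice this means the timeout can be changed per node with a -DzkClientTimeout=... JVM argument, without editing solr.xml.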

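Step 3

On each server, edit Tomcat's conf/server.xml: set the Tomcat port on the three servers to 8080, 8081, and 8081 respectively, and add the following line inside the <Host> element:

<Context path="/solr" docBase="/home/program/solr" />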

Step 4

On each of the three servers, go to the /home/program/solr/WEB-INF directory, edit web.xml, and uncomment the following block:

    <env-entry>
       <env-entry-name>solr/home</env-entry-name>
       <env-entry-value>/put/your/solr/home/here</env-entry-value>
       <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>

Then change it to:

    <env-entry>
       <env-entry-name>solr/home</env-entry-name>
       <env-entry-value>/home/solr</env-entry-value>
       <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>

Step 5

On 218.241.108.84:

Go to /usr/local/apache-tomcat-7.0.27/bin/ and create a script file named solrCloud.sh with the following content:

#!/bin/sh
java -Xms1024M -Xmx1024M -Dbootstrap_confdir=/home/solr/collection1/conf \
  -Dcollection.configName=clusterconf -DzkRun -DnumShards=2 -Djava.ext.dirs=./ -jar bootstrap.jar &

On 218.241.108.83 and 218.241.108.82:

Go to /usr/local/apache-tomcat-7.0.27/bin/ and create a script file named solrCloud.sh with the following content:

#!/bin/sh
java -Xms1024M -Xmx1024M -DzkHost=218.241.108.84:9080 -Djava.ext.dirs=./ -jar bootstrap.jar &

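As the script above shows, the ZooKeeper port is 9080. This is the default: the embedded ZooKeeper listens on the node's port plus 1000. ZooKeeper is started on 218.241.108.84, whose node port is 8080, so the ZooKeeper port is 9080.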
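The port arithmetic is simple enough to check in code. The following sketch derives the -DzkHost value from a node's host and port using the plus-1000 rule described above; the class and method names are hypothetical helpers, not anything provided by Solr.

```java
// Hypothetical helper for illustration; not part of Solr.
final class ZkAddress {

    // Offset between a node's port and its embedded ZooKeeper port, per the rule above.
    static final int ZK_PORT_OFFSET = 1000;

    // Builds the host:port string that the other nodes pass via -DzkHost.
    static String embeddedZkHost(String host, int nodePort) {
        return host + ":" + (nodePort + ZK_PORT_OFFSET);
    }

    public static void main(String[] args) {
        // The node on 218.241.108.84 listens on 8080, so ZooKeeper is on 9080.
        System.out.println(embeddedZkHost("218.241.108.84", 8080));
    }
}
```

If the Tomcat port on the ZooKeeper host is ever changed, the -DzkHost value in the other two scripts has to change with it.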

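3.     Running

First start the service on 218.241.108.84: go to /usr/local/apache-tomcat-7.0.27/bin/ and run solrCloud.sh. At first it may report errors about not finding the other nodes, because they are not running yet. Then run solrCloud.sh on 218.241.108.83 and 218.241.108.82 in turn. Once they are running, the log output shows messages such as cluster state updates. If there are no errors, open the following address in a browser:

http://218.241.108.84:8080/solr/

Note the Cloud -> Graph link on the left. Click it to open the cluster graph.

On the right you can clearly see that one core (collection1) contains two shards, along with the nodes in each shard. Gray nodes are unavailable, the node highlighted in green is the Leader, and brown nodes are Replicas.

You can then click a node's link; the page it opens looks the same as the one above. At this point we know that the SolrCloud cluster has been installed successfully!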

We can also search whatever is currently in the index by opening the following URL in a browser:

http://218.241.108.84:8080/solr/collection1/select?q=*:*
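When the same select URL is built from a program instead of typed into a browser, the query string should be URL-encoded. Below is a minimal sketch using only the JDK; the class and method names are hypothetical, and only the q parameter is handled.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; it only assembles the URL and sends nothing.
final class SelectUrl {

    // Appends /select?q=<encoded query> to a core URL.
    static String selectUrl(String coreUrl, String query) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The colon in *:* is percent-encoded as %3A; the asterisks are left as-is.
        System.out.println(selectUrl("http://218.241.108.84:8080/solr/collection1", "*:*"));
    }
}
```

Any of the three nodes can be used as the base URL, since each node can answer a query for the whole collection.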

Of course, the index is empty at first. To see results quickly, you can load some of the sample data bundled with Solr. Go to the Solr directory on each server, for example:

cd exampledocs
java -Durl=http://localhost:8080/solr/collection1/update -jar post.jar ipod_video.xml
java -Durl=http://localhost:8080/solr/collection1/update -jar post.jar monitor.xml
java -Durl=http://localhost:8080/solr/collection1/update -jar post.jar mem.xml

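4.     Usage

Here is a rough outline of how to call Solr's API to work with the distributed search system.

The Solr libraries include the org.apache.solr.client.solrj package (the SolrJ client). In most cases, calling the methods in this package is all we need to use Solr.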

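Ø   Adding to the index

         import java.io.IOException;
         import java.util.Random;
         import org.apache.solr.client.solrj.SolrServerException;
         import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
         import org.apache.solr.client.solrj.request.UpdateRequest;
         import org.apache.solr.common.SolrInputDocument;

         private static void add() throws SolrServerException, IOException {
                   String url1 = "http://218.241.108.84:8080/solr/collection1";
                   String url2 = "http://218.241.108.83:8081/solr/collection1";
                   String url3 = "http://218.241.108.82:8081/solr/collection1";

                   String[] urls = new String[]{url1, url2, url3};
                   // Client that load-balances requests across the three nodes.
                   LBHttpSolrServer lbHttpSolrServer = new LBHttpSolrServer(urls);
                   for (int i = 0; i < 25; i++) {
                            // One update request per document.
                            UpdateRequest up = new UpdateRequest();
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.setField("id", i);
                            doc.setField("name", new Random().nextInt(1000));
                            up.add(doc);
                            up.process(lbHttpSolrServer);
                            up.clear();
                   }
                   // Commit the additions so they become searchable.
                   lbHttpSolrServer.optimize();
         }

Above, I added 25 records with the indexed fields id and name (both are already configured by default in schema.xml). The index is then created on all three servers.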
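As far as I understand it, SolrCloud decides which shard a document belongs to by hashing its id, which is why documents sent to any node end up spread across the shards. The sketch below only illustrates that idea. It is not Solr's routing code: Solr uses its own hash function and per-shard hash ranges, whereas String.hashCode with a modulo is a stand-in chosen here, and all names are hypothetical.

```java
// Illustrative sketch of hash-based routing; NOT Solr's actual algorithm.
final class ShardRoutingSketch {

    // Stand-in rule: hash the id and map it onto one of numShards shards.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 2;
        int[] counts = new int[numShards];
        // The same 25 ids (0 to 24) that the add() example uses.
        for (int i = 0; i < 25; i++) {
            counts[shardFor(String.valueOf(i), numShards)]++;
        }
        for (int shard = 0; shard < numShards; shard++) {
            System.out.println("shard " + shard + ": " + counts[shard] + " documents");
        }
    }
}
```

The point is that the mapping is deterministic: the same id always lands on the same shard, whichever node receives the update.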

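Ø   Querying the index

         import java.net.MalformedURLException;
         import org.apache.solr.client.solrj.SolrQuery;
         import org.apache.solr.client.solrj.SolrServerException;
         import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
         import org.apache.solr.client.solrj.request.QueryRequest;
         import org.apache.solr.common.SolrDocument;
         import org.apache.solr.common.SolrDocumentList;

         private static void query() throws MalformedURLException, SolrServerException {
                   SolrQuery q = new SolrQuery();
//               q.setQuery("test");
                   // Match every document.
                   q.setQuery("*:*");
                   String url1 = "http://218.241.108.84:8080/solr/collection1";
                   String url2 = "http://218.241.108.83:8081/solr/collection1";
                   String url3 = "http://218.241.108.82:8081/solr/collection1";

                   String[] urls = new String[]{url1, url2, url3};
                   LBHttpSolrServer lbHttpSolrServer = new LBHttpSolrServer(urls);
                   QueryRequest r = new QueryRequest(q);
                   SolrDocumentList docs = r.process(lbHttpSolrServer).getResults();
                   for (SolrDocument d : docs) {
                            // Read the stored fields of each hit.
                            String id = (String) d.get("id");
                            String name = (String) d.get("name");
                            System.out.println(id + " " + name);
                   }
         }

The code above searches across the three servers, and load balancing happens automatically during the search. If the service at 218.241.108.82:8081 goes down, the search results are unaffected. But if 218.241.108.83 goes down, the cluster's search service becomes unavailable, because 218.241.108.83 is the only node in shard 1.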
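To make the load-balancing idea concrete, here is a minimal round-robin sketch that skips servers marked as down. It only illustrates the rotation concept behind a client such as LBHttpSolrServer and is not its implementation; among other things, a real client detects failures itself, whereas here servers are marked down by hand. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative round-robin picker; not the LBHttpSolrServer implementation.
final class RoundRobinSketch {
    private final List<String> servers;
    private final Set<String> down = new HashSet<String>();
    private int next = 0;

    RoundRobinSketch(String... urls) {
        this.servers = Arrays.asList(urls);
    }

    void markDown(String url) {
        down.add(url);
    }

    void markUp(String url) {
        down.remove(url);
    }

    // Returns the next live server in rotation, or null when every server is down.
    String pick() {
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get(next);
            next = (next + 1) % servers.size();
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch(
                "http://218.241.108.84:8080/solr/collection1",
                "http://218.241.108.83:8081/solr/collection1",
                "http://218.241.108.82:8081/solr/collection1");
        lb.markDown("http://218.241.108.82:8081/solr/collection1");
        // Requests now alternate between the two remaining servers.
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick());
        }
    }
}
```

Client-side rotation only chooses which node receives the request. Whether that node can answer still depends on every shard having a live node, as described above.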

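Ø   Deleting from the index

         import java.io.IOException;
         import org.apache.solr.client.solrj.SolrServerException;
         import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
         import org.apache.solr.client.solrj.request.UpdateRequest;

         private static void delete(String id) throws SolrServerException, IOException {
                   String url1 = "http://218.241.108.84:8080/solr/collection1";
                   String url2 = "http://218.241.108.83:8081/solr/collection1";
                   String url3 = "http://218.241.108.82:8081/solr/collection1";
                   String[] urls = new String[]{url1, url2, url3};
                   LBHttpSolrServer lbHttpSolrServer = new LBHttpSolrServer(urls);
                   UpdateRequest up = new UpdateRequest();
                   // Delete the document whose unique key equals id.
                   up.deleteById(id);
                   up.process(lbHttpSolrServer);
                   up.clear();
                   // Commit the deletion so it takes effect.
                   lbHttpSolrServer.optimize();
         }

The example above only deletes a single document by id. You can also delete by other criteria, such as a query. After the delete runs, the document is removed from the index on all nodes.

The call lbHttpSolrServer.optimize(); in the examples above makes the index changes take effect immediately. optimize() commits the pending changes and also merges index segments; when you only need the changes to become visible, commit() is the lighter call.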

5.     References

SolrCloud: http://wiki.apache.org/solr/SolrCloud/


Unless otherwise noted, all articles on this site are original. When reposting, please credit the source: 子猴博客


Author: 子猴
