openSUSE package index and search
I've been busy working on a new implementation of our beloved "webpin", the service for searching packages across the insane number of repositories we have: in the distribution, in all openSUSE Build Service repositories, and on Packman.
The thing is, it's a bit dated now, and its features are limited by the fact that it uses a relational database to perform search operations. I've been digging into Apache Solr quite a bit over the last few months (did I already mention that it totally rocks? :)) and I thought... hmm... why not use that for indexing packages and repositories?
So I started out on a quick prototype, to see how well it suits the job and how well it performs. The results are quite stunning, to say the least. In terms of performance, results take just a couple of milliseconds on a search index that includes openSUSE 11.1, 11.2 and 11.3, all non-home: repositories in the OBS, as well as Packman for 11.1, 11.2 and 11.3... that's quite a lot. The quality of the results is just as good, but that is hardly a surprise, as Solr really excels at that. It's what it has been specifically designed and implemented for, after all.
So there it is: it's already completely functional, and consists of a Solr schema definition plus a bunch of Perl scripts to crawl, index, verify and query.
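To give a rough idea of what the query side boils down to, here is a minimal sketch in Python (not the actual Perl scripts, which aren't shown here). It only builds the URL for Solr's standard /select handler. The host, the core name "packages" and the "distribution" field are assumptions made up for illustration; the real schema may well look different.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and core name; the post does not give either.
SOLR_SELECT_URL = "http://localhost:8983/solr/packages/select"


def build_search_url(term, distribution=None, rows=10):
    """Build a URL for Solr's /select handler.

    `term` is passed through as the main query (q), so it is matched
    against whatever default search field the schema defines.
    `distribution` is a hypothetical field name used as a filter query.
    """
    params = [("q", term), ("rows", str(rows)), ("wt", "json")]
    if distribution:
        # fq restricts the result set without affecting relevance scoring.
        params.append(("fq", f'distribution:"{distribution}"'))
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("vim", distribution="openSUSE 11.3"))
```

Note that the sketch does not escape Solr query syntax characters in the search term; a real client would have to take care of that.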
The next items on the TODO list are as follows:
- implement a REST API (send an HTTP GET, receive XML in return... and/or JSON, or whatever else, but definitely XML) that is compatible with the current webpin service, the current command-line client, as well as the client in YaST2
- implement a new web user interface
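The "XML and/or JSON" part of the REST API item could look something like the following Python sketch: one function that renders the same list of search results in either format. The element names (packages, package) and the result fields are purely hypothetical; they are NOT the actual webpin response format, which a compatible API would have to reproduce exactly.

```python
import json
import xml.etree.ElementTree as ET


def render_results(packages, fmt="xml"):
    """Serialize a list of result dicts as XML (default) or JSON.

    The XML layout used here is a made-up illustration, not the
    format the existing webpin clients expect.
    """
    if fmt == "json":
        return json.dumps({"packages": packages})
    root = ET.Element("packages")
    for pkg in packages:
        node = ET.SubElement(root, "package")
        # One child element per result field, e.g. <name>vim</name>.
        for key, value in pkg.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    results = [{"name": "vim", "version": "7.2"}]
    print(render_results(results))
    print(render_results(results, fmt="json"))
```

Keeping the serialization separate from the search itself means further output formats can be added later without touching the query code.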
3 Comments:
Hi Pascal
If you want a JSON interface to a Lucene-based full-text search (along with a whole lot of other stuff), check out ElasticSearch.
And if you're using Perl, then have a look at the Perl API I've written: ElasticSearch.pm v 0.16. (ElasticSearch.pm v 0.18 has just been released, but is still being mirrored by CPAN.)
clint
But what's the point of using ElasticSearch as opposed to Solr?
(And there's the nice WebService::Solr Perl module as well ;))
I've read the marketing blurb, but I don't see anything that would make me consider using ElasticSearch instead of Solr (but ElasticSearch definitely looks like a neat piece of software, don't get me wrong). Faceted search? And I already know Solr quite well ;)
How big is the index after you have indexed all the above?