  1. #1
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Fulltext Search Engines: Sphinx, Lucene and mySQL
    So I've started to push the limits of the mySQL fulltext search. It's just too slow for the number of products I'm working with. I've been looking into alternatives and the two that I'm interested in are Lucene and Sphinx. Now it seems Sphinx might be the easier choice because it has built-in functionality to import a mySQL table. Plus it has its own SQL-like language. I haven't gotten into it enough yet, but it seems like I might even be able to access a Sphinx-based table through mySQL. What I'd really like to do is access a Sphinx table via a fulltext search while joining it to a MyISAM table in mySQL. Is this at all possible? It doesn't seem like this would even be an option for Lucene since it doesn't tie into mySQL at all, in which case I'd have to build an export script that builds a larger index for Lucene to work with. Any thoughts or experiences?

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  2. #2
    ABW Ambassador sjangro's Avatar
    Join Date
    January 18th, 2005
    Location
    Boston
    Posts
    1,529
    Scott,

    All my experience is with Swish-e, which has turned out to be a fantastic alternative to mySQL fulltext searching. I found a PHP class that interfaces with Swish-e and I hacked it up to serve my purposes. I update the index daily after all the product datafeeds load up.

    In addition to the fulltext capabilities, the attributes (like price, color) are really great for filtering results.
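
As a rough illustration of attribute filtering, here is a minimal Python sketch. It is not Swish-e code: the `Product` class, the `search` function, and the plain substring match are all hypothetical stand-ins for whatever the real index does. It only shows the idea of narrowing text matches by stored attributes such as price and color.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical record: a text field plus filterable attributes.
    name: str
    price: float
    color: str


def search(products, keyword, max_price=None, color=None):
    """Match the keyword in the name, then narrow by optional attributes."""
    # Stand-in for the fulltext step: case-insensitive substring match.
    hits = [p for p in products if keyword.lower() in p.name.lower()]
    # Attribute filters are applied only when the caller supplies them.
    if max_price is not None:
        hits = [p for p in hits if p.price <= max_price]
    if color is not None:
        hits = [p for p in hits if p.color == color]
    return hits
```

A call such as `search(catalog, "shoes", max_price=60)` would return only the matching products priced at 60 or less.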

    This goes back a few years, before Sphinx appeared. If I had to do it all over again, I think I would start with Sphinx for all the reasons you mention. Unfortunately, I have no experience with it.

    FWIW, we used to use Lucene at Be Free for product searching in reporting.net and it's a great product, but I shied away from it immediately for my own stuff because it's Java/Tomcat.

  3. #3
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Thanks for the feedback, Scott. It's good to hear that you would pick Sphinx, as that's where I'm leaning. I was also turned off by the fact that Lucene runs on Java, although I've heard great things about its speed. When it comes down to it, anything is better than mySQL, but the interfacing and indexing need to be fluid. Apparently SphinxQL doesn't support joins yet, but it sounds like they'll get them in there, which would be fantastic. I'm imagining a database that works much like mySQL in terms of usability, with the speed of a dedicated fulltext search engine.

    - Scott

  4. #4
    Affiliate Manager cbsturg's Avatar
    Join Date
    January 24th, 2007
    Location
    Lima OH
    Posts
    753
    Not sure what your underlying structure is, but I'm using Xapian on a Rails app I've recently developed. It supports joins across tables, etc. The only potential negative is that it doesn't update its index automatically (on record insert / delete); reindexing is run from cron instead. I actually don't mind that so much, as I would prefer a user not have to sit through the time needed to reindex (especially if that reindexing fails or hangs for whatever reason).

    It also supports Google-esque search parameters that you can define. For example, on my site I defined that by:[username] should limit search results to those submitted by the specified user. "Did you mean?" and related search results are also supported.
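
To make the by:[username] idea concrete, here is a minimal plain-Python sketch of pulling such a prefix out of a raw query string. It is only an illustration and does not show how Xapian or acts_as_xapian handle prefixes internally; the `split_query` name and the regular expression are assumptions made for this example.

```python
import re

# Hypothetical operator, modeled on the by:[username] example above.
# \b keeps words that merely end in "by" (e.g. "standby:") from matching.
BY_PREFIX = re.compile(r"\bby:(\S+)")


def split_query(raw):
    """Return (free_text, username); username is None without a by: term."""
    match = BY_PREFIX.search(raw)
    # Only the first by: term is used as the filter value.
    username = match.group(1) if match else None
    # Every by: term is removed from the text, then whitespace is collapsed.
    free_text = " ".join(BY_PREFIX.sub(" ", raw).split())
    return free_text, username
```

The free text would then go to the fulltext engine, while the username becomes a filter on who submitted the item.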

    My app is just about to launch publicly, so I haven't raked it over the coals, so to speak, but scalability is supposed to be a feature. I'm also pretty confident you can host your search index across multiple servers cluster style to divvy up load.

    Here's a link to some information about the specific plugin I'm using (called acts_as_xapian). Even if you aren't using Rails, it will give you some good general info on Xapian with some links to external sources.

    http://locomotivation.squeejee.com/p...h-using-xapian

    I'm using this on wiredortired.com, if you want to test the search engine in action. Here's a sample search query if you are interested.
    Chris Sturgill
    "All my life I've had one dream, to achieve my many goals." - H. Simpson

  5. #5
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Thanks Chris, I hadn't come across Xapian before. I just looked into it a little and it seems comparable to Sphinx and Lucene. Do you know if it'll interface with mySQL or if it has its own SQL-like syntax? These are the things that have caught my attention with Sphinx.

    - Scott

  6. #6
    Affiliate Manager cbsturg's Avatar
    Join Date
    January 24th, 2007
    Location
    Lima OH
    Posts
    753
    What language are you working with? I'm assuming from previous conversations that you're a PHP kinda' guy. I know that Xapian builds its own indexes in its own structure, and then queries are called against that. My personal experience with Xapian is from the Rails end, and the plugin I am using uses the same syntax as any other database call. But I haven't broken into the fundamentals to see how the framework is converting those calls, and I haven't had to call them directly.

    That said, I'd be more than willing to poke around a bit. I've got a contact who's pushing 30K+ searches a month through Lucene and he's taking a real hard look at Xapian. Are you looking to run the typical LAMP stack?
    Chris Sturgill

  7. #7
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Yes, I'm using PHP. Interesting point about Lucene as I've got a friend who uses it quite successfully with tens of millions of objects. Yes, I'm using a typical LAMP environment. My goal here is to minimize the changes I need to make to my code and hopefully even query the engine through my current mySQL library calls. I could be dreaming though.

    - Scott

  8. #8
    Affiliate Manager cbsturg's Avatar
    Join Date
    January 24th, 2007
    Location
    Lima OH
    Posts
    753
    Lucene has worked well for him, but he's running Java servers solely for search. It's not that Lucene is a bad option, just the weight of Java seems a bit unnecessary.

    Here's a link outlining PHP and Xapian: http://xapian.org/docs/bindings/php/
    Chris Sturgill

  9. #9
    Affiliate Manager cbsturg's Avatar
    Join Date
    January 24th, 2007
    Location
    Lima OH
    Posts
    753
    It could realistically be that Sphinx is your best bet for what you want to do. I ran across Xapian and found a plugin that allowed me to have it implemented within 10 minutes. Seriously, a full site search with custom search parameters (à la Google) deployed in 10 minutes. So I thought I'd spread the good word...
    Chris Sturgill

  10. #10
    Full Member
    Join Date
    January 18th, 2005
    Posts
    396
    More for my education than any other reason: Scott, when you say "mySQL fulltext search. It's just too slow", what performance are you seeing and what do you want?

  11. #11
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by micheck
    More for my education than any other reason: Scott, when you say "mySQL fulltext search. It's just too slow", what performance are you seeing and what do you want?
    I execute thousands of fulltext searches every day and when I get hit with a lot at the same time mySQL can sometimes get backed up and queries get delayed quite a bit. I'm really just trying to avoid this backup so users visiting my site don't get long load times. mySQL performs very poorly on fulltext searches, so an alternative like we've been discussing is essential for large-scale search engines.

    - Scott

  12. #12
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    So I've finally gotten around to this and compiled Sphinx onto my local box. I'm very pleased with the mySQL integration and can't wait to push it out to my live servers. What's great about it is I can query the fulltext engine while joining the results to my normal MyISAM tables. It's going to require minimal code changes as I just have to tweak my SQL queries slightly. What I'm planning to do is replicate the MyISAM version of my search table onto a second server and create a Sphinx index there multiple times a day. That way generating the index doesn't impact my live server.
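
For anyone curious what such a join can look like, here is a minimal Python sketch that builds the SQL. It assumes the integration in question is the SphinxSE storage engine, where a search is expressed as a condition on a special `query` column of an ENGINE=SPHINX table. The table names `products` and `sphinx_search`, the shared `id` column, and both function names are hypothetical.

```python
def sphinx_query_string(terms, mode="any", limit=20):
    """Build the value compared against the SphinxSE `query` column."""
    # Options are separated by semicolons, so semicolons in user input
    # are replaced with spaces (a cautious assumption, not official escaping).
    cleaned = " ".join(terms.replace(";", " ").split())
    return f"{cleaned};mode={mode};limit={limit}"


def build_join_query(data_table="products", search_table="sphinx_search"):
    """Return SQL joining a regular table to a SphinxSE search table."""
    # Table names are hypothetical and fixed in code, never user input.
    # The search string is passed separately via the %s placeholder.
    return (
        f"SELECT p.* FROM {data_table} AS p "
        f"JOIN {search_table} AS s ON p.id = s.id "
        "WHERE s.query = %s"
    )
```

With a typical Python mySQL driver, this would be run along the lines of `cursor.execute(build_join_query(), (sphinx_query_string("red shoes"),))`.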

    Pretty exciting stuff and I expect the overall speed of my server to increase quite a bit, mostly because I can remove the fulltext index from my original MyISAM table. This should have a huge impact on the speed of my datafeed imports which have been impacting the speed of my front end as well.

    - Scott

  13. #13
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    I just deployed Sphinx the other day and I'm quite pleased with the results. My pages are substantially faster and everything just feels snappier all around. Indexing takes less than a minute and most queries take 0.1 to 0.4 seconds. My data set may not be *huge*, but it's certainly enough to push the limits of mySQL. This was definitely a learning experience though, especially when it came to compiling the mySQL plugin.

    - Scott
