Results 1 to 15 of 15
  1. #1
    Full Member
    Join Date
    January 18th, 2005
    Posts
    290
    Is there a way to do a more advanced search with PHP other than the Match function of MySql. Let's say I want to search for "Reebok shoes", Is there a way to show only exact matches and not show results like Reebok jackets, etc.?

  2. #2
    Affiliate Manager Allen Nance's Avatar
    Join Date
    January 18th, 2005
    Location
    Colorado River, Bullhead City AZ
    Posts
    1,604
    Now that's a good question WLY.

    Pooper

  3. #3
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Why only show exact matches? You want to show partial matches, but only after the exact matches. Use a mySQL fulltext search to do this. It's an amazingly effective and produces some excellent results. If you want what you're asking for just split the phrase by word and do a where query like this:

    where
    name like "%$word1%"
    and name like "%$word2%"

    etc.

    Good luck!
    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  4. #4
    Full Member
    Join Date
    January 18th, 2005
    Posts
    290
    Thanks Snib for your reply. I'm currently using the fulltext index and it works great. But there is a problem with relevancy. For example if I do a search for "coffee maker", I would also get results for shoes that are coffee in color?? I wonder how to filter these results?

  5. #5
    Animal Lover
    Join Date
    January 18th, 2005
    Location
    oz
    Posts
    1,210
    If you do a search with the AND function you shouldn't get the shoes results because after all, you are searching for result with BOTH coffee AND maker. Unless your shoes results has 'maker' in it as well...

    Oscar
    My DataFeed Scripts - php datafeed scripts for your site
    Shareasale datafeed scripts - to display Shareasale datafeeds
    Linkshare datafeed scripts - for multiple Linkshare merchants

  6. #6
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Are you also searching the product descriptions? You might want to limit what you're searching because the more text you search for each item, the crazier your results will get. Also consider throwing each product's category into the fulltext index. That way the coffee maker will appear first because it'll be in the coffee makers category and the name of the product will give you a match resulting in double relevency. Don't worry about the results after the first 5 or 6 products because every search engine is like this. It also helps show people how many coffee makers you have because once they start seeing coffee colored shoes they realize that's the end of the line.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  7. #7
    Moderator MichaelColey's Avatar
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
    MATCH() returns a number that reflects how good of a match it is. You might want to only display the results with the best matches. For instance, if the top result has a relevancy rating of 3.712, you might want to only display results with a relevancy rating of 0.37 or higher. You'll have to experiment to find the right cutoff.

    Another thing that might help is to do a separate MATCH() for the product name and the product description, combine the results, and weight the name higher than the description. You'll still want to eliminate the results that aren't as good.

    Another way to continue to improve the relevancy is the track all your searches and keep track of the CTR from the results page. Adjust those items that get the best CTR for a keyword to rank higher and those with a low CTR to rank lower. You need quite a bit of search results volume to do this, but it can be quite effective.

    Of course most of this will require programming, and not just a simple SQL statement.
    Michael Coley
    Amazing-Bargains.com
     Affiliate Tips | Merchant Best Practices | Affiliate Friendly? | Couponing | CPA Networks? | ABW Tips | Activating Affiliates
    "Education is the most powerful weapon which you can use to change the world." Nelson Mandela

  8. #8
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Some nice ideas there Michael. I was thinking about playing with the relevency rating myself. I do track clicks and know which items are clicked more often than others. I just had an idea of doing an order by relevency + click % against the entire result set. That way the results aren't only ordered by relevency, but the click stats are tied in. I'll have to give that a shot.

    Don't forget boolean matching. That really helps by adding "exact match" -remove +include queries. Since I added that I'm seeing more and more people use these special commands. Very helpful!!

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  9. #9
    Newbie Affiliate Ian's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,540
    Interesting info. And exactly why I am still running an old cgi script for my search.

  10. #10
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    2,419
    I did mine the old fashioned way usign a t_word, t_word_prod and t_products table. The design takes up much more space and lots of coding but was much faster in my testing when you're working with a very large number of products.

    It also allows the ability to handle plurals vs non plurals easily as well the ability to add in synonym processing such as specifying things like autos, automobiles and cars are all considered and joined in the results.

    Do a search using Women Womens Women's ladies lady along with skirts and see if you get the same results - I do .

    This feature alone was worth me doing the design and extra coding myself but the real decision to do this was based on speed.

    Go add a couple million products in your db and do a search and it's most likely slow as molases! My performance is excellent in most cases although there are a couple of areas with words bloated across so many products it can be slower on them. Overall, I'm very pleased with my results. Over 200 datafeeds and the processing fully automated to update them along with a kick butt SE makes me a happypoon.

  11. #11
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    happy,

    I'm interested in your method but I'm not sure what you mean with your t_word, t_word_prod and t_products tables. What is the function of each table? What you're doing sounds nice and I'd definitely like to accomodate several million products.

    Thanks!
    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  12. #12
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    2,419
    *) Each product is put in t_products and assigned an uniqueid.
    *) Each word associated with the product is added to t_word "if needed" and assigned a uniqueid
    *) t_word_prod maps the words that are associated to a product.

    So...
    t_word ends up being a unique listing of all words in all of your products.

    t_word_prod gets HUGE - 1 row for every word in every product IE: If you associate an average of 40 words per product on a 2 million products, you end up with 80 million rows here. Note: I do not necessarily associate an entire description along with a product. Each merchant I process (t_merchant table contains merchant info) specifies where connection info for downloading the feed as well as data such as last_download_size and last_mod_date (only need to process datafeeds that change by adding this) as well as switches that specify what makes up "a product". I can optionally add, keywords, categories, subcategories or parts of the description to end up with what I call a product. This varys per merchant depending on how good their datafeed is and I define this info after a manual review of the feed and adjust as I see crap results coming back in test searches. My biggest pet peeve against datafeeds is where merchants have multiple level categories such as
    ACE HARDWARE HARDWARE/BUILD MATERIAl/MAILBOXES/WEATHERVANES

    If you add these kinds of categories untouched in your datafeed, you will get weathervanes showing up on a search for mailboxes or vice versa. ARRRGH! I handle this by defining the rules "and implemented the coding" that allows me to either not add category data, alter a category either fully replacing it or doing a partial relacement and can even specify just bypass the products in categoryX all together.
    It's time consuming but provides me a great deal of flexibility and most importantly puts me in control of the datafeeds and not the other way around!

    Unfortunately many merchants datafeeds are "garbage" and one of the first DP acronyms I learned was GIGO - "Garbage In" = "Garbage Out" and this certainly applies to datafeeds and the results you get out of any search against them! Put in crap - you will get crap back!

    There is a ton of customized programming with my approach but I do believe it will be worth the efforts I've put into it in the long run.

    It ain't perfect by any means but for my first search engine project I'm happy with the results and will keep tweaking it as I go.

  13. #13
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Thanks for the in-depth breakdown happy. Now when you enter a search query and grab all the relevent rows from your t_word table, how do you order the results after associating the t_products table through the t_word_prod table. Also with 80 million rows, won't the t_word_prod table get a little slow? How did you handle that hurdle? Even with proper indexes I can imagine it would get slow.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  14. #14
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    2,419
    As long as you avoid doing multi table joins on the t_word_prod table, the result is amazingly fast. I do join this table but mostly on instances of itself.

    If I were to attempt a single sql statement to obtain the results (joing t_word, t_word_prod, t_product, t_Merchant, and a few others I use the results would be horrific.

    By getting the core data only "the product_ids" with a where condition that is indexed and no multi table joins is very fast (even on a huge table).

    Knowing where the pitfalls occur regarding performance is the key and reducing joins on huge tables is one key part of the formula.

    Even with me telling you what I have, be assured, the sql and the building of it is not "just as simple as whats stated".

    Ordering the results is a user supplied variable and can be by :
    1) Price - low to high
    2) Price - high to low
    3) Offer - high to low With this one, products for the merchant who offers the highest % cash back are listed first in a price low to price high fashion then the next highest merchant price low to high, etc....

  15. #15
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Excellent explanation happy!

    You couldn't have made it more clear. My only concern is searching the data and returning results based on relevency. It's simple to order by price and % off, but the relevency aspect of a fulltext index is unbeatable. I can see how it's possible to code it up yourself using your scheme, like setting more rows in t_word_prod when a word appears more than once in a product, but it would be cumbersome.

    Good point about joining large tables. Sometimes it's tempting to put everything into a single query, but it's definitely not the most efficient way to go sometimes. Good advice!

    Thanks,
    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  16. Newsletter Signup

+ Reply to Thread

Similar Threads

  1. php search and replace
    By dflsports in forum Programming / Datafeeds / Tools
    Replies: 2
    Last Post: February 23rd, 2006, 10:59 AM
  2. PHP search Box
    By dak142 in forum Programming / Datafeeds / Tools
    Replies: 1
    Last Post: May 7th, 2004, 05:25 AM
  3. php and search bots
    By awstewar in forum Programming / Datafeeds / Tools
    Replies: 4
    Last Post: February 7th, 2004, 05:33 PM
  4. PHP search function help
    By Affiliate Ian in forum Programming / Datafeeds / Tools
    Replies: 8
    Last Post: December 16th, 2003, 09:27 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •