  1. #1
    ABW Ambassador CrazyGuy
    Join Date
    January 18th, 2005
    Posts
    1,463
    I had an emotionally charged few hours last night (UK time) ...

    Good news is that Googlebots and some Scooters (AltaVista crawlers) were crawling all over thousands of new pages on 2 new sites. One is a GoCollect site with page-per-product and amazon.pl content pulled into every page. The other is a more general site, but again with amazon.pl content on every page (1000s of them).

    In both cases the XML is pulled in with SSI to make it visible to the spiders.
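    For anyone not familiar with the setup, the SSI call is just an Apache-style include of the CGI script. A rough sketch, with a made-up query string (the real parameters depend on your amazon.pl configuration):

    <!--#include virtual="/cgi-bin/amazon.pl?search=collectibles" -->

    The server runs the script on every request and splices the output into the HTML, which is why each spider hit costs a full script run.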

    The bad news was two sysadmins at web hosting companies screaming blue murder, as this activity crucified the servers these sites are on.

    Fortunately, I was around, and the two incidents happened at slightly different times, so I was able to do something about each immediately - but these guys were seriously distraught, and the GoCollect site was shut down by one of them until I was ready to go in and fix it.

    I'm not really sure where to go with this.

    My guess is that the load is higher (x2?) the first time the pages are found, as the script has to process for the cache as well as for the browser - but won't I be in almost the same situation next month when the googlebots return?

    I think I can write things so the amazon.pl doesn't get run if it's a 'bot looking at the page - but a major reason for designing the pages this way was to provide spider food.

    Is there any difference in server load between SSI and script?

    One of the factors in this may be that I sometimes have more than one amazon.pl call in a page - for example, one search that is related to the page content but may often have no results, so I allow it to collapse and follow it with a forced, more general search. I guess that doubles the load, as 2 copies of amazon.pl are being run by that page. Maybe the search could have some conditionals - search for "this", and only search for "that" if "this" returns fewer than 3 results.
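    Something like this is the shape of what I mean, in Perl - run_search() below is a made-up stand-in for whatever amazon.pl actually does internally, not a real function in the script:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # run_search() is a hypothetical stand-in for amazon.pl's real
    # search routine -- it is NOT a function in the actual script.
    sub run_search {
        my ($terms) = @_;
        # ... fetch and parse Amazon XML results for $terms ...
        return ();    # pretend the specific search found nothing
    }

    my @results = run_search('page-specific terms');
    if (@results < 3) {
        # Specific search was too thin, so fall back to one broader
        # search instead of always running both.
        @results = run_search('more general terms');
    }
    print scalar(@results), " results to display\n";

    That way the second copy of amazon.pl would only run when the specific search comes up nearly empty.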

    I have found issues generally with Cusimano page-builder scripts and server resources. On most shared hosts I have to agree to run them around 4am, as the server loads go very high when they're running. Maybe there could be a "choke" factor in the page-builder scripts to slow them down?
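    By a "choke" I mean something as simple as a short pause every so many pages. A rough sketch in Perl - the loop here is invented to stand in for the real page-builder:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Invented page-builder loop, just to illustrate the choke idea.
    my $built = 0;
    foreach my $item (1 .. 1000) {       # stand-in for the product list
        # ... build and write one HTML page here ...
        $built++;
        sleep 1 if $built % 100 == 0;    # the choke: a 1-second nap per 100 pages
    }
    print "built $built pages\n";

    The run would take longer overall, but the load would be spread out instead of spiking.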

    I'm under more and more pressure to move to a dedicated server but I'm reluctant to take on the extra tech responsibility and to put all my eggs in one basket, so I'll explore every possible way to postpone the inevitable :D

    Are you Crazy?

  2. #2
    Crazy Cat Lady Heidi
    Join Date
    January 18th, 2005
    Location
    Rochester, NY
    Posts
    1,685
    I have noticed that the amazon.pl XML version is pushing my server resource usage up to unacceptable levels.

    I'm on my own server, so I don't get the backlash from a web host, but this could be a very serious issue for anyone on shared hosting.

    Heidi
    Fit2a-t - Make Money Selling T-Shirts From Your Site!!

  3. #3
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    2,402
    Crazy Guy...first, thanks for thinking of your host...I've been through something similar before and it can cause issues. One thing you may want to look at, either with your current host or a different one, is a VPS. It sits in the mid-range between a shared environment and a dedicated server. Resource limits are a little more tolerable, and heavy usage won't necessarily affect other clients on the server...but it will affect your own account. If you would like some recommendations, you can e-mail me and I can give you the names of some good ones.

    TH Media-Web Solutions For The Small Business
    Check Out The TH Media Affiliate Program

  4. #4
    ABW Ambassador cusimano
    Join Date
    January 18th, 2005
    Location
    Toronto, Canada
    Posts
    1,369
    Googlebot does spider amazon.pl XML content :) Unfortunately, Googlebot seems to like the "spider food" from amazon.pl XML so much that it eats up everything it can find. This situation applies to non-amazon.pl websites too.

    The amazon.pl XML script is not the problem. Any kind of "on-the-fly" dynamic script is going to have this problem.

    Amazon.com has observed that its XML Web Services are being heavily loaded with XML requests (from all applications that use their services) because of Googlebot. See this message at the Amazon.com XML discussion forum.

    We are looking into what can be done in amazon.pl XML to lessen the load when Googlebot stops by your website. If you are having server overload problems, we recommend that you set the pagesize configuration variable in amazon.ini to 10 in the meantime (the default is 25). If the load becomes extremely critical, then temporarily disable the script by removing execute permissions on the amazon.pl file (chmod 444).
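    In concrete terms, that means something like the following (the amazon.ini line assumes a simple name = value layout; check your copy of the file for the exact format):

    # in amazon.ini (assuming a name = value layout):
    pagesize = 10

    # at the shell, to take the script offline temporarily:
    chmod 444 amazon.pl

    # and to re-enable it later (typical CGI permissions):
    chmod 755 amazon.pl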

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  5. #5
    ABW Ambassador cusimano
    Join Date
    January 18th, 2005
    Location
    Toronto, Canada
    Posts
    1,369
    Quote:
    Is there any difference in server load between SSI and script?

    Both SSI and script cause an instance of amazon.pl to be run. The SSI version probably has marginally higher load (but negligible) because the server has to run the SSI statement and integrate the output into the HTML.

    Quote:
    I have found issues generally with Cusimano page-builder scripts and server resources. On most shared hosts I have to agree to run them around 4am, as the server loads go very high when they're running. Maybe there could be a "choke" factor in the page-builder scripts to slow them down?

    You're right. Keep in mind that some of the page-builder scripts that we offer build pages at a rate of about 500 to 1000 HTML pages per minute. That would cause a load increase ;) If you're on a shared server, we recommend that you run the script at low-load hours (e.g., midnight). Alternatively, you can run the script on your Windows computer at any time and then upload the created HTML files -- if you are an experienced user, you can zip the HTML files into a .zip file, upload that one .zip file to your server, and then unzip it on the server. We have considered a "choke" factor, but we are not sure how beneficial it would be, since the server's memory resources would be in use for an even longer period of time.
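    For the .zip route, the steps boil down to the standard Info-ZIP commands (the folder name here is a placeholder for wherever your pages are built):

    # on your local machine, after the page-builder finishes:
    zip -r pages.zip generated-html/

    # upload pages.zip via FTP, then on the server:
    unzip pages.zip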

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  6. #6
    Newbie
    Join Date
    January 18th, 2005
    Posts
    2
    Would it help to add Googlebot to one's robots.txt exclusion file?

    # Your Site 11/12/02 16:02:08
    # Robot Exclusion File -- robots.txt
    # Author: Your Name
    # Last Updated: ---timestamp---

    User-agent: Googlebot
    Disallow: /cgi-bin/

    Regards,

    Mark

  7. #7
    "An Englishman In New York" TJ's Avatar
    Join Date
    January 18th, 2005
    Posts
    3,282
    This is a good idea if you don't want Google to suck all of Amazon through your search box ;)

    I prefer to use

    User-agent: *
    Disallow: /cgi-bin/

    To exclude ALL bots from that folder :)

  8. #8
    ABW Ambassador CrazyGuy
    Join Date
    January 18th, 2005
    Posts
    1,463
    Mark - thanks, but I'm using SSI to pull the Amazon XML data into the page, so strictly speaking Googlebot isn't spidering anything in cgi-bin.

    What I think I'll have to do is make the SSI call a script that checks whether the user agent is Googlebot, and only go on to do the Amazon stuff if it's not.
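    Something along these lines - a rough, untested sketch, and the path to amazon.pl is a placeholder for wherever your copy lives:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Called by the SSI include in place of amazon.pl itself.
    my $ua = $ENV{HTTP_USER_AGENT} || '';

    if ($ua =~ /Googlebot/i) {
        # It's the spider: skip the expensive Amazon XML fetch and
        # emit a harmless placeholder instead.
        print "Content-type: text/html\n\n";
        print "<!-- amazon content suppressed for spiders -->\n";
    } else {
        # Real visitor: hand off to the usual script.
        exec '/path/to/cgi-bin/amazon.pl'
            or die "could not exec amazon.pl: $!";
    }

    The obvious trade-off is that the suppressed pages are exactly the spider food I wanted, so I may only do this for the heavier calls.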

    I have just bitten the bullet and signed up for a dedicated server, but I'm progressing slowly and gingerly in moving sites across, as it's a scary business :eek:

    Are you Crazy?

  9. #9
    ABW Ambassador cusimano
    Join Date
    January 18th, 2005
    Location
    Toronto, Canada
    Posts
    1,369
    If possible, I would not block /cgi-bin/amazon.pl -- that way spiders will spider more material from your website and thus increase the probability of your website appearing in search engine results.

    The next version of amazon.pl XML will momentarily turn off imageWH if it is a spider that is calling the amazon.pl XML script. This will reduce unnecessary traffic (and thus reduce load on the server). This will not solve the problem entirely but it will improve the situation somewhat (no difference if imageWH is set to no in your amazon.ini).

    For future versions, we are looking at other load-related and spider-related improvements.

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  10. #10
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    532
    Quote (originally posted by CrazyGuy):
    I have just bitten the bullet and signed up for a dedicated server, but I'm progressing slowly and gingerly in moving sites across, as it's a scary business

    ...that's my next step. I'm just waiting until I can get my monthly combined commissions average to the $250.00/month mark (approx.). I figure that amount should cover a pretty decent dedicated server AND my current hosting bill (spread over 3 separate hosts) until I can get everything switched over.

    CollectableCrazy.com {} CollectibleCrazy.com {} eTMAC.com {}
    ExchangeReview.com {} Seek360.com

  11. #11
    Full Member
    Join Date
    January 18th, 2005
    Posts
    222
    Can anyone recommend a host that is good for these scripts and has what they require? What package size is needed?

    Thanks.

  12. #12
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    532
    Quote (originally posted by reflections):
    Can anyone recommend a host that is good for these scripts and has what they require? What package size is needed?

    Thanks.

    My largest account (and the one I use for my Cusimano scripts) is a 2000MB account. After I upload the gocollect.pl pages (355MB) and some other datafeed pages, plus my own hand-coded pages that make up the rest of the site, I'll be pushing close to 1200MB of my space used. I also use the space for a couple of smaller sites (my account allows multiple domains to share the same space/bandwidth allotment)...so with the 1200MB +/- for the primary site and the secondary sites, my space is going to be pretty much shot.

    However, I can't recommend this host, because since their move to larger facilities/server farm there have been issues with the installed Perl modules. I had to run the amazon.pl script off another account on a different server because it requires the LWP::Simple module (I think that's the module it needed, anyway :) ).

    CollectableCrazy.com {} CollectibleCrazy.com {} eTMAC.com {}
    ExchangeReview.com {} Seek360.com

  13. #13
    ABW Ambassador webmarm
    Join Date
    January 18th, 2005
    Posts
    1,713
    I don't build sites from the whole-shebang datafeed for AllPosters or GoCollect. I've made add-ons for existing sites (themed posters added to mini-sites, etc.) and themed not-so-mini sites using large categories or selections (in the GoCollect case).

    I use Futurequest.net, and I've never had a problem with memory limits when running the scripts (just in case, I do try to run them in off-peak hours). Other hosts I've tried don't allow enough memory to run the scripts.

    I have run the scripts on a basic account (50MB), and I have one silver account (100MB) filled with the main domain and 5 IRMs, for a total of 6 domains of Cusimano-script-produced sites. There is a limit of 5 IRMs (added domains), and it costs a one-time fee of $25 per added domain. Some folks don't like that; personally, it doesn't bother me at all. Fastest FTP I've ever had, easiest control panel, and I can run the scripts without getting yelled at.

    Let me know if you want an affiliate link ;). But seriously, I am hosted at both FQ and THMedia, and I've tried a number of other places. I'm done shopping, as far as I'm concerned.

    - - - - -
    42. Yup, the answer to life, the universe, and everything.

  14. #14
    ABW Ambassador CrazyGuy
    Join Date
    January 18th, 2005
    Posts
    1,463
    Reflections - disk space requirements can vary dramatically according to your page design. A few K on each page times thousands of pages soon adds up - an extra 5KB per page across 10,000 pages, for example, is roughly 50MB. Using CSS and SSI can help keep this down.

    You'll need telnet or SSH access and certain Perl modules installed. These things are common but not universal.

    This isn't a blanket recommendation, but a host I've found useful for trying ideas out is addaction.com. For a $10 signup fee you get 6 months free; after that it's (from memory) $6.95/month. For an extra $1/month they'll upgrade your account to unlimited domains/subdomains, and you still don't pay for the first 6 months.

    The plan is 500MB - so you might squeeze 2 datafeed sites onto it - and is otherwise pretty fully featured. They were one of the hosts who (understandably) freaked at the Googlebot/Amazon XML spike, but they've otherwise been fine with me running scripts off-peak.

    The main advantage is that you can set a site up, get it into Google, and see how it performs before paying for hosting.

    Are you Crazy?
