  1. #1
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,205
    I'm tearing out my hair trying to figure out why my biggest content site isn't getting spidered. Not all the readable content is there, but the "geek list" part of it is: www.snlpeople.com

    It won't take down anything more than the "season list" pages, the privacy policy, and the links page. IOW, it only thinks that I have 33 pages. It won't eat anything more. Logs confirm that Googlebot (regular googlebot, mediabot's not scared) will not go beyond it. Doesn't even try.

    Any Ideas?

    Consistency is the key to a winning season.

  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Nunya, Business
    Posts
    23,684
    Don't know, I think MarkyMark can help you. Maybe a site map, Google loves those.

    "The successful man is the average man, focused."

  3. #3
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    So it's getting the season pages but not the people, movie, books and TV show pages? Is that right? Hmm... well. A site map is the first thing to do - Trust is right on that. Linked from every page.

    I'm spidering it right now to see what's going on. I think it's just a little afraid of the id= parameters, but is confident enough to spider the pages that are linked from every page (i.e. the season stuff). I'll report back in a min.

    Search Engine Positioning - 1 Design 4 Life

  4. #4
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    How many pages has this site got? I've spidered 781 so far and it's still going. I can't see what's going on until the spider finishes, but I can stop it if you've got fewer than that. Please let me know.

    Search Engine Positioning - 1 Design 4 Life

  5. #5
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    OK, I stopped it. Part of your problem is a number of internal server errors - here's a partial list.

    Referenced from season.asp?sid=10:
    people.asp?id=358 [500]: Server Error

    Referenced from season.asp?sid=13:
    people.asp?id=462 [500]: Server Error

    Referenced from season.asp?sid=16:
    people.asp?id=462 [500]: Server Error
    people.asp?id=572 [500]: Server Error

    Referenced from season.asp?sid=17:
    people.asp?id=607 [500]: Server Error
    people.asp?id=608 [500]: Server Error

    Referenced from season.asp?sid=18:
    people.asp?id=629 [500]: Server Error
    people.asp?id=462 [500]: Server Error
    people.asp?id=634 [500]: Server Error

    Referenced from season.asp?sid=19:
    people.asp?id=648 [500]: Server Error
    people.asp?id=653 [500]: Server Error

    Referenced from season.asp?sid=2:
    people.asp?id=21 [500]: Server Error
    people.asp?id=82 [500]: Server Error

    Referenced from season.asp?sid=20:
    people.asp?id=682 [500]: Server Error
    people.asp?id=687 [500]: Server Error
    people.asp?id=696 [500]: Server Error

    Referenced from season.asp?sid=21:
    people.asp?id=715 [500]: Server Error
    people.asp?id=718 [500]: Server Error
    people.asp?id=722 [500]: Server Error
    people.asp?id=723 [500]: Server Error
    people.asp?id=462 [500]: Server Error
    people.asp?id=733 [500]: Server Error
    people.asp?id=696 [500]: Server Error

    Referenced from season.asp?sid=22:
    people.asp?id=747 [500]: Server Error
    people.asp?id=723 [500]: Server Error
    people.asp?id=572 [500]: Server Error
    people.asp?id=687 [500]: Server Error
    people.asp?id=462 [500]: Server Error

    Referenced from season.asp?sid=23:
    people.asp?id=760 [500]: Server Error
    people.asp?id=762 [500]: Server Error
    people.asp?id=607 [500]: Server Error
    people.asp?id=783 [500]: Server Error
    people.asp?id=785 [500]: Server Error

    Referenced from season.asp?sid=24:
    people.asp?id=715 [500]: Server Error
    people.asp?id=682 [500]: Server Error
    people.asp?id=804 [500]: Server Error

    Referenced from season.asp?sid=25:
    people.asp?id=607 [500]: Server Error
    people.asp?id=462 [500]: Server Error
    people.asp?id=827 [500]: Server Error
    people.asp?id=747 [500]: Server Error
    people.asp?id=832 [500]: Server Error

    Referenced from season.asp?sid=26:
    people.asp?id=843 [500]: Server Error
    people.asp?id=851 [500]: Server Error
    people.asp?id=696 [500]: Server Error
    people.asp?id=861 [500]: Server Error
    people.asp?id=866 [500]: Server Error

    Referenced from season.asp?sid=27:
    people.asp?id=874 [500]: Server Error
    people.asp?id=880 [500]: Server Error
    people.asp?id=629 [500]: Server Error
    people.asp?id=886 [500]: Server Error
    people.asp?id=894 [500]: Server Error
    people.asp?id=895 [500]: Server Error
    people.asp?id=896 [500]: Server Error
    people.asp?id=900 [500]: Server Error

    Referenced from season.asp?sid=28:
    people.asp?id=851 [500]: Server Error
    people.asp?id=914 [500]: Server Error
    people.asp?id=916 [500]: Server Error
    people.asp?id=923 [500]: Server Error
    people.asp?id=926 [500]: Server Error

    Referenced from season.asp?sid=3:
    people.asp?id=110 [500]: Server Error

    Referenced from season.asp?sid=4:
    people.asp?id=155 [500]: Server Error

    Referenced from season.asp?sid=7:
    people.asp?id=247 [500]: Server Error
    people.asp?id=270 [500]: Server Error

    Referenced from season.asp?sid=8:
    people.asp?id=280 [500]: Server Error
    people.asp?id=294 [500]: Server Error
    people.asp?id=306 [500]: Server Error

    Referenced from season.asp?sid=9:
    people.asp?id=323 [500]: Server Error
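
    Several IDs repeat across seasons (462 shows up again and again), so the number of distinct broken pages is smaller than the list suggests. A minimal sketch of how a report like this could be boiled down, in Python purely for illustration; the report excerpt and the function name are assumptions, not the spider's real output format or API:

```python
import re
from collections import defaultdict

# Short excerpt in the same shape as the report above.
REPORT = """\
Referenced from season.asp?sid=16:
people.asp?id=462 [500]: Server Error
people.asp?id=572 [500]: Server Error

Referenced from season.asp?sid=18:
people.asp?id=629 [500]: Server Error
people.asp?id=462 [500]: Server Error
"""

# Matches lines such as "people.asp?id=462 [500]: Server Error".
ERROR_LINE = re.compile(r"^(\S+) \[(\d{3})\]")


def failing_urls(report):
    """Map each URL that returned a 500 to the set of pages linking to it."""
    referrers = defaultdict(set)
    current = None
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Referenced from "):
            # Remember which page the following error lines were found on.
            current = line[len("Referenced from "):].rstrip(":")
            continue
        match = ERROR_LINE.match(line)
        if match and match.group(2) == "500":
            referrers[match.group(1)].add(current)
    return referrers


broken = failing_urls(REPORT)
for url in sorted(broken):
    print(url, "<-", sorted(broken[url]))
```

    Fixing the distinct URLs once clears every season page that links to them.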

    The other part of the problem is the sheer number of pages combined with the fact that they're dynamic.

    You need three things - firstly, an ISAPI rewrite filter to rewrite all the URLs so they look static, i.e. people.asp?id=1 would become people.asp/id/1 . That will help Google get around the site. You can buy some good ISAPI rewrite filters on the web - just do a search. They're easy to install if you're running IIS.
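
    To make the mapping concrete, here is a rough sketch of the translation such a filter performs, written in Python only as an illustration (the real thing would be an ISAPI filter on IIS, configured with its own rule syntax). The function names and the pattern are made up for this example:

```python
import re
from urllib.parse import urlencode

# Static-looking form: <script>.asp/<parameter>/<number>, e.g. people.asp/id/1
STATIC_PATTERN = re.compile(
    r"^/?(?P<script>\w+\.asp)/(?P<key>\w+)/(?P<value>\d+)$"
)


def to_dynamic(path):
    """Turn 'people.asp/id/1' back into 'people.asp?id=1' for the ASP script."""
    match = STATIC_PATTERN.match(path)
    if match is None:
        return path  # not a rewritten URL, so pass it through untouched
    query = urlencode({match["key"]: match["value"]})
    return f"{match['script']}?{query}"


def to_static(script, key, value):
    """Build the static-looking link to print in the page HTML."""
    return f"{script}/{key}/{int(value)}"


print(to_static("people.asp", "id", 1))
print(to_dynamic("people.asp/id/1"))
```

    The pages would need to emit the static-looking form in their links, while the filter quietly hands the original query string to the script.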

    Secondly, you need a bunch of site maps (one ain't gonna do it) - one for each category, ie: season, people, media etc. Some of these may need to be divided into more than one page.
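
    Splitting a category into several site-map pages does not have to be done by hand. A minimal sketch in Python, assuming the IDs can be pulled from the database; the file naming scheme, the 100-links-per-page figure and the 1 to 940 ID range are assumptions for illustration only:

```python
def sitemap_pages(category, script, param, ids, per_page=100):
    """Yield (filename, links) pairs, one pair per site-map page."""
    ids = sorted(ids)
    for page, start in enumerate(range(0, len(ids), per_page), start=1):
        chunk = ids[start:start + per_page]
        filename = f"sitemap-{category}-{page}.html"  # hypothetical naming
        links = [f"{script}?{param}={i}" for i in chunk]
        yield filename, links


def render(links):
    """Render one site-map page body as a plain HTML list of links."""
    items = "\n".join(f'<li><a href="{url}">{url}</a></li>' for url in links)
    return f"<ul>\n{items}\n</ul>"


# Placeholder IDs; the real ones would come from the people table.
pages = list(sitemap_pages("people", "people.asp", "id", range(1, 941)))
for filename, links in pages:
    print(filename, len(links))
```

    Each generated page would then be linked from a small index that itself appears on every page of the site.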

    Lastly, you're gonna have to look at what is causing all those internal server errors (and I promise it wasn't my spider - but Googlebot would soon go away confronted with all that).

    Combine all these things and you'll have all 1,000-odd pages spidered. I promise.

    Search Engine Positioning - 1 Design 4 Life

  6. #6
    Member
    Join Date
    January 18th, 2005
    Posts
    110
    I think markymark is right, it probably has something to do with "id=". Your season links use "sid=", and they are ok.
    It has been said by a Google person that ID= is a no-no.


    If I were you, I would change id= to something totally different like "name=", "artist=", etc.

  7. #7
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    I don't think that will be enough, defanjos, as Googlebot isn't spidering the media pages either which have the parameter mediaID= .

    Search Engine Positioning - 1 Design 4 Life

  8. #8
    You are in, or you are out ... choose!
    Join Date
    January 18th, 2005
    Posts
    459
    >It has been said by a google person, that ID= is a no, no.

    Really? Who said that? Google is fine with dynamic URLs but the effectiveness is inversely proportional to the number of parameters. One parameter is fine (id=3), two is touchy (id=4&page=5), three is asking for trouble. However, there are other factors which are important in getting dynamic URLs spidered.
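
    That rule of thumb is easy to check against a list of URLs. A small Python sketch; the labels simply restate the rule of thumb above and are not anything Google publishes, and the function names are invented for the example:

```python
from urllib.parse import parse_qsl, urlsplit


def parameter_count(url):
    """Count the query-string parameters in a URL."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def rating(url):
    """Apply the one/two/three-parameter rule of thumb from this post."""
    count = parameter_count(url)
    if count <= 1:
        return "fine"
    if count == 2:
        return "touchy"
    return "asking for trouble"


for url in ("people.asp?id=3", "people.asp?id=4&page=5"):
    print(url, "->", rating(url))
```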

    So I don't see your URLs as a problem, I have many similar pages doing very well in the SERPs. What you *do* need to drive Googlebot down into your lower pages is a high Root PR. Each time Gbot follows a link from a dynamic page to another dynamic page it reduces the estimated PR by one until it gets to Zero and stops. By raising your Root PR you increase the reach into your site. You are 4 at the moment, you need some links from High PR sites (7+) to get to 6 or above. That will then entice Gbot and other bots to take a closer look.

    Then combine that with markymark's suggestion of site maps. If your PR is say 6, then you can expect Gbot to venture 5 clicks into the site before expiring. So make up some maps pointing to important nodes 4 clicks into the site and then 4 clicks on from that and so on. I.e., entice Gbot deep into your site and let her find her own way out. You already have that with your seasons, try some Alpha listings of people and so on.
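
    The reasoning can be played with as a toy model. The sketch below takes the rule of thumb at face value (a root PR of N lets the bot go N - 1 clicks deep), which is this post's working assumption rather than documented Googlebot behaviour; the link graph and page names are made up:

```python
from collections import deque


def reachable(links, root, root_pr):
    """Breadth-first crawl that stops following links at depth root_pr - 1."""
    max_depth = root_pr - 1
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # budget used up: do not follow links from this page
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site: nine pages in a chain, each linking only to the next.
chain = {f"p{i}": [f"p{i + 1}"] for i in range(8)}
print(len(reachable(chain, "p0", root_pr=4)))  # pages reached at PR 4
print(len(reachable(chain, "p0", root_pr=6)))  # pages reached at PR 6

# Same site, but the root also links straight to a node deeper in.
with_map = dict(chain, p0=["p1", "p4"])
print(len(reachable(with_map, "p0", root_pr=4)))
```

    In this toy graph both levers help: a higher root PR extends the reach, and a shortcut link to a deep node extends it without touching PR at all.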

    As to the 500 errors, this could be server load from a highly populated server, in which case move servers, or it could be data source problems. What are you using for your database? If Access then make sure it is Access 2000 plus as the drivers for Access 97 are becoming deprecated and some hosts already do not support it.

    Woz

    dWoz - serious webmaster resources.

  9. #9
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    Woz, generally you're right about the dynamic URL parameters, but in this case Googlebot should be spidering some of the people and media pages and it ain't. I still think an ISAPI filter would be the way to go.

    Search Engine Positioning - 1 Design 4 Life

  10. #10
    Full Member
    Join Date
    January 18th, 2005
    Posts
    303
    Must agree, it's the SID parts, googlebot thinks those are session IDs.
    A simple rewrite rule can cure that.
    (if you can use mod_rewrite)

    ---------------------------------------
    But it beats a real j.o.b.

  11. #11
    You are in, or you are out ... choose!
    Join Date
    January 18th, 2005
    Posts
    459
    >I still think an ISAPI filter would be the way to go.

    Ultimately yes, but it requires access to the server for installation which is not always available with some (most?) hosts. Regardless a higher Root PR would help in driving Gbot further into the site with either Dynamic or rewritten URLs.

    >It's the SID parts, googlebot thinks those are session IDs.
    No, actually the season pages (SID=) do seem to be getting spidered. It is the others that are not getting spidered.

    >mod_rewrite
    Psst, mod_rewrite is *nix, this is ASP on Win.

    The 500 errors that markymark found are raising the red flags with me though. I would look at reducing those first and see how that affects things.

    Woz

    dWoz - serious webmaster resources.

  12. #12
    Full Member
    Join Date
    January 18th, 2005
    Posts
    303
    Oh, yes, the .asp gave that clue away.
    Wasn't paying attention to that.

    ---------------------------------------
    But it beats a real j.o.b.

  13. #13
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,205
    There are almost 940 people; I lost track of how many media there are. Easily 400+, and I'm nowhere near done with that little project.

    Needless to say, sitemap(s) will take FOREVER.

    I looked at a couple of the pages you listed 500 errors for, and I'm getting errors in IE too, but only after half of the page has rendered: I get a nice little ADODB field error where the other half should be.

    When I get the problem fixed, I'll let you spider it again.

    Consistency is the key to a winning season.

  14. #14
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,205
    Problem is fixed, feel free to spider it.

    >So I don't see your URLs as a problem, I have many similar pages doing very well in the SERPs. What you *do* need to drive Googlebot down into your lower pages is a high Root PR.

    I have upwards of 10-15 links to people.asp?id=[xxx] on the front page. Googlebot won't touch them. This is the part I don't get.

    Consistency is the key to a winning season.

  15. #15
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    2,341
    >Originally posted by weisinator:
    >Needless to say, sitemap(s) will take FOREVER.

    Yes, at least 10 minutes


    Download Site Map Creator Today

  16. #16
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,205
    I thought about it some more: sitemaps contain links to all your pages and are linked to from all your pages, right?

    The season pages link to all my people pages, and are linked to from all my other pages. How is a sitemap different, how would it behave differently?

    (I'm not worried 'bout my media pages just yet.)

    Consistency is the key to a winning season.

  17. #17
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,205
    I just checked my logs. MSNbot is not scared of anything in this site. It crawled over 1100 unique pages in the last 8 days.
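
    For anyone wanting to run the same check, here is a minimal sketch of counting unique pages per bot, in Python. It assumes IIS-style W3C extended logs with a "#Fields:" header; the sample lines, dates and user-agent strings are invented for illustration and are not from the real logs:

```python
from collections import defaultdict

# Invented sample in W3C extended format (spaces in the user agent become '+').
SAMPLE_LOG = """\
#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2005-02-01 10:00:01 GET /people.asp id=462 200 msnbot/1.0+(+http://search.msn.com/msnbot.htm)
2005-02-01 10:00:05 GET /people.asp id=572 200 msnbot/1.0+(+http://search.msn.com/msnbot.htm)
2005-02-01 10:01:00 GET /season.asp sid=10 200 Googlebot/2.1+(+http://www.google.com/bot.html)
2005-02-01 10:02:00 GET /people.asp id=462 200 msnbot/1.0+(+http://search.msn.com/msnbot.htm)
"""


def unique_pages_by_bot(log_text, bots=("googlebot", "msnbot")):
    """Return {bot name: set of distinct URLs requested by that bot}."""
    fields = []
    pages = defaultdict(set)
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the rows that follow
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        agent = row.get("cs(User-Agent)", "").lower()
        url = row.get("cs-uri-stem", "")
        query = row.get("cs-uri-query", "-")
        if query != "-":  # '-' marks an empty query string
            url += "?" + query
        for bot in bots:
            if bot in agent:
                pages[bot].add(url)
    return pages


counts = {bot: len(urls) for bot, urls in unique_pages_by_bot(SAMPLE_LOG).items()}
print(counts)
```

    Running it per day makes it easy to see whether Googlebot ever requests a people.asp URL at all, or only the season pages.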

    Consistency is the key to a winning season.

  18. #18
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,205
    Update: Google has eaten 628 pages in the last 24 hours.

    -----------------------------
    It's better to write something crappy that you can improve upon later than it is to write nothing. - Comedian Mike Myers

  19. #19
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,205
    927 is loads better than 46

    -----------------------------
    It's better to write something crappy that you can improve upon later than it is to write nothing. - Comedian Mike Myers
