Thread: sitemap.pl Google Sitemap Generator |
|
|
#1
|
|
|
We are pleased to announce the release of the sitemap.pl Google Sitemap Generator script.
http://www.c3scripts.com/sitemap/
Dynamically generates Google Sitemap sitemap.xml and sitemap.txt files by automatically scanning your webserver's hard drive for files.
Note: Intended for use with real files that are physically located on your webserver. Does not recognize virtual files generated via .htaccess (mod_rewrite).
Add/Delete HTML files on your website and your sitemap will reflect those changes. No more need to manually generate a Google Sitemap. It's all automatic with sitemap.pl. Set it up and forget about it.
Lifetime license: $14.95 as shareware -- try 100% fully functional version before you buy.
Yours truly, Cusimano.Com Corporation per: David Cusimano
|
|
#2
|
|
|
I get what you just said about "your webserver". But what if I pay for Yahoo! hosting?
Will it still work? Steve
__________________
DreamLinux.net | Registered Linux User 453976 | PM me to view our sites. It's a Google thing. |
|
|
#3
|
|
|
Your website needs to support cgi-bin and .htaccess (mod_rewrite) for sitemap.pl to work.
Yours truly, Cusimano.Com Corporation per: David Cusimano |
|
|
#4
|
|
|
I'm looking at this; it looks extremely useful, and at a good price.
|
|
|
#5
|
|
|
Having some problems, getting only a 400 response. I filled out a trouble ticket.
EDIT: It's working now, and I see what took so long: it was also mapping my subdomains. I need a little more information on the "Optional Configuration file", since there is no hint on how to format it. I read the documentation and looked at the skip file, but there is no info on the format of the exclusions file. Help?
|
|
#6
|
|
|
Note that sitemap.pl will display its output only if there is not an actual /sitemap.txt (or /sitemap.xml) file. It appears that your website has an actual .xml file already (DOMAIN.com/sitemap.xml) that was generated by some other sitemap generator software. If you want sitemap.pl to dynamically generate the /sitemap.xml file then you should remove your current /sitemap.xml file.
For the format of the exclusions file, see: http://www.c3scripts.com/sitemap/doc...#configuration
The exclusions file is a text file that you can create with a plain text editor such as Windows Notepad. Put one or more exclusions on each line. Separate each exclusion with a space or a tab or a newline/return. From # to end of line is ignored as comments.
To exclude all files with a certain filename ending (e.g.: .zip), specify that ending (e.g.: .zip); the dot is necessary.
To exclude a directory and everything in it, specify /DIR/ (e.g.: /downloads/). Supports specifying subdirectories too (e.g.: /downloads/plugins/).
To exclude a file, specify the file (e.g.: /downloads/plugins/email.zip).
Wildcards are supported. Character * matches zero or more occurrences of any character. Character ? matches zero or more occurrences of any character EXCEPT for the / directory separator character.
Yours truly, Cusimano.Com Corporation per: David Cusimano
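For anyone who wants to see those format rules in concrete terms, here is a minimal Python sketch of how such a file could be read and applied. It is not sitemap.pl's code (the script is Perl and its internals are not shown in this thread). The function names `parse_exclusions`, `pattern_to_regex` and `is_excluded` are made up for illustration, and the anchoring rules (directory = prefix match, file = full match, ending = suffix match) are an assumption based only on the description above.

```python
import re


def parse_exclusions(text):
    """Return the list of exclusion patterns found in an exclusions file."""
    patterns = []
    for line in text.splitlines():
        # Everything from '#' to the end of the line is a comment.
        line = line.split("#", 1)[0]
        # Exclusions are separated by spaces, tabs, or newlines.
        patterns.extend(line.split())
    return patterns


def pattern_to_regex(pattern):
    """Translate one exclusion into a compiled regular expression.

    Assumed semantics (hypothetical, taken from the post's wording):
      *  matches zero or more of any character
      ?  matches zero or more of any character except '/'
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append("[^/]*")
        else:
            parts.append(re.escape(ch))
    body = "".join(parts)
    if pattern.startswith("/") and pattern.endswith("/"):
        return re.compile("^" + body)        # directory: prefix match
    if pattern.startswith("/"):
        return re.compile("^" + body + "$")  # single file: full match
    return re.compile(body + "$")            # filename ending: suffix match


def is_excluded(path, patterns):
    """True if the site-relative path matches any exclusion pattern."""
    return any(pattern_to_regex(p).search(path) for p in patterns)


if __name__ == "__main__":
    rules = parse_exclusions(".zip /downloads/   # skip archives and downloads\n")
    for candidate in ("/index.html", "/files/a.zip", "/downloads/readme.html"):
        print(candidate, is_excluded(candidate, rules))
```

With the two rules in the usage example, `/index.html` is kept while the `.zip` file and everything under `/downloads/` are skipped.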
|
|
#7
|
|
|
I'll delete my sitemap and try again. I did create the sitemap.txt file following the instructions in the installation guide in the sitemap folder and generated a new DOMAIN.COM/sitemap.txt and it still is spidering all subdomains. Don't suppose there is any way to make it follow the rules in robots.txt?
Thank you for the prompt response, with everyone gone to Boston I was expecting a lo0Oong wait.
|
|
|
#8
|
|
|
I deleted the existing sitemap and the one generated by the script. I reuploaded the sitemap.txt file with the new rules and then went again to DOMAIN.COM/sitemap.txt. It overwrote my rules with a urllist following only the rules in the script and generated a new sitemap.xml (only in the sitemap folder, not where I had deleted the existing sitemap.xml) but with all the subdomains listed again. I had to put back my old sitemap.xml file and delete those generated by the script. The script does not use the rules I put in, and I followed the format of the ?skip file (without the <html> etc. tags). If it generates a sitemap.txt file whenever I visit the DOMAIN.COM/sitemap.txt page, it is going to overwrite my rules every time, isn't it? If it is going to generate a sitemap.txt file, shouldn't the rules file have another name?
|
|
|
#9
|
|
|
What about dynamic pages?
detailspage.asp?xxxx I have about 30k of these.
|
|
#10
|
|
|
If we can get clear information on how to write our own exclusion rules then I'm sure you would have no problem. The instructions tell you to make a .txt file called sitemap.txt with a list of your exclusions and put it in the same folder as the .pl script. When you run the script it generates a new file called sitemap.txt that overwrites your rules (and ignores the rules you put in). I'm certain there is a way to do it, I just don't know what it is. It does appear to ignore all dynamically generated pages right off the shelf. Since I don't use .asp I cannot tell you for sure.
This is a terrific asset if we can get some configuration help. |
|
|
#11
|
|
|
Save the exclusions file as cgi-bin/sitemap/sitemap-skip.txt
I assume that subdomains are in subdirectories. So exclude those subdirectories. For example, if mybooks.DOMAIN.com is actually the subdirectory /mybooks/ then add /mybooks/ to your exclusions file. Do that for all subdomains you want to skip.
sitemap.pl saves the latest sitemaps at cgi-bin/sitemap/sitemap.txt and cgi-bin/sitemap/sitemap.xml.
When you (or a spider) access /sitemap.txt (or /sitemap.xml), the .htaccess file causes cgi-bin/sitemap/sitemap.pl to be run *IF* the requested file does not exist (that's why I asked you to delete /sitemap.xml in my previous post).
The first thing that sitemap.pl does is see if cgi-bin/sitemap/sitemap.txt (and .xml) exist. If they are less than one hour old, they are output. Otherwise, sitemap.pl scans your site's directory for files, saves the output, then sends it out. Because of the cached sitemaps, subsequent requests within one hour will be processed extremely quickly.
The sitemap.pl script only works with actual files stored on the hard drive (on my server it scans at about 500 files per second). It does not work with virtual files because it scans the hard drive; it does not do http accesses (which are much slower). Note that our DySE scripts automatically generate Google Sitemaps so sitemap.pl is not required for DySE sites.
A future version of sitemap.pl will allow you to specify URLs that you want included, but sitemap.pl will not spider from there; it will only include those URLs that you specify. That feature would at least enable you to specify the root of your virtual directory, and then it would be up to Google to spider from there (sitemaps don't have to specify every file on your website; Google will spider beyond it).
Yours truly, Cusimano.Com Corporation per: David Cusimano
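The one-hour caching behaviour described above can be sketched roughly like this in Python. This is an illustration only, not the script's actual Perl code: `is_fresh`, `get_sitemap` and the `rebuild` callback are hypothetical names, and using the file's modification time to decide its age is an assumption.

```python
import os
import time

CACHE_MAX_AGE = 60 * 60  # one hour, in seconds


def is_fresh(path, now=None, max_age=CACHE_MAX_AGE):
    """True if the file exists and was modified less than max_age seconds ago."""
    if now is None:
        now = time.time()
    try:
        return now - os.path.getmtime(path) < max_age
    except OSError:
        # A missing or unreadable cache file counts as stale.
        return False


def get_sitemap(cache_path, rebuild):
    """Return the cached sitemap if it is fresh; otherwise rebuild and cache it.

    `rebuild` is any zero-argument function that scans the site and
    returns the sitemap text (hypothetical stand-in for the disk scan).
    """
    if is_fresh(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    text = rebuild()
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

The point of the design is that the expensive directory scan runs at most once per hour, no matter how many times /sitemap.txt or /sitemap.xml is requested in between.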
|
|
#12
|
||
|
Thank you, Thank you! I believe that will resolve the problem. Where I went wrong was this line in the instructions:
Quote:
|
||
|
#13
|
|
|
Bingo!! It works very well now!
It does not include the AvantLink shops, which are dynamic, but I believe if I rewrite the sitemap-skip.txt file I can get them to be included. It excluded all excluded directories and listed only the files I wanted to have listed.
OK! Now let me find that link to go and pay.. Done. Another Happy Customer!
Last edited by 2busy; August 12th, 2008 at 03:06 PM.
|
|
#14
|
|
|
The sitemap.pl documentation was updated recently.
It erroneously stated to save the exclusions file to:
cgi-bin/sitemap/sitemap.txt << THIS IS WRONG
The documentation now says to save the file to:
cgi-bin/sitemap/sitemap-skip.txt << THIS IS CORRECT
Yours truly, Cusimano.Com Corporation per: David Cusimano
|
|
#15
|
|
|
Do you need a Google XML sitemap if your site is already indexing properly?
My site http://dealking.com seems to be fully indexed. Can a Google sitemap help? Can it hurt?
|
|
#16
|
|
|
Google.com states: "In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."
Google's Webmaster Help Center:
- About Sitemaps
- Working with Sitemaps
Your website already has a sitemap.xml file (created by some other program) but I can't tell how old it is.
Note that my sitemap.pl script scans the website's hard drive to find files (it discovers about 500 pages per second); it does not spider the website from the outside. Thus URLs with parameters will not be included (e.g.: /search.php?category=123) and dynamic webpages will not be included either. If your website consists mainly of real .html (or .php) files located on the hard drive, then sitemap.pl is a good Google Sitemap generator for you. It runs very fast, and you only need to set it up once and you can then forget about it.
Google will use the sitemap as a guide to what to spider. Google does not guarantee that everything will be indexed. Also, Google will spider beyond the sitemap; that is, if Google finds a link to a page that does not appear in your sitemap, Google will spider that page, assuming that page is not blocked via your robots.txt file.
With a Google sitemap, you're basically telling the Google spider, "I'll save you some time having to spider my website to find all my webpages; here's a list of all my pages and I'll even tell you the relative priority of each page and how often I typically update each page so you know how often you should revisit."
Bottom line is that Google Sitemaps cannot hurt you.
BTW, even if you do not use a Google Sitemap (there's no reason why you shouldn't) then you should at least submit your website to Google Webmaster Tools so you can gain access to important indexing information about your website.
Yours truly, Cusimano.Com Corporation per: David Cusimano
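To make the "scan the hard drive, then list each page with a priority and update frequency" idea concrete, here is a rough Python sketch. It is not how sitemap.pl itself is written. `scan_site` and `build_sitemap_xml` are hypothetical names, the `.html`/`.php` filter and the default `changefreq` and `priority` values are assumptions, and the XML namespace is the one from the public Sitemaps protocol.

```python
import os
from xml.sax.saxutils import escape


def scan_site(root, extensions=(".html", ".php")):
    """Return sorted site-relative URL paths for real files under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                # Convert the filesystem path into a URL path.
                paths.append("/" + rel.replace(os.sep, "/"))
    return sorted(paths)


def build_sitemap_xml(base_url, paths, changefreq="weekly", priority="0.5"):
    """Build a sitemap.xml document listing base_url + path for each path."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for path in paths:
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(base_url + path))
        lines.append("    <changefreq>%s</changefreq>" % changefreq)
        lines.append("    <priority>%s</priority>" % priority)
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: scan the current directory and print the resulting sitemap.
    print(build_sitemap_xml("http://www.example.com", scan_site(".")))
```

Because the scan only walks directories, a URL such as /search.php?category=123 can never appear in the output; only the file /search.php itself would be listed.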
|
|
#17
|
|
|
The latest beta version of the sitemap.pl Google Sitemap Generator script now generates a log file showing exactly what was included/excluded and why. The sitemap-log.txt file is written to the same directory where sitemap.pl is located. The log is not visible via your web browser; so, unless you have ssh access, FTP the log and open it locally.
The latest beta is available in the sitemap.pl download directory. Information about sitemap.pl is available at the sitemap.pl information page. (as of this post, the beta is v10.03.09-beta; so that version or higher has the log feature) Yours truly, Cusimano.Com Corporation per: David Cusimano
__________________
Affiliate Tools: Datafeed Merge |
|