Page 1 of 2
Results 1 to 25 of 44
  1. #1
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    Illicit Back Link or site scrape?
    Checking through my sites I saw that I had sprung a new 'page' on one site. Google's Webmaster Tools showed a URL of the form http://www.mysite.com/?someothersite.com as a page on my site. I had never heard of the other site so I did a WHOIS on it and blocked their servers via htaccess. I am just curious, is this some new trick or I just haven't seen it before? When I clicked on the link I was on my own site with that URL showing. All the afflinks still in place. Weird.

  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    Proxy servers, some of it deliberate hijacking. Some will display the entire content of other people's pages.

  3. #3
    ABW Veteran Mr. Sal's Avatar
    Join Date
    January 18th, 2005
    Posts
    6,795
    I am just curious, is this some new trick or I just haven't seen it before? When I clicked on the link I was on my own site with that URL showing. All the afflinks still in place. Weird.
    I don't think anyone is trying anything bad with that.

    http :// www. mysite.com/?someothersite.com

    http :// www. mysite.com/?whatever.com
    http :// www. mysite.com/?I-was-here-today

    You can put a ? after the .com/?and_type-anything_you-want_here

    Some people do that to let you know that they visited that page, some may do it thinking that you might visit their site once you see their link there.

    But I don't think they can do any damage to your site by adding whatever after the ? sign.

    Try this on your site:
    http :// www. 2busy.com/?IwantToTestThisThingNow

    Just make sure not to leave any space between the letters.

  4. #4

  5. #5
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    Yours just looks weird, not like the other stuff.

    Do a search for a portion of text from the page using quotes and see if it's on any other sites out there. It's a good thing to do every so often, anyway. I've had entire pages AND a homepage that tanked completely copied by sites, including graphics.

    Also do site: searches at Yahoo and see if they've got any strange characters listed appended to pages: like /stuff/?N=M or /stuff?S=A

    Those can mess up the page they're being appended to in Yahoo's index and also affect the rankings for the page.

  6. #6
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    WebWorker, this is absolutely, positively NOT a "proxy" or hijacking attempt. Someone who views a web page using this URL format will see the exact same content as if the same URL were entered without any text after the ? mark.

    I agree with Mr. Sal that it's not a problem at all -- it's certainly not suspicious or troubling. In fact, it's a signal that someone is linking to your site, and wants you to know about it.

    The most logical reason for this format of URL is that someone is referring you traffic, and wants you to know the source (perhaps hoping that you'll find the source site to be worth linking back to).

    I actually do this sometimes, when I post a link to a "nice web site," usually creating a link such as:

    http://forum.abestweb.com/?from=MarkWelch.com

    but the syntax

    http://forum.abestweb.com/?MarkWelch.com

    would be just as valid.

    Nobody's trying to hijack your traffic, server, or bandwidth.

    You should remove any "blocks" you installed that might block legitimate traffic to your web site!

    I am surprised that any Google tools would identify this as a separate web page -- Google knows better!

  7. #7
    Newbie HazelB's Avatar
    Join Date
    June 25th, 2007
    Posts
    74
    Hi Mark,

    Don't dismiss it so quickly. That also happens when someone does an XSS attack on a site to get sneaky backlinks.

    Without divulging too much detail an XSS attack can (among other things) create a link on a site that then is indexed by Google so that the bad guy gets a backlink. It is relatively easy to do still with .edu and many .gov sites but Google detects it. There are newer ways around that though and they look like...
    http://www.mysite.com/?someothersite.com

    To eliminate this possibility, go to..
    http://www.mysite.com
    and save a copy of the page

    then go to...
    http://www.mysite.com/?someothersite.com
    and save a copy of the page

    Then compare them to see if there are any differences besides the appended ?someothersite.com.

    If not then you are okay, if so, then let us know what you found
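    The save-and-compare check above can be sketched in Python. This is only an illustration, not something from the original post: the function name `diff_pages` and the sample HTML strings are made up, and in practice you would read in the two copies you saved.

```python
import difflib


def diff_pages(plain_html, query_html):
    """Return unified-diff lines between the two saved copies.

    An empty list means the copies are identical line for line.
    """
    return list(
        difflib.unified_diff(
            plain_html.splitlines(),
            query_html.splitlines(),
            fromfile="plain",
            tofile="with-query",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical page contents standing in for the two saved files.
    plain = "<html>\n<a href='/shop'>Shop</a>\n</html>"
    altered = "<html>\n<a href='http://someothersite.com'>Shop</a>\n</html>"

    print(diff_pages(plain, plain))  # identical copies give an empty list
    for line in diff_pages(plain, altered):
        print(line)  # any injected or changed link shows up as a +/- line
```

    If the diff is empty, the appended ?someothersite.com changed nothing in the served HTML.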

  8. #8
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    The thing is that when I did the WHOIS search, it said this person has 476 other websites. My afflinks were still in place (maybe there was no intention to replace any?) but it had to have happened over the previous 24 hours because I had just been into my account the day before to remove some URLs on another site. If the other site was something like Maria's Ladies Club or such I would not have been quite as concerned. This person seems to put together "directories" and I don't want to be associated with Adfarms. There are enough people who just put up links without any relevance. I just have never seen a link like that showing up as a page before, with Google complaining that it was blocked by robots.txt.
    I appreciate all the input, I knew someone here could help me figure it out; maybe it is nothing but I will leave the blocks on anyway as I don't like to encourage these links.

  9. #9
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    but the syntax

    http://forum.abestweb.com/?MarkWelch.com

    would be just as valid.
    If you're so sure, show us in the W3 specs where it says that's a valid syntax for a normal hyperlink.

    http://www.w3.org/

    2busy, if he does it that way, maybe it's Mark Welch logspamming you.

  10. #10
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Webworker, I'm not going to go search for the specification for HTTP URL parameters, but I'm pretty sure we ALL must agree that they are supported, since all the major affiliate systems use them. For example:

    shareasale.com/r.cfm?b=12345&u=67890&m=4321

    The query string after the ? in a URL is interpreted by whatever server software is installed on the computer (it might be Apache, or Microsoft IIS, perhaps in conjunction with PHP or ASP or ColdFusion). If the parameters aren't used by the server application, then they are ignored. I'm pretty sure that nobody is going to create a server application which recognizes a parameter called "markwelch.com"

    If the destination page is static HTML, then the query string will merely be logged, but otherwise ignored.
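    As a rough illustration of "parameters that aren't used are ignored", here is a toy Python handler. Everything in it is hypothetical (the `render_home` function, the `lang` key, the page text); it is not how any particular server or affiliate system works, only a sketch of a script that looks up the keys it knows about and never touches the rest.

```python
from urllib.parse import parse_qs

# Hypothetical home page body used only for this sketch.
HOME_PAGE = "<html><body>Home page with affiliate links</body></html>"


def render_home(query_string):
    """Toy handler: reads only the 'lang' key and ignores every other parameter."""
    params = parse_qs(query_string, keep_blank_values=True)
    if params.get("lang") == ["es"]:
        return HOME_PAGE.replace("Home page", "Pagina principal")
    # Unknown keys such as 'someothersite.com' are parsed but never looked at.
    return HOME_PAGE


if __name__ == "__main__":
    print(render_home("") == render_home("someothersite.com"))
    print(render_home("") == render_home("from=MarkWelch.com"))
```

    The output only changes for a key the handler actually reads; an appended ?someothersite.com ends up in the logs and nowhere else.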

    When 2busy launched this discussion thread, this is the URL that was identified:
    http://www.mysite.com/?someothersite.com

    Unless the index page for the site at mysite.com is a script that redirects traffic to an appended URL, or which tries to interpret unknown parameters in some bizarre way, then there is NO POSSIBLE DAMAGE that could be done, nor any nefarious purpose that I can recognize.

    From 2busy's description, it actually sounds as if something is "broken" in Google's webmaster tools, if it recognizes
    http://www.mysite.com/?someothersite.com and
    http://www.mysite.com/
    as two separate pages -- normally, Google and other search engines will index only ONE version of a page if the same content is displayed when different ? parameters are appended. Nor has Google treated such page variations as "duplicate content."

    If your server script modifies the HTML or inserts an error message when an unrecognized parameterName is appended, then this would be an issue, but I'm not aware of any server applications that work this way.

  11. #11
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    When I clicked on the link I was on my own site with that URL showing. All the afflinks still in place. Weird.
    There's no justifiable reason for that IMHO, and if I caught it I would definitely try to make sure that the engines did NOT get that page/URL indexed with the identical duplicate content that's legitimately on my own page.

    More than likely using a regex for mod_rewrite in .htaccess would prevent duplicate content from being indexed for another URL with the root host and the TLD of another domain appended.
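    The post above suggests a mod_rewrite regex in .htaccess. The sketch below is not mod_rewrite; it shows the same idea in Python, under the assumption that the site decides which query keys it actually uses and answers anything else with a 301 to the clean URL. The names `canonical_url`, `needs_redirect` and `allowed_params` are invented for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, allowed_params=frozenset()):
    """Drop every query key not in allowed_params and return the cleaned URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", urlencode(kept), "")
    )


def needs_redirect(url, allowed_params=frozenset()):
    """True when the requested URL differs from its canonical form."""
    return canonical_url(url, allowed_params) != url


if __name__ == "__main__":
    print(canonical_url("http://www.mysite.com/?someothersite.com"))
    print(needs_redirect("http://www.mysite.com/"))
```

    A server hook that sends a 301 whenever `needs_redirect` is true would leave the engines with a single URL for the home page.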

  12. #12
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    >>surprised that any Google tools would identify this as a separate web page

    Google doesn't index "pages" Mark. Google indexes URLs and each URL gets indexed with its own unique DocID. That's how Google works.

    A page (content) is only supposed to be online with one URL. If the identical content is online with two or more URLs (as the OP indicated that it is), it's duplicate content.

    When I clicked on the link I was on my own site with that URL showing. All the afflinks still in place. Weird
    Do a site: search for their root domain - site:theirdomain.com and see how many pages come up. Do also inurl:theirdomain.com and when doing those, check for what's in the cache for their pages, including some way down into the hundreds and including a search that way that includes something that's supposed to be unique for your page.

    Then compare the affiliate IDs in the cache with the ones on your real page - just out of curiosity. Inquiring minds like to double-check these things out, especially with all the cute little cloakers out there who know and use all kinds of tricks most of us could never even dream up if we tried.

  13. #13
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    Thank you Webworker for the ideas. I did go through the steps in my GWTools account to request that they remove that URL from their index. The site (mine) is just a few months old and not a big site, maybe a dozen pages, only ran one PPC campaign from there because I haven't added much content yet. It's functional but not optimized yet (if I had time...)
    I get the idea that this person just surfs around and collects sites for his content (?) but I do not understand the workings of a URL like that which is why I posted my question. I think I'll check the offending site further like you suggest. That was why I was suspicious, a link wouldn't gain him that much but I thought maybe he's too lazy to code his own pages.

  14. #14
    Newbie
    Join Date
    December 22nd, 2007
    Posts
    36
    Quote Originally Posted by webworker
    If you're so sure, show us in the W3 specs where it says that's a valid syntax for a normal hyperlink.

    http://www.w3.org/

    2busy, if he does it that way, maybe it's Mark Welch logspamming you.
    http://xyz.com/?something is a valid url format. that's not something that needs to be debated. the part that comes after the question mark is called the query string, and doesn't necessarily have to include field/value pairs (i.e. ?id=123).
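    A quick way to see this, sketched with Python's standard URL parser (the helper name `query_pairs` is made up for the example): a bare query string parses as a key with an empty value, while the ?id=123 form parses as a key/value pair. Both split cleanly into scheme, host, path and query.

```python
from urllib.parse import parse_qsl, urlsplit


def query_pairs(url):
    """Return the query component of url as a list of (key, value) tuples."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


if __name__ == "__main__":
    for url in (
        "http://xyz.com/?something",
        "http://xyz.com/?id=123",
        "http://forum.abestweb.com/?from=MarkWelch.com",
    ):
        print(url, query_pairs(url))
```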

  15. #15
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    Quote Originally Posted by affz
    http://xyz.com/?something is a valid url format.
    Yes, and with the .com appended it is not.

    that's not something that needs to be debated. the part that comes after the question mark is called the query string, and doesn't necessarily have to include field/value pairs (i.e. ?id=123).
    Yes, it can and does need to be debated. What's being done with the .com appended is not a normal query string if the content is being indexed as part of another TLD. And it violates search engine webmaster guidelines, through no fault of the originator of the content.

    You cannot display someone else's content on a domain that you own without their written permission, and to do so is a copyright violation committed against the original author of the page. In some cases it can also be a trademark violation.

    When I clicked on the link I was on my own site with that URL showing. All the afflinks still in place. Weird.
    If so, run the URL on their domain through an HTTP header checker. Check the status code (302, 301, 200 or whatever):

    http://www.rexswain.com

    And don't post the code or the URLs, but in a general, non-identifying way, let us know what you find out, including whether there's a meta refresh or JS redirect.
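    For anyone who would rather script the check, here is a minimal Python sketch that works on a response you have already captured, for example pasted from a header checker. The function names and the sample strings are assumptions for illustration. It pulls out the status code and any Location header, and looks for a meta refresh tag; it does not try to detect JS redirects.

```python
import re

# Loose pattern for <meta http-equiv="refresh" ...>; an assumption, not a full HTML parser.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*>', re.IGNORECASE
)


def parse_status_and_location(raw_header):
    """Return (status_code, Location value or None) from a raw HTTP response header."""
    lines = raw_header.splitlines()
    status = int(lines[0].split()[1])  # e.g. "HTTP/1.1 302 Found" gives 302
    location = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            location = value.strip()
    return status, location


def has_meta_refresh(html):
    """True if the HTML contains a meta refresh tag."""
    return META_REFRESH.search(html) is not None


if __name__ == "__main__":
    header = "HTTP/1.1 302 Found\r\nLocation: http://someothersite.com/\r\n\r\n"
    print(parse_status_and_location(header))
    print(has_meta_refresh('<meta http-equiv="refresh" content="0;url=http://x/">'))
```

    A plain 200 with no Location header and no meta refresh means the server is simply returning the page itself.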

    I just found pages on a couple of my sites (a homepage and interior pages) trying to game the engines with JS links and frames, and had to put up a framebuster. And will now be putting up framebuster on all my pages on all sites.

  16. #16
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636

    I have only visited the first one and I read that some time ago but since it didn't have any immediate relevance it slid off the edge for me. The only reason I am concerned now is because of the WHOIS report that said this guy runs 476 other sites. I guess with that much space to fill they get desperate enough to pick on small sites.
    I'm not saying that this is the case here, I have much checking and searching to do, but it does identify a scenario that is more plausible than the friendly linkback from 476 sites person.
    I also ran across another answer that I was not sure about: That "Report Spam" link in the GWTools. I wasn't sure if that was the term they used for this, but it is.
    I need to do a lot more looking around before I think it is innocent and harmless though.

  17. #17
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    Re-read the previous post, I added to it.

    Also check out a few results from here especially the more technical ones:

    http://www.google.com/search?hl=en&r...sh&btnG=Search

    It's been done again recently, only a bit more sophisticated this time around.

  18. #18
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    This is what I got from the Rex Swain tool: (URLs edited out in bold)
    Rex Swain's HTTP Viewer
    http://www.rexswain.com/httpview.html
    Parameters:
    URL = http://www.MySite.com/?SomeOtherSite.com
    UAG = Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7
    AEN =
    REQ = GET ; VER = 1.1 ; FMT = AUTO
    Sending request:

    GET /?SomeOtherSite.com HTTP/1.1
    Host: www.MySite.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7
    Connection: close

    • Finding host IP address...
    • Host IP address = 69.89.31.99
    • Finding TCP protocol...
    • Binding to local socket...
    • Connecting to host...
    • Sending request...
    • Waiting for response...
    Receiving Header:
    HTTP/1.1·200·OK(CR)(LF)
    Date:·Sun,·23·Dec·2007·16:53:33·GMT(CR)(LF)
    Server:·Apache/2.2.6·(Unix)·mod_ssl/2.2.6·OpenSSL/0.9.8g·DAV/2·mod_auth_passthrough/2.1·mod_bwlimited/1.4·FrontPage/5.0.2.2635(CR)(LF)
    Last-Modified:·Tue,·18·Dec·2007·06:56:13·GMT(CR)(LF)
    ETag:·"4e0c0d9-3ce5-4418a06401d40"(CR)(LF)
    Accept-Ranges:·bytes(CR)(LF)
    Content-Length:·15589(CR)(LF)
    Connection:·close(CR)(LF)
    Content-Type:·text/html(CR)(LF)
    (CR)(LF)
    End of Header (Length = 355)
    The HTML page that loads is my page with zero alterations. Since the pages are 'handmade' it is easy for me to spot any changes and there are none.

    It may be a week before I can finish all this checking, I only have a little time each day that I'm not working (or asleep). I check in here all through the day but the time to sit in front of my c'puter and do anything in depth is severely limited at this time of year. Virtually everything is on hold to fill orders for the next several weeks. I sincerely appreciate the help and advice I just can't get back to this much right now.
    I have blocked the IP addresses of the offending site, removed the URL from my site in GWTools and filed a spam complaint in GWTools for now. I've made some minor changes to that page and uploaded it so they would not be identical unless it is being called 'live' and not cached; "nocache" is one of my metatags if that makes any difference. His site shows http://stumbleupon.com/?HisSite.com also and several other much larger sites than mine. It appears to be a regional directory but full of broken links and half finished pages.

  19. #19
    Newbie
    Join Date
    December 22nd, 2007
    Posts
    36
    Quote Originally Posted by 2busy
    This is what I got from the Rex Swain tool: (URLs edited out in bold)

    The HTML page that loads is my page with zero alterations. Since the pages are 'handmade' it is easy for me to spot any changes and there are none.
    i wouldn't worry about this. at worst he's trying to stuff your logs in hopes they are indexed somewhere. at best, it's more of a "ping" to let you know where the traffic is coming from.

    unlike what webworker says, http://xyz.com/?something.com is a perfectly valid url, as it is akin to writing http://xyz.com/?id=something.com (where 'id' can be some key your script looks for). if you took your url and added any sort of key/value pair, it wouldn't affect it at all because your script is not specifically trying to pull that information from the $_GET variable.

    http://forum.abestweb.com/?asdfasdfasdf.com
    http://forum.abestweb.com/?xyz=asdfasdfasdf.com

    neither url will alter the output of ABW's main forum page because the script is not looking for the 'xyz' key or trying to pull the default query string (in the former example).

    MarkWelch was correct in his post above, there's nothing to worry about at all.

    hope that made sense and I hate to see FUD being spread around for nothing.
    Last edited by affz; December 23rd, 2007 at 12:49 PM.

  20. #20
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    The script will look for nothing if there is no script installed looking for dynamic pages.

    Linking to www.anothersite.com/?badsite.com will cause www.anothersite.com/?badsite.com to display the identical content as what's on the homepage of www.anothersite.com and they can both be (and are, particularly and demonstrably by Yahoo) indexed by the search engines.

    FUD my arse. That is plain and simple causing duplicate content if what that rogue page decided to "give as a gift" is getting indexed with an exact duplicate of anothersite.com's homepage. And that is exactly what happens.

    Since when are exact duplicate content issues and situations caused by multiple URLs with the identical content not a problem?

  21. #21
    ABW Ambassador Daniel M. Clark's Avatar
    Join Date
    January 7th, 2006
    Location
    Houston, TX
    Posts
    2,082
    By that logic, what comes after the ? makes no difference, then. If someone were to put a list of links on their page to:

    www.anothersite.com/?badsite.com
    www.anothersite.com/?nothing
    www.anothersite.com/?everything
    www.anothersite.com/?yahoo.com

    it would cause the SE's to index the homepage of www.anothersite.com four times (and consider each one a separate page)? That makes no sense. If that is the way it works, seems to me the SE's need to change their behavior because it leaves sites open to a very, very simple type of attack: make a long list of links, make it look like there's hundreds of pages of duplicate content in the index, get your target kicked out of the index.

    Am I understanding this correctly? Please let me know if I'm not, I could be wrong

  22. #22
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    I am hoping that altering the page's content and blocking the IP and notifying Google will take care of it. That page does not use scripting for the content. That site does not have any CMS. I have an Avantlink merchant there and some GC sections but it is the site's home page that shows the add-on site in the URL. From looking at this person's site I would say that he doesn't know what he's doing or thinks he's doing something different than what is resulting from it. While searching around I found some posts in non-related places mentioning this half finished site and the posts were a few years old. That makes me think that he's trying to figure out how to do what he wants to do with it but hasn't got all the basics down yet. Like a supersite wannabe who doesn't know how to write a link that works right.
    I will definitely keep an eye out for any other oddities. The person is not hiding, is easy to contact and I didn't find any warehouse of collected sites on his 'directory' but I can see that if a site suddenly drops off the index that this type of thing needs to be looked for.
    From the headers returned it looks like my site is hosting his site which I may never understand. I can't find any section of his site that holds anything related to mine. He has a shopping section link but it takes you back to his main page, many links on his site don't work right; but if Google found it it must be out there somewhere. I think. I do appreciate everyone's input because when you are faced with something you haven't seen before and are unfamiliar with, it helps to have as much information as possible to sift through, especially when I can't show the specifics. You learn a lot looking things up.

  23. #23
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    Quote Originally Posted by markwelch
    shareasale.com/r.cfm?b=12345&u=67890&m=4321

    If the destination page is static HTML, then the query string will merely be logged, but otherwise ignored.
    No, not ignored. Indexed. Those URLs are being indexed by Live Search instead of the pages the links are on, and messing up some affiliate sites big time. Yahoo, too. So are the URLs with query strings onsite using a redirect script. Check out the thread in the Shareasale forum about that very thing.

    there is NO POSSIBLE DAMAGE that could be done, nor any nefarious purpose that I can recognize.
    Mark, Just because you can't recognize it, as you say, doesn't mean it doesn't exist. And just because you don't believe any damage can be done doesn't mean it's so.

    two separate pages
    To reiterate, only one page, two URLs. Google indexes by URLs not by "pages".

    normally, Google and other search engines will index only ONE version of a page if the same content is displayed when different ? parameters are appended.
    Bzzzzttt. That is absolutely not so. They sure will index all the URLs that contain the content, but the clustering and dup filtering aspects of the algos for selecting pages when returning results for queries will only allow one version of a page's content, with only one URL to be returned in the results set for a query for the sake of providing a good user experience.

    Not seeing pages returned in the SERPs for a query does not mean all the URLs aren't indexed when they find them, because they are. It means filters are being applied (either at query time or by pre-processing) and you just can't see them. You have to play around with exact match and special operators to be able to detect the dups.
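    The distinction between "indexed per URL" and "filtered per content" can be shown with a toy model in Python. This is not how any real search engine works; the exact-hash clustering, the "shortest URL wins" rule and all names here are assumptions made up for the sketch.

```python
import hashlib
from collections import defaultdict


def cluster_by_content(index):
    """Group indexed URLs whose body text is exactly identical."""
    clusters = defaultdict(list)
    for url, body in index.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        clusters[digest].append(url)
    return [sorted(urls) for urls in clusters.values()]


def pick_representatives(index):
    """Return one URL per cluster (the shortest), as a stand-in for dup filtering."""
    return sorted(min(urls, key=len) for urls in cluster_by_content(index))


if __name__ == "__main__":
    # Hypothetical index: every URL gets its own entry, even with identical content.
    index = {
        "http://www.anothersite.com/": "home page text",
        "http://www.anothersite.com/?badsite.com": "home page text",
        "http://www.anothersite.com/about": "about page text",
    }
    print(len(index), "URLs indexed")
    print(pick_representatives(index))
```

    All three URLs sit in the toy index, but only one URL per distinct content comes back in the "results".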

  24. #24
    ABW Veteran Mr. Sal's Avatar
    Join Date
    January 18th, 2005
    Posts
    6,795
    The person is not hiding, is easy to contact and I didn't find any warehouse of collected sites on his 'directory' but I can see that if a site suddenly drops off the index that this type of thing needs to be looked for.


    This is what I know!

    If you "Yourself" add the ?Whatever.com to your link when you make the link, then it would be there for the SE's to pick up any time they go to that page to index it.


    But.........

    If anybody, even yourself, adds the ?Whatever.com to whatever http :// www .site.com/ is showing on the address bar, then there is no way on this planet for ?Whatever.com to do any damage to your page or your site, and unless a search engine reads your logs, I don't see how http :// www .site.com/?Whatever.com can be listed on the search engine, since that actual http :// www .site.com/?Whatever.com doesn't exist anywhere else but in the address bar when it is typed in, and later in your logs file.

    Now, if that whole link http :// www .yoursite.com/?Whatever.com is added to a page on another place, then that's a different story.

    Like I said before, that was also a way to leave a message to the site owner on their stats many years ago, I don't know how many people still do it for that purpose, but even I still do it once in a while, when I want to let someone I know, that I have visited that page on where I have put something similar to this: http :// www .thesite.com/?IwasHereTodayAt2pmAndThePageLooksGood or any other note.

    But if you still want to waste a lot of extra time on the conspiracy theory of the: ?

    Then you must have a lot of free time, or you have some extra special site that, that single guy wants to steal from you.


    Good luck chasing it.




  25. #25
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    Thank you for the LOL Mr. Sal! If I added it myself I would be a loon I guess, but I found it in my "Google Webmaster Tools" account listed as a page on my site. It is not listed in my logs. This is a page that Google has found and decided that it is on my site when it is not. As you say yourself
    Now, if that whole link http :// www .yoursite.com/?Whatever.com is added to a page on another place, then that's a different story.
    It is not on my site, not in my logs. I don't know where Google found it, but I let them know it is not part of my site, because I do not have time to keep looking for wherever it is. I bet if you found a new nonexistent page in your account you would try to find out what it was, why it was there and how to make it go away, even if you didn't have any time.
