Poster: CoJaBo
Date: August 07, 2012 04:03:21pm
Forum: web
Subject: Re: Domainsponsor.com erasing prior archived copies of 135,000+^W 24 million+ domains
Disallowing the user agent "*" will not cause removal. It will stop crawling as expected, but only the specific user-agent named on the removal FAQ page will actually cause removal of past content.
I do agree that the directive should explicitly say "Remove" somewhere, to prevent exactly this kind of mistake. Others off-site have pointed out that some "bad bot blacklists" include these lines without explaining what "ia_archiver" actually is. It's possible DomainSponsor copied them from a source like that and didn't realize they would cause *removal* of the content from the Archive, since by that stage the lines would have been long separated from the removal FAQ entry.
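To make the crawl side of this concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the removal-triggering lines from the FAQ are the `ia_archiver` block mentioned above (`User-agent: ia_archiver` / `Disallow: /`); the `example.com` URL and the `Googlebot` agent are placeholders chosen for illustration. Note that a robots.txt parser only models crawl permission: both files below stop `ia_archiver` from crawling, and nothing in the file format itself distinguishes "stop crawling" from "remove past captures". That difference is the Archive's own policy, as described in the FAQ.

```python
import urllib.robotparser

# Wildcard block: applies to every crawler that has no more specific entry.
WILDCARD = """\
User-agent: *
Disallow: /
"""

# Archive-specific block (assumed to match the lines the removal FAQ refers to).
IA_ONLY = """\
User-agent: ia_archiver
Disallow: /
"""


def can_fetch(robots_txt: str, agent: str, url: str = "http://example.com/page") -> bool:
    """Return whether `agent` may crawl `url` under the given robots.txt text."""
    parser = urllib.robotparser.RobotFileParser()
    # parse() takes an iterable of robots.txt lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for name, rules in (("WILDCARD", WILDCARD), ("IA_ONLY", IA_ONLY)):
    for agent in ("ia_archiver", "Googlebot"):
        print(f"{name:8} {agent:11} allowed={can_fetch(rules, agent)}")
```

Both files report `allowed=False` for `ia_archiver`; only `IA_ONLY` leaves other crawlers such as `Googlebot` allowed. So the only robots.txt way to shut out just the Archive is the very block that, per the FAQ, also triggers removal.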
If anyone's been following the list of sites registered to their nameserver (that is, sites being removed from the Archive in this way), it has grown nearly two-hundred-fold since I made this post; the current count is over 24 *million*.
I'm not sure whether this means they are expanding that rapidly or simply that that particular index site is catching up with their existing registrations; I suspect the latter.

Poster: Jeremy Leader
Date: August 07, 2012 04:37:11pm
Forum: web
Subject: Re: Domainsponsor.com erasing prior archived copies of 135,000+^W 24 million+ domains
OK, CoJaBo, thanks for that clarification.
So there's no way to say "Internet Archive, don't crawl my site, but don't delete the archive", while still allowing other crawlers to crawl the site?

Poster: CoJaBo
Date: August 07, 2012 04:40:47pm
Forum: web
Subject: Re: Domainsponsor.com erasing prior archived copies of 135,000+^W 24 million+ domains
It doesn't seem so. The FAQ mentions those lines only in the context of removal; it doesn't appear to offer an option for "don't crawl the site anymore, but keep the existing content".
I had hoped someone from either DomainSponsor or the Archive would have responded to my emails by now.