October 27, 2003
THE WHITE HOUSE MEMORY HOLE....Yesterday Atrios reported an
odd thing: someone at the White House had modified their website so that
items related to Iraq were no longer indexed by search engines such as
Google. You could still get to these items by going to the White House
site and searching for them there, but you couldn't get to them by
searching via Google.
I scratched my head. What's the point of that? And didn't they do
some fudging around with their website a couple of months ago too?
What's going on?
Today, Jesse Berney puts the pieces together. It does make sense and it is related to their earlier shenanigans: they want to make it harder to get caught if they do something like that again.
The word "Orwellian" gets tossed around too often, but this really fits. These guys are a real piece of work.
UPDATE: Apparently there's an innocent explanation for this. More here.
Posted by Kevin Drum at October 27, 2003 08:47 AM
Is there a way to manually archive and organize what they've got? That seems like a good Memory Hole project.
It seems like a search engine could be programmed easily enough to
ignore the robots.txt file, or alternatively, to pay special attention to
the directories listed in the robots.txt file.
Maybe this is something already being done, but shouldn't someone be
keeping a list of these things, saving the pages, etc., for history, if
not for politics?
Talk about historical revisionism on the part of the White House. Sheesh...
You know, good ol' unix wget has a spider option. It'll back up pretty much any site you want.
At least until they send the copyright cops after you.
Anybody know how to get this one picked up by Slashdot?
You could very easily write a spidering program that would ignore the robots.txt file.
Let me rephrase that. *I* could very easily write a spidering program
that would ignore the robots.txt file. Could even write something that
would automatically detect revisions. Somebody give me some web space.
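That revision detector really is only a few lines. A rough sketch in Python (the names and structure here are just illustrative, not anyone's actual tool): keep a fingerprint of every page per crawl, then diff the fingerprints between crawls.

```python
import hashlib

def page_fingerprints(pages):
    """Map each URL to a SHA-1 fingerprint of its content.

    `pages` is a dict of url -> page body from one crawl.
    """
    return {url: hashlib.sha1(body.encode()).hexdigest()
            for url, body in pages.items()}

def changed_urls(old, new):
    """URLs whose content differs between two crawls."""
    return sorted(url for url in old
                  if url in new and old[url] != new[url])
```

Run it after each crawl and any quietly edited page shows up in the diff.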
This is especially creepy because it's weird and obscure enough that
no one is ever going to call them on it--or if they do, no one will care.
(Bloggers, as always, excepted.)
Have you seen the robots.txt file?
This is pretty blatant.
wget obeys the robots.txt file, and there are no command flags to
turn that behavior off. You would need (and I was thinking of) a custom
compile of wget that would prevent that feature from working. There may
even be a simple preprocessor directive for it that you can disable.
That's my tax dollars paying for that site. I want good access to it.
You can disable observance of robot rules with a Perl app called
"w3mir". It has the added benefit of actually being more featureful than
wget for website mirroring.
You could then create a date-based interface such as is used by waybackmachine.org
You don't need to recompile anything. Just alias whitehouse.gov to
your own proxy and let all requests except those for robots.txt pass through.
I think recompiling is easier than setting up a proxy server.
Yeah, I happened to have a copy of the source of the latest wget
(version 1.9 came out a few days ago), and there is a simple variable
you can set to "0" to turn off obeying robots.txt.
Actually, this is even easier than I thought. You can turn it off with a directive in your wgetrc file (example). Just add the line "robots=off".
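For anyone following along, the whole setup is tiny. A sketch, assuming wget 1.9 as mentioned above (the --wait flag is just politeness toward their server):

```
# ~/.wgetrc -- tell wget to ignore robots.txt
robots = off

# then mirror the site with something like:
#   wget --mirror --wait=2 http://www.whitehouse.gov/
```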
This is especially creepy because it's weird and obscure enough
that no one is ever going to call them on it--or if they do, no one will care.
You know that was my initial reaction, too. But there are people who
can understand this, and its implications, and media that might be
interested in running a story: Wired comes to mind. I suggest we all
send an email to the editors of as many such media vehicles as we can
think of, to point out that this is a very interesting and newsworthy story.
There's no copyright on government produced documents. Well, there
hasn't been. I wouldn't be surprised if Bush et al change this, in the
name of national security.
I've commented before on this Administration's supporters' reluctance
to hold BushCo accountable in the absence of some nebulous standard of
"proof." This action strikes me as red meat for the apologist crowd.
This move is Orwellian indeed. The purpose of the "memory hole" in 1984 was to enable the government to claim that a poor harvest was in fact a record-breaking increase.
This move is no doubt an attempt to abet future goalpost moving.
Sadly, I suspect that conservatives will at least tacitly support the
move, little realizing that any President, of any party, could employ
such obfuscatory methods. *No* president who takes such great pains to
conceal his methods from the American public -- and taxpayer -- is to be
trusted. We may only hear howls of outrage from conservatives when a
Democrat is in the White House, but by then it'll be too late.
For what it's worth, I submitted the story to Slashdot. We'll see if
they pick it up. They have picked up on the blackboxvoting.com story,
although they rejected my submission.
One other thing I'd like to point out is that it would be pretty easy
for the White House to notice that they're getting all these requests
from one address. They could then block access from your IP. This is
defensible if you overload their server (after all, this is why they
have a robots.txt file in the first place).
So I would take steps to make sure it plays nice with their server, and to do it from a number of different locations.
Personally, I'd use one of perl's LWP modules to write a robot for this. Wouldn't take too long, I don't think.
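For the curious, the guts of such a robot are small whatever the language: fetch a page, harvest its links, wait, repeat. A minimal link harvester using only the standard library (shown in Python for illustration; the poster above would presumably use LWP):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags, for a simple crawler."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return every link target found in an HTML page, in order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Pair it with plain HTTP fetches and a sleep between requests and you've got the polite spider described above.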
FWIW, Here's what I sent to Wired News. Feel free to copy & send to the media outlets of your choice.
White House blocking Google searches & archiving of pages related to Iraq?
See the link at: http://www.bway.net/~keith/whrobots/
In short: the robots.txt file at whitehouse.gov (
http://www.whitehouse.gov/robots.txt ) has been changed to exclude
search engine indexing (and archiving) of just about every page relating to Iraq.
This follows an event this past August in which the White House
changed the text on a story from the previous May. The original story
had a headline saying "President Bush Announces Combat Operations in
Iraq Have Ended". (You can view the page as it looked then via the
WayBack Machine.) In August the page was changed to read "President Bush Announces
MAJOR Combat Operations in Iraq Have Ended" -- as you can see on the
live White House site today.
Numerous bloggers and media sources pointed out this "revisionist
history", which was demonstrable thanks to the indexing services of
Google and the WayBack Machine. Pages such as this one are effectively
blocked from indexing now that the robots.txt file has been updated to
block indexing of Iraq-related material. So if the White House changes
old news stories again, it will be very difficult for anyone to identify
or prove that they have done so.
I think this is a story that Wired readers would find interesting & informative.
Thanks & best regards,
My favorite line from the robots.txt file was:
I can't help thinking that an archive of the White House web site
maintained by Bush opponents would be pretty easily dismissed. Pointing
to documents in Google's cache is great because people trust Google;
nobody is going to accuse them of faking documents to make the White
House look bad (like you'd have to fake anything). But I can easily see
the rightwingers making that claim about anybody to the left of Rush Limbaugh.
Is there some relatively neutral party that could be approached about
keeping such an archive? Like, I dunno, the League of Women Voters or somebody?
# robots.txt for http://www.ingsoc.gov/
... heh heh
So, for those of us less technically inclined, could somebody tell us
what exactly we're looking at when we open up the link in
MillionthMonkey's post? What is a robots.txt file? What does it do?
It seems to me the best way to go would be to write a spider that
would specifically use the robots.txt file as a guide to where to look
for things the White House hopes will disappear.
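That inversion is easy, because a robots.txt file is just a list of "Disallow:" path prefixes that a site asks well-behaved crawlers to skip -- so parsing it hands you the target list directly. A rough parser (Python, names hypothetical):

```python
def disallowed_paths(robots_txt):
    """Pull the Disallow paths out of a robots.txt body --
    here used as a map of what the site *doesn't* want indexed."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:                           # empty Disallow means "allow all"
                paths.append(path)
    return paths
```

Feed the result straight to your spider as its starting points.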
Now, if they can just get the major media outlets to play along....
Hey, there's Condi Rice saying that old newscasts might contain
secret messages to terrorist cells, and that archives should be shut down.
And hey, aren't the media players seeking favorable treatment from
the FCC in the consolidation saga?...Well, then, brother Powell over at
the FCC might be able to arrange a little quid pro quo...
Slashdot has already got it, but since the site is dominated by
Libertarians and Right Wing pinheads, the general consensus is that it's nothing:
- "Some intern made a mistake...",
- "most of these directories don't exist so it means nothing...",
- "If they were really trying to hide anything then the liberal press would be all over this in a couple of hours..."
are the typical posts.
Dr Morpheus: Never underestimate the fury of a libertarian who thinks his government is hiding something...
Yeah, but they're bending over backwards trying to explain this as
nothing or defend it as a perfectly reasonable thing to do. Neither
behavior meshes with the ideology of a Libertarian.
Frankly I think Slashdot Libertarians are nothing more than
Conservatives who want to party. Actually I think that pretty much
describes all Libertarians...
In my travels around the web I have come across many a conservative
who calls himself a libertarian who is nothing of the sort. I think
they think libertarian is a hipper label than conservative. Whatever,
they aren't really libertarians.
obeah: There are also cryptographic protocols for certifying a set of documents; search for "digital time-stamping".
Basically, you need a function called a "hash" that takes a string of
bytes and produces a smaller value (usually 16 or 20 or 32 bytes); it's
not feasible to find two strings of bytes that produce the same hash
value. (Example functions include SHA-1 and MD5.)
Every week, download all the new White House documents. Stick them
all together in one big document and run the hash function over them,
getting a hash value. Run a classified ad in the Washington Post or NYT
with this week's hash value. Next week, do the same thing but start
with this week's hash value as your starting point. Publish the
resulting value, and so on.
Now you can no longer retrospectively edit. That would change the
hash values after the edited document, and someone could cross-check
with the published values and realize that they've changed. You could
still edit documents when you put them in the archive, before hashing
them; not sure how to fix that hole, though it would require very good
foresight to insert embarrassing disclosures that turn out to be proven true.
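The weekly scheme described above is what's usually called a hash chain, and it's only a few lines with any decent hash library. A minimal sketch in Python (hashlib; SHA-1 and the function names are just assumptions for illustration):

```python
import hashlib

def chain_hash(previous_hex, document_bytes):
    """One link in the chain: hash last week's published value
    together with this week's batch of documents."""
    h = hashlib.sha1()
    h.update(previous_hex.encode())
    h.update(document_bytes)
    return h.hexdigest()

# Week by week, each published value commits to everything before it:
#   week1 = chain_hash("seed", batch1)
#   week2 = chain_hash(week1, batch2)
# Quietly editing batch1 changes week1, which changes week2, and so on --
# every later published value stops matching.
```

Publish each week's value somewhere hard to retract (the classified ad above) and anyone can recompute the chain to check the archive.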
Okay, here's what happened. It's an attempt to cover up changes
they've already made, and get the older versions out of Google, the
Internet Archive, etc. See my full explanation at http://shock-awe.info/archive/000965.php