October 27, 2003

THE WHITE HOUSE MEMORY HOLE....Yesterday Atrios reported an odd thing: someone at the White House had modified their website so that items related to Iraq were no longer indexed by search engines such as Google. You could still get to these items by going to the White House site and searching for them there, but you couldn't get to them by searching via Google.

I scratched my head. What's the point of that? And didn't they do some fudging around with their website a couple of months ago too? What's going on?

Today, Jesse Berney puts the pieces together. It does make sense and it is related to their earlier shenanigans: they want to make it harder to get caught if they do something like that again.

The word "Orwellian" gets tossed around too often, but this really fits. These guys are a real piece of work.

UPDATE: Apparently there's an innocent explanation for this. More here.

Posted by Kevin Drum at October 27, 2003 08:47 AM | TrackBack


Is there a way to manually archive and organize what they've got? That seems like a good Memory Hole project.


Posted by: saranwarp at October 27, 2003 08:54 AM | PERMALINK

It seems like a search engine could be programmed easily enough to ignore the robots.txt file, or alternatively, to pay special attention to the directories listed in it.

Posted by: jri at October 27, 2003 08:58 AM | PERMALINK

Maybe this is something already done, but shouldn't someone be keeping a list of these things, saving the pages, etc., for history, if not for politics?

Posted by: James E. Powell at October 27, 2003 09:07 AM | PERMALINK

Talk about historical revisionism on the part of the White House. Sheesh...

Posted by: David W. at October 27, 2003 09:14 AM | PERMALINK

You know, good ol' unix wget has a spider option. It'll back up pretty much any site you want.

At least until they send the copyright cops after you.

Posted by: Nimrod at October 27, 2003 09:14 AM | PERMALINK

Anybody know how to get this one picked up by Slashdot?

Posted by: uh_clem at October 27, 2003 09:14 AM | PERMALINK

You could very easily write a spidering program that would ignore the robots.txt file.

Let me rephrase that. *I* could very easily write a spidering program that would ignore the robots.txt file. Could even write something that would automatically detect revisions. Somebody give me some web space.
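A minimal sketch of the revision-detection half, assuming you already have two saved snapshots of the same page as text. The function name is made up for illustration, and the sample strings are the two versions of the May headline that have been reported; this only compares text, it does not fetch anything.

```python
import difflib


def describe_revision(old_text, new_text):
    """Return unified-diff lines between two snapshots, or [] if unchanged."""
    if old_text == new_text:
        return []
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous snapshot",
            tofile="current snapshot",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Sample snapshots (illustrative input, not fetched from any server).
    old = "President Bush Announces Combat Operations in Iraq Have Ended"
    new = "President Bush Announces MAJOR Combat Operations in Iraq Have Ended"
    for line in describe_revision(old, new):
        print(line)
```

Run on a schedule against each saved page, an empty result means nothing changed and anything else is worth a human look.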

Posted by: taktile at October 27, 2003 09:15 AM | PERMALINK

This is especially creepy because it's weird and obscure enough that no one is ever going to call them on it--or if they do, no one will care.

(Bloggers, as always, excepted.)

Posted by: Emma at October 27, 2003 09:29 AM | PERMALINK

Have you seen the robots.txt file?

This is pretty blatant.

Posted by: MillionthMonkey at October 27, 2003 09:29 AM | PERMALINK

wget obeys the robots.txt file, and there are no command flags to turn that behavior off. You would need (and I was thinking of) a custom compile of wget that would prevent that feature from working. There may even be a simple preprocessor directive for it that you can disable.

Posted by: taktile at October 27, 2003 09:32 AM | PERMALINK

That's my tax dollars that's paying for that site. I want good access to it.

Posted by: tristero at October 27, 2003 09:36 AM | PERMALINK

You can disable observance of robot rules with a Perl app called "w3mir". It has the added benefit of actually being more featureful than wget for website mirroring.

You could then create a date-based interface such as is used by

Posted by: Jeff at October 27, 2003 09:37 AM | PERMALINK

You don't need to recompile anything. Just alias to your own proxy and let all requests except those for robots.txt pass through.

Posted by: MillionthMonkey at October 27, 2003 09:38 AM | PERMALINK

I think recompiling is easier than setting up a proxy server.

Posted by: taktile at October 27, 2003 09:49 AM | PERMALINK

Yea, I happened to have a copy of the source of the latest wget (version 1.9 came out a few days ago), and there is a simple variable you can set to "0" to turn off obeying robots.txt.

Posted by: taktile at October 27, 2003 09:54 AM | PERMALINK

Actually, this is even easier than I thought. You can turn it off with a directive in your wgetrc file (example). Just add the line "robots=off".
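If you would rather not touch your wgetrc, the same setting can be passed on the command line with wget's -e option, which runs a wgetrc-style command. Here is a rough sketch in Python that only builds the command line. The function name, URL, and output directory are placeholders I made up, and the wait time is just a polite guess.

```python
def wget_mirror_command(url, dest_dir, wait_seconds=5):
    """Build a wget command that mirrors `url` without consulting robots.txt."""
    return [
        "wget",
        "--mirror",                          # recursive download with timestamping
        f"--wait={wait_seconds}",            # pause between requests
        "-e", "robots=off",                  # same effect as "robots=off" in wgetrc
        f"--directory-prefix={dest_dir}",    # where to store the copy
        url,
    ]


if __name__ == "__main__":
    # Placeholder URL and directory; pass the list to subprocess.run() to execute it.
    print(" ".join(wget_mirror_command("http://example.com/", "mirror")))
```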

Posted by: taktile at October 27, 2003 09:58 AM | PERMALINK

This is especially creepy because it's weird and obscure enough that no one is ever going to call them on it--or if they do, no one will care.

You know that was my initial reaction, too. But there are people who can understand this, and its implications, and media that might be interested in running a story: Wired comes to mind. I suggest we all send an email to the editors of as many such media vehicles as we can think of, to point out that this is a very interesting and newsworthy story.

Posted by: sockeye at October 27, 2003 10:01 AM | PERMALINK

There's no copyright on government produced documents. Well, there hasn't been. I wouldn't be surprised if Bush et al change this, in the name of national security.

Posted by: Emma Anne at October 27, 2003 10:12 AM | PERMALINK

I've commented before on this Administration's supporters' reluctance to hold BushCo accountable in the absence of some nebulous standard of "proof." This action strikes me as red meat for the apologist crowd.

This move is Orwellian indeed. The purpose of the "memory hole" in 1984 was to enable the government to claim that a poor harvest was in fact a record-breaking increase.

This move is no doubt an attempt to abet future goalpost moving. Sadly, I suspect that conservatives will at least tacitly support the move, little realizing that any President, of any party, could employ such obfuscatory methods. *No* president who takes such great pains to conceal his methods from the American public -- and taxpayer -- is to be trusted. We may only hear howls of outrage from conservatives when a Democrat is in the White House, but by then it'll be too late.

Posted by: Gregory at October 27, 2003 10:13 AM | PERMALINK

For what it's worth, I submitted the story to Slashdot. We'll see if they pick it up. They have picked up on the story, although they rejected my submission.

One other thing I'd like to point out is that it would be pretty easy for the White House to notice that they're getting all these requests from one address. They could then block access from your IP. This is defensible if you overload their server (after all, this is why they have a robots.txt file in the first place).

So I would take steps to make sure it plays nice with their server, and to do it from a number of different locations.

Personally, I'd use one of perl's LWP modules to write a robot for this. Wouldn't take too long, I don't think.
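For the "plays nice" part, here is a rough sketch in Python rather than Perl's LWP (my substitution, purely for illustration). It enforces a minimum gap between requests. The class name and the five-second default are assumptions, not anything the server asks for, and the opener, clock, and sleep arguments exist only so the pacing can be tried without touching the network.

```python
import time
import urllib.request


class PoliteFetcher:
    """Fetch pages one at a time, leaving a minimum gap between requests."""

    def __init__(self, min_delay=5.0, opener=urllib.request.urlopen,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._opener = opener
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous request finished, if any

    def fetch(self, url):
        # Wait out whatever remains of the minimum gap since the last request.
        if self._last is not None:
            remaining = self.min_delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        with self._opener(url) as response:
            body = response.read()
        self._last = self._clock()
        return body
```

Calling fetch() in a loop over a list of URLs then never hits the server more often than once per min_delay seconds from that machine.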

Posted by: M. at October 27, 2003 10:18 AM | PERMALINK

FWIW, Here's what I sent to Wired News. Feel free to copy & send to the media outlets of your choice.

White House blocking Google searches & archiving of pages related to Iraq?

See the link at:

In short: the robots.txt file at ( ) has been changed to exclude search engine indexing (and archiving) of just about every page relating to Iraq.

This follows an event this past August in which the White House changed the text on a story from the previous May. The original story had a headline saying "President Bush Announces Combat Operations in Iraq Have Ended". (You can view the page as it looked then via the WayBack machine: .) In August the page was changed to read "President Bush Announces MAJOR Combat Operations in Iraq Have Ended" -- as you can see on the live White House site today:

Numerous bloggers and media sources pointed out this "revisionist history", which was demonstrable thanks to the indexing services of Google and the WayBack Machine. Pages such as this one are effectively blocked from indexing now that the robots.txt file has been updated to block indexing of Iraq-related material. So if the White House changes old news stories again, it will be very difficult for anyone to identify or prove that they have done so.

I think this is a story that Wired readers would find interesting & informative.

Thanks & best regards,

John Salmon

Posted by: sockeye at October 27, 2003 10:52 AM | PERMALINK

My favorite line from the robots.txt file was:


Who knew?

Posted by: Satan luvvs Repugs at October 27, 2003 11:04 AM | PERMALINK

I can't help thinking that an archive of the White House web site maintained by Bush opponents would be pretty easily dismissed. Pointing to documents in Google's cache is great because people trust Google; nobody is going to accuse them of faking documents to make the White House look bad (like you'd have to fake anything). But I can easily see the rightwingers making that claim about anybody to the left of Rush Limbaugh.

Is there some relatively neutral party that could be approached about keeping such an archive? Like, I dunno, the League of Women Voters or something.

Posted by: obeah at October 27, 2003 11:26 AM | PERMALINK

# robots.txt for

User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /help
Disallow: /agencycontact/eurasia
Disallow: /agencycontact/eastasia
Disallow: /appointments/eurasia
Disallow: /appointments/eastasia
Disallow: /ask/20030515/eurasia
Disallow: /ask/20030515/eastasia
Disallow: /ask/images/eurasia
Disallow: /ask/images/eastasia
Disallow: /deptofhomeland/analysis/eurasia
Disallow: /deptofhomeland/analysis/eastasia
Disallow: /deptofhomeland/bill/eurasia
Disallow: /deptofhomeland/bill/eastasia
Disallow: /deptofhomeland/eurasia
Disallow: /deptofhomeland/eastasia
Disallow: /ecom/eurasia
Disallow: /ecom/eastasia
Disallow: /economy/eurasia
Disallow: /economy/eastasia
Disallow: /goodbye/eurasia
Disallow: /goodbye/eastasia
Disallow: /government/fbci/guidance/eurasia
Disallow: /government/fbci/guidance/eastasia
Disallow: /government/fbci/eurasia
Disallow: /government/fbci/eastasia
Disallow: /government/handbook/eurasia
Disallow: /government/handbook/eastasia
Disallow: /government/images/eurasia
Disallow: /government/images/eastasia
Disallow: /government/eurasia
Disallow: /government/eastasia

... heh heh

Posted by: MillionthMonkey at October 27, 2003 11:41 AM | PERMALINK

So, for those of us less technically inclined, could somebody tell us what exactly we're looking at when we open up the link in MillionthMonkey's post? What is a robots.txt file? What does it do?

Posted by: libdevil at October 27, 2003 01:14 PM | PERMALINK

It seems to me the best way to go would be to write a spider that would specifically use the robots.txt file as a guide to where to look for things the White House hopes will disappear.
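A minimal sketch of the first step, assuming you already have the text of a robots.txt file in hand: pull out the Disallow paths so a spider can treat them as a to-do list. The function name is made up, and the parsing is deliberately simplified (it only matches one exact User-agent value and ignores grouped User-agent lines).

```python
def disallowed_paths(robots_txt, agent="*"):
    """Return the Disallow paths listed for `agent` in a robots.txt body."""
    paths = []
    applies = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            applies = value == agent
        elif field == "disallow" and applies and value:
            paths.append(value)
    return paths


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /cgi-bin\nDisallow: /search\nDisallow: /query.html\n"
    print(disallowed_paths(sample))  # ['/cgi-bin', '/search', '/query.html']
```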

Posted by: Jon H at October 27, 2003 01:57 PM | PERMALINK

Slashdot has it.

Posted by: Keith Thompson at October 27, 2003 01:57 PM | PERMALINK

Now, if they can just get the major media outlets to play along....

Hey, there's Condi Rice saying that old newscasts might contain secret messages to terrorist cells, and that archives should be shut down.

And hey, aren't the media players seeking favorable treatment from the FCC in the consolidation saga?...Well, then, brother Powell over at the FCC might be able to arrange a little quid pro quo...

Posted by: andrew at October 27, 2003 03:22 PM | PERMALINK

Slashdot has already got it, but since the site is dominated by Libertarians and Right Wing pinheads, the general consensus is that this is nothing.

  • "Some intern made a mistake...",
  • "most of these directories don't exist so it means nothing...",
  • "If they were really trying to hide anything then the liberal press would be all over this in a couple of hours..."

    are the typical posts.

Posted by: Dr. Morpheus at October 27, 2003 05:44 PM | PERMALINK

Dr Morpheus: Never underestimate the fury of a libertarian who thinks his government is hiding something...

Posted by: Anarch at October 27, 2003 08:20 PM | PERMALINK

Dr Morpheus: Never underestimate the fury of a libertarian who thinks his government is hiding something...

Yeah, but they're bending over backwards trying to explain this as nothing or defend it as a perfectly reasonable thing to do. Neither behavior meshes with the ideology of a Libertarian.

Frankly I think Slashdot Libertarians are nothing more than Conservatives who want to party. Actually I think that pretty much describes all Libertarians...

Posted by: Dr. Morpheus at October 27, 2003 09:54 PM | PERMALINK

In my travels around the web I have come across many a conservative who calls himself a libertarian who is nothing of the sort. I think they think libertarian is a hipper label than conservative. Whatever, they aren't really libertarians.

Posted by: platosearwax at October 28, 2003 01:00 AM | PERMALINK

obeah: There are also cryptographic protocols for certifying a set of documents; search for "digital time-stamping".

Basically, you need a function called a "hash" that takes a string of bytes and produces a smaller value (usually 16 or 20 or 32 bytes); it's not feasible to find two strings of bytes that produce the same hash value. (Example functions include SHA-1 and MD5.)

Every week, download all the new White House documents. Stick them all together in one big document and run the hash function over them, getting a hash value. Run a classified ad in the Washington Post or NYT with this week's hash value. Next week, do the same thing but start with this week's hash value as your starting point. Publish the resulting value, and so on.

Now you can no longer retrospectively edit. That would change the hash values after the edited document, and someone could cross-check with the published values and realize that they've changed. You could still edit documents when you put them in the archive, before hashing them; not sure how to fix that hole, though it would require very good foresight to insert embarrassing disclosures that turn out to be proven true.
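A rough sketch of the weekly chaining, with a few assumptions of mine: it uses SHA-256 instead of SHA-1 or MD5 (practical collision attacks have since been found against both), it hashes each document separately before combining them so the boundaries between documents stay unambiguous, and the function names and all-zero starting value are invented for the example.

```python
import hashlib


def chain_hash(previous_hex, documents):
    """Hash this week's documents together with last week's published value."""
    h = hashlib.sha256()
    h.update(bytes.fromhex(previous_hex))
    for doc in documents:
        # Hash each document on its own first, then feed the digest in.
        h.update(hashlib.sha256(doc).digest())
    return h.hexdigest()


def build_chain(weekly_batches, seed_hex="00" * 32):
    """Return the value to publish each week, in order."""
    published = []
    previous = seed_hex
    for batch in weekly_batches:
        previous = chain_hash(previous, batch)
        published.append(previous)
    return published


if __name__ == "__main__":
    weeks = [[b"page one", b"page two"], [b"page three"]]
    for week, value in enumerate(build_chain(weeks), start=1):
        print(week, value)
```

Changing any document in week one changes the week-one value and every value after it, which is what lets someone with the newspaper clippings spot a later edit.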

Posted by: amk at October 28, 2003 03:30 AM | PERMALINK

Okay, here's what happened. It's an attempt to cover up changes they've already made, and get the older versions out of Google, the Internet Archive, etc. See my full explanation at


Posted by: Kynn at October 28, 2003 12:15 PM | PERMALINK

