I got this from Super Colossal, but I’m so blown away by it I had to re-post.
The robots.txt file for whitehouse.gov has changed.
Yesterday it was:
User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /omb/search
Disallow: /omb/query.html
Disallow: /expectmore/search
Disallow: /expectmore/query.html
Disallow: /results/search
Disallow: /results/query.html
Disallow: /earmarks/search
Disallow: /earmarks/query.html
Disallow: /help
Disallow: /360pics/text
Disallow: /911/911day/text
Disallow: /911/heroes/text
Disallow: /911/messages/text
Disallow: /911/patriotism/text
Disallow: /911/patriotism2/text
Disallow: /911/progress/text
Disallow: /911/remembrance/text
Disallow: /911/response/text
Disallow: /911/sept112002/text
Disallow: /911/text
Disallow: /ConferenceAmericas/text
Disallow: /GOVERNMENT/text
Disallow: /QA-test/text
Disallow: /aci/text
Disallow: /afac/text
Disallow: /africanamerican/text
Disallow: /africanamericanhistory/text
…and so on, for another 2,000 lines.
Today it is:
User-agent: *
Disallow: /includes/
The robots.txt file tells well-behaved search-engine crawlers which paths they may and may not fetch, and in practice that determines what ends up in their indexes. The previous administration either had a blanket policy of blocking content on the White House website from being indexed, or someone made a conscious decision that content within /africanamerican/text
(and the other directories) should not be returned in a Google search. I imagine this would be incredibly effective at restricting the dissemination of information without actually blocking it.
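To make the effect concrete, here is a minimal sketch using Python's standard-library robots.txt parser; the rules are trimmed-down versions of the old and new files quoted above, and the sample path is just an illustrative text-only page.

# Check whether a generic crawler may fetch a path under a given robots.txt.
from urllib.robotparser import RobotFileParser

# Trimmed-down versions of the old and new whitehouse.gov rules quoted above.
OLD_RULES = """\
User-agent: *
Disallow: /911/heroes/text
Disallow: /africanamerican/text
"""

NEW_RULES = """\
User-agent: *
Disallow: /includes/
"""

def allowed(rules, path):
    """Return True if a generic crawler may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", path)

# Under the old file the text-only pages were off-limits to crawlers;
# under the new file only /includes/ is excluded.
print(allowed(OLD_RULES, "/africanamerican/text/index.html"))  # False
print(allowed(NEW_RULES, "/africanamerican/text/index.html"))  # True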
Great to see change on such a fundamental level.
For those of you wondering, I just checked the robots.txt files for www.australia.gov.au and www.aph.gov.au – nothing nefarious to report.
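If you want to repeat that check yourself, here is a minimal sketch that downloads and prints each site's robots.txt, assuming each host serves the file at the conventional /robots.txt path.

# Fetch and print the robots.txt file for each host.
from urllib.request import urlopen

for host in ("www.australia.gov.au", "www.aph.gov.au"):
    with urlopen(f"https://{host}/robots.txt", timeout=10) as resp:
        print(f"--- {host} ---")
        print(resp.read().decode("utf-8", errors="replace"))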