# Define access restrictions for robots/spiders
# http://www.robotstxt.org/wc/norobots.html

# Standard robots.txt rules are plain prefix matches on the URL path (no wildcards).
# We exclude many general-purpose forms that are available at various mount points of the site,
# and the internal image bank, which is hidden from the navigation tree in any case.

User-agent: *
Disallow: /author
Disallow: /search_form
Disallow: /sendto_form
Disallow: /accessibility-info
Disallow: /login_form
Disallow: /mail_password_form?userid=
Disallow: /front-page
Disallow: /portal_javascripts
Disallow: /portal_kss
Disallow: /events_listing
Disallow: /vcs_view
Disallow: /ics_view

# By default we allow robots to access all other areas of our site
# already accessible to anonymous users (an empty Disallow: restricts nothing).

User-agent: *
Disallow:

# Add Googlebot-specific syntax extensions to exclude forms
# that are repeated for each piece of content in the site.
# Wildcards are not part of the original robots.txt standard; Googlebot supports them.
# http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
# Googlebot supports the * (any sequence of characters) and $ (end of URL) wildcards,
# not full regular expressions.
# Block all URLs that contain a query string (the ? pattern): content objects expose
# query strings only for actions or status reports, which might confuse search results.
# This will also block ?set_language

User-agent: Googlebot
Disallow: /*talkback
Disallow: /*RSS
Disallow: /*sendto_form$
Disallow: /*folder_factories$
Disallow: /*?*
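The comments above rely on three matching behaviours that are easy to misread: plain rules are path prefixes, Googlebot expands `*` and a trailing `$`, and a crawler obeys only the group(s) that name it. That last point means Googlebot skips the `User-agent: *` rules entirely. The Python sketch below (standard library only) shows how an RFC 9309-style crawler *might* evaluate this file. It merges every group that names the crawler, falling back to the `*` groups. It then picks the longest matching pattern, and `Allow` wins a tie. The function names (`parse_groups`, `rules_for`, `is_allowed`) are hypothetical, and user-agent matching is simplified to an exact, case-insensitive token comparison. This is an illustration, not Googlebot's actual implementation.

```python
import re

# Hypothetical helpers sketching RFC 9309-style evaluation; not Googlebot's code.

def parse_groups(text):
    """Split robots.txt text into (user_agents, rules) groups.

    Each rule is an (is_allow, path_pattern) tuple. Comments, blank
    lines and unknown fields are ignored.
    """
    groups, agents, rules = [], [], []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line that follows rules starts a new group.
            if agents and not last_was_agent:
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow") and agents:
            if value:  # an empty "Disallow:" restricts nothing, so skip it
                rules.append((field == "allow", value))
            last_was_agent = False
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(groups, user_agent):
    """Merge all groups naming this crawler; otherwise merge the '*' groups."""
    token = user_agent.lower()  # simplification: exact token comparison
    chosen = [rules for agents, rules in groups if token in agents]
    if not chosen:
        chosen = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in chosen for rule in rules]


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(robots_txt, user_agent, url_path):
    """Longest matching pattern wins; Allow wins ties; no match means allowed.

    url_path includes any query string, e.g. "/news?set_language=en".
    """
    best = None  # (pattern length, is_allow) of the strongest match so far
    for is_allow, pattern in rules_for(parse_groups(robots_txt), user_agent):
        # re.match anchors at the start of the path, giving prefix semantics.
        if pattern_to_regex(pattern).match(url_path):
            candidate = (len(pattern), is_allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# A short excerpt of the file above.
ROBOTS = """\
User-agent: *
Disallow: /author
Disallow: /sendto_form

User-agent: *
Disallow:

User-agent: Googlebot
Disallow: /*sendto_form$
Disallow: /*?*
"""

print(is_allowed(ROBOTS, "SomeBot", "/author/jane"))              # False
print(is_allowed(ROBOTS, "Googlebot", "/author/jane"))            # True
print(is_allowed(ROBOTS, "Googlebot", "/news?set_language=en"))   # False
```

The second result shows the practical consequence: because Googlebot has its own group, paths such as `/author` remain crawlable for it unless they are repeated under `User-agent: Googlebot`.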