CyberD.org
C:\ Home » Blog » Tech » FaceBook Robots, Check

FaceBook Robots, Check

Why haven't I checked this before? Interesting...


# Notice: if you would like to crawl Facebook you can
# contact us here: http://www.facebook.com/apps/site_scraping_tos.php
# to apply for white listing. Our general terms are available
# at http://www.facebook.com/apps/site_scraping_tos_terms.php

User-agent: baiduspider
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: Googlebot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: msnbot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: naverbot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: seznambot
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: Slurp
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: teoma
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: Yandex
Disallow: /ac.php
Disallow: /ae.php
Disallow: /album.php
Disallow: /ap.php
Disallow: /feeds/
Disallow: /l.php
Disallow: /o.php
Disallow: /p.php
Disallow: /photo.php
Disallow: /photo_comments.php
Disallow: /photo_search.php
Disallow: /photos.php

User-agent: *
Disallow: /

# E-mail [email protected] if you are authorized to access these and are getting denied.
Sitemap: http://www.facebook.com/sitemap.php

Seems like they expected me to visit. :)

4chan keeps it simple:

User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Googlebot-Image
Disallow: /
User-agent: Googlebot-Mobile
Disallow: /
User-agent: Mediapartners-Google
Disallow: /
User-agent: Adsbot-Google
Disallow: /
User-agent: *
Disallow: /

How about NG?

User-agent: *
Disallow: /cgi-bin/
Disallow: /bbs/post.php
Disallow: /bbs/takepost.php
Disallow: /pm/
Disallow: /moderators/
Disallow: /bbs/moderators/
Disallow: /ajax/
Disallow: /js/
Disallow: /bbs/post/
Disallow: /bbs/search/
Disallow: /bbs/search.php
Disallow: /includes/
Disallow: /account/
Disallow: /refer/

Stuff like this is pretty useful if you want to rip a site btw, lets you know which directories you might need to check. :) Hmm, and what about Google? No list of robots.txt would be complete without the robots.txt of the worlds largest collector of robots.txt. :P

User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Allow: /news/directory
Disallow: /nwshp
Disallow: /setnewsprefs?
Disallow: /index.html?
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /imgres
Disallow: /imglanding
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/?
Disallow: /m/blogs?
Disallow: /m/directions?
Disallow: /m/ig
Disallow: /m/images?
Disallow: /m/local?
Disallow: /m/movies?
Disallow: /m/news?
Disallow: /m/news/i?
Disallow: /m/place?
Disallow: /m/products?
Disallow: /m/products/
Disallow: /m/setnewsprefs?
Disallow: /m/search?
Disallow: /m/swmloptin?
Disallow: /m/trends
Disallow: /m/video?
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /products?
Disallow: /products/
Disallow: /froogle_
Disallow: /product_
Disallow: /products_
Disallow: /print
Disallow: /books
Disallow: /bkshp?q=
Allow: /booksrightsholders
Disallow: /patents?
Disallow: /patents/
Allow: /patents/about
Disallow: /scholar
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /mapstt?
Disallow: /mapslt?
Disallow: /maps/stk/
Disallow: /maps/br?
Disallow: /mapabcpoi?
Disallow: /maphp?
Disallow: /places/
Disallow: /maps/place
Disallow: /help/maps/streetview/partners/welcome/
Disallow: /lochp?
Disallow: /center
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Allow: /reader/play
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /trends/hottrends?
Disallow: /trends/viz?
Disallow: /notebook/search?
Disallow: /musica
Disallow: /musicad
Disallow: /musicas
Disallow: /musicl
Disallow: /musics
Disallow: /musicsearch
Disallow: /musicsp
Disallow: /musiclp
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/reportbadoffer
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Allow: /safebrowsing/diagnostic
Allow: /safebrowsing/report_error/
Allow: /safebrowsing/report_phish/
Disallow: /reviews/search?
Disallow: /orkut/albums
Allow: /jsapi
Disallow: /views?
Disallow: /c/
Disallow: /cbk
Disallow: /recharge/dashboard/car
Disallow: /recharge/dashboard/static/
Disallow: /translate_a/
Disallow: /translate_c
Disallow: /translate_f
Disallow: /translate_static/
Disallow: /translate_suggestion
Disallow: /profiles/me
Allow: /profiles
Disallow: /s2/profiles/me
Allow: /s2/profiles
Allow: /s2/photos
Allow: /s2/static
Disallow: /s2
Disallow: /transconsole/portal/
Disallow: /gcc/
Disallow: /aclk
Disallow: /cse?
Disallow: /cse/home
Disallow: /cse/panel
Disallow: /cse/manage
Disallow: /tbproxy/
Disallow: /imesync/
Disallow: /shenghuo/search?
Disallow: /support/forum/search?
Disallow: /reviews/polls/
Disallow: /hosted/images/
Disallow: /ppob/?
Disallow: /ppob?
Disallow: /ig/add?
Disallow: /adwordsresellers
Disallow: /accounts/o8
Allow: /accounts/o8/id
Disallow: /topicsearch?q=
Disallow: /xfx7/
Disallow: /squared/api
Disallow: /squared/search
Disallow: /squared/table
Disallow: /toolkit/
Allow: /toolkit/*.html
Disallow: /qnasearch?
Disallow: /errors/
Disallow: /app/updates
Disallow: /sidewiki/entry/
Disallow: /quality_form?
Disallow: /labs/popgadget/search
Disallow: /buzz/post
Disallow: /compressiontest/
Disallow: /analytics/reporting/
Disallow: /analytics/admin/
Disallow: /analytics/web/
Disallow: /analytics/feeds/
Disallow: /analytics/settings/
Disallow: /alerts/
Allow: /alerts/manage
Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
Sitemap: http://www.google.com/hostednews/sitemap_index.xml
Sitemap: http://www.google.com/ventures/sitemap_ventures.xml
Sitemap: http://www.google.com/sitemaps_webmasters.xml
Sitemap: http://www.gstatic.com/trends/websites/sitemaps/sitemapindex.xml
Sitemap: http://www.gstatic.com/dictionary/static/sitemaps/sitemap_index.xml

Woah. Yeah. No wonder it's a big site. Happy exploring! ;)

Comments

Keep track of the discussion via rss? Read about comment etiquette? Or type in something below!
This was pretty damn interesting. And yet, nobody's spoken! Be the first!


The Comment Form

Your email address will not be published. Required fields are marked *

Your email is saved only to approve your future comments automatically (assuming you really are a human). ;) It's not visible or shared with anyone. You can read about how we handle your info here.

Question   Smile  Sad   Redface  Biggrin  Surprised   Eek  Confused  Beardguy  Baka  Cool  Mad   Twisted  Rolleyes   Wink  Coin

Privacy   Copyright   Sitemap   Statistics   RSS Feed   Valid XHTML   Valid CSS   Standards

© CyberD.org 2024
Keeping the world since 2004.