搜索机器人
什么是搜索机器人?
搜索机器人,有时也被称为爬虫,是一种持续浏览互联网的机器人,通常用于构建搜索索引。
机器人可能会人为地夸大流量数据,因此了解它们的存在非常重要。
还有其他类型的机器人用于点击欺诈。了解更多关于点击欺诈及其预防方法的信息。
搜索机器人与 Linkly
Linkly 可以检测主动暴露自己身份的搜索机器人和爬虫。您可以在流量报告的机器人部分查看哪些点击被归属于机器人。
我们有一篇关于机器人流量以及如何阻止机器人的文章。
社交媒体爬虫的特殊处理
Linkly 改进了对来自 Facebook、YouTube、Google、LinkedIn 和 X 的社交媒体爬虫的处理。
当来自这些爬虫的流量访问您的链接时:
- 不会被记录在您的分析中
- 不会计入您的点击限制
- 爬虫会被透明地重定向到正确的目标
- 即使启用了"阻止机器人"功能,这些爬虫始终被允许通过
这样可以防止社交媒体爬虫消耗您的点击限制,同时仍然允许它们生成预览和检查您的链接。

阻止机器人跟踪链接
Linkly 可以阻止机器人和搜索爬虫跟踪您的链接。有关启用机器人阻止的说明,请参阅机器人流量。
重要提示: 即使启用了阻止功能,来自 Facebook、YouTube、Google、LinkedIn 和 X 的社交媒体爬虫始终被允许通过,因此链接预览可以继续正常工作。
机器人是否计入点击限制?
来自 Facebook、YouTube、Google、LinkedIn 和 X 的社交媒体爬虫****不计入您的点击限制,也不会记录在您的分析中。
下面列出的所有其他机器人都会计入您的点击限制,因为无论流量来源如何,监控和重定向流量的成本都是相同的。
被阻止的机器人(遇到阻止页面的机器人)也不会计入您的限制。
搜索机器人列表
以下是 Linkly 识别的搜索机器人及其用户代理列表,必要时可以将它们阻止。
- 200pleasebot 200PleaseBot
- 360spider 360Spider
- abot CrawlDaddy, abot
- addthis AddThis
- adldxbot Microsoft Bing Ads
- admantx ADmantX Platform Semantic Analyzer
- adsbot-google Google Adwords
- advbot AdvBot
- ahrefsbot Ahrefs backlinks research tool
- alexa Alexa Crawler
- apache-httpclient Java http library
- apachebench ApacheBench (ab)
- apis-google APIs-Google
- appengine-google Google App Engine
- applebot Apple Bot
- archive.org_bot Internet Archive (archive.org)
- ask jeeves Ask Jeeves
- asynchttpclient Java http and WebSocket client library
- awe.sm Awe.sm URL expander
- baidu Baidu
- bdcbot Big Data Corp
- bingbot Microsoft Bing
- bingpreview Microsoft Bing preview
- bitlybot bit.ly bot
- blekkobot Blekkobot
- blexbot BLEXBot (webmeup)
- bot@linkfluence.net Linkfluence bot
- bufferbot BufferBot
- buibui-checkbot buibui
- butterfly Topsy Labs
- buzztalk buzztalk
- catchbot CatchBot (catchbot.com)
- check_http Nagios monitor
- cliqzbot Cliqzbot
- cmradar/0.1 CMRadar/0.1
- coldfusion ColdFusion http library
- commoncrawl CCBot
- comodo-webinspector-crawler Comodo
- crowsnest Crowsnest
- curabot cura.yt
- curl curl unix CLI http client
- dap/nethttp DAP/NetHTTP
- datagnionbot datagnion.com/bot.html
- daumoa Korean portal and search engine indexing bot
- developers.google.com/+/web/snippet/ Google Plus
- diffbot Diffbot
- digitalpersona fingerprint software HP Fingerprint scanner
- domain re-animator bot Domain Re-Animator Bot
- domainsbot DomainsBot
- domaintunocrawler DomainTuno
- dotbot Dot Bot
- duckduck Duck Duck Go
- elb-healthchecker AWS ELB HealthChecker
- embedly Embedly
- eoaagent EOAAgent
- eventmachine httpclient Ruby http library
- everyonesocialbot EveryoneSocial
- evrinid Evri bot
- exabot Exalead's bot
- exaleadcloudview ExaleadCloudView
- facebookexternalhit Facebook Bot
- facebot Facebook Bot
- feedburner RSS bot
- feedfetcher-google Google Feedfetcher
- findxbot Findxbot
- flipboardproxy FlipboardProxy
- friendfeedbot FriendFeed
- genieo Genieo Web filter bot
- getprismatic.com getprismatic.com
- gigabot Gigabot spider
- gimme60bot Gimme60 (gimme60.com)
- gimmeusabot Gimme60 (gimme60.com)
- go http package Go http library
- google page speed insights Google Page Speed Insights
- google Web Preview Google Instant Previews crawler
- google-structured-data-testing-tool Google-StructuredDataTestingTool
- google-structureddatatestingtool Google-StructuredDataTestingTool
- googlebot Google Bot
- googlestackdrivermonitoring-uptimechecks GoogleStackdriverMonitoring-UptimeChecks
- grapeshotcrawler GrapeshotCrawler
- gravitybot Gravity Bot
- hatena::bookmark Hatena::Bookmark
- heritrix heritrix
- htmlparser HTMLParser
- http_request2 HTTP_Request2
- httpclient HTTPClient
- https://developers.google.com/+/web/snippet Google+ Snippet Fetcher
- hubspot HubSpot
- ia_archiver Internet Archive (WayBackMachine)
- icoreservice iCoreService
- idmarch idmarch.org/bot.html
- inagist URL resolver
- insieve Insieve Bot
- insitesbot Insitesbot
- instapaper Instapaper
- istellabot IstellaBot
- jack jack
- jakarta commons Jakarta Commons HttpClient
- java Generic Java http library
- jetslide Jetslide
- js-kit URL resolver
- kemvibot Kemvi
- kimengi Kimengi Bot
- knows.is knows.is
- kojitsubot Kojitsubot
- komodiabot KomodiaBot
- kraken kraken
- laconica Laconica
- libwww-perl Perl client-server library
- lijit crawler Lijit
- linkdexbot Linkdex Bot
- linkedinbot LinkedIn
- linkscrawler LinksCrawler
- linode Linode Longview
- lipperhey Lipperhey
- livelapbot Livelapbot
- loadtimebot Load Time Bot
- longurl URL expander service
- ltx71 ltx71.com
- lumibot Lumibot
- lwp-trivial Another Perl library
- magpie-crawler magpie-crawler
- mail.ru_bot Mail.ru Bot
- meanpathbot meanpath
- mediapartners-google Google Adsense bot
- megaindex.ru MegaIndex
- memorybot mignify.com/bot.html
- metauri MetaURI
- mfe_expand Mcafee spider
- mir web crawler MIR web crawler
- mj12bot Majestic-12 spider
- mojeekbot Mojeek UK search crawler
- mrchrome MrChrome
- ms search 6.0 robot MS Search 6.0 Robot
- msnbot-media Microsoft media bot
- msnbot Microsoft bot
- nerdybot NerdyBot
- netcraft Netcraft
- netstate netEstate NE Crawler
- netvibes Personalized dashboard bot
- netzcheckbot netzcheck
- newrelicmonitor NewRelic monitor
- newrelicpinger NewRelicPinger
- newsme newsme
- niki-bot niki-bot
- ning NING - Yet Another Twitter Swarmer
- nutch Apache search spider
- openhosebot OpenHoseBot
- orangebot OrangeBot
- pagesinventory pagesinventory.com
- panopta Monitoring service
- paperlibot PaperLi
- peerindex peerindex
- percolatecrawler PercolateCrawler
- perfectmarketkwtbot PerfectMarket
- phantomjs PhantomJS
- pingdom Pingdom monitoring
- pinterest Pinterest
- plukkie botje.com/plukkie.htm
- privacyawarebot PrivacyAwareBot
- proximic Proximic Spider
- psbot-page Picsearch
- publiclibraryarchive.org publiclibraryarchive.org
- pycurl Python http library
- python-httplib2 Python-httplib2
- python-requests Python http library
- python-urllib Python http library
- queryseeker QuerySeekerSpider
- quicklook QuickLook
- re-animator Domain Re-Animator Bot
- readability Readability
- rebelmouse RebelMouse
- redditbot Reddit Bot
- relateiq RelateIQ
- riddler Riddler Bot
- rogerbot SeoMoz spider
- rssmicro RSS/Atom Feed Robot (rssmicro.com)
- ruby Ruby
- scrapy Scrapy
- screaming frog seo spider Screaming Frog SEO Spider
- searchmetricsbot SearchmetricsBot
- semrushbot SEO analysis bot
- seokicks SEOKicks
- seznambot SeznamBot
- shopwiki ShopWiki
- shortlinktranslate Link shortener
- showyoubot Showyou iOS app spider
- siege Joe Dog Siege
- sistrix SISTRIX
- siteuptime Site monitoring services
- slack Slackbot-LinkExpanding
- slackbot Slack Bot
- slurp Yahoo spider
- smtbot SimilarTech
- socialrank SocialRankIOBot
- sogou Chinese search engine
- spbot OpenLinkProfiler
- spider generic web spider
- spinn3r Spinn3r aggregator
- sputnikbot SputnikBot
- squider Squider
- statuscake StatusCake
- stripe Stripe
- test certificate info C http library?
- tineye TinEye Bot
- traackr Traackr Bot
- trendictionbot Trendiction Search
- turnitinbot TurnitinBot
- tweetedtimes The Tweeted Times
- tweetmemebot TweetMeMe Crawler
- twikle Social web search bot
- twitjobsearch TwitJobSearch
- twitmunin Twitmunin
- twitterbot Twitter URL expander
- twurly Twurly
- typhoeus Typhoeus
- umbot uberMetrics
- unwindfetch Gnip
- uptimerobot Uptime Robot
- vagabondo Vagabondo
- vb project Visual Basic
- vigil Vigil
- vkshare VKontake Sharer
- voilabot VoilaBot
- vrcrawler Venture Radar
- wasalive-bot Wasalive Bots
- watchsumo WatchSumo
- wbsearchbot Ware Bay Best Buys
- webscout Webscout
- wesee WeSEE
- wget wget unix CLI http client
- wordpress WordPress spider
- wormly WormlyBot
- wotbox Wotbox
- xenu link sleuth Xenu Link Sleuth
- xing-contenttabreceiver Xing bot
- xovibot XoviBot
- yacybot YaCy
- yahoo-ad-monitoring Yahoo Ad monitoring
- yandex Yandex
- yeti Naver Corp
- yourls YOURLS
- zelist.ro feed parser
- zibb ZIBB spider
- zitebot Zite
- zyborg Zyborg
被识别为机器人的云服务提供商
许多机器人不会主动表明自己的身份,但我们会跟踪互联网服务提供商,并将来自主要云服务提供商的流量也识别为可能是机器人。
- Google Cloud
- Microsoft Corporation
- OVH SAS
- DigitalOcean
- Huawei Clouds
- Google-private-cloud
- Amazon.com
- Google Proxy
- Omonia d.o.o.
- ColoCrossing
搜索机器人常见问题
Linkly 如何检测机器人?
Linkly 通过用户代理字符串(许多机器人会自我声明)以及检查流量是否来自已知的云托管提供商或数据中心来识别机器人。
这个列表中是否缺少某个机器人?
我们会定期更新机器人检测。如果您发现来自未在此列表中的机器人的流量,请联系我们,我们将添加它。
为什么有些点击被标记为机器人,但它们实际上是真实用户?
使用 VPN 或企业网络的用户可能会被标记为机器人,因为他们的流量通过数据中心路由。有关更多信息,请参阅我们关于 VPN 流量的文章。
我可以查看访问我的链接的具体是哪个机器人吗?
可以。在流量报告中,点击机器人选项卡,即可按用户代理查看机器人流量的详细分类。