
Brave Tech World

Entries for April 14, 2007




Saturday, April 14, 2007

The power of doing what users want

By Marcelo Calbucci

    In the last ten days, four very important Sampa sites were created, all by very close friends. Why are they important? Because these people used the Sampa Alpha version more than a year ago and never came back. I didn't nudge them recently to create sites; they did it without even talking to me.

    The first site is from a couple that moved to Washington DC and thought a blog would be a good way to keep in touch with friends in Seattle. The second is from another couple just creating a personal site. The third is from a guy who wants an easy way to create pages when he sells something on Craigslist, and the fourth is the personal site of a female friend.

    Most of them are Microsoft people -- so they shall remain anonymous -- and they could have created their site on MSN Spaces (Live Spaces) or Office Live. I'm sure they have sites on those services as well. But the fact that they are using Sampa speaks a lot to our feature list and overall user experience. Maybe they can't do what they want on MSN Spaces (Office Live is mostly for business anyway) or maybe Sampa had just the extra thing that they needed.

    IMHO, this is a sign that we are moving our user experience in the right direction.

    BTW, we have just released three new features: Blog About a Book, Redesigned Site Overview and Wide-Width Templates.

   



Saturday, April 14, 2007

Crawlers and Bots explosion

By Marcelo Calbucci

 

    Amazingly enough, Sampa gets more page requests from crawlers and bots than from real users. For example, yesterday 59% of the page requests came from crawlers and bots; only 41% came from real users.

 

    Of course, in our stats we always discard the crawlers, because they are not real users requesting pages. They can also seriously inflate your Unique Users and Visits numbers, because crawlers (mostly) don't support cookies, so every request can look like a brand-new visitor.

 

    So, in the interest of helping my fellow Web 2.0 entrepreneurs, I'm listing some of the string matches that we use to detect whether a user-agent is a crawler:

 

Crawlers:

 

  • bot
  • crawler
  • spider
  • spyder
  • fetch
  • perl
  • search
  • feedseek
  • screenshot
  • scout
  • thumbnail
  • reader
  • mediapartners
  • jeeves
  • ia_archiver
  • slurp
  • yahoofeed
  • yahoo-blogs
  • del.icio.us
  • nutch
  • netnewswire
  • moreover
  • stackrambler
  • boitho
  • blogpulse
  • snap.com
  • everest
  • filangy
  • stumble
  • zyborg
  • baldric
  • hanzoweb
  • yacy
  • wazzup
  • python
  • feedcheck
  • dragonfly
  • netcraft
  • grabber
  • linkwalker
  • egothor
  • irlbot
  • psbot
  • heritrix
  • tmcrawler
  • libwww
  • jakarta
  • httpclient
  • java/1
  • wget/

 

    Besides those, any user-agent with fewer than 15 characters is considered a crawler. I have to update this list every month, because every month at least a couple of new crawlers show up whose user-agent string doesn't contain the word "crawler" or "bot", and most of the time they come from some university CS department.
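
    The matching rule described above could look something like the following Python sketch. This is not Sampa's actual code: the function name `is_crawler`, the constant names and the choice to lowercase the user-agent before matching are my assumptions. Only the substrings and the 15-character rule come from the list above.

```python
# Substrings from the list above; a user-agent containing any of them
# is treated as a crawler.
CRAWLER_SUBSTRINGS = (
    "bot", "crawler", "spider", "spyder", "fetch", "perl", "search",
    "feedseek", "screenshot", "scout", "thumbnail", "reader",
    "mediapartners", "jeeves", "ia_archiver", "slurp", "yahoofeed",
    "yahoo-blogs", "del.icio.us", "nutch", "netnewswire", "moreover",
    "stackrambler", "boitho", "blogpulse", "snap.com", "everest",
    "filangy", "stumble", "zyborg", "baldric", "hanzoweb", "yacy",
    "wazzup", "python", "feedcheck", "dragonfly", "netcraft", "grabber",
    "linkwalker", "egothor", "irlbot", "psbot", "heritrix", "tmcrawler",
    "libwww", "jakarta", "httpclient", "java/1", "wget/",
)

# User-agents shorter than this are treated as crawlers.
MIN_USER_AGENT_LENGTH = 15


def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent looks like a crawler or bot."""
    # Assumption: matching is case-insensitive.
    ua = user_agent.lower()
    if len(ua) < MIN_USER_AGENT_LENGTH:
        return True
    return any(substring in ua for substring in CRAWLER_SUBSTRINGS)
```

    A simple substring scan like this is easy to extend when a new crawler shows up: you just add one more entry to the tuple.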

 

    Now, we also have a list of feed readers, which are crawlers too, but they work on behalf of a real person (mostly). In those cases, we treat them a bit differently, because we want to grab the number of subscribers to that feed.
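
    This post doesn't spell out how the subscriber number is obtained. One common approach, sketched below, relies on the fact that some web-based feed readers report a count such as "25 subscribers" inside their user-agent string. The exact format varies by reader, so the regular expression and the `subscriber_count` name are assumptions for illustration only.

```python
import re
from typing import Optional

# Assumption: the reader embeds text like "25 subscribers" (or "1 subscriber")
# somewhere in its user-agent string.
SUBSCRIBERS_PATTERN = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def subscriber_count(user_agent: str) -> Optional[int]:
    """Return the subscriber count reported in a user-agent, or None."""
    match = SUBSCRIBERS_PATTERN.search(user_agent)
    return int(match.group(1)) if match else None
```

    Since each reader reports its own total, you would keep the latest number per reader for each feed and add them up.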

 

    We don't count feed subscribers toward our UU count, but we are still interested in knowing how many people subscribe to each Sampa site feed. The reason we don't count them is that subscribing to a feed doesn't mean somebody actually saw it. Bloglines might say that I have 25 subscribers, but maybe only a handful really read what I write (in Bloglines), and the only way to detect that is by adding a tracking GIF to each blog post, which is something we are not planning to do for now.

 

    The list of strings to identify a feed reader is:

 

  • bloglines
  • yahoofeedseeker
  • newsgator
  • feedster
  • feedfetcher-google
  • netvibes
  • pubsub
  • sharpreader
  • rssbandit
  • feedbite
  • zhuaxia
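
    Here is a rough Python sketch of how the two lists could work together. Several feed reader strings also match the crawler list ("yahoofeedseeker" contains "feedseek", "feedfetcher-google" contains "fetch", "sharpreader" contains "reader"), so this sketch checks the feed reader list first. That ordering, the `classify` function and the shortened `SAMPLE_CRAWLER_SUBSTRINGS` tuple are my assumptions, not a description of Sampa's code.

```python
# Substrings from the feed reader list above.
FEED_READER_SUBSTRINGS = (
    "bloglines", "yahoofeedseeker", "newsgator", "feedster",
    "feedfetcher-google", "netvibes", "pubsub", "sharpreader",
    "rssbandit", "feedbite", "zhuaxia",
)

# Hypothetical short sample of the crawler list, just for this example.
SAMPLE_CRAWLER_SUBSTRINGS = ("bot", "crawler", "spider", "fetch", "feedseek", "reader")


def classify(user_agent: str, crawler_substrings=SAMPLE_CRAWLER_SUBSTRINGS) -> str:
    """Return "feed_reader", "crawler" or "user" for a user-agent string."""
    ua = user_agent.lower()
    # Check feed readers first, since many of them also match crawler strings.
    if any(substring in ua for substring in FEED_READER_SUBSTRINGS):
        return "feed_reader"
    if any(substring in ua for substring in crawler_substrings):
        return "crawler"
    return "user"
```

    For example, `classify("Bloglines/3.1 (http://www.bloglines.com; 25 subscribers)")` returns `"feed_reader"`, while a Googlebot user-agent returns `"crawler"`.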

    One of the biggest problems with feed reader detection is the new IE 7 and Outlook 2007, which fetch feeds with the regular IE 7 user-agent. That makes it impossible to distinguish between a user who subscribes to a feed and a user who just clicks on the feed link.

 

    I hope this helps your startup. If you know of other crawlers, bots or feed readers that I'm missing, please let me know.


