It has been a while since I wrote a development tip. This one is pretty good, I think.
In .NET 1.0 and 1.1, each ArrayList would pre-allocate space for 16 elements, which means 64 bytes just for the pointer storage (at 4 bytes per pointer). Then, the .NET team did something very smart. They instrumented several real-world applications and found out that most of the ArrayLists created had... zero elements!
That is right, most of the ArrayLists allocated in an application are never used. So in .NET 2.0 they changed the default allocation to only 4 elements, which saves 48 bytes per unused ArrayList.
I thought that change was great, but it had near zero impact on Sampa, because I knew the cost of allocating an ArrayList so I always delayed it until I knew I would use it.
But I went one step further.
As the .NET development guidelines recommend, if your function returns an ArrayList, it is better to return an empty list than null. If you are consistent about it, callers never have to check for both cases.
It turns out that a lot of functions on Sampa return an empty ArrayList, and probably a lot of functions in your application do too. The solution: create a single static ArrayList with zero elements and return it every time you need an empty list, instead of allocating a new ArrayList each time.
Just make sure that static ArrayList is created properly: mark the field readonly so it cannot be reassigned, and wrap the list with ArrayList.ReadOnly so nobody can add elements to the shared instance:
public static readonly ArrayList EmptyArray = ArrayList.ReadOnly(new ArrayList());
That saved us a ton of memory. Each request might be "using" 300 or more empty ArrayLists. If each ArrayList takes about 40 bytes, we are saving 12KB per request. At 100 requests/second, that is 1.2MB less (or 30K objects) that the Garbage Collector needs to worry about (per second!).
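Here is a minimal sketch of how a function might use the shared instance. The class and method names (Lists, TagStore, GetTags) are made up for illustration and are not Sampa's actual code; only the EmptyArray declaration comes from the tip above.

```csharp
using System;
using System.Collections;

public static class Lists
{
    // The shared, read-only, zero-element list from the tip above.
    public static readonly ArrayList EmptyArray = ArrayList.ReadOnly(new ArrayList());
}

// Hypothetical class, for illustration only.
public class TagStore
{
    // Returns the tags as a list, or the shared empty list when there are none.
    public ArrayList GetTags(string[] rawTags)
    {
        if (rawTags == null || rawTags.Length == 0)
        {
            return Lists.EmptyArray;      // nothing allocated on this path
        }
        return new ArrayList(rawTags);    // allocate only when there is data
    }
}

public static class Program
{
    public static void Main()
    {
        TagStore store = new TagStore();
        ArrayList a = store.GetTags(null);
        ArrayList b = store.GetTags(new string[0]);

        // Both calls hand back the very same object.
        Console.WriteLine(object.ReferenceEquals(a, b)); // True
        Console.WriteLine(a.Count);                      // 0

        // Callers can loop without checking for null first.
        foreach (object tag in a)
        {
            Console.WriteLine(tag);
        }

        // The read-only wrapper rejects attempts to modify the shared list.
        try
        {
            a.Add("oops");
        }
        catch (NotSupportedException)
        {
            Console.WriteLine("read-only");
        }
    }
}
```

The read-only wrapper matters: without it, one careless Add on the shared list would make every "empty" result in the application non-empty.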
TechMeme is this fantastic website that rolls up the hottest topics on the blogosphere at the moment. For example, if you write something that interests a lot of people, and a lot of bloggers notice it and write their own posts linking to your original post, you might appear on the homepage of TechMeme.
That works fairly well, but it has a fundamental flaw, IMHO. Blogs that are popular, like TechCrunch, Scripting News or Scobleizer, will get lots of links whether or not they write something interesting, just because of their sheer number of readers.
Well, this is where a handy IR (Information Retrieval) technique might work. And this is how it goes...
In text analysis (this is an oversimplification), words that occur everywhere are not as valuable as words that occur only a few times. If you look at papers about lung cancer, the word "Mesothelioma" is more relevant in a document than the word "are". That is not because of how often it appears in that document per se, but because of how rarely it appears across all documents.
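The usual name for this weighting is inverse document frequency. Below is a rough sketch of it, assuming the common log(N / n) form, where N is the number of documents and n is the number that contain the word. The tiny "documents" and the IdfSketch class are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class IdfSketch
{
    // idf(word) = log(total documents / documents containing the word)
    public static double Idf(string word, List<string[]> docs)
    {
        int containing = 0;
        foreach (string[] doc in docs)
        {
            if (Array.IndexOf(doc, word) >= 0)
            {
                containing++;
            }
        }
        // Assumption: a word seen in no document gets weight 0.
        if (containing == 0)
        {
            return 0.0;
        }
        return Math.Log((double)docs.Count / containing);
    }

    public static void Main()
    {
        // Three made-up documents, already split into lowercase words.
        List<string[]> docs = new List<string[]>
        {
            new string[] { "tumors", "are", "common" },
            new string[] { "symptoms", "are", "varied" },
            new string[] { "mesothelioma", "cases", "are", "rare" }
        };

        // "are" is in all 3 documents: log(3/3) = 0, so it carries no weight.
        Console.WriteLine(Idf("are", docs));

        // "mesothelioma" is in 1 of 3 documents: log(3/1), about 1.0986.
        Console.WriteLine(Idf("mesothelioma", docs));
    }
}
```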
Now, if TechMeme really wants to surface the out-of-the-ordinary news that is popping up on the blogosphere, they should apply a factor to each blog that is inversely proportional to the average number of daily links to that blog.
This way, if TechCrunch averages 500 links per blog post, and they get 5,000 for a single blog post, that is worth noticing. But right now, they get fewer than 500 and appear every day on TechMeme.
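To make the idea concrete, here is a small sketch of that normalization. It has nothing to do with how TechMeme actually ranks posts; the LinkScore class is made up, the TechCrunch numbers are the ones from the example above, and the small blog is invented.

```csharp
using System;

public static class LinkScore
{
    // Score a post by how far it exceeds the blog's own baseline:
    // links to this post divided by the blog's average links per post.
    public static double Score(int linksToPost, double averageLinksPerPost)
    {
        if (averageLinksPerPost <= 0)
        {
            throw new ArgumentOutOfRangeException("averageLinksPerPost");
        }
        return linksToPost / averageLinksPerPost;
    }

    public static void Main()
    {
        // TechCrunch on an unusual day: 5,000 links against an average of 500.
        Console.WriteLine(Score(5000, 500)); // 10 times its baseline

        // TechCrunch on an ordinary day: 400 links against an average of 500.
        Console.WriteLine(Score(400, 500));  // 0.8, below its baseline

        // A hypothetical small blog: 60 links against an average of 5.
        Console.WriteLine(Score(60, 5));     // 12 times its baseline
    }
}
```

With raw link counts, the ordinary TechCrunch post (400 links) beats the small blog's post (60 links). With the normalized score, the small blog's post comes out on top, which is the point of the proposal.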
I thought the Blogosphere was about distributing power, not shifting from a single group (Mainstream Media) to another (A-list bloggers).
In a few minutes we will deploy a new version of our Page Layout system. This will fix a few bugs and help simplify how users choose the column layout of their pages.
After this upgrade, your site might look different!
If you find a bug in the next few days, please report it to us as soon as possible so we can minimize the impact on other users: bugs@sampa.com