Amazon Web Services just released their new service: SimpleDB. This is a pretty brilliant idea and people are taking notice.
But there are two pieces that bloggers are not paying attention to, or that they didn't realize (after all, most of them are not developers).
I felt the service was so interesting I checked the API and how it works and bam! I was hit in the head.
Did anyone say X.500?
My first realization is that it's not a Database, it's a Directory Service!
Ok, most people (even developers) would not know what a Directory Service is even if you hit them on the head with an Active Directory book. Anyway, if I remember correctly from my years on Exchange Server (98-99), while working on the Active Directory integration, a Directory Service has a few peculiarities that differentiate it from a traditional database.
First off, each object (this is what a "record" is called in a Directory Service) can contain different attributes, and the schema can be changed on the fly (it's a bit more complicated than that).
The next interesting aspect is that a single attribute (field) can have multiple values, just like the Amazon SimpleDB! This means if I define attribute "UsedBy" I can set the values to "Realtors" and "Brokers". On traditional relational databases you'd need 3 tables to do something like this.
Finally, a Directory Service allows a hierarchy of objects, meaning instead of Tables you have nodes (which are container objects) and objects hang off those nodes. Uh oh, SimpleDB doesn't have that, so all my theory goes down the drain... Not really: they provide a thing called a "Domain" which, if you want to (but you don't), can be used as a hierarchy.
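To make the first two points concrete, here is a minimal sketch in plain Python. It is not the SimpleDB API: the item names ("listing-42", "listing-43"), the "City" and "Bedrooms" attributes, and the helper functions are all made up for illustration. It shows items with different attributes and a multi-valued "UsedBy" attribute, next to the three-table relational layout, built with the standard sqlite3 module.

```python
import sqlite3

# Directory-style items: every attribute maps to a set of values, and
# two items do not need to share the same attributes (hypothetical data).
items = {
    "listing-42": {"UsedBy": {"Realtors", "Brokers"}, "City": {"Seattle"}},
    "listing-43": {"UsedBy": {"Brokers"}, "Bedrooms": {"3"}},
}

def matches(item, attribute, value):
    """Return True if the item's attribute holds the given value."""
    return value in item.get(attribute, set())

# Relational equivalent of the multi-valued "UsedBy" attribute:
# one table per entity plus a join table, three tables in total.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE audience (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_audience (
        product_id  INTEGER REFERENCES product(id),
        audience_id INTEGER REFERENCES audience(id)
    );
    INSERT INTO product  VALUES (1, 'listing-42');
    INSERT INTO audience VALUES (1, 'Realtors');
    INSERT INTO audience VALUES (2, 'Brokers');
    INSERT INTO product_audience VALUES (1, 1);
    INSERT INTO product_audience VALUES (1, 2);
""")

def used_by(conn, product_name):
    """Return the audiences of a product, sorted by name."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM audience a
        JOIN product_audience pa ON pa.audience_id = a.id
        JOIN product p ON p.id = pa.product_id
        WHERE p.name = ?
        ORDER BY a.name
        """,
        (product_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

In the first form, adding a third value to "UsedBy" is one more element in a set. In the second, it is a new row in `audience` plus a new row in the join table.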
And the best application for SimpleDB will be...
Calling SimpleDB a database or a directory service doesn't change what it can do or what people can do with it; it's just a convention. What matters are the nice products that will come out of it and, IMHO, one of the most interesting ones will be...
A search engine!
What? Somebody will build a search engine on top of SimpleDB to compete with Google? Nah! Somebody -- lots of body, actually -- will be able to build their own site search service on top of SimpleDB.
Imagine that Redfin is not a gazillion-dollar VC-backed startup. They are just getting started and want to index all listings from MLS to do a kind of search that you cannot run directly against the MLS database. They can put all that data into SimpleDB (the flexible schema is a huge plus) and not have to worry about having Terabytes of data in their own database. Do you know how much it costs in time and money to maintain a Terabyte database? A lot. There is backup, there are perf issues, there is hardware redundancy, etc.
The only thing missing from the SimpleDB API to provide some serious "site search" capability is a way to rank attributes when doing a query.
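Until something like that exists, ranking would have to happen on the client after the query returns. Here is a rough sketch of that idea in plain Python, assuming the matching items have already been fetched. The `score` and `rank` functions, the attribute weights, and the sample listings are all hypothetical and not part of any SimpleDB API.

```python
def score(item, terms, weights):
    """Add up the attribute weight for every query term found in a value."""
    total = 0.0
    for attribute, values in item.items():
        weight = weights.get(attribute, 1.0)  # unlisted attributes weigh 1.0
        for value in values:
            words = value.lower().split()
            total += weight * sum(1 for term in terms if term.lower() in words)
    return total

def rank(items, query, weights):
    """Return item names, best match first, dropping items with no match."""
    terms = query.split()
    scored = [(score(attrs, terms, weights), name) for name, attrs in items.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for total, name in ordered if total > 0]

# Made-up sample data: a match in "Title" counts three times as much
# as a match in "Description".
listings = {
    "listing-1": {"Title": ["Lake view condo"], "Description": ["Quiet street near the lake"]},
    "listing-2": {"Title": ["Downtown loft"], "Description": ["Walk to the lake and shops"]},
    "listing-3": {"Title": ["Suburban house"], "Description": ["Large yard"]},
}
weights = {"Title": 3.0, "Description": 1.0}

print(rank(listings, "lake", weights))
```

The catch is that every candidate item has to come over the wire before it can be scored, which is exactly why ranking inside the service would be nicer.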
I, for one, can't wait until it's ready. I already have a million ideas. Granted, I have access to MySQL, but now I don't have to worry about replicating data. Well, I didn't worry too much because my db is small. hahhaha
you're straight on. we've done just that with SDS at <a href="http://www.polarrose.com">polar rose</a>. (while not one that intends to compete with google, polar rose is a visual search engine).
Let's assume Redfin imports the entire ~1 TB MLS database into SimpleDB. I wonder, what are the advantages of this approach: price and enhanced search capabilities? Compared to what...importing the entire ~1 TB MLS database into (say) MySQL? I can understand how price may be better under SimpleDB (although the AWS cost formula for this service is involved). But enhanced search, I'm not following what it is you imply would be possible thanks to SimpleDB. In other words, what kind of query is possible in SimpleDB but not in MySQL (or Postgres, etc.)?
By Ryan M - 12/15/2007 1:59 AM
There's no kind of query that a directory service can do that can't be replicated by a relational database... in theory... and I don't think that's what Marcelo was implying. He's saying it'd be easier to create a search engine for those sorts of data, which may be right and may be off... that "one thing missing" he mentions at the end, might well be a deal breaker. And the MLS data is still structured data, mostly. It's not like making a search engine for generic content on a web site.
By Paul K - 12/15/2007 3:35 AM
Paul nailed it. Pretty much everything you can do in a relational database you can do on a directory service. The question is what it is optimized for.
WRT hierarchy, any hierarchy can be represented in a flat table, either by having a "Parent-Node" attribute or by having a structured notation on a field, like "node1.node23.node27". Want to know all descendants of "node1"? Query for "hierarchy LIKE 'node1.%'".
Ryan is right. There is nothing you can do on SimpleDB you can't do on MySQL, but that is not the great thing about SimpleDB. The fact that you can have 20 TB of data is what will enable a lot of startups to do interesting things. Before that, it would cost you a couple hundred thousand dollars just in server/storage to get that started.
That's why I think search over a ginormous database might be the application that benefits the most from SimpleDB. If you are only going to store a couple thousand rows in SimpleDB, you might as well use a flat file on your own server to reduce the latency.
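A quick sketch of the structured-notation approach, using Python's built-in sqlite3 as a stand-in for any flat store. The table name, column names, and sample nodes are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (name TEXT, hierarchy TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [
        ("node1", "node1"),
        ("node23", "node1.node23"),
        ("node27", "node1.node23.node27"),
        ("node10", "node10"),  # unrelated root, must not match "node1"
    ],
)

def descendants(conn, path):
    """Return every node below `path`, at any depth."""
    # The trailing dot keeps "node1" from matching "node10".
    rows = conn.execute(
        "SELECT name FROM node WHERE hierarchy LIKE ? ORDER BY hierarchy",
        (path + ".%",),
    ).fetchall()
    return [row[0] for row in rows]

def children(conn, path):
    """Return only the direct children of `path`."""
    depth = path.count(".") + 1
    rows = conn.execute(
        "SELECT name, hierarchy FROM node WHERE hierarchy LIKE ? ORDER BY hierarchy",
        (path + ".%",),
    ).fetchall()
    return [name for name, hierarchy in rows if hierarchy.count(".") == depth]
```

One caveat with this sketch: `_` and `%` are wildcards in LIKE, so node names containing them would need escaping.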
You could build a rather simple website CMS - purely for serving a static-looking dynamic site, not a forum/blog/documents/info dynamic site. Of course you could go down that path as well...
I've been looking for a simple website CMS - they don't exist in the form I want and I'll be building my own in RoR - MagnitudeCMS. Lots of CMS do what I want as well as 100 other things I don't want them to do ;)