I’ve been trying to build some tooling to support the company I’m currently working for. It involves getting information from the search engines about rankings and competition. Now, I’m not usually one to go against the search engines’ terms of service, but they are really pushing it.
First, let me point you to the APIs the search engines offer, just in case you want to play with them a bit:
- Google API
Really basic but well-written API. It’s pretty easy to use and supports a lot of features. (1000 search limit)
- MSN Search API
Very good API if you’re writing for one of the Microsoft languages; otherwise I wouldn’t waste any time on it. I don’t even know if they support anything other than .NET.
- Yahoo Search API
One of the most stable APIs around. It’s got a ton of features, but it’s hell to implement in anything. I mean, it’s got good XML-powered results, but that does mean you need to build wrappers for everything to translate it into something useful for your program. (5000 search limit)
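To give an idea of what such a wrapper looks like, here is a minimal Python sketch that turns an XML result set into plain dictionaries. The element and attribute names (`ResultSet`, `Result`, `Title`, `Url`, `totalResultsAvailable`) and the sample document are assumptions made up for illustration. Check the real response format of whichever API you use before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical response shape, invented for this example.
# A real search API will use its own element names and namespaces.
SAMPLE_XML = """<ResultSet totalResultsAvailable="2">
  <Result><Title>Example</Title><Url>http://example.com/</Url></Result>
  <Result><Title>Other</Title><Url>http://example.org/page</Url></Result>
</ResultSet>"""


def parse_results(xml_text):
    """Return (total, results) where results is a list of plain dicts."""
    root = ET.fromstring(xml_text)
    # Reported total number of hits; defaults to 0 if the attribute is missing.
    total = int(root.get("totalResultsAvailable", "0"))
    results = [
        {
            "title": item.findtext("Title", default=""),
            "url": item.findtext("Url", default=""),
        }
        for item in root.findall("Result")
    ]
    return total, results


total, results = parse_results(SAMPLE_XML)
```

Once the results are plain dictionaries, the rest of the program doesn’t have to know anything about XML.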
So what do these APIs offer you? You can use them to perform search operations without having to visit the search engine’s website. This can be useful if you’re developing some online/offline search feature, or if you want some search engine tooling, e.g. checking rankings. Actually, it’s the only way, as all of the search engines forbid automated queries that don’t go through their API.
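Checking a ranking, for example, boils down to finding the first result that belongs to your site. A rough Python sketch, assuming you already have the result URLs as an ordered list (the `find_rank` helper and the sample URLs are hypothetical, not part of any search API):

```python
from urllib.parse import urlparse


def find_rank(result_urls, domain):
    """Return the 1-based position of the first result on `domain`, or None."""
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself and any of its subdomains (www., blog., ...).
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Made-up result list for illustration.
urls = ["http://a.com/", "http://www.example.com/page", "http://b.org/"]
rank = find_rank(urls, "example.com")
```

Returning `None` instead of 0 makes it explicit that the site wasn’t found in the results you fetched, which is not the same as it not being indexed at all.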
So why am I so against these APIs?
Well, let me first state: I like most of the service they offer, but…
They are just so fucking inaccurate! If I perform a search to see how many pages of a website are in the index, all three of the APIs return either 0 or the wrong number. When I perform the same search in the search engine itself, I get the real results. Why can’t the search engines give the same results through the API as they give to the rest of the world?
There is one thing that’s even worse. The Google API doesn’t work half of the time, and the other half the results are crap. Yahoo is just as bad and loves to make our lives as developers as difficult as it can, with inaccurate results and dozens of ‘Bad Request’ errors.
I’m sorry to say that this is one of the reasons I had to build a tool that makes direct requests to the search engine and analyzes the results that way. I know it’s against the terms of service, but there’s no way around it at the moment. Let’s just hope the search engines fix some of these problems.