Have you ever worked with .NET remoting and run into huge lag and lost information? Well, when I was developing an ocean simulator, this is just what happened to me. In our case the advantage of distributing the workload was soon overtaken by the overhead of the network traffic.
Don’t get me wrong, in most applications you will never notice any hindrance from using remoting, but our application had thousands of objects communicating over the network. To keep it short: it produced so much overhead that the program would stall on all computers and eventually crash. Not something you want on a production system. So let me try to explain why this happened to us, and will most likely happen to you in similar systems.
Disclaimer: I expect you already know how remoting works, and what worked for me might not work for you. But you should be able to use this information to solve some latency and network overhead problems of your own.
In our application we had a simple setup, just to keep it manageable. We had the ocean on one computer and distributed all the fish over the various subscribing systems. Because we expected problems with the A.I. of similar fish interacting, all fish of one species were located on a single system.
To prevent any single system from being crippled with hundreds of fish while another had not even one, we included a load balancing system. This system assigned any new species to the system with the fewest fish.
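The load-balancing idea can be sketched in a few lines. This is an illustrative Python sketch, not our original code; the class and method names are made up for the example.

```python
# Minimal sketch of the load-balancing idea: assign each new species
# (school of fish) to the subscriber currently hosting the fewest fish.

class LoadBalancer:
    def __init__(self):
        # Maps subscriber id -> number of fish currently hosted there.
        self.fish_count = {}

    def register(self, subscriber_id):
        self.fish_count.setdefault(subscriber_id, 0)

    def assign_school(self, school_size):
        # Pick the subscriber with the fewest fish and give it the whole school.
        target = min(self.fish_count, key=self.fish_count.get)
        self.fish_count[target] += school_size
        return target

balancer = LoadBalancer()
for node in ("node-a", "node-b", "node-c"):
    balancer.register(node)

balancer.assign_school(30)   # first school goes to node-a
balancer.assign_school(10)   # next school goes to node-b (fewest fish)
print(balancer.fish_count)   # {'node-a': 30, 'node-b': 10, 'node-c': 0}
```

Note that the whole species moves as one unit, which is exactly why one unlucky system can still end up with a disproportionately large school.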
Overall we thought this would be a good setup and would not cause too many problems. Oh boy, were we wrong!
During our initial trials with up to fifty fish everything appeared to run smoothly. But then came the shock: as soon as the number of fish increased to over 100 with two clients, the programs locked up completely. I was less than happy with this result; let’s just say that some nasty words were used.
So what we did was introduce caching into the system; in our case nothing else was feasible as we only had a few weeks left. In the new scenario every system had its own cached copy of the entire ocean, reducing the number of requests sent over the network somewhat.
Our hope was that this would cut the network traffic roughly in half. And initially it did; we only updated the cache every three seconds or so (it took some tweaking to find an acceptable interval). What we forgot to think about was that the more fish there were in our little ocean, the more objects had to be copied across the network.
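The throttled cache boils down to a simple rule: serve reads locally and only cross the network when the snapshot is older than the refresh interval. A minimal Python sketch of that idea (the fetch function stands in for the actual remoting call; names are illustrative):

```python
import time

# Each client keeps a local copy of the ocean and only refreshes it from
# the server every few seconds; all other reads are answered locally.

class ThrottledCache:
    def __init__(self, fetch, refresh_interval=3.0):
        self.fetch = fetch                  # remote call returning the full ocean state
        self.refresh_interval = refresh_interval
        self.snapshot = None
        self.last_refresh = float("-inf")

    def get(self):
        now = time.monotonic()
        if now - self.last_refresh >= self.refresh_interval:
            # Only here do we cross the network.
            self.snapshot = self.fetch()
            self.last_refresh = now
        return self.snapshot

calls = 0
def fetch_ocean():
    global calls
    calls += 1
    return {"fish": ["herring", "cod"]}

cache = ThrottledCache(fetch_ocean, refresh_interval=3.0)
cache.get()
cache.get()   # served from the local copy, no network call
print(calls)  # 1
```

The catch, as we found out, is that each refresh still copies the *entire* ocean, so the per-refresh cost grows with the number of fish.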
Thinking we had solved most of our problems, we started testing the new scenario. At first everything appeared to work fine, but then again the A.I. of the fish was not yet activated. Without the A.I. enabled, the world could sustain about 500 fish on three different systems without too many problems.
A successful trial, you might think, but alas, we were too happy too soon. In the second trial I enabled my A.I. system. Soon after, I wished I hadn’t. The application started out fine, but after a few minutes of running, the first remoting exceptions were thrown. It did not take long to completely crash the application again.
What I had forgotten to take into account was that in our system the fish interacted with each other. This is needed for finding food and allies as well as fleeing from potential threats. But it also dramatically increased the number of network calls to remoted objects.
Third trial setup
To make it appear the program was working we introduced some shortcuts, like making the A.I. dumber and reducing the cache refresh rate. Needless to say this made the entire program useless, but then again it was only a school project which needed to run for 15 minutes tops.
Still, I was not happy with the results. I mean, it must be possible to apply remoting without instantly crashing or slowing down your application. To find out if that is even possible, I decided to start a library for automating remoting tasks. It includes a network service, a load balancer and base classes for any object that needs to be networked.
As of yet I’ve not been able to do much testing, or completing for that matter, of the library. But it is showing promise, as I’m only sending calls with changed values instead of entire objects, and I’m trying to make every networking call in a separate thread so as not to block the program.
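The "only send changed values" part is a classic delta-update scheme. A minimal Python sketch of the idea (the field names and the fish state are made up for illustration):

```python
# Delta updates: instead of shipping the whole fish object across the
# wire, compute which fields changed since the last sync and send those.

def diff(old, new):
    # Return only the keys whose values changed.
    return {k: v for k, v in new.items() if old.get(k) != v}

def apply_delta(state, delta):
    updated = dict(state)
    updated.update(delta)
    return updated

before = {"x": 10.0, "y": 4.0, "species": "cod", "hunger": 0.2}
after  = {"x": 11.5, "y": 4.0, "species": "cod", "hunger": 0.3}

delta = diff(before, after)
print(delta)  # {'x': 11.5, 'hunger': 0.3}

# The receiving side reconstructs the full state from its old copy.
remote_state = apply_delta(before, delta)
assert remote_state == after
```

Two of four fields cross the network instead of the whole object, and a fish that didn’t move sends nothing at all.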
Remoting a real challenge
At first remoting may look appealing because of the ease with which you can set it up, relatively speaking that is. But it takes huge performance hits when you try to remote too many objects and don’t apply the following simple tricks:
- Reduce the number of networked calls whenever possible
- Don’t use serialization, unless absolutely necessary (sending huge XML or BLOB data is not smart!)
- Try to send only primitive data types, like integers and bytes. Avoid strings, as these can get quite large
- Work as much as possible in separate threads to prevent locking the program when waiting on return values.
- When calling void functions, use asynchronous (one-way) calls, thus preventing the calling application from blocking
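The last two tips can be sketched together: run remote calls on worker threads and fire void calls without waiting for a reply, so the main loop never blocks. A hedged Python sketch, where `slow_remote_update` stands in for a remoting call:

```python
import threading
import time

# Fire-and-forget: the caller starts the remote call on a worker thread
# and returns immediately instead of waiting out the network latency.

results = []

def slow_remote_update(fish_id):
    time.sleep(0.1)          # pretend this is network latency
    results.append(fish_id)

def fire_and_forget(fn, *args):
    t = threading.Thread(target=fn, args=args, daemon=True)
    t.start()
    return t

start = time.monotonic()
threads = [fire_and_forget(slow_remote_update, i) for i in range(5)]
elapsed = time.monotonic() - start
# The loop returned almost immediately instead of waiting ~0.5 s in total.

for t in threads:
    t.join()                 # joined here only so the demo can print
print(sorted(results))       # [0, 1, 2, 3, 4]
```

The trade-off is that you give up the return value and any exception the call might throw, which is why this pattern only fits void calls whose failure you can tolerate or report elsewhere.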
Hopefully when you apply these pointers you will have more success than we had with our setup. And if you are interested in the results of my experiment in creating a library with less overhead, then stay tuned. If you have found a solution to the lag and latency of remoting, don’t hesitate to contact me.