Wednesday, January 9, 2008

A Child application may have already started this instance

There is this weird error message that shows up when starting a container on OC4J 10.1.3.3.0 when things are out of whack, and it isn't exactly the most friendly or informative error message. It's the one you see as the title of this post.

The other day, we were hitting a problem with our application deployed to OC4J 10.1.3.3.0 where it would spin and spin and spin and finally throw a 500 Internal Server Error (the application, mind you, not the container). Which is fine, except it hides the real error message. I astutely knew how to get more information: go into $ORACLE_HOME/j2ee//applications//_web/orion-web.xml and add development="true" to the orion-web-app element, which, of course, you'd know to do anyway... right? 'Cause you wrote an orion-web.xml for your WAR too, right? Anyway... I had my operations person restart the ENTIRE MACHINE so that we could try again and get the real error message, only to find I was hitting an out-of-heap-space exception. OK, that's not good. So, what'd I do? I upped the heap space.

ERROR ERROR... Now I can't start my container. But when you can't start your container, you can't administer that container, which means I couldn't change my settings back to the way they were to see if that fixed the non-starting issue. Now what?!

I ask my operations person to restart the entire machine a couple of times, and then I start building a new container. After about 20 minutes of configuring the new container, look who decided to start, on his own and without telling me (c'mon, an Ajax notification wouldn't have killed ANYONE to add in): the container that wouldn't start had started.

So I go back in, hit the same problem, and decide to bump the max heap size up some more. "What could possibly be going on? This is weird," I thought. Rinse and repeat. But this time, the container won't start to save its life.

I walk back down to my operations person and say, "Hey, something is weird. Can you try to start the container?"

"Yeah, sure, one sec."

>opmnctl startall
>........
>Container has failed to start due to too low starting heap size.


"WHAT!?"

How in the world does that not get validated on the front end? How in the world does an error message reported by the back-end start-container job not get propagated to the front-end application deployment console?

And thus, 4 billable hours of my client's money were wasted because of one minor, itsy-bitsy bug in the container.

The 500 Internal Server Error was totally OUR code's fault. We had an infinite loop, which was easily debugged once I could get around the mayhem of "Is this container even running correctly?" and realize my code was in fact to blame.

Open Source Rant
Am I entering a bug? Nope. Am I fixing the problem? Nope. I'm not. It's just another drop of personal experience that'll make me a better OC4J user without making the product any better. If I had hit this problem on any one of several openly developed servlet/JEE servers, you'd better believe I'd have immediately opened a ticket and followed through with a passion to make sure no one else would hit the same frustration. I mean, seriously, ONE LINE OF VALIDATION WOULD'VE SAVED ME... Or maybe it's the other way around, and the openly developed tools would have skipped any rule that looked like this entirely, because of how unreal it really is. Who thinks up these kinds of rules? And if it's backed by actual research, wouldn't there be an OC4J memory management and garbage collection tuning tutorial outlining the things I need to know about what OC4J does to make my life "easier"?

1 comment:

Brad said...

Not only is the product closed, but the method of opening tickets is closed as well. Sure, I have posted a few things on the Oracle Forums (what a joke that is), but I have never had a MetaLink account with which to report anything. It's always left up to the DBAs, adding even more layers of separation from anyone who could possibly help solve your problem.

BTW, Oracle is "The Information Company"... yeah, right. It is impossible to find anything on any of their sites, and the search (on both the regular site and the forums) is absolutely worthless.