Channel: West Wind Message Board Messages

Re: DCOM Servers suddenly stopped Loading

Web Connection
Nov. 30, 2012
12:40 pm
From: Rick Strahl
To: Dan Scott-Raynsford
Glad you got it resolved.

It's definitely an odd error. FXPs in projects act pretty weird and I've always tried to avoid them. But I've never heard of them keeping classes - COM or otherwise - from loading.

Anyway glad it's working...


Rick


Hi Rick & Brett,

Well problem solved! It wasn't quite what I expected and I'm not quite certain why the issue caused the problems it did. But I'll try and explain it in case it can ever help anyone else.

Your suggestion about the CONFIG.FPW pretty much gave me the idea of removing everything from the project except the basic WWC files and seeing if I could get it to start up in COM from a VFP command. And lo and behold, it did. So then it was a process of re-adding the components (VCX, WWC Scripts etc.) one at a time until it stopped working again. It was a PITA but eventually I tracked it down to 3 very old .FXP files that were being compiled into the project and had no accompanying .PRG files. These FXPs were from 2004 and were no longer used in any way. Removing these files from the project stopped the issue.
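For anyone hunting for the same kind of leftover, here is a minimal sketch that lists compiled .FXP files with no matching .PRG next to them. It assumes it is run from the VFP command window with the default directory set to the project's source folder. It only looks at that one folder and does not read the project file itself, so treat the output as a list of candidates to check against the project.

```foxpro
* List .FXP files in the current folder that have no .PRG beside them.
* Assumption: the current directory is the project's source folder.
LOCAL lnCount, lnI, lcFxp, lcPrg
LOCAL ARRAY laFiles[1]

lnCount = ADIR(laFiles, "*.fxp")     && returns the number of matching files
FOR lnI = 1 TO lnCount
   lcFxp = laFiles[lnI, 1]           && column 1 holds the file name
   lcPrg = FORCEEXT(lcFxp, "prg")    && same name with a .prg extension
   IF !FILE(lcPrg)
      ? "FXP without PRG: " + lcFxp
   ENDIF
ENDFOR
```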

What I think was happening is that when the servers started up, VFP went looking for the .PRG files that accompany these old FXP files (no idea why), probably using the network paths where the files originally lived, which is on our network. The client probably increased some network timeout settings or changed something related to them. This caused the servers to take longer than 2 minutes to start up properly, which exceeded the 2 minute COM timeout (a COM server must respond within 2 minutes or it is summarily killed off).
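One rough way to tell a slow start from an immediate failure is to time the CREATEOBJECT() call. This is only a sketch using the ProgId from this thread; TRY/CATCH needs VFP 8 or later.

```foxpro
* Time how long the COM server takes to load (or to fail).
LOCAL lnStart, loServer, loErr

lnStart = SECONDS()
TRY
   loServer = CREATEOBJECT("platoatlas.platoatlasserver")
   ? "Loaded in", SECONDS() - lnStart, "seconds"
CATCH TO loErr
   * A failure after roughly 120 seconds points at the COM startup timeout
   ? "Failed after", SECONDS() - lnStart, "seconds"
   ? loErr.ErrorNo, loErr.Message
ENDTRY
```

A failure that arrives almost instantly would instead suggest a registration or permissions problem.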

So in essence the issue wasn't with the DCOM config, permissions or our application start up code. It was with some issue with the files in the project.

Longest two weeks of my life!

But thanks again for all your help on this! Really really appreciate it - I owe you both a beer or 10.

Thanks
Dan


Hi Dan,

The Init() is the FoxPro startup point for a COM server. There's nothing else that fires in terms of code before that (AFAIK). The only other thing that can be happening is that a config.fpw might have some sort of startup directive or path or something. You might want to check if there's a compiled in config.fpw and what's in that file.
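As a quick check, a running VFP session can report which external configuration file it picked up. This sketch only covers the session it runs in; a config.fpw compiled into the EXE still has to be checked in the project itself.

```foxpro
* Show the external configuration file this VFP session is using.
LOCAL lcConfig

lcConfig = SYS(2019)      && empty string if no configuration file was found
IF EMPTY(lcConfig)
   ? "No external configuration file in use"
ELSE
   ? "Config file: " + lcConfig
   ? FILETOSTR(lcConfig)  && dump its contents to look for paths or startup commands
ENDIF
```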

Other than that I honestly don't know.

This is definitely a problem with the server itself since the other server (wcDemo.wcDemoServer) seems to run including the Web Connection startup code. If you remove the Init() code then there is nothing that can really fail.

And you are 100% sure you're calling the right server, right? There isn't another COM object with the same name in the project perhaps? Or you got another COM object with similar name and the name is wrong?

I know this sounds silly especially since the server worked before, but I just don't see how this couldn't work - UNLESS the computer level DCOM permissions (Access rights in particular) by default disable access rights. However, I think you'd see a different result. It looks to me that the server is hanging during initialization.

What happens with this server on your local machine when you call it via COM? You get a different result there?

+++ Rick ---


Hi Rick,

Thanks again for all your suggestions on this, but I'm still trying with no luck!

I've tried the following:

1. Built a version of my DCOM server with the INIT method as per your suggestion below. Same issue as before.
2. Completely cleared out all the DCOM registry entries relating to this server (/unregserver first followed by regedit). Then reregistered the server. Same issue as before.
3. Built a version of the DCOM server after regenerating component ID's then registered this new server. Same issue as before.

I'm loading all of these servers via executing the command:

loServer=CREATEOBJECT("platoatlas.platoatlasserver")

When I execute the command the VFP command window stops responding. Clicking the command window gives the message "This action cannot be completed because the other program is busy...." message. The PLATOAtlas.exe server appears in the Windows Task Manager running under the account specified in the DCOM entry. It then terminates after 2 minutes and an entry in the Windows System Event Log appears:

The Server {Prog Id} did not register with DCOM within the required timeout.

And the VFP window shows an error:

OLE error code 0x80080005: Server execution failed.

I did build a wwDemo.exe DCOM server, installed it onto the same server and set it up with the very same DCOM permissions and it works perfectly!

Is there any other code that fires in the DCOM before the wwc_server.INIT()?

Thanks
Dan


Dan,

One problem with the code you have is that the server might be instantiated with a parameter. Make sure your Init method has a parameter:

FUNCTION Init(llComServer)
...
ENDFUNC

In fact I would create the server with no code in this method at all and no DODEFAULT(), and make sure the server instantiates inherited off of WWC_WWSERVER. This will just ensure that the DCOM permissions are working. You should be able to instantiate and have a reference at this point. Nothing else will work because the INIT() does all sorts of setup for Web Connection, but at least this will verify that the server can be launched and you can access properties:

loServer = CREATEOBJECT("YourApp.YourAppServer")
? loServer.lDebugMode

If this doesn't work with an EMPTY INIT method then you'll know there's some sort of problem with the COM settings/configuration.

If this does work though, it means something is going wrong in the server's load code. At that point you can start adding log functionality and a few other things back in.

Again, if it doesn't work I'd be very inclined to double check the registry to make sure the registration points at the right file. Also, when you register the server make sure you do a YourApp.exe /regServer again and *MAKE SURE YOU DO THIS WITH AN ADMIN ACCOUNT*. /RegServer fails silently, so if it doesn't work you don't know that it might not have updated the registration. You can also run into problems if the typelibrary registered gets out of sync with what's in the registry (ie. the typelibrary ID doesn't match the actually compiled in typelibrary ID). You can check this by looking at the VBR file and then checking what the registry has for the ProgId/TypelibId.
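A minimal sketch of that registry check from the VFP command window, assuming Windows Script Host is available on the machine. It uses the ProgId from this thread; substitute your own. RegRead with a trailing backslash returns the key's default value and raises an error if the key is missing.

```foxpro
* Look up which EXE a ProgId is registered to launch.
LOCAL loShell, lcProgId, lcClsId, lcExe, loErr

lcProgId = "platoatlas.platoatlasserver"
loShell  = CREATEOBJECT("WScript.Shell")
TRY
   * ProgId -> ClassId
   lcClsId = loShell.RegRead("HKCR\" + lcProgId + "\CLSID\")
   * ClassId -> path of the out-of-process server
   lcExe   = loShell.RegRead("HKCR\CLSID\" + lcClsId + "\LocalServer32\")
   ? "ClassId: " + lcClsId
   ? "Server : " + lcExe
CATCH TO loErr
   ? "Registration lookup failed: " + loErr.Message
ENDTRY
```

If the path shown is not the EXE you just built and registered, the registration is stale.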

I know all of this is a pain, but it really sounds like the actual COM registration and your COM server are out of sync since you can get the demo server to launch.

As a last resort as Brett suggests, blow away your project (or rename it and clear regenerate ClassIds) which creates a whole new COM registration with new ProgIds. You'll have to change the entries in wc.ini to match the new ProgId but otherwise

+++ Rick ---



And what you are describing, where you can get wwDemo to work but yours still won't, would make sense if you had bogus server entries.

I wouldn't recommend it but if you were really at wit's end you could always change the information on the Servers tab of the Project Information page and then choose to "Regenerate Component Ids" on the Build menu. A better option would be to clean out all the entries and start from scratch (like in the docs) because if you change the ProgID you're going to have to completely re-register the COM servers again. Still, at least you'd know.


Hi Rick,

<b>Ensure COM Server Works Altogether on the Server</b>
I tried invoking the DCOM server via a VFP Shell via:

loServer=CREATEOBJECT("platoatlas.platoatlasserver")

However, VFP stops responding at this point for 2 minutes. During this time the PLATOAtlas.exe (DCOM server) can be seen in the Task Manager running under the user account specified in DCOM. After about 2 minutes the following error is shown in VFP:

<b>OLE error code 0x80080005: Server execution failed</b>

I built a version of wwdemo.exe and uploaded that to this test server, registered it with DCOM, set up the identity and DCOM permissions the same way as our DCOM server, and tried loading it with:

loServer=CREATEOBJECT("wwdemo.wcDemoServer")

This created the Demo server correctly and I was able to interact with it just fine.

<b>Check for hangs in your Startup Code</b>

This would seem to indicate a problem with my start up code somewhere. But to check that the wwserver.init method was actually firing I added logging to it (in my platoatlasmain.prg):

<code lang="vfp">
DEFINE CLASS PLATOAtlasServer AS WWC_SERVER OLEPUBLIC

   FUNCTION INIT()
      DO wwUtils.prg
      LogString("WWSERVER::INIT() Fired")
      DODEFAULT()
   ENDFUNC

ENDDEFINE
</code>

When running in File Mode the string is written to the log file. When running in DCOM mode (or created in a VFP Shell) on this server the log entry is _not_ written (nor the file created).

I would have expected this to fire before anything else (any of my startup code) - or is that not correct? I'd really like to verify that some/any VFP code is actually executing in the server when running as DCOM. If I can confirm that at least some VFP code is being run in the server then I can begin adding logging functions. But if VFP code is never even being called then adding logging functions won't get me far!

So although we're definitely making progress tracking down the issue it's still a bit of a mystery!

Thanks
Dan


Dan,

Brett's recommendations are good.

A couple more things that might help.

Ensure COM Server Works Altogether on the Server
Make sure the COM server can be invoked on that machine at all. Try calling it from VFP or VB script or Powershell script to make sure the server can be invoked at all on that machine.

loServer = CREATEOBJECT("wcDemo.wcDemoServer")
*** One of the two below to fire a method
? loServer.ProcessHit("QUERY_STRING=wwDemo~TestPage")
? loServer.ProcessHit("PHYSICAL_PATH=c:\westwind\wconnect\testpage.wwd")

If this produces a response you know the server is OK and the issue is one of permissions.

Set DCOM to use Caller's Permissions and inherit Application Pool Identity
To be sure I would configure DCOM impersonation to a local machine account first. Try SYSTEM (which should almost always work unless you have resources that live on other machines) or a local machine admin account that has full rights to everything. If possible call a request that is a 'do nothing' 'helloworld' type of request to minimize startup issues that might be crashing the server due to resource issues.

Another option: The DCOM servers inherit permissions from the Application Pool if you set the DCOM permissions to 'The launching user.'. Typically this will be SYSTEM or whatever the AppPool is configured for. By leaving it at 'the launching user' and setting the permissions of the Application Pool in IIS you are removing the requirement to configure DCOM altogether.

When you say the server starts but hangs, what account does it show up under in Task Manager/Process Explorer? Is it the right account? This still doesn't solve it all, since Launch and Access permissions are assigned separately in DCOM config.

Check for hangs in your Startup Code
If the COM object loads, but apparently hangs before a request fires it's quite possible that your startup code (OnLoad/OnInit) has code in it that is failing. One thing to try to see if that's the problem is remove all your startup code and put it into separate methods. Remove the calls to those methods and then hit a simple HelloWorld request.

Does that work? If it does, then there's a problem in the startup code that might be the result of missing permissions of some sort.


Logging Startup Code
Finally, if all else fails, try logging in the server startup code. Put calls to LogString() (from wwUtils) into the server's initialization methods, one at the top and one at the end of each: OnInit(), OnLoad() and Process(). This should tell you whether these methods are getting fired at all.
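If LogString() itself is in doubt (for example because wwUtils has not been loaded yet), a bare-bones logger built only on native VFP functions can stand in. This is just a sketch: the function name StartupLog and the path c:\temp\startup.log are made up for illustration, so point it at a folder the DCOM account can write to.

```foxpro
* Hypothetical helper: appends a timestamped line to a log file
* using only native VFP functions (no Web Connection libraries).
FUNCTION StartupLog(lcMessage)
   LOCAL lcFile
   lcFile = "c:\temp\startup.log"   && assumed path, change as needed
   * Third parameter 1 = append to the file instead of overwriting it
   STRTOFILE(TTOC(DATETIME()) + " " + lcMessage + CHR(13) + CHR(10), lcFile, 1)
ENDFUNC
```

Calls such as StartupLog("OnInit start") and StartupLog("OnInit end") at the top and bottom of each method then show how far startup gets.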

Make sure that the location you write the logging file to has permissions to allow the app to write to it. Allow the specific account that you have DCOM/ApplicationPool configured and just to be sure add Everyone.


You'll want to dial back the permissions once you have it running.

If you want I can also take a look at your server and see if I can find anything that doesn't look right. You can contact me here: http://west-wind.com/contact.aspx.

Finally, I know you have 4.68 servers running. There have been a lot of improvements in 5.x in terms of making the security environment a bit easier. As Brett mentioned the managed module handler actually makes a lot of things easier and provides better error reporting among other things.

Not sure if DCOM issues will necessarily be any better, but it might be a good idea going forward to look into moving to version 5.0 just to keep up with the latest versions and fixes... the change over is not 100% transparent, but the changes are isolated to a few system related required updates that can be changed globally in a few key places.

+++ Rick ---



Hi Brett,

Thank you very much - that's awesome advice. I haven't tried the HTTP Handler Module - is it a feature of WWC 5.0? We're only running WWC 4.68 at this client. However I'd love to do away with DCOM - when it works it's great but when it goes bad it's usually a complete mystery! So I think I'll look into this HTTP Handler.

We are fairly certain the fault lies in DCOM. We even got a brand new server that has never had our WWC servers installed and installed the system from scratch. It exhibited the exact same problems. We are thinking there is something wrong with a Windows Security Policy or the Windows Account Security, as that appears to be the only thing connecting all 4 servers. It is of course possible that some MS patch or other OS software is causing the problem but that's beyond our expertise. The client has pretty much now brought in their top people to try and track this down (with network sniffers and other such tools). File Mode is our only saving grace here (the servers run flawlessly in File Mode).

So fingers crossed the super technical guys can spot something we're missing!

Thanks again
Dan


Dan, I've been using WWC for about 10 years and I've also encountered some odd DCOM problems. My normal procedure for troubleshooting is similar to what you've already done but just in case it might throw up some light bulbs, I'll mention it.

First, before doing anything, I put the servers in file mode and make sure that works. You'd be surprised the number of times just doing that has highlighted an issue I was sure wasn't a problem.

Assuming file mode works fine, my next step is somewhat new but it has proven invaluable. That is, I use the new Http Handler Module (I've actually moved to this permanently based on my problems with DCOM).

Assuming the Http Handler Module works, that means we definitely do have a DCOM issue (even though I might "know" this based on the error, I still follow the process).

Next, I double check all my identities and locations making sure my .ini files line up with each other.

Next I do what you said you've done and unreg / re-reg the server, remove all DCOM entries and create them from scratch.

Then, because it's got me SO MANY times, I make sure to IISRESET, _then_ I see if everything starts working.

It sounds like you've covered all this so far but are still having problems. My one final tip to you is to at least try the Http Handler Module. It's made my life so much easier.


Hi Rick,

I've checked that the AppPool is running under Local System (the highest account). This is running on a W2K3 Server SP2.

The WC.DLL is set to Allowed in IIS Web Service Extensions.

Going to the ShowStatus page the Current Login is SYSTEM.

We've unregistered the DCOM servers and re-registered them. We've even deleted the DCOM registry entries for the server (didn't like doing this) and re-registered them.

In DCOM mode the servers start up under the correct account (as per the DCOM identity page) - we can see them in task manager. They sit there using about 6MB memory. They ignore all commands to End Process (e.g. they don't terminate). After about 2-3 minutes the servers seem to end on their own and then the "An internal exception occurred in the call to LoadServers (COM)" browser message appears - although sometimes the message is "An internal exception occurred in the ISAPI application".

We've disabled virus scanners on the servers. The client says that they have changed nothing on these servers and no patches or updates have been installed.

We know something must have changed (three independent servers at the same facility don't all of a sudden have their DCOM fail at the same time) but we're at a loss as to what (and the client doesn't seem to know).

I've been using WWC for many years and although I've run into the occasional glitch with DCOM, I've never encountered something like this. Usually re-registering the servers, reconfiguring DCOM and restarting IIS does the trick. But this time we've had 4 people working on the issue for 3.5 days now and just can't figure it out! I just hope it's not some obscure MS patch that somehow breaks DCOM with VFP/WWC because we've got lots of other clients with this same setup and would hate to think they're one patch away from this.

Any insight you've got here would be greatly appreciated!

Thanks
Dan

Hmmm... this is an unhandled exception that didn't throw back COM error codes in that case, which is odd. LoadServers definitely should only fail if there's a COM error.

Can you make sure that you're running the right module (ie. either ISAPI or .NET Module) and that it's configured properly for the COM server (ie. has right prog ids).

Also make sure the IIS AppPool is running with a high access account and that whatever account that is hasn't changed either. If you go to:

wc.wc?_maintain~ShowStatus (or use whatever scriptmap is configured)

you should see all the actual logon account information that the app is running under. If necessary switch to file mode first if there are problems making it this far if the app gets hit in the meantime.

+++ Rick ---



Correction - the message in the browser is:

An internal exception occurred in the call to LoadServers (COM)


Hi Rick,

Thanks for your response. Our first thoughts were also the AD account changing in some way. So we got a new account created and reset the DCOM security with it - to no avail. I haven't tried setting the default access permissions yet but I'll do that now.

The browser error message that is being shown (I've never seen this before):

Web Connection Error
An internal exception in the call to LoadServers (COM)

Thanks
Dan


Hi Dan,

It sounds like the permissions on the AD account might have changed. If there was a password change or anything else about that account has changed you'll have to re-apply the DCOM Impersonation to make sure the account is still linked.

Other than that it sounds like the Launch permissions are working, but the Access permissions are not. So check your global DCOM settings and make sure that the account in question has rights to access DCOM components (It's in the Computer level COM+ Properties I believe).

To rule out problems it might be useful to launch the servers with SYSTEM or a local ADMIN account rights to see if permissions indeed are the problem. That should tell you right away...

What does the error message say in the browser when this fails? There should be a COM Error code plus a message that shows up in the browser if it's a DCOM load error.

+++ Rick ---



Hi Rick,

We have a client who have 3 servers - each running the same WWC 4.68 intranet servers (on 32-bit W2K3 Servers). There are two live servers and one test server.

About 2 days ago all 3 DCOM servers stopped loading properly or responding. The client _claims_ that no changes have been made to any of the 3 systems or the related databases. The WWC DCOM servers have been operating perfectly there for several months.

The DCOM settings have been completely reset on all 3 servers. The DCOM servers have been /unregistered and then /registered. A new AD account was also created and assigned to the DCOM identity. The Security of the DCOM servers has been set to Everyone has Local+Remote Launch and Everyone has Local+Remote Access permissions.

The servers operate _perfectly_ in File Mode when running under the same AD account that is assigned to the DCOM. In COM mode the servers do appear to start (they show up in Task Manager) - but they have a much smaller than expected memory footprint and do not respond and cannot be terminated using End Process. These unresponsive DCOM servers seem to shut automatically after a while but this appears to be because wc.dll is terminating them because they are timing out.

IIS has been restarted and the servers have been fully rebooted. The servers are 3 separate machines. Currently we have the Live system operating off a single machine running multiple servers in File Mode. The 2nd live machine is available to us for testing/checking.

These exact same servers and setups are in use at about 7 other sites without this issue.

The wcErrors.txt shows:

2012-11-20 07:39:13:888 Web Connection Request timed out. - ?atlas~start - 0
2012-11-20 08:32:28:836 Web Connection Request timed out. - ?atlas~start - 2
2012-11-20 08:32:28:836 Web Connection Request timed out. - ?atlas~start - 0
2012-11-20 08:39:02:138 An exception occurred in WC.DLL: _maintain~Load
Unloading all servers and reloading... - ?_maintain~Load - 0
2012-11-20 09:45:52:488 Exception in Loading Servers (COM) - 1008
2012-11-20 09:49:12:069 An exception occurred in WC.DLL: atlas~worksheetsavedoc~899219
Unloading all servers and reloading... - ?atlas~worksheetsavedoc~899219 - 0
2012-11-20 09:53:13:072 Exception in Loading Servers (COM) - 1008
2012-11-20 09:55:49:121 An exception occurred in WC.DLL: atlas~worksheetsavedoc~899219
Unloading all servers and reloading... - ?atlas~worksheetsavedoc~899219 - 0
2012-11-20 09:57:49:123 An exception occurred in WC.DLL: atlas~ps~platomain
Unloading all servers and reloading... - ?atlas~ps~platomain - 0
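To get a quick feel for how often each failure shows up in a longer log, the entries can be tallied from VFP. A small sketch, assuming wcErrors.txt sits in the current directory and that the message texts match the lines quoted above:

```foxpro
* Count request timeouts and COM load failures in wcErrors.txt.
LOCAL lnLines, lnI, lnTimeouts, lnLoadErrors
LOCAL ARRAY laLines[1]

lnLines = ALINES(laLines, FILETOSTR("wcErrors.txt"))  && one array row per line
STORE 0 TO lnTimeouts, lnLoadErrors
FOR lnI = 1 TO lnLines
   DO CASE
   CASE "Request timed out" $ laLines[lnI]
      lnTimeouts = lnTimeouts + 1
   CASE "Exception in Loading Servers (COM)" $ laLines[lnI]
      lnLoadErrors = lnLoadErrors + 1
   ENDCASE
ENDFOR
? "Timeouts:", lnTimeouts
? "COM load failures:", lnLoadErrors
```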

We've exhausted all our expertise on this. Do you have any thoughts on this, and/or would you be available to assist the client at whatever your normal rates are (assuming you are available to do this)?

Thanks
Dan






Hi Brett,

Thanks again for all your suggestions!

We had actually assumed the issue could be broken DCOM registration so I actually did completely blow away the DCOM registration (/unregserver followed by manually purging the Prog Ids from the registry). But this didn't fix it this time. I've run into this sort of thing before and purging (via the registry) and reregistering the DCOM usually does the trick.

It's also odd that this issue has occurred on 3 completely separate servers at the same facility all at the same time. A 4th server at the facility was built and, after installing the server from scratch, it exhibits the exact same problem. So we've got 4 distinct servers all doing the same thing.

The actual server executables haven't changed since July 2012, so it couldn't be a new version etc. In fact, we can't see any changes that could have suddenly made all the servers stop like this. A broken database or bad supporting file should show up in File Mode, or at least the servers should have started up enough to write a simple log entry.

A real head scratcher to be sure!

Thanks
Dan