Question : Weird multi-user Access performance problem

(I'm sorry this post is so long, but I wanted to be sure to mention the sorts of things that get asked in follow-up posts, so as not to waste anyone's time)

I have a perplexing performance problem in a multi-user Access application that has me stumped.  I have done exhaustive research on this topic, and I believe all the usual "best practices" from the many previous posts on this topic (especially Tony's tips at granite.ab.ca) were already implemented and have been double-checked recently.  I've been developing and supporting multi-user Access applications for many years, and I have never seen anything like this.

Here's the setup:

An unsecured Access 2000 MDE (split FE/BE), developed in Access 2000 and deployed as an Access 2000 MDE on each client PC; it is decompiled, compacted and repaired prior to each release.  The app maintains a persistent connection the BE.  Name autocorrect is off, record-level locking is off, default open mode is shared.  In modules and classes, ADO is used extensively, DAO 3.6 only rarely.  All ADO connections use the CurrentProject.Connection object.  The data structure is highly normalized with approximately 65 tables all participating in RI.  Subdatasheets are off in the BE.  The application is well-indexed, and is not a heavy data entry application (probably less than 100 records a day) - mostly users browse data, make periodic edits and inserts, and run reports.  All forms are bound, but none are linked directly to the full table - every form opens only after the user has requested a specific record or small set of records.  

This app is currently installed at 13 different organizations, most of whom use XP Pro and Access 2000 or Access 2003, with the BE on a LAN shared drive.  Most sites have between 5 and15 simultaneous users. All sites except ONE have perfectly acceptable performance running the app all day long, and have had for years.  

The problem is that at only one client organization, the application launches and runs with acceptable performance in a multi-user setting for between 10 and 30 minutes, after which point the performance becomes "unacceptable" for at least one user, and eventually for all users, until they exit and re-launch the app.  ("Fast" is seeing the primary forms opening in 1-3 seconds, and "slow" is between 6 and 10 seconds.)

The "problem" site has 4 users with new PCs with 1 gig RAM, all running Windows XP Pro SP2 with Office 2003 11.6566.8122 SP 2, msjet40.dll version 4.0.8618.0.  The data file is on a shared drive on a DL 370 server with 2 gig of RAM.  The data file is named in 8.3 format (QB891176) and is located just one level down the directory tree (e.g. R\AppName\AppData.mdb).  All users have R/W/D permission on that folder.  The IT staff at the client site has added the registry key for "shared file access delay" (QB150384).  We are not seeing any data corruption, nor are we getting any locking conflicts.  IT staff has said that no other major processes are running on this server.

To check for any hidden corruption in the data file, we created a new empty database, imported all the tables, and re-linked.  The BE data file is approximately 280 meg and is regularly compacted and repaired.  This client's data file is larger than any other organization using the app, but not by that much.

When multiple user are running the app and one has slow performance, the other users are not instantly affected; it may be another 10 or 15 minutes before any of the other users experience the slowdown.  When users exit Access and restart, they have acceptable performance again, until the next slowdown occurs.  We saw a user regain performance by just exiting the MDE and staying in Access, then re-opening the MDE, but we have not extensively tested this scenario.  We have not asked users to stay in the “slow” mode indefinitely to see if they spontaneously regain performance.

This organization has a large physical plant with an extensive network.  In order to rule out network infrastructure as a factor, the IT staff created a discrete LAN by moving the current server to the office where the users are located and hooking them up to the server directly with a switch, but we still saw the same performance drag after 10-15 minutes.

I am completely stumped - if there is a serious problem in how my application is designed or coded , I know I would be hearing complaints from other organizations, but we've checked, and the other 12 sites are seeing acceptable performance, so logically, it seems like it must be something site-specific, or a problem in my app that is only apparent here because of some site-specific setting or condition.  

The fact that performance goes back up (temporarily) after users exit makes me suspect some sort of resource is being used up and then released, but how can I determine what it would be specific to this site?  Access memory usage in Task Manager starts out at about 30,000k and is at only about 50,000k when the user experiences the slowdown, so it doesn't seem like it is specifically Access that is using up resources.  I saw some posts in 2002 about a memory leak with Jet, but it looked like it had been fixed in recent SPs.

I believe that every ADO and DAO object that I open is closed and set to nothing, of course it is possible that somewhere in the many, many lines of code I've missed a few, but wouldn't I be seeing the same performance problem at other sites?  

I don't think it is related the absolute size of the BE because when the app is fast, it's very fast.  We also transferred a scrambled version to another client site and ran it there and performance for two users was fine (although we couldn't really take up their time to do extensive testing).  I'm going to take it to a colleague's office and try testing on his server (I get very good performance using their data on my peer-to-peer setup).

We are not seeing any corruption, so I haven't asked the IT people to mess around with OpsLocks, but maybe I should try that next?

This client is running enterprise-wide Symantec real-time virus scan - could this be a factor?  Does anyone know what sort of settings I should be looking at or directing the IT people to look at?

I have been collecting user-specific data for 2 days (my central error handler has been writing time in and out of every function) and I will be receiving those files Monday for analysis, but I'm not sure what I should be looking for.

A million thanks in advance for any advice.

- Ann

Answer : Weird multi-user Access performance problem

Hi Ann,
That must be the most intelligent and concise question I've seen here. And after all of that, I hesitate to help you "guess" at what might be the issue. But it seems suggestions is what you're after here...so
This Q talks about how the Symantec product causes slow downs. And of course might not be direcly affecting your app, but it's something else to know about.

http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/2003_Server/Q_21994799.html

From a troubleshooting standpoint, you've narrowed it down to environment in my opinion. Not network, not app apparently (Otherwise it would occur elsewhere) You say the server isn't running much of anything else, and God forbid if it's a shared exchange server (not mentioned)...so since there is only one shared resource that affects all users, that means the server.

That been said...and I believe you came to the same conclusion...something on the server, beit settings, storage, IO, Memory or Storate....or Antivirus is affecting your well oiled and quick fast application.

Just my opinion.
J
Random Solutions  
 
programming4us programming4us