Hello,
I am running into the following problem:
I have a Tomcat application server which I am trying to test under load.
The application is a portal, mostly with organizational news, blogs etc. It also has a Content Management System (CMS).
The application server portal requires Windows AD authentication; the CMS does not.
The web portal is accessed by 126 running Vusers. I tweaked the HTTP/HTML script to perform AD authentication, and it is working. There are 126 unique AD accounts, each running as an individual Vuser. I have set the runtime settings to "simulate a new user each iteration" and "clear cache each iteration". Each iteration takes on average 45 seconds to complete.
The problem now is that the response times measured by Performance Center (version 11.52 patch 1) are way higher than expected, and also a lot higher than what I measure when I visit the webpage manually (during load, so while the test is running). For instance, for the homepage PC reports an average load time of 9 seconds, while visiting the same page manually a number of times shows that it certainly does not take more than 1 second. I think I discovered what causes the mismatch:
While running the test I noticed that the CPU load per LG is actually quite high: 16%, while there are on average only 64 Vusers running per LG (I have 2 LGs operational during this test). Of these 64 Vusers, 63 are running the HTTP/HTML script which I described above and 1 Vuser is running a TruClient FF script which tests the backend content management system of the same application server. The TruClient script does NOT use AD authentication.
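To put a rough number on how often each LG has to authenticate, here is a small Python sketch based on the figures above. It assumes the Vusers run iterations back to back without pacing, and that every iteration triggers at least one fresh AD logon because of the "simulate a new user each iteration" setting. The `logons_per_iteration` parameter is hypothetical; I have not measured how many handshakes one iteration really causes.

```python
# Figures from the test description
VUSERS_PER_LG = 63          # HTTP/HTML Vusers per load generator
ITERATION_SECONDS = 45.0    # average duration of one iteration


def logons_per_second(vusers: int, iteration_seconds: float,
                      logons_per_iteration: float = 1.0) -> float:
    """Estimate AD logons per second on one LG.

    logons_per_iteration is a hypothetical value: each new connection may
    need its own handshake, so the real number could be higher than 1.
    """
    return vusers / iteration_seconds * logons_per_iteration


rate = logons_per_second(VUSERS_PER_LG, ITERATION_SECONDS)
print(f"about {rate:.1f} fresh logons per second per LG (lower bound)")
```

With one logon per iteration this is a lower bound; if every connection authenticates separately, the real rate is a multiple of it.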
When I looked at the Performance tab of the Windows Task Manager, I noticed that the Windows lsass.exe process (Local Security Authority process) is responsible for 12-13% of the total CPU load. Taking into account that the LGs are 8-core machines, where one core corresponds to 12.5% of the total, this probably means that lsass.exe completely utilizes one whole CPU core. This seems like a bad thing to me.
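The arithmetic behind that "one whole core" estimate can be checked with a short Python sketch. It assumes Task Manager reports a process's CPU usage as a percentage of all cores combined, which is my reading of the numbers, not something I have verified for this setup.

```python
# Assumption: Task Manager shows process CPU as a share of ALL cores combined.
CORES = 8  # cores per load generator, from the description above


def cores_used(percent_of_total: float, cores: int = CORES) -> float:
    """Convert a share of total CPU (in percent) into an equivalent number of cores."""
    return percent_of_total / 100.0 * cores


# Share of total CPU that a single fully busy core represents
one_core_share = 100.0 / CORES

print(f"one full core = {one_core_share:.1f}% of total CPU")
for pct in (12.0, 13.0):
    print(f"lsass.exe at {pct:.0f}% -> {cores_used(pct):.2f} cores")
```

A value that sits so close to exactly one core suggests the work is single-threaded and capped by that core, rather than simply being "12% busy".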
I think this is what is causing the high response times: the excessive CPU load of lsass.exe on the LG affects the measurement of the response times.
I executed another test: I reduced the number of AD user accounts to just one, and that single account is reused by every Vuser on each iteration. I left the runtime settings the same as described above. The application server itself is fine with this and has no problem if the same AD user accesses the server again and again. What I see on the LG now is an average CPU load of 8%, which is way lower than in the previous test. The lsass.exe process is on average responsible for 3% of the total CPU load on the LG. The measured response time for the homepage is now on average 0.7 seconds, and this matches my experience when I open the homepage manually in a browser.
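Putting the two runs side by side, as a Python sketch using only the figures quoted above. The value 12.5 is my midpoint of the observed 12-13% range, and subtracting the lsass.exe share from the LG total is only approximate, since both are averages read off at different moments.

```python
# Measurements from the two test runs (CPU in % of total LG CPU, time in seconds)
runs = {
    "126 unique AD accounts": {"lg_cpu": 16.0, "lsass_cpu": 12.5, "homepage_s": 9.0},
    "1 shared AD account": {"lg_cpu": 8.0, "lsass_cpu": 3.0, "homepage_s": 0.7},
}

# CPU left over for everything except lsass.exe (approximate, see note above)
other_cpu = {name: r["lg_cpu"] - r["lsass_cpu"] for name, r in runs.items()}

for name, r in runs.items():
    print(f"{name}: LG {r['lg_cpu']:.1f}%, lsass {r['lsass_cpu']:.1f}%, "
          f"other {other_cpu[name]:.1f}%, homepage {r['homepage_s']:.1f} s")

first, second = runs["126 unique AD accounts"], runs["1 shared AD account"]
lsass_ratio = first["lsass_cpu"] / second["lsass_cpu"]
response_ratio = first["homepage_s"] / second["homepage_s"]
print(f"lsass.exe load ratio: {lsass_ratio:.1f}x, response time ratio: {response_ratio:.1f}x")
```

The CPU outside lsass.exe stays in the same low range in both runs, so almost the whole difference in LG load comes from lsass.exe, while the response time changes by a much larger factor than the lsass.exe load does.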
So my questions:
1. Am I right in concluding that the high load of the lsass.exe process on the LG negatively affects the response time measurements? Or is the cause something else?
2. What can be done about this? Using more LGs to spread the load?
I hope somebody can help me with this.
Regards,
Peet