java - Understanding high system CPU usage on web server on Windows


Note: I have a couple of close votes on this post. I'm not sure why, since no one has commented. This is a question about how our custom-written Java server and its calls to socket reads can interact with the Windows OS and produce high system CPU usage, so I believe this is the correct forum for the question rather than the sysadmin area.

I have a custom Java server serving TCP connections of a couple of different types (some web, some raw), and one of our customers is having issues with occasional periods of high CPU + high system CPU.

We may have reproduced this on our test server running on Windows Server 2008 on VMware.

During failover to one server, or during other network events that cause a number of clients (40-80) to re-establish connections simultaneously, we see high CPU usage including high system CPU (>50% on average, with spikes of around 70%). These clients typically maintain TCP sockets to the server, so during failover we see perhaps 80-160 sockets being reconnected and serviced.

We have used kernrate to profile during one of these periods and are having a hard time understanding the results.

Kernrate reports low idle time and high kernel time, as we would expect during this period:

    P0     K 0:00:39.405 (55.9%)  U 0:00:27.424 (38.9%)  I 0:00:03.603 ( 5.1%)  DPC 0:00:01.528 ( 2.2%)  Interrupt 0:00:00.327 ( 0.5%)
           Interrupts= 114880, Interrupt Rate= 1631/sec.

    P1     K 0:00:37.596 (53.4%)  U 0:00:29.281 (41.6%)  I 0:00:03.556 ( 5.0%)  DPC 0:00:00.078 ( 0.1%)  Interrupt 0:00:00.624 ( 0.9%)
           Interrupts= 96016, Interrupt Rate= 1363/sec.

    TOTAL  K 0:01:17.002 (54.7%)  U 0:00:56.706 (40.3%)  I 0:00:07.160 ( 5.1%)  DPC 0:00:01.606 ( 1.1%)  Interrupt 0:00:00.951 ( 0.7%)
           Total Interrupts= 210896, Total Interrupt Rate= 2994/sec.

    Total Profile Time = 70434 msec

There is plenty of physical RAM available on both the host and the guest, and perfmon reports page faults of 0 to 200/sec, with maybe a couple of spikes of 800/sec during the 70s period. The Java heap is set to 3GB, and we don't come close to the edge of it during the problem period (or at all). I don't believe we are doing much garbage collection during this time.

                                   BytesStart          BytesStop         BytesDiff.
    Available Physical Memory   ,      6296776704,      6266650624,       -30126080
    Available Pagefile(s)       ,     12591370240,     12577845248,       -13524992
    Available Virtual           ,   8796052869120,   8796048171008,        -4698112
    Available Extended Virtual  ,               0,               0,               0
    Committed Memory Bytes      ,       305979392,       303964160,        -2015232
    Non Paged Pool Usage Bytes  ,        81997824,        82333696,          335872
    Paged Pool Usage Bytes      ,       164028416,       164057088,           28672
    Paged Pool Available Bytes  ,      4147101696,      4149116928,         2015232
    Free System PTEs            ,        33556099,        33556099,               0
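The belief that little garbage collection happens during the problem period can be checked from inside the JVM rather than inferred from heap size. The following is a minimal sketch using the standard `java.lang.management` API; the class name `GcSnapshot` and the one-second sampling window are illustrative assumptions, not part of the server described here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {
    /** Returns {total collections, total collection time in ms} summed over all collectors. */
    static long[] snapshot() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] { count, timeMs };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] before = snapshot();
        Thread.sleep(1000); // hypothetical window; in practice, span the problem period
        long[] after = snapshot();
        System.out.printf("GC runs: %d, GC time: %d ms%n",
                after[0] - before[0], after[1] - before[1]);
    }
}
```

Taking one snapshot when the reconnect burst starts and another when it ends gives the GC time spent inside the same window that kernrate profiled.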

We have made efforts to consolidate I/O calls and increase the average bytes per I/O operation, though the figures don't seem unreasonable:

                                  Total      Avg. Rate
    Context Switches     ,       462496,         6566/sec.
    System Calls         ,       835593,         11863/sec.
    Page Faults          ,        32814,         466/sec.
    I/O Read Operations  ,          747,         11/sec.
    I/O Write Operations ,         3792,         54/sec.
    I/O Other Operations ,        27565,         391/sec.
    I/O Read Bytes       ,       382146,         512/ I/O
    I/O Write Bytes      ,       684128,         180/ I/O
    I/O Other Bytes      ,       890365,         32/ I/O
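As a sanity check on these figures, the rates can be re-derived from the totals and the 70434 msec profile time. This sketch assumes kernrate rounds to the nearest integer (that assumption reproduces the per-row values above); the class and method names are made up for illustration.

```java
public class IoRates {
    /** Average events per second over the profile window, rounded to nearest. */
    static long perSecond(long total, long elapsedMsec) {
        return Math.round(total * 1000.0 / elapsedMsec);
    }

    /** Average bytes moved per I/O operation, rounded to nearest. */
    static long bytesPerOp(long bytes, long ops) {
        return Math.round((double) bytes / ops);
    }

    public static void main(String[] args) {
        long elapsedMsec = 70434;                 // total profile time reported by kernrate
        long reads = 747, writes = 3792, others = 27565;
        long totalOps = reads + writes + others;  // all I/O operations in the window

        System.out.println("I/O ops/sec:         " + perSecond(totalOps, elapsedMsec));
        System.out.println("read bytes per op:   " + bytesPerOp(382146, reads));
        System.out.println("write bytes per op:  " + bytesPerOp(684128, writes));
        // Integer division: whole system calls issued per I/O operation.
        System.out.println("syscalls per I/O op: " + 835593 / totalOps);
    }
}
```

The combined I/O rate comes out below 500/sec, while system calls outnumber I/O operations by a factor of roughly 26, so most of the system-call traffic is not read or write I/O as counted here.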

Kernrate reports that the vast majority of time is spent in the ntoskrnl module, at a rate of about 10 million events per second:

    OutputResults: KernelModuleCount = 131
    Percentage in the following table is based on the Total Hits for the Kernel

    ProfileTime   12806 hits, 65536 events per hit --------
     Module                                Hits   msec  %Total  Events/Sec
    ntoskrnl                              10561      70429    82 %     9827282
    amdppm                                 1001      70429     7 %      931456
    hal                                     701      70429     5 %      652298
    vmxnet3n61x64                           351      70428     2 %      326619
    win32k                                   68      70428     0 %       63276
    tcpip                                    35      70428     0 %       32568
    afd                                      19      70428     0 %       17680
    netio                                    17      70429     0 %       15818
    vm3dmp                                   11      70428     0 %       10235
    ntfs                                      6      70429     0 %        5583
    ndis                                      6      70429     0 %        5583
    lsi_sas                                   6      70429     0 %        5583
    fltmgr                                    6      70429     0 %        5583
    vmmouse                                   5      70429     0 %        4652
    i8042prt                                  5      70429     0 %        4652
    vmci                                      3      70429     0 %        2791
    cdd                                       2      70428     0 %        1861
    dxgmms1                                   1      70428     0 %         930
    vmhgfs                                    1      70429     0 %         930
    nsiproxy                                  1      70428     0 %         930

With the breakdown in the ntoskrnl module being as follows:

    ProfileTime   10561 hits, 65536 events per hit --------
     Module                                      Hits   msec  %Total  Events/Sec
    KeSynchronizeExecution                       2485      70429    23 %     2312356
    ExReleaseRundownProtectionCacheAwareEx       2099      70429    19 %     1953173
    IoGetPagingIoPriority                        1561      70429    14 %     1452550
    memmove                                      1006      70429     9 %      936108
    RtlDelete                                     952      70429     9 %      885860
    memset                                        448      70429     4 %      416875
    ExfAcquirePushLockExclusive                   326      70429     3 %      303351
    PoStartNextPowerIrp                           280      70429     2 %      260547
    RtlFindClearBits                              239      70429     2 %      222395
    KeBugCheckEx                                  199      70429     1 %      185174
    KeWaitForMultipleObjects                       92      70429     0 %       85608
    FsRtlTeardownPerFileContexts                   67      70429     0 %       62345
    KeSetTimer                                     62      70429     0 %       57692
    NtWaitForSingleObject                          59      70429     0 %       54901
    wcsncat_s                                      58      70429     0 %       53970
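The events/sec column in both tables is numerically consistent with hits × events per hit ÷ elapsed time, truncated to an integer. That formula is inferred from the reported numbers, not taken from kernrate documentation, so treat the sketch below as an assumption that happens to reproduce the output. If it holds, an "event" is a unit of the profiling source (65536 per sample hit), not an I/O operation or a function call.

```java
public class KernrateEvents {
    /** Events per second, assuming events = hits * eventsPerHit over the elapsed time (truncated). */
    static long eventsPerSec(long hits, long eventsPerHit, long elapsedMsec) {
        return hits * eventsPerHit * 1000L / elapsedMsec;
    }

    /** Share of samples attributed to one entry, as a truncated percentage. */
    static long percentOfHits(long hits, long totalHits) {
        return hits * 100L / totalHits;
    }

    public static void main(String[] args) {
        long eventsPerHit = 65536; // sampling interval from the ProfileTime header
        long msec = 70429;         // elapsed time from the msec column

        // ntoskrnl row of the module table: 10561 of 12806 kernel hits.
        System.out.println(eventsPerSec(10561, eventsPerHit, msec)); // 9827282
        System.out.println(percentOfHits(10561, 12806));             // 82

        // KeSynchronizeExecution row of the ntoskrnl breakdown: 2485 of 10561 hits.
        System.out.println(eventsPerSec(2485, eventsPerHit, msec));  // 2312356
        System.out.println(percentOfHits(2485, 10561));              // 23
    }
}
```

Read this way, the 2.3 million events/sec against KeSynchronizeExecution corresponds to 2485 samples in 70 seconds, or about 23% of the time sampled in ntoskrnl, rather than to millions of calls.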

What I don't understand here is how what seems to be a reasonable number of I/O operations per second (<500/sec, if I understand kernrate properly?) translates into millions of events in KeSynchronizeExecution, ExReleaseRundownProtectionCacheAwareEx and IoGetPagingIoPriority, which are presumably the source of the high kernel CPU?

I have searched for more information on these functions, but information is thin on the ground, and I can't find anything that relates them to high CPU usage.

