java - Understanding high system CPU usage on a web server on Windows
Note: I have a couple of close votes on this post. I'm not sure why, since no one has commented. The question is about how our custom-written Java server and its calls to socket reads can interact with the Windows OS and produce high system CPU usage, so I believe this is the correct forum for the question rather than a sysadmin area.
I have a custom Java server serving TCP connections of a couple of different types (some web, some raw), and one of our customers is having issues with occasional periods of high CPU plus high system CPU.
We may have reproduced this on our test server running Windows Server 2008 on VMware.
During failover of one server, or during other network events that cause a number of clients (40-80) to re-establish connections simultaneously, we see high CPU usage including high system CPU (>50% on average, with spikes of around 70%). These clients typically maintain TCP sockets to the server, so during a failover we see perhaps 80-160 sockets being reconnected and serviced.
We have used kernrate to profile during one of these periods and are having a hard time understanding the results.
Kernrate reports low idle time and high kernel time, as we would expect during this period:
P0     K 0:00:39.405 (55.9%)  U 0:00:27.424 (38.9%)  I 0:00:03.603 ( 5.1%)  DPC 0:00:01.528 ( 2.2%)  Interrupt 0:00:00.327 ( 0.5%)
       Interrupts= 114880, Interrupt Rate= 1631/sec.
P1     K 0:00:37.596 (53.4%)  U 0:00:29.281 (41.6%)  I 0:00:03.556 ( 5.0%)  DPC 0:00:00.078 ( 0.1%)  Interrupt 0:00:00.624 ( 0.9%)
       Interrupts= 96016, Interrupt Rate= 1363/sec.
TOTAL  K 0:01:17.002 (54.7%)  U 0:00:56.706 (40.3%)  I 0:00:07.160 ( 5.1%)  DPC 0:00:01.606 ( 1.1%)  Interrupt 0:00:00.951 ( 0.7%)
       Total Interrupts= 210896, Total Interrupt Rate= 2994/sec.
Total Profile Time = 70434 msec
There is plenty of physical RAM available on both the host and the guest, and perfmon reports page faults of 0 to 200/sec, with maybe a couple of spikes of 800/sec during the 70 s period. The Java heap is set to 3 GB and usage doesn't come close to the edge of that during the problem period (or at all). I don't believe the JVM is doing garbage collection during this time.
                                      BytesStart        BytesStop     BytesDiff.
Available Physical Memory   ,         6296776704,      6266650624,     -30126080
Available Pagefile(s)       ,        12591370240,     12577845248,     -13524992
Available Virtual           ,      8796052869120,   8796048171008,      -4698112
Available Extended Virtual  ,                  0,               0,             0
Committed Memory Bytes      ,          305979392,       303964160,      -2015232
Non Paged Pool Usage Bytes  ,           81997824,        82333696,        335872
Paged Pool Usage Bytes      ,          164028416,       164057088,         28672
Paged Pool Available Bytes  ,         4147101696,      4149116928,       2015232
Free System PTEs            ,           33556099,        33556099,             0
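The belief that no garbage collection happens during the window can be checked from inside the JVM rather than assumed. The following is a minimal sketch using the standard `java.lang.management` beans; the `GcSnapshot` class and its method names are hypothetical and are not part of our server.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: snapshot of JVM-wide GC totals, summed over all collectors. */
final class GcSnapshot {
    final long collections; // number of collections so far
    final long millis;      // approximate accumulated collection time in ms

    private GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    /** Reads the current totals from every registered garbage collector bean. */
    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, time);
    }

    /** GC activity between an earlier snapshot and this one. */
    GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(collections - earlier.collections, millis - earlier.millis);
    }
}
```

Taking one snapshot when the reconnect burst starts and another when it ends, then logging `after.since(before)`, would show whether any collections ran in that interval.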
We have made efforts to consolidate I/O calls and increase the average bytes per I/O operation, though the current figures don't seem unreasonable:
                                  Total      Avg. Rate
Context Switches     ,           462496,     6566/sec.
System Calls         ,           835593,     11863/sec.
Page Faults          ,            32814,     466/sec.
I/O Read Operations  ,              747,     11/sec.
I/O Write Operations ,             3792,     54/sec.
I/O Other Operations ,            27565,     391/sec.
I/O Read Bytes       ,           382146,     512/ I/O
I/O Write Bytes      ,           684128,     180/ I/O
I/O Other Bytes      ,           890365,     32/ I/O
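To illustrate the kind of consolidation meant here, this is a minimal sketch and not our actual server code. It pushes many small messages through a `BufferedOutputStream` and counts how many write calls reach the underlying stream. `CoalescedWrites` and `CountingStream` are hypothetical names, and `CountingStream` only stands in for a socket's output stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class CoalescedWrites {
    /** Counts write calls that reach the underlying stream (stand-in for a socket). */
    static final class CountingStream extends OutputStream {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int calls;

        @Override public void write(int b) {
            calls++;
            sink.write(b);
        }

        @Override public void write(byte[] b, int off, int len) {
            calls++;
            sink.write(b, off, len);
        }
    }

    /** Sends small messages through a buffer and returns the number of underlying writes. */
    static int underlyingCalls(int messages, int messageSize, int bufferSize) {
        CountingStream raw = new CountingStream();
        try (OutputStream out = new BufferedOutputStream(raw, bufferSize)) {
            byte[] msg = new byte[messageSize];
            for (int i = 0; i < messages; i++) {
                out.write(msg); // lands in the buffer, not on the "socket"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the buffered stream flushed whatever was still pending.
        return raw.calls;
    }
}
```

With 100 messages of 32 bytes and an 8192-byte buffer, everything fits in the buffer and reaches the underlying stream in a single write when the stream is closed. The trade-off is latency: data sits in the buffer until it fills or is flushed explicitly.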
Kernrate reports that the vast majority of the time is spent in the ntoskrnl module, at a rate of about 10 million events per second:
OutputResults: KernelModuleCount = 131
Percentage in the following table is based on the Total Hits for the Kernel

ProfileTime 12806 hits, 65536 events per hit --------
Module               Hits     msec   %Total   Events/Sec
ntoskrnl            10561    70429     82 %      9827282
amdppm               1001    70429      7 %       931456
hal                   701    70429      5 %       652298
vmxnet3n61x64         351    70428      2 %       326619
win32k                 68    70428      0 %        63276
tcpip                  35    70428      0 %        32568
afd                    19    70428      0 %        17680
netio                  17    70429      0 %        15818
vm3dmp                 11    70428      0 %        10235
ntfs                    6    70429      0 %         5583
ndis                    6    70429      0 %         5583
lsi_sas                 6    70429      0 %         5583
fltmgr                  6    70429      0 %         5583
vmmouse                 5    70429      0 %         4652
i8042prt                5    70429      0 %         4652
vmci                    3    70429      0 %         2791
cdd                     2    70428      0 %         1861
dxgmms1                 1    70428      0 %          930
vmhgfs                  1    70429      0 %          930
nsiproxy                1    70428      0 %          930
The breakdown within the ntoskrnl module is as follows:
ProfileTime 10561 hits, 65536 events per hit --------
Module                                    Hits     msec   %Total   Events/Sec
KeSynchronizeExecution                    2485    70429     23 %      2312356
ExReleaseRundownProtectionCacheAwareEx    2099    70429     19 %      1953173
IoGetPagingIoPriority                     1561    70429     14 %      1452550
memmove                                   1006    70429      9 %       936108
RtlDelete                                  952    70429      9 %       885860
memset                                     448    70429      4 %       416875
ExfAcquirePushLockExclusive                326    70429      3 %       303351
PoStartNextPowerIrp                        280    70429      2 %       260547
RtlFindClearBits                           239    70429      2 %       222395
KeBugCheckEx                               199    70429      1 %       185174
KeWaitForMultipleObjects                    92    70429      0 %        85608
FsRtlTeardownPerFileContexts                67    70429      0 %        62345
KeSetTimer                                  62    70429      0 %        57692
NtWaitForSingleObject                       59    70429      0 %        54901
wcsncat_s                                   58    70429      0 %        53970
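As a sanity check on the Events/Sec column in the two tables above: the figures appear to be reproducible as hits × events per hit ÷ elapsed time, truncated to an integer. That reading is my own inference from the numbers, not something I have confirmed in the kernrate documentation. If it holds, the column scales with profiling samples rather than with function calls or I/O operations. The sketch below (hypothetical class name `KernrateMath`) reproduces a few of the reported values.

```java
/** Hypothetical helper reproducing the derived columns of the kernrate tables. */
final class KernrateMath {
    /** Assumed formula: hits * eventsPerHit / elapsed seconds, truncated. */
    static long eventsPerSecond(long hits, long eventsPerHit, long elapsedMsec) {
        return hits * eventsPerHit * 1000L / elapsedMsec;
    }

    /** Share of the total hits as a truncated integer percentage. */
    static long percentOfHits(long hits, long totalHits) {
        return hits * 100L / totalHits;
    }

    public static void main(String[] args) {
        // ntoskrnl row: 10561 hits, 65536 events per hit, 70429 msec
        System.out.println(eventsPerSecond(10561, 65536, 70429)); // 9827282, as reported
        // KeSynchronizeExecution row: 2485 hits over the same interval
        System.out.println(eventsPerSecond(2485, 65536, 70429));  // 2312356, as reported
        // ntoskrnl share of the 12806 kernel hits
        System.out.println(percentOfHits(10561, 12806));          // 82
    }
}
```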
What I don't understand here is how what seems to be a reasonable number of I/O operations per second (<500/sec, if I understand kernrate properly?) translates into millions of events in KeSynchronizeExecution, ExReleaseRundownProtectionCacheAwareEx, and IoGetPagingIoPriority, which are presumably the source of the high kernel CPU?
I have searched for more information on these functions, but information is thin on the ground about what they do and what can translate into high CPU usage in them.