Tag Archive for 'Memory Leak'

[debugging] Memory leak을 찾아 내는 삽질…..

서버 프로그램을 할때 가장 중요한 요소중 하나는 Memory leak 과 같은 현상에 대한 대비를 철저히 해야한다는 점입니다. Device Driver 의 경우 Pool Tagging이나 Driver Verifier를 통해서 이러한 현상을 비교적 쉽게 (??)해결 할 수 있지요 . 하지만 어플리케이션 서버 프로그램에서는 쉽지 않은 과제 입니다. 특히나 IOCP를 이용한 서버 프로그램의 경우 Page Heap 을 이용하여 디버깅이 쉽지 않기 때문에 더더욱 그렇지요 ( IOCP를 이용하여 프로그램 할 경우 APC( APC Delivery)를 통해서 동작하기 때문에 DPH Block 관련 Critical Section과 Deadlock이 발생할 수 있습니다. )

다행이도 Windbg에서는 굉자히 쉽게 이러한 Memory Leak을 찾아주는 Command가 있지요 . !heap -l 이라는 녀석을 통해서 보면 각각의 CRT Heap , Process Heap 등으로 부터 Leak으로 생각되는 부분을 손쉽게 검색해줍니다. 물론 시간을 조금 걸립니다. ( 제 PC는 별로 좋지 않아서 30분이상 소요되는 경우도 많습니다. )

0:032> !heap -l
Searching the memory for potential unreachable busy blocks.
Heap 00140000
Heap 00240000
Heap 00370000
Heap 00390000
Heap 003c0000
Heap 00920000
Heap 003e0000
Heap 00b20000
Heap 00b60000
Heap 00ba0000
Heap 00be0000
Heap 00c20000
Heap 00c60000
Heap 00ca0000
Heap 00d00000
Heap 00d40000
Heap 00d50000
Heap 011b0000
Heap 011f0000
Heap 01200000
Heap 01240000
Heap 011a0000
Heap 01420000
ERROR: Block 08e93908 previous size c60b does not match previous block size 10b
HEAP 01420000 (Seg 08d60000) At 08e93908 Error: invalid block Previous

ERROR: Block 0ad07b38 previous size 800b does not match previous block size 10b
HEAP 01420000 (Seg 0a160000) At 0ad07b38 Error: invalid block Previous

Heap 01450000
Heap 01490000
Heap 014d0000
Heap 01510000
Heap 01550000
Heap 01590000
Heap 015d0000
Heap 01610000
Heap 01a80000
Heap 01b90000
Heap 01c30000
Heap 01d30000
Heap 01d50000
Heap 01d90000
Heap 01e00000
Heap 01e40000
Heap 024c0000
Heap 02500000
Heap 02540000
Heap 02580000
Heap 025c0000
Heap 02600000
Heap 02640000
Heap 02680000
Heap 026c0000
Heap 02700000
Heap 01de0000
Heap 073c0000
Heap 07a00000
Heap 07a40000
Heap 08880000
Heap 088c0000
Heap 08900000
Heap 00a70000
Scanning VM …
Scanning references from 922087 busy blocks (1792 MBytes) …
Entry User Heap Segment Size PrevSize Unused Flags
—————————————————————————–
00140000 00140008 00140000 00140000 640 0 0 busy
00140640 00140648 00140000 00140000 40 640 0 busy
00140680 00140688 00140000 00140000 1808 40 8 busy
00141e88 00141e90 00140000 00140000 20 1808 8 busy
00141ea8 00141eb0 00140000 00140000 58 20 8 busy
00141f00 00141f08 00140000 00140000 58 58 8 busy
00141f58 00141f60 00140000 00140000 210 58 8 busy
00142168 00142170 00140000 00140000 228 210 e busy
00142390 00142398 00140000 00140000 10 228 8 busy
001423a0 001423a8 00140000 00140000 a8 10 c busy

2058c9a0 2058c9a8 01420000 20500000 858 858 8 busy
2058d1f8 2058d200 01420000 20500000 858 858 8 busy
2058da50 2058da58 01420000 20500000 858 858 8 busy
2058e2a8 2058e2b0 01420000 20500000 858 858 8 busy
2058eb00 2058eb08 01420000 20500000 858 858 8 busy
2058f358 2058f360 01420000 20500000 858 858 8 busy
2058fbb0 2058fbb8 01420000 20500000 858 858 8 busy
20590408 20590410 01420000 20510000 858 858 8 busy
20590c60 20590c68 01420000 20510000 858 858 8 busy
205914b8 205914c0 01420000 20510000 858 858 8 busy
20591d10 20591d18 01420000 20510000 858 858 8 busy
20592568 20592570 01420000 20510000 858 858 8 busy
20592dc0 20592dc8 01420000 20510000 858 858 8 busy
20593618 20593620 01420000 20510000 858 858 8 busy
20593e70 20593e78 01420000 20510000 858 858 8 busy
205946c8 205946d0 01420000 20510000 858 858 8 busy
20594f20 20594f28 01420000 20510000 858 858 8 busy
20595778 20595780 01420000 20510000 858 858 8 busy
20595fd0 20595fd8 01420000 20510000 858 858 8 busy
20596828 20596830 01420000 20510000 858 858 8 busy
20597080 20597088 01420000 20510000 858 858 8 busy
205978d8 205978e0 01420000 20510000 858 858 8 busy
20598130 20598138 01420000 20510000 858 858 8 busy
20598988 20598990 01420000 20510000 858 858 8 busy
205991e0 205991e8 01420000 20510000 858 858 8 busy
20599a38 20599a40 01420000 20510000 858 858 8 busy
2059a290 2059a298 01420000 20510000 858 858 8 busy
2059aae8 2059aaf0 01420000 20510000 858 858 8 busy
2059b340 2059b348 01420000 20510000 858 858 8 busy
2059bb98 2059bba0 01420000 20510000 858 858 8 busy
2059c3f0 2059c3f8 01420000 20510000 858 858 8 busy
2059cc48 2059cc50 01420000 20510000 858 858 8 busy
2059d4a0 2059d4a8 01420000 20510000 858 858 8 busy
2059dcf8 2059dd00 01420000 20510000 858 858 8 busy
2059e550 2059e558 01420000 20510000 858 858 8 busy
2059eda8 2059edb0 01420000 20510000 858 858 8 busy
2059f600 2059f608 01420000 20510000 858 858 8 busy
2059fe58 2059fe60 01420000 20510000 858 858 8 busy
205a06b0 205a06b8 01420000 20520000 858 858 8 busy
205a0f08 205a0f10 01420000 20520000 858 858 8 busy
205a1760 205a1768 01420000 20520000 858 858 8 busy
205a1fb8 205a1fc0 01420000 20520000 858 858 8 busy
205a2810 205a2818 01420000 20520000 858 858 8 busy
205a3068 205a3070 01420000 20520000 858 858 8 busy
205a38c0 205a38c8 01420000 20520000 858 858 8 busy
205a4118 205a4120 01420000 20520000 858 858 8 busy
205a4970 205a4978 01420000 20520000 858 858 8 busy
205a51c8 205a51d0 01420000 20520000 858 858 8 busy
205a5a20 205a5a28 01420000 20520000 858 858 8 busy
205a6278 205a6280 01420000 20520000 858 858 8 busy
205a6ad0 205a6ad8 01420000 20520000 858 858 8 busy
205a7328 205a7330 01420000 20520000 858 858 8 busy
205a7b80 205a7b88 01420000 20520000 858 858 8 busy
205a83d8 205a83e0 01420000 20520000 858 858 8 busy
205a8c30 205a8c38 01420000 20520000 858 858 8 busy
205a9488 205a9490 01420000 20520000 858 858 8 busy
205a9ce0 205a9ce8 01420000 20520000 858 858 8 busy
205aa538 205aa540 01420000 20520000 858 858 8 busy
205aad90 205aad98 01420000 20520000 858 858 8 busy
205ab5e8 205ab5f0 01420000 20520000 858 858 8 busy
205abe40 205abe48 01420000 20520000 858 858 8 busy
205ac698 205ac6a0 01420000 20520000 858 858 8 busy
205acef0 205acef8 01420000 20520000 858 858 8 busy
205ad748 205ad750 01420000 20520000 858 858 8 busy
205adfa0 205adfa8 01420000 20520000 858 858 8 busy

7505eed8 7505eee0 01420000 74cf0000 858 858 8 busy
7505f730 7505f738 01420000 74cf0000 858 858 8 busy
7505ff88 7505ff90 01420000 74cf0000 858 858 8 busy
750607e0 750607e8 01420000 74d00000 858 858 8 busy
75061038 75061040 01420000 74d00000 858 858 8 busy
75061890 75061898 01420000 74d00000 858 858 8 busy
750620e8 750620f0 01420000 74d00000 858 858 8 busy
75062940 75062948 01420000 74d00000 858 858 8 busy
75063198 750631a0 01420000 74d00000 858 858 8 busy
750639f0 750639f8 01420000 74d00000 858 858 8 busy
75064248 75064250 01420000 74d00000 858 858 8 busy
75064aa0 75064aa8 01420000 74d00000 858 858 8 busy
750652f8 75065300 01420000 74d00000 858 858 8 busy
75065b50 75065b58 01420000 74d00000 858 858 8 busy
750663a8 750663b0 01420000 74d00000 858 858 8 busy
75066c00 75066c08 01420000 74d00000 858 858 8 busy
75067458 75067460 01420000 74d00000 858 858 8 busy
75067cb0 75067cb8 01420000 74d00000 858 858 8 busy
75068508 75068510 01420000 74d00000 858 858 8 busy

922087 potential unreachable blocks were detected.

이 Dump File ( Application 강제 Dump 입니다. )에서는 굉장히 많은 Block 이 Leak으로 발생하고 있습니다. 858이라는 사이즈로 말이죠.

0:032> dd 2058c9a0
2058c9a0  010b010b 080801d2 00000000 00000000
2058c9b0  00000000 00000000 00000000 2058c164
2058c9c0  2058d214 0142427c 00000000 00000004
2058c9d0  000000d7 00000800 2058c9ec 0142426c
2058c9e0  00000001 00000800 00000005 00000001
2058c9f0  00000063 00000000 00000000 00000000
2058ca00  00000000 00000000 00000000 00000000
2058ca10  00000000 00000000 00000000 00000000

0:032> dd 7505eed8
7505eed8  010b010b 3608013d 00000000 00000000
7505eee8  00000000 00000000 00000000 7505e69c
7505eef8  7505f74c 0142427c 00000000 00000004
7505ef08  00000800 00000800 7505ef24 0142426c
7505ef18  00000001 00000800 00000005 00000001
7505ef28  00000063 00000000 00000000 00000000
7505ef38  00000000 00000000 00000000 00000000
7505ef48  00000000 00000000 00000000 00000000

이러한 어려운 상황에서 Memory Leak이 발생한 곳을 찾아 가기 위해서는 랜덤하게 여러개의 Block의 메모리를 열어 보면서 공통이 되는 부분을 찾는 것이 가장 중요합니다. 물론 Leak이 발생한 Block이 어떠한 String을 담고 있다면 이보다 좋은 경우는 없습니다. ( 위의 경우는 Memory Block에 스트링이 존재하지 안습니다. ) 두개의 Block을 비교해 보면 공통적으로 가지는 데이터가 있습니다.  이 점에서 추정할 수 있는 것은 이 메모리는 Struct 나 Class를 할당 한 것으로 추정이 가능합니다.  그리고 파란색으로 표신된 Address의 경우 Open된 Memory와 비슷한 영역에 할당된것으로 보여지는 Address값이 들어 있습니다.

0:032> !address 7505e69c
    75000000 : 75000000 - 001be000
                    Type     00020000 MEM_PRIVATE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsag

0:032> !address 7505f74c
    75000000 : 75000000 - 001be000
                    Type     00020000 MEM_PRIVATE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageHeap
                    Handle   01420000

역시 같은 Heap Handle을 가지는 영역에 할당된 메모리군요. 이 점에서 두개의 메모리값이 연속적으로 나타난다는 것은 이 메모리가 Double Linked List로 연결되어있지 않을까 하고 생각해 볼 수 있죠 .

0:032> dl 7505e69c
7505e69c  7505de44 7505eef4 0142427c 00000000
7505de44  7505d5ec 7505e69c 0142427c 00000000
7505d5ec  7505cd94 7505de44 0142427c 00000000
7505cd94  7505c53c 7505d5ec 0142427c 00000000
7505c53c  7505bce4 7505cd94 0142427c 00000000
7505bce4  7505b48c 7505c53c 0142427c 00000000
7505b48c  7505ac34 7505bce4 0142427c 00000000
7505ac34  7505a3dc 7505b48c 0142427c 00000000
7505a3dc  75059b84 7505ac34 0142427c 00000000
75059b84  7505932c 7505a3dc 0142427c 00000000
7505932c  75058ad4 75059b84 0142427c 00000000
75058ad4  7505827c 7505932c 0142427c 00000000
7505827c  75057a24 75058ad4 0142427c 00000000
75057a24  750571cc 7505827c 0142427c 00000000
750571cc  75056974 75057a24 0142427c 00000000
75056974  7505611c 750571cc 0142427c 00000000
7505611c  750558c4 75056974 0142427c 00000000
750558c4  7505506c 7505611c 0142427c 00000000
7505506c  75054814 750558c4 0142427c 00000000
75054814  75053fbc 7505506c 0142427c 00000000
75053fbc  75053764 75054814 0142427c 00000000
75053764  75052f0c 75053fbc 0142427c 00000000
75052f0c  750526b4 75053764 0142427c 00000000
750526b4  75051e5c 75052f0c 0142427c 00000000
75051e5c  75051604 750526b4 0142427c 00000000
75051604  75050dac 75051e5c 0142427c 00000000
75050dac  75050554 75051604 0142427c 00000000
75050554  7504fcfc 75050dac 0142427c 00000000
7504fcfc  7504f4a4 75050554 0142427c 00000000
7504f4a4  7504ec4c 7504fcfc 0142427c 00000000
7504ec4c  7504e3f4 7504f4a4 0142427c 00000000
7504e3f4  7504db9c 7504ec4c 0142427c 00000000

Memory의 뒷 부분에 공통데이터를 가지는 것으로 보아 Leak이 발생한 Memory는 Double Linked List로 연결된 것으로 추정이 가능합니다.

이제 정리해 보면 Memory leak이 발생한 Block은 0×858정도의 Size를 가지며 몇가지 공통 데이터를 가지고 있고 결정적으로 Double Linked List를 포함하는 Struct나 Class가 Memory Leak을 발생시키는 것으로 추측이 가능하죠 .

코드를 보면서 이조건에 맞는 녀석을 검색하면 답은 나오겠군요 ^ ^

[debugging tip] BugCheck 0×8E , 가상메모리 부족 ? Kernel and User Shared Memory

※ 덤프상의 드라이버의 이름은 업무상 관련이 있어 임의로 수정하였습니다.

얼마전에 업체에서 굉장히 잼있는 덤프가 하나 수집 됐습니다. 잠시 상황을 들어 보니 가상 메모리 부족이 나오면서 잠시후에 Blue Screen 이 나왔다더군요 . 0×8E BugCheck은 굉장히 다양한 경우에 발생하는데 보통은 Pool Corruption, Interrupt Handling, Deleted Object Access 등에서 가장 많이 발생합니다. 하지만 아래의 경우는 조금 독특합니다.

1: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0×80000003. This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG. This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but …
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG. This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000017, The exception code that was not handled
Arg2: 8083c8c9, The address that the exception occurred at
Arg3: b8b53790, Trap Frame
Arg4: 00000000

Debugging Details:
——————

EXCEPTION_CODE: (NTSTATUS) 0xc0000017 - {

FAULTING_IP:
nt!MiMapLockedPagesInUserSpace+3d7
8083c8c9 8d8728020000 lea eax,[edi+228h]

TRAP_FRAME: b8b53790 — (.trap 0xffffffffb8b53790)
ESP EDITED! New esp=b8b53b40
ErrCode = 00000000
eax=00000000 ebx=8863a448 ecx=e20a0001 edx=e2090000 esi=89573860 edi=8863a368
eip=8083c8c9 esp=b8b53804 ebp=b8b53b80 iopl=0 nv up ei pl zr na pe cy
cs=0000 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000247
nt!MiMapLockedPagesInUserSpace+0×3d7:
8083c8c9 8d8728020000 lea eax,[edi+228h]
Resetting default scope

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0×8E

PROCESS_NAME: MyApp1.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 8082d800 to 80827c63
…. 생략

1: kd> kvn
# ChildEBP RetAddr Args to Child
00 b8b5301c 8082d800 0000008e c0000017 8083c8c9 nt!KeBugCheckEx+0×1b (FPO: [Non-Fpo])
01 b8b533e0 80831136 b8b53ae4 00000000 b8b53790 nt!KiDispatchException+0×3a2 (FPO: [Non-Fpo])
02 b8b53760 8088d143 b8b53ae4 b8b53814 00000000 nt!KiRaiseException+0×120 (FPO: [Non-Fpo])
03 b8b5377c 8088978c b8b53ae4 b8b53814 00000000 nt!NtRaiseException+0×33
04 b8b5377c 8083c8c9 b8b53ae4 b8b53814 00000000 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame-EDITED @ b8b53790)
05 b8b53b80 8085336e 895263e8 8863a448 c0000017 nt!MiMapLockedPagesInUserSpace+0×3d7 (FPO: [Non-Fpo])

06 b8b53bd8 f77db030 895263e8 00000001 00000001 nt!MmMapLockedPagesSpecifyCache+0×480 (FPO: [Non-Fpo])

07 b8b53c00 f77d7bd6 8811d000 88261024 8826102c MyDriver1!XXMapSharedMemRing3+0×48 (FPO: [Non-Fpo]) (CONV: stdcall)
08 b8b53c20 f77d8566 8b2e3f18 89820008 b8b53c50 MyDriver1!CreateQueueHandler+0×62 (FPO: [2,0,0]) (CONV: stdcall)
09 b8b53c30 f77d872d 8b2e3f18 89820008 8081df65 MyDriver1!AppDeviceControlDispatch+0×146 (FPO: [Non-Fpo]) (CONV: stdcall)
0a b8b53c3c 8081df65 8b2e3f18 89820008 89619f90 MyDriver1!XXRGACDispatch+0×2d (FPO: [2,0,0]) (CONV: stdcall)
0b b8b53c50 808f5437 89820078 89619f90 89820008 nt!IofCallDriver+0×45 (FPO: [Non-Fpo])
0c b8b53c64 808f61bf 8b2e3f18 89820008 89619f90 nt!IopSynchronousServiceTail+0×10b (FPO: [Non-Fpo])
0d b8b53d00 808eed08 00009330 00000000 00000000 nt!IopXxxControlFile+0×5e5 (FPO: [Non-Fpo])
0e b8b53d34 8088978c 00009330 00000000 00000000 nt!NtDeviceIoControlFile+0×2a (FPO: [Non-Fpo])
0f b8b53d34 7c9685ec 00009330 00000000 00000000 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b8b53d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
10 0012f29c 00000000 00000000 00000000 00000000 0×7c9685ec

1: kd> dps b8b53b80 b8b53c00
b8b53b80 b8b53bd8
b8b53b84 8085336e nt!MmMapLockedPagesSpecifyCache+0×480
b8b53b88 895263e8
b8b53b8c 8863a448
b8b53b90 c0000017
b8b53b94 00000000
b8b53b98 00000000
b8b53b9c 8826102c
b8b53ba0 8953def8
b8b53ba4 00000000
b8b53ba8 000007ff
b8b53bac f7517000
b8b53bb0 00000250
b8b53bb4 b8b53bdc
b8b53bb8 8081da60 nt!IoAllocateMdl+0xb4
b8b53bbc 00000000
b8b53bc0 0000029d
b8b53bc4 206c644d
b8b53bc8 00000000
b8b53bcc 88261008
b8b53bd0 00000000
b8b53bd4 00000078
b8b53bd8 b8b53c00
b8b53bdc f77db030 MyDriver1!XXMapSharedMemRing3+0×48
b8b53be0 895263e8
b8b53be4 00000001
b8b53be8 00000001
b8b53bec 00000000
b8b53bf0 00000000
b8b53bf4 00000010

b8b53bf8 89820008
b8b53bfc 88261008
b8b53c00 b8b53c30

MmMapLockedPagesSpecifyCache 호출시의 4번째 인자를 보면 NULL로 세팅이 되어 있는걸 확인 할 수 있습니다. 이것으로 볼때 Base Address는 System이 선택한다는 것을 알 수 있습니다. 그리고 그렇게 얻어진 Base Address는 Return Value로 넘어 오게 됩니다. 하지만 이러한 과정에서 BSOD가 나왔다는것은 UserLand에 Memory를 할당하지 못해서 발생했을 가능성이 높습니다.

1: kd> !vm

*** Virtual Memory Usage ***
Physical Memory: 1048243 ( 4192972 Kb)
Page File: \??\C:\pagefile.sys
Current: 2209792 Kb Free Space: 246392 Kb
Minimum: 2095104 Kb Maximum: 4190208 Kb
Available Pages: 646438 ( 2585752 Kb)
ResAvail Pages: 951144 ( 3804576 Kb)
Locked IO Pages: 135 ( 540 Kb)
Free System PTEs: 183235 ( 732940 Kb)
Free NP PTEs: 32766 ( 131064 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 89 ( 356 Kb)
Modified PF Pages: 89 ( 356 Kb)
NonPagedPool Usage: 14116 ( 56464 Kb)
NonPagedPool Max: 65279 ( 261116 Kb)
PagedPool 0 Usage: 8642 ( 34568 Kb)
PagedPool 1 Usage: 1984 ( 7936 Kb)
PagedPool 2 Usage: 1950 ( 7800 Kb)
PagedPool 3 Usage: 1975 ( 7900 Kb)
PagedPool 4 Usage: 1969 ( 7876 Kb)
PagedPool Usage: 16520 ( 66080 Kb)
PagedPool Maximum: 90624 ( 362496 Kb)
Shared Commit: 3536 ( 14144 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 13363 ( 53452 Kb)
PagedPool Commit: 16525 ( 66100 Kb)
Driver Commit: 3678 ( 14712 Kb)
Committed pages: 613423 ( 2453692 Kb)
Commit limit: 1557348 ( 6229392 Kb)

Total Private: 575769 ( 2303076 Kb)
1424 Myapp1.exe 497908 ( 1991632 Kb)
0830 RaidServ.exe 33115 ( 132460 Kb)
03a0 svchost.exe 4336 ( 17344 Kb)
2460 npapmagent.exe 3993 ( 15972 Kb)
0208 services.exe 2556 ( 10224 Kb)
0214 lsass.exe 2468 ( 9872 Kb)
0cb8 explorer.exe 2120 ( 8480 Kb)
01d8 winlogon.exe 2076 ( 8304 Kb)
0dac MyApp3.exe 2015 ( 8060 Kb)
145c MyApp2.exe 1776 ( 7104 Kb)
0888 explorer.exe 1655 ( 6620 Kb)
0690 inetinfo.exe 1575 ( 6300 Kb)
07a4 OmniAgent.exe 1500 ( 6000 Kb)
06ac miniwinagent.ex 1471 ( 5884 Kb)
02f0 winlogon.exe 1461 ( 5844 Kb)
0d64 winlogon.exe 1460 ( 5840 Kb)
0e04 wmiprvse.exe 1301 ( 5204 Kb)
036c svchost.exe 1018 ( 4072 Kb)
056c spoolsv.exe 983 ( 3932 Kb)
037c svchost.exe 971 ( 3884 Kb)
0ed0 nspupsvc.exe 909 ( 3636 Kb)
08c0 svchost.exe 702 ( 2808 Kb)
072c bpinetd.exe 643 ( 2572 Kb)
….

가상메모리의 사용정도를 보면 MyApp라는 녀석이 거의 2G가정도의 메모리를 사용하고 있군요. 이러한 이유에서 BOSD가 발생한 것을 추정할 수 있습니다.

1: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 8b7707a8 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: bff50020 ObjectTable: e1002e40 HandleCount: 1978.
Image: System
….
PROCESS 8863a368 SessionId: 0 Cid: 1424 Peb: 7ffdc000 ParentCid: 0208
DirBase: bff553a0 ObjectTable: e34f4858 HandleCount: 9402.
Image: MyApp1.exe
….

1: kd> !process 8b7707a8 1
PROCESS 8b7707a8 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: bff50020 ObjectTable: e1002e40 HandleCount: 1978.
Image: System
VadRoot 89d9ece0 Vads 4 Clone 0 Private 3. Modified 462061. Locked 0.
DeviceMap e1000930
Token e1003b08
ElapsedTime 13 Days 08:28:50.593
UserTime 00:00:00.000
KernelTime 07:18:55.781
QuotaPoolUsage[PagedPool] 0
QuotaPoolUsage[NonPagedPool] 0
Working Set Sizes (now,min,max) (90, 0, 345) (360KB, 0KB, 1380KB)
PeakWorkingSetSize 545
VirtualSize 2 Mb
PeakVirtualSize 7 Mb
PageFaultCount 18702
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 9

1: kd> !process 8863a368 3
PROCESS 8863a368 SessionId: 0 Cid: 1424 Peb: 7ffdc000 ParentCid: 0208
DirBase: bff553a0 ObjectTable: e34f4858 HandleCount: 9402.
Image: MyApp1.exe
VadRoot 89a863b8 Vads 805 Clone 0 Private 497112. Modified 5253893. Locked 0.
DeviceMap e1000930
Token e482fde0
ElapsedTime 1 Day 20:22:10.781
UserTime 01:04:02.750
KernelTime 00:02:08.546
QuotaPoolUsage[PagedPool] 169356
QuotaPoolUsage[NonPagedPool] 32984
Working Set Sizes (now,min,max) (213772, 50, 345) (855088KB, 200KB, 1380KB)
PeakWorkingSetSize 247656
VirtualSize 2031 Mb
PeakVirtualSize 2031 Mb
PageFaultCount 26778342
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 497908

THREAD 892e2450 Cid 1424.1168 Teb: 7ffdf000 Win32Thread: e37a7460 RUNNING on processor 1
THREAD 880d9db0 Cid 1424.1050 Teb: 7ffdd000 Win32Thread: e14c2ae0 WAIT: (Unknown) UserMode Non-Alertable
8b2bdcb8 NotificationEvent
880d9e28 NotificationTimer

THREAD 88e50238 Cid 1424.1518 Teb: 7ffdb000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
89714640 SynchronizationEvent

THREAD 881b9db0 Cid 1424.0ed4 Teb: 7ffda000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
8aeadb70 SynchronizationEvent

THREAD 88b40480 Cid 1424.1020 Teb: 7ffd9000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
898c4db0 SynchronizationEvent

THREAD 882d7db0 Cid 1424.1304 Teb: 7ffd8000 Win32Thread: e6bfa478 WAIT: (Unknown) UserMode Non-Alertable
89ad5084 NotificationEvent

THREAD 882d3db0 Cid 1424.0fc4 Teb: 7ffd7000 Win32Thread: e340d5b8 WAIT: (Unknown) UserMode Non-Alertable
8b2accf0 NotificationEvent
882d3e28 NotificationTimer

THREAD 89cf83f8 Cid 1424.1798 Teb: 7ffd6000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
897a4448 NotificationEvent
89cf8470 NotificationTimer

THREAD 88d3fdb0 Cid 1424.10ec Teb: 7ff95000 Win32Thread: e523e660 WAIT: (Unknown) UserMode Non-Alertable
88a68fec NotificationEvent

THREAD 87e69238 Cid 1424.1714 Teb: 7ff94000 Win32Thread: e4899c40 WAIT: (Unknown) UserMode Non-Alertable
89bbd020 SynchronizationEvent

THREAD 899a1510 Cid 1424.28b4 Teb: 7ffd3000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
899e1148 Semaphore Limit 0×7fffffff
899a1588 NotificationTimer

THREAD 89000b18 Cid 1424.2af4 Teb: 7ffde000 Win32Thread: e287a360 WAIT: (Unknown) UserMode Non-Alertable
8855b238 SynchronizationEvent

THREAD 89b53b90 Cid 1424.26a4 Teb: 7ffd5000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
885ba8b8 SynchronizationEvent

THREAD 88eaa020 Cid 1424.244c Teb: 7ffd4000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
899e1148 Semaphore Limit 0×7fffffff
88eaa098 NotificationTimer

이 시스템은 13 Days 08:28:50.593동안 실행 중이고 이 프로세스는 1 Day 20:22:10.781 동안 실행 중이 었군요. 꾀 긴시간동안 조금씩 Memory Leak이 발생한 것 같습니다. 기본적으로 실행만 되어 있다면 별다른 동작이 없는데 과연 하루가 넘는 시간동안 무슨 동작을 하고 있었던 것일까 ?? 현재 로드된 Driver들을 보기위해서 확인해 보았습니다 .

1: kd> lm
start end module name
80800000 80a56000 nt (pdb symbols) d:\websymbols\ntkrpamp.pdb\52A81696FB5B432AB0B1B46297F104D31\ntkrpamp.pdb
80a56000 80a82000 hal (pdb symbols) d:\websymbols\halmacpi.pdb\F799635F9A7C45BDBF439633754B99B61\halmacpi.pdb
b8a40000 b8a6a000 Fastfat (deferred)
b91ae000 b91d9000 RDPWD (deferred)
b9441000 b944c000 TDTCP (deferred)
b9701000 b975e000 srv (deferred)
b9b76000 b9b83e80 MyDriver3 (deferred)
b9fa6000 b9fb8000 dump_arcsas (deferred)
b9fb8000 b9fcd000 Cdfs (deferred)
ba025000 ba02f000 ndisuio (deferred)
ba055000 ba05f300 MyDriver4 (deferred)
ba095000 ba0b3000 EraserUtilRebootDrv (deferred)
ba113000 ba124000 Fips (deferred)
ba124000 ba19a000 mrxsmb (deferred)
ba19a000 ba1ca000 rdbss (deferred)
ba22c000 ba256000 afd (deferred)
ba256000 ba287000 netbt (deferred)
ba287000 ba317000 tcpip (deferred)
ba33f000 ba358000 ipsec (deferred)
ba3c8000 ba3da000 i8042prt (deferred)
ba4da000 ba4fc000 SYMEVENT (deferred)
….
Unloaded modules:
ba398000 ba39e000 MyDriver2.sys
ba7ee000 ba7f4000 MyDriver2.sys
f7807000 f780d000 MyDriver2.sys
ba3a0000 ba3a6000 MyDriver2.sys
ba368000 ba36e000 MyDriver2.sys
ba806000 ba80c000 MyDriver2.sys
f77cf000 f77d5000 MyDriver2.sys
ba3a0000 ba3a6000 MyDriver2.sys
ba80e000 ba814000 MyDriver2.sys
ba7ce000 ba7d4000 MyDriver2.sys
ba7d6000 ba7dc000 MyDriver2.sys
f77bf000 f77c5000 MyDriver2.sys
ba380000 ba386000 MyDriver2.sys
ba80e000 ba814000 MyDriver2.sys
ba368000 ba36e000 MyDriver2.sys
ba398000 ba39e000 MyDriver2.sys
ba7ce000 ba7d4000 MyDriver2.sys
ba7fe000 ba804000 MyDriver2.sys
ba370000 ba376000 MyDriver2.sys
ba7de000 ba7e4000 MyDriver2.sys
ba368000 ba36e000 MyDriver2.sys
ba7de000 ba7e4000 MyDriver2.sys
ba360000 ba366000 MyDriver2.sys
f779f000 f77a5000 MyDriver2.sys
f77bf000 f77c5000 MyDriver2.sys
ba398000 ba39e000 MyDriver2.sys
f77cf000 f77d5000 MyDriver2.sys
f780f000 f7815000 MyDriver2.sys
ba380000 ba386000 MyDriver2.sys
f779f000 f77a5000 MyDriver2.sys
ba398000 ba39e000 MyDriver2.sys
ba368000 ba36e000 MyDriver2.sys
ba7ee000 ba7f4000 MyDriver2.sys
ba7d6000 ba7dc000 MyDriver2.sys
f77f7000 f77fd000 MyDriver2.sys
ba7ee000 ba7f4000 MyDriver2.sys
ba7ce000 ba7d4000 MyDriver2.sys
ba380000 ba386000 MyDriver2.sys
f780f000 f7815000 MyDriver2.sys
ba370000 ba376000 MyDriver2.sys
f77c7000 f77cd000 MyDriver2.sys
f779f000 f77a5000 MyDriver2.sys
ba806000 ba80c000 MyDriver2.sys
ba388000 ba38e000 MyDriver2.sys
f77cf000 f77d5000 MyDriver2.sys
ba7ce000 ba7d4000 MyDriver2.sys
ba388000 ba38e000 MyDriver2.sys
f77f7000 f77fd000 MyDriver2.sys
ba398000 ba39e000 MyDriver2.sys
ba7fe000 ba804000 MyDriver2.sys

이곳에서 조금 이상한 것을 확인할 수 있내요. MyDriver2.sys라는 녀석이 계속적으로 Load와 Unload를 반복했다는것… 시스펨이 13일동안 실행중이 더라도 프로그램 특성 상 Load와 Unload를 50번이상 반복하지는 않습니다. 그렇다면 한번 실행 될때 여러번의 Load와 Unload를 반복했다는 이야기가 되는군요.( 프로그램에 이 과정을 거치는 코드는 딱 한곳있음 ) 이 과정에서 꽤 큰사이즈의 Memory Leak이 발생한다면 Virtual Memory 부족으로 위에서 보여진 Exception이 재현가능하겠군요. ( 조금 운이 따라야하겠지만… )

실제로 확인해 본 결과 다행이 이 부분에서 꽤 큰 사이즈의 Memory Leak이 발생하더군요 .

[debugging tip] Windows 2000 Section Handle Leak

Windows 2000의 경우 Windbg에서 제공하는 !htrace Extension을 사용할 수 없기에 Handle Leak를 검출하기는데 어려움이 있지요. 특히나 Process Explorer와 같은 툴에서 Handle Data는 증가하지 않는데 Handle Count만 증가 하고 있다면 더더욱 힘든 상황입니다.  이러한 경우라면 nonamed Section Handle의 존재 여부를 확인해 보면 도움이 됩니다. Section Handle의 경우 각각의 Name Value를 가지는게 보통이지만 간혹 Leak을 발생시키는 경우라면 Name이 없는 경우가 있습니다.
0:002> !handle
Handle 4
Type Section
Handle 8
Type Event
Handle c
Type Event
Handle 10
Type Event
Handle 14
Type Directory
Handle 18
Type File
Handle 1c
Type Directory
Handle 20
Type Event
Handle 24
Type Port
Handle 28
Type Mutant
Handle 2c
Type Event
Handle 30
Type Section
... 중략

Handle 2464
Type Section
Handle 2468
Type Section
Handle 246c
Type Section
Handle 2470
Type Section
Handle 2474
Type Section
Handle 2478
Type Section
Handle 247c
Type Section
Handle 2480
Type Section
Handle 2484
Type Section
Handle 2488
Type Section
Handle 248c
Type Section
Handle 2490
Type Section
Handle 2494
Type Section
Handle 2498
Type Section
Handle 249c
Type Section
Handle 24a0
Type Section
Handle 24a4
Type Section
Handle 24a8
Type Section
Handle 24ac
Type Section
Handle 24b0
Type Section
Handle 24b4
Type Section
Handle 24b8
Type Section
Handle 24bc
Type Section
Handle 24c0
Type Section
Handle 24c4
Type Section
Handle 24c8
Type Section
Handle 24cc
Type Section
Handle 24d0
Type Section
Handle 24d4
Type Section
Handle 24d8
Type Section
Handle 24dc
Type Section
Handle 24e0
Type Section
Handle 24e4
Type Section
Handle 24e8
Type Section
Handle 24ec
Type Section
Handle 24f0
Type Section
Handle 24f4
Type Section
Handle 24f8
Type Section
Handle 24fc
Type Section
Handle 2500
Type Section
Handle 2504
Type Section
Handle 2508
Type Section
Handle 250c
Type Section
Handle 2510
Type Section
Handle 2514
Type Section
Handle 2518
Type Section
2374 Handles
Type Count
Event 24
Section 2287
File 40
Port 1
Directory 3
Mutant 5
WindowStation 2
Semaphore 1
Key 6
Thread 2
Desktop 1
IoCompletion 2

0:002> !handle 8c ff
Handle 8c
Type File
Attributes 0
GrantedAccess 0x12019f:
ReadControl,Synch
Read/List,Write/Add,Append/SubDir/CreatePipe,
ReadEA,WriteEA,ReadAttr,WriteAttr
HandleCount 2
PointerCount 4
0:002> !handle 88 ff
Handle 88
Type Section
Attributes 0
GrantedAccess 0xf0007:
Delete,ReadControl,WriteDac,WriteOwner
Query,MapWrite,MapRead
HandleCount 2
PointerCount 4
Name \BaseNamedObjects\AtlDebugAllocator_FileMappingNameStatic3_22c
Object Specific Information
Section base address 0
Section attributes 0×4000000
Section max size 0×400000
0:002> !handle 2518 ff
Handle 2518
Type Section
Attributes 0
GrantedAccess 0xf0007:
Delete,ReadControl,WriteDac,WriteOwner
Query,MapWrite,MapRead
HandleCount 2
PointerCount 3
Name <none>
Object Specific Information
Section base address 0
Section attributes 0×8000000
Section max size 0×7000

Section Handle에는 보통 name이 있는게 정상이지만 Leak이 발생하고 있는 Handle에는 Name이 없더군요. 이러한 경우는 Windows 2000에서 WTS* API를 사용할 경우 주로 발생하고 있습니다.  혹시나 비슷한 문제가 있다면 한번 의심해보시길…