Local Hollowing

Introduction

During Red Team exercises, once initial access has been gained, one of the first obstacles is static detection. Dropping a tool such as Mimikatz to disk gets it flagged immediately, because EDR products have known its signatures for years.

Local Hollowing works around this problem. The principle: a loader embeds the malicious PE (Portable Executable) encrypted with AES-256, decrypts it in memory at runtime, maps it manually instead of relying on the Windows loader, and then redirects its own main thread to execute it. From the OS’s perspective, only the loader process is running; in reality, Mimikatz is executing inside it.

This article details how this technique works step by step.

The diagram below summarizes the complete Local Hollowing flow. Each step will then be detailed individually later in the article.

Background: Why is standard loading problematic?

When a PE is executed via CreateProcess, Windows follows a well-defined path. The file is first scanned on disk by the antivirus. Next, the kernel triggers a Process Creation Callback (PspCreateProcessNotifyRoutine) that notifies every registered driver, including the EDR driver. The Windows loader then maps the PE into memory, triggering an Image Load Callback (PspLoadImageNotifyRoutine). Finally, to complete the process, the EDR injects its DLL into the new process and sets inline hooks on sensitive functions in ntdll.dll.

Each of these checkpoints presents a detection opportunity. Local Hollowing bypasses the first and most aggressive one: the static scan on disk.

Local Hollowing vs. Process Hollowing

Before delving into the details, it is important to distinguish Local Hollowing from classic Process Hollowing, as the two techniques are often confused.

Process Hollowing creates a legitimate remote process (e.g., svchost.exe) in a suspended state using CreateProcess(..., CREATE_SUSPENDED), unmaps its original image with NtUnmapViewOfSection, and then injects the malicious PE via VirtualAllocEx and WriteProcessMemory. The problem is that this method is now widely detected: the creation of a suspended process followed by cross-process memory writes is a classic indicator that EDRs systematically correlate.

Local Hollowing, on the other hand, does everything within its own process. No remote process, no cross-process WriteProcessMemory, no NtUnmapViewOfSection. Local memory operations (VirtualAlloc, memcpy) are much more commonplace and are detected far less frequently.

Four problems to solve

Once decrypted in memory, the PE still cannot be executed directly, for four reasons.

Disk layout and memory layout

On disk, the PE sections (.text, .rdata, .data, .reloc) are aligned according to FileAlignment, typically 512 bytes (0x200). In memory, they must be aligned according to SectionAlignment, typically 4096 bytes (0x1000), which is the size of a memory page.

This alignment exists because the processor’s MMU (Memory Management Unit) applies memory protections on a per-page basis. The .text section must be RX (Read-Execute), .data RW (Read-Write), and .rdata R (Read-Only); for each section to receive its own protection, it must start on its own memory page.

The decrypted PE is still in disk layout, with compact, contiguous sections, so it must be remapped into the memory layout.
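As a small worked example of the two alignments, the sketch below rounds a size up to an alignment boundary. align_up is a hypothetical helper, not a Windows API, and it assumes the alignment is a non-zero power of two, as the PE format requires for both FileAlignment and SectionAlignment. With it, a section holding 0x1234 bytes of data occupies 0x1400 bytes on disk (FileAlignment 0x200) but 0x2000 bytes in memory (SectionAlignment 0x1000).

```c
#include <stdint.h>

/* Hypothetical helper: round `value` up to the next multiple of `alignment`.
 * Assumes `alignment` is a non-zero power of two, which the PE format
 * requires for FileAlignment and SectionAlignment. */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

The bit-mask form only works for powers of two; a general alignment would need a division instead.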

Absolute addresses are incorrect

During compilation, the linker does not know where the PE will be loaded into memory. It therefore assumes that the PE will be loaded at the address defined by the ImageBase field of the Optional Header (typically 0x140000000 for an x64 EXE). Every absolute address the linker emits, whether in an instruction or in data, is computed from this value.

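For example, if a global variable is located at RVA (Relative Virtual Address) 0x3000, an absolute reference to it looks like this (simplified: x64 compilers usually reach globals through RIP-relative addressing, so in practice most relocations patch pointers stored in data, such as vtables and jump tables):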

mov rax, [0x140003000]    ; ImageBase (0x140000000) + RVA (0x3000)

When the loader allocates memory with VirtualAlloc(NULL, ...), the returned address usually differs from ImageBase. If VirtualAlloc returns 0x1F0000000, the instruction above still references 0x140003000, which no longer holds the variable. The process crashes.

The PE’s .reloc section holds the base relocation table: a list of every location where the linker wrote an absolute address. The loader walks this table and corrects each address by adding the delta (actual_address - ImageBase):

Delta = 0x1F0000000 - 0x140000000 = 0xB0000000

Before correction: mov rax, [0x140003000]
After correction:  mov rax, [0x1F0003000]    ; 0x140003000 + 0xB0000000
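The same arithmetic in C, as a minimal sketch: rebase_address is a hypothetical helper that applies the delta to a single absolute address. Using unsigned 64-bit arithmetic means the formula also holds when the allocation lands below ImageBase, because the subtraction wraps around and the addition wraps back.

```c
#include <stdint.h>

/* Hypothetical helper: rebase one absolute address that the linker
 * computed against the preferred ImageBase. Unsigned wrap-around makes
 * the delta correct even when actual_base < image_base. */
uint64_t rebase_address(uint64_t original, uint64_t image_base, uint64_t actual_base)
{
    uint64_t delta = actual_base - image_base;  /* actual_address - ImageBase */
    return original + delta;
}
```

With the values above, rebase_address(0x140003000, 0x140000000, 0x1F0000000) yields 0x1F0003000.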

Imports are not resolved

The PE imports functions from external DLLs (kernel32.dll, advapi32.dll, etc.). On disk, the Import Address Table (IAT) does not hold the addresses of these functions, only references to their names (or ordinals). The addresses cannot be hardcoded into the file because they depend on where each DLL is loaded, which ASLR randomizes at every boot. Normally, the Windows loader resolves them when the process loads. Since we are replacing that loader, it is up to the loader's worker thread to traverse the Import Directory, load each DLL with LoadLibraryA, resolve each function with GetProcAddress, and write the actual addresses into the IAT.


Execution must be stealthy

Creating a new thread to execute the mapped PE is possible but easily detected. Through the Thread Creation Callback, the EDR checks the start address of each new thread: if it points into dynamically allocated private memory (often RWX) rather than into a module backed by a file on disk, this is a classic indicator of code injection.

The solution is to reuse the loader’s main thread by modifying its instruction pointer (RIP) while it is suspended. Its start address stays clean, since it points to the entry point of the loader, a legitimate executable present on disk.

Detailed Implementation

Step 1: Suspending the Main Thread

The main thread must be suspended so that its registers can be modified in the final step. However, a suspended thread cannot perform the decryption and mapping itself, which is one of the reasons a worker thread is created to take over that work. Before creating it, the main thread obtains a real handle to itself with DuplicateHandle; the worker thread then uses that handle to suspend it.

This use of DuplicateHandle matters: the pseudo-handle returned by GetCurrentThread() is the constant (HANDLE)-2, which the kernel resolves at the time of use to whichever thread is calling. If the worker thread passed it to SuspendThread, it would suspend itself. The duplicated handle, by contrast, is a real entry in the process handle table that always refers to the main thread, regardless of which thread uses it.

Step 2: Decryption and Validation of the PE

Once the worker thread is launched, its first task is to decrypt the payload. The AES-256-encrypted buffer can be embedded directly in the loader (in its .data or .rsrc section) or downloaded from a remote server at runtime.

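After decryption, the buffer contains the PE in plaintext. Before going further, it is wise to validate its structure: the buffer must start with the MZ signature of the DOS header, the e_lfanew field must point to the PE\0\0 signature, and, for an x64 payload, the Optional Header Magic must be 0x20B (PE32+).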
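A minimal sketch of such a check, working on a raw byte buffer so that it does not depend on windows.h. is_valid_pe64, rd16 and rd32 are hypothetical names; the offsets (e_lfanew at 0x3C, Machine 4 bytes after the signature, Optional Header Magic 24 bytes after it) follow the PE/COFF specification, and the check assumes an x64 payload.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers without assuming alignment. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical validator: returns 1 if `buf` starts with a plausible
 * x64 (PE32+) header, 0 otherwise. Every read is bounds-checked. */
int is_valid_pe64(const uint8_t *buf, size_t len)
{
    if (len < 0x40 || rd16(buf) != 0x5A4D)             /* DOS header: "MZ" */
        return 0;
    uint32_t nt = rd32(buf + 0x3C);                     /* e_lfanew */
    if ((size_t)nt > len || len - nt < 4 + 20 + 2)      /* signature + file header + Magic */
        return 0;
    if (rd32(buf + nt) != 0x00004550)                   /* "PE\0\0" */
        return 0;
    if (rd16(buf + nt + 4) != 0x8664)                   /* Machine: AMD64 */
        return 0;
    if (rd16(buf + nt + 24) != 0x20B)                   /* OptionalHeader.Magic: PE32+ */
        return 0;
    return 1;
}
```

Checking the Machine field and the Magic separately gives a clearer failure reason if a 32-bit payload is embedded by mistake.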

Step 3: Allocation and Mapping of Sections

After decrypting the PE, we allocate a memory region of size SizeOfImage. This field in the Optional Header represents the total size of the PE with memory alignment, which, as explained earlier, is usually larger than the file on disk.

Next, we copy the PE headers (the first SizeOfHeaders bytes) into the allocated area, then copy the SizeOfRawData bytes of each section to its memory location.

PointerToRawData indicates where the section is located in the file (disk layout). VirtualAddress indicates where it should be in memory (memory layout).

Finally, we need to apply the relocations, starting by calculating the delta between the actual allocation address and the PE’s ImageBase.

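The .reloc table is organized into blocks. Each block covers one memory page (identified by VirtualAddress), records its total size in SizeOfBlock, and contains a list of 16-bit entries. The 4 most significant bits of an entry give the relocation type, and the remaining 12 bits give the offset within the page. On x64, the relevant type is IMAGE_REL_BASED_DIR64 (10), which patches a 64-bit value; IMAGE_REL_BASED_ABSOLUTE (0) entries are padding and are skipped. For each DIR64 entry, the loader reads the current value at that location and adds the delta.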
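The entry layout can be illustrated with a few decoding helpers. This is a sketch, not a complete relocation pass: the function names and the REL_BASED_* constants are hypothetical (the constants mirror the IMAGE_REL_BASED_ABSOLUTE and IMAGE_REL_BASED_DIR64 values from winnt.h), and the block header is assumed to be the standard 8 bytes (VirtualAddress, SizeOfBlock).

```c
#include <stdint.h>

/* Hypothetical names mirroring winnt.h values. */
#define REL_BASED_ABSOLUTE 0   /* padding entry, skipped */
#define REL_BASED_DIR64    10  /* patch a 64-bit absolute address */

/* High 4 bits: relocation type. */
unsigned reloc_type(uint16_t entry) { return entry >> 12; }

/* Low 12 bits: offset inside the page covered by the block. */
unsigned reloc_offset(uint16_t entry) { return entry & 0x0FFF; }

/* A block is an 8-byte header followed by 2-byte entries. */
uint32_t reloc_entry_count(uint32_t size_of_block)
{
    return size_of_block < 8 ? 0 : (size_of_block - 8) / 2;
}

/* RVA of the value patched by `entry` in the block for page `block_va`. */
uint32_t reloc_target_rva(uint32_t block_va, uint16_t entry)
{
    return block_va + reloc_offset(entry);
}
```

For instance, the entry 0xA010 in the block for page 0x3000 is a DIR64 relocation targeting RVA 0x3010.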

Step 4: Resolving imports

The loader will traverse the Import Directory, which is an array of IMAGE_IMPORT_DESCRIPTOR structures with one entry per imported DLL. Each structure contains two important fields:

OriginalFirstThunk points to the Import Lookup Table (ILT), which references the function names (or ordinals). This table never changes.
FirstThunk points to the IAT. On disk, it holds the same references as the ILT, but in memory they are replaced by the actual addresses.

After this step, each entry in the IAT contains a resolved function pointer. For example, when Mimikatz executes call [IAT_OpenProcess], it will hit the actual address of OpenProcess in kernel32.dll.
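To show how each ILT/IAT entry (a "thunk") is interpreted on x64 before it is resolved, here is a sketch of the decoding rules. The helper names are hypothetical, and ORDINAL_FLAG64 mirrors IMAGE_ORDINAL_FLAG64 from winnt.h. A thunk with the high bit set imports by ordinal; otherwise it holds the RVA of an IMAGE_IMPORT_BY_NAME entry, whose NUL-terminated name follows a 2-byte Hint.

```c
#include <stdint.h>

/* Hypothetical name mirroring IMAGE_ORDINAL_FLAG64. */
#define ORDINAL_FLAG64 0x8000000000000000ULL

/* High bit set: the function is imported by ordinal. */
int thunk_is_ordinal(uint64_t thunk) { return (thunk & ORDINAL_FLAG64) != 0; }

/* Ordinal number: low 16 bits of the thunk. */
uint16_t thunk_ordinal(uint64_t thunk) { return (uint16_t)(thunk & 0xFFFF); }

/* Import by name: the low 31 bits are the RVA of IMAGE_IMPORT_BY_NAME;
 * the name string starts after its 2-byte Hint field. */
uint32_t thunk_name_rva(uint64_t thunk) { return (uint32_t)(thunk & 0x7FFFFFFFULL) + 2; }
```

Because the ILT keeps these original values, it remains readable even after the IAT entries have been overwritten with resolved addresses.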

Step 5: Redirecting the main thread

The PE is mapped, the addresses are corrected, and the imports are resolved. All that remains is to redirect the main thread, which has been suspended since Step 1, to the PE’s entry point.

AddressOfEntryPoint is an RVA, added to baseAddr (the allocation base) to obtain the absolute memory address. When ResumeThread is called, the main thread resumes execution, but instead of continuing with the loader’s code, it runs Mimikatz’s entry point (typically the C runtime startup code, which then calls its main function).

Protecting the Loader Itself

At this stage, the payload is protected by AES-256 encryption: it never appears in plaintext on the disk. But the loader itself is a normally compiled executable. Its code contains calls to VirtualAlloc, CryptDecrypt, GetProcAddress, SetThreadContext, and other functions characteristic of a PE loader. An antivirus can create a signature based on these patterns and detect the loader before it even executes.

To counter this, we compile the loader with OLLVM, which applies four transformations to the generated machine code:

Substitution: replaces simple arithmetic operations with equivalent complex sequences.
Control Flow Flattening: flattens control structures (if/else, loops) into a switch statement within an infinite loop.
Bogus Control Flow: injects fake execution paths with opaque conditions.
Splitting: splits basic blocks into smaller fragments.

Compilation flags used:

-mllvm -sub -mllvm -split -mllvm -fla -mllvm -bcf (simple obfuscation)

-mllvm -sub -mllvm -sub_loop=3 -mllvm -split -mllvm -split_num=3 -mllvm -fla -mllvm -bcf -mllvm -bcf_loop=3 -mllvm -bcf_prob=100 (heavy obfuscation)

OLLVM is applied to the loader (written in C), not to the payload, which is compiled separately with its own toolchain. This is not a problem: the payload is protected by AES-256 encryption, and the loader by OLLVM obfuscation.

To illustrate the power of OLLVM, here is the control flow graph of a simple Hello World in Ghidra. The version compiled with MSVC fits into a single linear block of about twenty instructions:

After compilation with OLLVM using heavy obfuscation, the same function produces a graph consisting of several hundred interconnected blocks. The instruction patterns on which the EDR’s static signatures are based are completely destroyed. The generated machine code bears no resemblance to the original, yet produces the same result at runtime:

Limitations of the Technique

Local Hollowing eliminates static on-disk detection, the most aggressive of the EDR’s defense layers. The malicious PE is never written to disk in plaintext, and the Windows loader is never invoked to load it.

However, several defense mechanisms remain active:

Kernel callbacks: the Process Creation Callback fires for the loader, the Image Load Callback fires for each DLL loaded (including those loaded with LoadLibraryA while resolving the payload’s imports), and the Thread Creation Callback fires for the worker thread, so the EDR still sees each of these events.
Inline hooks: Functions in ntdll.dll remain hooked by the EDR. When Mimikatz calls NtOpenProcess, it goes through the hook. Bypassing this requires techniques such as direct syscalls or loading a fresh copy of ntdll.dll.
ETW: ETW providers continue to log events. The Microsoft-Windows-Threat-Intelligence (ETWTi) provider, protected at the kernel level, monitors suspicious memory operations.
Periodic memory scan: Defender can scan process memory and find Mimikatz signatures in plaintext within the RWX region.
Behavioral detection: Allocating a large RWX region followed by massive dynamic function resolution is a pattern that EDR can correlate.

Local Hollowing is a building block in the evasion chain, not a complete solution. Each remaining defense layer requires its own bypass technique.