https://blog.codinghorror.com/understanding-user-and-kernel-mode/

JamieF

This article should be marked with a caveat that states that most of what it says applies only to .NET programs running on Windows on x86 hardware.

Exception functionality is not implemented uniformly across all programming languages on all operating systems. “Exceptions imply kernel mode transitions” may be true for C# on .NET on currently shipping Windows, but it’s certainly not true for all C++ implementations, nor for all Java implementations.

As a thought experiment, try making a list of every programming language that you can use for older operating systems which don’t utilize user mode, but simply run everything in the same privilege level as the kernel. Classic Mac OS on the 68000 CPU is a good example. How could exceptions imply a kernel mode transition then? (I’m not certain but I believe MS-DOS and older versions of Palm OS are additional examples of OS’s that lacked a distinction between user mode and supervisor / ring 0 privilege levels.)

There is some controversy as to whether exceptions are a bad idea for code clarity in general. I use them in cases of “this shouldn’t happen unless something is seriously wrong”, but I do not avoid them altogether. To boycott a programming language feature altogether just because you read some blog post saying it was slow is just a case of premature optimization. Prove that it’s slow for your architecture, or that it makes your code harder to maintain; otherwise you’re just a victim of (self-inflicted?) FUD.

Jan '08

gogole

@Jonathan Holland
BTW, where/how did you get the reflection of TryParse()?

Disregard that comment. I figured it out myself (googled “reflector”). That tiny programme is a great tool. I’ll have a lot of fun exploring .NET libraries! Thanks

Jan '08

Bernard

Jamie Flournoy -
“This article should be marked with a caveat that states that most of what it says applies only to .NET programs running on Windows on x86 hardware.”

Hey, the entire blog should be marked with that caveat.

Jan '08

jldugger

Actually, I’ve heard some places used TUX to serve out static advertising landing pages for URLs they were sitting on, and I guess between the falling cost of hardware and the failing interest of kernel devs to touch it, TUX fell out of interest.

But this nonsense about software exception handling needs explanation. Between the intended audience (programmers who somehow don’t understand userspace / kernel space divisions) and the very light detail given, I can only assume for the moment that the statement about implying kernel transitions is wrong.

Jan '08

codinghorror

For the many “exceptions don’t cause a transition to kernel mode” commenters, I refer you to Chris Brumme’s post, already linked by Mike Dimmick (thanks Mike)

51524.aspx

Consider some of the things that happen when you throw an exception:

Grab a stack trace by interpreting metadata emitted by the compiler to guide our stack unwind.
Run through a chain of handlers up the stack, calling each handler twice.
Compensate for mismatches between SEH, C++ and managed exceptions.
Allocate a managed Exception instance and run its constructor. Most likely, this involves looking up resources for the various error messages.
Probably take a trip through the OS kernel. Often take a hardware exception.
Notify any attached debuggers, profilers, vectored exception handlers and other interested parties.
==

Operative words being “probably take a trip through the OS kernel.”

Jan '08

Carra

I’ll remember to stop throwing exceptions from now on!

Jan '08

codinghorror

Exceptions are OK, just don’t throw them in giant loops, control structures, or anywhere that performance is extra-super-critical.

Jan '08

AndyA

“Exceptions imply kernel mode transitions.”

In what sense? Most high level languages handle their exceptions completely in userspace. What does a language exception have to do with the kernel?

Obviously bus errors etc. involve the kernel - but they’re not the same as language exceptions.

Jan '08

JohnS

Outside the kernel caching, I think you’re wrong on the perf impact of HTTP.SYS (and it’s ironic given what your article is about):

“Using HTTP.sys and the new WWW service architecture provides the following benefits:
…
• Requests are processed faster because they are routed directly from the kernel to the appropriate user-mode worker process instead of being routed between two user-mode processes.”

http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/a2a45c42-38bc-464c-a097-d7a202092a54.mspx?mfr=true

Jan '08

Rik_Hemsley

So… you think that exceptions in .NET involve the CPU executing in kernel mode? WTF?

Jan '08

Josue_Gomes

“Exceptions imply kernel mode transitions.”

You’re talking about OS exceptions, right?

Jan '08

gogole

If you can circumvent the use of exceptions, why not go ahead? For example, in C# the TryParse() method is a good exception workaround. I have always had problems with exceptions because they are costly. I would say use them only when there is no other option or circumvention available. And throwing exceptions in a loop should be avoided at all times; is it better to create a custom exception method and call it when needed?
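For readers following along in C++ rather than C#, the same "try" pattern can be sketched with `std::from_chars` (C++17), which reports failure through an error code instead of throwing. The helper names `try_parse_int` and `parse_int_with_exceptions` are hypothetical, made up for this illustration; the second one wraps the throwing `std::stoi` purely for contrast.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical helper in the spirit of C#'s TryParse: failure is reported
// through the return value, so no exception is thrown on bad input.
bool try_parse_int(const std::string& text, int& result) {
    const char* first = text.data();
    const char* last = first + text.size();
    int value = 0;
    auto [ptr, ec] = std::from_chars(first, last, value);
    // Succeed only if there was no error and the whole input was consumed.
    if (ec != std::errc() || ptr != last) {
        return false;
    }
    result = value;
    return true;
}

// Throwing counterpart, for contrast: std::stoi throws
// std::invalid_argument or std::out_of_range on bad input,
// and this wrapper turns that back into a bool.
bool parse_int_with_exceptions(const std::string& text, int& result) {
    try {
        std::size_t used = 0;
        int value = std::stoi(text, &used);
        if (used != text.size()) {
            return false;  // trailing garbage such as "12x"
        }
        result = value;
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

Both functions give the caller the same yes/no answer; the difference is only in what happens internally on the failure path, which is the part worth measuring if it sits inside a hot loop.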

Jan '08

M1147

Obviously, Jeff, you are confused by the identical name.

Hardware/CPU/OS/kernel exceptions are exceptions generated by the hardware and dealt with by the kernel, drivers, etc.

Language exceptions are a different mechanism altogether, that shares the same name, but is normally unrelated and is normally handled completely in user mode by the language VM or run time support library and the compiler. The reason they are slow is basically their asynchronous nature which requires tracking of object lifetimes, and complicated stack unwinding (at least in C++).

An exception (heh) to this is the Structured Exception Handling mechanism invented and used by MS, which maps certain kernel exceptions to software exceptions.

Jan '08

AndyA

Please don’t lets cargo-cult the idea that exceptions are /always/ bad. As with any other technique, if performance is critical you should benchmark.

For language exceptions the performance costs break down into two broad categories:

The cost of setting things up so it’s possible to throw an exception
The cost of actually throwing an exception.

The first cost may be unavoidable - it depends on the language. For example some C++ implementations generate slightly faster function entry code if the compiler knows that it won’t subsequently have to unwind the stack due to an exception. Higher level languages tend to incur the ‘being able to handle an exception’ cost whether you actually throw an exception or not - and in general that cost will be small compared to whatever else the language is doing.

Actually throwing an exception will also take some time - how much depends on the implementation.

Measure then decide. Don’t cargo cult.
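A minimal way to "measure then decide" in C++ might look like the sketch below. It times the same failing operation signalled once by `throw` and once by a return code. All names here (`Dummy`, `fail_by_throw`, `fail_by_code`, `time_throwing`, `time_returning`) are hypothetical, and the numbers it produces depend entirely on compiler, runtime and OS; an optimizer may also simplify the return-code loop heavily, so treat the output as a rough indication only.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

struct Dummy {};  // hypothetical empty exception type

// Failure on every even input, signalled by throwing.
int fail_by_throw(int i) {
    if (i % 2 == 0) throw Dummy{};
    return i;
}

// The same failure condition, signalled by a return code.
bool fail_by_code(int i, int& out) {
    if (i % 2 == 0) return false;
    out = i;
    return true;
}

struct Timing {
    std::int64_t failures;  // how many iterations took the failure path
    double milliseconds;    // wall-clock time for the whole loop
};

// Runs body(i) for each i and counts how often it reports failure.
template <typename Body>
Timing time_loop(int iterations, Body body) {
    const auto start = std::chrono::steady_clock::now();
    std::int64_t failures = 0;
    for (int i = 0; i < iterations; ++i) {
        if (!body(i)) ++failures;
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {failures, elapsed.count()};
}

Timing time_throwing(int iterations) {
    return time_loop(iterations, [](int i) {
        try {
            fail_by_throw(i);
            return true;
        } catch (const Dummy&) {
            return false;
        }
    });
}

Timing time_returning(int iterations) {
    return time_loop(iterations, [](int i) {
        int out = 0;
        return fail_by_code(i, out);
    });
}
```

Printing `time_throwing(n).milliseconds` next to `time_returning(n).milliseconds` for a realistic `n` and a realistic failure rate is usually more informative than any general rule about exceptions being slow.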

Jan '08

XPav

Jeff,

I’m not buying it. A one liner in a blog post that says “Probably take a trip through the OS kernel. Often take a hardware exception.” isn’t enough proof to say why exceptions are so slow, especially when the rest of the things that definitely happen explain most all of the performance issues with exceptions.

Yeah, you shouldn’t use exceptions in performance intensive code, that we know. Perhaps you should trace the entire execution of a few types of exceptions (C#, C++, SEH, hardware) to actually see what happens.

Jan '08

gogole

So… you think that exceptions in .NET involve the CPU executing in kernel mode?

I’m not sure, but it seems reasonable enough to be true; possibly some of the .NET libraries are used in low-level OS operations. Is this true?

Jan '08

SergeW

This is what we extra-crashy-code-writing programmers like to call “progress”.

Haha! And to celebrate progress, I’ll write a couple (more) crashing bugs this afternoon

Jan '08

TraumaPony

I believe he’s talking about Win32 exceptions.

Jan '08

Doug

Jeff, just to be clear: not all language exceptions require user/kernel mode transitions. There are three somewhat confusing ideas of “exceptions” here:

-Exceptions are a language feature in certain programming languages such as C++, C#, Java, and many scripting languages, for transferring control and triggering automatic stack unwinding. There’s no need for the kernel to get involved in a thrown exception caught by a try/catch block in general; the program can just save some information and jump to the handling code. The slowness of throwing and handling exceptions is generally due to implementations that trade off speed of entering try/catch/finally block with the speed of looking for exception handlers and handling an exception.

-Exceptions (more generally, structured exception handling or SEH) are a Windows operating system feature. They allow structures mirroring some languages’ try/catch or try/finally blocks to be used to handle things like page faults or memory access violations, which are raised from kernel mode, and also to handle application-raised conditions if desired. The exception handling blocks can be nested on the stack as deeply as you like. It’s not necessary to implement language exceptions using SEH, but you can. Visual C++ does. Visual C++ and the .NET CLR also translate certain kernel-raised SEH exceptions into language exceptions that can be detected and handled by try/catch blocks. For example, the .NET NullReferenceException is sometimes raised like this. Obviously, kernel-raised exceptions originate from kernel mode and need a mode transition, but user-raised SEH exceptions don’t: see http://www.nynaeve.net/?p=201 for an explanation of how this all works. Other operating systems use different mechanisms to communicate back to user mode; for example, Unix-like systems use “signals” for this purpose.

-Exceptions are the name for certain conditions detected by the processor, such as executing an illegal instruction, dividing by zero, or accessing memory “illegally”. A “machine check exception” is another example of this, where the CPU detects a hardware error. These cause the processor to immediately switch to kernel mode and run a piece of kernel code to handle the situation. In Windows, they may eventually result in an SEH exception being raised to user mode. There’s no idea of nested scopes or anything here; the processor just saves the address where the exception was triggered, and starts running kernel-mode code.

As you can see, the last category is the only one that requires an unavoidable user/kernel mode transition, but it can be translated all the way back into the first category of exception.

Jan '08

MikeD

Actually, .NET exceptions are implemented in terms of SEH exceptions, which means that they do go on a trip through kernel mode. See http://blogs.msdn.com/cbrumme/archive/2003/10/01/51524.aspx for Chris Brumme’s explanation.

And it’s not fair to say that ‘most’ drivers are moving to user mode. Things like fixed disk drivers are still fundamental to system operation and will stay in kernel mode. But USB devices (e.g. Windows Mobile device sync) are starting to move to user mode, and audio devices did in Windows Vista.

Jan '08

DomagojK

Throwing exceptions in application code has nothing to do with processor rings. It’s all the same ring anyway.

First and foremost, throwing exceptions is a software architecture decision. In general, you throw them when something goes wrong, which is a point where you don’t really care if it takes a bit longer to display an error message to the user.

That said, exceptions in .NET are slow as hell, and that might be why .NET programmers tend to use them less than, say, Java programmers. I once saw a speed benchmark comparing throwing exceptions in different languages, and .NET was 5 or 6 times slower than all other languages, including C++ and Java.

Jan '08

f0dder

rim’s example will make your process run with SYSTEM credentials and will thus give you complete control of the system (SYSTEM has more privileges than any Administrator), but it doesn’t mean you’ll run in kernel mode. Wouldn’t even work, as Kernel/User mode are pretty different environments (you can’t do Win32 API calls from kernel mode, for instance).

Jeff: exception handling doesn’t necessarily take a detour through the kernel, it depends on your programming environment as well as stuff that’s project-specific. For C++ code, it typically won’t detour through the kernel, try tracing code in a debugger. Notice that kernel32.dll != kernel-mode code.

And throwing an exception in a programming language should never involve any hardware exceptions…

Jan '08

f0dder

Note: it seems the Visual C++ runtimes do call kernel32.RaiseException which calls ntdll.RtlRaiseException which calls ntdll.ZwRaiseException, which eventually does do a switch to ring0.

Don’t have a copy of GCC installed on windows right now, but I think they’re using a different route without depending on OS support?

Jan '08

Daniel

If anyone is wondering,

View > Show Kernel Times

in the Performance tab does the trick.

Jan '08

Qvasi

What has throwing exceptions got to do with kernel vs user mode? Switching to kernel mode is done by a “trap” call (which is strictly speaking an interrupt that is handled by a kernel process/thread), using the terminology I was taught in “Operating Systems” at school. A runtime exception in a high-level language may be a trap call or an interrupt handled by some error handling code in the OS or interpreter when trying to read/write to protected memory etc. but an “exception” doesn’t need to involve kernel traps (I think)…

Jan '08

Nikos

“User mode is clearly a net public good, but it comes at a cost. Transitioning between User and Kernel mode is expensive. Really expensive.”

Not always!
Tip: Windows GUI methods are executed in kernel mode!

Jan '08

dan28

Most drivers are shunted to the User side of the fence these days,
with the notable exception of video card drivers, which need
bare-knuckle Kernel mode performance. But even that is changing; in
Windows Vista, video drivers are segmented into User and Kernel
sections. Perhaps that’s why gamers complain that Vista performs about
10 percent slower in games.

A good guess might be that the “real” driver is running in kernel mode, while DRM shenanigans tied to the driver are taking place in user mode.

Jan '08

Jaster

“Windows GUI methods are executed in kernel mode!”

Which is why crashing the GUI can crash Windows …

Unix always seems to have had the policy of everything user mode unless absolutely required

Windows seems to have had the policy of … well none at all really user/kernel/whatever if it’s windows itself …?

Jan '08

Nikos

Unix is another world. There are good structures and good communications across APIs Modules.

Windows are full of patches and hacks!

Jan '08

mjmcinto

“This is what we extra-crashy-code-writing programmers like to call “progress””

Hey now, I never write crashy or buggy code…never!

Jan '08

JoshS

Good post, Jeff! Me too love user mode.

Jan '08

Dave

In User mode, the executing code has no ability to directly access
hardware or reference memory. Code running in user mode must delegate
to system APIs to access hardware or memory.

I can’t read memory without making an API call?

Jan '08

Catto

Hey Now Jeff,
I just learned some info about CPU modes (Kernel & User).
Coding Horror Fan,
Catto

Jan '08

KashifS

The problem with your article is that you imply that kernel mode is faster than user space because it has direct-access to hardware. This is wrong.

Sometimes PCI hardware can directly-access a memory region mapped to user process memory. Sometimes a user process can write directly to a memory region that is mapped to a PCI device’s onboard memory (i.e. video memory). In such cases the kernel is bypassed and is only used to setup the mappings or issue commands to start an I/O transfer.

For instance under UNIX, if your user process bypasses the kernel cache to write out a file (called Direct I/O under UNIX/Linux) using the write() system call, the kernel will start an I/O transfer to the filesystem such that the write data will be directly read from your process’s memory. The user space “hit” in performance is the latency of calling a kernel system call which involves a context switch.

Some people get all worked up on context switches because they are heavy, etc. But under Linux, the context switch time is less than ONE microsecond.

BTW, the TUX article you reference is really old. Nowadays no one uses tux because the Linux kernel added sendfile() syscall which takes a file from user-space and directly maps it to a socket resulting in zero-copy sending.

Jan '08

Niyaz_PK

Great info. Thanks…

Jan '08

gnom

nice post

Jan '08

Jazz

Thanks for writing about something that i can understand. Not that i’m complaining, 'cause i’m not that savvy with this stuff, but sometimes you have a habit of losing me somewhere in the middle of your posts. And it sucks big time.

Good post!

Jan '08

Brendan

I never really understood the difference between kernel mode and user mode until reading the opening section of this post. Nice job of exposition, and thanks.

Jan '08

RyanP

I vote plus one for the you-have-the-wrong-kind-of-exceptions crowd. Other than that, good article

Jan '08

rim

In Task Manager, under the Processes tab, in the column labeled User Name, there is usually a user called SYSTEM… I think this user is using Kernel Mode.
So, here is a little trick on how to run as SYSTEM.
Could be useful with some imagination.

Get the time: 20:25

In cmd, type: at 20:26 /interactive taskmgr.exe
and wait; at 20:26 Task Manager will pop up, but it is running in interactive mode.

In the Processes tab, kill/end task explorer; you will then see only your desktop image and Task Manager.
In Task Manager click File > New Task.
Type explorer, press Enter.
Explorer will run, and you will be back to full Windows mode, but logged in as SYSTEM.

I guess this will make your new processes run as SYSTEM (Kernel).
Could be pretty useful in some situations.

Jan '08

mccoyn

I can’t read memory without making an API call?

I believe the CPU/Memory System takes care of this by creating an interrupt to ask the OS where the memory is actually mapped. Some amount of caching is used to make it faster. There is no need to get the OS involved when the read is on the same page as the last one.

Jan '08

JonathanH

gogole:
if you can circumvent the use exceptions why not go ahead : for
example in c# the TryParse () method is a good exception work-around

If I recall correctly, if you reflect TryParse() you get something like this:

    bool TryParse(object obj)
    {
        try
        {
            return bool.Parse((string)obj);
        }
        catch (Exception)
        {
            return false;
        }
    }

Jan '08

Joe_Chung

Jonathan Holland, I just looked at the code for System.Boolean’s TryParse with Reflector. You recall incorrectly.

Jan '08

Fox_Cutter

Another notable case where code went from User to Kernel was with NT 4.0. The GDI code went from being just another process to living in Kernel space. There was a long history behind why it was a process (partly related to how the OS could have different personas) but it was slow.

Jan '08

Mr_Simple

Seems to me like I remember Dave Cutler (architect of VAX VMS) having issues when hired at Microsoft to architect Windows NT.

He didn’t want any drivers to run in Ring 0, but was ultimately overruled - especially for graphics drivers.

Jan '08

jan_g1

Kernel Mode? User Mode? Never heard at all.
How do we choose in which mode our C/C#/Java Code is executed. From what I’ve read I guess that’s not even possible. When would there be a practical use of these modes? Show me a little code
Would someone be so kind and explain a little more?

Jan '08

gogole

@ Joe Chung
Could you post the reflection of TryParse()?

Jan '08

JonathanH

gogole:

Joe is right, my memory was faulty. It happens on occasion heh.

    public static bool TryParse(string value, out bool result)
    {
        result = false;
        if (value != null)
        {
            if ("True".Equals(value, StringComparison.OrdinalIgnoreCase))
            {
                result = true;
                return true;
            }
            if ("False".Equals(value, StringComparison.OrdinalIgnoreCase))
            {
                result = false;
                return true;
            }
            if (m_trimmableChars == null)
            {
                char[] destinationArray = new char[string.WhitespaceChars.Length + 1];
                Array.Copy(string.WhitespaceChars, destinationArray, string.WhitespaceChars.Length);
                destinationArray[destinationArray.Length - 1] = '\0';
                m_trimmableChars = destinationArray;
            }
            value = value.Trim(m_trimmableChars);
            if ("True".Equals(value, StringComparison.OrdinalIgnoreCase))
            {
                result = true;
                return true;
            }
            if ("False".Equals(value, StringComparison.OrdinalIgnoreCase))
            {
                result = false;
                return true;
            }
        }
        return false;
    }

Jan '08

gogole

@Jonathan Holland

Hey, we all get things wrong sometimes. BTW, where/how did you get the reflection of TryParse()? I’m still learning and don’t know much about reflection.

Jan '08

JacquesC

" Only rings 0 (Kernel) and 3 (User) are typically used".

For Windows, yes. This was a design decision during the original NT project. NT was going to run on multiple chips, not just x86. Some of those chips only had two privilege levels, so NT was designed to run with two, instead of four.

A pretty good summary, though you might have expanded on why it’s so “costly” to change modes.

Jan '08

David_E

Once upon a time Windows NT video drivers ran totally in User Mode. If the video system crashed the screen would just go black for a second and then redraw itself.

Microsoft switched them to Kernel Mode in version 4.0 in an attempt to increase performance and beat Novell and OS/2.

Jan '08

SashaG

Hi Jeff,

Some people already pointed out the slight inaccuracies in your post. Just a couple of things to emphasize:

It is fair enough to say that practically all exceptions involve a trip to kernel mode. This is true for “native” Win32 exceptions (SEH), this is true for C++ exceptions (which are just a particular kind of SEH), this is also true for .NET exceptions (which are implemented on top of SEH).
It is NOT fair to say that a user-mode to kernel-mode trip is “really expensive”. It is not free, but there are things significantly worse than just this transition. After all, if it were “really expensive” then by definition all system calls would be “really expensive”, because all system calls involve a user-mode to kernel-mode transition. So while it does make sense to mention that there’s a cost associated with transitioning between protection rings, it’s unfair to say that it’s “really expensive”.
While it is true that WDF offers the ability to run particular kinds of drivers (mostly USB drivers) in user-mode through the use of UMDF, even this architecture involves the use of a kernel-mode reflector device which reflects API calls from user-mode through kernel-mode back to the user-mode driver. The reason for this is that applications still communicate with the device through the usual Win32 system calls (e.g. ReadFile, WriteFile, DeviceIoControl etc.), which go through the I/O manager in kernel mode. Besides, most drivers on a Vista box are STILL kernel-mode drivers, entirely implemented in kernel-mode with no user-mode components whatsoever. So this was rather unfair to say, too.
Note that you slightly contradict yourself by saying that the greatest benefit of user-mode is protecting the system from catastrophic failures, yet mentioning (as a good example) the fact that Microsoft has transitioned parts of IIS to the HTTP.SYS driver. From the aspect of fault isolation, it is NOT a smart thing to do. From the aspect of getting top performance (e.g. performing DMA for cached content instead of going to main memory) it IS a smart thing to do.

I think it would be great if you provided a clarifying post for some of these things, because currently what you wrote is rather confusing for people who aren’t familiar with the material in question.

Sasha

Jan '08

M1148

Sasha: Not everyone is using Microsoft Visual C++ on Windows.
Let’s assume for a minute that regular C++ exceptions in the above platform are implemented as SEH and require kernel mode transitions.

Other implementations of C++ on windows, and on other OSs do NOT require going through kernel mode.
Not to mention other languages. Do Haskell exceptions require going through the kernel? I doubt it. Same for Python. There’s usually no need for that.

However, Jeff really needs to update this post. It’s below his usual standard.

Jan '08

Chris_Nahr

Got the C# version running. Jesus H. Christ. Those are incredible results.

38064 ms kernel, 89326 ms user, 5000000 exceptions

That’s three times slower than even VC++ on WOW!

But wait, maybe .NET itself is just rotten slow? To test that I’ve added another loop with regular “if” branches taken in alternation, and an Exception object created for each branch so we have the same load on the garbage collector. And here’s the result:

15 ms kernel, 140 ms user, 2500000 + 2500000 branches

Branching without exceptions is EIGHT HUNDRED TIMES faster than executing a try-catch block! I tried shuffling the code around to see if the order of JIT compilation or the “warmup” phase made any difference, but they didn’t.

To be honest, I’m not sure where the kernel time is coming from in the second case – I thought the .NET GC was running in user mode. Anyway, I’ve cross-checked both test cases with the regular System.DateTime facility, and the sum of kernel + user mode is correct in each case.

You can download the C# 2.0 program here:
http://www.kynosarges.de/misc/throw_net.cs

I’ve compiled the program with csc /o throw_net.cs, meaning it runs in native 64-bit mode on my Vista 64 system.

And now I’ll have to go and rip out any and all exceptions I used to control program flow in my C# code, thinking they couldn’t be THAT slow.

Jan '08

f0dder

Kernel mode time could be because of memory allocation? Or some serializing instruction? Hard to tell without digging into it with a debugger. I should pick up on dotNET stuff.

Please don’t run amok and think exceptions are superbad and should be avoided just because of this, the trick is to use them for exceptional conditions, and not to abuse them for general control flow manipulation (and obviously not for tight loops ;)).

If exceptions are used as they ought to be, imho the speed hit could be even worse, and it would still be insignificant.

Jan '08

f0dder

Chris Nahr: I tested both doing a “throw 42;” and throwing a “class EmptyException {};” - I also tested catching both with the catch-all ellipsis as well as catching EmptyException specifically. All ways, VS2008 uses kernel32.RaiseException, which always does a user-to-kernel transition. I suppose this might have to do with VC++ supporting both SEH and C++ exceptions, so they probably decided to model C++ exceptions on SEH for a simpler codepath?

There’s no reason that C++ exceptions need to go through kernel transitions generally, though. I’m going to have a look at GCC from MingW, I would think they handle everything on their own with usermode code.

Jan '08

f0dder

Just checked MingW/GCC-3.4.5, which indeed doesn’t end up in user-to-kernel transitions. Throwing 50 million dummy exceptions took roughly 1/4th the time with G++ generated code than with VC++ from VS2008. Please don’t be too quick to infer anything from this though, as we’re talking basically “throw Dummy(); catch(const Dummy);” with no complex stack unwinding etc.

Jan '08

Dave

I can’t read memory without making an API call?

I believe the CPU/Memory System takes care of this by creating an
interrupt to ask the OS where the memory is actually mapped.

That’s got to slow things down.

Jan '08

Chris_Nahr

Thanks for the test report, that’s very interesting. Did you try compiling the VC++ executable in debug vs release mode, just in case this makes a difference?

I’m really surprised by these results. Sure, kernel mode transitions don’t necessarily imply terrible performance losses in real-life scenarios, but this seems like such an unnecessary thing to do.

Why did Microsoft choose this architecture for all their languages? Just because it was the simplest thing to do, or to facilitate interfacing with debuggers, or…?

Jan '08

f0dder

Dave: of course you can read/write memory without API calls or interrupts, you just can’t write to kernel memory from usermode… whereas memory mapped devices would probably be set as kernel.

mccoyn is probably thinking of the CPU feature called paging, included on x86 all the way back on the 80386. It doesn’t involve interrupts though, but does have some caching going on (TLB entries). Modern operating systems use it to provide a per-process view of memory space (CR3 register + paging tables), to translate these virtual addresses to physical addresses, and insulate processes against each other. Complex topic that deserves a lot more text

Jan '08

SashaG

I think the choice is for the sake of having a uniform approach that guarantees a similar way of registering and invoking exception handlers in a way that is relatively independent of the language or architecture in use. It facilitates communicating first-chance and second-chance exceptions to the debugger in a uniform way; communicating failure to find an exception handler to the environment subsystem; invoking the unhandled exception filter; etc.

Jan '08

f0dder

Debug vs. release builds won’t matter, and the various /EH types don’t matter either. I was surprised by this as well, but even if GCC is generally 4x faster at exception handling (and not just for this simple test), imho exception handling speed shouldn’t matter unless you’re abusing exceptions for something they weren’t made for (and considering that VC++ generally optimizes better than GCC, well…)

I threw together a little test package: http://f0dder.reteam.org/misc/throw_speed.zip

Interesting thing is that on 64bit XP, it seems like kernel32.RaiseException doesn’t go user-to-kernel, but is still very slow (slower than 32bit with user-to-kernel it would seem, but I don’t have XP32 on this box to test). Since testing was done with 32bit exe, it shouldn’t be because of extra registers needing to be saved, etc. With 64bit exe, things are bad.

32bit Vista, Turion-64-X2 TL-60 (2.00GHz)
throw_vc: 8174 ms kernel, 7316 ms user, 5000000 exceptions
throw_gcc: 0 ms kernel, 5350 ms user, 5000000 exceptions

64bit XP, AMD64x2 4400+ (2.21GHz)
throw_vc: 0 ms kernel, 7296 ms user, 5000000 exceptions
throw_gcc: 0 ms kernel, 2968 ms user, 5000000 exceptions
throw_x64: 0 ms kernel, 24437 ms user, 5000000 exceptions

Jan '08

f0dder

Just got some more results…

64bit Vista, Intel Core 2 quad Q6600 2.4Ghz:
throw_gcc: 0 ms kernel, 4140 ms user, 5000000 exceptions
throw_vc: 16406 ms kernel, 23640 ms user, 5000000 exceptions
throw_x64: 0 ms kernel, 19421 ms user, 5000000 exceptions

Jan '08

Chris_Nahr

<keanu>Whoa.</keanu>

I can confirm those values on my Vista 64 machine (Intel Core 2 Duo E6600, roughly the same results).

So SEH on 64-bit Vista is over four times slower than gcc’s exception mechanism, and going through the new WOW 32/64-bit translation layer doubles that time.

This is amazing. I’ll have to cook up test cases for C# and Java, though Java probably won’t give me kernel timing.

Jan '08

f0dder

Chris: even if java won’t let you get kernel times, you can at least watch taskmanager or process explorer with “show kernel times” enabled, that’ll give you a good idea even if not accurate timings.

Jan '08

k116

Amd Athlon XP 3000+ (32bit) running Linux kernel 2.6.23:

real 0m34.062s
user 0m33.784s
sys 0m0.202s

Which makes me wonder if I did something seriously wrong.
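The user/sys split that `time` prints above can also be read from inside the process. The sketch below assumes a POSIX system (it uses `getrusage`, so it will not build as-is on Windows, where `GetProcessTimes` plays a similar role). The names `CpuTimes`, `cpu_times_now` and `throw_and_catch` are hypothetical, and this is not the test program anyone in the thread actually ran.

```cpp
#include <cassert>
#include <sys/resource.h>  // getrusage, struct rusage (POSIX)
#include <sys/time.h>      // struct timeval

struct CpuTimes {
    double user_ms;    // CPU time spent in user mode so far
    double kernel_ms;  // CPU time spent in kernel mode so far
};

// Snapshot of this process's accumulated user and kernel CPU time.
CpuTimes cpu_times_now() {
    struct rusage usage {};
    getrusage(RUSAGE_SELF, &usage);
    auto to_ms = [](const timeval& tv) {
        return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
    };
    return {to_ms(usage.ru_utime), to_ms(usage.ru_stime)};
}

struct Dummy {};  // hypothetical empty exception type

// Throws and immediately catches `count` exceptions; returns how many
// were caught so the loop has an observable result.
int throw_and_catch(int count) {
    int caught = 0;
    for (int i = 0; i < count; ++i) {
        try {
            throw Dummy{};
        } catch (const Dummy&) {
            ++caught;
        }
    }
    return caught;
}
```

Taking one snapshot before and one after `throw_and_catch(5000000)` and subtracting gives the same kind of "ms kernel, ms user" figures quoted elsewhere in this thread, at least on systems where `getrusage` is available.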

Jan '08

Chris_Nahr

Operative words being “probably take a trip through the OS kernel.”

Precisely. “Probably” does not mean “always”.

Nowhere in this article does he state that either Windows SEH or the exception models built on top of it (which by the way include both .NET and Visual C++, as he explicitly says) will always go into kernel mode, as you and others were implying.

“Probably” should refer to the cause of the exception, or related processing, when an exception occurs in the context of I/O. I would be extremely surprised if an Int32.Parse exception did in fact transition to kernel mode.

Jan '08

Chris_Nahr

Note: it seems the Visual C++ runtimes do call kernel32.RaiseException which calls ntdll.RtlRaiseException which calls ntdll.ZwRaiseException, which eventually does do a switch to ring0.

Did you trace the code path for all exception handling, or did you just look up the referenced API calls?

The VC++ runtime might well switch to Ring 0 for some exceptions but that doesn’t mean this happens for all exceptions.

Jan '08

Chris_Nahr

PS Conjecture: This code path that calls ZwRaiseException might be executed only when an exception is unhandled by user code, so as to allow the Windows exception handler or an attached debugger to deal with it.

Jan '08

Chris_Nahr

Addendum: I thought that maybe JIT inlining made a difference, so I tried manually inlining the throw statement, and disabling optimization to disable automatic inlining of the “return new Exception()” statement. Once again, no difference in the results.

Sure, I know. I’m no Joel Spolsky, I’ll keep using exceptions for error handling.

Unfortunately, I did abuse .NET exceptions occasionally for regular flow control because I thought they would be fast enough to handle rarely visited branches. I’ll have to go back and check all those cases, and probably rewrite them.

Jan '08

Lordh2

Well I know it’s not my code that’s to blame for these exceptions. It’s those darn end users doing things they shouldn’t do!

Jan '08

f0dder

Chris Nahr: a thing I wouldn’t call misuse, btw, would be breaking out of deep nesting (whether it be just control structures or function calls as well), instead of using goto (which only works for local nesting) or propagating return codes through many layers.

But again, only if the abort is done rarely (or, should I say, “exceptionally” :)).
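
A small sketch of that pattern, in Python for brevity: an exception unwinds several levels of recursion at once instead of passing a "found" flag back up through every call. The names `Abort`, `_walk` and `find_path` are hypothetical, chosen only for this illustration.

```python
class Abort(Exception):
    """Hypothetical exception used purely to unwind many call levels."""

    def __init__(self, path):
        super().__init__(path)
        self.path = path


def _walk(node, target, path):
    # Depth-first walk over nested lists; raise as soon as target is seen.
    if node == target:
        raise Abort(path)            # leaves every enclosing _walk() frame
    if isinstance(node, list):
        for i, child in enumerate(node):
            _walk(child, target, path + (i,))


def find_path(tree, target):
    """Return the index path to target inside nested lists, or None."""
    try:
        _walk(tree, target, ())
    except Abort as hit:
        return hit.path
    return None


if __name__ == "__main__":
    print(find_path([1, [2, [3, 4]], 5], 4))
```

Without the exception, each `_walk` call would have to return a result and each caller would have to check it before continuing its loop.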

Jan '08

f0dder

az: catch(...) catches SEH exceptions in VC because VC C++ exceptions are modeled on top of SEH exceptions (thus the user<->kernel roundtrips in some Windows versions).

GCC code won’t catch SEH exceptions with catch(...), which also means you need SetUnhandledExceptionFilter (or local SEH try/catch blocks, dunno if GCC supports that?) if you want to catch hardware exceptions or other SEH exceptions from the few API calls that can raise them.

Jan '08

RenatoG

The green line is total CPU time; the red line is Kernel time. The gap between the two is User time.

Not exactly.

Total CPU time means wall clock time, which is the total time the program took to run. Kernel time, as you said, is the time that the CPU was running in level 0 (kernel mode), and User time is the time that the process was executing (i.e. using the CPU) in level 3 (user mode).

There can be a huge difference between User time and total time because the process can be waiting for a resource and not using the CPU at all. To clarify, check the example below:

Create a FIFO:
$ mkfifo foo
'cat' the FIFO (i.e. read from it):
$ time cat foo

It’ll block, waiting for data or EOF, but there’s nothing writing to the FIFO so it’ll wait forever, without using any user-mode CPU time.

In another terminal write something to the FIFO:
$ echo "foo" > foo

The first terminal will then write “foo” and exit (echo sends EOF). The time reported will be something like this:

real 0m12.239s
user 0m0.000s
sys 0m0.008s

Which means it spent 0.008 seconds in kernel mode moving the output through the FIFO, 0 seconds in user mode, and the rest of the 12.24 seconds waiting.
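
The same experiment can be sketched in a single Python process. Assumptions to note: it is Unix only (`os.mkfifo`), a thread stands in for the second terminal, `timed_fifo_read` is a made-up name, and `os.times()` is process-wide, so the tiny cost of the writer thread is included in the CPU figures.

```python
import os
import tempfile
import threading
import time


def timed_fifo_read(delay_s=0.5):
    """Block on a FIFO until a writer appears; return (data, real, user, kernel)."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "foo")
    os.mkfifo(path)                    # Unix only

    def writer():
        time.sleep(delay_s)            # the reader sits blocked meanwhile
        with open(path, "w") as f:
            f.write("foo\n")           # closing the FIFO gives the reader EOF

    t = threading.Thread(target=writer)
    wall0, cpu0 = time.perf_counter(), os.times()
    t.start()
    with open(path) as f:              # open() blocks until the writer opens
        data = f.read()
    wall1, cpu1 = time.perf_counter(), os.times()
    t.join()
    os.remove(path)
    os.rmdir(tmpdir)
    return (data, wall1 - wall0,
            cpu1.user - cpu0.user, cpu1.system - cpu0.system)


if __name__ == "__main__":
    data, real, user, kernel = timed_fifo_read(2.0)
    print(f"real {real:.3f}s  user {user:.3f}s  sys {kernel:.3f}s")
```

The wall-clock figure tracks the writer's delay, while the user and kernel figures stay near zero because a blocked process is not scheduled.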

Jan '08

f0dder

Renato: actually Jeff’s statement is correct, since taskmgr and Process Explorer show the current CPU usage/load, NOT Walltime. Same thing if you use the per-process “performance graph” of Process Explorer. If you have an app that’s doing a blocking wait (and has no other active threads), its CPU usage graph will flatline.

When talking about absolute time usage, you’re obviously right that we’re talking Walltime and CPUTime where CPUTime is split between kernel and user. But task managers don’t show absolute usage in their graph overview, they show relative CPU usage since last update-tick.
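
The "relative usage since the last tick" calculation can be sketched as follows, in Python for brevity. This is an illustration of the arithmetic only, not how taskmgr or Process Explorer are actually implemented; `cpu_usage_since` is a hypothetical helper, and it samples the current process rather than the whole machine.

```python
import os
import time


def cpu_usage_since(prev_wall, prev_cpu):
    """Return (kernel_pct, user_pct, wall_now, cpu_now) for the last interval."""
    wall_now, cpu_now = time.perf_counter(), os.times()
    elapsed = wall_now - prev_wall
    if elapsed <= 0:
        return 0.0, 0.0, wall_now, cpu_now
    # CPU time consumed during the interval, as a share of the interval.
    kernel_pct = 100.0 * (cpu_now.system - prev_cpu.system) / elapsed
    user_pct = 100.0 * (cpu_now.user - prev_cpu.user) / elapsed
    return kernel_pct, user_pct, wall_now, cpu_now


if __name__ == "__main__":
    wall, cpu = time.perf_counter(), os.times()
    for _ in range(3):
        time.sleep(1.0)                       # one update tick, spent blocked
        kernel, user, wall, cpu = cpu_usage_since(wall, cpu)
        print(f"kernel {kernel:5.1f}%  user {user:5.1f}%")
```

A process that only sleeps between ticks reports close to 0% on both lines, which is the flatline described above.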

Jan '08

az11

great informative comments.
one interesting detail which wasn’t mentioned so far is that, at least in VC generated code, catch(...) will truly catch all: both C++ and SEH.

which is, of course, a good enough reason to never ever use catch(...)

Jan '08

Jaster

“rim’s example will make your process run with SYSTEM credentials and will thus give you complete control of the system (SYSTEM has more privileges than any Administrator), but it doesn’t mean you’ll run in kernel mode”

Not only is it not kernel mode (which, as he said, would be no use to you anyway), it is also not a true admin either. It does have some elevated privileges (and full control of most files), but it is limited in some subtle ways. The main difference is that it is local only (no privileges on the network) and service only (no interactive logon), which is why you have to trick the system to run cmd.

Jan '08

Sri

Jeff, could you tell me how come there are two displays in your pic of the CPU Usage History? If I open my Task Manager it shows only one graph box for CPU Usage History.

Jan '08

f0dder

Jaster: true, the trick doesn’t grant you any network privs, you only get full (local) privileges - like being able to look inside “System Volume Information”

Sri: because Jeff has a dual-core machine.

Jonathan Wilson: ouch, no SEH support at all, compiler side? You can always resort to SetUnhandledExceptionFilter though, and you could add some SEH support with assembly, but that’d be messy.

Jan '08

f0dder

Vista still uses a two-ring user/supervisor split, and so will operating systems yet to come. Virtualization software like VMware has used the other rings because virtualization is tricky, but with the VMX (Vanderpool/Pacifica) instruction sets, my bet is rings 1 and 2 will be used even less.

Btw, I sent the vcblog team a mail asking about the exception handling stuff, and they said my questions might be answered here (haven’t had time to watch yet, though): http://channel9.msdn.com/Showpost.aspx?postid=343189

Jan '08

gogole

Jeff, I’ve been learning about the NT kernel, which Microsoft uses for almost all its OSes, and found out that the kernel was originally designed for a two-ring system (the kernel ring and the application ring), so the other two rings were practically left idle. I would like to know if this two-ring system is still being used, i.e. with Vista. If so, how would the other two rings, if implemented, improve system performance (thinking of a well structured preemptive kernel)? Please fill me in.

Jan '08

gogole

@f0dder
Thanks for the info, checking out the link.

Jun '08

Ryan

Hello, I was wondering if someone could help point me in the right direction… (if there is one) I need to test how many clock cycles it takes to execute a function. (I am using C# but can use anything else.) … Are any of the below possible…

Disable interrupts on a system. (This does not seem possible with XP/Vista.) (I understand that this is not safe.)
Get the OS to give me a guaranteed full time slice. If the function is small it can be timed without interruption, or if the function were larger then different parts of the function could be run in different time slices and the times added up.
Somehow catch the CPU counter when the thread is preempted. I would then have different lengths that I could add up.

Jun '08

f0dder

Ryan: forget about getting anything really exact when running under traditional operating systems (that includes Linux, too). Forget about turning off interrupts. Forget about any “guarantees”.

Basically, the best you can do is boost your thread priority to your OS equivalent of “realtime” (requires admin/root), give away the remainder of your current timeslice, and then run your code… usually, for routines that take less than a second, you’re going to run the routine N times, and either do an average, or record min/max/avg times… or even record all times, and report the mean time as well.

If you want anything more accurate than that, get a profiler (AMD CodeAnalyst or Intel VTune).
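
A rough sketch of the "run it N times and summarise" part, in Python for brevity. It deliberately leaves out the priority boost, which is OS specific; `time_runs` and its default parameters are made up for this illustration, and the nanosecond readings are only as good as the platform timer behind `time.perf_counter_ns`.

```python
import statistics
import time


def time_runs(fn, runs=1000, warmup=5):
    """Call fn repeatedly and summarise the per-call times in nanoseconds."""
    for _ in range(warmup):            # warm caches and clock speed first
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
    }


if __name__ == "__main__":
    print(time_runs(lambda: sum(range(1000))))
```

The minimum is usually the figure least disturbed by preemption, while a large gap between minimum and maximum hints at how noisy the machine was during the test.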

Oh, and remember to “burn some rubber” before doing the test, so you’re sure the CPU is not in reduced-speed power saving mode.

Jun '08

Ryan

Hi f0dder, thank you for your post. That is what I have come up with so far. I run the code multiple times, then take the most often occurring tick counts (the fastest ones) and average those out. 90% of the time the result is within a few clock cycles (that is all I need). The only drawback is that the code needs to be run several times.
BTW here is what I do in a nutshell…

Call the code 4 or 5 times (to make sure it's in cache).
Loop X times:
    call sleep(0) (to get to the beginning of a timeslice)
    start timer
    call the code
    end timer
Find the most often (and fastest) occurring clocks.

I also set the system to “background processes” so that the OS creates longer timeslices. (helps a little bit)
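
The steps above might look roughly like this in Python (a sketch, not Ryan's actual C# code). Assumptions: `most_common_time` is a hypothetical name, readings are rounded down into `bucket_ns`-wide buckets so that near-identical timings count as the same value, and ties between equally frequent buckets go to the fastest one.

```python
import time
from collections import Counter


def most_common_time(fn, runs=200, warmup=5, bucket_ns=100):
    """Return the most frequently observed per-call time, in nanoseconds."""
    for _ in range(warmup):            # make sure the code is in cache
        fn()
    ticks = []
    for _ in range(runs):
        time.sleep(0)                  # yield, hoping to start a fresh timeslice
        start = time.perf_counter_ns()
        fn()
        elapsed = time.perf_counter_ns() - start
        ticks.append(elapsed // bucket_ns * bucket_ns)
    counts = Counter(ticks)
    top = max(counts.values())
    # Among the most frequent buckets, prefer the fastest.
    return min(t for t, c in counts.items() if c == top)


if __name__ == "__main__":
    print(most_common_time(lambda: sum(range(1000))), "ns")
```

Runs that were preempted land in rare, slow buckets and are ignored by the mode, which is the point of the approach.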

Thanks for the VTune tip… I’ll check it out to see if there is anything that can be used without re-inventing the wheel.

Oct '08

antber

And what about DirectX applications? Could transitions between user mode and kernel mode be the reason for the performance penalties incurred by pipeline state changes? Has anybody run such tests?

Feb '10

RomuloC

“… crashes in user mode are always recoverable.”

Well, that depends on what you mean by recoverable. A crash in a critical process like winlogon.exe or csrss.exe halts the PC, although they’re user mode processes.

Feb '10

Jon

When did your ads move? They don’t show up in my RSS reader (which I like), but are subtly on the left-hand side now.

Feb '10

JonathanW

GCC on Windows does not support any kind of SEH at all, local or otherwise.

Jan '11

Wolferajd

Romulo: When it comes to csrss.exe, if you inspect this “user mode” application, you’ll see that the cdd.dll thread is running in a special mode which does not allow you to view its stack.

Also, it is good to note that csrss.exe is the kernel of Win32 mode, so if it hangs it WILL hang all other applications that are not using native code (and I hope everyone here knows that only kernel-mode programs use native code in Windows).

This leads me to say that both csrss.exe and winlogon.exe are in fact kernel-mode programs, which execute lots of code in kernel mode, but they also have a major user-mode part in the same process.

Besides, csrss.exe “cannot be run in Win32 mode”, while winlogon.exe can.

Besides, a hung winlogon.exe doesn’t cause anything bad (try suspending the process, which I have often done).

Conclusion: winlogon.exe is a hybrid user-mode process (major part in user mode, only some calls to kernel mode). You can even kill it, and smss.exe or csrss.exe (not sure which one) will simply terminate all processes in your session and reconnect you to a new session, destroying the previous one.

csrss.exe is a hybrid kernel-mode process (major part in kernel mode, mainly for the “Canonical display driver” running inside csrss.exe). Not to mention that csrss.exe itself is just a kernel-mode loader for the DLLs that run; the csrss.exe thread itself does not exist in csrss.exe.

The fact that you see these processes outside of the SYSTEM process does not mean they are user-mode.

Of course, I am not sure of everything I have written here, but I am pretty sure that csrss.exe, smss.exe, wininit.exe (Vista), and the (optional) subsystem programs all run in KERNEL-MODE, even though they are not in the SYSTEM(4) process.

Meanwhile csrss.exe and winlogon.exe sit exactly at the “border” between kernel mode and user mode. (Don’t know about services.exe; it starts early, but I think it’s a hybrid too, mostly in user mode though.)

Aug '11

SasidharK

hi,
you said that the transition from user mode to kernel mode is very expensive. My question is whether this time could be greater than the context switch time from one process to another. Please clarify, and thanks in advance.

Dec '19

Guidra

The first figure implies that device drivers run in rings 1 and 2; that is not the case. The x86 architecture defines four rings. Windows uses ring 0 for kernel mode and ring 3 for user mode. The reason Windows uses only two levels is that some architectures, such as ARM and MIPS/Alpha, implemented only two privilege levels. Settling on the lowest common bar allowed for a more efficient and portable architecture, especially as the other x86 ring levels do not provide the same guarantees as the ring 0/ring 3 divide.

Source : “Windows Internals Part 1”
