I wanted to explain the issue of idle priority at the time, and why it is a kernel problem, but was tied up. It's a "little" bit late, but here it is. Hope it was worth the wait.
Unix-like systems, including UNIX, BSD, OSX, Linux, and Minix, all have a scheduler in the kernel. The scheduler is responsible for sharing the CPU among threads according to their priorities, which it does by allocating "time slices". On bootup, the kernel checks what speed of CPU it is running on and sets a reasonable minimum time slice. You can see this happen on OSX during bootup, but the text passes by quickly, so get it from the system log file located at:
/VolumeName/private/var/log/system.log
If you have not rebooted in a while, the bootup text will first end up in the rotating system.log.x.gz archives, where x = 0-9. Eventually, even that archive will get overwritten. Anyway, the text you are looking for looks like this (my notes follow the "<--" on each line):
Code:
Jun 10 16:12:45 Minerva syslogd: restart   <-- start logfile daemon
Jun 10 16:12:45 Minerva mach_kernel: standard timeslicing quantum is 10000 us   <-- determine timeslice
Jun 10 16:12:45 Minerva mach_kernel: vm_page_bootstrap: 332389 free pages   <-- initialize VM system
...continued...
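If you'd rather not hunt through the log by eye, a few lines of Python can pull the quantum out. This is only a sketch: the function name is made up for the example, and the pattern assumes the message is worded exactly like the sample line above, which may not hold for other kernel versions.

```python
import re

# Pattern assumes the wording of the sample log line above; other kernel
# versions may phrase the message differently.
QUANTUM_RE = re.compile(r"standard timeslicing quantum is (\d+) us")

def find_quantum_us(log_text):
    """Return the time slice in microseconds, or None if no line matches."""
    match = QUANTUM_RE.search(log_text)
    return int(match.group(1)) if match else None

sample = "Jun 10 16:12:45 Minerva mach_kernel: standard timeslicing quantum is 10000 us"
print(find_quantum_us(sample))  # 10000
```

To run it against the real thing, read the system.log file mentioned above into a string and pass that in instead of the sample.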
The kernel selects a time slice big enough for a thread to do something useful, but not so big as to make other threads seem unresponsive. In the example above from my system, the time slice is 10,000 microseconds / 10 milliseconds / 1% of a second on a 350 MHz CPU, which works out to 3.5 million CPU cycles per slice. I'd have expected the slices to be smaller, but that is what the logfile says.
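The cycles-per-slice figure is just clock rate times slice length. A quick check in Python, using the 350 MHz and 10,000 us numbers from my system (the function name is only for illustration):

```python
def cycles_per_slice(cpu_hz, slice_us):
    """CPU clock cycles that fit into one time slice."""
    return cpu_hz * slice_us // 1_000_000

print(cycles_per_slice(350_000_000, 10_000))  # 3500000
```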
During operation, the scheduler hands out time slices to threads based on their priority. If all threads are the same priority, each thread will get a slice, then the next thread gets a slice, until all threads have been given time and the schedule starts over. ("i" is the idle thread in the kernel)
1:2:3:4:5:6:i:
1:2:3:4:5:6: etc
A higher priority thread will get its slice slotted in more often. For example, if thread #3 is max priority while all others are normal:
3:1:
3:2:
3:4:
3:5:
3:6:i:
3:1:
3:2:
3:4:
3:5:
3:6: etc
If thread #3 were max and everything else was at low priority:
333:1:
333:2:
333:4:
333:5:
333:6:
333:1:
333:2:
333:4:
333:5:
333:6: etc
A low priority thread amongst others may turn up only one cycle out of 10 and looks like this:
1:2:3:4:5:6:i:
1:2:4:5:6:i:
1:2:4:5:6:i:
1:2:4:5:6:
1:2:4:5:6:i:
1:2:4:5:6:i:
1:2:4:5:6:i:
1:2:4:5:6:
1:2:4:5:6:
1:2:4:5:6:i:
1:2:3:4:5:6: etc
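The low-priority pattern can be reproduced with a toy model. This is a sketch of the diagram only, not of how any real kernel picks threads: the function and its parameters are invented for the example, and the idle thread is left out to keep it short.

```python
def low_priority_rounds(threads, low_thread, period, rounds):
    """Yield one string per pass through the schedule table.

    The low-priority thread is included only on every `period`-th pass;
    every other thread gets a slice on every pass.
    """
    for n in range(rounds):
        ids = [t for t in threads if t != low_thread or n % period == 0]
        yield ":".join(str(t) for t in ids) + ":"

for line in low_priority_rounds([1, 2, 3, 4, 5, 6], low_thread=3, period=10, rounds=11):
    print(line)
```

Thread 3 shows up in the first and the eleventh line only, which is the "one cycle out of 10" case.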
In practice, most threads run at normal priority (0) while a few threads run at higher levels (-10) and a few more run at lower priorities (+10, like fix_prebinding) or minimum priority (+19 or +20, depending on the system). These numbers are nice values, so a lower number means a higher priority. It is rare for a thread to run at max priority (-20) because it drags down everything else; this should happen only for critical things that need to execute ASAP, like a security reconfig or a repair operation on the boot disk, or on a dedicated crunch box that is not expected to do anything else.
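The priority numbers here are Unix nice values, and a process can change its own from a script. A minimal sketch, assuming a Unix-like system where Python's os.nice is available; the helper name is made up. Keep in mind that an ordinary user can normally only raise the nice value (lower the priority), not bring it back down.

```python
import os

def lower_own_priority(increment):
    """Raise this process's nice value by `increment`; return (before, after)."""
    before = os.nice(0)          # an increment of 0 just reports the current value
    after = os.nice(increment)   # the kernel clamps the result at its maximum
    return before, after

before, after = lower_own_priority(5)
print(before, after)
```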
When a thread receives CPU, it can run through its full slice or return the CPU to the kernel early. A thread that watches for input will check the input buffer; if nothing is buffered, it returns right away. The scheduler notes the early return and hands out a slice to the next thread early. If all threads in the schedule table exited early, then a slice goes to the idle thread. The scheduler could restart the schedule early, but if there is nothing to do in any thread, that is a waste of energy. Sending to the idle thread causes minimum CPU resources to be used - I'd imagine it's mostly a bunch of no-op instructions or even a CPU sleep instruction. Even so, the idle thread still takes only one time slice, so the schedule will still restart quickly. If the user sets their system to maximum energy use, the idle thread may only make a note that it was called and exit immediately, so the schedule can continue instantly.
If even one thread used its full slice during the schedule, the idle thread will not be called, and the schedule will repeat until all threads exit early. When a DC client is running, the DC thread will almost always run its full slice, so there is no idle thread activity unless the client needs to do some input / output to the disk or internet.
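Here is the idle-thread rule as I understand it, boiled down to a few lines of Python. It is a toy model with made-up names, not kernel code: each thread just declares how many milliseconds of work it wants, and anything below a full slice counts as an early exit.

```python
def run_pass(threads, slice_ms=10):
    """Run one pass through the schedule table.

    `threads` maps a thread id to the milliseconds of work it wants this pass.
    Returns the order in which slices were handed out; "i" (the idle thread)
    is appended only when every thread returned before its slice was up.
    """
    order = []
    all_early = True
    for tid, wanted_ms in threads.items():
        order.append(tid)
        if wanted_ms >= slice_ms:   # thread used its whole slice
            all_early = False
    if all_early:
        order.append("i")
    return order

print(run_pass({1: 1, 2: 0, 3: 2}))    # [1, 2, 3, 'i']
print(run_pass({1: 1, 2: 10, 3: 2}))   # [1, 2, 3]
```

In the second call thread 2 plays the part of the DC client: it eats its whole slice, so the idle thread never gets a turn.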
You can see this in action by running top in Terminal. It will report relative CPU use based on the CPU time each thread used. It will also report something like:
Processes: 60 total, 3 running, 57 sleeping, 1 zombie... 149 threads.
Processes are applications, which may spawn more than one thread - good for multiple CPU systems.
A running process is one where at least one of its threads used its full slice.
A sleeping process means all its threads exited early.
A zombie process is a process that has exited, but whose entry can't be removed from the process table yet: its parent has not collected its exit status, so the system is not quite done with it. A zombie is not given a time slice; it is left in the table, where you can see it, until that cleanup happens. The schedule table is most likely the master process table, which would explain why a dead process still shows up in it.
With just a few threads running and the rest exiting early, all threads get called frequently. My example hands out slices that are 1% of a second long, but if only 3 threads take their full slice and all the others collectively take 1 slice worth of time to exit, then the schedule table fully executes 25 times per second even with the full 149 threads. Even in this case, I would judge that with top updating once per second, the active threads will have exited early most of the time; they only need to have used one slice fully in order to be flagged as active. So the number of passes through the table in this example tops out at a little under 100 per second, and is likely 50-90 unless a thread maxes out the CPU actually doing something.
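The passes-per-second arithmetic is easy to check. A small sketch under the same simplifying assumptions as my example (10 ms slices, all early-exiting threads together costing about one slice); the function name and parameters are only illustrative, and the idle thread's own slice is ignored.

```python
def passes_per_second(slice_ms, full_slices, early_exit_slices=1.0):
    """Passes through the schedule table per second.

    full_slices: number of threads that use their whole slice each pass.
    early_exit_slices: total time, in slices, that all early-exiting threads take.
    """
    pass_ms = slice_ms * (full_slices + early_exit_slices)
    return 1000 / pass_ms

print(passes_per_second(10, full_slices=3))  # 25.0
print(passes_per_second(10, full_slices=0))  # 100.0
```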
Schedulers can get fancier with some threads. A CD burning thread may require a slice 10 times per second, and the scheduler will guarantee that by making sure the burner thread gets called that often. No buffer underruns. The same applies to iTunes to avoid skipping, and to anything involving streaming content, to avoid frame drops.
The problem with this scheduler approach to priority is that there is no true idle priority. Every thread will eventually get some CPU, even if it only gets a slice once every 100 runs through the schedule table. To handle a real idle priority thread, you would need to patch another schedule table in between the main table and the idle thread routine.
The second table would get a slice only if the main table had exited early on all threads. After the 2nd table hands one slice out to an idle-priority thread, it returns to the main table. The next idle-priority thread in the 2nd table will not get a slice until the next time the main schedule exits early on all threads.
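To make the two-table idea concrete, here is a rough Python sketch. Nothing in it comes from a real kernel: the function, the deque of idle-priority threads, and the fallback to the kernel idle thread when the second table is empty are all my own assumptions about how it could be wired up.

```python
from collections import deque

def run_two_table_pass(main_threads, idle_queue, slice_ms=10):
    """One pass through the main table, followed by at most one idle-priority slice.

    main_threads: dict of thread id -> ms of work wanted this pass.
    idle_queue: deque of idle-priority thread ids, rotated round-robin.
    Returns the list of ids that received a slice, in order.
    """
    order = list(main_threads)
    all_early = all(ms < slice_ms for ms in main_threads.values())
    if all_early:
        if idle_queue:
            order.append(idle_queue[0])   # one idle-priority thread gets one slice
            idle_queue.rotate(-1)         # the next one waits for the next quiet pass
        else:
            order.append("i")             # nothing at idle priority: kernel idle thread
    return order

idle = deque(["A", "B"])
print(run_two_table_pass({1: 0, 2: 3}, idle))    # [1, 2, 'A']
print(run_two_table_pass({1: 10, 2: 3}, idle))   # [1, 2]
print(run_two_table_pass({1: 0, 2: 3}, idle))    # [1, 2, 'B']
```

Threads A and B only ever run after a pass in which nobody in the main table used a full slice, which is the whole point of a true idle priority.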
If you wanted to prioritize idle threads, things would get still more involved. As far as I know, none of the Unix-like operating systems provide the 2nd schedule table, so none of them have a true idle priority, just a minimum priority. From things I've heard, it appears that Windoze32 versions do provide a true idle priority, but I do not have an x86 box to check that for sure.
Personally, I would like the added utility of a true idle priority, particularly if it offered prioritization among the idle processes. Perhaps the kernel developers of the various operating systems will provide that someday, but it does mostly apply to situations like ours with DC clients. I've been hard pressed to come up with any other process that would benefit from a true idle. Every other example that comes to mind may execute rarely, but does need to be guaranteed to execute sometime. A disk integrity check that normally finds nothing wrong may be postponed with little concern, but it does need to run eventually. A true idle priority thread may never get called, especially if a DC client were running at a low priority rather than an idle priority.
If anyone can think of other processes that would legitimately run at true idle, please post. The kernel developers are more likely to grant our wish if we can provide *cough* legitimate uses for it. That, or we could compile our own kernel and related utilities for setting the idle priorities.
Ok, that looks like a decent summary. Note that I am not a kernel developer, so it is possible that some of this is muxed up. Kernel experts, feel free to chime in and correct as needed.
ps - Scott, your "explanation in the above post" is condensed by about 10 KB and at least an hour.
