10/30/2006 06:20:00 PM
Posted by: Dave MacLachlan, Member of Technical Staff, Mac Team

(Editor's note: today's post is a bit different from our usual fare -- it's aimed at the Mac programmers out there. And if you're not a programmer, you might want to find one to guide you through this peek behind the scenes of how we make our applications fast and reliable.)

At Google, software performance is extremely important. Every millisecond counts, which is why we spend a lot of time using performance tools and other techniques to help make our software faster. I was recently Sharking a piece of multithreaded code and realized we were getting bitten by the use of an @synchronized block around a shared resource we were using:
+(id)fooFerBar:(id)bar {
  static NSDictionary *foo = nil;
  @synchronized(self) {
    if (!foo) foo = [NSDictionary dictionaryWithObjects:...];
  }
  return [foo objectForKey:bar];
}
Shark told us without a doubt that we were paying heavily for the @synchronized block each of the millions of times we were calling fooFerBar. We couldn't create the resource in +initialize, because fooFerBar was part of a category, and overriding +initialize in a category is a bad thing: the category's version replaces the class's own +initialize. We also couldn't use +load, because other classes could easily have called fooFerBar in their own +load, and there's no guarantee on loading order. So our only choice was to minimize the impact of that @synchronized block, and we didn't want to run into the infamous and dreaded double-checked locking anti-pattern.
So, I wondered, how exactly does @synchronized work? And is there a cheaper way of getting the same thread-safe result? I disassembled the code to find out what @synchronized does, and I saw something like this:
...
objc_sync_enter
objc_exception_try_enter
setjmp
objc_exception_extract
my actual code
objc_exception_try_exit
objc_sync_exit
...
objc_exception_throw
...
That's a lot of setup and tear-down for a simple lock around a shared resource. The try/setjmp machinery is there so that the lock gets released if the block throws an exception; in this case, we don't need to be exception safe. By reading the Objective-C documentation on exception handling and thread synchronization, we learn that not only does @synchronized give us a lock, but it's a recursive lock, which is overkill for this particular usage.
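A recursive lock is one that the thread already holding it may acquire again, as long as every acquisition is matched by a release. @synchronized has to allow this because the same object can be synchronized on in nested calls, but a one-time initialization like ours never re-enters. The C++ sketch below illustrates the property using std::recursive_mutex as a stand-in; g_rmutex and nested_acquire are hypothetical names, and this is not how the Objective-C runtime implements its locks.

```cpp
#include <mutex>

// Hypothetical shared lock standing in for the one behind @synchronized.
static std::recursive_mutex g_rmutex;

// Acquires g_rmutex `depth` times on the calling thread (one level per
// recursive call) and returns the number of nested acquisitions.
// With a plain std::mutex, the second acquisition would be undefined
// behavior (typically a self-deadlock); a recursive mutex just bumps
// its internal count and releases fully once every guard is gone.
int nested_acquire(int depth) {
  if (depth <= 0) return 0;
  std::lock_guard<std::recursive_mutex> guard(g_rmutex);
  return 1 + nested_acquire(depth - 1);
}
```

Calling nested_acquire(3) returns 3. Keeping track of an owning thread and a recursion count is bookkeeping that a plain, non-recursive mutex can typically skip.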
By examining the code (ADC registration required) that implements objc_sync_enter and objc_sync_exit, we can see that on every @synchronized(foo) block, we are actually paying for 3 lock/unlock sequences. objc_sync_enter calls id2data, which is responsible for getting the lock associated with foo, and then locks it. objc_sync_exit also calls id2data to get the lock associated with foo, and then unlocks it. And id2data must lock/unlock its own internal data structures so that it can safely get the lock associated with foo, so we pay for that on each call as well.
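To make that count concrete, here is a deliberately simplified C++ model of the bookkeeping just described. It is not the runtime's actual code: model_id2data, model_sync_enter, model_sync_exit and g_lock_sequences are hypothetical names, and a single table lock stands in for whatever locking id2data really does internally. One enter/exit pair costs two table lookups plus one lock/unlock of the object's own lock.

```cpp
#include <atomic>
#include <mutex>
#include <unordered_map>

// Table mapping each synchronized object to its own recursive lock,
// protected by one table lock (the role of id2data's internal locking).
static std::mutex g_table_lock;
static std::unordered_map<const void*, std::recursive_mutex> g_object_locks;

// Counts lock/unlock sequences so the cost is visible.
static std::atomic<int> g_lock_sequences{0};

// Stand-in for id2data: find (or create) the lock for obj.
// Every call costs one lock/unlock sequence on the table lock.
static std::recursive_mutex& model_id2data(const void* obj) {
  std::lock_guard<std::mutex> guard(g_table_lock);
  ++g_lock_sequences;
  // operator[] default-constructs a recursive_mutex the first time obj is
  // seen; references to unordered_map elements survive rehashing.
  return g_object_locks[obj];
}

// Stand-in for objc_sync_enter: lookup #1, then take the object's lock.
void model_sync_enter(const void* obj) {
  model_id2data(obj).lock();
}

// Stand-in for objc_sync_exit: lookup #2, then release the object's lock.
// The object's lock, taken in enter and released here, is sequence #3.
void model_sync_exit(const void* obj) {
  model_id2data(obj).unlock();
  ++g_lock_sequences;
}
```

After model_sync_enter(&obj) followed by model_sync_exit(&obj), g_lock_sequences is 3, one for each sequence listed in the paragraph above.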
We need to do better than this. It looks like it's time to go back to basics, throw away the @synchronized call, and wrap our code with some pthread locks instead.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

+(id)fooFerBar:(id)bar {
  static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
  if (pthread_mutex_lock(&mtx) != 0) {
    printf("lock failed sigh...\n");
    exit(-1);
  }
  static NSDictionary *foo = nil;
  if (!foo) foo = [NSDictionary dictionaryWithObjects:...];
  if (pthread_mutex_unlock(&mtx) != 0) {
    printf("unlock failed sigh...\n");
    exit(-1);
  }
  return [foo objectForKey:bar];
}
This is ugly stuff, but it's significantly faster, according to Shark. And fast is what we want. We've avoided setting up an exception stack, two excess locks, and a bunch of miscellaneous support code.
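For experimenting with this pattern outside of Cocoa, here is a rough C++ analogue of the pthread version, assuming a std::map in place of the NSDictionary; foo_fer_bar, build_table and g_build_count are hypothetical names. One plain, non-recursive mutex guards creation, and the lookup itself runs after the unlock because the table is never modified once it has been built.

```cpp
#include <pthread.h>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

static int g_build_count = 0;  // how many times the table was built

// Hypothetical stand-in for [NSDictionary dictionaryWithObjects:...].
// Only ever called while the mutex below is held.
static std::map<std::string, int>* build_table() {
  ++g_build_count;
  return new std::map<std::string, int>{{"one", 1}, {"two", 2}};
}

// Returns the value stored for key, or -1 if there is none.
int foo_fer_bar(const std::string& key) {
  static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
  static std::map<std::string, int>* foo = nullptr;  // never freed, like the static dictionary

  if (pthread_mutex_lock(&mtx) != 0) {
    std::fprintf(stderr, "lock failed\n");
    std::abort();
  }
  if (!foo) foo = build_table();
  std::map<std::string, int>* table = foo;  // read the pointer while locked
  if (pthread_mutex_unlock(&mtx) != 0) {
    std::fprintf(stderr, "unlock failed\n");
    std::abort();
  }

  // Safe without the lock: nobody writes to the table after creation.
  auto it = table->find(key);
  return it == table->end() ? -1 : it->second;
}
```

Every call still pays for one lock/unlock, but that is the only synchronization cost: no exception frame and no per-object lock lookup.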
So we've achieved our goal of faster code that will work fine, but are there other, cleaner options? After all, if the code is cleaner, there are fewer places for bugs to hide. Tune in for our next post, wherein we'll explore that question.
11/07/2006 10:52:00 AM
Posted by: Dave MacLachlan, Member of Technical Staff, Mac Team
Previously we addressed the problem of optimizing around a shared resource. We came up with one solution, but it was kind of messy, and we wondered if there might be a better way. And now: the conclusion.

There is at least one more elegant solution, but it is slightly less safe. So far I've assumed we're using Objective-C, but what happens if we use Objective-C++, specifically Objective-C++ with gcc 4? According to the GCC4 porting notes:
GCC 4.0 automatically adds locks around any code that initializes local static variables in C++. If you do not need this protection and want to reduce your code size slightly, you can disable the locking behavior by passing the -fno-threadsafe-statics option to the compiler.
This appears to be backed up by the gcc 4.0 release notes and the C++ ABI. So this implies that if we simply compile this code as Obj-C++ instead of plain Obj-C, we should be able to do the following:
+(id)fooFerBar:(id)bar {
  static NSDictionary *foo = [NSDictionary dictionaryWithObjects:...];
  return [foo objectForKey:bar];
}
which is certainly nice and clean. Let's take a quick look at the disassembly:
__cxa_guard_acquire
my actual code
__cxa_guard_release
__cxa_guard_abort
_Unwind_Resume
...
and by scanning the code for __cxa_* (ADC registration required) we can see that it's doing almost exactly what we want. The only pitfalls here are if we somehow attempt to compile with a gcc version earlier than 4.0 (we can put in guards against this happening) or we use -fno-threadsafe-statics in a threaded environment, in which case we're asking for trouble, and trouble will certainly follow (we won't do that).
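Here is a small C++ experiment (all names are hypothetical) showing the guarded initialization of a function-local static at work: eight threads race to call lookup(), yet the initializer runs exactly once. The block also opens with one possible form of the compiler-version guard mentioned above; the exact check is our assumption, not something taken from the GCC notes.

```cpp
// Refuse to build with a compiler too old to emit thread-safe statics.
// (One possible guard; clang also defines __GNUC__ as 4 or higher.)
#if defined(__GNUC__) && __GNUC__ < 4
#error "thread-safe function-local statics need GCC 4.0 or later"
#endif

#include <atomic>
#include <map>
#include <string>
#include <thread>
#include <vector>

static std::atomic<int> g_builds{0};  // counts initializer executions

static std::map<std::string, int> build_table() {
  ++g_builds;
  return {{"one", 1}, {"two", 2}};
}

int lookup(const std::string& key) {
  // On first use, compiler-generated code calls __cxa_guard_acquire /
  // __cxa_guard_release around this initialization (Itanium C++ ABI),
  // unless the file is built with -fno-threadsafe-statics.
  static const std::map<std::string, int> table = build_table();
  auto it = table.find(key);
  return it == table.end() ? -1 : it->second;
}

// Calls lookup("two") from n threads at once and returns the sum.
int race_lookups(int n) {
  std::vector<std::thread> threads;
  std::atomic<int> sum{0};
  for (int i = 0; i < n; ++i)
    threads.emplace_back([&sum] { sum += lookup("two"); });
  for (auto& t : threads) t.join();
  return sum;
}
```

race_lookups(8) returns 16 while g_builds stays at 1: the losing threads wait inside the guard instead of running build_table() a second time.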
So, we've got a thread-safe shared resource that does what we want with a minimal amount of code. One last tiny issue remains. What happens if we accidentally mix our Objective-C @synchronized with C++ dynamic initialization of local statics?
+(id)fooFerBar:(id)bar {
  @synchronized(self) {
    static NSDictionary *foo = [NSDictionary dictionaryWithObjects:...];
    return [foo objectForKey:bar];
  }
}
and the disassembly shows:
...
objc_sync_enter
objc_exception_try_enter
setjmp
objc_exception_extract
__cxa_guard_acquire
my actual code
__cxa_guard_release
__cxa_guard_abort
objc_exception_try_exit
objc_sync_exit
objc_exception_throw
...
Yes, ladies and gentlemen, you get to pay for 4 lock/unlocks and two exception stacks to protect your wee shared resource, so you may want to watch for this pattern in your performance-sensitive code when porting to Objective-C++.
Feedback from part 1

Thanks to reader Bill Bumgarner, who came up with an interesting solution to my problem that has a distinctive Obj-C feel to it:
static NSDictionary *foo = nil;

+(id)fooFerBar:(id)bar {
  @synchronized(self) {
    if (!foo) foo = [NSDictionary dictionaryWithObjects:...];
    ReplaceMethodImplementationWithSelector([self class], @selector(fooFerBar:), @selector(fooFerBar2:));
  }
  return [foo objectForKey:bar];
}

+(id)fooFerBar2:(id)bar {
  return [foo objectForKey:bar];
}
where ReplaceMethodImplementationWithSelector swizzles fooFerBar with fooFerBar2. So, the first time fooFerBar is called we get our slow case, and any later calls get the fast case, assuming the first thread has completed fooFerBar. This solution provides great performance, and you only have to pay for the @synchronized once. Very nice!
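The same "slow path once, fast path forever" idea can be sketched in C++ without the Objective-C runtime, assuming an atomic function pointer in place of method swizzling. foo_fer_bar_swizzled, slow_lookup, fast_lookup and the globals are hypothetical names, not part of Bill's code or any library. The first caller takes a mutex, builds the table and repoints the function pointer; later callers jump straight to the lock-free lookup.

```cpp
#include <atomic>
#include <map>
#include <mutex>
#include <string>

using Lookup = int (*)(const std::string&);

// Built once, never modified or freed afterward.
static const std::map<std::string, int>* g_table = nullptr;
static std::mutex g_init_lock;
static int g_slow_calls = 0;  // only modified while holding g_init_lock

// Fast path: no locking at all.
static int fast_lookup(const std::string& key) {
  auto it = g_table->find(key);
  return it == g_table->end() ? -1 : it->second;
}

static int slow_lookup(const std::string& key);

// Every caller goes through this pointer; it starts at the slow path,
// playing the role of the method that later gets swizzled.
static std::atomic<Lookup> g_lookup{&slow_lookup};

// Slow path: build the table under a lock, then "swizzle" to the fast path.
static int slow_lookup(const std::string& key) {
  {
    std::lock_guard<std::mutex> guard(g_init_lock);
    ++g_slow_calls;
    if (!g_table)
      g_table = new std::map<std::string, int>{{"one", 1}, {"two", 2}};
  }
  // Release store: a thread that later loads fast_lookup with acquire
  // ordering is guaranteed to see the finished table.
  g_lookup.store(&fast_lookup, std::memory_order_release);
  return fast_lookup(key);
}

int foo_fer_bar_swizzled(const std::string& key) {
  return g_lookup.load(std::memory_order_acquire)(key);
}
```

In a single-threaded run, the first call goes through slow_lookup and every later call goes straight to fast_lookup, so g_slow_calls stays at 1. The release/acquire pair is what keeps the lock-free fast path from becoming another instance of the double-checked locking problem.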