iOS App Launch and Loading, as Seen from dyld

I. A First Look at App Launch

1. Print order

Look at the code below and try to predict the order in which the statements print.

@interface Person : NSObject

@end

@implementation Person

+ (void)load {
    printf("----------load-----------: %s\n", __func__);
}

@end

__attribute__((constructor)) void cc_func () {
    printf("--------cc_func----------: %s\n", __func__);
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        NSLog(@"Hello, World!");
    }
    return 0;
}

As you probably guessed, the output is:

----------load-----------: +[Person load]
--------cc_func----------: cc_func
Dyld[40374:1115383] Hello, World!

So the order is:

+load --> C++ constructor function --> main()
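The "before main" part of this ordering can be reproduced without Objective-C. The following is a minimal sketch, assuming a GCC- or Clang-compatible compiler (for `__attribute__((constructor))`); the names `launch_log` and `early_hook` are made up for illustration and do not come from the original project.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: a log that is safe to touch before main(), because the
// vector is created on first use instead of at static-initialization time.
std::vector<std::string>& launch_log() {
    static std::vector<std::string> log;
    return log;
}

// Runs together with the image's other initializers, before main() is entered.
__attribute__((constructor)) static void early_hook() {
    launch_log().push_back("constructor");
}
```

By the time main() starts, `launch_log()` already holds one entry, which is the same effect as the `cc_func` line printing ahead of "Hello, World!".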

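2. Before main

Does this surprise you?

Isn't main the entry point? Why isn't main the first thing to run?

In fact, a whole series of steps has to complete before main is called.

[Figure: the launch flow and its phases]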

3. Breakpoint in the load method

Set a breakpoint in the load method and print the call stack.

Output:

(lldb) bt

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 8.1
  * frame #0: 0x0000000100003e60 Dyld`+[Person load](self=Person, _cmd="load") at main.m:17:5
    frame #1: 0x00007fff203ab4d6 libobjc.A.dylib`load_images + 1556
    frame #2: 0x0000000100016527 dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 425
    frame #3: 0x000000010002c794 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 474
    frame #4: 0x000000010002a55f dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
    frame #5: 0x000000010002a600 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
    frame #6: 0x00000001000168b7 dyld`dyld::initializeMainExecutable() + 199
    frame #7: 0x000000010001ceb8 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 8702
    frame #8: 0x0000000100015224 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 450
    frame #9: 0x0000000100015025 dyld`_dyld_start + 37

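4. Launch process (pre-main)

Here is a first overview of what happens before main (the pre-main phase); later sections dig into the details.

5. Launch process (main)

The main function and the stages after it should already be familiar.

II. dyld

1. What is dyld?

dyld (the dynamic link editor) is the dynamic linker. It runs in user space, inside the process being launched, and is part of Darwin, which Apple maintains (the dyld project). It lives at /usr/lib/dyld and is used to load dynamic libraries.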

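2. What dyld does

  • Linking and loading. After an app is compiled and packaged as a Mach-O executable, dyld links it and loads it into memory at launch.
  • Symbol binding. Almost every program on OS X is dynamically linked, so a Mach-O file contains many references to external libraries and symbols. These references have to be filled in with real addresses at launch. dyld does this work, and the process is called symbol binding.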
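To make "filling in references" concrete, here is a toy model of binding. It is only a sketch: `ImportSlot`, `ExportTable`, and `bind_imports` are hypothetical names, and real dyld works on Mach-O binding opcodes or chained fixups rather than on a map of strings.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model only: one slot per external symbol the executable refers to.
struct ImportSlot {
    std::string symbol;       // name of the external symbol
    std::uintptr_t address;   // 0 until the slot has been bound
};

// Toy model only: symbol name -> address exported by the loaded libraries.
using ExportTable = std::map<std::string, std::uintptr_t>;

// Write the exported address into every import slot with a matching name.
// Returns the names that could not be resolved.
std::vector<std::string> bind_imports(std::vector<ImportSlot>& imports,
                                      const ExportTable& exports) {
    std::vector<std::string> missing;
    for (ImportSlot& slot : imports) {
        auto found = exports.find(slot.symbol);
        if (found == exports.end()) {
            missing.push_back(slot.symbol);
        } else {
            slot.address = found->second;
        }
    }
    return missing;
}
```

A slot that stays unresolved corresponds to the "symbol not found" failure that aborts a launch, although in this sketch it is merely reported back to the caller.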

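3. How dyld loads a program

  • How is dyld itself loaded?
  • How is the program initialized?

The earlier backtrace ends in a dyld function named _dyld_start. It turns out to be implemented in assembly, so let's take a look.

Whenever a new process starts, the kernel sets the user-mode entry point to __dyld_start.

[Figure: call sequence diagram]

4. _dyld_start

dyldStartup.s is assembly code; here is a brief look: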

#if __arm64__ && !TARGET_OS_SIMULATOR
	.text
	.align 2
	.globl __dyld_start
__dyld_start:
	mov 	x28, sp
	and     sp, x28, #~15		// force 16-byte alignment of stack
	mov	x0, #0
	mov	x1, #0
	stp	x1, x0, [sp, #-16]!	// make aligned terminating frame
	mov	fp, sp			// set up fp to point to terminating frame
	sub	sp, sp, #16             // make room for local variables
#if __LP64__
	ldr     x0, [x28]               // get app's mh into x0
	ldr     x1, [x28, #8]           // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
	add     x2, x28, #16            // get argv into x2
#else
	ldr     w0, [x28]               // get app's mh into x0
	ldr     w1, [x28, #4]           // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
	add     w2, w28, #8             // get argv into x2
#endif
	adrp	x3,___dso_handle@page
	add 	x3,x3,___dso_handle@pageoff // get dyld's mh in to x4
	mov	x4,sp                   // x5 has &startGlue

	// call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
	bl	__ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
	mov	x16,x0                  // save entry point address in x16
    ...

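As the comment shows, this calls dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue), a function that also appears in the backtrace in the previous section.

5. dyldbootstrap::start

This is the start function in the C++ namespace dyldbootstrap. The code is as follows:

Implementation in dyldInitialization.cpp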

namespace dyldbootstrap {
    ...
//
//  This is code to bootstrap dyld.  This work in normally done for a program by dyld and crt.
//  In dyld we have to do this manually.
//
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
    rebaseDyld(dyldsMachHeader);

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(argc, argv, envp, apple);
#endif

	_subsystem_init(apple);

	// now that we are done bootstrapping dyld, call dyld's main
	uintptr_t appsSlide = appsMachHeader->getSlide();
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
}

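The function ends by calling dyld::_main(), whose first parameter is a macho_header.

If you know the Mach-O format, this will look familiar. dyld exists to load Mach-O files, and this is the first visible hint of that.

What start does

  • Computes the slide from dyldsMachHeader and decides whether dyld itself needs to be rebased (inside rebaseDyld)
  • Runs the mach_init() initialization (inside rebaseDyld)
  • Sets up overflow protection (the stack canary, via __guard_setup)
  • Computes the slide of appsMachHeader and calls dyld::_main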
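The pointer arithmetic that start uses to find envp and apple is easy to miss. The sketch below isolates it, assuming the layout described in the source comments (argv, envp, and apple are consecutive NULL-terminated arrays). `BootPointers` and `locate_envp_and_apple` are illustrative names, not dyld functions.

```cpp
// Illustrative result type: where the two follow-on arrays begin.
struct BootPointers {
    const char** envp;
    const char** apple;
};

// Mirrors the arithmetic in dyldbootstrap::start. Expected layout:
//   argv[0..argc-1], NULL, envp..., NULL, apple..., NULL
BootPointers locate_envp_and_apple(int argc, const char* argv[]) {
    // envp begins just past argv's terminating NULL.
    const char** envp = &argv[argc + 1];

    // apple begins just past envp's terminating NULL.
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }
    ++apple;

    return {envp, apple};
}
```

Nothing is copied here: both results point into the block the kernel prepared, which is why start can run this before any allocator is available.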

Next, let's focus on what dyld::_main does.

6. dyld::_main()

Implementation of dyld::_main

//
// Entry point for dyld.  The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
		int argc, const char* argv[], const char* envp[], const char* apple[], 
		uintptr_t* startGlue)
{
	if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
		launchTraceID = dyld3::kdebug_trace_dyld_duration_start(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, (uint64_t)mainExecutableMH, 0, 0);
	}

	//Check and see if there are any kernel flags
	dyld3::BootArgs::setFlags(hexToUInt64(_simple_getenv(apple, "dyld_flags"), nullptr));

#if __has_feature(ptrauth_calls)
	// Check and see if kernel disabled JOP pointer signing (which lets us load plain arm64 binaries)
	if ( const char* disableStr = _simple_getenv(apple, "ptrauth_disabled") ) {
		if ( strcmp(disableStr, "1") == 0 )
			sKeysDisabled = true;
	}
	else {
		// needed until kernel passes ptrauth_disabled for arm64 main executables
		if ( (mainExecutableMH->cpusubtype == CPU_SUBTYPE_ARM64_V8) || (mainExecutableMH->cpusubtype == CPU_SUBTYPE_ARM64_ALL) )
			sKeysDisabled = true;
	}
#endif

    // Grab the cdHash of the main executable from the environment
	uint8_t mainExecutableCDHashBuffer[20];
	const uint8_t* mainExecutableCDHash = nullptr;
	if ( const char* mainExeCdHashStr = _simple_getenv(apple, "executable_cdhash") ) {
		unsigned bufferLenUsed;
		if ( hexStringToBytes(mainExeCdHashStr, mainExecutableCDHashBuffer, sizeof(mainExecutableCDHashBuffer), bufferLenUsed) )
			mainExecutableCDHash = mainExecutableCDHashBuffer;
	}

	getHostInfo(mainExecutableMH, mainExecutableSlide);

#if !TARGET_OS_SIMULATOR
	// Trace dyld's load
	notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
	// Trace the main executable's load
	notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif

	uintptr_t result = 0;
	sMainExecutableMachHeader = mainExecutableMH;
	sMainExecutableSlide = mainExecutableSlide;

...
	return result;
}

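The code is long. Setting aside the parts that are off the main path, the main flow is:

  1. Configure environment variables
    • Set values from the environment variables and determine the current architecture
  2. Shared cache
    • Check whether the shared cache is enabled and whether it is mapped into the shared region
  3. Initialize the main executable
    • Call instantiateFromLoadedImage to instantiate an ImageLoader object
  4. Insert dynamic libraries
    • Iterate over the DYLD_INSERT_LIBRARIES environment variable and load each entry with loadInsertedDylib
  5. Link the main executable
  6. Link the inserted dynamic libraries
  7. Weak symbol binding
  8. Run the initializers
  9. Find the entry point of the main executable, i.e. the main function

[Figure: flow diagram]

1). dyld environment variables

  • Get the cdHash of the main executable from the environment
  • Get platform and architecture information from the Mach-O header
  • Check and apply environment variables: checkEnvironmentVariables(envp)
  • Set defaults when the DYLD_FALLBACK_* paths are empty: defaultUninitializedFallbackPaths(envp)

Relevant code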

// Line: 6366
// Grab the cdHash of the main executable from the environment
uint8_t mainExecutableCDHashBuffer[20];
const uint8_t* mainExecutableCDHash = nullptr;
if ( const char* mainExeCdHashStr = _simple_getenv(apple, "executable_cdhash") ) {
    unsigned bufferLenUsed;
    if ( hexStringToBytes(mainExeCdHashStr, mainExecutableCDHashBuffer, sizeof(mainExecutableCDHashBuffer), bufferLenUsed) )
        mainExecutableCDHash = mainExecutableCDHashBuffer;
}
// Get architecture information for the current runtime environment from the Mach-O header
getHostInfo(mainExecutableMH, mainExecutableSlide);

// Line: 6453
CRSetCrashLogMessage("dyld: launch started");
// Set up the context from the executable's header, the arguments, and so on
setContext(mainExecutableMH, argc, argv, envp, apple);

// Pickup the pointer to the exec path.
sExecPath = _simple_getenv(apple, "executable_path");

// Line: 6535
{
    checkEnvironmentVariables(envp);          // check and apply environment variables
    defaultUninitializedFallbackPaths(envp);  // set defaults when the DYLD_FALLBACK_* paths are empty
}
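The cdHash step combines two small operations: a "key=value" lookup in the apple array, and hex decoding into a fixed buffer. The sketch below shows both under stated assumptions: `lookup_kv`, `hex_value`, and `hex_to_bytes` are hypothetical stand-ins written for this article, not the real `_simple_getenv` or `hexStringToBytes`, whose exact behavior may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a getenv-style lookup over a NULL-terminated
// array of "key=value" strings. Returns the value, or nullptr if absent.
const char* lookup_kv(const char* const vars[], const char* key) {
    const std::size_t keyLen = std::strlen(key);
    for (const char* const* p = vars; *p != nullptr; ++p) {
        if (std::strncmp(*p, key, keyLen) == 0 && (*p)[keyLen] == '=') {
            return *p + keyLen + 1;
        }
    }
    return nullptr;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical decoder: fails on odd length, a bad digit, or a buffer that
// is too small. On success, `used` holds the number of bytes written.
bool hex_to_bytes(const char* hex, std::uint8_t* out, std::size_t outSize,
                  std::size_t& used) {
    used = 0;
    const std::size_t len = std::strlen(hex);
    if (len % 2 != 0 || len / 2 > outSize) return false;
    for (std::size_t i = 0; i < len; i += 2) {
        const int hi = hex_value(hex[i]);
        const int lo = hex_value(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        out[used++] = static_cast<std::uint8_t>(hi * 16 + lo);
    }
    return true;
}
```

The check on `(*p)[keyLen] == '='` matters: without it, a lookup for "executable" would wrongly match "executable_path=...".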

These can be configured by setting environment variables in the Xcode scheme; see dyld2.cpp for details.

dyld environment variables

struct EnvironmentVariables {
	const char* const *			DYLD_FRAMEWORK_PATH;
	const char* const *			DYLD_FALLBACK_FRAMEWORK_PATH;
	const char* const *			DYLD_LIBRARY_PATH;
	const char* const *			DYLD_FALLBACK_LIBRARY_PATH;
	const char* const *			DYLD_INSERT_LIBRARIES;
	const char* const *			LD_LIBRARY_PATH;			// for unix conformance
	const char* const *			DYLD_VERSIONED_LIBRARY_PATH;
	const char* const *			DYLD_VERSIONED_FRAMEWORK_PATH;
	bool						DYLD_PRINT_LIBRARIES_POST_LAUNCH;
	bool						DYLD_BIND_AT_LAUNCH;
	bool						DYLD_PRINT_STATISTICS;
	bool						DYLD_PRINT_STATISTICS_DETAILS;
	bool						DYLD_PRINT_OPTS;
	bool						DYLD_PRINT_ENV;
	bool						DYLD_DISABLE_DOFS;
	bool						hasOverride;
    ...
};

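Examples:

  • DYLD_PRINT_OPTS = YES: prints the command-line options
  • DYLD_PRINT_ENV = YES: prints all environment variables
  • OBJC_PRINT_LOAD_METHODS: prints calls to the + (void)load methods of classes and categories
  • OBJC_PRINT_INITIALIZE_METHODS: prints calls to the + (void)initialize methods of classes

2). Shared cache (SharedCache)

An app typically uses many system dynamic libraries, such as UIKit and Foundation. Loading each of them only when its functionality is first needed after launch would be slow, so the system places the dynamic libraries that iOS uses into one large cache file ahead of time and stores it in a system directory (/System/Library/Caches/com.apple.dyld/). This speeds up app launch, and it is the purpose of the dyld shared cache.

Extracting dynamic libraries from the shared cache
Libraries can be extracted from the shared cache with launch-cache/dsc_extractor.cpp from the dyld source: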

  • Delete the #if 0 and #endif lines
  • Compile dsc_extractor.cpp
clang++ -o dsc_extractor dsc_extractor.cpp
  • Run dsc_extractor
./dsc_extractor <path to the shared cache file> <output directory>

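The code that deals with the shared cache:

  • checkSharedRegionDisable: checks whether the shared cache is enabled (on iOS it must be)
  • mapSharedCache: maps the shared cache
    • Map it into the current process only: mapCachePrivate (the simulator supports only this mode)
    • If this is the first time the shared cache is loaded, map it: mapCacheSystemWide
    • If the shared cache has already been loaded, do nothing further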
mapSharedCache --> loadDyldCache --> mapCachePrivate
                                 └-> mapCacheSystemWide
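The three-way choice above can be written as one small decision function. This is a simplified sketch of the branching only: `CacheMapping`, `CacheOptions`, and `choose_mapping` are hypothetical names, and the `alreadyMapped` flag stands in for whatever the real reuse check finds.

```cpp
// Hypothetical labels for the three outcomes described above.
enum class CacheMapping { Private, ReusedExisting, SystemWide };

// Hypothetical inputs: whether we are in the simulator, and whether a
// private mapping was requested.
struct CacheOptions {
    bool simulator;
    bool forcePrivate;
};

CacheMapping choose_mapping(const CacheOptions& options, bool alreadyMapped) {
    // The simulator, or an explicit request, maps the cache into this process only.
    if (options.simulator || options.forcePrivate) {
        return CacheMapping::Private;
    }
    // Fast path: some earlier process already mapped the cache system-wide.
    if (alreadyMapped) {
        return CacheMapping::ReusedExisting;
    }
    // Slow path: this is the first process to load the cache.
    return CacheMapping::SystemWide;
}
```

Only the first process after boot takes the slow path, so in this model the shared cache is close to free for every later launch.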

Relevant code

    // Line: 6584
	// load shared cache
    // check whether the shared cache is enabled (required on iOS)
	checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
	if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
		if ( sSharedCacheOverrideDir)
			mapSharedCache(mainExecutableSlide);
#else
        // check whether the shared cache is mapped into the shared region
		mapSharedCache(mainExecutableSlide);
#endif
	}

// Line: 4078
static void mapSharedCache(uintptr_t mainExecutableSlide)
{
	dyld3::SharedCacheOptions opts;
	opts.cacheDirOverride	= sSharedCacheOverrideDir;
	opts.forcePrivate		= (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion);
#if __x86_64__ && !TARGET_OS_SIMULATOR
	opts.useHaswell			= sHaswell;
#else
	opts.useHaswell			= false;
#endif
	opts.verbose			= gLinkContext.verboseMapping;
    // <rdar://problem/32031197> respect -disable_aslr boot-arg
    // <rdar://problem/56299169> kern.bootargs is now blocked
	opts.disableASLR		= (mainExecutableSlide == 0) && dyld3::internalInstall(); // infer ASLR is off if main executable is not slid
	loadDyldCache(opts, &sSharedCacheLoadInfo);

	// update global state
	if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
		gLinkContext.dyldCache 								= sSharedCacheLoadInfo.loadAddress;
		dyld::gProcessInfo->processDetachedFromSharedRegion = opts.forcePrivate;
		dyld::gProcessInfo->sharedCacheSlide                = sSharedCacheLoadInfo.slide;
		dyld::gProcessInfo->sharedCacheBaseAddress          = (unsigned long)sSharedCacheLoadInfo.loadAddress;
		sSharedCacheLoadInfo.loadAddress->getUUID(dyld::gProcessInfo->sharedCacheUUID);
		dyld3::kdebug_trace_dyld_image(DBG_DYLD_UUID_SHARED_CACHE_A, sSharedCacheLoadInfo.path, (const uuid_t *)&dyld::gProcessInfo->sharedCacheUUID[0], {0,0}, {{ 0, 0 }}, (const mach_header *)sSharedCacheLoadInfo.loadAddress);
	}
}

// Line: 858
bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results)
{
    results->loadAddress        = 0;
    results->slide              = 0;
    results->errorMessage       = nullptr;

#if TARGET_OS_SIMULATOR
    // simulator only supports mmap()ing cache privately into process
    return mapCachePrivate(options, results);
#else
    if ( options.forcePrivate ) {
        // mmap cache into this process only
        return mapCachePrivate(options, results);
    }
    else {
        // fast path: when cache is already mapped into shared region
        bool hasError = false;
        if ( reuseExistingCache(options, results) ) {
            hasError = (results->errorMessage != nullptr);      // cache was already mapped
        } else {
            // slow path: this is first process to load cache
            hasError = mapCacheSystemWide(options, results);    // first load
        }
        return hasError;
    }
#endif
}

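3). Main executable initialization

  • instantiateFromLoadedImage returns the ImageLoader
  • ImageLoaderMachO::instantiateMainExecutable creates the ImageLoader for the main executable
  • sniffLoadCommands reads the load commands of the Mach-O file and runs various checks on them

Relevant code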

        // Line: 6860
		CRSetCrashLogMessage(sLoadingCrashMessage);
		// instantiate ImageLoader for main executable
		sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
		gLinkContext.mainExecutable = sMainExecutable;
		gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);

// Line: 3092
// The kernel maps in main executable before dyld gets control.  We need to 
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
//	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
//	}
	
//	throw "main executable not a known format";
}

// ImageLoaderMachO.cpp Line: 566
// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
	//dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
	//	sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
	bool compressed;
	unsigned int segCount;
	unsigned int libCount;
	const linkedit_data_command* codeSigCmd;
	const encryption_info_command* encryptCmd;
	sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
	// instantiate concrete class based on content of load commands
	if ( compressed ) 
		return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
	else
#if SUPPORT_CLASSIC_MACHO
		return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
		throw "missing LC_DYLD_INFO load command";
#endif
}

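4). Inserting dynamic libraries

This step calls loadInsertedDylib on each library it finds. Because it lets extra code be injected into a process, it matters for both attack and defense. Internally, the dylib is looked up through paths such as DYLD_ROOT_PATH, LD_LIBRARY_PATH, and DYLD_FRAMEWORK_PATH, and its code signature is checked; if the signature is invalid, an exception is thrown immediately.

Relevant code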

        // Line: 6974
		// load any inserted libraries
        // load every library listed in DYLD_INSERT_LIBRARIES
		if	( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
			for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib) 
				loadInsertedDylib(*lib);
		}
		// record count of inserted libraries so that a flat search will look at 
		// inserted libraries, then main, then others.
		sInsertedDylibCount = sAllImages.size()-1;

// Line: 5176
static void loadInsertedDylib(const char* path)
{
	unsigned cacheIndex;
	try {
		LoadContext context;
		context.useSearchPaths		= false;
		context.useFallbackPaths	= false;
		context.useLdLibraryPath	= false;
		context.implicitRPath		= false;
		context.matchByInstallName	= false;
		context.dontLoad			= false;
		context.mustBeBundle		= false;
		context.mustBeDylib			= true;
		context.canBePIE			= false;
		context.origin				= NULL;	// can't use @loader_path with DYLD_INSERT_LIBRARIES
		context.rpath				= NULL;
		load(path, context, cacheIndex);
	}
	catch (const char* msg) {
		if ( gLinkContext.allowInsertFailures )
			dyld::log("dyld: warning: could not load inserted library '%s' into hardened process because %s\n", path, msg);
		else
			halt(dyld::mkstringf("could not load inserted library '%s' because %s\n", path, msg));
	}
	catch (...) {
		halt(dyld::mkstringf("could not load inserted library '%s'\n", path));
	}
}

5). Linking the main executable

Relevant code

        // Line: 6982
		// link main executable
		gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
		if ( mainExcutableAlreadyRebased ) {
			// previous link() on main executable has already adjusted its internal pointers for ASLR
			// work around that by rebasing by inverse amount
			sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
		}
#endif
		link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
		sMainExecutable->setNeverUnloadRecursive();
		if ( sMainExecutable->forceFlat() ) {
			gLinkContext.bindFlat = true;
			gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
		}

6). Linking the inserted dynamic libraries

Relevant code

        // Line: 6999
		// link any inserted libraries
		// do this after linking main executable so that any dylibs pulled in by inserted 
		// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
		if ( sInsertedDylibCount > 0 ) {
			for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
				ImageLoader* image = sAllImages[i+1];
				link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
				image->setNeverUnloadRecursive();
			}
			if ( gLinkContext.allowInterposing ) {
				// only INSERTED libraries can interpose
				// register interposing info after all inserted libraries are bound so chaining works
				for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
					ImageLoader* image = sAllImages[i+1];
					image->registerInterposing(gLinkContext); // register symbol interposing
				}
			}
		}

7). Weak symbol binding

Relevant code

        // Line: 7060
		// apply interposing to initial set of images
		for(int i=0; i < sImageRoots.size(); ++i) {
			sImageRoots[i]->applyInterposing(gLinkContext);
		}
		ImageLoader::applyInterposingToDyldCache(gLinkContext);

		// Bind and notify for the main executable now that interposing has been registered
		uint64_t bindMainExecutableStartTime = mach_absolute_time();
        // note: the main executable is bound here
		sMainExecutable->recursiveBindWithAccounting(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
		uint64_t bindMainExecutableEndTime = mach_absolute_time();
		ImageLoaderMachO::fgTotalBindTime += bindMainExecutableEndTime - bindMainExecutableStartTime;
		gLinkContext.notifyBatch(dyld_image_state_bound, false);

		// Bind and notify for the inserted images now interposing has been registered
		if ( sInsertedDylibCount > 0 ) {
			for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
				ImageLoader* image = sAllImages[i+1];
				image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true, nullptr);
			}
		}
		
		// <rdar://problem/12186933> do weak binding only after all inserted images linked
		// weak symbol binding
		sMainExecutable->weakBind(gLinkContext);
		gLinkContext.linkingMainExecutable = false;

		sMainExecutable->recursiveMakeDataReadOnly(gLinkContext);

8). Running the initializers

Relevant code

    // Line: 7087
		CRSetCrashLogMessage("dyld: launch, running initializers");
	#if SUPPORT_OLD_CRT_INITIALIZATION
		// Old way is to run initializers via a callback from crt1.o
		if ( ! gRunInitializersOldWay ) 
			initializeMainExecutable(); 
	#else
		// run all initializers
		initializeMainExecutable(); 
	#endif

// Line: 1636
void initializeMainExecutable()
{
	// record that we've reached this step
	gLinkContext.startedInitializingMainExecutable = true;

	// run initialzers for any inserted dylibs
	ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
	initializerTimes[0].count = 0;
	const size_t rootCount = sImageRoots.size();
	if ( rootCount > 1 ) {
		for(size_t i=1; i < rootCount; ++i) {
			sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
		}
	}
	
	// run initializers for main executable and everything it brings up 
	sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
	
	// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
	if ( gLibSystemHelpers != NULL ) 
		(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

	// dump info if requested
	if ( sEnv.DYLD_PRINT_STATISTICS )
		ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
	if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
		ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

// ImageLoader.cpp Line: 609
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.imagesAndPaths[0] = { this, this->getPath() };
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}

// ImageLoader.cpp Line: 587
// <rdar://problem/14412057> upward dylib initializers can be run too soon
// To handle dangling dylibs which are upward linked but not downward, all upward linked dylibs
// have their initialization postponed until after the recursion through downward dylibs
// has completed.
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount()+2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// Calling recursive init on all images in images list, building a new list of
	// uninitialized upward dependencies.
	for (uintptr_t i=0; i < images.count; ++i) {
		images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}

// ImageLoader.cpp Line: 1595
// Recursively initialize an image and the images it depends on
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
										  InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
	recursive_lock lock_info(this_thread);
	recursiveSpinLock(lock_info);

	if ( fState < dyld_image_state_dependents_initialized-1 ) {
		uint8_t oldState = fState;
		// break cycles
		fState = dyld_image_state_dependents_initialized-1;
		try {
			// initialize lower level libraries first
			for(unsigned int i=0; i < libraryCount(); ++i) {
				ImageLoader* dependentImage = libImage(i);
				if ( dependentImage != NULL ) {
					// don't try to initialize stuff "above" me yet
					if ( libIsUpward(i) ) {
						uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
						uninitUps.count++;
					}
					else if ( dependentImage->fDepth >= fDepth ) {
						dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
					}
                }
			}
			
			// record termination order
			if ( this->needsTermination() )
				context.terminationRecorder(this);

			// let objc know we are about to initialize this image
			uint64_t t1 = mach_absolute_time();
			fState = dyld_image_state_dependents_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
			
			// initialize this image
			bool hasInitializers = this->doInitialization(context);

			// let anyone know we finished initializing this image
			fState = dyld_image_state_initialized;
			oldState = fState;
			context.notifySingle(dyld_image_state_initialized, this, NULL);
			
			if ( hasInitializers ) {
				uint64_t t2 = mach_absolute_time();
				timingInfo.addTime(this->getShortName(), t2-t1);
			}
		}
		catch (const char* msg) {
			// this image is not initialized
			fState = oldState;
			recursiveSpinUnLock();
			throw;
		}
	}
	
	recursiveSpinUnLock();
}
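The core of recursiveInitialization is a depth-first walk: dependencies first, then the objc notification for the image, then the image's own initializers. The toy model below keeps only that ordering. `Image` and `initialize` are hypothetical names, and the sketch leaves out locking, upward dependencies, the depth check, and the fact that the objc callback does not exist until libSystem's initializer has registered it.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an ImageLoader: a name plus the images it links against.
struct Image {
    std::string name;
    std::vector<Image*> deps;
    bool initialized = false;
};

// Depth-first initialization that records each step in `order`.
void initialize(Image& image, std::vector<std::string>& order) {
    if (image.initialized) return;
    image.initialized = true;   // mark first, so dependency cycles terminate

    // Lower-level libraries are initialized before the image that uses them.
    for (Image* dep : image.deps) {
        initialize(*dep, order);
    }

    // Corresponds to notifySingle(dyld_image_state_dependents_initialized):
    // this is the point where objc gets to run +load for the image.
    order.push_back("notify:" + image.name);

    // Corresponds to doInitialization(): constructor functions run here.
    order.push_back("init:" + image.name);
}
```

Within one image, the notify step always precedes the init step, which is why `+[Person load]` printed before `cc_func` in the first example.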

The notifySingle function

Relevant code

// dyld2.cpp Line: 985
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
	...
	if ( state == dyld_image_state_mapped ) {
		// <rdar://problem/7008875> Save load addr + UUID for images from outside the shared cache
		// <rdar://problem/50432671> Include UUIDs for shared cache dylibs in all image info when using private mapped shared caches
		if (!image->inSharedCache()
			|| (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion)) {
			dyld_uuid_info info;
			if ( image->getUUID(info.imageUUID) ) {
				info.imageLoadAddress = image->machHeader();
				addNonSharedCacheImageUUID(info);
			}
		}
	}
	if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
		uint64_t t0 = mach_absolute_time();
		dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        // note this call
		(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		uint64_t t1 = mach_absolute_time();
		uint64_t t2 = mach_absolute_time();
		uint64_t timeInObjC = t1-t0;
		uint64_t emptyTime = (t2-t1)*100;
		if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
			timingInfo->addTime(image->getShortName(), timeInObjC);
		}
	}
       ...
}

// Line: 4643
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
	// record functions to call
	sNotifyObjCMapped	= mapped;
	sNotifyObjCInit		= init;     // the callback is stored here
	sNotifyObjCUnmapped = unmapped;

	// call 'mapped' function with all images mapped so far
	try {
		notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
	}
	catch (const char* msg) {
		// ignore request to abort during registration
	}

	// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
	for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
		ImageLoader* image = *it;
		if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
			dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
			(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
		}
	}
}

// dyldAPIs.cpp line: 2188
// This function is provided for use by the objc runtime only
// To find who calls _dyld_objc_notify_register, search the libobjc source
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
	dyld::registerObjCNotifiers(mapped, init, unmapped);   // called here
}

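Searching the objc4 source for _dyld_objc_notify_register shows that _objc_init calls it and passes in its callbacks.

So sNotifyObjCInit is assigned objc's load_images, and load_images calls all the +load methods. In other words, notifySingle invokes a callback that objc registered.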
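The registration pattern is worth seeing in isolation: a callback is stored, replayed for images that were initialized before it existed, and then invoked for each later image. The class below is a sketch of that pattern only; `Notifier` and its member names are invented for illustration and do not match dyld's actual functions or state.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the "register, replay, then notify" pattern.
class Notifier {
public:
    using InitCallback = std::function<void(const std::string&)>;

    // Record an image that finished initializing before any callback existed.
    void mark_initialized(const std::string& image) {
        alreadyInitialized_.push_back(image);
    }

    // Store the callback and replay it for the images recorded so far.
    void register_init(InitCallback callback) {
        init_ = std::move(callback);
        for (const std::string& image : alreadyInitialized_) {
            init_(image);
        }
    }

    // Called when an image's dependents are initialized; does nothing
    // until a callback has been registered.
    void notify_dependents_initialized(const std::string& image) {
        if (init_) {
            init_(image);
        }
    }

private:
    InitCallback init_;
    std::vector<std::string> alreadyInitialized_;
};
```

The empty-callback check plays the role of the `sNotifyObjCInit != NULL` test in notifySingle: notifications that arrive before objc registers are simply skipped.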

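Note
The initialization call chain is fairly long, so it is not covered in full here; the next section looks at it in detail.

9). Main executable entry point

The code in dyld2.cpp that deals with the program entry point: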

// Line: 7104
#if TARGET_OS_OSX
		if ( gLinkContext.driverKit ) {
			result = (uintptr_t)sEntryOverride;
			if ( result == 0 )
				halt("no entry point registered");
			*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
		}
		else
#endif
		{
			// find entry point for main executable
			result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
			if ( result != 0 ) {
				// main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
				if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
					*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
				else
					halt("libdyld.dylib support not present for LC_MAIN");
			}
			else {
				// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
				result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
				*startGlue = 0;
			}
		}
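Stripped of the DriverKit branch and the error handling, the entry-point choice reduces to "prefer LC_MAIN, otherwise fall back to LC_UNIXTHREAD". A sketch under that simplification follows; `EntryChoice` and `choose_entry` are hypothetical names, and the two arguments stand in for what getEntryFromLC_MAIN and getEntryFromLC_UNIXTHREAD would return.

```cpp
#include <cstdint>

// Hypothetical summary of the decision made at the end of dyld::_main.
struct EntryChoice {
    std::uintptr_t entry;   // address that would be returned to __dyld_start
    bool needsStartGlue;    // true when the libdyld helper has to call main()
};

// A zero lcMainEntry means the executable has no LC_MAIN load command.
EntryChoice choose_entry(std::uintptr_t lcMainEntry,
                         std::uintptr_t lcUnixThreadEntry) {
    if (lcMainEntry != 0) {
        // LC_MAIN: main() is entered through the start glue in libdyld.
        return {lcMainEntry, true};
    }
    // LC_UNIXTHREAD: the program's own "start" sets things up for main().
    return {lcUnixThreadEntry, false};
}
```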

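So how can we verify that main is called after the load method and the C++ constructor function?

The simplest way is to use breakpoints.

The first breakpoint stops in the load method.

Current backtrace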

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 8.1
  * frame #0: 0x0000000100003e60 Dyld`+[Person load](self=Person, _cmd="load") at main.m:17:5
    frame #1: 0x00007fff203ab4d6 libobjc.A.dylib`load_images + 1556
    frame #2: 0x0000000100016527 dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 425
    frame #3: 0x000000010002c794 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 474
    frame #4: 0x000000010002a55f dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
    frame #5: 0x000000010002a600 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
    frame #6: 0x00000001000168b7 dyld`dyld::initializeMainExecutable() + 199
    frame #7: 0x000000010001ceb8 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 8702
    frame #8: 0x0000000100015224 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 450
    frame #9: 0x0000000100015025 dyld`_dyld_start + 37

Continuing one step lands in the constructor function, __attribute__((constructor)) void cc_func().

Finally, execution reaches main().

Current backtrace

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 11.1
  * frame #0: 0x0000000100003eb6 Dyld`main(argc=1, argv=0x00007ffeefbff3e8) at main.m:27:22
    frame #1: 0x00007fff20528f3d libdyld.dylib`start + 1
    frame #2: 0x00007fff20528f3d libdyld.dylib`start + 1

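So once the first two steps have run, control returns to _dyld_start, which then transfers to main(). The backtrace shows main under start in libdyld.dylib, the helper that dyld uses to call into main() for LC_MAIN executables.

III. The Initialization Flow

We now have a reasonably clear picture of how an app is loaded, but is that the whole story?

Of course not. We left a gap earlier; now let's work through it step by step and pin down how the app is loaded and initialized.

1. Recap

Let's return to the breakpoint approach and explore one step at a time.

Backtrace at the breakpoint in the +load method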

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 8.1
    frame #0: 0x0000000100003e60 Dyld`+[Person load](self=Person, _cmd="load") at main.m:17:5
    frame #1: 0x00007fff203ab4d6 libobjc.A.dylib`load_images + 1556
    frame #2: 0x0000000100016527 dyld`dyld::notifySingle(dyld_image_states, ImageLoader const*, ImageLoader::InitializerTimingList*) + 425
    frame #3: 0x000000010002c794 dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 474
    frame #4: 0x000000010002a55f dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 191
    frame #5: 0x000000010002a600 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 82
  * frame #6: 0x00000001000168b7 dyld`dyld::initializeMainExecutable() + 199
    frame #7: 0x000000010001ceb8 dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 8702
    frame #8: 0x0000000100015224 dyld`dyldbootstrap::start(dyld3::MachOLoaded const*, int, char const**, dyld3::MachOLoaded const*, unsigned long*) + 450
    frame #9: 0x0000000100015025 dyld`_dyld_start + 37

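2. dyld start

Following the stack, we start tracing from _dyld_start.

The assembly shows that the next stop is dyldbootstrap::start,
which then calls dyld::_main.

After that, initializeMainExecutable() is called,

then ImageLoader::runInitializers,

then ImageLoader::processInitializers,

which calls ImageLoader::recursiveInitialization.

Next comes notifySingle.

Tracing further, the callback it invokes is registered in dyld's registerObjCNotifiers,

which in turn is called from _dyld_objc_notify_register.

Here the trail goes cold: there is nothing further to follow.

Looking back, one function was skipped along the way: ImageLoader::doInitialization.

How is it implemented?

Implementation of doInitialization

[Figure: implementation of ImageLoader::doInitialization]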

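3. libSystem init

The only option left is to look at the calls inside libSystem.

Its initializer calls into libdispatch.

4. dispatch init

Tracing down into the Dispatch library leads to libdispatch_init().

This is where the call to _objc_init shows up.

Setting a symbolic breakpoint on _objc_init opens up the rest of the picture.

The function that gets hit is _objc_init in libobjc.

5. objc init

Moving on to objc, we find the function that puzzled us earlier, _dyld_objc_notify_register.

This is where it is called.

Aha
At this point it becomes clear that notifySingle works through a callback.

_objc_init passes load_images as the second argument, so when dyld reaches that notification, load_images runs.

Looking at what load_images calls explains why Person's load method gets invoked.

By now, the dyld loading process and app initialization should be much clearer.

[Figure: relationships among the libraries]

Next, we will analyze what _objc_init does.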
