iOS Internals Exploration 12: The dyld Loading Process

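Preface

In earlier articles we read and debugged the source to analyze the structure of objc classes, method lookup, and the message-sending flow. All of that assumes dyld has already loaded the relevant information. This article goes through the dyld source to see how dyld does that:

  1. Source code -> Mach-O;
  2. Loading into memory;
  3. _objc_init -> objc;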

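How an application is loaded

Dynamic libraries && static libraries

  1. Every program depends on many base libraries, such as UIKit, CoreFoundation, and libSystem.
  2. A library is a binary file of executable code that the operating system can load into memory. There are static libraries (.a, .lib) and dynamic libraries (.so, .dll, .framework, .dylib).
  3. The difference between static and dynamic libraries lies in how they are linked: static linking versus dynamic linking.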

image.png
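  4. Static libraries are loaded in order, one after another, so the same library can end up included more than once, which wastes space and load time. In the figure, static libraries B and D are each linked twice.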
image.png

  1. Dynamic libraries can be shared: everything that needs a library references a single copy of it, which saves memory.

image.png

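  1. Advantages of dynamic libraries: a smaller package, shared memory, and the fact that most hot-update schemes are built on dynamic libraries. On the current system, however, only system-level dynamic libraries are truly shared in memory; a developer's own dynamic libraries cannot be shared this way.
  2. Running the executable of a Mac project directly prints its output to the console: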

image.png

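From main to dyld

  1. Set a breakpoint at main and the call stack shows that start in libdyld.dylib runs before main. Adding a symbolic breakpoint on start never hits, however, which suggests that start is not the real symbol name, so the breakpoint has to be placed some other way.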

image.png
  2. Next we set a breakpoint in a class's +load method, which runs before main, and the call stack now shows the relevant information.
image.png
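  3. This shows that dyld is the component that loads Mach-O files into memory at app launch. Whether the code comes from dynamic libraries, from static libraries (which are already linked into the executable), or from other Mach-O files, it reaches memory through dyld.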

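About dyld

  1. dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. During app launch, once the kernel has finished preparing the process, it hands control to dyld, which takes care of the rest.
  2. While objc initializes, all libraries are loaded through dyld.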

image.png

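Analysis of the dyld loading process

  1. The dyld source shows that __dyld_start is written in assembly, with a separate variant for each architecture. Every variant ends up calling dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue).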
#if __arm__   // entry point of __dyld_start on the arm architecture
	.text
	.align 2
__dyld_start:
	mov	r8, sp		// save stack pointer
	sub	sp, #16		// make room for outgoing parameters
	bic     sp, sp, #15	// force 16-byte alignment

	// call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
        // this is where dyldbootstrap::start gets called
	ldr	r0, [r8]	// r0 = mach_header
	ldr	r1, [r8, #4]	// r1 = argc
	add	r2, r8, #8	// r2 = argv
	adr	r3, __dyld_start
	sub	r3 ,r3, #0x1000 // r3 = dyld_mh
	add	r4, sp, #12
	str	r4, [sp, #0]	// [sp] = &startGlue

	bl	__ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
	ldr	r5, [sp, #12]
	cmp	r5, #0
	bne	Lnew

	// traditional case, clean up stack and jump to result
	add	sp, r8, #4	// remove the mach_header argument.
	bx	r0		// jump to the program's entry point

	// LC_MAIN case, set up stack for call to main()
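  1. In the dyldbootstrap namespace, find the start() function. Its last step is a call to dyld::_main(). Before that, dyldbootstrap::start() does much of dyld's own initialization, including:
  • rebaseDyld(): rebases dyld itself, applying the ASLR slide.
  • mach_init(): initializes Mach messaging.
  • __guard_setup(): stack overflow protection (sets up the stack canary).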
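
To make the rebase step more concrete, here is a minimal sketch of what "applying the ASLR slide" means: every absolute address stored in the image is shifted by the difference between the actual and the preferred load address. The names computeSlide, applySlide and rebaseExample, and the flat vector of pointer slots, are hypothetical simplifications for illustration and not dyld's real data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Slide = where the image actually landed minus where it expected to be.
// It can be negative, hence the signed result.
std::intptr_t computeSlide(std::uintptr_t preferredBase, std::uintptr_t actualBase) {
    return static_cast<std::intptr_t>(actualBase - preferredBase);
}

// Shift every stored absolute address by the slide. In a real Mach-O the
// list of slots comes from the image's rebase info; here it is a plain vector.
void applySlide(std::vector<std::uintptr_t>& pointerSlots, std::intptr_t slide) {
    for (std::uintptr_t& slot : pointerSlots) {
        slot += static_cast<std::uintptr_t>(slide);
    }
}

// Usage example: an image that prefers 0x10000 but was loaded at 0x14000.
void rebaseExample() {
    std::vector<std::uintptr_t> slots = {0x11000, 0x12008};
    applySlide(slots, computeSlide(0x10000, 0x14000));
    assert(slots[0] == 0x15000 && slots[1] == 0x16008);
}
```

The point of the sketch is only that the slide is a single offset applied uniformly, which is why dyld has to do it before touching any of its own global variables.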

Below is the annotated code of dyldbootstrap::start():

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
				const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{

    // Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);

	// if kernel had to slide dyld, we need to fix up load sensitive locations
	// we have to do this before using any global variables
    // rebase dyld itself, applying the ASLR slide
    rebaseDyld(dyldsMachHeader);
    

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
    // stack overflow protection
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(argc, argv, envp, apple);
#endif

	_subsystem_init(apple);

	// now that we are done bootstrapping dyld, call dyld's main
    // get the ASLR slide of the app
	uintptr_t appsSlide = appsMachHeader->getSlide();
    // enter dyld's core function, _main()
	return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
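
The envp and apple pointer arithmetic in start() is easier to follow on a small model. The sketch below assumes the layout the comments in start() describe (argv, a NULL, the environment strings, a NULL, then the apple strings) and repeats the same pointer walk on a hand-built array. KernelArgs, locateKernelArgs and kernelArgsExample are hypothetical names, and the strings are made-up sample data.

```cpp
#include <cassert>
#include <cstring>

// The two pointers the bootstrap code derives from argc/argv.
struct KernelArgs {
    const char** envp;   // first environment string
    const char** apple;  // first kernel-provided "apple" string
};

// Same walk as in dyldbootstrap::start(): envp starts just past argv's
// terminating NULL, apple starts just past envp's terminating NULL.
KernelArgs locateKernelArgs(int argc, const char* argv[]) {
    const char** envp = &argv[argc + 1];
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }  // stop on envp's terminating NULL
    ++apple;                                // step past it
    return {envp, apple};
}

// Usage example with an array that imitates the kernel's layout.
void kernelArgsExample() {
    const char* block[] = {
        "/path/to/App", "-flag", nullptr,           // argv, argc == 2
        "HOME=/Users/demo", "LANG=en_US", nullptr,  // envp
        "executable_path=/path/to/App", nullptr     // apple
    };
    KernelArgs args = locateKernelArgs(2, block);
    assert(std::strcmp(args.envp[0], "HOME=/Users/demo") == 0);
    assert(std::strcmp(args.apple[0], "executable_path=/path/to/App") == 0);
}
```

This is also why _main() can later read keys such as executable_path with _simple_getenv(apple, ...): the apple array is just another NULL-terminated list of key=value strings.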
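  1. Step into dyld::_main(). The function is long and reading it top-down is impractical, so we work backwards from the place where result is assigned. result turns out to depend on sMainExecutable (result -> sMainExecutable), and sMainExecutable is in turn assigned at sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);

Below is a condensed version of dyld::_main():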

//
// Entry point for dyld.  The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
        int argc, const char* argv[], const char* envp[], const char* apple[], 
        uintptr_t* startGlue)
{
    // Grab the cdHash of the main executable from the environment
    // Step 1: set up the runtime environment
    uint8_t mainExecutableCDHashBuffer[20];
    const uint8_t* mainExecutableCDHash = nullptr;
    if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) )
        // get the cdhash of the main executable
        mainExecutableCDHash = mainExecutableCDHashBuffer;
    // Trace dyld's load
    notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
#if !TARGET_IPHONE_SIMULATOR
    // Trace the main executable's load
    notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif
    uintptr_t result = 0;
    // record the macho_header of the main executable
    sMainExecutableMachHeader = mainExecutableMH;
    // record the slide of the main executable
    sMainExecutableSlide = mainExecutableSlide;
    CRSetCrashLogMessage("dyld: launch started");
    // set up the link context
    setContext(mainExecutableMH, argc, argv, envp, apple);
    // Pickup the pointer to the exec path.
    // get the path of the main executable
    sExecPath = _simple_getenv(apple, "executable_path");
    // <rdar://problem/13868260> Remove interim apple[0] transition code from dyld
    if (!sExecPath) sExecPath = apple[0];
    if ( sExecPath[0] != '/' ) {
        // have relative path, use cwd to make absolute
        char cwdbuff[MAXPATHLEN];
        if ( getcwd(cwdbuff, MAXPATHLEN) != NULL ) {
            // maybe use static buffer to avoid calling malloc so early...
            char* s = new char[strlen(cwdbuff) + strlen(sExecPath) + 2];
            strcpy(s, cwdbuff);
            strcat(s, "/");
            strcat(s, sExecPath);
            sExecPath = s;
        }
    }
    // Remember short name of process for later logging
    // get the process name
    sExecShortName = ::strrchr(sExecPath, '/');
    if ( sExecShortName != NULL )
        ++sExecShortName;
    else
        sExecShortName = sExecPath;
    
    // configure process restrictions
    configureProcessRestrictions(mainExecutableMH);
    // check the environment variables
    checkEnvironmentVariables(envp);
    defaultUninitializedFallbackPaths(envp);
    // if DYLD_PRINT_OPTS is set, call printOptions() to print the arguments
    if ( sEnv.DYLD_PRINT_OPTS )
        printOptions(argv);
    // if DYLD_PRINT_ENV is set, call printEnvironmentVariables() to print the environment
    if ( sEnv.DYLD_PRINT_ENV )
        printEnvironmentVariables(envp);
    // get the architecture info of the current program
    getHostInfo(mainExecutableMH, mainExecutableSlide);
    //------------- end of step 1 -------------
    
    // load shared cache
    // Step 2: load the shared cache
    // check whether the shared cache is disabled; on iOS it must be enabled
    checkSharedRegionDisable((mach_header*)mainExecutableMH);
    if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
        mapSharedCache();
    }
    ...
    try {
        // add dyld itself to UUID list
        addDyldImageToUUIDList();
        // instantiate ImageLoader for main executable
        // Step 3: instantiate the main executable
        sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
        gLinkContext.mainExecutable = sMainExecutable;
        gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);
        // Now that shared cache is loaded, setup an versioned dylib overrides
    #if SUPPORT_VERSIONED_PATHS
        checkVersionedPaths();
    #endif
        // dyld_all_image_infos image list does not contain dyld
        // add it as dyldPath field in dyld_all_image_infos
        // for simulator, dyld_sim is in image list, need host dyld added
#if TARGET_IPHONE_SIMULATOR
        // get path of host dyld from table of syscall vectors in host dyld
        void* addressInDyld = gSyscallHelpers;
#else
        // get path of dyld itself
        void*  addressInDyld = (void*)&__dso_handle;
#endif
        char dyldPathBuffer[MAXPATHLEN+1];
        int len = proc_regionfilename(getpid(), (uint64_t)(long)addressInDyld, dyldPathBuffer, MAXPATHLEN);
        if ( len > 0 ) {
            dyldPathBuffer[len] = '\0'; // proc_regionfilename() does not zero terminate returned string
            if ( strcmp(dyldPathBuffer, gProcessInfo->dyldPath) != 0 )
                gProcessInfo->dyldPath = strdup(dyldPathBuffer);
        }
        // load any inserted libraries
        // Step 4: load the inserted dynamic libraries
        if  ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
            for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
                loadInsertedDylib(*lib);
        }
        // record count of inserted libraries so that a flat search will look at 
        // inserted libraries, then main, then others.
        // record the number of inserted dynamic libraries
        sInsertedDylibCount = sAllImages.size()-1;
        // link main executable
        // Step 5: link the main executable
        gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
        if ( mainExcutableAlreadyRebased ) {
            // previous link() on main executable has already adjusted its internal pointers for ASLR
            // work around that by rebasing by inverse amount
            sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
        }
#endif
        link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        sMainExecutable->setNeverUnloadRecursive();
        if ( sMainExecutable->forceFlat() ) {
            gLinkContext.bindFlat = true;
            gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
        }
        // link any inserted libraries
        // do this after linking main executable so that any dylibs pulled in by inserted 
        // dylibs (e.g. libSystem) will not be in front of dylibs the program uses
        // Step 6: link the inserted dynamic libraries
        if ( sInsertedDylibCount > 0 ) {
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
                image->setNeverUnloadRecursive();
            }
            // only INSERTED libraries can interpose
            // register interposing info after all inserted libraries are bound so chaining works
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                image->registerInterposing();
            }
        }
        // <rdar://problem/19315404> dyld should support interposition even without DYLD_INSERT_LIBRARIES
        for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
            ImageLoader* image = sAllImages[i];
            if ( image->inSharedCache() )
                continue;
            image->registerInterposing();
        }
        ...
        // apply interposing to initial set of images
        for(int i=0; i < sImageRoots.size(); ++i) {
            sImageRoots[i]->applyInterposing(gLinkContext);
        }
        gLinkContext.linkingMainExecutable = false;
        
        // <rdar://problem/12186933> do weak binding only after all inserted images linked
        // Step 7: perform weak symbol binding
        sMainExecutable->weakBind(gLinkContext);
        // If cache has branch island dylibs, tell debugger about them
        if ( (sSharedCacheLoadInfo.loadAddress != NULL) && (sSharedCacheLoadInfo.loadAddress->header.mappingOffset >= 0x78) && (sSharedCacheLoadInfo.loadAddress->header.branchPoolsOffset != 0) ) {
            uint32_t count = sSharedCacheLoadInfo.loadAddress->header.branchPoolsCount;
            dyld_image_info info[count];
            const uint64_t* poolAddress = (uint64_t*)((char*)sSharedCacheLoadInfo.loadAddress + sSharedCacheLoadInfo.loadAddress->header.branchPoolsOffset);
            // <rdar://problem/20799203> empty branch pools can be in development cache
            if ( ((mach_header*)poolAddress)->magic == sMainExecutableMachHeader->magic ) {
                for (int poolIndex=0; poolIndex < count; ++poolIndex) {
                    uint64_t poolAddr = poolAddress[poolIndex] + sSharedCacheLoadInfo.slide;
                    info[poolIndex].imageLoadAddress = (mach_header*)(long)poolAddr;
                    info[poolIndex].imageFilePath = "dyld_shared_cache_branch_islands";
                    info[poolIndex].imageFileModDate = 0;
                }
                // add to all_images list
                addImagesToAllImages(count, info);
                // tell gdb about new branch island images
                gProcessInfo->notification(dyld_image_adding, count, info);
            }
        }
        CRSetCrashLogMessage("dyld: launch, running initializers");
        ...
        // run all initializers
        // Step 8: run the initializers
        initializeMainExecutable(); 
        // notify any montoring proccesses that this process is about to enter main()
        dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_MAIN_DYLD2, 0, 0);
        notifyMonitoringDyldMain();
        // find entry point for main executable
        // Step 9: find the entry point and return it
        result = (uintptr_t)sMainExecutable->getThreadPC();
        if ( result != 0 ) {
            // main executable uses LC_MAIN, needs to return to glue in libdyld.dylib
            if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
                *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
            else
                halt("libdyld.dylib support not present for LC_MAIN");
        }
        else {
            // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
            result = (uintptr_t)sMainExecutable->getMain();
            *startGlue = 0;
        }
    }
    catch(const char* message) {
        syncAllImages();
        halt(message);
    }
    catch(...) {
        dyld::log("dyld: launch failed\n");
    }
    ...
    
    return result;
}
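
One small piece of step 1 that is easy to try in isolation is how _main() turns sExecPath into an absolute path and derives the short process name from it. The sketch below restates that logic with std::string; absoluteExecPath, execShortName and execPathExample are hypothetical helper names, the working directory is passed in rather than read with getcwd(), and the paths are made-up sample data.

```cpp
#include <cassert>
#include <string>

// A path that does not start with '/' is relative, so prefix it with the
// working directory, as _main() does with the result of getcwd().
std::string absoluteExecPath(const std::string& execPath, const std::string& cwd) {
    if (!execPath.empty() && execPath[0] == '/') return execPath;
    return cwd + "/" + execPath;
}

// Short process name: everything after the last '/', or the whole path
// when there is no '/' at all (the strrchr logic in _main()).
std::string execShortName(const std::string& execPath) {
    std::string::size_type slash = execPath.rfind('/');
    return slash == std::string::npos ? execPath : execPath.substr(slash + 1);
}

// Usage example: a relative path launched from /Users/demo.
void execPathExample() {
    std::string path = absoluteExecPath("build/Demo", "/Users/demo");
    assert(path == "/Users/demo/build/Demo");
    assert(execShortName(path) == "Demo");
}
```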

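Logic around initializeMainExecutable()

  1. Setting a breakpoint on _objc_init() in the objc source shows that objc's _objc_init is ultimately called from libdispatch;

  2. libdispatch_init is called by libSystem_initializer;

  3. libSystem_initializer is in turn invoked by dyld's ImageLoaderMachO::doModInitFunctions;

  4. The method that initializes an image is doInitialization(), which is what calls doModInitFunctions();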

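Order of +load, C++ constructors, and main

  1. The images of third-party libraries are initialized first, and the main program's image last;
  2. Within one image, the +load methods run first and the C++ constructors after them;
  3. main() runs last.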
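
The ordering rules above can be sketched as a depth-first walk over the image dependency graph: an image's dependencies are initialized before the image itself, each image runs its +load methods before its C++ constructors, and main() comes after everything. This is a simplified model under those assumptions, not dyld's actual code; Image, initializeRecursively, launchOrder and the image names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of a loaded image and the images it links against.
struct Image {
    std::string name;
    std::vector<Image*> dependencies;
};

// Depth-first: initialize dependencies first, then this image.
// Marking the image as done before recursing keeps cycles from looping.
void initializeRecursively(Image* image, std::unordered_set<Image*>& done,
                           std::vector<std::string>& order) {
    if (!done.insert(image).second) return;  // already handled
    for (Image* dep : image->dependencies) {
        initializeRecursively(dep, done, order);
    }
    order.push_back(image->name + ": +load methods");
    order.push_back(image->name + ": C++ constructors");
}

// Everything is initialized starting from the main executable, then main() runs.
std::vector<std::string> launchOrder(Image* mainExecutable) {
    std::unordered_set<Image*> done;
    std::vector<std::string> order;
    initializeRecursively(mainExecutable, done, order);
    order.push_back("main()");
    return order;
}

// Usage example: App depends on ThirdParty, and both depend on libSystem.
void launchOrderExample() {
    Image libSystem{"libSystem", {}};
    Image thirdParty{"ThirdParty", {&libSystem}};
    Image app{"App", {&thirdParty, &libSystem}};
    std::vector<std::string> order = launchOrder(&app);
    assert(order.front() == "libSystem: +load methods");
    assert(order[order.size() - 2] == "App: C++ constructors");
    assert(order.back() == "main()");
}
```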

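The +load method in detail

  1. load_images calls every +load method, but before calling them it has to collect them all into the array of load methods;
  2. When looking up and adding +load methods, the runtime does not stop at the current class: it keeps walking up the superclass chain and adds the superclasses' +load methods to the array as well;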
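
The collection step can be sketched as a short recursion: visit the superclass before the class itself, skip classes that were already visited, and only record classes that actually implement +load. The sketch is loosely modeled on the idea behind objc's schedule_class_load as I understand it; ClassInfo, scheduleClassLoad and the class names are hypothetical and are not the runtime's real types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a class record in the runtime.
struct ClassInfo {
    std::string name;
    ClassInfo* superclass;  // nullptr for a root class
    bool hasLoadMethod;     // whether the class implements +load
    bool scheduled;         // set once the class has been visited
};

// Add cls to the loadable list, making sure its superclasses come first.
void scheduleClassLoad(ClassInfo* cls, std::vector<ClassInfo*>& loadable) {
    if (cls == nullptr || cls->scheduled) return;
    scheduleClassLoad(cls->superclass, loadable);  // superclass before subclass
    if (cls->hasLoadMethod) loadable.push_back(cls);
    cls->scheduled = true;
}

// Usage example: Student is scheduled first, yet Person still ends up ahead of it.
void loadOrderExample() {
    ClassInfo root{"Root", nullptr, false, false};
    ClassInfo person{"Person", &root, true, false};
    ClassInfo student{"Student", &person, true, false};
    std::vector<ClassInfo*> loadable;
    scheduleClassLoad(&student, loadable);
    scheduleClassLoad(&person, loadable);  // no effect, already scheduled
    assert(loadable.size() == 2);
    assert(loadable[0]->name == "Person");
    assert(loadable[1]->name == "Student");
}
```

Because of the scheduled flag, a class appears in the list at most once no matter how many subclasses reach it.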

The dyld loading flow

To be continued!

