FirmAE extractor.py: Reading the Firmware Extraction Source Code

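When we get hold of a firmware image, we usually want some basic facts about it: its architecture, its kernel version, whether the filesystem can be extracted, and so on. FirmAE ships an extractor script that uses binwalk for this. The extraction method I wrote earlier was too simplistic, and this script covers essentially everything it did, so let's read through the source to see how the extraction is implemented.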

  • Usage
usage: extractor.py [-h] [-sql  SQL] [-nf] [-nk] [-np] [-b BRAND] [-d]
                    input [output]

Extracts filesystem and kernel from Linux-based firmware images

positional arguments:
  input       Input file or directory
  output      Output directory for extracted firmware

optional arguments:
  -h, --help  show this help message and exit
  -sql  SQL   Hostname of SQL server
  -nf         Disable extraction of root filesystem (may decrease extraction
              time)
  -nk         Disable extraction of kernel (may decrease extraction time)
  -np         Disable parallel operation (may increase extraction time)
  -b BRAND    Brand of the firmware image
  -d          Print debug information
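To make the help text easier to relate to code, here is a minimal argparse sketch that accepts the same options. It is reconstructed from the help output only: the dest names (rootfs, kernel, parallel and so on) and the default values are assumptions, not taken from the FirmAE source.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the help text above (dest names are assumed)."""
    parser = argparse.ArgumentParser(
        prog="extractor.py",
        description="Extracts filesystem and kernel from Linux-based firmware images",
    )
    parser.add_argument("input", help="Input file or directory")
    parser.add_argument("output", nargs="?", default=None,
                        help="Output directory for extracted firmware")
    parser.add_argument("-sql", dest="sql", default=None,
                        help="Hostname of SQL server")
    # The -n* switches disable a feature, so they store False into a flag
    # that defaults to True.
    parser.add_argument("-nf", dest="rootfs", action="store_false", default=True,
                        help="Disable extraction of root filesystem")
    parser.add_argument("-nk", dest="kernel", action="store_false", default=True,
                        help="Disable extraction of kernel")
    parser.add_argument("-np", dest="parallel", action="store_false", default=True,
                        help="Disable parallel operation")
    parser.add_argument("-b", dest="brand", default=None,
                        help="Brand of the firmware image")
    parser.add_argument("-d", dest="debug", action="store_true", default=False,
                        help="Print debug information")
    return parser


# Parse the same arguments that are used in the example later in this post.
args = build_parser().parse_args(
    ["-sql", "127.0.0.1", "-d", "./firmwares/DIR859Ax_FW105b03.bin", "./testext2"]
)
print(args.sql, args.debug, args.input, args.output)
```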

1. How it works

1.1 Extractor.extract

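The first piece is the Extractor class, which extracts the kernel and filesystem from firmware. It uses multiprocessing.Pool() to extract several firmware images concurrently; note that this is a pool of worker processes, even though the docstring calls it a thread pool. Calling the extract method of Extractor performs the setup for this concurrent extraction.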

def extract(self):
    """
    Perform extraction of firmware updates from input to tarballs in output
    directory using a thread pool.
    """
    # The input may be a directory or a single file; collect every file into _list
    if os.path.isdir(self._input):
        for path, _, files in os.walk(self._input):
            for item in files:
                self._list.append(os.path.join(path, item))
    elif os.path.isfile(self._input):
        self._list.append(self._input)
    # Create the output directory
    if self.output_dir and not os.path.isdir(self.output_dir):
        os.makedirs(self.output_dir)

    if self._pool:
        # since we have to handle multiple files in one firmware image, it
        # is better to use chunk_size=1 
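        # For very long iterables a larger chunksize can speed up processing
        # With chunksize 1, next(timeout) on the result iterator can raise
        # a timeout error if a result is not ready in time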
        chunk_size = 1
        # Drain the iterator so every file gets processed; this call blocks until all extraction tasks have finished
        list(self._pool.imap_unordered(self._extract_item, self._list, chunk_size))
    else:
        for item in self._list:
            self._extract_item(item)
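The two branches of extract (pooled and sequential) can be tried out in isolation with a small sketch. Two things here are assumptions made for illustration: extract_item is a hypothetical stand-in for the real per-file work, and multiprocessing.pool.ThreadPool is swapped in for multiprocessing.Pool because it exposes the same imap_unordered interface and runs without a main-module guard. FirmAE itself uses the process-based pool.

```python
from multiprocessing.pool import ThreadPool


def extract_item(path):
    """Hypothetical stand-in for Extractor._extract_item: just tag the path."""
    return "extracted:" + path


def extract_all(paths, pool=None):
    """Mirror the two branches of extract(): pooled or sequential."""
    if pool:
        # list() drains the result iterator, which blocks until every task
        # has finished. chunksize is 1, as in the original code.
        return list(pool.imap_unordered(extract_item, paths, 1))
    return [extract_item(path) for path in paths]


paths = ["a.bin", "b.bin", "c.bin"]
with ThreadPool(2) as pool:
    pooled = extract_all(paths, pool)
sequential = extract_all(paths)

# Completion order is not guaranteed in the pooled case, so compare sorted.
print(sorted(pooled) == sorted(sequential))
```

Because imap_unordered yields results in completion order, anything that depends on the order of the input list has to be handled inside the worker or restored afterwards.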

Each file is handled separately by self._extract_item, which internally creates an ExtractionItem to do the extraction.

def _extract_item(self, path):
    """
    Wrapper function that creates an ExtractionItem and calls the extract()
    method.
    """
    ExtractionItem(self, path, 0, None, self.debug).extract()

1.2 ExtractionItem.extract

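The extract function of ExtractionItem works as follows:

  1. It starts with the exit conditions:

    • Check the extraction status (return if an interrupt was signalled or everything has already been extracted).
    • Check whether the recursion limits are exceeded. Extraction is recursive, and the breadth and depth of the recursion can be set as needed.
  2. Check whether the file's MD5 checksum is already in the visited set. A file whose checksum was already seen with the same status is skipped, which avoids repeated extraction.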

# check if checksum is in visited set
self.printf(">> MD5: %s" % self.checksum)
with Extractor.visited_lock:
    # Skip the same checksum only in the same status
    # asus_latest(FW_RT_N12VP_30043804057.zip) firmware
    if (self.checksum in self.extractor.visited and
            self.extractor.visited[self.checksum] == self.status):
        self.printf(">> Skipping: %s..." % self.checksum)
        return self.get_status()
    else:
        self.extractor.visited[self.checksum] = self.status
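The skip rule is easy to get wrong, because a repeated checksum alone is not enough: the status must match too. The following sketch isolates that rule. The helper names md5_of and should_skip are hypothetical, and representing the status as a (kernel_done, rootfs_done) tuple is an assumption for illustration.

```python
import hashlib
import threading

visited = {}                      # checksum -> status recorded for it
visited_lock = threading.Lock()   # guards visited against concurrent updates


def md5_of(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def should_skip(checksum, status):
    """Return True only if this checksum was already seen with the same status."""
    with visited_lock:
        if checksum in visited and visited[checksum] == status:
            return True
        # New checksum, or same checksum in a different status: record it.
        visited[checksum] = status
        return False


checksum = md5_of(b"firmware bytes")
first = should_skip(checksum, (False, False))    # new checksum: recorded
second = should_skip(checksum, (False, False))   # same status: skipped
third = should_skip(checksum, (False, True))     # status changed: not skipped
print(first, second, third)
```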
  3. Check whether the file type is blacklisted; blacklisted files are not extracted.

First, broad categories of files are excluded by MIME type. A file is skipped if its type matches any of the following:

if filetype:
    if any(s in filetype for s in ["application/x-executable",
                                    "application/x-dosexec",
                                    "application/x-object",
                                    "application/x-sharedlib",
                                    "application/pdf",
                                    "application/msword",
                                    "image/", "text/", "video/"]):
        self.printf(">> Skipping: %s..." % filetype)
        return True

Next, the file's magic description (rather than its MIME type) is checked for specific file types:

filetype = Extractor.magic(real_path.encode("utf-8", "surrogateescape"))
if filetype:
    if any(s in filetype for s in ["executable", "universal binary",
                                    "relocatable", "bytecode", "applet",
                                    "shared"]):
        self.printf(">> Skipping: %s..." % filetype)
        return True

Finally, specific file extensions that may be misidentified are checked:

black_lists = ['.dmg', '.so', '.so.0']
for black in black_lists:
    if self.item.endswith(black):
        self.printf(">> Skipping: %s..." % (self.item))
        return True
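Taken together, the three checks form a single predicate. The sketch below combines them into one function that takes the MIME type and description as plain strings, so it runs without a libmagic binding. The function name is_blacklisted and the constant names are hypothetical; the list contents are copied from the snippets above.

```python
MIME_BLACKLIST = ["application/x-executable", "application/x-dosexec",
                  "application/x-object", "application/x-sharedlib",
                  "application/pdf", "application/msword",
                  "image/", "text/", "video/"]
DESC_BLACKLIST = ["executable", "universal binary", "relocatable",
                  "bytecode", "applet", "shared"]
EXT_BLACKLIST = [".dmg", ".so", ".so.0"]


def is_blacklisted(path, mime_type, description):
    """Apply the three checks in order; the first match wins."""
    # 1. Broad categories, matched as substrings of the MIME type.
    if mime_type and any(s in mime_type for s in MIME_BLACKLIST):
        return True
    # 2. Specific types, matched as substrings of the magic description.
    if description and any(s in description for s in DESC_BLACKLIST):
        return True
    # 3. Extensions that the first two checks may misidentify.
    return any(path.endswith(ext) for ext in EXT_BLACKLIST)


print(is_blacklisted("logo.png", "image/png", "PNG image data"))
print(is_blacklisted("fw.bin", "application/octet-stream", "data"))
print(is_blacklisted("libfoo.so.0", "application/octet-stream", "data"))
```

Substring matching is what lets a prefix such as image/ cover every image subtype with one entry.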
  4. Start the actual extraction.

A temporary directory is created:

self.temp = tempfile.mkdtemp()
# Move to temporary directory so binwalk does not write to input
os.chdir(self.temp)
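Changing into a scratch directory means any relative path written during extraction lands there instead of next to the input. A minimal sketch of that pattern, with cleanup added, might look like this. The helper run_in_temp_dir and the try/finally structure are assumptions for illustration, not the way FirmAE organizes its cleanup.

```python
import os
import shutil
import tempfile

seen = {}  # remembers the temp path so it can be inspected afterwards


def run_in_temp_dir(work):
    """Run work(temp_path) with the temp dir as cwd, then clean up."""
    previous = os.getcwd()
    temp = tempfile.mkdtemp()
    try:
        os.chdir(temp)
        return work(temp)
    finally:
        # Restore the old cwd before deleting, so the process is never
        # left sitting in a directory that no longer exists.
        os.chdir(previous)
        shutil.rmtree(temp, ignore_errors=True)


def work(temp):
    seen["path"] = temp
    with open("scratch.bin", "wb") as handle:   # relative path lands in temp
        handle.write(b"\x00" * 16)
    return os.path.isfile(os.path.join(temp, "scratch.bin"))


created = run_in_temp_dir(work)
print(created, os.path.exists(seen["path"]))
```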

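Extraction is done with binwalk. The decisions about what to extract are driven mainly by searching for keywords in the descriptions of binwalk's results.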

try:
    self.printf(">> Tag: %s" % self.tag)
    self.printf(">> Temp: %s" % self.temp)
    self.printf(">> Status: Kernel: %s, Rootfs: %s, Do_Kernel: %s, \
        Do_Rootfs: %s" % (self.get_kernel_status(),
                            self.get_rootfs_status(),
                            self.extractor.do_kernel,
                            self.extractor.do_rootfs))

    for module in binwalk.scan(self.item, "-e", "-r", "-C", self.temp,
                                signature=True, quiet=True):
        prev_entry = None
        for entry in module.results:
            desc = entry.description
            dir_name = module.extractor.directory

            if prev_entry and prev_entry.description == desc and \
                    'Zlib comparessed data' in desc:
                continue
            prev_entry = entry

            self.printf('========== Depth: %d ===============' % self.depth)
            self.printf("Name: %s" % self.item)
            self.printf("Desc: %s" % desc)
            self.printf("Directory: %s" % dir_name)

            self._check_firmware(module, entry)

            if not self.get_rootfs_status():
                self._check_rootfs(module, entry)

            if not self.get_kernel_status():
                self._check_kernel(module, entry)

            if self.update_status():
                self.printf(">> Skipping: completed!")
                return True
            else:
                self._check_recursive(module, entry)


except Exception:
    print ("ERROR: ", self.item)
    traceback.print_exc()
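The filter at the top of the inner loop combines two conditions with and, so an entry is dropped only when it repeats the previous description and also contains the marker string. The sketch below isolates that logic on plain strings; filter_entries is a hypothetical helper, and the marker is passed in as a parameter rather than hard-coded.

```python
def filter_entries(descriptions, marker):
    """Drop an entry only if it repeats the previous one AND contains marker."""
    kept = []
    previous = None
    for desc in descriptions:
        if previous is not None and previous == desc and marker in desc:
            continue
        previous = desc
        kept.append(desc)
    return kept


descriptions = [
    "Zlib compressed data",
    "Zlib compressed data",   # repeat containing the marker: dropped
    "Certificate in DER format (x509 v3)",
    "Certificate in DER format (x509 v3)",   # repeat without the marker: kept
]
for desc in filter_entries(descriptions, "Zlib compressed data"):
    print(desc)
```

This matches the log in the example section, where three identical certificate entries in a row are each processed.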
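  • Extraction logic
    • For each entry: it is skipped only when its description is identical to the previous entry's and also contains the string Zlib comparessed data (spelled this way in the code above). Every other entry is processed.

    • _check_firmware is called. If the file is a known firmware type, the kernel and root filesystem are extracted directly.

      • Does the description contain the keyword header?
        • If it contains uImage header, the kernel is carved out using the size that follows the size keyword (for uImage).
        • If it contains both rootfs offset and kernel offset, the kernel and filesystem are carved out using the sizes that follow the size keyword (for TP-Link or TRX).
    • def _check_rootfs(self, module, entry)

      • If the description contains any of the keywords filesystem, archive or compressed, check whether Unix filesystem directory names appear in the extraction result.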

        UNIX_DIRS = ["bin", "etc", "dev", "home", "lib", "mnt", "opt", "root",
                         "run", "sbin", "tmp", "usr", "var"]
        UNIX_THRESHOLD = 4
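    • def _check_kernel(self, module, entry)

      • If the description contains kernel, the kernel version is extracted.
    • Check whether extraction is complete. If not, recurse into the extracted data until it is. The results are stored in a PostgreSQL database.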
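The root filesystem and kernel checks described above can be sketched in a few lines. UNIX_DIRS and UNIX_THRESHOLD are taken from the listing above. Everything else is an assumption for illustration: the helper names looks_like_rootfs and kernel_version, the use of >= for the threshold comparison, and the regular expression for the version string.

```python
import os
import re
import shutil
import tempfile

UNIX_DIRS = ["bin", "etc", "dev", "home", "lib", "mnt", "opt", "root",
             "run", "sbin", "tmp", "usr", "var"]
UNIX_THRESHOLD = 4


def looks_like_rootfs(path):
    """True if at least UNIX_THRESHOLD typical Unix directories exist in path."""
    count = sum(1 for name in os.listdir(path)
                if name in UNIX_DIRS and os.path.isdir(os.path.join(path, name)))
    return count >= UNIX_THRESHOLD


def kernel_version(desc):
    """Pull the version out of a 'Linux kernel version X' description."""
    match = re.search(r"Linux kernel version (\S+)", desc)
    return match.group(1) if match else None


# Build a throwaway directory tree that resembles an extracted root filesystem.
root = tempfile.mkdtemp()
try:
    for name in ["bin", "etc", "lib", "usr"]:
        os.mkdir(os.path.join(root, name))
    is_rootfs = looks_like_rootfs(root)
finally:
    shutil.rmtree(root, ignore_errors=True)

version = kernel_version("Linux kernel version 2.6.31")
print(is_rootfs, version)
```

Counting directory names is a heuristic: it tolerates filesystems that lack some of the usual directories, at the cost of occasionally accepting an archive that merely happens to contain four of them.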

2. Example

Taking the firmware DIR859Ax_FW105b03.bin as an example, let's walk through the extraction process.

python3 sources/extractor/extractor.py -sql 127.0.0.1 -d ./firmwares/DIR859Ax_FW105b03.bin  ./testext2

Output log:

cuc@cuc-VirtualBox:~/workspace/FirmAE$ python3 sources/extractor/extractor.py -sql 127.0.0.1 -d ./firmwares/DIR859Ax_FW105b03.bin  ./testext2
>> Database Image ID: 2

/home/cuc/workspace/FirmAE/firmwares/DIR859Ax_FW105b03.bin
>> MD5: f0398570673fcc633d35dcbb672b3792
>> Tag: 2
>> Temp: /tmp/tmpt7x2ygb0
>> Status: Kernel: False, Rootfs: False, Do_Kernel: True,                 Do_Rootfs: True
========== Depth: 0 ===============
Name: /home/cuc/workspace/FirmAE/firmwares/DIR859Ax_FW105b03.bin
Desc: DLOB firmware header, boot partition: "dev=/dev/mtdblock/1" # firmware header
Directory: /tmp/tmpt7x2ygb0
========== Depth: 0 ===============
Name: /home/cuc/workspace/FirmAE/firmwares/DIR859Ax_FW105b03.bin
Desc: LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes, uncompressed size: 3650048 bytes
Directory: /tmp/tmpt7x2ygb0
>>>> Found Linux filesystem in /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/squashfs-root! # filesystem found
>> Recursing into LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes, uncompressed size: 3650048 bytes ...

/tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
        >> MD5: ee431f37e5fb18459f3b9e9554e02505
        >> Tag: 2
        >> Temp: /tmp/tmp2bvlhs7m
        >> Status: Kernel: False, Rootfs: True, Do_Kernel: True,                 Do_Rootfs: True
        ========== Depth: 1 ===============
        Name: /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
        Desc: Certificate in DER format (x509 v3), header length: 4, sequence length: 30 # certificate
        Directory: /tmp/tmp2bvlhs7m
        ========== Depth: 1 ===============
        Name: /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
        Desc: Certificate in DER format (x509 v3), header length: 4, sequence length: 30  # certificate
        Directory: /tmp/tmp2bvlhs7m
        ========== Depth: 1 ===============
        Name: /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
        Desc: Certificate in DER format (x509 v3), header length: 4, sequence length: 30  # certificate
        Directory: /tmp/tmp2bvlhs7m
        ========== Depth: 1 ===============
        Name: /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
        Desc: Linux kernel version 2.6.31 # kernel 2.6
        Directory: /tmp/tmp2bvlhs7m
        >>>> Linux kernel version 2.6.31
        >> Skipping: completed!
        >> Cleaning up /tmp/tmp2bvlhs7m...
========== Depth: 0 ===============
Name: /home/cuc/workspace/FirmAE/firmwares/DIR859Ax_FW105b03.bin
Desc: PackImg section delimiter tag, little endian size: 8421120 bytes; big endian size: 8355840 bytes
Directory: /tmp/tmpt7x2ygb0
>> Skipping: completed!
>> Cleaning up /tmp/tmpt7x2ygb0...
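When many images are processed, it can help to pull the two key findings out of a debug log automatically. The sketch below does this with regular expressions that are based only on the two marker lines visible in the log above; the summarize helper is hypothetical, and other firmware may produce lines these patterns do not cover.

```python
import re


def summarize(log_text):
    """Extract the rootfs path and kernel version from an extractor debug log."""
    rootfs = re.search(r">>>> Found Linux filesystem in (\S+)!", log_text)
    kernel = re.search(r">>>> Linux kernel version (\S+)", log_text)
    return {
        "rootfs": rootfs.group(1) if rootfs else None,
        "kernel": kernel.group(1) if kernel else None,
    }


# Two lines copied from the log above.
sample = (
    ">>>> Found Linux filesystem in "
    "/tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/squashfs-root!\n"
    "        >>>> Linux kernel version 2.6.31\n"
)
summary = summarize(sample)
print(summary["rootfs"])
print(summary["kernel"])
```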

We can see that both the kernel and the filesystem were extracted.

ext_res.png
