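When we get hold of a firmware image, we usually want some basic information about it: its architecture, its kernel version, whether the file system can be extracted, and so on. FirmAE ships an extractor script that uses binwalk for this. The extraction code I had written myself was too simple, and this script covers essentially everything it did, so let's walk through the source to see how the extraction is implemented.
- Usage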
    usage: extractor.py [-h] [-sql SQL] [-nf] [-nk] [-np] [-b BRAND] [-d]
                        input [output]

    Extracts filesystem and kernel from Linux-based firmware images

    positional arguments:
      input       Input file or directory
      output      Output directory for extracted firmware

    optional arguments:
      -h, --help  show this help message and exit
      -sql SQL    Hostname of SQL server
      -nf         Disable extraction of root filesystem (may decrease extraction
                  time)
      -nk         Disable extraction of kernel (may decrease extraction time)
      -np         Disable parallel operation (may increase extraction time)
      -b BRAND    Brand of the firmware image
      -d          Print debug information
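1. How it works
1.1 Extractor.extract
The starting point is the Extractor class, which extracts the kernel and file system from firmware images. It uses multiprocessing.Pool(), which is a pool of worker processes (although the docstring calls it a thread pool), so several firmware images can be extracted concurrently. Calling the extract method of Extractor sets up the concurrent extraction.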
    def extract(self):
        """
        Perform extraction of firmware updates from input to tarballs in output
        directory using a thread pool.
        """
        # The input may be a directory or a single file; every file found
        # is collected in _list
        if os.path.isdir(self._input):
            for path, _, files in os.walk(self._input):
                for item in files:
                    self._list.append(os.path.join(path, item))
        elif os.path.isfile(self._input):
            self._list.append(self._input)
        # Create the output directory
        if self.output_dir and not os.path.isdir(self.output_dir):
            os.makedirs(self.output_dir)
        if self._pool:
            # since we have to handle multiple files in one firmware image, it
            # is better to use chunk_size=1
            # (for very long iterables a larger chunksize can speed things up;
            # with chunksize 1, next(timeout) on an imap() iterator can raise
            # a timeout error)
            chunk_size = 1
            # Drain the iterator, which blocks until every file has been
            # processed by the pool
            list(self._pool.imap_unordered(self._extract_item, self._list, chunk_size))
        else:
            for item in self._list:
                self._extract_item(item)
Each file is handled separately by self._extract_item, which internally creates an ExtractionItem to do the extraction.
    def _extract_item(self, path):
        """
        Wrapper function that creates an ExtractionItem and calls the extract()
        method.
        """
        ExtractionItem(self, path, 0, None, self.debug).extract()
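To make the pool pattern concrete, here is a minimal sketch of the same idea outside FirmAE. The names extract_one and extract_all and the parallel flag are made up for illustration; only multiprocessing.Pool and imap_unordered come from the code above.

```python
import multiprocessing


def extract_one(path):
    # Hypothetical stand-in for Extractor._extract_item: it only reports
    # which path it was handed.
    return "extracted:" + path


def extract_all(paths, parallel=True):
    """Process every path, in a process pool unless parallel is False."""
    if not parallel:
        return [extract_one(p) for p in paths]
    with multiprocessing.Pool() as pool:
        # list() drains the iterator and blocks until all items are done.
        # chunksize=1 hands each worker one item at a time.
        return list(pool.imap_unordered(extract_one, paths, 1))


if __name__ == "__main__":
    results = extract_all(["a.bin", "b.bin", "c.bin"])
    # imap_unordered yields results in completion order, so sort for display.
    print(sorted(results))
```

Since imap_unordered returns results in whatever order the workers finish, anything that needs a stable order has to sort or key the results itself.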
1.2 ExtractionItem.extract
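The extract function of ExtractionItem works as follows:
- It starts with the exit conditions:
  - Check the extraction status (exit on an interrupt signal or once everything has been extracted).
  - Check whether the recursion limits are exceeded. Extraction is recursive, and the breadth and depth can be set by the user.
- Check whether the firmware's checksum (MD5) is already in the visited set. A file whose checksum was already seen with the same status is skipped, so nothing is extracted twice.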
    # check if checksum is in visited set
    self.printf(">> MD5: %s" % self.checksum)
    with Extractor.visited_lock:
        # Skip the same checksum only in the same status
        # asus_latest(FW_RT_N12VP_30043804057.zip) firmware
        if (self.checksum in self.extractor.visited and
                self.extractor.visited[self.checksum] == self.status):
            self.printf(">> Skipping: %s..." % self.checksum)
            return self.get_status()
        else:
            self.extractor.visited[self.checksum] = self.status
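The skip rule can be tried in isolation with a small sketch. It assumes a plain dict guarded by a threading.Lock and represents the status as a (kernel done, rootfs done) tuple; the names md5_of and should_skip are hypothetical, and FirmAE keeps its own table on the Extractor class.

```python
import hashlib
import threading

visited = {}                     # checksum -> status last recorded for it
visited_lock = threading.Lock()  # assumption: a simple in-process lock


def md5_of(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def should_skip(checksum, status):
    """Return True if this checksum was already recorded with the same status."""
    with visited_lock:
        if checksum in visited and visited[checksum] == status:
            return True
        # New checksum, or same checksum under a different status: record it.
        visited[checksum] = status
        return False


if __name__ == "__main__":
    digest = md5_of(b"example firmware bytes")
    print(should_skip(digest, (False, False)))  # first visit
    print(should_skip(digest, (False, False)))  # same status again
    print(should_skip(digest, (False, True)))   # same file, status changed
```

Keying on both checksum and status means a file is revisited when the overall progress has changed, for example when the root file system has been found but the kernel has not.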
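- Check whether the file type is blacklisted; blacklisted files are not extracted.
First, the MIME type is used to exclude broad categories of files, namely any of the following types: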
    if filetype:
        if any(s in filetype for s in ["application/x-executable",
                                       "application/x-dosexec",
                                       "application/x-object",
                                       "application/x-sharedlib",
                                       "application/pdf",
                                       "application/msword",
                                       "image/", "text/", "video/"]):
            self.printf(">> Skipping: %s..." % filetype)
            return True
Next, specific file types are checked by keyword in the file-type string returned by Extractor.magic:
    filetype = Extractor.magic(real_path.encode("utf-8", "surrogateescape"))
    if filetype:
        if any(s in filetype for s in ["executable", "universal binary",
                                       "relocatable", "bytecode", "applet",
                                       "shared"]):
            self.printf(">> Skipping: %s..." % filetype)
            return True
Finally, specific file extensions that may be misidentified are checked:
    black_lists = ['.dmg', '.so', '.so.0']
    for black in black_lists:
        if self.item.endswith(black):
            self.printf(">> Skipping: %s..." % (self.item))
            return True
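The three blacklist stages can be combined into one pure function for experimentation. This is only a sketch: is_blacklisted is a hypothetical name, and the MIME type and description are passed in as plain strings so that no libmagic binding is needed. The keyword lists are copied from the snippets above.

```python
MIME_BLACKLIST = ["application/x-executable", "application/x-dosexec",
                  "application/x-object", "application/x-sharedlib",
                  "application/pdf", "application/msword",
                  "image/", "text/", "video/"]
DESC_BLACKLIST = ["executable", "universal binary", "relocatable",
                  "bytecode", "applet", "shared"]
EXT_BLACKLIST = (".dmg", ".so", ".so.0")


def is_blacklisted(path, mime_type, description):
    """Apply the MIME, description, and extension checks in that order."""
    # Stage 1: broad categories by MIME type (substring match).
    if mime_type and any(s in mime_type for s in MIME_BLACKLIST):
        return True
    # Stage 2: keywords in the textual file-type description.
    if description and any(s in description for s in DESC_BLACKLIST):
        return True
    # Stage 3: extensions that the first two stages may misidentify.
    return path.endswith(EXT_BLACKLIST)


if __name__ == "__main__":
    print(is_blacklisted("manual.pdf", "application/pdf", "PDF document"))
    print(is_blacklisted("fw.bin", "application/octet-stream", "data"))
    print(is_blacklisted("libfoo.so.0", "application/octet-stream", "data"))
```

Ordering the checks from cheapest and broadest to most specific means most uninteresting files are rejected before the later stages run.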
- Start the actual extraction
Create a temporary directory:
    self.temp = tempfile.mkdtemp()
    # Move to temporary directory so binwalk does not write to input
    os.chdir(self.temp)
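For readers who want to reuse the scratch-directory idea, one way to package it is a context manager. This is a sketch rather than FirmAE's code: scratch_dir is a hypothetical name, and restoring the previous working directory and deleting the directory on exit are my own additions.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def scratch_dir():
    """Create a temp dir, chdir into it, then restore and clean up."""
    previous = os.getcwd()
    temp = tempfile.mkdtemp()
    os.chdir(temp)
    try:
        yield temp
    finally:
        # Leave the directory before deleting it.
        os.chdir(previous)
        shutil.rmtree(temp, ignore_errors=True)


if __name__ == "__main__":
    with scratch_dir() as temp:
        # Anything written with a relative path lands in the scratch dir.
        with open("carved.bin", "wb") as handle:
            handle.write(b"\x00" * 16)
        print(os.listdir(temp))
```

Note that os.chdir changes the working directory of the whole process, so this pattern suits one extraction per process, not several threads sharing one process.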
Extraction is done with binwalk, and the decisions are driven mainly by keyword matching on the descriptions that binwalk reports.
    try:
        self.printf(">> Tag: %s" % self.tag)
        self.printf(">> Temp: %s" % self.temp)
        self.printf(">> Status: Kernel: %s, Rootfs: %s, Do_Kernel: %s, \
            Do_Rootfs: %s" % (self.get_kernel_status(),
                              self.get_rootfs_status(),
                              self.extractor.do_kernel,
                              self.extractor.do_rootfs))
        for module in binwalk.scan(self.item, "-e", "-r", "-C", self.temp,
                                   signature=True, quiet=True):
            prev_entry = None
            for entry in module.results:
                desc = entry.description
                dir_name = module.extractor.directory
                if prev_entry and prev_entry.description == desc and \
                        'Zlib comparessed data' in desc:
                    continue
                prev_entry = entry
                self.printf('========== Depth: %d ===============' % self.depth)
                self.printf("Name: %s" % self.item)
                self.printf("Desc: %s" % desc)
                self.printf("Directory: %s" % dir_name)
                self._check_firmware(module, entry)
                if not self.get_rootfs_status():
                    self._check_rootfs(module, entry)
                if not self.get_kernel_status():
                    self._check_kernel(module, entry)
                if self.update_status():
                    self.printf(">> Skipping: completed!")
                    return True
                else:
                    self._check_recursive(module, entry)
    except Exception:
        print("ERROR: ", self.item)
        traceback.print_exc()
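The duplicate check at the top of the inner loop is easy to misread, so here is a tiny sketch of just that condition. It works on plain description strings instead of binwalk result objects, and drop_repeats and its marker parameter are hypothetical names.

```python
def drop_repeats(descriptions, marker):
    """Drop an entry only if it repeats the previous one AND contains marker."""
    kept = []
    prev = None
    for desc in descriptions:
        if prev is not None and prev == desc and marker in desc:
            continue  # consecutive repeat of a marked description
        prev = desc
        kept.append(desc)
    return kept


if __name__ == "__main__":
    sample = [
        "Zlib compressed data",
        "Zlib compressed data",
        "Certificate in DER format",
        "Certificate in DER format",
    ]
    for line in drop_repeats(sample, "Zlib"):
        print(line)
```

Both parts of the condition must hold, so repeated descriptions that lack the marker are all kept. In this sample the second Zlib line is dropped while both certificate lines survive.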
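- Extraction logic
  - An entry is skipped when its description is identical to that of the previous entry and also contains the string Zlib comparessed data (spelled as in the code).
  - _check_firmware is called; if the entry is a known firmware type, it is extracted directly.
    - Does the description contain the keyword header?
      - If it contains uImage header, the image is extracted according to the value after the size keyword (for uImage).
      - If it contains rootfs offset or kernel offset, the kernel and file system are extracted according to the values after the size keyword (for TP-Link or TRX).
  - def _check_rootfs(self, module, entry)
    - Does the description contain filesystem, archive or compressed? If so, check whether the extracted result contains the usual Unix directory names:
          UNIX_DIRS = ["bin", "etc", "dev", "home", "lib", "mnt", "opt", "root", "run", "sbin", "tmp", "usr", "var"]
          UNIX_THRESHOLD = 4
  - def _check_kernel(self, module, entry)
    - Does the description contain kernel? If so, the kernel version is extracted.
  - Check whether extraction is complete; if not, keep recursing until it is. The results are stored in a PostgreSQL database.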
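The keyword checks and the directory-name heuristic above can be sketched in a few lines. Everything here is illustrative: applicable_checks, looks_like_rootfs and kernel_version are hypothetical names, the ">=" comparison against UNIX_THRESHOLD is an assumption, and the version regex is my own guess at a reasonable pattern, not FirmAE's.

```python
import os
import re

UNIX_DIRS = ["bin", "etc", "dev", "home", "lib", "mnt", "opt", "root",
             "run", "sbin", "tmp", "usr", "var"]
UNIX_THRESHOLD = 4


def applicable_checks(desc):
    """Return which of the three checks a description would trigger."""
    checks = []
    if "header" in desc:
        checks.append("firmware")
    if any(k in desc for k in ("filesystem", "archive", "compressed")):
        checks.append("rootfs")
    if "kernel" in desc:
        checks.append("kernel")
    return checks


def looks_like_rootfs(directory):
    """True if the directory holds at least UNIX_THRESHOLD known names."""
    try:
        names = set(os.listdir(directory))
    except OSError:
        return False
    # Assumption: reaching the threshold exactly already counts as a match.
    return len(names.intersection(UNIX_DIRS)) >= UNIX_THRESHOLD


def kernel_version(desc):
    """Pull a dotted version number out of a description, or return None."""
    match = re.search(r"kernel version (\d+(?:\.\d+)+)", desc)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(applicable_checks("LZMA compressed data, properties: 0x5D"))
    print(kernel_version("Linux kernel version 2.6.31"))
```

A plain substring test is cheap but loose: a description such as "Certificate in DER format (x509 v3), header length: 4" also contains header, so a real implementation still has to look for the more specific keywords before acting on it.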
2. Example
Taking the firmware DIR859Ax_FW105b03.bin as an example, let's go through the extraction process.
    python3 sources/extractor/extractor.py -sql 127.0.0.1 -d ./firmwares/DIR859Ax_FW105b03.bin ./testext2
Output log
    cuc@cuc-VirtualBox:~/workspace/FirmAE$ python3 sources/extractor/extractor.py -sql 127.0.0.1 -d ./firmwares/DIR859Ax_FW105b03.bin ./testext2
    >> Database Image ID: 2
    /home/cuc/workspace/FirmAE/firmwares/DIR859Ax_FW105b03.bin
    >> MD5: f0398570673fcc633d35dcbb672b3792
    >> Tag: 2
    >> Temp: /tmp/tmpt7x2ygb0
    >> Status: Kernel: False, Rootfs: False, Do_Kernel: True, Do_Rootfs: True
    ========== Depth: 0 ===============
    Name: /home/cuc/workspace/FirmAE/firmwares/DIR859Ax_FW105b03.bin
    Desc: DLOB firmware header, boot partition: "dev=/dev/mtdblock/1" # firmware header
    Directory: /tmp/tmpt7x2ygb0
    ========== Depth: 0 ===============
    Name: /home/cuc/workspace/FirmAE/firmwares/DIR859Ax_FW105b03.bin
    Desc: LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes, uncompressed size: 3650048 bytes
    Directory: /tmp/tmpt7x2ygb0
    >>>> Found Linux filesystem in /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/squashfs-root! # file system found
    >> Recursing into LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes, uncompressed size: 3650048 bytes ...
    /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
    >> MD5: ee431f37e5fb18459f3b9e9554e02505
    >> Tag: 2
    >> Temp: /tmp/tmp2bvlhs7m
    >> Status: Kernel: False, Rootfs: True, Do_Kernel: True, Do_Rootfs: True
    ========== Depth: 1 ===============
    Name: /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
    Desc: Certificate in DER format (x509 v3), header length: 4, sequence length: 30 # certificate
    Directory: /tmp/tmp2bvlhs7m
    ========== Depth: 1 ===============
    Name: /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
    Desc: Certificate in DER format (x509 v3), header length: 4, sequence length: 30 # certificate
    Directory: /tmp/tmp2bvlhs7m
    ========== Depth: 1 ===============
    Name: /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
    Desc: Certificate in DER format (x509 v3), header length: 4, sequence length: 30 # certificate
    Directory: /tmp/tmp2bvlhs7m
    ========== Depth: 1 ===============
    Name: /tmp/tmpt7x2ygb0/_DIR859Ax_FW105b03.bin.extracted/74
    Desc: Linux kernel version 2.6.31 # kernel 2.6
    Directory: /tmp/tmp2bvlhs7m
    >>>> Linux kernel version 2.6.31
    >> Skipping: completed!
    >> Cleaning up /tmp/tmp2bvlhs7m...
    ========== Depth: 0 ===============
    Name: /home/cuc/workspace/FirmAE/firmwares/DIR859Ax_FW105b03.bin
    Desc: PackImg section delimiter tag, little endian size: 8421120 bytes; big endian size: 8355840 bytes
    Directory: /tmp/tmpt7x2ygb0
    >> Skipping: completed!
    >> Cleaning up /tmp/tmpt7x2ygb0...
As the log shows, both the kernel and the file system were extracted.