Decompiling the dk.exe auto-fill program

dk.exe automatically fills in the school's daily health report.

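This is clearly a small application produced by packaging a Python program. Only the .exe file was available at first, so the goal here is to recover the source code from that .exe.

Existing tools

The available tools include pyinstxtractor.py, archive_viewer.py and uncompyle6. The first two are single-file scripts that work directly on the .exe and unpack it into library files plus .pyc (or .pyc-like) files. uncompyle6 decompiles a .pyc in one step; its documentation lists support for bytecode from Python 1.0 through 3.8 rather than for every Python version.

pyinstxtractor.py source: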

#!/usr/bin/python

"""
PyInstaller Extractor v1.8 (Supports pyinstaller 3.2, 3.1, 3.0, 2.1, 2.0)
Author : Extreme Coders
E-mail : extremecoders(at)hotmail(dot)com
Web    : https://0xec.blogspot.com
Date   : 28-April-2017
Url    : https://sourceforge.net/projects/pyinstallerextractor/

For any suggestions, leave a comment on
https://forum.tuts4you.com/topic/34455-pyinstaller-extractor/

This script extracts a pyinstaller generated executable file.
Pyinstaller installation is not needed. The script has it all.

For best results, it is recommended to run this script in the
same version of python as was used to create the executable.
This is just to prevent unmarshalling errors(if any) while
extracting the PYZ archive.

Usage : Just copy this script to the directory where your exe resides
        and run the script with the exe file name as a parameter

C:\path\to\exe\>python pyinstxtractor.py <filename>
$ /path/to/exe/python pyinstxtractor.py <filename>

Licensed under GNU General Public License (GPL) v3.
You are free to modify this source.

CHANGELOG
================================================

Version 1.1 (Jan 28, 2014)
-------------------------------------------------
- First Release
- Supports only pyinstaller 2.0

Version 1.2 (Sept 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 2.1 and 3.0 dev
- Cleaned up code
- Script is now more verbose
- Executable extracted within a dedicated sub-directory

(Support for pyinstaller 3.0 dev is experimental)

Version 1.3 (Dec 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 3.0 final
- Script is compatible with both python 2.x & 3.x (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)

Version 1.4 (Jan 19, 2016)
-------------------------------------------------
- Fixed a bug when writing pyc files >= version 3.3 (Thanks to Daniello Alto: https://github.com/Djamana)

Version 1.5 (March 1, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.1 (Thanks to Berwyn Hoyt for reporting)

Version 1.6 (Sept 5, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.2
- Extractor will use a random name while extracting unnamed files.
- For encrypted pyz archives it will dump the contents as is. Previously, the tool would fail.

Version 1.7 (March 13, 2017)
-------------------------------------------------
- Made the script compatible with python 2.6 (Thanks to Ross for reporting)

Version 1.8 (April 28, 2017)
-------------------------------------------------
- Support for sub-directories in .pyz files (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)
"""

"""
Author: In Ming Loh
Email: inming.loh@countercept.com

Changes have been made to Version 1.8 (April 28, 2017).

CHANGELOG
================================================
- Function extractFiles(self, custom_dir=None) has been modfied to allow custom output directory.
"""

import os
import struct
import marshal
import zlib
import sys
import imp
import types
from uuid import uuid4 as uniquename


class CTOCEntry:
    def __init__(self, position, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name):
        self.position = position
        self.cmprsdDataSize = cmprsdDataSize
        self.uncmprsdDataSize = uncmprsdDataSize
        self.cmprsFlag = cmprsFlag
        self.typeCmprsData = typeCmprsData
        self.name = name


class PyInstArchive:
    PYINST20_COOKIE_SIZE = 24           # For pyinstaller 2.0
    PYINST21_COOKIE_SIZE = 24 + 64      # For pyinstaller 2.1+
    MAGIC = b'MEI\014\013\012\013\016'  # Magic number which identifies pyinstaller

    def __init__(self, path):
        self.filePath = path

    def open(self):
        try:
            self.fPtr = open(self.filePath, 'rb')
            self.fileSize = os.stat(self.filePath).st_size
        except:
            print('[*] Error: Could not open {0}'.format(self.filePath))
            return False
        return True

    def close(self):
        try:
            self.fPtr.close()
        except:
            pass

    def checkFile(self):
        print('[*] Processing {0}'.format(self.filePath))
        # Check if it is a 2.0 archive
        self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))

        if magicFromFile == self.MAGIC:
            self.pyinstVer = 20     # pyinstaller 2.0
            print('[*] Pyinstaller version: 2.0')
            return True

        # Check for pyinstaller 2.1+ before bailing out
        self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))

        if magicFromFile == self.MAGIC:
            print('[*] Pyinstaller version: 2.1+')
            self.pyinstVer = 21     # pyinstaller 2.1+
            return True

        print('[*] Error : Unsupported pyinstaller version or not a pyinstaller archive')
        return False

    def getCArchiveInfo(self):
        try:
            if self.pyinstVer == 20:
                self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver) = \
                struct.unpack('!8siiii', self.fPtr.read(self.PYINST20_COOKIE_SIZE))

            elif self.pyinstVer == 21:
                self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver, pylibname) = \
                struct.unpack('!8siiii64s', self.fPtr.read(self.PYINST21_COOKIE_SIZE))

        except:
            print('[*] Error : The file is not a pyinstaller archive')
            return False

        print('[*] Python version: {0}'.format(self.pyver))

        # Overlay is the data appended at the end of the PE
        self.overlaySize = lengthofPackage
        self.overlayPos = self.fileSize - self.overlaySize
        self.tableOfContentsPos = self.overlayPos + toc
        self.tableOfContentsSize = tocLen

        print('[*] Length of package: {0} bytes'.format(self.overlaySize))
        return True

    def parseTOC(self):
        # Go to the table of contents
        self.fPtr.seek(self.tableOfContentsPos, os.SEEK_SET)

        self.tocList = []
        parsedLen = 0

        # Parse table of contents
        while parsedLen < self.tableOfContentsSize:
            (entrySize, ) = struct.unpack('!i', self.fPtr.read(4))
            nameLen = struct.calcsize('!iiiiBc')

            (entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name) = \
            struct.unpack( \
                '!iiiBc{0}s'.format(entrySize - nameLen), \
                self.fPtr.read(entrySize - 4))

            name = name.decode('utf-8').rstrip('\0')
            if len(name) == 0:
                name = str(uniquename())
                print('[!] Warning: Found an unamed file in CArchive. Using random name {0}'.format(name))

            self.tocList.append( \
                CTOCEntry( \
                    self.overlayPos + entryPos, \
                    cmprsdDataSize, \
                    uncmprsdDataSize, \
                    cmprsFlag, \
                    typeCmprsData, \
                    name \
                ))

            parsedLen += entrySize
        print('[*] Found {0} files in CArchive'.format(len(self.tocList)))

    def extractFiles(self, custom_dir=None):
        print('[*] Beginning extraction...please standby')
        if custom_dir is None:
            extractionDir = os.path.join(os.getcwd(), os.path.basename(self.filePath) + '_extracted')

            if not os.path.exists(extractionDir):
                os.mkdir(extractionDir)

            os.chdir(extractionDir)
        else:
            if not os.path.exists(custom_dir):
                os.makedirs(custom_dir)
            os.chdir(custom_dir)

        for entry in self.tocList:
            basePath = os.path.dirname(entry.name)
            if basePath != '':
                # Check if path exists, create if not
                if not os.path.exists(basePath):
                    os.makedirs(basePath)

            self.fPtr.seek(entry.position, os.SEEK_SET)
            data = self.fPtr.read(entry.cmprsdDataSize)

            if entry.cmprsFlag == 1:
                data = zlib.decompress(data)
                # Malware may tamper with the uncompressed size
                # Comment out the assertion in such a case
                assert len(data) == entry.uncmprsdDataSize  # Sanity Check

            with open(entry.name, 'wb') as f:
                f.write(data)

            if entry.typeCmprsData == b'z':
                self._extractPyz(entry.name)

    def _extractPyz(self, name):
        dirName = name + '_extracted'
        # Create a directory for the contents of the pyz
        if not os.path.exists(dirName):
            os.mkdir(dirName)

        with open(name, 'rb') as f:
            pyzMagic = f.read(4)
            assert pyzMagic == b'PYZ\0'  # Sanity Check

            pycHeader = f.read(4)  # Python magic value

            if imp.get_magic() != pycHeader:
                print('[!] Warning: The script is running in a different python version than the one used to build the executable')
                print(' Run this script in Python{0} to prevent extraction errors(if any) during unmarshalling'.format(self.pyver))

            (tocPosition, ) = struct.unpack('!i', f.read(4))
            f.seek(tocPosition, os.SEEK_SET)

            try:
                toc = marshal.load(f)
            except:
                print('[!] Unmarshalling FAILED. Cannot extract {0}. Extracting remaining files.'.format(name))
                return

            print('[*] Found {0} files in PYZ archive'.format(len(toc)))

            # From pyinstaller 3.1+ toc is a list of tuples
            if type(toc) == list:
                toc = dict(toc)

            for key in toc.keys():
                (ispkg, pos, length) = toc[key]
                f.seek(pos, os.SEEK_SET)

                fileName = key
                try:
                    # for Python > 3.3 some keys are bytes object some are str object
                    fileName = key.decode('utf-8')
                except:
                    pass

                # Make sure destination directory exists, ensuring we keep inside dirName
                destName = os.path.join(dirName, fileName.replace("..", "__"))
                destDirName = os.path.dirname(destName)
                if not os.path.exists(destDirName):
                    os.makedirs(destDirName)

                try:
                    data = f.read(length)
                    data = zlib.decompress(data)
                except:
                    print('[!] Error: Failed to decompress {0}, probably encrypted. Extracting as is.'.format(fileName))
                    open(destName + '.pyc.encrypted', 'wb').write(data)
                    continue

                with open(destName + '.pyc', 'wb') as pycFile:
                    pycFile.write(pycHeader)      # Write pyc magic
                    pycFile.write(b'\0' * 4)      # Write timestamp
                    if self.pyver >= 33:
                        pycFile.write(b'\0' * 4)  # Size parameter added in Python 3.3
                    pycFile.write(data)


def main():
    if len(sys.argv) < 2:
        print('[*] Usage: pyinstxtractor.py <filename>')

    else:
        arch = PyInstArchive(sys.argv[1])
        if arch.open():
            if arch.checkFile():
                if arch.getCArchiveInfo():
                    arch.parseTOC()
                    arch.extractFiles()
                    arch.close()
                    print('[*] Successfully extracted pyinstaller archive: {0}'.format(sys.argv[1]))
                    print('')
                    print('You can now use a python decompiler on the pyc files within the extracted directory')
                    return

            arch.close()


if __name__ == '__main__':
    main()

archive_viewer.py source:

#-----------------------------------------------------------------------------
# Copyright (c) 2013-2021, PyInstaller Development Team.
#
# Distributed under the terms of the GNU General Public License (version 2
# or later) with exception for distributing the bootloader.
#
# The full license is in the file COPYING.txt, distributed with this software.
#
# SPDX-License-Identifier: (GPL-2.0-or-later WITH Bootloader-exception)
#-----------------------------------------------------------------------------

"""
Viewer for archives packaged by archive.py
"""

import argparse
import os
import pprint
import sys
import tempfile
import zlib

from PyInstaller.loader import pyimod02_archive
from PyInstaller.archive.readers import CArchiveReader, NotAnArchiveError
from PyInstaller.compat import stdin_input
import PyInstaller.log

stack = []
cleanup = []


def main(name, brief, debug, rec_debug, **unused_options):

    global stack

    if not os.path.isfile(name):
        print(name, "is an invalid file name!", file=sys.stderr)
        return 1

    arch = get_archive(name)
    stack.append((name, arch))
    if debug or brief:
        show_log(arch, rec_debug, brief)
        raise SystemExit(0)
    else:
        show(name, arch)

    while 1:
        try:
            toks = stdin_input('? ').split(None, 1)
        except EOFError:
            # Ctrl-D
            print(file=sys.stderr)  # Clear line.
            break
        if not toks:
            usage()
            continue
        if len(toks) == 1:
            cmd = toks[0]
            arg = ''
        else:
            cmd, arg = toks
        cmd = cmd.upper()
        if cmd == 'U':
            if len(stack) > 1:
                arch = stack[-1][1]
                arch.lib.close()
                del stack[-1]
            name, arch = stack[-1]
            show(name, arch)
        elif cmd == 'O':
            if not arg:
                arg = stdin_input('open name? ')
            arg = arg.strip()
            try:
                arch = get_archive(arg)
            except NotAnArchiveError as e:
                print(e, file=sys.stderr)
                continue
            if arch is None:
                print(arg, "not found", file=sys.stderr)
                continue
            stack.append((arg, arch))
            show(arg, arch)
        elif cmd == 'X':
            if not arg:
                arg = stdin_input('extract name? ')
            arg = arg.strip()
            data = get_data(arg, arch)
            if data is None:
                print("Not found", file=sys.stderr)
                continue
            filename = stdin_input('to filename? ')
            if not filename:
                print(repr(data))
            else:
                with open(filename, 'wb') as fp:
                    fp.write(data)
        elif cmd == 'Q':
            break
        else:
            usage()
    do_cleanup()


def do_cleanup():
    global stack, cleanup
    for (name, arch) in stack:
        arch.lib.close()
    stack = []
    for filename in cleanup:
        try:
            os.remove(filename)
        except Exception as e:
            print("couldn't delete", filename, e.args, file=sys.stderr)
    cleanup = []


def usage():
    print("U: go Up one level", file=sys.stderr)
    print("O <name>: open embedded archive name", file=sys.stderr)
    print("X <name>: extract name", file=sys.stderr)
    print("Q: quit", file=sys.stderr)


def get_archive(name):
    if not stack:
        if name[-4:].lower() == '.pyz':
            return ZlibArchive(name)
        return CArchiveReader(name)
    parent = stack[-1][1]
    try:
        return parent.openEmbedded(name)
    except KeyError:
        return None
    except (ValueError, RuntimeError):
        ndx = parent.toc.find(name)
        dpos, dlen, ulen, flag, typcd, name = parent.toc[ndx]
        x, data = parent.extract(ndx)
        tempfilename = tempfile.mktemp()
        cleanup.append(tempfilename)
        with open(tempfilename, 'wb') as fp:
            fp.write(data)
        if typcd == 'z':
            return ZlibArchive(tempfilename)
        else:
            return CArchiveReader(tempfilename)


def get_data(name, arch):
    if isinstance(arch.toc, dict):
        (ispkg, pos, length) = arch.toc.get(name, (0, None, 0))
        if pos is None:
            return None
        with arch.lib:
            arch.lib.seek(arch.start + pos)
            return zlib.decompress(arch.lib.read(length))
    ndx = arch.toc.find(name)
    dpos, dlen, ulen, flag, typcd, name = arch.toc[ndx]
    x, data = arch.extract(ndx)
    return data


def show(name, arch):
    if isinstance(arch.toc, dict):
        print(" Name: (ispkg, pos, len)")
        toc = arch.toc
    else:
        print(" pos, length, uncompressed, iscompressed, type, name")
        toc = arch.toc.data
    pprint.pprint(toc)


def get_content(arch, recursive, brief, output):
    if isinstance(arch.toc, dict):
        toc = arch.toc
        if brief:
            for name, _ in toc.items():
                output.append(name)
        else:
            output.append(toc)
    else:
        toc = arch.toc.data
        for el in toc:
            if brief:
                output.append(el[5])
            else:
                output.append(el)
            if recursive:
                if el[4] in ('z', 'a'):
                    get_content(get_archive(el[5]), recursive, brief, output)
                    stack.pop()


def show_log(arch, recursive, brief):
    output = []
    get_content(arch, recursive, brief, output)
    # first print all TOCs
    for out in output:
        if isinstance(out, dict):
            pprint.pprint(out)
    # then print the other entries
    pprint.pprint([out for out in output if not isinstance(out, dict)])


def get_archive_content(filename):
    """
    Get a list of the (recursive) content of archive `filename`.
    This function is primary meant to be used by runtests.
    """
    archive = get_archive(filename)
    stack.append((filename, archive))
    output = []
    get_content(archive, recursive=True, brief=True, output=output)
    do_cleanup()
    return output


class ZlibArchive(pyimod02_archive.ZlibArchiveReader):

    def checkmagic(self):
        """ Overridable.
            Check to see if the file object self.lib actually has a file
            we understand.
        """
        self.lib.seek(self.start)  # default - magic is at start of file.
        if self.lib.read(len(self.MAGIC)) != self.MAGIC:
            raise RuntimeError("%s is not a valid %s archive file"
                               % (self.path, self.__class__.__name__))
        if self.lib.read(len(self.pymagic)) != self.pymagic:
            print("Warning: pyz is from a different Python version",
                  file=sys.stderr)
        self.lib.read(4)


def run():
    parser = argparse.ArgumentParser()
    parser.add_argument('-l', '--log',
                        default=False,
                        action='store_true',
                        dest='debug',
                        help='Print an archive log (default: %(default)s)')
    parser.add_argument('-r', '--recursive',
                        default=False,
                        action='store_true',
                        dest='rec_debug',
                        help='Recursively print an archive log (default: %(default)s). '
                             'Can be combined with -r')
    parser.add_argument('-b', '--brief',
                        default=False,
                        action='store_true',
                        dest='brief',
                        help='Print only file name. (default: %(default)s). '
                             'Can be combined with -r')
    PyInstaller.log.__add_options(parser)
    parser.add_argument('name', metavar='pyi_archive',
                        help="pyinstaller archive to show content of")

    args = parser.parse_args()
    PyInstaller.log.__process_options(parser, args)

    try:
        raise SystemExit(main(**vars(args)))
    except KeyboardInterrupt:
        raise SystemExit("Aborted by user request.")


if __name__ == '__main__':
    run()

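Obtaining the .pyc file

When you have the Python source, the built-in py_compile module compiles a .py file into a .pyc file, which gives the source some protection. With the tools above, however, turning a .pyc or .exe back into .py becomes far easier.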
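As a minimal sketch of the compile step mentioned above, the snippet below writes a throwaway source file, compiles it with py_compile, and compares the first 4 bytes of the result with the running interpreter's magic number. The file names t.py and t.pyc and the helper name compile_demo are illustrative assumptions, not part of dk.exe.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def compile_demo(source_text):
    """Compile source_text to a .pyc in a temp directory and return the .pyc bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "t.py"    # hypothetical file name
        dst = Path(tmp) / "t.pyc"   # hypothetical file name
        src.write_text(source_text, encoding="utf-8")
        # cfile= picks the output path; doraise=True raises on compile errors.
        py_compile.compile(str(src), cfile=str(dst), doraise=True)
        return dst.read_bytes()


pyc_bytes = compile_demo("print('Hello, world')\n")
# The first 4 bytes are the magic number of the interpreter that compiled the file.
print(pyc_bytes[:4].hex(), importlib.util.MAGIC_NUMBER.hex())
```

The two printed values should match, which is the property the later repair step relies on.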

Run python pyinstxtractor.py dk.exe in cmd to obtain the first-pass extraction.

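When it finishes, a dk.exe_extracted folder appears. The hope was to find ready-to-use .pyc files in it, but the folder contains none. Most of the files are libraries with no value here; the ones worth attention are the few files with no extension.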
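A small sketch of how the extension-less candidates could be picked out of the extracted folder. The helper name files_without_suffix and the demo file names are assumptions made for illustration; on the real data the folder would be dk.exe_extracted.

```python
import tempfile
from pathlib import Path


def files_without_suffix(folder):
    """Return the sorted names of top-level files in folder that have no extension."""
    root = Path(folder)
    return sorted(p.name for p in root.iterdir() if p.is_file() and p.suffix == "")


# Demo on a throwaway folder with made-up contents (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dk", "struct", "python37.dll", "base_library.zip"):
        (Path(tmp) / name).write_bytes(b"")
    candidates = files_without_suffix(tmp)

print(candidates)
```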

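This version of pyinstxtractor.py has a shortcoming: it does not write these files out in a valid .pyc format, so they have to be repaired by hand. Simply renaming the dk file (adding a .pyc suffix manually) does not make it decompilable, because the extension is not the real problem. What matters is the magic number at the start of the dk file, and different Python versions use different magic numbers.

Because the .pyc file is being edited at the binary level, its file structure needs a closer look.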

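The struct file carries the header of a valid .pyc file, whereas the dk file is a .pyc body with the header missing. Copying the first row of struct as shown in the hex editor (16 bytes in the usual layout, which is the header length for Python 3.7 and later) to the front of the dk file is enough.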

This yields a valid dk_true.pyc file.
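The manual hex-editor repair can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header is taken to be 16 bytes (Python 3.7 or later), the helper name prepend_header is made up, and the demo uses stand-in files rather than the real struct and dk files.

```python
import tempfile
from pathlib import Path

HEADER_LEN = 16  # assumption: Python 3.7+ header (magic + flags + 8 bytes of source info)


def prepend_header(struct_path, body_path, out_path, header_len=HEADER_LEN):
    """Write the first header_len bytes of struct_path followed by all of body_path."""
    header = Path(struct_path).read_bytes()[:header_len]
    if len(header) != header_len:
        raise ValueError("donor file is shorter than the requested header length")
    Path(out_path).write_bytes(header + Path(body_path).read_bytes())
    return header


# Demo with stand-in files; the real inputs would be the extracted struct and dk files.
with tempfile.TemporaryDirectory() as tmp:
    donor = Path(tmp) / "struct"
    body = Path(tmp) / "dk"
    fixed = Path(tmp) / "dk_true.pyc"
    donor.write_bytes(bytes.fromhex("420d0d0a00000000b9c7895e57000000") + b"\xe3rest-of-struct")
    body.write_bytes(b"\xe3marshalled-code-goes-here")
    used_header = prepend_header(donor, body, fixed)
    repaired = fixed.read_bytes()

print(used_header.hex())
```

For an executable built with an older Python, header_len would have to be 8 or 12 instead of 16.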

Obtaining the source code

uncompyle6 does the rest. Running uncompyle6 -o dk_true.py dk_true.pyc in cmd produces the source file.

The recovered source:

import tkinter as tk, tkinter.messagebox, pickle, requests, re, json
session = requests.Session()

def gui():
    window = tk.Tk()
    window.title('便捷化打卡系统')  # "Convenient check-in system"
    screenWidth = window.winfo_screenwidth()
    screenHeight = window.winfo_screenheight()
    width = 300
    height = 200
    left = (screenWidth - width) / 2
    top = (screenHeight - height) / 2
    window.geometry('%dx%d+%d+%d' % (width, height, left, top))
    tk.Label(window, text='学号', font=('Arial', 14)).place(x=10, y=10)  # "Student ID"
    tk.Label(window, text='密码', font=('Arial', 14)).place(x=10, y=80)  # "Password"
    var_usr_name = tk.StringVar()
    entry_usr_name = tk.Entry(window, textvariable=var_usr_name, font=('Arial', 14))
    entry_usr_name.place(x=60, y=10)
    var_usr_pwd = tk.StringVar()
    entry_usr_pwd = tk.Entry(window, textvariable=var_usr_pwd, font=('Arial', 14), show='*')
    entry_usr_pwd.place(x=60, y=80)

    def usr_login():
        global password
        global username
        username = var_usr_name.get()
        password = var_usr_pwd.get()
        r = check()
        if r['m'] == '操作成功':  # server reply: "operation succeeded"
            json_data = post()
            if '今天已经填报了' in json_data['m']:  # "already reported today"
                tkinter.messagebox.showinfo(title='打卡系统', message='已经填报过了噢!')
            elif '操作成功' in json_data['m']:
                tkinter.messagebox.showerror(title='打卡系统', message='今日填报成功!')
        else:
            # "wrong account or password"
            tkinter.messagebox.showerror(title='打卡系统', message='账户/密码有误')

    btn_login = tk.Button(window, text='打卡', command=usr_login)  # "Check in"
    btn_login.place(x=110, y=150)
    window.mainloop()


def check():
    url = 'https://wfw.scu.edu.cn/a_scu/api/sso/check'
    data = {'username':username,
     'password':password,
     'redirect':'https://wfw.scu.edu.cn/ncov/wap/default/index'}
    header = {'Referer':'https://wfw.scu.edu.cn/site/polymerization/polymerizationLogin?redirect=https%3A%2F%2Fwfw.scu.edu.cn%2Fncov%2Fwap%2Fdefault%2Findex&from=wap',
     'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3754.400 QQBrowser/10.5.4034.400',
     'Host':'wfw.scu.edu.cn',
     'Origin':'https://wfw.scu.edu.cn'}
    r = session.post(url, data=data, headers=header, timeout=3).json()
    return r


def data_get() -> dict:
    url_for_id = 'https://wfw.scu.edu.cn/ncov/wap/default/index'
    header = {'Referer':'https://wfw.scu.edu.cn/site/polymerization/polymerizationLogin?redirect=https%3A%2F%2Fwfw.scu.edu.cn%2Fncov%2Fwap%2Fdefault%2Findex&from=wap',
     'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3754.400 QQBrowser/10.5.4034.400',
     'Host':'wfw.scu.edu.cn',
     'Origin':'https://wfw.scu.edu.cn'}
    r2 = session.get(url_for_id, headers=header).text
    x = re.findall('.*?oldInfo: (.*),.*?', r2)
    data = eval(x[0])
    return data


def post() -> json:
    headers = {'Host':'wfw.scu.edu.cn',
     'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3754.400 QQBrowser/10.5.4034.400',
     'Accept':'application/json,text/javascript,*/*;q=0.01',
     'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
     'Accept-Encoding':'gzip,deflate,br',
     'Content-Type':'application/x-www-form-urlencoded;',
     'X-Requested-With':'XMLHttpRequest',
     'Content-Length':'2082',
     'Origin':'https://wfw.scu.edu.cn',
     'Connection':'keep-alive',
     'Referer':'https://wfw.scu.edu.cn/ncov/wap/default/index'}
    data = data_get()
    r1 = session.post('https://wfw.scu.edu.cn/ncov/wap/default/save', headers=headers, data=data)
    return r1.json()


if __name__ == '__main__':
    gui()

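Extension: the .pyc file structure

.pyc files are intermediate files produced when Python compiles source code. They are binary and can be executed directly by the Python virtual machine, so they matter for both compiling and decompiling Python.

The result of compiling Python code is a PyCodeObject. The virtual machine can load a PyCodeObject and run it directly, and a .pyc file is the form in which a PyCodeObject is saved on disk.

A .pyc is therefore a PyCodeObject plus a header. It contains:

  • a 4-byte magic number
  • source-file information (12 bytes in Python 3.7 and later; shorter in earlier versions)
  • the serialized PyCodeObject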
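The three parts listed above can be separated with plain slicing. The sketch below assumes the running interpreter is Python 3.7 or later (16-byte header) and uses a throwaway file named t.py; both are assumptions for illustration. marshal.loads only understands bytecode written by the same interpreter version, which is why the sample .pyc is generated on the spot.

```python
import marshal
import py_compile
import tempfile
import types
from pathlib import Path

HEADER_LEN = 16  # assumption: Python 3.7+ (4-byte magic + 12 bytes of source info)

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "t.py"  # hypothetical file name
    src.write_text("answer = 6 * 7\n", encoding="utf-8")
    pyc_path = py_compile.compile(str(src), cfile=str(Path(tmp) / "t.pyc"), doraise=True)
    raw = Path(pyc_path).read_bytes()

magic, source_info, body = raw[:4], raw[4:HEADER_LEN], raw[HEADER_LEN:]
code = marshal.loads(body)  # the serialized PyCodeObject

namespace = {}
exec(code, namespace)       # run the recovered code object
print(magic.hex(), source_info.hex(), type(code).__name__, namespace["answer"])
```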

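Magic number

The most notable feature of the .pyc format is that each Python version writes a different magic number. For Python 2.7, a 2-byte value is written in little-endian order and followed by b'\r\n', which gives the 4-byte magic number: MAGIC_NUMBER = (62211).to_bytes(2, 'little') + b'\r\n'

The first 32 bytes of a .pyc generated by Python 2.7 (the first 4 bytes are 03f3 0d0a):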

00000000: 03f3 0d0a b9c7 895e 6300 0000 0000 0000
00000010: 0003 0000 0040 0000 0073 1f00 0000 6400
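The construction of the magic number can be checked directly. In the sketch below, 62211 is the Python 2.7 value quoted above; 3379 and 3394 are the values that correspond to the 330d and 420d prefixes seen in the Python 3.6 and 3.7 dumps in this article. The helper name magic_from_number is made up for the example.

```python
import importlib.util


def magic_from_number(number):
    """Build the 4-byte .pyc magic: a 2-byte little-endian value followed by CR LF."""
    return number.to_bytes(2, "little") + b"\r\n"


KNOWN = {"2.7": 62211, "3.6": 3379, "3.7": 3394}
magics = {version: magic_from_number(n) for version, n in KNOWN.items()}

for version, magic in magics.items():
    print(version, magic.hex())

# The running interpreter exposes its own magic number here.
print("this interpreter:", importlib.util.MAGIC_NUMBER.hex())
```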

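Source-file information

This part differs considerably between Python versions. In Python 2 only 4 bytes carry information: the modification time of the source file as a Unix timestamp, in seconds, written in little-endian order. For example: (1586087865).to_bytes(4, 'little').hex() -> 'b9c7895e'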
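Going the other way, the 4 timestamp bytes from the Python 2.7 dump above can be decoded with struct. This is a small illustrative sketch; the variable names are arbitrary.

```python
import struct
from datetime import datetime, timezone

raw_mtime = bytes.fromhex("b9c7895e")       # bytes 4..7 of the Python 2.7 dump
(mtime,) = struct.unpack("<I", raw_mtime)   # "<I" = little-endian unsigned 32-bit
stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)

print(mtime, stamp.isoformat())
```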

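From Python 3.3 onward (Python 3.5 and 3.6, for example), 4 more bytes follow the timestamp and hold the size of the source file in bytes, also in little-endian order. If the source file is 87 bytes long, the field is written as 5700 0000 and stored right after the modification time, giving b9c7 895e 5700 0000. The first 32 bytes of a .pyc generated by Python 3.6: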

00000000: 330d 0d0a b9c7 895e 5700 0000 e300 0000
00000010: 0000 0000 0000 0000 0003 0000 0040 0000

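Python 3.7 added support for hash-based .pyc files: Python can detect a modified source file by checking a hash as well as by checking the timestamp. Supporting this adds 4 bytes to the source-information part, so current versions use 12 bytes in total. Hash checking is off by default (it can be enabled by calling py_compile.compile with invalidation_mode=PycInvalidationMode.CHECKED_HASH). When it is off, the first 4 bytes are 0000 0000 and the remaining 8 bytes are the modification time and size of the source file, as in earlier versions such as Python 3.6. When it is on, the first 4 bytes are 0100 0000 or 0300 0000 and the remaining 8 bytes are the hash of the source file.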
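The two header layouts described above can be told apart from the flags word. The following is a sketch of a parser for the 16-byte Python 3.7+ header only; the function name parse_pyc_header and the dictionary keys are assumptions for illustration. The sample bytes are the first 16 bytes of the Python 3.7 dump shown in this article.

```python
import struct


def parse_pyc_header(header):
    """Decode a 16-byte Python 3.7+ .pyc header into a dict of its fields."""
    if len(header) < 16:
        raise ValueError("need at least 16 bytes")
    (flags,) = struct.unpack("<I", header[4:8])
    info = {"magic": header[:4].hex(), "flags": flags}
    if flags & 0x01:
        # Hash-based .pyc: bit 1 says whether the source is re-checked on import.
        info["check_source"] = bool(flags & 0x02)
        info["source_hash"] = header[8:16].hex()
    else:
        # Timestamp-based .pyc: modification time, then source size.
        info["mtime"], info["source_size"] = struct.unpack("<II", header[8:16])
    return info


sample = bytes.fromhex("420d0d0a 00000000 b9c7895e 57000000")
parsed = parse_pyc_header(sample)
print(parsed)
```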

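The PyCodeObject

PyCodeObject is a struct defined in Include/code.h of the Python source. Its data is serialized by Python's marshal module and stored in the .pyc file. The contents of PyCodeObject differ between versions, which is why .pyc files produced by different Python versions are not fully interchangeable.

The marshal module implements serialization for a set of basic Python objects (PyObject). Each serialized PyObject starts with one byte that identifies its type. The type codes are listed below; PyCodeObject uses TYPE_CODE, so its first byte is 0x63 ('c'):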

// Python/marshal.c
// ......
#define TYPE_NULL '0'
#define TYPE_NONE 'N'
#define TYPE_FALSE 'F'
#define TYPE_TRUE 'T'
#define TYPE_STOPITER 'S'
#define TYPE_ELLIPSIS '.'
#define TYPE_INT 'i'
/* TYPE_INT64 is not generated anymore.
Supported for backward compatibility only. */
#define TYPE_INT64 'I'
#define TYPE_FLOAT 'f'
#define TYPE_BINARY_FLOAT 'g'
#define TYPE_COMPLEX 'x'
#define TYPE_BINARY_COMPLEX 'y'
#define TYPE_LONG 'l'
#define TYPE_STRING 's'
#define TYPE_INTERNED 't'
#define TYPE_REF 'r'
#define TYPE_TUPLE '('
#define TYPE_LIST '['
#define TYPE_DICT '{'
#define TYPE_CODE 'c'
#define TYPE_UNICODE 'u'
#define TYPE_UNKNOWN '?'
#define TYPE_SET '<'
#define TYPE_FROZENSET '>'
#define FLAG_REF '\x80' /* with a type, add obj to index */

// The following are supported in Python 3.5 and later
#define TYPE_ASCII 'a'
#define TYPE_ASCII_INTERNED 'A'
#define TYPE_SMALL_TUPLE ')'
#define TYPE_SHORT_ASCII 'z'
#define TYPE_SHORT_ASCII_INTERNED 'Z'
// ......

The first 32 bytes of a .pyc generated by Python 3.7:

00000000: 420d 0d0a 0000 0000 b9c7 895e 5700 0000
00000010: e300 0000 0000 0000 0000 0000 0003 0000

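The 17th byte (the first byte of the PyCodeObject) is 0xe3. The first byte of a PyObject can also carry a flag (#define FLAG_REF '\x80'), so the byte here is 0x63 | 0x80 -> 0xe3. FLAG_REF adds the object to a reference list; when the same object appears again it does not need to be serialized a second time and can be fetched with TYPE_REF instead, which is an optimization of the serialization.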
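The masking can be tried on the running interpreter. In this sketch the constants TYPE_CODE and FLAG_REF mirror the marshal.c definitions quoted above; whether the flag bit is actually set for a given object is up to CPython, so the example only relies on the type code after masking.

```python
import marshal

TYPE_CODE = ord("c")  # 0x63
FLAG_REF = 0x80

code = compile("x = 1", "<demo>", "exec")
first_byte = marshal.dumps(code)[0]

# Mask off the flag bit to recover the type code.
type_code = first_byte & ~FLAG_REF & 0xFF
has_ref_flag = bool(first_byte & FLAG_REF)

print(hex(first_byte), chr(type_code), has_ref_flag)
print(hex(TYPE_CODE | FLAG_REF))  # 0xe3, the byte seen in the dump
```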

A PyCodeObject typically has the following fields and types:

/* Bytecode object */
typedef struct {
    PyObject_HEAD
    int co_argcount;            /* #arguments, except *args */
    int co_posonlyargcount;     /* #positional only arguments */
    int co_kwonlyargcount;      /* #keyword only arguments */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    int co_firstlineno;         /* first source line number */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest aren't used in either hash or comparisons, except for co_name,
       used in both. This is done to preserve the name and line number
       for tracebacks and debuggers; otherwise, constant de-duplication
       would collapse identical functions/lambdas defined on different lines.
    */
    Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
    PyObject *co_filename;      /* unicode (where it was loaded from) */
    PyObject *co_name;          /* unicode (name, for reference) */
    PyObject *co_lnotab;        /* string (encoding addr<->lineno mapping) See
                                   Objects/lnotab_notes.txt for details. */
    // ......
} PyCodeObject;

Every field plays a role when the virtual machine executes a .pyc, but not all of them have to be written to the .pyc file. The part of marshal that serializes a PyCodeObject:

// ......
else if (PyCode_Check(v)) {
    PyCodeObject *co = (PyCodeObject *)v;
    W_TYPE(TYPE_CODE, p);
    w_long(co->co_argcount, p);
    w_long(co->co_kwonlyargcount, p);
    w_long(co->co_nlocals, p);
    w_long(co->co_stacksize, p);
    w_long(co->co_flags, p);
    w_object(co->co_code, p);
    w_object(co->co_consts, p);
    w_object(co->co_names, p);
    w_object(co->co_varnames, p);
    w_object(co->co_freevars, p);
    w_object(co->co_cellvars, p);
    w_object(co->co_filename, p);
    w_object(co->co_name, p);
    w_long(co->co_firstlineno, p);
    w_object(co->co_lnotab, p);
}
// ......

Python uses marshal.dump to turn a PyCodeObject into its binary layout. The fields appear in the file as follows:

Field                 Type    Description
TYPE_CODE             byte    marks a PyCodeObject
co_argcount           long    the corresponding fields of the PyCodeObject struct
co_nlocals            long
co_stacksize          long
co_flags              long
TYPE_STRING           byte    string marker; this string is co_code of the PyCodeObject
co_code size          long
co_code value         bytes
TYPE_LIST             byte    a list
co_consts size        long    number of elements in co_consts
TYPE_INT              byte    co_consts[0] is an integer
co_consts[0]          long
TYPE_STRING           byte    co_consts[1] is a string
co_consts[1] size     long
co_consts[1] value    bytes
TYPE_CODE             byte    co_consts[2] is another PyCodeObject; its code may be a function or a class
co_consts[2]
...

Here byte means exactly 1 byte, long means 4 bytes, and bytes means one or more bytes. Note that every field of the PyCodeObject and its value appear in the binary file in a fixed order.
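The fixed-width fields at the start of a serialized PyCodeObject can be read with struct. The sketch below uses bytes 0x10 to 0x2f of the Python 3.7 dump shown in this article and follows the five w_long calls of the marshal.c excerpt (so it includes co_kwonlyargcount); the variable names are illustrative.

```python
import struct

# Bytes 0x10..0x2f of the Python 3.7 dump.
code_start = bytes.fromhex(
    "e300 0000 0000 0000 0000 0000 0003 0000"
    "0040 0000 0073 1e00 0000 6500 6400 8301"
)

type_byte = code_start[0]
# Five 4-byte little-endian integers follow the type byte in Python 3.7.
argcount, kwonlyargcount, nlocals, stacksize, flags = struct.unpack("<5i", code_start[1:21])
next_type = code_start[21]  # type byte of the next object, co_code

print(hex(type_byte), argcount, kwonlyargcount, nlocals, stacksize, hex(flags), hex(next_type))
```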

co_code in the PyCodeObject

Python's opcodes determine the program's execution flow. They are stored in co_code of the PyCodeObject as a PyObject of type TYPE_STRING.

The opcode sequence in a Python 3.7 .pyc:

00000000: 420d 0d0a 0000 0000 b9c7 895e 5700 0000
00000010: e300 0000 0000 0000 0000 0000 0003 0000
00000020: 0040 0000 0073 1e00 0000 6500 6400 8301
00000030: 0100 6401 6402 8400 5a01 6501 6403 6404
00000040: 8302 0100 6405 5300 2906 7a0c 4865 6c6c
00000050: 6f2c 2077 6f72 6c64 6302 0000 0000 0000

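Offsets 0x2a to 0x47 hold the serialized opcode sequence (from 6500 6400 through 6405 5300). The byte at offset 0x25, 0x73, is TYPE_STRING, and the bytes at offsets 0x26 to 0x29 give the length of the object: 1e00 0000 is 30 in little-endian order.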
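The same offsets can be verified programmatically. This sketch pastes in the hex dump above, reads the type byte and the 4-byte length, and slices out co_code; the names DUMP, data and co_code are arbitrary.

```python
import struct

DUMP = """
420d 0d0a 0000 0000 b9c7 895e 5700 0000
e300 0000 0000 0000 0000 0000 0003 0000
0040 0000 0073 1e00 0000 6500 6400 8301
0100 6401 6402 8400 5a01 6501 6403 6404
8302 0100 6405 5300 2906 7a0c 4865 6c6c
6f2c 2077 6f72 6c64 6302 0000 0000 0000
"""
data = bytes.fromhex("".join(DUMP.split()))

string_type = data[0x25]                          # expected to be 0x73 (TYPE_STRING)
(length,) = struct.unpack("<I", data[0x26:0x2A])  # little-endian length of co_code
co_code = data[0x2A:0x2A + length]
last_offset = 0x2A + length - 1

print(hex(string_type), length, hex(last_offset), co_code.hex())
```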

opcode

Include/opcode.h in the Python source defines the opcodes. HAVE_ARGUMENT is the dividing line: every opcode greater than or equal to HAVE_ARGUMENT takes exactly one argument, and every opcode below it takes none.

CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. Use of this module should not be considered to work across Python VMs or Python releases.

Changed in version 3.6: Use 2 bytes for each instruction. Previously the number of bytes varied by instruction.

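Python does not guarantee opcode compatibility between versions, which is one reason .pyc files are incompatible across Python versions.

Python 3.6 brought a significant change: every instruction is 2 bytes long whether or not the opcode takes an argument. The opcode occupies 1 byte. If it takes an argument, the other byte holds it; if it does not, the other byte is ignored and is normally 0x00. In effect an opcode takes at most one argument, held in that single byte; larger values are built up with EXTENDED_ARG prefix instructions.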
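A minimal decoder for the fixed 2-byte format, as a sketch. The opcode table contains only the opcodes that occur in the Python 3.7 sample of this article, the function name decode_wordcode is made up, and EXTENDED_ARG handling is deliberately left out.

```python
# Opcode numbers as they appear in the Python 3.7 listing in this article.
OPNAMES_37 = {
    0x01: "POP_TOP",
    0x53: "RETURN_VALUE",
    0x5A: "STORE_NAME",
    0x64: "LOAD_CONST",
    0x65: "LOAD_NAME",
    0x83: "CALL_FUNCTION",
    0x84: "MAKE_FUNCTION",
}
HAVE_ARGUMENT = 90  # opcodes >= 90 use their argument byte


def decode_wordcode(co_code):
    """Yield (offset, name, arg) for 2-byte instructions; arg is None when unused."""
    for offset in range(0, len(co_code), 2):
        op, arg = co_code[offset], co_code[offset + 1]
        name = OPNAMES_37.get(op, "<{}>".format(op))
        yield offset, name, (arg if op >= HAVE_ARGUMENT else None)


co_code = bytes.fromhex(
    "6500 6400 8301 0100 6401 6402 8400 5a01"
    "6501 6403 6404 8302 0100 6405 5300"
)
decoded = list(decode_wordcode(co_code))
for offset, name, arg in decoded:
    print(offset, name, "" if arg is None else arg)
```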

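Before Python 3.6, an instruction with an argument is 3 bytes long: opcode, argv_low and argv_high. The opcode takes 1 byte and the argument takes 2 bytes, stored in little-endian order. For example, the Python 2.7 instruction 64 01 00 has opcode LOAD_CONST and argument 1.

LOAD_CONST(consti): Pushes co_consts[consti] onto the stack.

That is, it takes the element at index 1 of the co_consts tuple (indexing starts at 0, so this is co_consts[1]) and pushes it onto the top of the stack.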
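The variable-length format used before Python 3.6 can be decoded in the same spirit. This sketch uses a tiny opcode table with Python 2.7 numbering (only the opcodes that occur in this article's sample), and the function name decode_py2 is an assumption for illustration.

```python
OPNAMES_27 = {
    0x01: "POP_TOP",
    0x47: "PRINT_ITEM",
    0x48: "PRINT_NEWLINE",
    0x53: "RETURN_VALUE",
    0x5A: "STORE_NAME",
    0x64: "LOAD_CONST",
    0x65: "LOAD_NAME",
    0x83: "CALL_FUNCTION",
    0x84: "MAKE_FUNCTION",
}
HAVE_ARGUMENT = 90


def decode_py2(co_code):
    """Return (offset, name, arg) tuples for pre-3.6 bytecode (1- or 3-byte instructions)."""
    out = []
    i = 0
    while i < len(co_code):
        op = co_code[i]
        name = OPNAMES_27.get(op, "<{}>".format(op))
        if op >= HAVE_ARGUMENT:
            arg = co_code[i + 1] | (co_code[i + 2] << 8)  # argv_low, argv_high
            out.append((i, name, arg))
            i += 3
        else:
            out.append((i, name, None))
            i += 1
    return out


# LOAD_CONST 1, PRINT_ITEM, PRINT_NEWLINE, RETURN_VALUE
sample = bytes.fromhex("640100 47 48 53")
decoded = decode_py2(sample)
print(decoded)
```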

Viewing opcodes

Python's built-in dis and marshal modules can display the opcode sequence. Two classic versions, Python 2.7 and Python 3.7, serve as examples below.

Given this source:

print('Hello, world')

def fff(a,b):
    c = a + b
    return c & 0xffff

fff(34,67)

Python 2.7

>>> import dis, marshal
>>> f=open('t.pyc', 'rb').read()
>>> co=marshal.loads(f[8:]) # In Python 2.7 the PyCodeObject starts at offset 8 in the .pyc file
>>> dis.dis(co)
  1           0 LOAD_CONST               0 ('Hello, world')
              3 PRINT_ITEM
              4 PRINT_NEWLINE

  3           5 LOAD_CONST               1 (<code object fff at 0x10a1c9630, file "t.py", line 3>)
              8 MAKE_FUNCTION            0
             11 STORE_NAME               0 (fff)

  7          14 LOAD_NAME                0 (fff)
             17 LOAD_CONST               2 (34)
             20 LOAD_CONST               3 (67)
             23 CALL_FUNCTION            2
             26 POP_TOP
             27 LOAD_CONST               4 (None)
             30 RETURN_VALUE
>>> co.co_names
('fff',)
>>> co.co_consts
('Hello, world', <code object fff at ..., file ".../t.py", line 3>, 34, 67, None)

Hex        Line  Offset  Instruction     Argument
64 00 00   1     0       LOAD_CONST      0 ('Hello, world')
47               3       PRINT_ITEM
48               4       PRINT_NEWLINE
64 01 00   3     5       LOAD_CONST      1 (<code object fff at ... line 3>)
84 00 00         8       MAKE_FUNCTION   0
5a 00 00         11      STORE_NAME      0 (fff)
65 00 00   7     14      LOAD_NAME       0 (fff)
64 02 00         17      LOAD_CONST      2 (34)
64 03 00         20      LOAD_CONST      3 (67)
83 02 00         23      CALL_FUNCTION   2
01               26      POP_TOP
64 04 00         27      LOAD_CONST      4 (None)
53               30      RETURN_VALUE

Python 3.7

>>> import dis, marshal
>>> f=open('t.pyc', 'rb').read()
>>> co=marshal.loads(f[16:]) # In Python 3.7 the PyCodeObject starts at offset 16 in the .pyc file
>>> dis.dis(co)
  1           0 LOAD_NAME                0 (print)
              2 LOAD_CONST               0 ('Hello, world')
              4 CALL_FUNCTION            1
              6 POP_TOP

  3           8 LOAD_CONST               1 (<code object fff at ..., line 3>)
             10 LOAD_CONST               2 ('fff')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               1 (fff)

  7          16 LOAD_NAME                1 (fff)
             18 LOAD_CONST               3 (34)
             20 LOAD_CONST               4 (67)
             22 CALL_FUNCTION            2
             24 POP_TOP
             26 LOAD_CONST               5 (None)
             28 RETURN_VALUE
>>> co.co_names
('print', 'fff')
>>> co.co_name
'<module>'
>>> co.co_consts
('Hello, world', <code object fff at ..., file ".../t.py", line 3>, 'fff', 34, 67, None)

Hex     Line  Offset  Instruction     Argument
65 00   1     0       LOAD_NAME       0 (print)
64 00         2       LOAD_CONST      0 ('Hello, world')
83 01         4       CALL_FUNCTION   1
01 00         6       POP_TOP
64 01   3     8       LOAD_CONST      1 (<code object fff at ..., line 3>)
64 02         10      LOAD_CONST      2 ('fff')
84 00         12      MAKE_FUNCTION   0
5a 01         14      STORE_NAME      1 (fff)
65 01   7     16      LOAD_NAME       1 (fff)
64 03         18      LOAD_CONST      3 (34)
64 04         20      LOAD_CONST      4 (67)
83 02         22      CALL_FUNCTION   2
01 00         24      POP_TOP
64 05         26      LOAD_CONST      5 (None)
53 00         28      RETURN_VALUE
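The same inspection can be done without a .pyc file by compiling the sample source in memory. This is a sketch for whatever Python 3 interpreter runs it; the instruction names it prints depend on the interpreter version and will not necessarily match the 2.7 or 3.7 listings above.

```python
import dis

SOURCE = """print('Hello, world')

def fff(a, b):
    c = a + b
    return c & 0xffff

fff(34, 67)
"""

co = compile(SOURCE, "t.py", "exec")
opnames = [ins.opname for ins in dis.get_instructions(co)]
print(opnames)       # instruction names differ between Python versions
print(co.co_names)

# The nested function is stored as a code object inside co_consts.
inner = [c for c in co.co_consts if hasattr(c, "co_code")]
print([c.co_name for c in inner])
```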

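Summary

The main difficulty in handling .pyc files lies in version differences and in the file's structure and internal logic. The .exe handled here is a bare program with no protection and needed no deobfuscation, so the result came easily. When a .pyc has been obfuscated, careful analysis is required, recovering the source becomes much harder, and it may not be possible at all.

Recovering the source was a welcome result, but the more important gain is a deeper understanding of the structure and purpose of .pyc files.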
