Python Byte-code Compiler
This app converts Python source files into .pyc files, which hold the same code compiled to byte-code. If you ever wondered why Python sometimes generates these files and the __pycache__ folder, it's for performance: caching the compiled byte-code lets later imports skip the compilation step.
The purpose of this exercise is to expose the internals of Python so that people can experiment with writing their own language that runs on the Python virtual machine. Several more recent languages, such as Scala and Clojure, run on the JVM. They've become popular partly because they come with batteries included, so to speak: they can import all existing Java libraries. Python is arguably a cleaner language than Java, so it would be advantageous to have, for example, a functional language that integrates well with Python and follows Pythonic principles (see import this). I plan on working on such a language, but I'd like to open the floodgates for everyone else as well.
Generating byte-code (.pyc files)
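The structure of .pyc files (Python 3.3 through 3.6) is as follows:
- 4 bytes: Magic number
- 4 bytes: Timestamp
- 4 bytes: Source size (often treated as padding)
- N bytes: Marshalled code object
Python 2 omits the third field. Python 3.7 and later insert a 4-byte flags field between the magic number and the timestamp (PEP 552), so the header there is 16 bytes long.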
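To see these fields in a real file, a minimal sketch like the following can split the header apart. It assumes Python 3.7 or later, where the header is 16 bytes and carries a flags field after the magic number. The helper name read_pyc_header and the throwaway demo.py file are made up for illustration.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

def read_pyc_header(path):
    """Split a .pyc header into its fields (hypothetical helper, Python 3.7+)."""
    with open(path, 'rb') as f:
        data = f.read(16)
    # All integer fields are stored as little-endian unsigned 32-bit values.
    flags, timestamp, source_size = struct.unpack('<III', data[4:16])
    return {
        'magic': data[:4],
        'flags': flags,
        'timestamp': timestamp,
        'source_size': source_size,
    }

# Compile a throwaway source file so there is a .pyc to inspect.
tmp_dir = tempfile.mkdtemp()
source_path = os.path.join(tmp_dir, 'demo.py')
with open(source_path, 'wb') as f:
    f.write(b'a = 1\n')
pyc_path = py_compile.compile(
    source_path,
    cfile=source_path + 'c',
    # Force the timestamp-based header rather than the hash-based variant.
    invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
)

header = read_pyc_header(pyc_path)
print(header)
```

The magic field should equal importlib.util.MAGIC_NUMBER for the running interpreter, and source_size should equal the length of demo.py in bytes.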
You can get each segment to create a .pyc file in the following ways:
- The magic number corresponds to the required Python version. You can get this number through the imp module (deprecated since Python 3.4 and removed in 3.12; importlib.util.MAGIC_NUMBER holds the same value):
import imp
magic_number = imp.get_magic()
- The timestamp records the modification time of the source file. If there's a corresponding .py file, Python compares this timestamp with that file's modification time to decide whether the .pyc file is stale. If the .pyc file is on its own, the value is irrelevant. You can get this number by using the time and struct modules ('<I' forces the little-endian layout that .pyc files use):
import struct, time
timestamp = struct.pack('<I', int(time.time()))
- The padding is a 4-byte field before the code object. It exists only in Python 3 (3.3 and later), so omit it for Python 2. It actually stores the size of the source file in bytes, which is why the first byte often has a non-zero value. Like the timestamp, it only matters when a matching .py file exists, so for a standalone .pyc file you can just use zeros:
padding = b'\x00\x00\x00\x00'
- The code object is a marshalled Python code object. You can use the built-in compile function to compile a segment of Python code into a code object to test this out initially. The call signature is compile(code_segment, 'file_name', 'exec'). Make sure that file_name corresponds to the file you are writing the .pyc file for. Here's a simple example:
import marshal
filename = 'addnum.py'
code_segment = 'a = 123 + 321\nprint(a)'
code = compile(code_segment, filename, 'exec')
marshalled_code_object = marshal.dumps(code)
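One way to check that the marshalled bytes really contain the code object is to load them back with marshal.loads and execute the result. This is only a sanity check on the example above, not part of the .pyc format itself. The names restored and namespace are introduced here for illustration.

```python
import marshal

filename = 'addnum.py'
code_segment = 'a = 123 + 321\nprint(a)'
code = compile(code_segment, filename, 'exec')
marshalled_code_object = marshal.dumps(code)

# marshal.loads reverses marshal.dumps and returns an equivalent code object.
restored = marshal.loads(marshalled_code_object)

# Run the restored code object in a fresh namespace.
namespace = {}
exec(restored, namespace)  # prints 444

# The filename passed to compile() travels with the code object.
print(restored.co_filename)  # addnum.py
```

Note that marshal output is specific to the Python version that produced it, which is one reason the magic number sits at the front of every .pyc file.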
You can put it all together like this:
# write to addnum.pyc
with open(filename + 'c', 'wb') as f:
    f.write(magic_number)
    # Python 3.7+ expects a 4-byte flags field here: f.write(b'\x00\x00\x00\x00')
    f.write(timestamp)
    f.write(padding)
    f.write(marshalled_code_object)
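For newer interpreters, the same steps can be sketched as one self-contained script. This is a sketch under a few assumptions: it uses importlib.util.MAGIC_NUMBER instead of the imp module, adds the flags field when running on Python 3.7 or later, and writes into a temporary directory. The helper name build_pyc is hypothetical. It finishes by importing the standalone .pyc file to confirm that the interpreter accepts it.

```python
import importlib.util
import marshal
import os
import struct
import sys
import tempfile
import time

def build_pyc(code_segment, source_name):
    """Return the bytes of a .pyc file for code_segment (hypothetical helper)."""
    code = compile(code_segment, source_name, 'exec')
    parts = [importlib.util.MAGIC_NUMBER]
    if sys.version_info >= (3, 7):
        # Flags field added by PEP 552; 0 selects timestamp-based checking.
        parts.append(struct.pack('<I', 0))
    # Timestamp, truncated to 32 bits.
    parts.append(struct.pack('<I', int(time.time()) & 0xFFFFFFFF))
    # Source size; only checked when a matching .py file is present.
    parts.append(struct.pack('<I', len(code_segment.encode('utf-8'))))
    parts.append(marshal.dumps(code))
    return b''.join(parts)

out_dir = tempfile.mkdtemp()
pyc_path = os.path.join(out_dir, 'addnum.pyc')
with open(pyc_path, 'wb') as f:
    f.write(build_pyc('a = 123 + 321\nprint(a)', 'addnum.py'))

# Import the standalone .pyc to confirm the interpreter accepts it.
spec = importlib.util.spec_from_file_location('addnum', pyc_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)  # prints 444
```

Because no addnum.py sits next to the generated file, the timestamp and source size are never compared against anything, so any values would load equally well here.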