Talk about the execution process of Python program

1. Is Python an interpreted language?

When I first learned Python, the first sentence I heard about Python was that Python is an interpreted language. I continued to believe it until I discovered the existence of *. pyc files. If it is an interpreted language, what is the generated *.pyc file? c should be the abbreviation of compiled!

In order to prevent other people learning Python from being misunderstood by this sentence, we will clarify this issue in the article and clarify some basic concepts.

2. Interpreted and compiled languages

Computers cannot recognize high-level languages, so when we run a high-level language program, we need a "translator" to engage in the process of converting high-level languages into machine language that computers can understand. This process is divided into two categories, the first is compilation and the second is interpretation.

Before the program is executed in a compiled language, a compiler will first perform a compilation process on the program to convert the program into machine language. There is no need for translation at runtime, just execution. The most typical example is the C language.

Interpreted languages do not have this compilation process. Instead, when the program is running, the interpreter interprets the program line by line and then runs it directly. The most typical example is Ruby.

Through the above examples, we can summarize the advantages and disadvantages of interpreted languages and compiled languages. Because compiled languages have already "translated" the program before it is run, there is less "translation" during runtime. ” process, so the efficiency is relatively high. But we cannot generalize. Some interpreted languages can also optimize the entire program when translating the program through the optimization of the interpreter, thus surpassing compiled languages in efficiency.

In addition, with the rise of virtual machine-based languages such as Java, we cannot purely divide languages into interpreted and compiled languages.

Take Java as an example. Java is first compiled into a bytecode file by a compiler, and then interpreted into a machine file by an interpreter at runtime. So we say that Java is a language that is compiled first and then interpreted.

Switching to C#, C# first compiles the C# file into an IL file through the compiler, and then compiles the IL file into a machine file through the CLR. So we say that C# is a purely compiled language, but C# is a language that requires secondary compilation. The same principle can also be applied to other languages based on the .NET platform.

3. What exactly is Python?

In fact, Python, like Java/C#, is also a language based on a virtual machine. Let's first briefly understand the running process of a Python program from the surface.

When we enter python hello.py on the command line, we actually activate the Python "interpreter" and tell the "interpreter": you are about to start working. But before "interpretation", the first work performed is actually the same as Java, which is compilation.

Students who are familiar with Java can think about how we execute a Java program on the command line:

javac hello.java

java hello

It's just that when we use IDEs like Eclipse, we merge these two into one. In fact, the same is true for Python. When we execute python hello.py, it also executes such a process, so we should describe Python like this. Python is a language that is compiled first and then interpreted.

4. Briefly describe the running process of Python

Before talking about this problem, let's first talk about two concepts, PyCodeObject and pyc files.

Needless to say, the pyc we see on the hard disk goes without saying, but in fact PyCodeObject is the result of the actual compilation by the Python compiler. We can simply know it first and continue to look down.

When the Python program runs, the compiled result is stored in the PyCodeObject in memory. When the Python program ends, the Python interpreter writes the PyCodeObject back to the pyc file.

When the python program is run for the second time, the program will first search for the pyc file on the hard disk. If it is found, it will be loaded directly. Otherwise, the above process will be repeated.

So we should locate PyCodeObject and pyc files in this way. We say that pyc files are actually a persistent storage method of PyCodeObject.

5. Run a Python program

Let's write a program to actually run it:

Talk about the execution process of Python program

The program itself is meaningless. Let’s continue looking at:

Talk about the execution process of Python program

However, we did not see the pyc file in the program, test.py still stayed there alone!

Then let's change the way of writing. We change the print_str method to another python module:

Talk about the execution process of Python program

Then run the program:

Talk about the execution process of Python program

At this time, the pyc file appeared. In fact, it is not difficult to get the reason if you think carefully. Let's consider the actual business situation.

6. The purpose of pyc is to reuse

Recalling the second paragraph of this article when explaining the advantages and disadvantages of compiled languages and interpreted languages, I said that the advantage of compiled languages is that we can directly use "translated" files without interpretation when the program is running. In other words, the biggest advantage of why we compile py files into pyc files is that we do not need to re-interpret the module when running the program.

Therefore, what we need to compile into pyc files should be those modules that can be reused, which is the same purpose as when we design software classes. Therefore, the Python interpreter believes that only imported modules are the modules that need to be reused.

At this time, some people may say, that’s not right! Your question has not been explained clearly. Doesn’t my test.py also need to be run? Although it is not a module, I can save time every time I run it in the future!

OK, let’s start from the actual situation and think about when we can run the python xxx.py file:

A. When performing a test.

B. When starting a Web process.

C. Execute a program script.

Let's talk about it one by one. We don't need to say more about the first case. At this time, it doesn't matter even if all the files don't have pyc files.

In the second case, let's imagine a webpy program. We usually execute it like this:

Talk about the execution process of Python program

Or:

Talk about the execution process of Python program

Then this program is similar to a daemon process and has been monitoring port 8181/9002. Once interrupted, the only possibility is that the program is killed or other unexpected situations occur. Then what you need to do to recover is to restart the entire Web service. . So since it is always monitored, it is enough to keep the PyCodeObject in memory. There is no need to persist it to the hard disk.

In the last case, when executing a program script, the main entrance of a program is actually very similar to the Controller in a Web program. In other words, it should be responsible for the scheduling between Models and does not include any main logic. The Controller It should be a Facade, without any detailed logic, just turning parameters around. So if students who do algorithms can know that in an algorithm script, the easiest thing to change is the various parameters of the algorithm, then at this time, it can be lasting Turning it into a pyc file would be a bit superfluous.

So we can understand the intention of the Python interpreter in this way. The Python interpreter only persists modules that we may reuse into pyc files.

7. Expiration time of pyc

After talking about the pyc file, some people may think that every time the Python interpreter persists the module into a pyc file, then when my module changes, do I have to manually remove the previous pyc file? Woolen cloth?

Of course, the designers of Python would not make such an idiotic mistake. This process actually depends on how PyCodeObject is written into the pyc file.

Let’s take a look at the source code of the import process:

Talk about the execution process of Python program

This code is relatively long. Let’s only look at the code I marked. In fact, when writing the pyc file, he wrote a Long variable. The content of the variable is the most recent modification date of the file. Similarly, let’s look at it again. Download the code into pyc:

Talk about the execution process of Python program

Without looking at the code carefully, we can clearly see the principle. In fact, every time before loading, the last modification date saved in the py file and the pyc file will be checked. If they are inconsistent, a pyc file will be regenerated.

8. Write at the end

In fact, understanding the execution process of Python programs is of little significance to most programmers, including Python programmers. What is really meaningful is what we can learn from the practices of the Python interpreter. I think there is this A few points:

A. In fact, whether Python is saved as a pyc file is the same as when we design the cache system. We can think carefully about what is worth throwing in the cache and what is not worth throwing in the cache.

B. When running a time-consuming Python script, how can we squeeze out some of the running time of the program is to separate the module from the main module. (Although this is often not the bottleneck)

C. When designing a software system, should reused and non-reused things be treated separately? This is an important part of software design principles.

D. When designing a cache system (or other systems), how can we avoid program expiration? In fact, the Python interpreter also provides us with a particularly common and effective solution.

Talk about the execution process of Python program​

Talk about the execution process of Python program