More source analysis with VLD

VLD is a tool that I started working on years ago to visualise the opcode arrays in PHP. Opcode arrays are what PHP's compiler generates from your source code, and they can be compared to the assembler code that a C compiler generates. Rather than being executed directly by the CPU, they are executed by PHP's interpreter.

Over the years I've been adding functionality, aided by Ilia and some others, to show more information. For example, Ilia has added a more verbose dumping format for opcodes (through the vld.verbosity setting), whereas I have added routines to find out which ops in oparrays can never be reached. A very simple example of the latter is shown here:

<?php
function test()
{
        echo "Hello!\n";
        return true;

        echo "This will not be executed.\n";
}
?>

If we run the above through VLD with php -dvld.active=1 test1.php, we see the following output (I removed the part about the script body itself):

Function test:
filename:       /tmp/test1.php
function name:  test
number of ops:  9
compiled vars:  none
line     # *  op           fetch  ext  return  operands
---------------------------------------------------------
   2     0  >   EXT_NOP
   4     1      EXT_STMT
         2      ECHO                           'Hello%21%0A'
   5     3      EXT_STMT
         4    > RETURN                         true
   7     5*     EXT_STMT
         6*     ECHO                           'This+will+not+be+executed.%0A'
   8     7*     EXT_STMT
         8*   > RETURN                         null

End of function test.

Every opcode that has a * after its number (such as 5*) can never be reached, and could be eliminated from the oparray by an optimiser.
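To make the idea a bit more concrete, here is a minimal sketch of such a reachability check in PHP. It is not VLD's actual code (that is written in C and works on the engine's own structures); the array layout and the optional 'jump' key are assumptions made up for this illustration. The op list mirrors the dump of test() above.

```php
<?php
// Toy op array mirroring the dump of test() above. The layout ('op' plus
// an optional 'jump' list of target op numbers) is an assumption made for
// this sketch; it is not the Zend Engine's or VLD's internal structure.
$ops = [
    0 => ['op' => 'EXT_NOP'],
    1 => ['op' => 'EXT_STMT'],
    2 => ['op' => 'ECHO'],
    3 => ['op' => 'EXT_STMT'],
    4 => ['op' => 'RETURN'],
    5 => ['op' => 'EXT_STMT'],
    6 => ['op' => 'ECHO'],
    7 => ['op' => 'EXT_STMT'],
    8 => ['op' => 'RETURN'],
];

// Returns the numbers of all ops that cannot be reached from op #0.
function findDeadOps(array $ops): array
{
    $reached = [];
    $todo = [0];

    while ($todo) {
        $nr = array_pop($todo);
        if (isset($reached[$nr]) || !isset($ops[$nr])) {
            continue;
        }
        $reached[$nr] = true;

        $op = $ops[$nr]['op'];
        if ($op === 'RETURN') {
            continue; // execution of the function stops here
        }
        // Follow explicit jump targets, if there are any.
        foreach ($ops[$nr]['jump'] ?? [] as $target) {
            $todo[] = $target;
        }
        // Everything except an unconditional jump can fall through.
        if ($op !== 'JMP') {
            $todo[] = $nr + 1;
        }
    }

    return array_values(array_diff(array_keys($ops), array_keys($reached)));
}

echo implode(', ', findDeadOps($ops)), "\n";
```

For this op list the sketch reports ops 5, 6, 7 and 8, which are the ones marked with a * in the dump.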

The dead code analysis routines have also made their way into Xdebug, which uses them in its code coverage functionality to highlight dead code. This is mostly useful if you collect code coverage while running unit tests, as you can do with PHPUnit.

Recently I've been working on some new functionality to visualise all the code paths that make up each function. These new routines sit on top of the routines that do dead code analysis. Every branch instruction (such as if, but also for and foreach) is analysed and a list of branches is created. Each branch contains information about the line on which the branch starts, the starting and ending opcode numbers that belong to the branch, and the other branches that this branch can jump to. There can be no linked branches (for example when a return or throw statement is found), one linked branch (for an unconditional jump) or two linked branches (for a conditional jump). Be aware, however, that PHP's opcodes don't always reflect the source code exactly.
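As a rough sketch of how ops could be grouped into branches, the PHP below splits a small, hypothetical op list for an if/else into branches and records their links. The array layout, the 'target' key and the function name are assumptions for illustration only; VLD does this in C on the real oparray, and real PHP has more jump opcodes than the three handled here.

```php
<?php
// Hypothetical, simplified op list for an if/else construct. The layout
// and the 'target' key are assumptions made for this sketch.
$ops = [
    0 => ['op' => 'IS_SMALLER'],
    1 => ['op' => 'JMPZ', 'target' => 4],
    2 => ['op' => 'ECHO'],
    3 => ['op' => 'JMP', 'target' => 5],
    4 => ['op' => 'ECHO'],
    5 => ['op' => 'RETURN'],
];

// Expects ops numbered 0..n-1 with a RETURN as the last op.
function findBranches(array $ops): array
{
    $last = count($ops) - 1;

    // A branch starts at op #0, at every jump target, and right after
    // every jump or return.
    $starts = [0 => true];
    foreach ($ops as $nr => $op) {
        if (isset($op['target'])) {
            $starts[$op['target']] = true;
        }
        if (in_array($op['op'], ['JMP', 'JMPZ', 'RETURN'], true) && $nr < $last) {
            $starts[$nr + 1] = true;
        }
    }
    ksort($starts);
    $starts = array_keys($starts);

    $branches = [];
    foreach ($starts as $i => $sop) {
        // A branch ends just before the next one starts.
        $eop = isset($starts[$i + 1]) ? $starts[$i + 1] - 1 : $last;
        $end = $ops[$eop];

        switch ($end['op']) {
            case 'RETURN':
                $out = [];                         // no linked branches
                break;
            case 'JMP':
                $out = [$end['target']];           // one linked branch
                break;
            case 'JMPZ':
                $out = [$eop + 1, $end['target']]; // two linked branches
                break;
            default:
                $out = [$eop + 1];                 // falls through
        }
        $branches[$sop] = ['sop' => $sop, 'eop' => $eop, 'out' => $out];
    }

    return $branches;
}

foreach (findBranches($ops) as $b) {
    printf("branch: #%3d; sop: %2d; eop: %2d; out: %s\n",
        $b['sop'], $b['eop'], implode(', ', $b['out']));
}
```

Here the branch starting at op #0 gets two links (ops 2 and 4), the branches at ops 2 and 4 each get one, and the branch at op 5 has none because it ends in a return.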

Once all the branches and their links are found, another algorithm runs to figure out which paths can be created out of all the branches. It is best to illustrate this with an example. So let us look at the following script:

<?php
function test()
{
        for( $i = 0; $i < 10; $i++ )
        {
                if ( $i < 5 )
                {
                        echo "-";
                }
                else
                {
                        echo "+";
                }
        }
        echo "\n";
}
?>

In this script we have a for-loop with a nested if construct. When we run this script through VLD (with php -dvld.verbosity=0 -dvld.dump_paths=1 -dvld.active=1 test2.php) we get the following output (again, only the test() function and with some white space modifications):

Function test:
filename:       /tmp/test2.php
function name:  test
number of ops:  22
compiled vars:  !0 = $i
line     # *  op             fetch  ext  return  operands
-----------------------------------------------------------
   2     0  >   EXT_NOP
   4     1      EXT_STMT
         2      ASSIGN                             !0, 0
         3  >   IS_SMALLER                 ~1      !0, 10
         4      EXT_STMT
         5    > JMPZNZ                  9          ~1, ->18
         6  >   POST_INC                   ~2      !0
         7      FREE                               ~2
         8    > JMP                                ->3
   6     9  >   EXT_STMT
        10      IS_SMALLER                 ~3      !0, 5
   7    11    > JMPZ                               ~3, ->15
   8    12  >   EXT_STMT
        13      ECHO                               '-'
   9    14    > JMP                                ->17
  12    15  >   EXT_STMT
        16      ECHO                               '%2B'
  14    17  > > JMP                                ->6
  15    18  >   EXT_STMT
        19      ECHO                               '%0A'
  16    20      EXT_STMT
        21    > RETURN                             null

branch: #  0; line:  2- 4; sop:  0; eop:  2; out1:   3
branch: #  3; line:  4- 4; sop:  3; eop:  5; out1:  18; out2:   9
branch: #  6; line:  4- 4; sop:  6; eop:  8; out1:   3
branch: #  9; line:  6- 7; sop:  9; eop: 11; out1:  12; out2:  15
branch: # 12; line:  8- 9; sop: 12; eop: 14; out1:  17
branch: # 15; line: 12-14; sop: 15; eop: 16; out1:  17
branch: # 17; line: 14-14; sop: 17; eop: 17; out1:   6
branch: # 18; line: 15-16; sop: 18; eop: 21
path #1: 0, 3, 18,
path #2: 0, 3, 9, 12, 17, 6, 3, 18,
path #3: 0, 3, 9, 15, 17, 6, 3, 18,
End of function test.

This dump consists of a few different parts. First of all we can see some basic information: the name, the number of ops (22) and the compiled variables. The second part is a dump of all the opcodes that make up this function. The last part contains information about all the branches and the possible paths. This information is a bit hard to visualise in its textual form, so I've also added some code that dumps it to a file format that the GraphViz tool "dot" can use to create a pretty graph. For this we re-run the previous PHP invocation as php -dvld.dump_paths=1 -dvld.verbosity=0 -dvld.save_paths=1 -dvld.active=1 test2.php. This creates the file /tmp/paths.dot that "dot" can use. If we run dot -Tpng /tmp/paths.dot > /tmp/paths.png we end up with the following picture:

vld-paths.png
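The branch list maps onto GraphViz's input format quite directly. The sketch below turns the out1/out2 links from the dump into a "dot" graph description. The function name, the op_N node names and the layout of the output are my own assumptions for this illustration; the file that VLD writes to /tmp/paths.dot may well look different.

```php
<?php
// Branch links copied from the out1/out2 columns of the dump above
// (branch number => linked branches).
$branches = [
    0  => [3],
    3  => [18, 9],
    6  => [3],
    9  => [12, 15],
    12 => [17],
    15 => [17],
    17 => [6],
    18 => [],
];

// Builds a GraphViz "digraph" description with one edge per branch link.
// The node naming scheme (op_N) is an assumption, not VLD's own format.
function branchesToDot(array $branches, string $name): string
{
    $dot = "digraph {$name} {\n";
    foreach ($branches as $from => $outs) {
        foreach ($outs as $to) {
            $dot .= "    op_{$from} -> op_{$to};\n";
        }
    }

    return $dot . "}\n";
}

echo branchesToDot($branches, 'test');
```

Saving that output to a file and running dot -Tpng on it should give a graph with the same nodes and arrows, although without any of the styling.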

If we put this graph next to the code, we can explain how this works. Every branch is named by the number of the first opcode in that branch:

  • op #0 is the start of the function and includes the assignment of $i on line 4.

  • op #3 is the loop test on line 4. If the condition doesn't match, we jump to op #18 on line 15, which echoes the newline.

  • op #9 is the if condition on line 6.

  • op #12 is executed when the if condition is true, and

  • op #15 is executed when the if condition is false.

  • op #17 sits behind both op #12 and op #15 and makes sure there is a jump to the counting expression in op #6.

  • op #6 is the post increment operation on line 4 which will then again be followed by op #3 to check whether the end of the loop has been reached.
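The three paths in the dump can be reproduced with a short recursive walk over the branch links. The PHP below is only a sketch: the function names are made up, and the rule that a path may follow each link at most once is my assumption about what keeps the loop from producing an endless number of paths. It does give the same three paths as the dump above.

```php
<?php
// Branch links copied from the out1/out2 columns of the dump above
// (branch number => linked branches).
$branches = [
    0  => [3],
    3  => [18, 9],
    6  => [3],
    9  => [12, 15],
    12 => [17],
    15 => [17],
    17 => [6],
    18 => [],
];

// True if the path already contains the step $from -> $to.
function hasEdge(array $path, int $from, int $to): bool
{
    for ($i = 0; $i < count($path) - 1; $i++) {
        if ($path[$i] === $from && $path[$i + 1] === $to) {
            return true;
        }
    }

    return false;
}

// Returns every path that starts with $path and continues at branch $nr.
function findPaths(array $branches, int $nr, array $path = []): array
{
    $path[] = $nr;
    $paths = [];

    foreach ($branches[$nr] as $out) {
        // Assumption: each link is followed at most once per path, so a
        // loop body shows up once and then the path has to leave the loop.
        if (!hasEdge($path, $nr, $out)) {
            $paths = array_merge($paths, findPaths($branches, $out, $path));
        }
    }

    // Nothing left to follow: this path is complete.
    return $paths ?: [$path];
}

foreach (findPaths($branches, 0) as $i => $path) {
    printf("path #%d: %s\n", $i + 1, implode(', ', $path));
}
```

The second visit to branch 3 is where the rule matters: the step from 3 to 9 is already part of the path, so only the step from 3 to 18 is left and the path ends.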

This is of course a very simple example, but it also works for (multiple) classes and functions in a file. You just need to make sure to tell VLD that you don't want the code executed as the output could be very large. You can use the vld.execute=0 php.ini setting for that.

I hope this new functionality can shed some light on how loops etc. work in PHP. To play with the code, you need to check out VLD from my SVN with svn co svn://svn.xdebug.org/svn/php/vld/trunk vld. You can also view the code on-line at http://svn.xdebug.org/cgi-bin/viewvc.cgi/vld/trunk/?root=php. Look out for a new release coming soon!

Comments

Nice nice nice nice nice Derick ! I'm gonna use that ASAP.

When will the code be pushed to pecl to use the pecl command to update vld ?

Seems like a cool idea : )

I won't use it in current projects, but I can see how that can come in handy! : )

thanks

art
