Skip to content

FAST

4 posts with the tag “FAST”

Control Flow Graph for FAST Fortran

A Control Flow Graph analysis for FAST Fortran

Section titled “A Control Flow Graph analysis for FAST Fortran”

Control Flow Graphs (CFG) are a common tool for static analyzis of a computation unit (eg. a method) and find some errors (unreachable code, infinite loops)

It is based on the concept of Basic Block: a sequence of consecutive statements in which flow of control can only enter at the beginning and leave at the end. Only the last statement of a basic block can be a branch statement and only the first statement of a basic block can be a target of a branch.

There are two distinctive basic blocks:

  • Start Block: The entry block allows the control to enter into the control flow graph. There should be only one start block.
  • Final Block: Control flow leaves through the exit block. There may be several final blocks.

The package FAST-Fortran-Analyses in https://github.com/moosetechnology/FAST-Fortran contains classes to build a CFG of a Fortran program unit (a main program, a function, or a subroutine).

We must first create a FAST model of a Fortran program. For this we need an external parser. We currently use fortran-src-extras from https://github.com/camfort/fortran-src-extras.

To run it on a fortran file you do:

fortran-src-extras serialize -t json -v77l encode <fortran-file.f>

This will produce a json AST of the program that we can turn into a FAST-Fortran AST.

If you have fortran-src-extras installed on your computer, all this is automated in FAST-Fortran

<fortran-file.f> asFileReference
readStreamDo: [ :st |
FortranProjectImporter new getFASTFor: st contents ]

This script will create an array of ASTs from the <fortran-file.f> given fortran file. If there are several program units in the file, there will be several FAST models in this array. In the example below, there is only one program, so the list contains only the AST for this program.

We will use the following Fortran-77 code:

PROGRAM EUCLID
* Find greatest common divisor using the Euclidean algorithm
PRINT *, 'A?'
READ *, NA
IF (NA.LE.0) THEN
PRINT *, 'A must be a positive integer.'
STOP
END IF
PRINT *, 'B?'
READ *, NB
IF (NB.LE.0) THEN
PRINT *, 'B must be a positive integer.'
STOP
END IF
IA = NA
IB = NB
1 IF (IB.NE.0) THEN
ITEMP = IA
IA = IB
IB = MOD(ITEMP, IB)
GOTO 1
END IF
PRINT *, 'The GCD of', NA, ' and', NB, ' is', IA, '.'
STOP
END

From the FAST model above, we will now create a Control-Flow-Graph:

<FAST-model> accept: FASTFortranCFGVisitor new

The class FASTFortranCFGVisitor implements an algorithm to compute basic blocks from https://en.wikipedia.org/wiki/Basic_block.

This visitor goes throught the FAST model and creates a list of basic blocks that can be inspected with the #basicBlocks method.

There is a small hierarchy of basic block classes:

  • FASTFortranAbstractBasicBlock, the root of the hierarchy. It contains #statements (which are FAST statement nodes). It has methods to test its nature: isStart, isFinal, isConditional. It defines an abstract method #nextBlocks that returns a list of basic blocks that this one can directly reach. Typically there are 1 or 2 next blocks, but Fortran can have more due to “arithmetic IF”, “computed GOTO” and “assigned GOTO” statements.
  • FASTFortranBasicBlock, a common basic block with no branch statement. If it is final, its #nextBlocks is empty, otherwise it’s a list of 1 block.
  • FASTFortranConditionalBasicBlock, a conditional basic block. It may reach several #nextBlocks, each one associated with a value, for example true and false. The method #nextBlockForValue: returns the next block associated to a given value. In our version of CFG, a conditional block may only have one statement (a conditional statement).

You may have noticed that our blocks are a bit different from the definition given at the beginning of the blog-post:

  • our “common” blocs cannot have several next, they never end with a conditional statement;
  • our conditional blocks can have only one statement.

For the program above, the CFG has 10 blocks.

  • the first block is a common block and contains 2 statements, the PRINT and the READ;
  • its next bloc is a conditional block for the IF. It has 2 next blocs:
    • true leads to a common block with 2 statements, the PRINT and the STOP. This is a final block (STOP ends the program);
    • false leads to the common block after the IF

As a first analysis tool, we can visualize the CFG. Inspecting the result of the next script will open a Roassal visualization on the CFG contained in the FASTFortranCFGVisitor.

FASTFortranCFGVisualization on: <aFASTFortranCFGVisitor>

For the program above, this gives the visualization below.

  • the dark dot is the starting block (note that it is a block and contains statements);
  • the hollow dots are final blocks;
  • it’s not the case here, but a block may also be start and final (if there are no conditional blocks in the program) and this would be represented by a “target”, a circle with a dot inside;
  • a grey square is a comon block;
  • a blue square is a conditional block;
  • hovering the mouse on a block will bring a pop up with the list of its statements (this relies on the FASTFortranExporterVisitor)

"Viualizing the Control Flow Graph"

One can see that:

  • the start block has 2 associated statements (PRINT and READ);
  • there are several final blocks, due to the STOP statements;
  • there is a loop at the bottom left of the graph where the last blue conditional block is “IF (IB.NE.0)” and the last statement of the grey block (true value of the IF), is a GOTO.

There are little analyses for now on the CFG, but FASTFortranCFGChecker will compute a list of unreachableBlocks that would represent dead code.

Control flow graphs may also be used to do more advanced analyses and possibly refactor code. For example, we mentioned the loop at the end of our program implemented with a IF statement and a GOTO. This could be refactored into a real WHILE loop that would be easier to read.

This is left as an exercise for the interested people 😉

Building a control flow graph is language dependant to identify the conditional statements, where they lead, and the final statements.

But much could be done in FAST core based on FASTTReturnStatement and a (not yet existing at the time of writing) FASTTConditionalStatement.

Inspiration could be taken from FASTFortranCFGVisitor and the process is not overly complicated. It would probably be even easier for modern languages that do not have the various GOTO statements of Fortran.

Once the CFG is computed, the other tools (eg. the visualization) should be completely independant of the language.

All hands on deck!

Some tools on FAST models

The package FAST-Core-Tools in repository https://github.com/moosetechnology/FAST offers some tools or algorithms that are running on FAST models.

These tools may be usable directly on a specific language FAST meta-model, or might require some adjustements by subtyping them. They are not out-of-the-shelf ready to use stuff, but they can provide good inspiration for whatever you need to do.

Writing test for FAST can be pretty tedious because you have to build a FAST model in the test corresponding to your need. It often has a lot of nodes that you need to create in the right order with the right properties.

This is where FASTDumpVisitor can help by visiting an existing AST and “dump” it as a string. The goal is that executing this string in Pharo should recreate exactly the same AST.

Dumping an AST can also be useful to debug an AST and checking that it has the right properties.

To use it, you can just call FASTDumpVisitor visit: <yourAST> and print the result. For example:

FASTDumpVisitor visit:
(FASTJavaUnaryExpression new
operator: '-' ;
expression:
(FASTJavaIntegerLiteral new
primitiveValue: '5'))

will return the string: FASTJavaUnaryExpression new expression:(FASTJavaIntegerLiteral new primitiveValue:'5');operator:'-' which, if evaluated, in Pharo will recreate the same AST as the original.

Note: Because FAST models are actually Famix models (Famix-AST), the tools works also for Famix models. But Famix entities typically have more properties and the result is not so nice:

FASTDumpVisitor visit:
(FamixJavaMethod new
name: 'toto' ;
parameters: {
FamixJavaParameter new name: 'x' .
FamixJavaParameter new name: 'y'} ).

will return the string: FamixJavaMethod new parameters:{FamixJavaParameter new name:'x';isFinal:false;numberOfLinesOfCode:0;isStub:false.FamixJavaParameter new name:'y';isFinal:false;numberOfLinesOfCode:0;isStub:false};isStub:false;isClassSide:false;isFinal:false;numberOfLinesOfCode:-1;isSynchronized:false;numberOfConditionals:-1;isAbstract:false;cyclomaticComplexity:-1;name:'toto'.

By definition an AST (Abstract Syntax Tree) is a tree (!). So the same variable can appear several time in an AST in different nodes (for example if the same variable is accessed several times).

The idea of the class FASTLocalResolverVisitor is to relate all uses of a symbol in the AST to the node where the symbol is defined. This is mostly useful for parameters and local variables inside a method, because the local resover only looks at the AST itself and we do not build ASTs for entire systems.

This local resolver will look at identifier appearing in an AST and try to link them all together when they correspond to the same entity. There is no complex computation in it. It just looks at names defined or used in the AST.

This is dependant on the programming language because the nodes using or defining a variable are not the same in all languages. For Java, there is FASTJavaLocalResolverVisitor, and for Fortran FASTFortranLocalResolverVisitor.

The tool brings an extra level of detail by managing scopes, so that if the same variable name is defined in different loops (for example), then each use of the name will be related to the correct definition.

The resolution process creates:

  • In declaration nodes (eg. FASTJavaVariableDeclarator or FASTJavaParameter),a property #localUses will list all referencing nodes for this variable;
  • In accessing nodes, (eg. FASTJavaVariableExpression), a property #localDeclarations will lists the declaration node corresponding this variable.
  • If the declaration node was not found a FASTNonLocalDeclaration is used as the declaration node.

Note: That this looks a bit like what Carrefour does (see /blog/2022-06-30-carrefour), because both will bind several FAST nodes to the same entity. But the process is very different:

  • Carrefour will bind a FAST node to a corresponding Famix node;
  • The local resolver binds FAST nodes together.

So Carrefour is not local, it look in the entire Famix model to find the entity that matches a FAST node. In Famix, there is only one Famix entity for one software entity and it “knows” all its uses (a FamixVariable has a list of FamixAccess-es). Each FAST declaration node will be related to the Famix entity (the FamixVariable) and the FAST use nodes will be related to the FamixAccess-es.

On the other hand, the local resolver is a much lighter tool. It only needs a FAST model to work on and will only bind FAST nodes between themselves in that FAST model.

For round-trip re-engineering, we need to import a program in a model, modify the model, and re-export it as a (modified) program. A lot can go wrong or be fogotten in all these steps and they are not trivial to validate.

First, unless much extra information is added to the AST, the re-export will not be syntactically equivalent: there are formatting issues, indentation, white spaces, blank lines, comments that could make the re-exported program very different (apparently) from the original one.

The class FASTDifferentialValidator helps checking that the round-trip implementation works well. It focuses on the meaning of the program independently of the formatting issues. The process is the follwing:

  • parse a set of (representative) programs
  • model them in FAST
  • re-export the programs
  • re-import the new programs, and
  • re-create a new model

Hopefully, the two models (2nd and last steps) should be equivalent This is what this tool checks.

Obviously the validation can easily be circumvented. Trivially, if we create an empty model the 1st time, re-export anything, and create an empty model the second time, then the 2 models are equivalent, yet we did not accomplish anything. This tool is an help for developers to pinpoint small mistakes in the process.

Note that even in the best of conditions, there can still be subtle differences between two equivalent ASTs. For example the AST for “a + b + c” will often differ from that of “a + (b + c)”.

The validator is intended to run on a set of source files and check that they are all parsed and re-exported correctly. It will report differences and will allow to fine tune the comparison or ignore some differences.

It goes through all the files in a directory and uses an importer, an exporter, and a comparator. The importer generates a FAST model from some source code (eg. JavaSmaCCProgramNodeImporterVisitor); the exporter generates source code from a model (eg. FASTJavaExportVisitor); the comparator is a companion class to the DifferentialValidator that handle the differences between the ASTs.

The basic implementation (FamixModelComparator) does a strict comparison (no differences allowed), but it has methods for accepting some differences:

  • #ast: node1 acceptableDifferenceTo: node2: If for some reason the difference in the nodes is acceptable, this method must return true and the comparison will restart from the parent of the two nodes as if they were the same.
  • #ast: node1 acceptableDifferenceTo: node2 property: aSymbol. This is for property comparison (eg. the name of an entity), it should return nil if the difference in value is not acceptable and a recovery block if it is acceptable. Instead of resuming from the parent of the nodes, the comparison will resume from an ancestor for which the recovery block evaluates to true.

Carrefour: The bridge between FAMIX and FAST

To analyze software systems, the Famix meta-model provides enough abstraction to understand how models work.

However, when we are interested in details, the FAST (Famix AST) meta-model provides less abstraction and gives us more information about our model (for example expression statements, identifiers etc.).

In some situations, such as modernization/migration projects, we need the binding between the two meta models. And here Carrefour comes in!

Carrefour

Carrefour represents a two-way link between Famix and FAST, it allows one to navigate on the AST and at the same time return to the elements of FAMIX when needed.

In this blog, we are going to use a simple snippet of code to simplify and grasp all necessary concepts we should know about FAST & Carrefour & Famix. Consider the MyClass class and the following methodAB method:

class MyClass {
public int methodAB(int a, int b){
if (a > b) {
a = a + 2;
} else {
b = 1;
}
return b;
}
}

Let’s prepare the ground for using Carrefour by generating the Famix model of the MyClass class using VerveineJ. Open a code editor and create a new MyClass.java file. Inside the Java file, we add the code above.

To generate the Famix Java model we use VerveineJ by running this command in the MyClass java file directory:

Terminal window
/path/to/VerveineJ/verveinej.sh -format json -o MyClass.json -anchor assoc -autocp ./ ./

PS: Note that Carrefour uses entities & associations as source anchor information, so make sure to add the option -anchor assoc.

In this section, we start from the MyClass Famix java model and we build the link between Famix and FAST using Carrefour. First, we need to install Carrefour, for example by running the following script on Moose Playground:

Metacello new
githubUser: 'badetitou' project: 'Carrefour' commitish: 'v3' path: 'src';
baseline: 'Carrefour';
load

Then we import the MyClass model and pick the first class (which is the only class MyClass).

'/path/to/MyClass.json' asFileReference readStreamDo: [ :stream |
model := FamixJavaModel new importFromJSONStream: stream
].
model rootFolder: '/path/to/MyClass/Directory/'.
method := model allModelClasses first.

Now, we call Carrefour to generate the AST (the figure below) and bind the newly created AST with Famix.

method generateFastAndBind

Class Code in left and the generated AST in right

it’s recommended to use generateFastIfNotDoneAndBind instead of generateFastAndBind in complex project and heavy computation when generating AST

To have a complete vision of the meta-models described above, we give the corresponding figures of each meta-model FAST and FAMIX:

Famix &#x26; Fast Overview

Once Carrefour has been called and the binding is done, we will have the first links between the meta-models as follows:

Famix &#x26; Fast 1st call

As an example, for the condition level variable (a>b) in FAST we would want its correspondence in FAMIX. To do this, we send the #famixVariable message to the FASTJavaVariableExpression object and we get as returned value the corresponding FAMIX variable.

FamixVariable Call

Now we go in the opposite direction, we will access all the matches of the FAMIX variable a in the FAST meta-model.

To do so, we use the #fastAccesses message as in the figure:

fastAccesses Call

Carrefour also provides the #fastDeclaration API to get where a Famix variable has been declared at the FAST meta-model level.

In conclusion, Carrefour allows us to go back and forth between FAST and FAMIX meta-models. The example used in the blog post is not complex but allowed us to see how to navigate between the two meta-models. In large projects, where there are more initialization, invocation, and relationships between entities Carrefour is crucial to perform deep analysis 💪.

Load FAST Pharo/Java model

When we are interested in the migration/modernization of projects we are using models of the project and their meta-models. Moose revolves around a powerful Famix meta-model that allows us to do several operations. For instance, previous posts present how to analyze and query a model, visualize a model with plantUML, or create a model, etc.

FAST is a meta-model that helps us understand source code in a less abstract way. Indeed, FAST is based on AST (Abstract Syntax Tree) which is close to the source code. And as the devil is in the details, FAST contains interesting elements when analyzing programs (for example some specific expression or statement), and effectively this is what makes the difference between FAST and Famix. (Consult this overview about the FAST model).

Abstraction Level

In this blog post we will explain how to load FAST Java and generate a FAST model of Java code. For this we will take ArgoUML, an open-source Java project, as an example.

First of all, we have to understand from where we are going to start and where we are going to end up. As already mentioned, we will take the AgroUML’s java code and the goal is to generate the corresponding AST and do analyses on it. To do this, 3 steps are necessary to have the AST as illustrated in the figure below:

  1. Parse Java to build a Famix model
  2. Load the model into Moose
  3. Generate the AST

Steps for generating FAST Model

Before starting, we must download the source code and the Famix model of the ArgoUML project, step 1 of the diagram above (follow this blog for more details).

Now, we will import the Famix model from the ArgoUML-0-34.json file in the Models Browser. Then, we should know that the FAST meta-model is specific to a gien programming language, i.e for Pharo code we need FAST for Pharo, for X language code we need the FAST meta-model for the X language. Right now, there are two FAST meta-models: FAST Java and FAST Pharo.

In the following, we will generate the AST of a class (or method) for Pharo/Java code in three different ways: directly from some source code, from a method in Pharo, or from a Famix entity.

To install FAST Java you can run the following script on Moose Playground:

Metacello new
githubUser: 'moosetechnology' project: 'FAST-JAVA' commitish: 'v3' path: 'src';
baseline: 'FASTJava';
load: 'all'

To install FAST Pharo use the following script:

Metacello new
baseline: 'FASTPharo';
repository: 'github://moosetechnology/FAST-Pharo:v2/src';
load: 'importer'.

In this case, we will use a specialized importer “FAST-Java importer” to import the AST from a method source code. The complete code of the method to import is between single quote (i.e. a Pharo string) in the following code:

JavaSmaCCProgramNodeImporterVisitor new
parseCodeMethodString: 'public boolean covidTest(Person person) {
if(testCovid(person) == "POSITIVE"){
return true;
} else {
return false;
}
}'

The following script imports the method #collect: of Collection :

FASTSmalltalkImporterVisitor new
runWithSource: (Collection >> #collect:) sourceCode

In this section, we will not proceed as above. Instead, we start from a class/method of the Famix Java model and we will load its FAST representation.

We will add the model to the Playground

Add Model on Playground

We got this:

argoUML034 := MooseModel root at: 1.

We pick any model class from the model:

class := argoUML034 allModelClasses anyOne.

And finally we generate the AST using generateFastJava:

class generateFastJava

One nice way to explore a FAST model is to use the source code and the tree extensions of the inspector. It allows one to navigate in a FAST model and see the code corresponding to each node.

To use it, we start from the Java model loaded above. Then, we select a model method entity. On the right-hand pane of the inspector, select the Tree tab, on the left-hand pane, select the source code extension. The source code is highlighted and the area selected corresponds to the entity selected in the right-hand panel. ( from FAST-Pharo article )

Navigating Through AST

In this post, we saw how to load the AST of a Pharo/Java model using FAST. The FAST model is useful when we need to understand more details about our model (for example identifiers, expression statements .. etc) which are not provided by Famix.