Skip to content

Blog

Creating an importer for an alien grammar

In this blog-post, we see some tricks to create a visitor for an alien AST. This visitor can allow, for example, to generate a Famix model from an external AST.

In a previous blog-post, we saw how to create a parser from a tree-sitter grammar. This parser gives us an AST (Abstract Syntax Tree) which is a tree of nodes representing any given program that the parser can understand. But the structure is decided by the external tool and might not be what we want. For example it will not be a Famix model.

Let see some tricks to help convert this alien grammar into something that better fits our needs.

Let’s first look at what a “Visitor” is. If you already know, you can skip this part.

When dealing with ASTs or Famix models, visitors are very convenient tools to walk through the entire tree/model and perform some actions.

The Visitor is a design pattern that allows to perform some actions on a set of interconnected objects, presumably all from a family of classes. Typically, the classes all belong to the same inheritance hierarchy. In our case, the objects will all be nodes in an AST. For Famix, the objects would be entities from a Famix meta-model.

In the Visitor pattern, all the classes have an #accept: method. Each #accept: in each class will call a visiting method of the visitor that is specific to it. For example the classes NodeA and NodeB will respectively define:

NodeA >> accept: aVisitor
aVisitor visitNodeA: self.
NodeB >> accept: aVisitor
aVisitor visitNodeB: self.

Each visiting method in the visitor will with the element it receives, knowing what is its class: in #visitNodeA: the visitor knows how to deal with a NodeA instance and similarly for #visitNodeB:.

The visitor pattern is a kind of ping-pong between the visiting and #accept: methods: ping-pong of visiting ans accept methods

Typically, all the node are interconnected in a tree or a graph. To walk through the entire structure, it is expected that each visiting method take care of visiting the sub-objects of the current object. For example we could say that NodeA has a property child containing another node:

NodeVisitor >> visitNodeA: aNodeA
"do some stuff"
aNodeA child accept: self

It is easy to see that if child contains a NodeB, this will trigger the visiting method visitNodeB: on it. If it’s a instance of some other class, similarly it will trigger the appropriate visiting method. To visit the entire structure one simply calls accept: on the root of the tree/graph passing it the visitor.

Visitors are very useful with ASTs or graphs because once all the accept: methods are implemented, we can define very different visitors that will "do some stuff" (see above) on all the object in the tree/graph.

Several of the “Famix-tools” blog-posts are based on visitors.

In a preceding blog-post we saw how to create an AST from a Perl program using the Tree-Sitter Perl grammar.

We will use this as an example to see how to create a visitor on this external AST. Here “external” means it was created by an external tool and we don’t have control on the structure of the AST. If we want to create a Famix-Perl model from a Tree-Sitter AST, we will need to convert the nodes in the Tree-Sitter AST into Famix entities.

We will use a simple Perl program as example:

package Person;
sub new {
my $class = shift;
my $self = {
_firstName => shift,
_lastName => shift,
_ssn => shift,
};
# Print all the values just for clarification.
print "First Name is $self->{_firstName}\n";
print "Last Name is $self->{_lastName}\n";
print "SSN is $self->{_ssn}\n";
bless $self, $class;
return $self;
}
sub setFirstName {
my ( $self, $firstName ) = @_;
$self->{_firstName} = $firstName if defined($firstName);
return $self->{_firstName};
}
sub getFirstName {
return $self->{_firstName};
}

(Note: In Perl, “package” is used to create classes. Therefore in our example, “new”, “setFirstName”, and “getFirstName” are some kind of Perl methods.)

Following the instructions in the previous post, you should be able to get a Tree-Sitter AST like this one:

External AST from Tree-Sitter

To have a visitor for this AST, we first need to have an accept: method in all the classes of the AST’s nodes. Fortunately this is all taken care of by the Pharo Tree-Sitter project. In TSNode one finds:

accept: aTSVisitor
^ aTSVisitor visitNode: self

And a class TSVisitor defines:

visitNode: aTSNode
aTSNode collectNamedChild do: [ :child |
child accept: self ]

Which is a method ensuring that all children of a TSNode will be visited. Thanks guys!

But less fortunately, there are very few different nodes in a Tree-Sitter AST. Actually, all the nodes are instances of TSNode. So the “subroutine_declaration_statement”, “block”, “expression_statement”, “return_expression”,… of our example are all of the same class, which is not very useful for a visitor.

This happens quite often. For example a parser dumping an AST in XML format will contain mostly XMLElements. If it is in JSON, they are all “objects” without any native class specification in the format. 😒

Fortunately, people building ASTs usually put inside a property with an indication of the type of each node. For Tree-Sitter, this is the “type” property. Every TSnode has a type which is what is displayed in the screenshot above.

How can we use this to help visiting the AST in a meaningfull way (from a visitor point a view)? We have no control on the accept: method in TSNode, it will always call visitNode:. But we can add an extra indirection to call different visiting methods according to the type of the node.

So, our visitor will inherit from TSVisitor but it will override the visitNode: method. The new method will take the type of the node, build a visiting method name from it, and call the method on the node.

Let’s decide that all our visiting methods will be called “visitPerl<some-type>”. For example for a “block”, the method will be visitPerlBlock:, for a “return_expression” it will be `visitPerlReturn_expression:”.

This is very easily done in Pharo with this method:

visitNode: aTSNode
| selector |
selector := 'visitPerl' , aTSNode type capitalized , ':'.
^self perform: selector asSymbol with: aTSNode

This method builds the new method name in a temporary variable selector and then calls it using perform:with:.

Note that the type name is capitalized to match the Pharo convention for method names. We could have removed all the underscores (_) but it would have required a little bit of extra work. This is not difficult with string manipulation methods. You could try it… (or you can continue reading and find the solution further down.)

With this simple extra indirection in #visitNode:, we can now define separate visiting method for each type of TSNode. For example to convert the AST to a Famix model, visitPerlPackage: would create a FamixPerlClass, and visitPerlSubroutine_declaration_statement: will create a FamixPerlMethod. (Of course it is a bit more complex than that, but you got the idea, right?)

Our visitor is progressing but not done yet. If we call astRootNode accept: TreeSitterPerlVisitor new with the root node of the Tree-Sitter AST, it will immediately halt on a DoesNotUnderstand error because the method visitPerlSource_file: does not exist in the visitor.

We can create it that way:

visitPerlSource_file: aTSNode
^self visitPerlAbstractNode: aTSNode.
visitPerlAbstractNode: aTSNode
^super visitNode: aTSNode

Here we introduce a visitPerlAbstractNode: that is meant to be called by all visiting methods. From the point of view of the visitor, we are kind of creating a virtual inheritance hierarchy where each specific TSNode will “inherit” from that “PerlAbstractNode”. This will be useful in the future when we create sub-classes of our visitor.

By calling super visitNode:, in visitPerlAbstractNode: we ensure that the children of the “source_file” will be visited. And… we instantly get a new halt with DoesNotUnderstand: visitPerlPackage_statement:. Again we define it:

visitPerlPackage_statement: aTSNode
^self visitPerlAbstractNode: aTSNode

This is rapidly becoming repetitive and tedious. There are a lot of methods to define (25 for our example) and they are all the same.

Let’s improve that. We will use the Pharo DoesNotUnderstand mechanism to automate everything. When a message is sent that an object that does not understand it, then the message doesNotUnderstand: is sent to this object with the original message (not understood) as parameter. The default behavior is to raise an exception, but we can change that. We will change doesNotUnderstand: so that it creates the required message automatically for us. This is easy all we need to do is create a string:

visitPerl<some-name>: aTSNode
^self visitPerlAbstractNode: aTSNode

We will then ask Pharo to compile this method in the Visitor class and to execute it. et voila!

Building the string is simple because the selector is the one that was not understood originally by the visitor. We can get it from the argument of doesNotUnderstand:.

So we define the method like that:

doesNotUnderstand: aMessage
| code |
code := aMessage selector , ' aTSNode
^super visitNode: aTSNode'.
self class compile: code classified: #visiting.
self perform: aMessage selector with: aMessage arguments first

First we generate the source code of the method in the code variable. Then we compile it in the visitor’s class. Last we call the new method that was just created. Here to call it, we use perform:with: again, knowing that our method has only one argument (so only one “with:” in the call).

For more security, it can be useful to add the following guard statement at the beginning of our doesNotUnderstand: method:

(aMessage selector beginsWith: 'visitPerl')
ifFalse: [ super doesNotUnderstand: aMessage ].

This ensures that we only create methods that begins with “visitPerl”, if for any reason, some other message is not understood, it will raise an exception as usual.

Now visiting the AST from our example creates all the visiting methods automatically: All the visiting method created

Of course this visitor does not do anything but walking through the entire AST. Let’s say it is already a good start and we can create specific visitors from it.

For example we see in the screen shot above that there is a TreeSitterPerlDumpVisitor. It just dumps on the Transcript the list of node visited. For this, it only needs to define:

visitPerlAbstractNode: aTSNode
('visiting a ', aTSNode type) traceCr.
super visitPerlAbstractNode: aTSNode.

Et voila! (number 2)

Note: Redefining doesNotUnderstand: is a nice trick to quickly create all the visiting methods, but it is recommended that you remove it once the visitor is stable, to make sure you catch all unexpected errors in the future.

This is all well and good, but the visiting methods have one drawback: They visit the children of a node in an unspecified order. For example, an “assignment_expression” has two children, the variable assigned and the expression assigned to it. We must rely on Tree-Sitter to visit them in the right order so that the first child is always the variable assigned and the second child is always the right-hand-side expression.

It would be better to have a name for these children so as to make sure that we know what we are visiting at any time.

In this case, Tree-Sitter helps us with the collectFieldNameOfNamedChild method of TSNode. This method returns an OrderedDictionary where the children are associated to a (usually) meaningful key. In the case of “assignment_expression” the dictionary has two keys: “left” and “right” each associated to the correct child. It would be better to call them instead of blindly visit all the children.

So we will change our visitor for this. The visitNode: method will now call the visiting method with the dictionnary of keys/children as second parameter, the dictionnary of fields. This departs a bit from the traditional visitor pattern where the visiting methods usually have only one argument, the node being visited. But the extra information will help make the visiting methods simpler:

visitNode: aTSNode
| selector |
selector := String streamContents: [ :st |
st << 'visitPerl'.
($_ split: aTSNode type) do: [ :word | st << word capitalized ].
st << ':withFields:'
].
^self
perform: selector asSymbol
with: aTSNode
with: aTSNode collectFieldNameOfNamedChild

It looks significantly more complex, but we also removed the underscores (_) in the visiting method selector (first part of the #visitNode: method). So for “assignment_expression”, the visiting method will now be: visitPerleAssignmentExpression:withFields:.

From this, we could have the following template for our visiting methods:

visitPerlAssignmentExpression: aTSNode withFields: fields
^{
self visitKey: 'left' inDictionnary: fields.
self visitKey: 'right' inDictionnary: fields.
}

Where visitKey: inDictionnary: takes care of the fact that several nodes may be associated to the same key. Here it is:

visitKey: aKey inDictionnary: childrenDictionnary
| child |
child := childrenDictionnary at: aKey ifAbsent: [^nil].
^child isCollection
ifTrue: [ child collect: [ :c | c accept: self ] ]
ifFalse: [ child accept: self ]

The doesNotUnderstand: method to generate all this is also more complex because there is more to generate. Here it is:

doesNotUnderstand: aMessage
(aMessage selector beginsWith: 'visitPerl')
ifFalse: [ super doesNotUnderstand: aMessage ].
self class
compile: (self createVisitMethod: aMessage)
classified: #visiting.
self
perform: aMessage selector
with: aMessage arguments first
with: aMessage arguments second

The code generation has been extracted in a separate method for the sake of readability:

createVisitMethod: aMessage
| fields aTSNode |
aTSNode := aMessage arguments first.
fields := aMessage arguments second.
^String streamContents: [ :str |
str
<< 'visitPerl'.
($_ split: aTSNode type) do: [ :word | str << word capitalized ].
str
<< ': aTSNode withFields: fields
^{
'.
fields keysDo: [ :key |
str
<< ' self visitKey: ''' ;
<< key ;
<< ''' inDictionnary: fields.' ;
cr
].
str
<< ' }' ;
cr
]

Again, it may look a bit complex, but this is only building a string with the needed source code. Go back to the listing of #visitPerlAssignmentExpression: above to see that:

  • we first build the selector of the new visiting method with its parameter;
  • then we put a return and start a dynamic array;
  • after that we create a call to #visitKey:inDictionnary for each field;
  • and finally, we close the dynamic array.

Et voila! (number 3).

This is it. If we call again this visitor on an AST from Tree-Sitter, it will generate all the new visiting methods with explicit field visiting. For example: Explicit visiting of a node&#x27;s fields

The implementation of all this can be found in the https://github.com/moosetechnology/Famix-Perl repository on github. All that’s left to do is create a sub-class of this visitor and override the visiting methods to do something useful with each node type.

That’s all for today folks.

Creating a Parser based on Tree-Sitter grammar

Moose is a huge consumer of language parsers. Relying on external tools help us with this.

We are always looking into integrating new programming languages into the platform. There are two main requirements for this:

  • create a parser of the language, to “understand” the source code
  • create a meta-model for the language, to be able to represent and manipulate the source code

Creating the meta-model has already been covered in an other blogpost: /blog/2021-02-04-coasters

In this post, we will be looking at how to use a Tree-Sitter grammar to help build a parser for a language. We will use the Perl language example for this.

Note: Creating a parser for a language is a large endehavour that can easily take 3 to 6 months of work. Tree-Sitter, or any other grammar tool, will help in that, but it remains a long task.

We do not explain in detail here how to install tree-sitter or a new Tree-Sitter grammar. I found this page (https://dcreager.net/2021/06/getting-started-with-tree-sitter/) useful in this sense.

For this blog post, we will use the Perl grammar in https://github.com/tree-sitter-perl/tree-sitter-perl.

Do the following:

  • clone the repository on your disk
  • go in the directory
  • do make (note: it gave me some error, but the library file was generated all the same)
  • (on Linux) it creates a libtree-sitter-perl.so dynamic library file. This must be moved in some standard library path (I chose /usr/lib/x86_64-linux-gnu/ because this is where the libtree-sitter.so file was).

Pharo uses FFI to link to the grammar library, that’s why it’s a good idea to put it in a standard directory. You can also put this library file in the same directory as your Pharo image, or in the directory where the Pharo launcher puts the virtual machines.

The subclasses of FFILibraryFinder can tell you what are the standard directories on your installation. For example on Linux, FFIUnix64LibraryFinder new paths returns a list of paths that includes '/usr/lib/x86_64-linux-gnu/' where we did put our grammar.so file.

We use the Pharo-Tree-Sitter project (https://github.com/Evref-BL/Pharo-Tree-Sitter) of Berger-Levrault, created by Benoit Verhaeghe, a regular contributor to Moose and this blog. You can import this project in a Moose image following the README instructions.

Metacello new
baseline: 'TreeSitter';
repository: 'github://Evref-BL/Pharo-Tree-Sitter:main/src';
load.

The README file of Pharo-Tree-Sitter gives an example of how to use it for Python:

parser := TSParser new.
tsLanguage := TSLanguage python.
parser language: tsLanguage.
[...]

We want to have the same thing for Perl, so we will need to define a TSLanguage class >> #perl method. Let’s take a look at how it’s done in Python:

TSLanguage class >> #python
^ TSPythonLibrary uniqueInstance tree_sitter_python

It’s easy to do something similar for perl:

TSLanguage class >> #perl
^ TSPerlLibrary uniqueInstance tree_sitter_perl

But we need to define the TSPerlLibrary class. Again let’s look at how it’s done for Python and copy that:

  • create a TreeSitter-Perl package
  • create a TSPerlLibrary class in it inheriting from FFILibrary
  • define the class method:
    tree_sitter_perl
    ^ self ffiCall: 'TSLanguage * tree_sitter_perl ()'
  • and define the class methods for FFI (here for Linux):
    unix64LibraryName
    ^ FFIUnix64LibraryFinder findAnyLibrary: #( 'libtree-sitter-perl.so' )

Notice that we gave the name of the dynamic library file created above (libtree-sitter-perl.so). If this file is in a standard library directory, FFI will find it.

We can now experiment “our” parser on a small example:

parser := TSParser new.
tsLanguage := TSLanguage perl.
parser language: tsLanguage.
string := '# this is a comment
my $var = 5;
'.
tree := parser parseString: string.
tree rootNode

This gives you the following window:

"A first Tree-Sitter AST for Perl"

That looks like a very good start!

But we are still a long way from home. Let’s look at a node of the tree for fun.

node := tree rootNode firstNamedChild will give you the first node in the AST (the comment). If we inspect it, we see that it is a TSNode

  • we can get its type: node type returns the string 'comment'
  • node nextSibling returns the next TSNode, the “expression-statement”
  • node startPoint and node endPoint tell you where in the source code this node is located. It returns instances of TSPoint:
    • node startPoint row = 0 (0 indexed)
    • node startPoint column = 0
    • node endPoint row = 0
    • node endPoint column = 19 That is to say the node is on the first row, extending from column 0 to 19. With this, one could get the text associated to the node from the original source code.

That’s it for today. In a following post we will look at doing something with this AST using the Visitor design pattern.

See you latter

First look at GitProjectHealth

When it comes to understand a software system, we are often focusing on the software artifact itself. What are the classes? How they are connected with each other?

In addition to this analysis of the system, it can be interesting to explore how the system evolves through time. To do so, we can exploit its git history. In Moose, we developed the project GitProjectHealth that enables the analysis of git history for projects hosted by GitHub, GitLab, or BitBucket. The project also comes with a set of metrics one could use directly.

GitProjectHealth is available in the last version of Moose, it can be easily installed using a Metacello script in a playground.

Metacello new
repository: 'github://moosetechnology/GitProjectHealth:main/src';
baseline: 'GitLabHealth';
onConflict: [ :ex | ex useIncoming ];
onUpgrade: [ :ex | ex useIncoming ];
onDowngrade: [ :ex | ex useLoaded ];
load

For this first blog post, we will experiment GitProjectHealth on the Famix project. Since this project is a GitHub project, we first create a GitHub token that will give GitProjectHealth the necessary authorization.

Then, we import the moosetechnology group (that hosts the Famix project).

glhModel := GLHModel new.
githubImporter := GithubModelImporter new
glhModel: glhModel;
privateToken: '<private token>';
yourself.
githubImporter withCommitsSince: (Date today - 100 days).
group := githubImporter importGroup: 'moosetechnology'.

This first step allows us to get first information on projects. For instance, by inspecting the group, we can select the “Group quality” view and see the group projects and the last status of their pipelines.

Group Quality view for moosetechnology

Then, by navigating to the Famix project and its repository, you can view the Commits History.

alt text.

It is also possible to explore the recent commit distribution by date and author

commit distribution.

In this visualization, we discover that the most recent contributors are “Clotilde Toullec” and “CyrilFerlicot”. The “nil” refers to a commit authors that did not fill GitHub with their email. It is anquetil (probably the same person as “Nicolas Anquetil”). The square without name is probably someone that did not fill correctly the local git config for username.

A popular metric when looking at git history is the code churn. Code churn refer to edit of code introduced in the past. It corresponds to the percentage of code introduced in a commit and then modified in other comments during a time period (e.g in the next week). However many code churn definitions exit.

The first step is thus to discover what commits modified my code. To do so, we implemented in GitProjectHealth information about diff in commit.

To extract this information, we first ask GitProjectHealth to extract more information for the commits of the famix project.

famix := group projects detect: [ :project | project name = 'Famix' ].
"I want to go deeper in analysis for famix repository, so I complete commit import of this project"
githubImporter withCommitDiffs: true.
famix repository commits do: [ :commit | githubImporter completeImportedCommit: commit ].

Then, when inspecting a commit, it is possible to switch to the “Commits tree” view.

Commit Tree

Here how to read to above example

  • The orange square “Remove TClassWithVisibility…” is the inspected commit.
  • The gray square is the parent commit of the selected ones.
  • The red squares are subsequent commits that modify at least one file in common with the inspected commit
  • The green squares are commits that modifies other part of the code

Based on this example, we see that Clotilde Toullec modifies code introduced in selected commits in three next commits. Two are Merged Pull Request. This can represent linked work or at least actions on the same module of the application.

Can we go deeper in the analysis?

It is possible to go even deeper in the analysis by connecting GitProjectHealth with other analysis. This is possible by connecting metamodels. For instance, it is possible to link GitProjectHealth with Jira system, of Famix models. You can look at the first general documentation, or stay tune for the next blog post about GitProjectHealth!

Control Flow Graph for FAST Fortran

A Control Flow Graph analysis for FAST Fortran

Section titled “A Control Flow Graph analysis for FAST Fortran”

Control Flow Graphs (CFG) are a common tool for static analyzis of a computation unit (eg. a method) and find some errors (unreachable code, infinite loops)

It is based on the concept of Basic Block: a sequence of consecutive statements in which flow of control can only enter at the beginning and leave at the end. Only the last statement of a basic block can be a branch statement and only the first statement of a basic block can be a target of a branch.

There are two distinctive basic blocks:

  • Start Block: The entry block allows the control to enter into the control flow graph. There should be only one start block.
  • Final Block: Control flow leaves through the exit block. There may be several final blocks.

The package FAST-Fortran-Analyses in https://github.com/moosetechnology/FAST-Fortran contains classes to build a CFG of a Fortran program unit (a main program, a function, or a subroutine).

We must first create a FAST model of a Fortran program. For this we need an external parser. We currently use fortran-src-extras from https://github.com/camfort/fortran-src-extras.

To run it on a fortran file you do:

fortran-src-extras serialize -t json -v77l encode <fortran-file.f>

This will produce a json AST of the program that we can turn into a FAST-Fortran AST.

If you have fortran-src-extras installed on your computer, all this is automated in FAST-Fortran

<fortran-file.f> asFileReference
readStreamDo: [ :st |
FortranProjectImporter new getFASTFor: st contents ]

This script will create an array of ASTs from the <fortran-file.f> given fortran file. If there are several program units in the file, there will be several FAST models in this array. In the example below, there is only one program, so the list contains only the AST for this program.

We will use the following Fortran-77 code:

PROGRAM EUCLID
* Find greatest common divisor using the Euclidean algorithm
PRINT *, 'A?'
READ *, NA
IF (NA.LE.0) THEN
PRINT *, 'A must be a positive integer.'
STOP
END IF
PRINT *, 'B?'
READ *, NB
IF (NB.LE.0) THEN
PRINT *, 'B must be a positive integer.'
STOP
END IF
IA = NA
IB = NB
1 IF (IB.NE.0) THEN
ITEMP = IA
IA = IB
IB = MOD(ITEMP, IB)
GOTO 1
END IF
PRINT *, 'The GCD of', NA, ' and', NB, ' is', IA, '.'
STOP
END

From the FAST model above, we will now create a Control-Flow-Graph:

<FAST-model> accept: FASTFortranCFGVisitor new

The class FASTFortranCFGVisitor implements an algorithm to compute basic blocks from https://en.wikipedia.org/wiki/Basic_block.

This visitor goes throught the FAST model and creates a list of basic blocks that can be inspected with the #basicBlocks method.

There is a small hierarchy of basic block classes:

  • FASTFortranAbstractBasicBlock, the root of the hierarchy. It contains #statements (which are FAST statement nodes). It has methods to test its nature: isStart, isFinal, isConditional. It defines an abstract method #nextBlocks that returns a list of basic blocks that this one can directly reach. Typically there are 1 or 2 next blocks, but Fortran can have more due to “arithmetic IF”, “computed GOTO” and “assigned GOTO” statements.
  • FASTFortranBasicBlock, a common basic block with no branch statement. If it is final, its #nextBlocks is empty, otherwise it’s a list of 1 block.
  • FASTFortranConditionalBasicBlock, a conditional basic block. It may reach several #nextBlocks, each one associated with a value, for example true and false. The method #nextBlockForValue: returns the next block associated to a given value. In our version of CFG, a conditional block may only have one statement (a conditional statement).

You may have noticed that our blocks are a bit different from the definition given at the beginning of the blog-post:

  • our “common” blocs cannot have several next, they never end with a conditional statement;
  • our conditional blocks can have only one statement.

For the program above, the CFG has 10 blocks.

  • the first block is a common block and contains 2 statements, the PRINT and the READ;
  • its next bloc is a conditional block for the IF. It has 2 next blocs:
    • true leads to a common block with 2 statements, the PRINT and the STOP. This is a final block (STOP ends the program);
    • false leads to the common block after the IF

As a first analysis tool, we can visualize the CFG. Inspecting the result of the next script will open a Roassal visualization on the CFG contained in the FASTFortranCFGVisitor.

FASTFortranCFGVisualization on: <aFASTFortranCFGVisitor>

For the program above, this gives the visualization below.

  • the dark dot is the starting block (note that it is a block and contains statements);
  • the hollow dots are final blocks;
  • it’s not the case here, but a block may also be start and final (if there are no conditional blocks in the program) and this would be represented by a “target”, a circle with a dot inside;
  • a grey square is a comon block;
  • a blue square is a conditional block;
  • hovering the mouse on a block will bring a pop up with the list of its statements (this relies on the FASTFortranExporterVisitor)

"Viualizing the Control Flow Graph"

One can see that:

  • the start block has 2 associated statements (PRINT and READ);
  • there are several final blocks, due to the STOP statements;
  • there is a loop at the bottom left of the graph where the last blue conditional block is “IF (IB.NE.0)” and the last statement of the grey block (true value of the IF), is a GOTO.

There are little analyses for now on the CFG, but FASTFortranCFGChecker will compute a list of unreachableBlocks that would represent dead code.

Control flow graphs may also be used to do more advanced analyses and possibly refactor code. For example, we mentioned the loop at the end of our program implemented with a IF statement and a GOTO. This could be refactored into a real WHILE loop that would be easier to read.

This is left as an exercise for the interested people 😉

Building a control flow graph is language dependant to identify the conditional statements, where they lead, and the final statements.

But much could be done in FAST core based on FASTTReturnStatement and a (not yet existing at the time of writing) FASTTConditionalStatement.

Inspiration could be taken from FASTFortranCFGVisitor and the process is not overly complicated. It would probably be even easier for modern languages that do not have the various GOTO statements of Fortran.

Once the CFG is computed, the other tools (eg. the visualization) should be completely independant of the language.

All hands on deck!

Some tools on FAST models

The package FAST-Core-Tools in repository https://github.com/moosetechnology/FAST offers some tools or algorithms that are running on FAST models.

These tools may be usable directly on a specific language FAST meta-model, or might require some adjustements by subtyping them. They are not out-of-the-shelf ready to use stuff, but they can provide good inspiration for whatever you need to do.

Writing test for FAST can be pretty tedious because you have to build a FAST model in the test corresponding to your need. It often has a lot of nodes that you need to create in the right order with the right properties.

This is where FASTDumpVisitor can help by visiting an existing AST and “dump” it as a string. The goal is that executing this string in Pharo should recreate exactly the same AST.

Dumping an AST can also be useful to debug an AST and checking that it has the right properties.

To use it, you can just call FASTDumpVisitor visit: <yourAST> and print the result. For example:

FASTDumpVisitor visit:
(FASTJavaUnaryExpression new
operator: '-' ;
expression:
(FASTJavaIntegerLiteral new
primitiveValue: '5'))

will return the string: FASTJavaUnaryExpression new expression:(FASTJavaIntegerLiteral new primitiveValue:'5');operator:'-' which, if evaluated, in Pharo will recreate the same AST as the original.

Note: Because FAST models are actually Famix models (Famix-AST), the tools works also for Famix models. But Famix entities typically have more properties and the result is not so nice:

FASTDumpVisitor visit:
(FamixJavaMethod new
name: 'toto' ;
parameters: {
FamixJavaParameter new name: 'x' .
FamixJavaParameter new name: 'y'} ).

will return the string: FamixJavaMethod new parameters:{FamixJavaParameter new name:'x';isFinal:false;numberOfLinesOfCode:0;isStub:false.FamixJavaParameter new name:'y';isFinal:false;numberOfLinesOfCode:0;isStub:false};isStub:false;isClassSide:false;isFinal:false;numberOfLinesOfCode:-1;isSynchronized:false;numberOfConditionals:-1;isAbstract:false;cyclomaticComplexity:-1;name:'toto'.

By definition an AST (Abstract Syntax Tree) is a tree (!). So the same variable can appear several time in an AST in different nodes (for example if the same variable is accessed several times).

The idea of the class FASTLocalResolverVisitor is to relate all uses of a symbol in the AST to the node where the symbol is defined. This is mostly useful for parameters and local variables inside a method, because the local resover only looks at the AST itself and we do not build ASTs for entire systems.

This local resolver will look at identifier appearing in an AST and try to link them all together when they correspond to the same entity. There is no complex computation in it. It just looks at names defined or used in the AST.

This is dependant on the programming language because the nodes using or defining a variable are not the same in all languages. For Java, there is FASTJavaLocalResolverVisitor, and for Fortran FASTFortranLocalResolverVisitor.

The tool brings an extra level of detail by managing scopes, so that if the same variable name is defined in different loops (for example), then each use of the name will be related to the correct definition.

The resolution process creates:

  • In declaration nodes (eg. FASTJavaVariableDeclarator or FASTJavaParameter),a property #localUses will list all referencing nodes for this variable;
  • In accessing nodes, (eg. FASTJavaVariableExpression), a property #localDeclarations will lists the declaration node corresponding this variable.
  • If the declaration node was not found a FASTNonLocalDeclaration is used as the declaration node.

Note: That this looks a bit like what Carrefour does (see /blog/2022-06-30-carrefour), because both will bind several FAST nodes to the same entity. But the process is very different:

  • Carrefour will bind a FAST node to a corresponding Famix node;
  • The local resolver binds FAST nodes together.

So Carrefour is not local, it look in the entire Famix model to find the entity that matches a FAST node. In Famix, there is only one Famix entity for one software entity and it “knows” all its uses (a FamixVariable has a list of FamixAccess-es). Each FAST declaration node will be related to the Famix entity (the FamixVariable) and the FAST use nodes will be related to the FamixAccess-es.

On the other hand, the local resolver is a much lighter tool. It only needs a FAST model to work on and will only bind FAST nodes between themselves in that FAST model.

For round-trip re-engineering, we need to import a program in a model, modify the model, and re-export it as a (modified) program. A lot can go wrong or be fogotten in all these steps and they are not trivial to validate.

First, unless much extra information is added to the AST, the re-export will not be syntactically equivalent: there are formatting issues, indentation, white spaces, blank lines, comments that could make the re-exported program very different (apparently) from the original one.

The class FASTDifferentialValidator helps checking that the round-trip implementation works well. It focuses on the meaning of the program independently of the formatting issues. The process is the follwing:

  • parse a set of (representative) programs
  • model them in FAST
  • re-export the programs
  • re-import the new programs, and
  • re-create a new model

Hopefully, the two models (2nd and last steps) should be equivalent This is what this tool checks.

Obviously the validation can easily be circumvented. Trivially, if we create an empty model the 1st time, re-export anything, and create an empty model the second time, then the 2 models are equivalent, yet we did not accomplish anything. This tool is an help for developers to pinpoint small mistakes in the process.

Note that even in the best of conditions, there can still be subtle differences between two equivalent ASTs. For example the AST for “a + b + c” will often differ from that of “a + (b + c)”.

The validator is intended to run on a set of source files and check that they are all parsed and re-exported correctly. It will report differences and will allow to fine tune the comparison or ignore some differences.

It goes through all the files in a directory and uses an importer, an exporter, and a comparator. The importer generates a FAST model from some source code (eg. JavaSmaCCProgramNodeImporterVisitor); the exporter generates source code from a model (eg. FASTJavaExportVisitor); the comparator is a companion class to the DifferentialValidator that handle the differences between the ASTs.

The basic implementation (FamixModelComparator) does a strict comparison (no differences allowed), but it has methods for accepting some differences:

  • #ast: node1 acceptableDifferenceTo: node2: If for some reason the difference in the nodes is acceptable, this method must return true and the comparison will restart from the parent of the two nodes as if they were the same.
  • #ast: node1 acceptableDifferenceTo: node2 property: aSymbol. This is for property comparison (eg. the name of an entity), it should return nil if the difference in value is not acceptable and a recovery block if it is acceptable. Instead of resuming from the parent of the nodes, the comparison will resume from an ancestor for which the recovery block evaluates to true.