The backend I analyzed follows a common pattern.
In the git repository, there is a folder api containing the microservices, and a folder lib with resources for each microservice.
There is also an additional project called lib-common.
Thus, the microservice home is composed of a project named api-home and a project named lib-home.
src
  api
    api-home
      src/
      …
  lib
    lib-home
      src/
      …
    lib-common/
    …
We wanted to check that dependencies were correctly implemented in the project:
no api project should directly depend on another api (API calls are allowed, but not classic Java dependencies)
each api project can depend on its equivalent lib project
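The two rules above can also be written down as a small check. This is a plain Python sketch, not Moose code: the `api-<name>` / `lib-<name>` naming convention, the function name `classify_dependency` and the verdict strings are assumptions made for illustration, and the status of `lib-common` is deliberately left as "review".

```python
def classify_dependency(source: str, target: str) -> str:
    """Classify a Java-level dependency between two projects.

    Assumes projects are named 'api-<name>' or 'lib-<name>' (hypothetical convention).
    """
    if not source.startswith("api-"):
        return "not-checked"   # the rules only constrain api projects
    name = source[len("api-"):]
    if target == source:
        return "ok"            # dependency inside the same project
    if target.startswith("api-"):
        return "violation"     # rule 1: no api depends directly on another api
    if target == f"lib-{name}":
        return "ok"            # rule 2: an api may depend on its own lib
    return "review"            # another lib or lib-common: needs a human decision
```

With such a function, each association found in the model could be reduced to a (source project, target project) pair and classified.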
Moose provides ready-to-use visualizations to represent dependencies. In my case, I chose to use the Architectural map.
This visualization presents the entities of the model (packages, classes, methods) as a tree and displays the associations between them (i.e., the dependencies).
I first asked this visualization to display all the classes. It works, but does not allow us to distinguish the different microservices.
The main problem is that too much information is displayed and we cannot see the microservices.
To fix this, I used Moose’s tag feature.
A tag allows you to associate a color and a name to an entity.
So I tagged the classes of my system depending on their location in the repository.
To do this, in a Moose Playground, I used the following script (adapt it to your context 😉):
model allTaggedEntities do: [ :entity | entity removeTags ].
"Tag each type from the file of its source anchor (loop header assumed: it binds the class and sa used below)"
(model allWithSubTypesOf: FamixJavaType) do: [ :class |
    class sourceAnchor ifNotNil: [ :sa |
        (sa fileName beginsWith: './services/api-A') ifTrue: [ class tagWithName: 'A' ].
        (sa fileName beginsWith: './services/api-B') ifTrue: [ class tagWithName: 'B' ].
        (sa fileName beginsWith: './services/api-C') ifTrue: [ class tagWithName: 'C' ].
        (sa fileName beginsWith: './libraries/lib-A') ifTrue: [ class tagWithName: 'lib-A' ].
        (sa fileName beginsWith: './libraries/lib-common') ifTrue: [ class tagWithName: 'lib-common' ].
        (sa fileName beginsWith: './libraries/lib-B') ifTrue: [ class tagWithName: 'lib-B' ].
        (sa fileName beginsWith: './libraries/lib-C') ifTrue: [ class tagWithName: 'lib-C' ].
    ]
].
(model allWithSubTypesOf: FamixJavaType) reject: [ :type | type tags isEmpty ]
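The tagging logic itself is just a prefix lookup on the file name. As a rough illustration outside of Moose, here is the same idea in plain Python; the table `PREFIX_TAGS` and the function `tag_for` are hypothetical names, and unlike the script (which tests every condition) this sketch stops at the first matching prefix.

```python
# Hypothetical prefix-to-tag table mirroring the conditions of the script.
PREFIX_TAGS = [
    ("./services/api-A", "A"),
    ("./services/api-B", "B"),
    ("./services/api-C", "C"),
    ("./libraries/lib-A", "lib-A"),
    ("./libraries/lib-common", "lib-common"),
    ("./libraries/lib-B", "lib-B"),
    ("./libraries/lib-C", "lib-C"),
]


def tag_for(file_name):
    """Return the tag of the first matching prefix, or None when nothing matches."""
    for prefix, tag in PREFIX_TAGS:
        if file_name.startswith(prefix):
            return tag
    return None
```

Note that a prefix without a trailing slash also matches sibling folders with a longer name (for example a folder whose name merely starts with `api-A`), so it may be safer to end each prefix with `/` when adapting it.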
The result is not perfect yet because entities are not grouped by tag.
To fix this, simply select the “tag to add” option in the Architectural map settings.
You then get a clear visualization of the links between the microservice projects and the libraries they use. We see that no api is linked to an incorrect lib project.
We also notice that microservice B is linked to lib-B as well as lib-common.
Maybe this link to lib-common should be removed? But that’s another story…
Analyzing source code starts with parsing and for this you need semantic understanding of how symbols in the code relate to each other.
In this post, we’ll walk through how to build a C code importer using the TreeSitterFamixIntegration framework.
The TreeSitterFamixIntegration stack provides tools to ease the development of Famix importers using tree-sitter.
This package offers some great features for parsing such as (but not limited to):
Useful methods for source management (getting source text, positions, setting the sourceAnchor of a Famix entity).
Error handling to help catch and report parsing issues.
A better TreeSitter node inspector (which is very helpful when debugging).
A utility to efficiently import and attach single-line and multi-line comments to their corresponding entities.
Context tracking for symbol scope (no more context push and pop 😁).
There is detailed documentation you can check that explains every feature.
First, we need to load the C metamodel. This metamodel provides the Famix classes that represent C entities such as functions, structs, variables, etc.
The FamixCImporter class is the entry point for our importer. It will handle the parsing of C files into Abstract Syntax Trees (AST).
This class will inherit from FamixTSAbstractImporter (defined in the TreeSitterFamixIntegration project), which provides the necessary methods for importing and parsing C files using Tree-sitter.
FamixTSAbstractImporter << #FamixCImporter
    slots: {};
    package: 'Famix-C-Importer'
Now, let’s override some methods to set up our importer:
"Should return a TreeSitter language such as TSLanguage python"
^ TSLanguage cLang
This method returns the Tree-sitter language we want to use for parsing. In this case, we are using the C language. You can find the available languages in the Pharo-Tree-Sitter package.
This method calls importFile: on all C files recursively found in a directory.
We will add more logic to this method later but for now, it serves as a starting point for our importer.
The isCFile: method checks if the file has a .c or .h extension.
FamixCImporter >> isCFile: aFileReference
    ^ #( 'c' 'h' ) includes: aFileReference extension
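To make the traversal concrete, here is a minimal sketch of the same idea (recurse into directories, keep only `.c` and `.h` files) in plain Python. It is not the importer's code: `is_c_file` and `c_files_under` are hypothetical names, and note that Python's `suffix` includes the dot whereas the Pharo `extension` above does not.

```python
from pathlib import Path

C_EXTENSIONS = {".c", ".h"}


def is_c_file(path: Path) -> bool:
    """True for C source and header files, judged by extension only."""
    return path.suffix in C_EXTENSIONS


def c_files_under(root: Path):
    """Yield every .c/.h file below root, recursing into sub-directories."""
    for child in sorted(root.iterdir()):
        if child.is_dir():
            yield from c_files_under(child)
        elif is_c_file(child):
            yield child
```

In the importer, the equivalent of the `yield child` step is the call to importFile: on each file found.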
The importFile: method is defined in the FamixTSAbstractImporter class (provided by the TreeSitter-Famix-Integration project).
It parses the file content to create an AST and then passes the visitor (the FamixCVisitor that we previously defined) to walk through the AST.
The FamixCVisitor class is responsible for walking through the parsed AST and creating Famix entities. It will inherit from FamixTSAbstractVisitor, which provides the necessary methods for visiting Tree-sitter nodes.
FamixTSAbstractVisitor << #FamixCVisitor
    slots: {};
    package: 'Famix-C-Importer'
For this class, we will just need to override one method:
It returns the Famix metamodel class that will be used to create Famix entities. In this case, we are using FamixCModel which is in the Famix-Cpp package.
Now that we have our importer and visitor classes set up, we can already test it.
To test our importer, we can create a simple C file and import it using the FamixCImporter class.
test.c
#include <stdio.h>

int aGlobalVar = 1;

int main() {
    int aLocalVar;
    aLocalVar = aGlobalVar + 2;
}
To import this file, we can use the following code in the Playground (cmd + O + P to open it):
Before running the above code, open the Transcript to see the logs (cmd + O + T to open it).
Then select all the code and run it by inspecting it (cmd + I or click the “Inspect” button). You will get something similar to this.
The above screenshot shows what is inside our model. We can see that there is pretty much nothing there yet apart from the SourceLanguages which is added by default by TreeSitterFamixIntegration.
Now if we look at the Transcript, we can see that the importer has imported the file but we didn’t implement the visitor methods yet for every node in the AST, so no Famix entities were created.
If you want to inspect the corresponding AST of our test file, you can do something similar to what is in this other blog post on tree-sitter.
Let’s go back to our FamixCImporter class; from there we will create the CompilationUnit and HeaderFile entities. We need to do that there because we have to check whether the file is a header file or a source file.
visitor model newCompilationUnitNamed: aFileReference basename.
]
ifFalse: [
visitor model newHeaderFileNamed: aFileReference basename.
].
visitor
useCurrentEntity: fileEntity
during: [ self importFile: aFileReference ] ]
ifFalse: [
aFileReference children do: [ :each|
self importFileReference: each
].
^self ]
We use useCurrentEntity:during: to provide a context for the visitor. This is the same as pushing the fileEntity onto a context stack, visiting the children and then popping it again: for the duration of the block, the current entity is the fileEntity.
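The push/visit/pop equivalence may be easier to see in a small model. The following Python sketch is not the framework's implementation; the class `Visitor`, its `_context` stack and the method `use_current_entity` are hypothetical stand-ins for the idea behind useCurrentEntity:during:.

```python
from contextlib import contextmanager


class Visitor:
    def __init__(self):
        self._context = []  # stack of enclosing entities, innermost last

    @property
    def current_entity(self):
        """The innermost entity, or None outside of any context."""
        return self._context[-1] if self._context else None

    @contextmanager
    def use_current_entity(self, entity):
        self._context.append(entity)   # push
        try:
            yield entity               # the "during" block runs here
        finally:
            self._context.pop()        # pop, even if the block raises


visitor = Visitor()
with visitor.use_current_entity("test.c"):
    with visitor.use_current_entity("main"):
        innermost = visitor.current_entity   # "main"
    restored = visitor.current_entity        # back to "test.c"
```

The benefit of wrapping the block is that the pop can never be forgotten, which is what makes manual context management error-prone.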
Now try importing a whole directory containing C files. You should see that the importer creates a FamixCHeaderFile for each header file and a FamixCCompilationUnit for each source file.
To set the source anchor for any Famix entity, we can use the setSourceAnchor: aFamixEntity from: aTSNode method provided by the FamixTSAbstractVisitor class. This method takes a Famix entity and a Tree-sitter node.
We can use it to set the source anchor for our fileEntity. Go to visitTranslationUnit: in the FamixCVisitor class and add the following code:
Next, we will create FamixCFunction entities for each function declaration in the C file. We will do this in the visitFunctionDefinition: method of the FamixCVisitor class.
But first we need to know where the function name is located to create the FamixCFunction entity. Create the method and put a halt there to inspect the node.
visitFunctionDefinition: aNode
    self halt.
    self visitChildren: aNode.
If we look at the function definition node, we can see that the function name is in the identifier node, which is a child of the function declarator node.
To get that name, there are two ways:
visit the function_declarator until the identifier returns its name using self visit: aNode
get it by child field name using aNode _fieldName that returns the child node with the given field name. And you don’t need to implement the _fieldName method because it is already handled by the framework.
For simplicity, and to show other available features in the framework, we will use the second way.
Let’s inspect the function definition node to see what fields it has.
So if we do aNode _declarator it will return the function declarator node
And if we do aNode _declarator from the function_declarator it will give us the identifier that we want.
Now we can create the function entity and set its name and source anchor.
The self currentEntity returns the compilation unit entity which is the parent of the function entity.
And before visiting the children, we set the current entity to the newly created function entity using useCurrentEntity:during:. This will allow us to create other entities that are related to this function, such as parameters and local variables.
The difference between local and global variables is that local variables are declared inside a function, while global variables are declared outside any function.
To create the variable entities, we will create the visitDeclaration: method in the FamixCVisitor class. This method is called for each variable declaration in the C file.
FamixCVisitor >> visitDeclaration: aNode
    "fields: type - declarator"
    | varName entity |
    self visit: aNode _type.
    varName := self visit: aNode _declarator.
    entity := self currentEntity isFunction
        ifTrue: [
            (model newLocalVariableNamed: varName)
                parentBehaviouralEntity: self currentEntity;
                yourself ]
        ifFalse: [
            (model newGlobalVariableNamed: varName)
                parentScope: self currentEntity;
                yourself ].
    self setSourceAnchor: entity from: aNode.
The visitDeclaration: method does the following:
Visits the variable’s type. This will allow us to parse its type information.
Retrieves the variable name by visiting the declarator field. If the variable is initialized, this will be an init_declarator node; otherwise, it will be an identifier. We should implement visit methods for both cases to extract the name correctly.
FamixCVisitor >> visitInitDeclarator: aNode
    "fields: declarator - value"
    self visit: aNode _value.
    ^ self visit: aNode _declarator "variable name is in the declarator node"

FamixCVisitor >> visitIdentifier: aNode
    ^ aNode sourceText "returns the name of the variable"
Creates a variable entity, either a local variable or a global variable, depending on whether the current entity is a function or not.
Sets the source anchor for the variable entity using the setSourceAnchor:from: method.
In this section, we will implement the symbol resolution for our C importer. This will allow us to resolve references to variables and functions in our C code.
As an example, we will resolve the reference to the local variable aLocalVar in the main function, which will be represented as a famix write access entity.
To create the write access entity, we will implement the visitAssignmentExpression: method in the FamixCVisitor class. This method is called for each assignment expression.
visitAssignmentExpression: aNode
    "fields: left - right"
    | access leftVarName |
    leftVarName := self visit: aNode _left.
    access := model newAccess accessor: self currentEntity;
The resolve: aResolvable foundAction: aBlockClosure method is provided by the FamixTSAbstractVisitor class.
It takes two arguments:
aResolvable: an instance of SRIdentifierResolvable. This resolvable is created with the identifier (the variable name) and the expected kinds of entities (in this case, either a local variable or a global variable). The identifier: method sets the identifier to resolve, and the expectedKind: method sets the expected kinds of entities that can be resolved.
aBlockClosure: a block that will be executed when the resolvable is resolved (we found the variable). In this case we set the variable of the access entity to the resolved variable.
The SRIdentifierResolvable is a generic resolver that can be used to resolve identifiers. However, in some cases, we may need to create a custom resolver to handle specific cases. In that case, we can create a class that inherits from SRResolvable and override the resolveInScope:currentEntity: method to implement our custom resolution logic.
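To give an intuition of what resolving an identifier in scope means, here is a simplified model in plain Python. It is only a sketch under assumptions: `Entity`, `resolve` and the kind strings are hypothetical, the lookup is done immediately (the real resolver may defer it), and none of this is the symbol resolver's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Entity:
    name: str
    kind: str                          # e.g. "function", "local_variable"
    parent: Optional["Entity"] = None
    children: list = field(default_factory=list)

    def add(self, child: "Entity") -> "Entity":
        """Attach child to this scope and return it."""
        child.parent = self
        self.children.append(child)
        return child


def resolve(identifier, expected_kinds, current):
    """Search the current entity, then each enclosing scope, for a matching child."""
    scope = current
    while scope is not None:
        for child in scope.children:
            if child.name == identifier and child.kind in expected_kinds:
                return child
        scope = scope.parent
    return None


unit = Entity("test.c", "compilation_unit")
global_var = unit.add(Entity("aGlobalVar", "global_variable"))
main = unit.add(Entity("main", "function"))
local_var = main.add(Entity("aLocalVar", "local_variable"))
kinds = {"local_variable", "global_variable"}
```

Starting from `main`, `aLocalVar` is found in the function itself, while `aGlobalVar` is only found after moving up to the compilation unit. The expected kinds prevent a function named like a variable from being returned by mistake.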
For more information about the symbol resolver, you can check the documentation.
The TreeSitterFamixIntegration package provides a utility to parse comments and attach them to the corresponding Famix entities. This is done using the FamixCCommentVisitor class.
To parse comments, we will create the FamixCCommentVisitor class that will inherit from FamixTSAbstractCommentVisitor. And we just need to override the visitNode: method.
We use the addMultilineCommentNode: and addSingleLineCommentNode: methods provided by the FamixTSAbstractCommentVisitor class to add the comment to the model.
For a detailed explanation of how to use the comment visitor, you can check the documentation.
The last thing to do is to use the comment visitor somewhere in our importer. We can do that each time we finish visiting all the children of a translation unit node.
In this blog post, we have seen how to build a Famix importer for C code using the TreeSitterFamixIntegration framework. We have covered the following topics:
Setting up the environment and creating the importer and visitor classes.
Creating Famix entities for compilation units, functions, and variables.
Implementing symbol resolution for local and global variables.
Parsing comments and attaching them to the corresponding Famix entities.
This is just a starting point for building an importer with this stack. You have to implement more tests and methods to handle other entities. The TreeSitterFamixIntegration framework provides a lot of other utilities we didn’t cover to help you with that.
How do we represent the relation between a generic entity, its type parameters and the entities that concretize it? The Famix metamodel has evolved over the years to improve the way we represent these relations. The last increment is described in a previous blogpost.
We present here a new implementation that eases the management of parametric entities in Moose.
The major change between this previous version and the new implementation presented in this post is this:
We do not represent the parameterized entities anymore.
What’s wrong with the previous parametrics implementation?
The major issue with the previous implementation was the difference between parametric and non-parametric entities in practice, particularly when trying to trace the inheritance tree.
Here is a concrete example: getting the superclass of the superclass of a class.
For a non-parametric class, the sequence is straightforward: ask the inheritance for the superclass, repeat.
For a parametric class (see the little code snippet below), there was an additional step, navigating through the concretization:
import java.util.ArrayList; // public class ArrayList<E> { /* ... */ }
This has caused many headaches to developers who wanted to browse a hierarchy: how do we keep track of the full hierarchy when it includes parametric classes? How to manage both situations without knowing if the classes will be parametric or not?
The same problem occurred to browse the implementations of parametric interfaces and the invocations of generic methods.
Each time there was a concretization, a parametric entity was created. This created duplicates of virtually the same entity: one for the generic entity and one for each parameterized entity.
Let’s see an example:
public class MyClass implements List<Float> {
    public List<Integer> getANumber() {
        List<Number> listA;
        List<Integer> listB;
    }
}
For the interface List<E>, we had 6 parametric interfaces:
One was the generic one: #isGeneric >>> true
3 were the parameterized interfaces implemented by ArrayList<E>, its superclass AbstractList<E> and MyClass. They were different because the concrete types were different: E from ArrayList<E>, E from AbstractList<E> and Float.
2 were declared types: List<Number> and List<Integer>.
When deciding on a new implementation, our main goal was to create a situation in which the dependencies would work in the same way for all entities, parametric or not.
That’s where we introduce parametric associations. These associations only differ from standard associations by one property: they trigger a concretization.
Here are the new Famix metamodel traits that represent concretizations:
There is a direct relation between a parametric entity and its type parameters.
A concretization is the association between a type parameter and the type argument that replaces it.
A parametric association triggers one or several concretizations, according to the number of type parameters the parametric entity has. Example: a parametric association that targets Map<K,V> will trigger 2 concretizations.
The parametric entity is the target of the parametric association. It is always generic. As announced, we do not represent parameterized entities anymore.
Coming back to the entities’ duplication example above, we now represent only 1 parametric interface for List<E> and it is the target of the 5 parametric associations.
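As a rough mental model of these relations, here is a small Python sketch. It is not the Famix implementation: the classes `ParametricEntity`, `Concretization`, `ParametricAssociation` and the helper `associate` are hypothetical names that only mirror the vocabulary used above.

```python
from dataclasses import dataclass, field


@dataclass
class ParametricEntity:
    name: str
    type_parameters: list                          # e.g. ["K", "V"]
    incoming: list = field(default_factory=list)   # parametric associations targeting it


@dataclass
class Concretization:
    type_parameter: str
    type_argument: str


@dataclass
class ParametricAssociation:
    source: str
    target: ParametricEntity
    concretizations: list


def associate(source, target, type_arguments):
    """Link source to the generic target, with one concretization per type parameter."""
    if len(type_arguments) != len(target.type_parameters):
        raise ValueError("one type argument is needed per type parameter")
    concretizations = [
        Concretization(parameter, argument)
        for parameter, argument in zip(target.type_parameters, type_arguments)
    ]
    association = ParametricAssociation(source, target, concretizations)
    target.incoming.append(association)
    return association


generic_map = ParametricEntity("Map", ["K", "V"])
typing = associate("myField", generic_map, ["String", "Integer"])
```

Here `typing` carries two concretizations (K by String, V by Integer), and there is still a single `Map` entity no matter how many associations target it.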
This metamodel evolution is the occasion of another major change: the replacement of the direct relation between a typed entity and its type. This new association is called Entity typing.
The choice to replace the existing relation by a reified association is made to represent the dependency in coherence with the rest of the metamodel.
With this new association, we can now add parametric entity typings.
In a case like this:
public ArrayList<String> myAttribute;
we have an “entity typing” association between myAttribute and ArrayList. This association is parametric: it triggers the concretization of E in ArrayList<E> by String.
In the previous implementation, the bounds of type parameters were implemented as inheritances: in the example above, Number would be the superclass of T.
Since that implementation, bounds have been introduced for wildcards.
We now have the opportunity to apply them to type parameters as well.
In the new implementation, Number is the upper bound of T.
This diagram sums up the new parametrics implementation in Famix traits and Java metamodel.
Please note that this is not the full Java metamodel but only a relevant part.
The representation of parametric entities is a challenge that will most likely continue as Famix evolves. The next question will probably be this one: should Concretization really be an association?
An association is the reification of a dependency. Yet, there is no dependency between a type argument and the type parameter it replaces. Each can exist without the other. The dependency is in fact between the source of the parametric association and the type parameter.
MySpecializedList has a superclass (ArrayList<E>) and also depends on String, as a type argument. However, String does not depend on E, nor E on String.
The next iteration of the representation of parametric entities will probably cover this issue. Stay tuned!
In this blog-post, we see some tricks to create a visitor for an alien AST.
This visitor can allow, for example, to generate a Famix model from an external AST.
In a previous blog-post, we saw how to create a parser from a tree-sitter grammar.
This parser gives us an AST (Abstract Syntax Tree) which is a tree of nodes representing any given program that the parser can understand.
But the structure is decided by the external tool and might not be what we want.
For example it will not be a Famix model.
Let’s see some tricks to help convert this alien grammar into something that better fits our needs.
Let’s first look at what a “Visitor” is.
If you already know, you can skip this part.
When dealing with ASTs or Famix models, visitors are very convenient tools to walk through the entire tree/model and perform some actions.
The Visitor is a design pattern that allows performing some actions on a set of interconnected objects, presumably all from a family of classes.
Typically, the classes all belong to the same inheritance hierarchy.
In our case, the objects will all be nodes in an AST.
For Famix, the objects would be entities from a Famix meta-model.
In the Visitor pattern, all the classes have an #accept: method.
Each #accept: in each class will call a visiting method of the visitor that is specific to it.
For example the classes NodeA and NodeB will respectively define:
NodeA >> accept: aVisitor
    aVisitor visitNodeA: self.

NodeB >> accept: aVisitor
    aVisitor visitNodeB: self.
Each visiting method in the visitor will deal with the element it receives, knowing what its class is: in #visitNodeA: the visitor knows how to deal with a NodeA instance and similarly for #visitNodeB:.
The visitor pattern is a kind of ping-pong between the visiting and #accept: methods:
Typically, all the nodes are interconnected in a tree or a graph.
To walk through the entire structure, it is expected that each visiting method take care of visiting the sub-objects of the current object.
For example we could say that NodeA has a property child containing another node:
NodeVisitor >> visitNodeA: aNodeA
    "do some stuff"
    aNodeA child accept: self
It is easy to see that if child contains a NodeB, this will trigger the visiting method visitNodeB: on it.
If it’s an instance of some other class, similarly it will trigger the appropriate visiting method.
To visit the entire structure one simply calls accept: on the root of the tree/graph passing it the visitor.
Visitors are very useful with ASTs or graphs because once all the accept: methods are implemented, we can define very different visitors that will "do some stuff" (see above) on all the objects in the tree/graph.
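For readers more at ease with another language, the same ping-pong can be sketched in a few lines of Python. The classes `NodeA`, `NodeB` and `NameCollector` are illustrative only; the "stuff" done by this particular visitor is simply recording which nodes it went through.

```python
class NodeA:
    def __init__(self, child=None):
        self.child = child

    def accept(self, visitor):
        return visitor.visit_node_a(self)


class NodeB:
    def accept(self, visitor):
        return visitor.visit_node_b(self)


class NameCollector:
    """A visitor that records the class of each node it walks through."""

    def __init__(self):
        self.seen = []

    def visit_node_a(self, node):
        self.seen.append("NodeA")        # "do some stuff"
        if node.child is not None:
            node.child.accept(self)      # ping-pong: back to accept

    def visit_node_b(self, node):
        self.seen.append("NodeB")


collector = NameCollector()
NodeA(NodeA(NodeB())).accept(collector)  # start from the root of the tree
```

A second visitor with different visiting methods could walk the same tree without touching the node classes, which is the point of the pattern.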
Several of the “Famix-tools” blog-posts are based on visitors.
In a preceding blog-post we saw how to create an AST from a Perl program using the Tree-Sitter Perl grammar.
We will use this as an example to see how to create a visitor on this external AST.
Here “external” means it was created by an external tool and we don’t have control on the structure of the AST.
If we want to create a Famix-Perl model from a Tree-Sitter AST, we will need to convert the nodes in the Tree-Sitter AST into Famix entities.
(Note: In Perl, “package” is used to create classes. Therefore in our example, “new”, “setFirstName”, and “getFirstName” are some kind of Perl methods.)
Following the instructions in the previous post, you should be able to get a Tree-Sitter AST like this one:
To have a visitor for this AST, we first need to have an accept: method in all the classes of the AST’s nodes.
Fortunately this is all taken care of by the Pharo Tree-Sitter project.
In TSNode one finds:
accept: aTSVisitor
    ^ aTSVisitor visitNode: self
And a class TSVisitor defines:
visitNode: aTSNode
    aTSNode collectNamedChild do: [ :child |
        child accept: self ]
Which is a method ensuring that all children of a TSNode will be visited.
Thanks guys!
But less fortunately, there are very few different nodes in a Tree-Sitter AST.
Actually, all the nodes are instances of TSNode.
So the “subroutine_declaration_statement”, “block”, “expression_statement”, “return_expression”,… of our example are all of the same class, which is not very useful for a visitor.
This happens quite often.
For example a parser dumping an AST in XML format will contain mostly XMLElements.
If it is in JSON, they are all “objects” without any native class specification in the format. 😒
Fortunately, people building ASTs usually put inside a property with an indication of the type of each node.
For Tree-Sitter, this is the “type” property.
Every TSNode has a type, which is what is displayed in the screenshot above.
How can we use this to help visit the AST in a meaningful way (from a visitor’s point of view)?
We have no control on the accept: method in TSNode, it will always call visitNode:.
But we can add an extra indirection to call different visiting methods according to the type of the node.
So, our visitor will inherit from TSVisitor but it will override the visitNode: method.
The new method will take the type of the node, build a visiting method name from it, and call the method on the node.
Let’s decide that all our visiting methods will be called “visitPerl<some-type>”.
For example for a “block”, the method will be visitPerlBlock:, and for a “return_expression” it will be visitPerlReturn_expression:.
This is very easily done in Pharo with this method:
visitNode: aTSNode
    | selector |
    selector := 'visitPerl' , aTSNode type capitalized , ':'.
    ^ self perform: selector asSymbol with: aTSNode
This method builds the new method name in a temporary variable selector and then calls it using perform:with:.
Note that the type name is capitalized to match the Pharo convention for method names.
We could have removed all the underscores (_) but it would have required a little bit of extra work.
This is not difficult with string manipulation methods.
You could try it… (or you can continue reading and find the solution further down.)
With this simple extra indirection in #visitNode:, we can now define separate visiting method for each type of TSNode.
For example to convert the AST to a Famix model, visitPerlPackage: would create a FamixPerlClass, and visitPerlSubroutine_declaration_statement: will create a FamixPerlMethod.
(Of course it is a bit more complex than that, but you got the idea, right?)
Our visitor is progressing but not done yet.
If we call astRootNode accept: TreeSitterPerlVisitor new with the root node of the Tree-Sitter AST, it will immediately halt on a DoesNotUnderstand error because the method visitPerlSource_file: does not exist in the visitor.
We can create it that way:
visitPerlSource_file: aTSNode
    ^ self visitPerlAbstractNode: aTSNode.

visitPerlAbstractNode: aTSNode
    ^ super visitNode: aTSNode
Here we introduce a visitPerlAbstractNode: that is meant to be called by all visiting methods.
From the point of view of the visitor, we are kind of creating a virtual inheritance hierarchy where each specific TSNode will “inherit” from that “PerlAbstractNode”.
This will be useful in the future when we create sub-classes of our visitor.
By calling super visitNode:, in visitPerlAbstractNode: we ensure that the children of the “source_file” will be visited.
And… we instantly get a new halt with DoesNotUnderstand: visitPerlPackage_statement:.
Again we define it:
visitPerlPackage_statement: aTSNode
    ^ self visitPerlAbstractNode: aTSNode
This is rapidly becoming repetitive and tedious. There are a lot of methods to define (25 for our example) and they are all the same.
Let’s improve that.
We will use the Pharo DoesNotUnderstand mechanism to automate everything.
When a message is sent to an object that does not understand it, the message doesNotUnderstand: is sent to this object with the original (not understood) message as parameter.
The default behavior is to raise an exception, but we can change that.
We will change doesNotUnderstand: so that it creates the required message automatically for us.
This is easy: all we need to do is create a string:
visitPerl<some-name>: aTSNode
    ^ self visitPerlAbstractNode: aTSNode
We will then ask Pharo to compile this method in the Visitor class and to execute it.
Et voila!
Building the string is simple because the selector is the one that was not understood originally by the visitor.
We can get it from the argument of doesNotUnderstand:.
So we define the method like that:
doesNotUnderstand: aMessage
    | code |
    code := aMessage selector , ' aTSNode
    ^ super visitNode: aTSNode'.
    self class compile: code classified: #visiting.
    self perform: aMessage selector with: aMessage arguments first
First we generate the source code of the method in the code variable.
Then we compile it in the visitor’s class.
Last we call the new method that was just created.
Here to call it, we use perform:with: again, knowing that our method has only one argument (so only one “with:” in the call).
For more security, it can be useful to add the following guard statement at the beginning of our doesNotUnderstand: method:
(aMessage selector beginsWith: 'visitPerl')
    ifFalse: [ ^ super doesNotUnderstand: aMessage ].
This ensures that we only create methods whose names begin with “visitPerl”. If, for any reason, some other message is not understood, it raises an exception as usual.
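The same two tricks (dispatching on the node type, then generating the missing visiting methods on demand) exist in other dynamic languages. As a comparison point only, here is a Python sketch where `__getattr__` plays the role of doesNotUnderstand:. The classes `Node` and `AutoVisitor` and the `visit_perl_` prefix are assumptions of this example, not part of any Tree-Sitter binding.

```python
class Node:
    def __init__(self, type_, children=()):
        self.type = type_
        self.children = list(children)


class AutoVisitor:
    PREFIX = "visit_perl_"

    def visit_node(self, node):
        # Extra indirection: pick the visiting method from the node's type.
        return getattr(self, self.PREFIX + node.type)(node)

    def visit_children(self, node):
        for child in node.children:
            self.visit_node(child)

    def __getattr__(self, name):
        # Called only when normal lookup fails, like doesNotUnderstand:.
        if not name.startswith(self.PREFIX):
            raise AttributeError(name)         # guard: unrelated names still fail

        def generated(self, node):
            return self.visit_children(node)   # default: just walk the children

        setattr(type(self), name, generated)   # install once on the class
        return getattr(self, name)


root = Node("source_file", [
    Node("package_statement"),
    Node("expression_statement", [Node("assignment_expression")]),
])
visitor = AutoVisitor()
visitor.visit_node(root)
```

After the visit, the class has one generated method per node type encountered. As in the Pharo version, this is convenient while exploring a grammar, but it hides typos in method names, so it is better removed once the visitor is stable.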
Now visiting the AST from our example creates all the visiting methods automatically:
Of course this visitor does not do anything but walking through the entire AST.
Let’s say it is already a good start and we can create specific visitors from it.
For example we see in the screen shot above that there is a TreeSitterPerlDumpVisitor.
It just dumps the list of visited nodes on the Transcript.
For this, it only needs to define:
visitPerlAbstractNode: aTSNode
    ('visiting a ', aTSNode type) traceCr.
    super visitPerlAbstractNode: aTSNode.
Et voila! (number 2)
Note: Redefining doesNotUnderstand: is a nice trick to quickly create all the visiting methods, but it is recommended that you remove it once the visitor is stable, to make sure you catch all unexpected errors in the future.
This is all well and good, but the visiting methods have one drawback:
They visit the children of a node in an unspecified order.
For example, an “assignment_expression” has two children, the variable assigned and the expression assigned to it.
We must rely on Tree-Sitter to visit them in the right order so that the first child is always the variable assigned and the second child is always the right-hand-side expression.
It would be better to have a name for these children so as to make sure that we know what we are visiting at any time.
In this case, Tree-Sitter helps us with the collectFieldNameOfNamedChild method of TSNode.
This method returns an OrderedDictionary where the children are associated to a (usually) meaningful key.
In the case of “assignment_expression” the dictionary has two keys: “left” and “right” each associated to the correct child.
It would be better to call them instead of blindly visit all the children.
So we will change our visitor for this.
The visitNode: method will now call the visiting method with a second parameter: the dictionary of fields, which maps each key to the corresponding child.
This departs a bit from the traditional visitor pattern where the visiting methods usually have only one argument, the node being visited.
But the extra information will help make the visiting methods simpler:
visitNode: aTSNode
    | selector |
    selector := String streamContents: [ :st |
        st << 'visitPerl'.
        ($_ split: aTSNode type) do: [ :word | st << word capitalized ].
        st << ':withFields:' ].
    ^ self
        perform: selector asSymbol
        with: aTSNode
        with: aTSNode collectFieldNameOfNamedChild
It looks significantly more complex, but we also removed the underscores (_) in the visiting method selector (first part of the #visitNode: method).
So for “assignment_expression”, the visiting method will now be: visitPerlAssignmentExpression:withFields:.
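The selector construction is a small string transformation that can be checked in isolation. Here is an equivalent sketch in Python; the function name `visiting_selector` is hypothetical, and the first letter of each word is upper-cased by hand because Python's `str.capitalize()` would also lower-case the rest of the word.

```python
def visiting_selector(node_type: str, prefix: str = "visitPerl") -> str:
    """Build the visiting selector for a Tree-Sitter node type such as 'assignment_expression'."""
    words = node_type.split("_")
    camel = "".join(word[:1].upper() + word[1:] for word in words)
    return f"{prefix}{camel}:withFields:"


selector = visiting_selector("assignment_expression")
```

For “assignment_expression” this gives the selector mentioned above, with the underscores removed and each word capitalized.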
From this, we could have the following template for our visiting methods:
Again, it may look a bit complex, but this is only building a string with the needed source code. Go back to the listing of #visitPerlAssignmentExpression: above to see that:
we first build the selector of the new visiting method with its parameter;
then we put a return and start a dynamic array;
after that we create a call to #visitKey:inDictionnary for each field;
and finally, we close the dynamic array.
Et voila! (number 3).
This is it.
If we call again this visitor on an AST from Tree-Sitter, it will generate all the new visiting methods with explicit field visiting.
For example:
The implementation of all this can be found in the https://github.com/moosetechnology/Famix-Perl repository on GitHub.
All that’s left to do is create a sub-class of this visitor and override the visiting methods to do something useful with each node type.
In this post, we will be looking at how to use a Tree-Sitter grammar to help build a parser for a language.
We will use the Perl language example for this.
Note: Creating a parser for a language is a large endeavour that can easily take 3 to 6 months of work.
Tree-Sitter, or any other grammar tool, will help in that, but it remains a long task.
do make (note: it gave me some error, but the library file was generated all the same)
(on Linux) it creates a libtree-sitter-perl.so dynamic library file.
This must be moved to some standard library path (I chose /usr/lib/x86_64-linux-gnu/ because this is where the libtree-sitter.so file was).
Pharo uses FFI to link to the grammar library, that’s why it’s a good idea to put it in a standard directory.
You can also put this library file in the same directory as your Pharo image, or in the directory where the Pharo launcher puts the virtual machines.
The subclasses of FFILibraryFinder can tell you what are the standard directories on your installation.
For example on Linux, FFIUnix64LibraryFinder new paths returns a list of paths that includes '/usr/lib/x86_64-linux-gnu/' where we did put our grammar.so file.
We use the Pharo-Tree-Sitter project (https://github.com/Evref-BL/Pharo-Tree-Sitter) of Berger-Levrault, created by Benoit Verhaeghe, a regular contributor to Moose and this blog.
You can import this project in a Moose image following the README instructions.
Notice that we gave the name of the dynamic library file created above (libtree-sitter-perl.so).
If this file is in a standard library directory, FFI will find it.
We can now experiment with “our” parser on a small example:
parser := TSParser new.
tsLanguage := TSLanguage perl.
parser language: tsLanguage.
string := '# this is a comment
my $var = 5;
'.
tree := parser parseString: string.
tree rootNode
This gives you the following window:
That looks like a very good start!
But we are still a long way from home.
Let’s look at a node of the tree for fun.
node := tree rootNode firstNamedChild will give you the first node in the AST (the comment).
If we inspect it, we see that it is a TSNode
we can get its type: node type returns the string 'comment'
node nextSibling returns the next TSNode, the “expression-statement”
node startPoint and node endPoint tell you where in the source code this node is located.
It returns instances of TSPoint:
node startPoint row = 0 (0 indexed)
node startPoint column = 0
node endPoint row = 0
node endPoint column = 19
That is to say the node is on the first row, extending from column 0 to 19.
With this, one could get the text associated to the node from the original source code.
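As an illustration of that last remark, here is a minimal Python sketch that extracts the text between two (row, column) points. The function `node_text` is a hypothetical helper, not part of Pharo-Tree-Sitter, and it assumes ASCII source so that column numbers can be treated as character positions.

```python
def node_text(source: str, start: tuple, end: tuple) -> str:
    """Return the text between two (row, column) points, end column excluded."""
    lines = source.splitlines(keepends=True)
    (start_row, start_col), (end_row, end_col) = start, end
    if start_row == end_row:
        return lines[start_row][start_col:end_col]
    parts = [lines[start_row][start_col:]]       # rest of the first row
    parts.extend(lines[start_row + 1:end_row])   # full rows in between
    if end_row < len(lines):                     # the end point may sit just past the last row
        parts.append(lines[end_row][:end_col])
    return "".join(parts)


source = "# this is a comment\nmy $var = 5;\n"
comment = node_text(source, (0, 0), (0, 19))
```

With the points reported above for the comment node (row 0, columns 0 to 19), `comment` holds the text of the first line without its line break.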
That’s it for today.
In a following post we will look at doing something with this AST using the Visitor design pattern.