To work with Moose, there is one prerequisite we cannot avoid: we need a model to analyze. This can be achieved in two main ways:
Importing an existing JSON/MSE file containing a model
Importing a model via a Moose importer such as the Pharo importer or Python importer
While doing this, we create a lot of entities and set a lot of relations, which can take some time. While profiling a JSON import, I found out that it took even longer than I anticipated.
Here is the result of profiling the import of a 330MB JSON file on a MacBook Pro M1 from 2023:
From this profiling we can see that we spend 351 seconds on this import. We can find more information in this report:
On this screenshot we can see some noise due to the fact that the profiler was not adapted to the new event listening loop of Pharo. But in the leaves we can also see that most of the time is spent in FMSlotMultivaluedLink>>#indexOf:startingAt:ifAbsent:.
This method is used by the mechanism behind all instance variables that are FMMany, because we do not want duplicated elements in them. Thus, we check whether the collection already contains the element before adding it.
But during the import of a JSON file, we should have no duplicates, which makes this check useless. This also explains why we spend so much time in this method: we are always in the worst-case scenario, where no element matches.
In order to optimize the creation of a model when we know we will not create any duplicates, we can disable the check.
For this, we can use a dynamic variable declaring that we should check for duplicated elements by default, but allowing to disable the check during the execution of some code.
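To make the cost of the check concrete, here is a minimal sketch of the idea. It is written in Python rather than Pharo so it can be run outside a Moose image, and it is not the Fame implementation: `MultivaluedLink`, `check_duplicates` and `without_duplicate_check` are hypothetical names, and a context variable stands in for the dynamic variable.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical stand-in for the dynamic variable: the check is enabled by default.
check_duplicates = ContextVar("check_duplicates", default=True)


@contextmanager
def without_duplicate_check():
    """Disable the duplicate check only while the wrapped code runs."""
    token = check_duplicates.set(False)
    try:
        yield
    finally:
        check_duplicates.reset(token)


class MultivaluedLink:
    """A collection that refuses duplicates unless the check is disabled."""

    def __init__(self):
        self.items = []
        self.comparisons = 0  # counts the work done by the duplicate check

    def add(self, element):
        if check_duplicates.get():
            # Linear scan: when there is no duplicate, every item is compared.
            for existing in self.items:
                self.comparisons += 1
                if existing == element:
                    return
        self.items.append(element)


def fill(count):
    """Add `count` distinct elements, as an import without duplicates would."""
    link = MultivaluedLink()
    for _ in range(count):
        link.add(object())
    return link


checked = fill(1000)
with without_duplicate_check():
    unchecked = fill(1000)
```

With the check enabled, adding n distinct elements costs n(n-1)/2 comparisons, which is why the worst case dominates the profile. With the check disabled, the same additions cost none.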
Now let’s try to import the same JSON file with the optimization enabled:
We can see that the import time went from 351sec to 113sec!
We can also notice that there is no longer a single bottleneck in our parsing. This means that it will be harder to optimize this task further (even if some people still have ideas on how to do that).
This optimization has been made for the import of JSON but it can be used in other contexts.
For example, in the Moose Python importer, the implementation is sure to never produce a duplicate. Thus, we could use the same trick this way:
FamixPythonImporter >> import
FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value: false during: [ super import ]
When developing algorithms on top of the Moose platform, we can easily hit a wall during testing.
To do functional (and sometimes unit) testing, we need to work on a Moose model. Most of the time we are getting this model in two ways:
We produce a model and save the .json to recreate this model in the tests
We create a model by hand
But those 2 solutions have drawbacks:
Keeping a JSON will not follow the evolutions of Famix, and the model produced will not be representative of the latest version of Famix
Creating a model by hand carries the risk that this model will not be representative of what we could manipulate in reality. For example, we might not think about setting the stubs or the source anchors
In order to avoid those drawbacks, I will describe my way of managing such testing cases in this article. To do this, I will explain how I set up the tests of a project that builds call graphs of Java projects.
The idea I had for testing call graphs is to implement real Java projects in a resources folder in the git repository of the project. Then, we can parse them when launching the tests and manipulate the produced model. This ensures that we always have a model up to date with the latest version of Famix. If tests break, this means that our Famix model evolved and that our project does not work anymore for this language.
Now that we have the dependency running, we can use this project. We will explain the minimal steps here but you can find the full documentation here.
The usage of GitBridge begins with the definition of our FamixCallGraphBridge:
GitBridge << #FamixCallGraphBridge
    slots: {};
    package: 'Famix-CallGraph-Tests'
Now that this class exists we can access our git folder using FamixCallGraphBridge current root.
Let’s add some syntactic sugar:
FamixCallGraphBridge class >> #resources
    ^ self root / 'resources'
FamixCallGraphBridge class >> #sources
    ^ self resources / 'sources'
We can now access our java projects doing FamixCallGraphBridge current sources.
This step is almost done, but in order for our tests to work in a GitHub action (for example), we need two little tweaks.
In our smalltalk.ston file, we need to register our project in Iceberg (because GitBridge uses Iceberg to access the root folder).
SmalltalkCISpec {
#loading : [
SCIMetacelloLoadSpec {
#baseline : 'FamixCallGraph',
#directory : 'src',
#registerInIceberg : true"<== This line"
}
]
}
Also, in our GitHub action, we need to be sure that the checkout action fetches enough information for GitBridge to run, and not the minimal amount (which is the default), by adding a fetch-depth: option.
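As an illustration, the checkout step could look like the fragment below. The action version and the value are assumptions: `fetch-depth: 0` asks `actions/checkout` for the complete history, which is more than the default single commit, and a smaller depth may be enough for your project.

```yaml
steps:
  - uses: actions/checkout@v4
    with:
      # Assumption: fetch the whole history instead of only the last commit
      fetch-depth: 0
```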
I am using this technique to test multiple projects such as parsers or call graph builders. In those projects I do touch my model and the setup can take time. So I optimize this setup in order to build the model only once for all the test cases, using a TestResource.
In order to do this, we can remove the slots we added to FamixAbstractJavaCallGraphBuilderTestCase and create a test resource that will hold them.
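The build-once idea behind a test resource can be sketched independently of SUnit. The Python below is only an illustration of the pattern, not the TestResource API: `ModelResource` and its placeholder model are hypothetical.

```python
class ModelResource:
    """Builds an expensive model at most once and shares it between tests."""

    _model = None
    build_count = 0  # lets us observe how many times the model was built

    @classmethod
    def current(cls):
        """Return the shared model, building it on first access only."""
        if cls._model is None:
            cls._model = cls._build_model()
        return cls._model

    @classmethod
    def _build_model(cls):
        cls.build_count += 1
        # Placeholder for parsing the Java sources of the resources folder.
        return {"classes": ["Main", "Helper"]}

    @classmethod
    def reset(cls):
        """Drop the shared model so that the next access rebuilds it."""
        cls._model = None


first = ModelResource.current()
second = ModelResource.current()
```

Every test that asks for the model gets the same instance, so the parsing cost is paid once for the whole suite.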
It is possible to do the same thing for languages other than Java, but maybe not exactly in the same way as in this blog post for the section “Parse and import your model”. But this article is meant to be an inspiration!
I hope this helps improve the robustness of our projects :)
If you’re here, you’re probably interested in creating a new FAST metamodel and expanding Moose to represent the AST (Abstract Syntax Tree) of an additional language.
In this post, we explain to you how to generate a “First version” of a new FAST-Language metamodel using the project Pharo-Tree-Sitter.
To be able to understand that, we assume you are already familiar with:
Tree-Sitter
Pharo-Tree-Sitter
FAST
Metamodel generators
Tree-Sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. It is able to parse a large variety of programming languages such as Java, C++, C#, Python and many others.
Pharo-Tree-Sitter is a project developed in Pharo that integrates the original Tree-Sitter parsers and allows visualizing their results (such as ASTs) directly in Pharo. It relies on the FFI protocol, which requires the corresponding libraries depending on the OS (.dll, .so, or .dylib) to be present in Pharo’s VM folders.
The project supports parsing several languages, and for some of them (like Python, TypeScript, and C), the library generation is automated. You can find more details in the repository’s README.
This is the project that we will use to generate a new FAST-Language metamodel, so you need to download it into your Pharo image.
FAST means Famix AST. Contrary to Famix, which represents applications at a high abstraction level, FAST uses a low-level representation: the AST.
FAST defines a set of traits that can be used to create new meta-models compatible with Moose tools.
When developing a new FAST-Language metamodel, you will rely on these FAST traits to structure your metamodel. However, this does not apply to the “First version” described in this post, but rather to the upgraded versions when you evolve and refine it.
Metamodel generator is a Pharo library used to create new metamodels such as FAST-Java, Famix-Java, or FAST-Fortran.
The generation of any new version of a FAST-Language metamodel can only be achieved through the metamodel generator.
As you will see in this post, Pharo-Tree-Sitter enables you to define a new metamodel generator. Once executed, it produces the corresponding FAST-Language metamodel. We will explain this process in more detail in the following sections.
Download Pharo-Tree-Sitter and get the corresponding libraries
Once downloaded, you need to make sure that Pharo-Tree-Sitter is able to parse the language that you intend to create the metamodel for.
If it is not included, you need to follow the instructions in the readme file of this repository and add the new language.
For this blog post we will assume that the language is already supported and we will continue with “Python” 🐍🐍🐍.
To be able to continue, and if this is the first time you’re using this project (Pharo-Tree-Sitter), you need to launch the tests of python in package “TreeSitter-Tests” class “TSParserPythonTest”.
This is needed to launch the process of downloading the original tree-sitter and tree-sitter-python projects from GitHub, generating the corresponding libraries and moving them to the VM folder that matches the version of the image you created: for example Moose 12.
If you create another image of another version, you need to launch the tests again to make sure the libraries are moved to the corresponding folder as well.
Now that you have the libraries, you can parse Python code and get an AST, but not a FAST-Python model.
So in the next step we explain how this can be possible.
Create the first version of the metamodel (FAST-Python in our example)
This package contains two main classes: “TSFASTBuilder” and “TSFASTImporter”.
For our task we will rely on the first one.
The second is used to make the transition between an AST generated by TreeSitter and a FAST-Language model.
“TSFASTBuilder” contains a set of methods responsible for generating a new metamodel generator:
#tsLanguage: is used to set an instance of TSLanguage, which is TSLanguage python in our case.
#createMetamodelGeneratorClass is responsible for creating a new package and a class inside. By default, the class name will be “FASTLanguageNameMetamodelGenerator” which is “FASTPythonMetamodelGenerator” and the package name is “FAST-LanguageName-Model-Generator”.
This method also calls another one, “typesToReify”, which gets all the symbols from the initial TreeSitter project (using an FFI call) and adds them as slots in the class definition. These symbols represent the nodes of the language in question, like “class” for Python.
#addPrefixMethodIn: adds #prefix method on the class side of the metamodel generator class. By default it is FASTLanguage.
#addPackageNameMethodIn: adds #packageName method on the class side of the metamodel generator class. By default it’s ‘FAST-Language-Model’.
#addSubmetamodelsMethodIn: adds #submetamodels method on the class side of the metamodel generator class, and by default it contains FASTMetamodelGenerator.
#addDefineClassIn: adds #defineClasses method. In this method slots are defined, starting with #entity, then all the symbols imported from TreeSitter.
#addDefineTraitsIn: adds #defineTraits method. By default FASTTEntity trait is created.
#addDefineHierarchyIn: adds #defineHierarchy method. By default only #entity relation is defined with FASTTEntity.
#addDefineRelationsIn: adds #defineRelations method. By default only #entity relations are defined with genericChildren and genericParent.
Voilà, now that you understand how it works, we will show you how to generate one for Python:
tsb := TSFASTBuilder new.
tsb languageName: 'Python'.
tsb tsLanguage: TSLanguage python.
tsb build.
This will generate the metamodel generator. Now that the generator is created you can use it to generate the metamodel:
FASTPythonMetamodelGenerator new generate.
Now you can access the packages and classes created: ‘FAST-Python-Model’ and ‘FAST-Python-Model-Generator’.
From now on you have to handle the metamodel manually. You have to add missing traits (including FAST Traits), properties that should be imported from TreeSitter… You benefit from the importer to handle the parsing on the metamodel side. You can create a package for tools having a #parse method doing this for example:
N.B: We recommend you to parse many python examples (you can find a lot in the main project of TreeSitter-Python), using Pharo-Tree-Sitter project. Once parsed you can inspect in Pharo the properties for each node using #collectFieldNameOfNamedChild and find the properties for each one. Then you can add them in #defineRelations of the metamodel.
The backend I analyzed follows a common pattern.
In the git repository, there is a folder api containing the microservices, and a folder lib with resources for each microservice.
There is also an additional project called lib-common.
Thus, the microservice home is composed of a project named api-home and a project named lib-home.
src
  api
    api-home
      src/
      …
  lib
    lib-home
      src/
      …
    lib-common/
    …
We wanted to check that dependencies were correctly implemented in the project:
no api project should directly depend on another api (API calls are allowed, but not classic Java dependencies)
each api project can depend on its equivalent lib project
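These two rules can also be written as a small check over a list of project-to-project dependencies. The sketch below is in Python and independent of Moose. The project names other than api-home and lib-home are hypothetical, and allowing every api project to use lib-common is an assumption, exposed as a parameter.

```python
def split_project(project):
    """Split 'api-home' into ('api', 'home') and 'lib-common' into ('lib', 'common')."""
    kind, _, name = project.partition("-")
    return kind, name


def violations(dependencies, shared_libs=frozenset({"common"})):
    """Return the (source, target) pairs that break the dependency rules.

    Assumption: libraries listed in `shared_libs` may be used by any api project.
    """
    broken = []
    for source, target in dependencies:
        source_kind, source_name = split_project(source)
        target_kind, target_name = split_project(target)
        if source_kind != "api" or source == target:
            continue  # the rules only constrain what api projects depend on
        if target_kind == "api":
            broken.append((source, target))  # api -> another api
        elif target_kind == "lib" and target_name not in shared_libs | {source_name}:
            broken.append((source, target))  # api -> lib of another microservice
    return broken


dependencies = [
    ("api-home", "lib-home"),    # allowed: its own lib
    ("api-home", "api-cart"),    # forbidden: direct dependency on another api
    ("api-cart", "lib-home"),    # forbidden: lib of another microservice
    ("api-cart", "lib-common"),  # allowed here only by the shared_libs assumption
]
report = violations(dependencies)
```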
Moose provides ready-to-use visualizations to represent dependencies. In my case, I chose to use the Architectural map.
This visualization presents the entities of the model (packages, classes, methods) as a tree and displays the associations between them (i.e., the dependencies).
I first asked this visualization to display all the classes. It works, but does not allow us to distinguish the different microservices.
The main problem is that too much information is displayed and we cannot see the microservices.
To fix this, I used Moose’s tag feature.
A tag allows you to associate a color and a name to an entity.
So I tagged the classes of my system depending on their location in the repository.
To do this, in a Moose Playground, I used the following script (adapt it to your context 😉):
model allTaggedEntities do: [ :entity | entity removeTags ].
"Assumption: the two lines opening the loop were missing from the script and are reconstructed from the variables it uses"
(model allWithSubTypesOf: FamixJavaType) do: [ :class |
    class sourceAnchor ifNotNil: [ :sa |
        (sa fileName beginsWith: './services/api-A') ifTrue: [ class tagWithName: 'A' ].
        (sa fileName beginsWith: './services/api-B') ifTrue: [ class tagWithName: 'B' ].
        (sa fileName beginsWith: './services/api-C') ifTrue: [ class tagWithName: 'C' ].
        (sa fileName beginsWith: './libraries/lib-A') ifTrue: [ class tagWithName: 'lib-A' ].
        (sa fileName beginsWith: './libraries/lib-common') ifTrue: [ class tagWithName: 'lib-common' ].
        (sa fileName beginsWith: './libraries/lib-B') ifTrue: [ class tagWithName: 'lib-B' ].
        (sa fileName beginsWith: './libraries/lib-C') ifTrue: [ class tagWithName: 'lib-C' ].
    ]
].
(model allWithSubTypesOf: FamixJavaType) reject: [ :type | type tags isEmpty ]
The result is not perfect yet because entities are not grouped by tag.
To fix this, simply select the tag to add option in the architectural map settings.
You then get a clear visualization of the links between the microservice projects and the libraries they use. We see that no api is linked to an incorrect lib project.
We also notice that microservice B is linked to lib-B as well as lib-common.
Maybe this link to lib-common should be removed? But that’s another story…
Analyzing source code starts with parsing and for this you need semantic understanding of how symbols in the code relate to each other.
In this post, we’ll walk through how to build a C code importer using the TreeSitterFamixIntegration framework.
The TreeSitterFamixIntegration stack provides tools to ease the development of Famix importers using tree-sitter.
This package offers some great features for parsing such as (but not limited to):
Useful methods for source management (getting source text, positions, setting sourceAnchor of a famix entity).
Error handling to help catch and report parsing issues
a better TreeSitter node inspector (which is very helpful when debugging)
Utility to efficiently import and attach single-line and multi-line comments to their corresponding entities.
Context tracking for symbol scope (no more context push and pop 😁)
There is detailed documentation you can check that explains every feature.
First, we need to load the C metamodel. This metamodel provides the Famix classes that represent C entities such as functions, structs, variables, etc.
The FamixCImporter class is the entry point for our importer. It will handle the parsing of C files into Abstract Syntax Trees (AST).
This class will inherit from FamixTSAbstractImporter (defined in the TreeSitterFamixIntegration project), which provides the necessary methods for importing and parsing C files using Tree-sitter.
FamixTSAbstractImporter << #FamixCImporter
    slots: {};
    package: 'Famix-C-Importer'
Now, let’s override some methods to set up our importer:
"Should return a TreeSitter language such as TSLanguage python"
^ TSLanguage cLang
This method returns the Tree-sitter language we want to use for parsing. In this case, we are using the C language. You can find the available languages in the Pharo-Tree-Sitter package.
This method calls importFile: on all C files recursively found in a directory.
We will add more logic to this method later but for now, it serves as a starting point for our importer.
The isCFile: method checks if the file has a .c or .h extension.
FamixCImporter >> isCFile: aFileReference
    ^ #( 'c' 'h' ) includes: aFileReference extension
The importFile: method is defined in the FamixTSAbstractImporter class (provided by the TreeSitter-Famix-Integration project).
It parses the file content to create an AST and then passes the visitor (the FamixCVisitor that we previously defined) to walk through the AST.
The FamixCVisitor class is responsible for walking through the parsed AST and creating Famix entities. It will inherit from FamixTSAbstractVisitor, which provides the necessary methods for visiting Tree-sitter nodes.
FamixTSAbstractVisitor << #FamixCVisitor
    slots: {};
    package: 'Famix-C-Importer'
For this class, we will just need to override one method:
It returns the Famix metamodel class that will be used to create Famix entities. In this case, we are using FamixCModel which is in the Famix-Cpp package.
Now that we have our importer and visitor classes set up, we can already test it.
To test our importer, we can create a simple C file and import it using the FamixCImporter class.
test.c
#include <stdio.h>
int aGlobalVar = 1;
int main() {
    int aLocalVar;
    aLocalVar = aGlobalVar + 2;
}
To import this file, we can use the following code in the Playground (cmd + O + P to open it):
Before running the above code, open the Transcript to see the logs (cmd + O + T to open it).
Then select all the code and run it by inspecting it (cmd + I or click the “Inspect” button). You will get something similar to this.
The above screenshot shows what is inside our model. We can see that there is pretty much nothing there yet apart from the SourceLanguages which is added by default by TreeSitterFamixIntegration.
Now if we look at the Transcript, we can see that the importer has imported the file but we didn’t implement the visitor methods yet for every node in the AST, so no Famix entities were created.
If you want to inspect the corresponding AST of our test file, you can do something similar to what is in this other blog post on tree-sitter.
Let’s go back to our FamixCImporter class and from there we will create a CompilationUnit and HeaderFile entities. We need to do that there because we have to check if the file is a header file or a source file.
visitor model newCompilationUnitNamed: aFileReference basename.
]
ifFalse: [
visitor model newHeaderFileNamed: aFileReference basename.
].
visitor
useCurrentEntity: fileEntity
during: [ self importFile: aFileReference ] ]
ifFalse: [
aFileReference children do: [ :each|
self importFileReference: each
].
^self ]
We use useCurrentEntity:during: to provide a context for the visitor. This is the same as pushing the fileEntity to a context, visiting the children and then popping it from the context. While the block runs, the current entity is the fileEntity.
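The push, visit, pop behaviour described above can be sketched with a context manager. This is an illustration in Python of the idea only, not the framework's implementation, and `Visitor`, `use_current_entity` and `current_entity` are hypothetical names.

```python
from contextlib import contextmanager


class Visitor:
    """Keeps track of the entity currently being visited."""

    def __init__(self):
        self._context = []

    @property
    def current_entity(self):
        """The innermost entity in the context, or None outside any context."""
        return self._context[-1] if self._context else None

    @contextmanager
    def use_current_entity(self, entity):
        """Push `entity`, run the wrapped code, then pop it even on error."""
        self._context.append(entity)
        try:
            yield entity
        finally:
            self._context.pop()


visitor = Visitor()
with visitor.use_current_entity("test.c"):
    with visitor.use_current_entity("main"):
        inner = visitor.current_entity  # "main" while visiting the function
    outer = visitor.current_entity      # back to "test.c" once the function is done
```

Because the pop happens in one place, a visit method cannot forget to restore the previous entity.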
Now try importing a whole directory containing C files. You should see that the importer creates a FamixCHeaderFile for each header file and a FamixCCompilationUnit for each source file.
To set the source anchor for any Famix entity, we can use the setSourceAnchor: aFamixEntity from: aTSNode method provided by the FamixTSAbstractVisitor class. This method takes a Famix entity and a Tree-sitter node.
We can use it to set the source anchor for our fileEntity. Go to visitTranslationUnit: in the FamixCVisitor class and add the following code:
Next, we will create FamixCFunction entities for each function declaration in the C file. We will do this in the visitFunctionDefinition: method of the FamixCVisitor class.
But first we need to know where the function name is located to create the FamixCFunction entity. Create the method and put a halt there to inspect the node.
visitFunctionDefinition: aNode
self halt.
self visitChildren: aNode.
If we look at the function definition node, we can see that the function name is in the identifier node, which is a child of the function declarator node.
To get that name, there are two ways:
visit the function_declarator until the identifier returns its name using self visit: aNode
get it by child field name using aNode _fieldName that returns the child node with the given field name. And you don’t need to implement the _fieldName method because it is already handled by the framework.
For simplicity, and to show other available features in the framework, we will use the second way.
Let’s inspect the function definition node to see what fields it has.
So if we do aNode _declarator, it will return the function declarator node.
And if we do aNode _declarator from the function_declarator it will give us the identifier that we want.
Now we can create the function entity and set its name and source anchor.
The self currentEntity returns the compilation unit entity which is the parent of the function entity.
And before visiting the children, we set the current entity to the newly created function entity using useCurrentEntity:during:. This will allow us to create other entities that are related to this function, such as parameters and local variables.
The difference between local and global variables is that local variables are declared inside a function, while global variables are declared outside any function.
To create the variable entities, we will create the visitDeclaration: method in the FamixCVisitor class. This method is called for each variable declaration in the C file.
FamixCVisitor >> visitDeclaration: aNode
    "fields: type - declarator"
    | varName entity |
    self visit: aNode _type.
    varName := self visit: aNode _declarator.
    entity := self currentEntity isFunction
        ifTrue: [
            (model newLocalVariableNamed: varName)
                parentBehaviouralEntity: self currentEntity;
                yourself ]
        ifFalse: [
            (model newGlobalVariableNamed: varName)
                parentScope: self currentEntity;
                yourself ].
    self setSourceAnchor: entity from: aNode.
The visitDeclaration: method does the following:
Visits the variable’s type. This will allow us to parse its type information.
Retrieves the variable name by visiting the declarator field. If the variable is initialized, this will be an init_declarator node; otherwise, it will be an identifier. We should implement visit methods for both cases to extract the name correctly.
FamixCVisitor >> visitInitDeclarator: aNode
    "fields: declarator - value"
    self visit: aNode _value.
    ^ self visit: aNode _declarator "variable name is in the declarator node"

FamixCVisitor >> visitIdentifier: aNode
    ^ aNode sourceText "returns the name of the variable"
Creates a variable entity, either a local variable or a global variable, depending on whether the current entity is a function or not.
Sets the source anchor for the variable entity using the setSourceAnchor:from: method.
In this section, we will implement the symbol resolution for our C importer. This will allow us to resolve references to variables and functions in our C code.
As an example, we will resolve the reference to the local variable aLocalVar in the main function, which will be represented as a famix write access entity.
To create the write access entity, we will implement the visitAssignmentExpression: method in the FamixCVisitor class. This method is called for each assignment expression.
visitAssignmentExpression: aNode
"fields: left - right"
| access leftVarName |
leftVarName := self visit: aNode _left.
access := model newAccess accessor: self currentEntity;
The resolve: aResolvable foundAction: aBlockClosure method is provided by the FamixTSAbstractVisitor class.
It takes two arguments:
aResolvable: an instance of SRIdentifierResolvable. This resolvable is created with the identifier (the variable name) and the expected kinds of entities (in this case, either a local variable or a global variable). The identifier: method sets the identifier to resolve, and the expectedKind: method sets the expected kinds of entities that can be resolved.
aBlockClosure: a block that will be executed when the resolvable is resolved (we found the variable). In this case we set the variable of the access entity to the resolved variable.
The SRIdentifierResolvable is a generic resolver that can be used to resolve identifiers. However, in some cases, we may need to create a custom resolver to handle specific cases. In that case, we can create a class that inherits from SRResolvable and override the resolveInScope:currentEntity: method to implement our custom resolution logic.
For more information about the symbol resolver, you can check the documentation.
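To give an intuition of what resolving an identifier from the current entity means, here is a minimal sketch of a lookup that starts in the innermost scope and walks up to the enclosing ones. It is written in Python and does not reproduce the symbol resolver of the framework. `Entity`, `resolve` and the kind names are hypothetical.

```python
class Entity:
    """A named entity that belongs to a parent scope and may contain children."""

    def __init__(self, name, kind, parent=None):
        self.name = name
        self.kind = kind
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def resolve(identifier, expected_kinds, current_entity):
    """Search the current scope first, then each enclosing scope in turn."""
    scope = current_entity
    while scope is not None:
        for candidate in scope.children:
            if candidate.name == identifier and candidate.kind in expected_kinds:
                return candidate
        scope = scope.parent
    return None  # nothing matched in any enclosing scope


# Entities mirroring test.c
unit = Entity("test.c", "compilation_unit")
global_var = Entity("aGlobalVar", "global_variable", unit)
main = Entity("main", "function", unit)
local_var = Entity("aLocalVar", "local_variable", main)

variable_kinds = {"local_variable", "global_variable"}
found_local = resolve("aLocalVar", variable_kinds, main)
found_global = resolve("aGlobalVar", variable_kinds, main)
```

Starting from main, aLocalVar is found in the function itself, while aGlobalVar is only found after moving up to the compilation unit. The expected kinds prevent the lookup from returning the function main when a variable is wanted.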
The TreeSitterFamixIntegration package provides a utility to parse comments and attach them to the corresponding Famix entities. This is done using the FamixCCommentVisitor class.
To parse comments, we will create the FamixCCommentVisitor class that will inherit from FamixTSAbstractCommentVisitor. And we just need to override the visitNode: method.
We use the addMultilineCommentNode: and addSingleLineCommentNode: methods provided by the FamixTSAbstractCommentVisitor class to add the comment to the model.
For a detailed explanation of how to use the comment visitor, you can check the documentation.
The last thing to do is to use the comment visitor somewhere in our importer. We can do that every time we finish visiting all the children of a translation unit node.
In this blog post, we have seen how to build a Famix importer for C code using the TreeSitterFamixIntegration framework. We have covered the following topics:
Setting up the environment and creating the importer and visitor classes.
Creating Famix entities for compilation units, functions, and variables.
Implementing symbol resolution for local and global variables.
Parsing comments and attaching them to the corresponding Famix entities.
This is just a starting point for building an importer with this stack. You have to implement more tests and methods to handle other entities. The TreeSitterFamixIntegration framework provides a lot of other utilities we didn’t cover to help you with that.