In Java, we can define behavior that is executed exclusively at the initialization of an instance. Until now, our metamodel represented these behaviors as methods. This evolution represents them as Initializers.
We consider as initializers the following elements:
Constructors: they are called when creating a new instance. When a constructor does not start with an explicit this(...) or super(...) call, it implicitly calls the no-argument constructor of the superclass. Likewise, a class that declares no constructor gets an implicit default no-argument constructor. We do not represent implicit constructors or these implicit invocations.
Initialization blocks: blocks that are executed when a new instance is created. They are copied by the Java compiler into each constructor and avoid code duplication. We do not represent this implicit invocation.
In Famix: the <Initializer> method: we create a method to hold all attribute definitions in a type.
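To make these three kinds of initializers concrete, here is a small self-contained Java sketch. The class names (InitializerDemo, Parent, Child) are invented for this illustration. It logs the order in which Java runs the implicit super() call, the attribute definitions (which Famix groups in the <Initializer> method), the initialization block, and the constructor body.

```java
import java.util.ArrayList;
import java.util.List;

public class InitializerDemo {
    // Shared log used to observe the execution order.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        Parent() {
            LOG.add("Parent()");
        }
    }

    static class Child extends Parent {
        // Attribute definition: Famix groups these in the <Initializer> method.
        final int answer = record("attribute answer", 42);

        // Instance initialization block: compiled into every constructor below.
        {
            LOG.add("initialization block");
        }

        Child() {
            // No explicit this(...) or super(...): Java inserts an implicit super() here.
            LOG.add("Child()");
        }

        Child(String label) {
            // Same implicit super() call and same copied initializers.
            LOG.add("Child(String)");
        }

        private static int record(String entry, int value) {
            LOG.add(entry);
            return value;
        }
    }

    public static void main(String[] args) {
        new Child();
        new Child("x");
        LOG.forEach(System.out::println);
    }
}
```

Whichever constructor is used, the superclass constructor runs first, then the attribute definition and the initialization block in textual order, and finally the constructor body. These are the implicit invocations a call graph has to recreate.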
The main motivation for this change is to adapt the metamodel to the needs of building call graphs.
Call graphs must be able to create the implicit invocations described above and to distinguish between the 3 types of initializers.
Another motivation is to differentiate between initializers and actual methods.
In analyses, we often need to focus on methods, and initializers can add noise when treated as actual methods, especially the <Initializer> method.
We introduce FamixJavaInitializer, a subclass of FamixJavaMethod.
An Initializer has 2 properties:
#isInitializationBlock: boolean, false by default.
#isConstructor: boolean, derived. In Java, a constructor is an initializer with the same name as its parent type, with no declared type (or void as declared type).
We no longer merge all initializers as we did before, but we still merge similar initializers, which are always called together.
In a Java model, a type (TWithMethods) can define a maximum of 4 initializers (besides constructors):
An instance initialization block, that is the merge of all instance initialization blocks.
An instance-side <Initializer> method, similar to the one we created before.
A static initialization block, that is the merge of all static initialization blocks (isClassSide is true).
A static <Initializer> method (isClassSide is true) for static attribute definitions.
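As an illustration of these four groups, the following Java sketch (the class names FourInitializersDemo and Sample are hypothetical) declares one static attribute, one instance attribute, two static blocks and two instance blocks, and logs when each one runs. The comments indicate which of the four merged initializers each element would belong to, under the rules described above.

```java
import java.util.ArrayList;
import java.util.List;

public class FourInitializersDemo {
    // Shared log used to observe the execution order.
    static final List<String> LOG = new ArrayList<>();

    static class Sample {
        // Static attribute definition: static <Initializer> method.
        static int counter = log("static attribute", 0);

        // Static blocks: merged into the static initialization block.
        static { LOG.add("static block 1"); }

        // Instance attribute definition: instance-side <Initializer> method.
        int id = log("instance attribute", 1);

        // Instance blocks: merged into the instance initialization block.
        { LOG.add("instance block 1"); }

        static { LOG.add("static block 2"); }

        { LOG.add("instance block 2"); }

        Sample() {
            LOG.add("constructor");
        }

        static int log(String entry, int value) {
            LOG.add(entry);
            return value;
        }
    }

    public static void main(String[] args) {
        new Sample();
        new Sample();
        LOG.forEach(System.out::println);
    }
}
```

The static elements run once, when the class is first initialized, while the instance elements run for every new instance, before the constructor body. Elements of the same group always run together, which is why merging them loses no information for a call graph.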
When inspecting a Java model, initializers can now be found under Initializers and Model initializers.
In order to work with Moose, there is a prerequisite we cannot avoid: we need a model to analyze. This can be achieved in 2 main ways:
Importing an existing JSON/MSE file containing a model
Importing a model via a Moose importer such as the Pharo importer or Python importer
While doing this, we create a lot of entities and set a lot of relations, which can take some time. While profiling a JSON import, I found out that it took even longer than I anticipated.
Here is the result of profiling the import of a 330 MB JSON file on a MacBook Pro M1 from 2023:
From this profiling, we can see that we spend 351 seconds on this import. We can find more information in this report:
On this screenshot we can see some noise due to the fact that the profiler was not adapted to the new event listening loop of Pharo. But in the leaves we can also see that most of the time is spent in FMSlotMultivaluedLink>>#indexOf:startingAt:ifAbsent:.
This method is used by a mechanism shared by all instance variables that are FMMany, because we do not want duplicated elements in them. Thus, we check if the collection contains the element before adding it.
But during the import of a JSON file, we should have no duplicates, which makes this check useless. This also explains why we spend so much time in this method: we are always in the worst-case scenario, where no element matches.
In order to optimize the creation of a model when we know we will not create any duplicates, we can disable the check.
For this, we can use a dynamic variable declaring that we should check for duplicated elements by default, but allowing to disable the check during the execution of some code.
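The mechanism can be illustrated outside of Pharo. The sketch below is written in Java and uses a ThreadLocal flag to emulate a dynamic variable; it is not the Fame implementation, and all names (DuplicateCheckDemo, MultivaluedLink, withoutDuplicateCheck) are hypothetical. It only shows the idea: the check is on by default, costs a linear scan per insertion, and can be switched off for the duration of a block.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateCheckDemo {
    // Emulates a dynamic variable: true by default, overridable while a block runs.
    private static final ThreadLocal<Boolean> CHECK_DUPLICATES =
            ThreadLocal.withInitial(() -> true);

    // Runs the body with the duplicate check disabled, then restores the previous value.
    static void withoutDuplicateCheck(Runnable body) {
        boolean previous = CHECK_DUPLICATES.get();
        CHECK_DUPLICATES.set(false);
        try {
            body.run();
        } finally {
            CHECK_DUPLICATES.set(previous);
        }
    }

    // Simplified stand-in for a multivalued link that refuses duplicates.
    static final class MultivaluedLink<T> {
        private final List<T> values = new ArrayList<>();

        void add(T element) {
            // contains() is a linear scan; the worst case is an absent element,
            // which is exactly what happens on every insertion during an import.
            if (CHECK_DUPLICATES.get() && values.contains(element)) {
                return;
            }
            values.add(element);
        }

        int size() {
            return values.size();
        }
    }

    public static void main(String[] args) {
        MultivaluedLink<String> link = new MultivaluedLink<>();
        link.add("a");
        link.add("a"); // ignored: the check is enabled by default
        withoutDuplicateCheck(() -> link.add("b")); // added without scanning
        System.out.println(link.size());
    }
}
```

Restoring the previous value in a finally block mirrors what a dynamic variable gives for free: the override never leaks outside the block, even if the body fails.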
Now let’s try to import the same JSON file with the optimization enabled:
We can see that the import time went from 351sec to 113sec!
We can also notice that there is no longer a single bottleneck in our parsing. This means that it will be harder to optimize this task further (even if some people still have ideas on how to do that).
This optimization has been made for the import of JSON but it can be used in other contexts.
For example, in the Moose Python importer, the implementation is sure to never produce a duplicate. Thus, we could use the same trick this way:
FamixPythonImporter >> import
FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value: false during: [ super import ]
When developing algorithms on top of the Moose platform, we can easily hit a wall during testing.
To do functional (and sometimes unit) testing, we need to work on a Moose model. Most of the time we are getting this model in two ways:
We produce a model and save the .json to recreate this model in the tests
We create a model by hand
But those 2 solutions have drawbacks:
Keeping a JSON will not follow the evolutions of Famix, and the model produced will not be representative of the latest version of Famix
Creating a model by hand risks producing a model that is not representative of what we would manipulate in reality. For example, we might not think about setting the stubs or the source anchors
In order to avoid those drawbacks I will describe my way of managing such testing cases in this article. In order to do this, I will explain how I set up the tests of a project to build CallGraph of Java projects.
The idea I had for testing call graphs is to implement real Java projects in a resources folder in the git repository of the project. Then, we can parse them when launching the tests and manipulate the produced model. This ensures that we always have a model up to date with the latest version of Famix. If tests break, this means that our Famix model evolved and that our project does not work anymore for this language.
Now that we have the dependency running, we can use this project. We will explain the minimal steps here but you can find the full documentation here.
The usage of GitBridge begins with the definition of our FamixCallGraphBridge:
GitBridge << #FamixCallGraphBridge
	slots: {};
	package: 'Famix-CallGraph-Tests'
Now that this class exists we can access our git folder using FamixCallGraphBridge current root.
Let’s add some syntactic sugar:
FamixCallGraphBridge class>>#resources
	^ self root / 'resources'
FamixCallGraphBridge class>>#sources
	^ self resources / 'sources'
We can now access our Java projects with FamixCallGraphBridge sources (the method is defined on the class side).
This step is almost done, but in order for our tests to work in a GitHub action (for example), we need two little tweaks.
In our smalltalk.ston file, we need to register our project in Iceberg (because GitBridge uses Iceberg to access the root folder).
SmalltalkCISpec {
#loading : [
SCIMetacelloLoadSpec {
#baseline : 'FamixCallGraph',
#directory : 'src',
#registerInIceberg : true"<== This line"
}
]
}
Also, in our GitHub action, we need to make sure that the checkout action fetches enough information for GitBridge to run, and not the minimal amount (which is the default), by adding a fetch-depth: option.
I am using this technique to test multiple projects such as parsers or call graph builders. In those projects I do not modify my model, and the setup can take time. So I optimize this setup in order to build a model only once for all the test cases, using a TestResource.
In order to do this, we can remove the slots we added to FamixAbstractJavaCallGraphBuilderTestCase and create a test resource that will hold them.
It is possible to do the same thing for languages other than Java, but maybe not exactly in the same way as in this blog post for the section “Parse and import your model”. But this article is meant to be an inspiration!
I hope this helps improve the robustness of our projects :)
If you’re here, you’re probably interested in creating a new FAST metamodel and expanding Moose to represent the AST (Abstract Syntax Tree) of an additional language.
In this post, we explain how to generate a “First version” of a new FAST-Language metamodel using the project Pharo-Tree-Sitter.
To follow along, we assume you are already familiar with:
Tree-Sitter
Pharo-Tree-Sitter
FAST
Metamodel generators
Tree-Sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. It is able to parse a large variety of programming languages such as Java, C++, C#, Python and many others.
Pharo-Tree-Sitter is a project developed in Pharo that integrates the original Tree-Sitter parsers and allows visualizing their results (such as ASTs) directly in Pharo. It relies on the FFI protocol, which requires the corresponding libraries depending on the OS (.dll, .so, or .dylib) to be present in Pharo’s VM folders.
The project supports parsing several languages, and for some of them (like Python, TypeScript, and C), the library generation is automated. You can find more details in the repository’s README.
This is the project that we will use to generate a new FAST-Language metamodel, so you need to download it into your Pharo image.
FAST means Famix AST. Contrary to Famix, which represents applications at a high abstraction level, FAST uses a low-level representation: the AST.
FAST defines a set of traits that can be used to create new meta-models compatible with Moose tools.
When developing a new FAST-Language metamodel, you will rely on these FAST traits to structure your metamodel. However, this does not apply to the “First version” described in this post, but rather to the upgraded versions when you evolve and refine it.
Metamodel generator is a Pharo library used to create new metamodels such as FAST-Java, Famix-Java, or FAST-Fortran.
The generation of any new version of a FAST-Language metamodel can only be achieved through the metamodel generator.
As you will see in this post, Pharo-Tree-Sitter enables you to define a new metamodel generator. Once executed, it produces the corresponding FAST-Language metamodel. We will explain this process in more detail in the following sections.
Download Pharo-Tree-Sitter and get the corresponding libraries
Once downloaded, you need to make sure that Pharo-Tree-Sitter is able to parse the language that you intend to create the metamodel for.
If it is not included, you need to follow the instructions in the readme file of this repository and add the new language.
For this blog post we will assume that the language is already supported and we will continue with “Python” 🐍🐍🐍.
To be able to continue, and if this is the first time you’re using this project (Pharo-Tree-Sitter), you need to launch the Python tests in package “TreeSitter-Tests”, class “TSParserPythonTest”.
This is needed to launch the process of downloading the original tree-sitter and tree-sitter-python projects from GitHub, generating the corresponding libraries and moving them to the corresponding VM folder based on the version of the image you created: for example Moose 12.
If you create another image of another version, you need to launch the tests again to make sure the libraries are moved to the corresponding folder again.
Now that you have the libraries, you can parse Python code and get an AST, but not yet a FAST-Python model.
The next step explains how to get one.
Create the first version of the metamodel (FAST-Python in our example)
This package contains two main classes: “TSFASTBuilder” and “TSFASTImporter”.
For our task we will rely on the first one.
The second is used to make the transition between an AST generated by TreeSitter and a FAST-Language model.
“TSFASTBuilder” contains a set of methods responsible for generating a new metamodel generator:
#tsLanguage: is used to set an instance of TSLanguage, which is TSLanguage python in our case.
#createMetamodelGeneratorClass is responsible for creating a new package and a class inside. By default, the class name will be “FASTLanguageNameMetamodelGenerator” which is “FASTPythonMetamodelGenerator” and the package name is “FAST-LanguageName-Model-Generator”.
This method also calls another one, “typesToReify”, which gets all the symbols from the initial TreeSitter project (using an FFI call) and adds them as slots in the class definition. These symbols represent the nodes of the language in question, like “class” for Python.
#addPrefixMethodIn: adds #prefix method on the class side of the metamodel generator class. By default it is FASTLanguage.
#addPackageNameMethodIn: adds #packageName method on the class side of the metamodel generator class. By default it’s ‘FAST-Language-Model’.
#addSubmetamodelsMethodIn: adds #submetamodels method on the class side of the metamodel generator class, and by default it contains FASTMetamodelGenerator.
#addDefineClassIn: adds #defineClasses method. In this method slots are defined, starting with #entity, then all the symbols imported from TreeSitter.
#addDefineTraitsIn: adds #defineTraits method. By default FASTTEntity trait is created.
#addDefineHierarchyIn: adds #defineHierarchy method. By default only #entity relation is defined with FASTTEntity.
#addDefineRelationsIn: adds #defineRelations method. By default only #entity relations are defined with genericChildren and genericParent.
Voilà, now that you understand how it works, we will show you how to generate one for Python:
tsb := TSFASTBuilder new.
tsb languageName: 'Python'.
tsb tsLanguage: TSLanguage python.
tsb build.
This will generate the metamodel generator. Now that the generator is created you can use it to generate the metamodel:
FASTPythonMetamodelGenerator new generate.
Now you can access the packages and classes created: ‘FAST-Python-Model’ and ‘FAST-Python-Model-Generator’.
From now on you have to handle the metamodel manually. You have to add missing traits (including FAST Traits), properties that should be imported from TreeSitter… You benefit from the importer to handle the parsing on the metamodel side. You can create a package for tools having a #parse method doing this for example:
N.B: We recommend parsing many Python examples (you can find a lot in the main project of TreeSitter-Python) using the Pharo-Tree-Sitter project. Once they are parsed, you can inspect in Pharo the properties of each node using #collectFieldNameOfNamedChild. Then you can add them in #defineRelations of the metamodel.
The backend I analyzed follows a common pattern.
In the git repository, there is a folder api containing the microservices, and a folder lib with resources for each microservice.
There is also an additional project called lib-common.
Thus, the microservice home is composed of a project named api-home and a project named lib-home.
src/
	api/
		api-home/
			src/
				…
	lib/
		lib-home/
			src/
				…
		lib-common/
			…
We wanted to check that dependencies were correctly implemented in the project:
no api project should directly depend on another api (API calls are allowed, but not classic Java dependencies)
each api project can depend on its equivalent lib project
Moose provides ready-to-use visualizations to represent dependencies. In my case, I chose to use the Architectural map.
This visualization presents the entities of the model (packages, classes, methods) as a tree and displays the associations between them (i.e., the dependencies).
I first asked this visualization to display all the classes. It works, but does not allow us to distinguish the different microservices.
The main problem is that too much information is displayed and we cannot see the microservices.
To fix this, I used Moose’s tag feature.
A tag allows you to associate a color and a name to an entity.
So I tagged the classes of my system depending on their location in the repository.
To do this, in a Moose Playground, I used the following script (adapt it to your context 😉):
"Start from a clean state: remove the tags set by a previous run."
model allTaggedEntities do: [ :entity | entity removeTags ].
"Tag each type according to the location of its source file in the repository."
(model allWithSubTypesOf: FamixJavaType) do: [ :class |
	class sourceAnchor ifNotNil: [ :sa |
		(sa fileName beginsWith: './services/api-A') ifTrue: [ class tagWithName: 'A' ].
		(sa fileName beginsWith: './services/api-B') ifTrue: [ class tagWithName: 'B' ].
		(sa fileName beginsWith: './services/api-C') ifTrue: [ class tagWithName: 'C' ].
		(sa fileName beginsWith: './libraries/lib-A') ifTrue: [ class tagWithName: 'lib-A' ].
		(sa fileName beginsWith: './libraries/lib-common') ifTrue: [ class tagWithName: 'lib-common' ].
		(sa fileName beginsWith: './libraries/lib-B') ifTrue: [ class tagWithName: 'lib-B' ].
		(sa fileName beginsWith: './libraries/lib-C') ifTrue: [ class tagWithName: 'lib-C' ].
	]
].
"Keep only the types that received a tag."
(model allWithSubTypesOf: FamixJavaType) reject: [ :type | type tags isEmpty ]
The result is not perfect yet because entities are not grouped by tag.
To fix this, simply select the tag to add option in the architectural map settings.
You then get a clear visualization of the links between the microservice projects and the libraries they use. We see that no api is linked to an incorrect lib project.
We also notice that microservice B is linked to lib-B as well as lib-common.
Maybe this link to lib-common should be removed? But that’s another story…