Blog

Speed up models creation: application to JSON/MSE parsing

Nov 18, 2025

Context

In order to be able to work with Moose there is a prerequisite we cannot avoid: we need a model to analyze. This can be achieved in two main ways:

Importing an existing JSON/MSE file containing a model
Importing a model via a Moose importer such as the Pharo importer or Python importer

While doing this, we create a lot of entities and set a lot of relations, which can take some time. While profiling a JSON import, I found out that it took even longer than I anticipated.

Here is the result of the profiling of a 330MB JSON file on a MacBook Pro M1 from 2023:

Image of a profiling

From this profiling we can see that the import takes 351 seconds. We can find more information in this report:

Image of a profiling 2

On this screenshot we can see some noise due to the fact that the profiler was not adapted to the new event listening loop of Pharo. But in the leaves we can also see that most of the time is spent in FMSlotMultivalueLink>>#indexOf:startingAt:ifAbsent:.

This method is used by the mechanism behind all instance variables declared as FMMany, because we do not want duplicated elements in those collections. Thus, we check whether the collection already contains the element before adding it.

But during the import of a JSON file, we should have no duplicates, which makes this check useless. This also explains why we spend so much time in this method: we are always in the worst-case scenario, where no element matches and the whole collection is scanned.
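To get a feeling for the cost, here is a minimal sketch in Python (not Pharo, and not the actual Fame implementation) that counts the comparisons made when adding distinct elements to a list with a linear containment check. The function name `add_all_checked` is hypothetical.

```python
def add_all_checked(elements):
    """Add each element to a list only if absent, counting equality tests."""
    collection = []
    comparisons = 0
    for element in elements:
        found = False
        for existing in collection:  # linear scan, similar to an indexOf: lookup
            comparisons += 1
            if existing == element:
                found = True
                break
        if not found:
            collection.append(element)
    return collection, comparisons


# With no duplicates, every scan walks the whole collection.
_, comparisons = add_all_checked(range(1000))
print(comparisons)  # 0 + 1 + ... + 999 = 499500
```

The number of comparisons grows quadratically with the size of the collection, which is why large multivalued links are so expensive to fill.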

The optimization

In order to optimize the creation of a model when we know we will not create any duplicates, we can disable the check.

For this, we can use a dynamic variable declaring that we should check for duplicated elements by default, but allowing us to disable the check during the execution of some code.

DynamicVariable << #FMShouldCheckForDuplicatedEntitiesInMultivalueLinks
  slots: {};
  tag: 'Utilities';
  package: 'Fame-Core'

FMShouldCheckForDuplicatedEntitiesInMultivalueLinks>>#default
  ^ true

And now that we have the variable, we can use it:

FMSlotMultivalueLink >> unsafeAdd: element
  "Only check for duplicates when the dynamic variable asks for it"
  FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value
    ifTrue: [ (self includes: element) ifFalse: [ self uncheckUnsafeAdd: element ] ]
    ifFalse: [ self uncheckUnsafeAdd: element ]

FMMultivalueLink >> unsafeAdd: element
  "Only check for duplicates when the dynamic variable asks for it"
  FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value
    ifTrue: [ (self includes: element) ifFalse: [ self uncheckUnsafeAdd: element ] ]
    ifFalse: [ self uncheckUnsafeAdd: element ]

And the last step is to disable the check during the MSE/JSON parsing:

FMMSEParser >> basicRun
  "Disable the duplicate check for the whole parsing"
  FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value: false during: [
      self Document.
      self atEnd ifFalse: [ ^ self syntaxError ] ]
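For readers less familiar with dynamic variables, the same pattern can be sketched in Python with `contextvars`. This is only an analogy under stated assumptions: `check_duplicates`, `value_during` and `unsafe_add` are hypothetical names, not part of Fame.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical stand-in for the Pharo dynamic variable; the default is True.
check_duplicates = ContextVar("check_duplicates", default=True)


@contextmanager
def value_during(variable, value):
    """Set `variable` to `value` for the duration of the block, then restore it."""
    token = variable.set(value)
    try:
        yield
    finally:
        variable.reset(token)


def unsafe_add(collection, element):
    """Append `element`, skipping the containment check when it is disabled."""
    if check_duplicates.get() and element in collection:
        return
    collection.append(element)


links = []
unsafe_add(links, "a")
unsafe_add(links, "a")          # ignored: the check is enabled by default
with value_during(check_duplicates, False):
    unsafe_add(links, "a")      # added: the check is disabled in this block
print(links)                    # ['a', 'a']
print(check_duplicates.get())   # True again after the block
```

The key property is that the default behavior stays safe, and only the code that knows it cannot produce duplicates opts out, for a limited scope.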

Result of the optimization

Now let’s try to import the same JSON file with the optimization enabled:

Image of a profiling

Image of a profiling 2

We can see that the import time went from 351 seconds to 113 seconds, roughly 3 times faster!

We can also notice that there is no longer a single bottleneck in our parsing. This means that it will be harder to optimize this task further (even if some people still have ideas on how to do that).

Use this optimization in your project

This optimization has been made for the import of JSON but it can be used in other contexts. For example, in the Moose Python importer, the implementation is sure to never produce a duplicate. Thus, we could use the same trick this way:

FamixPythonImporter >> import
  FMShouldCheckForDuplicatedEntitiesInMultivalueLinks value: false during: [ super import ]

Testing your algo on a java project

Oct 8, 2025

Cyril Ferlicot-Delbecque

Research engineer at Inria

When developing algorithms on top of the Moose platform, we can easily hit a wall during testing.

To do functional (and sometimes unit) testing, we need to work on a Moose model. Most of the time we are getting this model in two ways:

We produce a model and save the .json to recreate this model in the tests
We create a model by hand

But those 2 solutions have drawbacks:

A stored JSON file does not follow the evolution of Famix, so the model produced will not be representative of the latest version of Famix
Creating a model by hand carries the risk that the model will not be representative of what we would manipulate in reality. For example, we might forget to set the stubs or the source anchors

In order to avoid those drawbacks, I will describe in this article my way of managing such test cases. To do so, I will explain how I set up the tests of a project that builds call graphs of Java projects.

The idea

The idea I had for testing call graphs is to implement real Java projects in a resources folder in the git repository of the project. Then, we can parse them when launching the tests and manipulate the produced model. This ensures that we always have a model up to date with the latest version of Famix. If tests break, this means that our Famix model evolved and that our project does not work anymore for this language.

Basic setup

Create your java code

The first step to build tests is to write some example java code.

I will start with a minimal example:

public class Main {
    public static void main(String[] args) {

        System.out.println("Hello World!");
    }
}

I’ll save this file in the git repository of my project under Famix-CallGraph/resources/sources/example1/Main.java.

Now that we have the source code, we need a way to access it in our project.

Use GitBridge

In order to access our resources, we will use GitBridge.

You can install it by executing:

Metacello new
  githubUser: 'jecisc' project: 'GitBridge' commitish: 'v1.x.x' path: 'src';
  baseline: 'GitBridge';
  load

But we should add it to our baseline:

BaselineOfFamixCallGraph >> #gitBridge: spec

  spec baseline: 'GitBridge' with: [ spec repository: 'github://jecisc/GitBridge:v1.x.x/src' ]

BaselineOfFamixCallGraph >> #baseline: spec

  <baseline>
  spec for: #common do: [
    "Dependencies"
    self gitBridge: spec.

    "Packages"
    spec
      package: 'Famix-CallGraph';
      package: 'Famix-CallGraph-Tests' with: [ spec requires: #( 'Famix-CallGraph' 'GitBridge' ) ]. "<== WE ADD GITBRIDGE HERE!"
     ].

  spec for: #NeedsFamix do: [
    self famix: spec.

    spec package: 'Famix-CallGraph' with: [ spec requires: #( Famix ) ] ]

Now that we have the dependency running, we can use this project. We will explain the minimal steps here but you can find the full documentation here.

The usage of GitBridge begins with the definition of our FamixCallGraphBridge:

GitBridge << #FamixCallGraphBridge
  slots: {};
  package: 'Famix-CallGraph-Tests'

Now that this class exists we can access our git folder using FamixCallGraphBridge current root.

Let’s add some syntactic sugar:

FamixCallGraphBridge class >> #resources

  ^ self root / 'resources'

FamixCallGraphBridge class >> #sources

  ^ self resources / 'sources'

We can now access our Java projects with FamixCallGraphBridge sources.

This step is almost done, but in order for our tests to work in a GitHub action (for example), we need two little tweaks.

In our smalltalk.ston file, we need to register our project in Iceberg (because GitBridge uses Iceberg to access the root folder).

SmalltalkCISpec {
  #loading : [
    SCIMetacelloLoadSpec {
      #baseline : 'FamixCallGraph',
      #directory : 'src',
      #registerInIceberg : true   "<== This line"
    }
  ]
}

Also, in our GitHub action we need to make sure that the checkout action fetches enough history for GitBridge to run, rather than the minimal amount (which is the default), by adding a fetch-depth: option.

steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: '0'

Parse and import your model

Now we need to be able to parse our project. For this, we will use a Java utility that is directly in Moose: FamixJavaFoldersImporter.

We can parse and receive a model doing:

model := (FamixJavaFoldersImporter importFolders: { FamixCallGraphBridge sources / 'example1' }) anyOne.

Tests implementation

Now that we can access the model it is possible to implement our tests.

I start with an abstract class:

TestCase << #FamixAbstractJavaCallGraphBuilderTestCase
  slots: { #model . #graph };
  package: 'Famix-CallGraph-Tests'

Now I will create a TestCase that needs my Java model:

FamixAbstractJavaCallGraphBuilderTestCase << #FamixJavaCHAExample1Test
  slots: {};
  package: 'Famix-CallGraph-Tests'

And now I will create a setup importing the model and creating a call graph:

FamixAbstractJavaCallGraphBuilderTestCase >> #setUp

  super setUp.
  model := (FamixJavaFoldersImporter importFolders: { self javaSourcesFolder }) anyOne.
  graph := (FamixJavaCHABuilder entryPoints: self entryPoints) build

FamixJavaCHAExample1Test >> #javaSourcesFolder
  "Return the java folder containing the sources to parse for those tests"

  | folder |
  folder := FamixCallGraphBridge sources / 'example1'.

  folder ifAbsent: [ self error: 'Folder does not exist ' , folder pathString ].

  ^ folder

And now you have your model available for the testing!

Optimization

I am using this technique to test multiple projects such as parsers or call graph builders. In those projects I do not touch my model and the setup can take time. So I optimize this setup in order to build the model only once for all the test cases, using a TestResource.

In order to do this we can remove the slots we added to FamixAbstractJavaCallGraphBuilderTestCase and create a test resource that will hold them:

TestResource << #FamixAbstractJavaCallGraphBuilderTestResource
  slots: { #model . #graph };
  package: 'Famix-CallGraph-Tests'

Then we can move the setup to this class:

FamixAbstractJavaCallGraphBuilderTestResource >> #setUp

  super setUp.
  model := (FamixJavaFoldersImporter importFolders: { self javaSourcesFolder }) anyOne.
  graph := (FamixJavaCHABuilder entryPoints: self entryPoints) build

Personally, I also add a tearDown that cleans the variables, because TestResources are singletons and I do not want to hold a model in memory all the time.

Then I create my test resource for the example1 project.

FamixAbstractJavaCallGraphBuilderTestResource << #FamixJavaCHAExample1Resource
  slots: {};
  package: 'Famix-CallGraph-Tests'

FamixJavaCHAExample1Resource >> #javaSourcesFolder
  "Return the java folder containing the sources to parse for those tests"

  | folder |
  folder := FamixCallGraphBridge sources / 'example1'.

  folder ifAbsent: [ self error: 'Folder does not exist ' , folder pathString ].

  ^ folder

And now we can declare that the TestCase will use this resource:

FamixJavaCHAExample1Test class >> #resources
    ^ { FamixJavaCHAExample1Resource }

The model then becomes accessible like this:

FamixJavaCHAExample1Test >> #model

  ^ self resources anyOne current model

Simplify your life

Here are a few tricks I use to further simplify the setup of my test cases.

Automatic java source folder detection

The first one is to automate the detection of the Java source folder by using the name of the test resource:

FamixAbstractJavaCallGraphBuilderTestResource >> #javaSourcesFolder
  ^ self class javaSourcesFolder

FamixAbstractJavaCallGraphBuilderTestResource class >> #javaSourcesFolder
  "Return the java folder containing the sources to parse for those tests"

  | folder |
  folder := FamixCallGraphBridge sources / ((self name withoutPrefix: 'FamixJavaCHA') withoutSuffix: 'Resource') uncapitalized.

  folder ifAbsent: [ self error: 'Folder does not exist ' , folder pathString ].

  ^ folder

We can now remove this method from all subclasses! But make sure the name of your source folder matches the name of the test resource ;)
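To make the naming convention explicit, here is a small sketch in Python (rather than Pharo) of the same string manipulation. The function name `source_folder_name` is hypothetical; the prefix and suffix are the ones used in the method above.

```python
def source_folder_name(resource_class_name, prefix="FamixJavaCHA", suffix="Resource"):
    """Derive the source folder name from the name of a test resource class."""
    name = resource_class_name
    if name.startswith(prefix):
        name = name[len(prefix):]
    if name.endswith(suffix):
        name = name[:-len(suffix)]
    # Lower-case only the first letter, like `uncapitalized` in Pharo.
    return name[:1].lower() + name[1:]


print(source_folder_name("FamixJavaCHAExample1Resource"))  # example1
```

So a resource named FamixJavaCHAExample1Resource expects its sources in a folder named example1.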

Automatic test resource detection and access

We can do the same with the detection of the test resource in the test case.

FamixAbstractJavaCallGraphBuilderTestCase class >> #resources

  ^ self environment
      at: ((self name withoutSuffix: 'Test') , 'Resource') asSymbol
      ifPresent: [ :class | { class } ]
      ifAbsent: [ {  } ]

FamixAbstractJavaCallGraphBuilderTestCase class >> #sourceResource

  ^ self resources anyOne current

FamixAbstractJavaCallGraphBuilderTestCase >> #sourceResource
  "I return the instance of the test resource I'm using to build the sources of a java project"

  ^ self class sourceResource

FamixAbstractJavaCallGraphBuilderTestCase >> #model

  ^ self sourceResource model
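The lookup by name can be sketched the same way in Python. Here the Pharo environment is modeled, as an assumption, by a plain dictionary mapping class names to classes, and `resource_classes_for` is a hypothetical name.

```python
def resource_classes_for(test_class_name, environment):
    """Return the resource matching a test class name, or an empty list."""
    base = test_class_name
    if base.endswith("Test"):
        base = base[:-len("Test")]
    resource = environment.get(base + "Resource")
    return [resource] if resource is not None else []


class FamixJavaCHAExample1Resource:
    """Placeholder standing in for the real test resource class."""


environment = {"FamixJavaCHAExample1Resource": FamixJavaCHAExample1Resource}
print(resource_classes_for("FamixJavaCHAExample1Test", environment))
print(resource_classes_for("FamixJavaCHAExample9Test", environment))  # []
```

Returning an empty collection when no resource matches keeps the abstract test case itself loadable, since it has no resource of its own.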

Et voilà! Now adding a test case ready to use on a new Java project only requires creating a test case:

FamixAbstractJavaCallGraphBuilderTestCase << #FamixJavaCHAExample2Test
  slots: {};
  package: 'Famix-CallGraph-Tests'

And the resource associated!

FamixAbstractJavaCallGraphBuilderTestResource << #FamixJavaCHAExample2Resource
  slots: {};
  package: 'Famix-CallGraph-Tests'

Nothing more is needed.

Easily find the sources of the tested project

One last thing I do to simplify things is to implement a method to easily access the sources.

FamixJavaCHAExample1Test >> #openSources

  <script: 'self new openSources'>
  self resources anyOne javaSourcesFolder openInOSFileBrowser

Other languages than Java

It is possible to do the same thing for languages other than Java, but maybe not exactly in the same way as in this blog post for the section “Parse and import your model”. But this article is meant to be an inspiration!

I hope this helps improve the robustness of our projects :)

Generation of new FAST-Language metamodel using Pharo-Tree-Sitter project

Sep 15, 2025

Aless Hosry

Research engineer at Berger-Levrault

If you’re here, you’re probably interested in creating a new FAST metamodel and expanding Moose to represent the AST (Abstract Syntax Tree) of an additional language. In this post, we explain to you how to generate a “First version” of a new FAST-Language metamodel using the project Pharo-Tree-Sitter. To be able to understand that, we assume you are already familiar with:

Tree-Sitter
Pharo-Tree-Sitter
FAST
Metamodel generators

Tree-Sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. It is able to parse a large variety of programming languages such as Java, C++, C#, Python and many others.
Pharo-Tree-Sitter is a project developed in Pharo that integrates the original Tree-Sitter parsers and allows visualizing their results (such as ASTs) directly in Pharo. It relies on the FFI protocol, which requires the corresponding libraries depending on the OS (.dll, .so, or .dylib) to be present in Pharo’s VM folders. The project supports parsing several languages, and for some of them (like Python, TypeScript, and C), the library generation is automated. You can find more details in the repository’s README. This is the project that we will use to generate a new FAST-Language metamodel, so you need to download it into your Pharo image.
FAST means Famix AST. Contrary to Famix, which represents applications at a high abstraction level, FAST uses a low-level representation: the AST. FAST defines a set of traits that can be used to create new metamodels compatible with Moose tools. When developing a new FAST-Language metamodel, you will rely on these FAST traits to structure your metamodel. However, this does not apply to the “First version” described in this post, but rather to the upgraded versions when you evolve and refine it.
Metamodel generator is a Pharo library used to create new metamodels such as FAST-Java, Famix-Java, or FAST-Fortran. The generation of any new version of a FAST-Language metamodel can only be achieved through the metamodel generator. As you will see in this post, Pharo-Tree-Sitter enables you to define a new metamodel generator. Once executed, it produces the corresponding FAST-Language metamodel. We will explain this process in more detail in the following sections.

Download Pharo-Tree-Sitter and get the corresponding libraries

First you need to create a Moose image and download Pharo-Tree-Sitter:

Metacello new
  baseline: 'TreeSitter';
  repository: 'github://Evref-BL/Pharo-Tree-Sitter:main/src';
  load.

Once downloaded, you need to make sure that Pharo-Tree-Sitter is able to parse the language that you intend to create the metamodel for. If it is not included, you need to follow the instructions in the readme file of this repository and add the new language. For this blog post we will assume that the language is already supported and we will continue with “Python” 🐍🐍🐍.

To continue, if this is the first time you are using Pharo-Tree-Sitter, you need to run the Python tests in the class “TSParserPythonTest” of the package “TreeSitter-Tests”. Running them triggers the download of the original tree-sitter and tree-sitter-python projects from GitHub, generates the corresponding libraries and moves them to the VM folder matching the version of the image you created (for example Moose 12). If you create an image of another version, you need to run the tests again so that the libraries are moved to the corresponding folder as well. Now that you have the libraries, you can parse Python code and get an AST, but not yet a FAST-Python model. The next step explains how to get one.

Create the first version of the metamodel (FAST-Python in our example)

Don’t worry, there is not much to do: only a snippet of code needs to be written and executed. But first, we have to explain how it works.

Explaining package TreeSitter-FAST-Utils

This package contains two main classes: “TSFASTBuilder” and “TSFASTImporter”. For our task we will rely on the first one. The second is used to make the transition between an AST generated by TreeSitter and a FAST-Language model.

“TSFASTBuilder” contains a set of methods responsible for generating a new metamodel generator:

#tsLanguage: is used to set an instance of TSLanguage, which is TSLanguage python in our case.
#createMetamodelGeneratorClass is responsible for creating a new package and a class inside. By default, the class name will be “FASTLanguageNameMetamodelGenerator”, which gives “FASTPythonMetamodelGenerator” in our case, and the package name is “FAST-LanguageName-Model-Generator”. This method also calls another one, “typesToReify”, which gets all the symbols from the initial TreeSitter project (using an FFI call) and adds them as slots in the class definition. These symbols represent the nodes of the language in question, like “class” for Python.
#addPrefixMethodIn: adds #prefix method on the class side of the metamodel generator class. By default it is FASTLanguage.
#addPackageNameMethodIn: adds #packageName method on the class side of the metamodel generator class. By default it’s ‘FAST-Language-Model’.
#addSubmetamodelsMethodIn: adds #submetamodels method on the class side of the metamodel generator class, and by default it contains FASTMetamodelGenerator.
#addDefineClassIn: adds #defineClasses method. In this method slots are defined, starting by #entity then all the symbols imported from TreeSitter.
#addDefineTraitsIn: adds #defineTraits method. By default FASTTEntity trait is created.
#addDefineHierarchyIn: adds #defineHierarchy method. By default only #entity relation is defined with FASTTEntity.
#addDefineRelationsIn: adds #defineRelations method. By default only #entity relations are defined with genericChildren and genericParent.

Voilà, now that you understand how it works, we will show you how to generate one for Python:

tsb := TSFASTBuilder new.
tsb languageName: 'Python'.
tsb tsLanguage: TSLanguage python.
tsb build.

This will generate the metamodel generator. Now that the generator is created you can use it to generate the metamodel:

FASTPythonMetamodelGenerator new generate.

Now you can access the packages and classes created: ‘FAST-Python-Model’ and ‘FAST-Python-Model-Generator’.
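As a quick summary of the default naming scheme described above, here is a sketch in Python (not Pharo). The function `default_names` is hypothetical, and the prefix value assumes that “FASTLanguage” stands for “FAST” followed by the language name.

```python
def default_names(language_name):
    """Return the default names derived from a language name."""
    return {
        "generator_class": f"FAST{language_name}MetamodelGenerator",
        "generator_package": f"FAST-{language_name}-Model-Generator",
        "prefix": f"FAST{language_name}",  # assumption, see the note above
        "model_package": f"FAST-{language_name}-Model",
    }


for key, value in default_names("Python").items():
    print(key, "=>", value)
```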

From now on you have to handle the metamodel manually. You have to add missing traits (including FAST Traits), properties that should be imported from TreeSitter… You can rely on the importer to handle the parsing on the metamodel side. For example, you can create a tools package with a #parse method that does the following:

| parser tsLanguage importer |

Smalltalk image garbageCollect.

parser := TSParser new.
tsLanguage := TSLanguage python.
parser language: tsLanguage.

importer := TSFASTImporter new.
importer tsLanguage: tsLanguage.
importer languageName: 'Python'.
importer originString: string.

^ importer import: (parser parseString: string) rootNode "pay attention to #source: "

You can check FASTTypeScript for more details.

N.B.: We recommend parsing many Python examples (you can find a lot in the main project of TreeSitter-Python) using the Pharo-Tree-Sitter project. Once they are parsed, you can inspect the properties of each node in Pharo using #collectFieldNameOfNamedChild. Then you can add them in #defineRelations of the metamodel.

That’s it for now!

Visualizing java dependencies between microservices

Sep 12, 2025

Benoit Verhaeghe

Moose expert

In July, I had to analyze the dependencies between microservices for Berger-Levrault. To do so, I chose to use the Moose tool.

Here is the simple but effective process I followed.

About the project structure

The backend I analyzed follows a common pattern. In the git repository, there is a folder api containing the microservices, and a folder lib with resources for each microservice. There is also an additional project called lib-common.

Thus, the microservice home is composed of a project named api-home and a project named lib-home.

src
- api
  - api-home
    - src/
    - …
- lib
  - lib-home
    - src/
    - …
  - lib-common/
    - …

We wanted to check that dependencies were correctly implemented in the project:

no api project should directly depend on another api (API calls are allowed, but not classic Java dependencies)
each api project can depend on its equivalent lib project
lib projects can depend on lib-common

Let’s see how to perform this check with Moose.

Loading the project

To perform the analysis, I used Moose and followed these steps:

I installed the latest version of Moose.
I cloned the repository containing the backend to analyze.
I installed the project dependencies with:
```
mvn clean install
```

I used VerveineJ to generate a model of the code. To avoid version issues, I used the Docker version of VerveineJ, which gave me a model.json file:

docker run -v /path/to/my/project:/src -v /home/badetitou/.m2/repository:/dependency  ghcr.io/evref-bl/verveinej:v3.3.1 -alllocals -anchor assoc -format json -o model.json

I loaded the model into a Moose 12 image by dragging and dropping the model.json file into the running Moose image.

Building a dependency visualization

Moose provides ready-to-use visualizations to represent dependencies. In my case, I chose to use the Architectural map. This visualization presents the entities of the model (packages, classes, methods) as a tree and displays the associations between them (i.e., the dependencies).

I first asked this visualization to display all the classes. It works, but does not allow us to distinguish the different microservices.

Unhelpful architectural map

The main problem is that too much information is displayed and we cannot see the microservices. To fix this, I used Moose’s tag feature. A tag allows you to associate a color and a name to an entity.

So I tagged the classes of my system depending on their location in the repository.

To do this, in a Moose Playground, I used the following script (adapt it to your context 😉):

model allTaggedEntities do: [ :entity | entity removeTags ].

((model allWithSubTypesOf: FamixJavaType) reject: [ :type | type sourceAnchor isNil ]) do: [ :class |
    class sourceAnchor ifNotNil: [ :sa |
        (sa fileName beginsWith: './services/api-A') ifTrue: [ class tagWithName: 'A' ].
        (sa fileName beginsWith: './services/api-B') ifTrue: [ class tagWithName: 'B' ].
        (sa fileName beginsWith: './services/api-C') ifTrue: [ class tagWithName: 'C' ].

        (sa fileName beginsWith: './libraries/lib-A') ifTrue: [ class tagWithName: 'lib-A' ].
        (sa fileName beginsWith: './libraries/lib-common') ifTrue: [ class tagWithName: 'lib-common' ].
        (sa fileName beginsWith: './libraries/lib-B') ifTrue: [ class tagWithName: 'lib-B' ].
        (sa fileName beginsWith: './libraries/lib-C') ifTrue: [ class tagWithName: 'lib-C' ].
    ]
].

(model allWithSubTypesOf: FamixJavaType) reject: [ :type | type tags isEmpty ]

The result is not perfect yet because entities are not grouped by tag. To fix this, simply select the tag to add option in the architectural map settings.

Correct architectural map

You then get a clear visualization of the links between the microservice projects and the libraries they use. We see that no api is linked to an incorrect lib project. We also notice that microservice B is linked to lib-B as well as lib-common. Maybe this link to lib-common should be removed? But that’s another story…
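The three rules listed at the beginning can also be written down as a small predicate. The sketch below is in Python rather than Pharo and is independent of Moose: `is_allowed` is a hypothetical name, project names follow the api-X / lib-X convention, and the rules are read strictly (an api project may only depend on its own lib).

```python
def is_allowed(source, target):
    """Return True when a dependency from `source` to `target` respects the rules."""
    if source == target:
        return True  # dependencies inside one project are always fine
    if source.startswith("api-"):
        # An api project may only depend on its equivalent lib project.
        return target == "lib-" + source[len("api-"):]
    if source.startswith("lib-"):
        # A lib project may depend on lib-common.
        return target == "lib-common"
    return False


dependencies = [
    ("api-A", "lib-A"),
    ("api-A", "api-B"),
    ("lib-B", "lib-common"),
    ("api-B", "lib-common"),
]
violations = [d for d in dependencies if not is_allowed(*d)]
print(violations)  # [('api-A', 'api-B'), ('api-B', 'lib-common')]
```

Under this strict reading, the link from microservice B to lib-common is reported, which matches the question raised above.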

Building a Famix importer with TreeSitterFamixIntegration

May 11, 2025

Toky RATOLOJANHARY

Software engineer intern

Analyzing source code starts with parsing and for this you need semantic understanding of how symbols in the code relate to each other. In this post, we’ll walk through how to build a C code importer using the TreeSitterFamixIntegration framework.

Prerequisites

Basic knowledge of Famix and Moose.
Basic knowledge of what Tree-sitter is.
Familiarity with the Visitor design pattern. You can check this blog post which explains the Visitor pattern in the context of tree-sitter ASTs.

Overview of TreeSitterFamixIntegration

The TreeSitterFamixIntegration stack provides tools to ease the development of Famix importers using tree-sitter. This package offers some great features for parsing such as (but not limited to):

Useful methods for source management (getting source text, positions, setting sourceAnchor of a famix entity).
Error handling to help catch and report parsing issues
a better TreeSitter node inspector (which is very helpful when debugging)
Utility to efficiently import and attach single-line and multi-line comments to their corresponding entities.
Context tracking for symbol scope (no more context push and pop 😁)

There is detailed documentation you can check that explains every feature.

Step 1: Setting up our environment

After creating a new Moose image, let’s start by loading the necessary packages.

The C Metamodel

First, we need to load the C metamodel. This metamodel provides the Famix classes that represent C entities such as functions, structs, variables, etc.

Metacello new
    baseline: 'FamixCpp';
    repository: 'github://moosetechnology/Famix-Cpp:main';
    load

The TreeSitterFamixIntegration project

Next, we need to load the TreeSitterFamixIntegration project. It provides both pharo-tree-sitter and SRSymbolResolver.

Metacello new
    githubUser: 'moosetechnology' project: 'TreeSitterFamixIntegration' commitish: 'main' path: 'src';
    baseline: 'TreeSitterFamixIntegration';
    load

The project structure

Now that we have the necessary packages loaded, we can create our C importer.

Create a new package named Famix-C-Importer.

The minimum classes we will have to create inside are:

FamixCImporter: This class will be responsible for importing C files and parsing them using Tree-sitter.
FamixCVisitor: This class will walk through the parsed C syntax tree and create Famix entities.
FamixCCommentVisitor: This class will handle comments and attach them to the corresponding Famix entities.

The `FamixCImporter` class

The FamixCImporter class is the entry point for our importer. It will handle the parsing of C files into Abstract Syntax Trees (AST).

This class will inherit from FamixTSAbstractImporter (defined in the TreeSitterFamixIntegration project), which provides the necessary methods for importing and parsing C files using Tree-sitter.

FamixTSAbstractImporter << #FamixCImporter
    slots: {};
    package: 'Famix-C-Importer'

Now, let’s override some methods to set up our importer:

1. `treeSitterLanguage` method

FamixCImporter >> treeSitterLanguage
    "Should return a TreeSitter language such as  TSLanguage python"

    ^ TSLanguage cLang

This method returns the Tree-sitter language we want to use for parsing. In this case, we are using the C language. You can find the available languages in the Pharo-Tree-Sitter package.

2. `visitorClass` method

FamixCImporter >> visitorClass

    ^ FamixCVisitor

It returns the visitor class that will walk through the parsed syntax tree and create Famix entities. We will define this class later.

3. `importFileReference:` method

FamixCImporter >> importFileReference: aFileReference

    aFileReference isFile
        ifTrue: [
            (self isCFile: aFileReference) ifFalse: [ ^ self ].
            self importFile: aFileReference ]
        ifFalse: [
            aFileReference children do: [ :each | self importFileReference: each ] ]

This method calls importFile: on all C files recursively found in a directory. We will add more logic to this method later but for now, it serves as a starting point for our importer.

The isCFile: method checks if the file has a .c or .h extension.

FamixCImporter >> isCFile: aFileReference
    ^ #( 'c' 'h' ) includes: aFileReference extension
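The traversal logic itself is language-agnostic. As an illustration only, here is the same recursion sketched in Python with `pathlib`; `collect_c_files` is a hypothetical name, and it collects the files instead of importing them.

```python
import tempfile
from pathlib import Path

C_EXTENSIONS = {".c", ".h"}


def collect_c_files(path):
    """Recursively yield the .c and .h files found under `path`."""
    if path.is_file():
        if path.suffix in C_EXTENSIONS:
            yield path
        return
    for child in sorted(path.iterdir()):
        yield from collect_c_files(child)


# Usage on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "main.c").write_text("int main() { return 0; }\n")
    (root / "src" / "util.h").write_text("\n")
    (root / "README.md").write_text("docs\n")
    print([p.name for p in collect_c_files(root)])  # ['main.c', 'util.h']
```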

The importFile: method is defined in the FamixTSAbstractImporter class (provided by the TreeSitter-Famix-Integration project). It parses the file content to create an AST and then passes the visitor (the FamixCVisitor that we previously defined) to walk through the AST.

The `FamixCVisitor` class

The FamixCVisitor class is responsible for walking through the parsed AST and creating Famix entities. It will inherit from FamixTSAbstractVisitor, which provides the necessary methods for visiting Tree-sitter nodes.

FamixTSAbstractVisitor << #FamixCVisitor
    slots: {};
    package: 'Famix-C-Importer'

For this class, we will just need to override one method:

`modelClass` method

FamixCVisitor >> modelClass

    ^ FamixCModel

It returns the Famix metamodel class that will be used to create Famix entities. In this case, we are using FamixCModel which is in the Famix-Cpp package.

Let’s test our importer so far

Now that we have our importer and visitor classes set up, we can already test it. To test our importer, we can create a simple C file and import it using the FamixCImporter class.

#include <stdio.h>

int aGlobalVar = 1;
int main() {
    int aLocalVar;
    aLocalVar = aGlobalVar + 2;
}

To import this file, we can use the following code in the Playground (cmd + O + P to open it):

import c project

Before running the above code, open the Transcript to see the logs (cmd + O + T to open it).

Then select all the code and run it by inspecting it (cmd + I or click the “Inspect” button). You will get something similar to this.

Model inspector

The above screenshot shows what is inside our model. We can see that there is pretty much nothing there yet apart from the SourceLanguages which is added by default by TreeSitterFamixIntegration.

Now if we look at the Transcript, we can see that the importer has imported the file but we didn’t implement the visitor methods yet for every node in the AST, so no Famix entities were created.

transcript log

If you want to inspect the corresponding AST of our test file, you can do something similar to what is in this other blog post on tree-sitter.

translation unit AST

Step 2: Our first Famix entities

In this section we are going to see some examples of visiting methods for creating compilation unit and function entities.

CompilationUnit entities

Let’s go back to our FamixCImporter class, where we will create the CompilationUnit and HeaderFile entities. We do it there because that is where we can check whether the file is a header file or a source file.

FamixCImporter >> importFileReference: aFileReference

    aFileReference isFile
        ifTrue: [
            | fileEntity |
            (self isCFile: aFileReference) ifFalse: [ ^ self ].
            "a .c file becomes a compilation unit, any other accepted file a header file"
            fileEntity := aFileReference extension = 'c'
                ifTrue: [ visitor model newCompilationUnitNamed: aFileReference basename ]
                ifFalse: [ visitor model newHeaderFileNamed: aFileReference basename ].

            visitor
                useCurrentEntity: fileEntity
                during: [ self importFile: aFileReference ] ]
        ifFalse: [
            "a directory: import each child recursively"
            aFileReference children do: [ :each | self importFileReference: each ] ]

We use useCurrentEntity:during: to provide a context for the visitor. This is the same as pushing fileEntity onto a context stack, visiting the children, and then popping it from the stack. While the block runs, self currentEntity returns fileEntity.
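The push/visit/pop behaviour can be illustrated with a small Playground sketch. This is not the framework's implementation: the context is a plain OrderedCollection, the entities are strings, and the useCurrentEntityDuring block is a hypothetical stand-in for useCurrentEntity:during:.

```smalltalk
"Hypothetical stand-in for the visitor's context: a plain stack of entities."
| context log useCurrentEntityDuring |
context := OrderedCollection new.
log := OrderedCollection new.

"Push the entity, run the block, and always pop afterwards, even on error."
useCurrentEntityDuring := [ :entity :aBlock |
	context addLast: entity.
	aBlock ensure: [ context removeLast ] ].

useCurrentEntityDuring
	value: 'test.c'
	value: [
		log add: context last.
		useCurrentEntityDuring
			value: 'main'
			value: [ log add: context last ].
		log add: context last ].

log asArray "=> #('test.c' 'main' 'test.c')"
```

The current entity is the file, then the nested function, then the file again once the inner block has finished. This is why entities created while visiting children can simply ask for the current entity to find their parent.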

Now try importing a whole directory containing C files. You should see that the importer creates a FamixCHeaderFile for each header file and a FamixCCompilationUnit for each source file.

Source Anchors

To set the source anchor for any Famix entity, we can use the setSourceAnchor: aFamixEntity from: aTSNode method provided by the FamixTSAbstractVisitor class. This method takes a Famix entity and a Tree-sitter node.

We can use it to set the source anchor for our fileEntity. Go to visitTranslationUnit: in the FamixCVisitor class and add the following code:

FamixCVisitor >> visitTranslationUnit: aNode
  self setSourceAnchor: self currentEntity from: aNode.
  self visitChildren: aNode "for not cutting the traversal"

Now if we import our test.c file again, we will see that the CompilationUnit entity has a source anchor.

Function entities

Next, we will create a FamixCFunction entity for each function definition in the C file. We will do this in the visitFunctionDefinition: method of the FamixCVisitor class.

But first we need to know where the function name is located to create the FamixCFunction entity. Create the method and put a halt there to inspect the node.

visitFunctionDefinition: aNode
  self halt.
  self visitChildren: aNode.

function definition ast

If we look at the function definition node, we can see that the function name is in the identifier node, which is a child of the function declarator node.

To get that name, there are two ways:

Visit the function_declarator with self visit: and let the visit methods return the name once the identifier is reached.
Get it by child field name using aNode _fieldName, which returns the child node with the given field name. You do not need to implement the _fieldName methods yourself because the framework already handles them.

For simplicity, and to show other available features in the framework, we will use the second way.

Let’s inspect the function definition node to see what fields it has.

function definition fields

So if we do aNode _declarator, it will return the function declarator node.

function declarator

And if we send _declarator again, this time to the function_declarator node, it will give us the identifier that we want.

Now we can create the function entity and set its name and source anchor.

visitFunctionDefinition: aNode

    | declaratorNode identifierNode functionName entity |

    declaratorNode := aNode _declarator.
    identifierNode := declaratorNode _declarator.
    functionName := identifierNode sourceText.

    "yourself makes sure we keep the function entity, whatever the setter returns"
    entity := (model newFunctionNamed: functionName)
        functionOwner: self currentEntity;
        yourself.

    self setSourceAnchor: entity from: aNode.

    self useCurrentEntity: entity during: [ self visitChildren: aNode ]

Here, self currentEntity returns the compilation unit entity, which is the parent of the function entity.

And before visiting the children, we set the current entity to the newly created function entity using useCurrentEntity:during:. This will allow us to create other entities that are related to this function, such as parameters and local variables.

Local and Global Variables

The difference between local and global variables is that local variables are declared inside a function, while global variables are declared outside any function.

Implementation

To create the variable entities, we will create the visitDeclaration: method in the FamixCVisitor class. This method is called for each variable declaration in the C file.

FamixCVisitor >> visitDeclaration: aNode
  "fields: type - declarator"

  | varName entity |

  self visit: aNode _type.

  varName := self visit: aNode _declarator.

  entity := self currentEntity isFunction
              ifTrue: [
                  (model newLocalVariableNamed: varName)
                    parentBehaviouralEntity: self currentEntity;
                    yourself ]
              ifFalse: [
                  (model newGlobalVariableNamed: varName)
                    parentScope: self currentEntity;
                    yourself ].

  self setSourceAnchor: entity from: aNode.

The visitDeclaration: method does the following:

Visits the variable’s type. This will allow us to parse its type information.
Retrieves the variable name by visiting the declarator field. If the variable is initialized, this will be an init_declarator node; otherwise, it will be an identifier. We should implement visit methods for both cases to extract the name correctly.

FamixCVisitor >> visitInitDeclarator: aNode
  "fields: declarator - value"

  self visit: aNode _value.
  ^ self visit: aNode _declarator "variable name is in the declarator node"

FamixCVisitor >> visitIdentifier: aNode

    ^ aNode sourceText "returns the name of the variable"

Creates a variable entity, either a local variable or a global variable, depending on whether the current entity is a function or not.
Sets the source anchor for the variable entity using the setSourceAnchor:from: method.
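To see why both visit methods are needed, here is a minimal Playground sketch of how the name is recovered in the two cases. The dictionaries are hypothetical stand-ins for Tree-sitter nodes and the nameOf block plays the role of self visit:. Nothing here is the framework's API.

```smalltalk
"Hypothetical stand-ins for Tree-sitter nodes: dictionaries with a #type and fields."
| identifier initDeclarator nameOf |
identifier := Dictionary new
	at: #type put: #identifier;
	at: #sourceText put: 'aLocalVar';
	yourself.
initDeclarator := Dictionary new
	at: #type put: #init_declarator;
	at: #declarator put: (Dictionary new
		at: #type put: #identifier;
		at: #sourceText put: 'aGlobalVar';
		yourself);
	at: #value put: '1';
	yourself.

"An init_declarator delegates to its declarator field; an identifier answers its text."
nameOf := nil.
nameOf := [ :node |
	(node at: #type) = #init_declarator
		ifTrue: [ nameOf value: (node at: #declarator) ]
		ifFalse: [ node at: #sourceText ] ].

{ nameOf value: identifier. nameOf value: initDeclarator } "=> #('aLocalVar' 'aGlobalVar')"
```

In both cases the caller ends up with a plain name, so visitDeclaration: does not need to know whether the variable was initialized.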

Step 3: Symbol resolution

In this section, we will implement the symbol resolution for our C importer. This will allow us to resolve references to variables and functions in our C code.

As an example, we will resolve the reference to the local variable aLocalVar in the main function, which will be represented as a Famix write access entity.

Implementation

Create the write access entity

To create the write access entity, we will implement the visitAssignmentExpression: method in the FamixCVisitor class. This method is called for each assignment expression.

visitAssignmentExpression: aNode
  "fields: left - right"
  | access leftVarName |

  leftVarName := self visit: aNode _left.

  access := model newAccess accessor: self currentEntity;
          isWrite: true;
          yourself.

  self setSourceAnchor: access from: aNode.

Using SRIdentifierResolvable

Add the following code to the visitAssignmentExpression: method to resolve the variable:

visitAssignmentExpression: aNode
  "fields: left - right"
  | access leftVarName |

  leftVarName := self visit: aNode _left.

  access := model newAccess accessor: self currentEntity;
          isWrite: true;
          yourself.

  self setSourceAnchor: access from: aNode.

  self
    resolve: ((SRIdentifierResolvable identifier: leftVarName)
          expectedKind: {
            FamixCLocalVariable.
            FamixCGlobalVariable };
          yourself)
    foundAction: [ :variable :currentEntity | access variable: variable ].

The resolve: aResolvable foundAction: aBlockClosure method is provided by the FamixTSAbstractVisitor class.

It takes two arguments:

aResolvable: an instance of SRIdentifierResolvable. This resolvable is created with the identifier (the variable name) and the expected kinds of entities (in this case, either a local variable or a global variable). The identifier: method sets the identifier to resolve, and the expectedKind: method sets the expected kinds of entities that can be resolved.
aBlockClosure: a block that will be executed when the resolvable is resolved (we found the variable). In this case we set the variable of the access entity to the resolved variable.

See the SRIdentifierResolvable documentation
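The idea behind resolving an identifier with expected kinds can be sketched in the Playground as follows. This is only an illustration under simplifying assumptions, not how the symbol resolver is implemented: scopes are plain dictionaries, kinds are symbols instead of Famix classes, and the resolve block is a hypothetical stand-in for resolve:foundAction:.

```smalltalk
"Hypothetical scopes, innermost last: each maps a name to a kind -> entity pair."
| scopes resolve found |
scopes := OrderedCollection new.
scopes add: (Dictionary new
	at: 'aGlobalVar' put: #global -> 'aGlobalVar in test.c';
	yourself).
scopes add: (Dictionary new
	at: 'aLocalVar' put: #local -> 'aLocalVar in main';
	yourself).

"Search from the innermost scope outwards and run the action on the first
 entry with a matching name and an expected kind."
resolve := [ :identifier :expectedKinds :foundAction |
	scopes reversed
		detect: [ :scope |
			(scope includesKey: identifier) and: [
				expectedKinds includes: (scope at: identifier) key ] ]
		ifFound: [ :scope | foundAction value: (scope at: identifier) value ]
		ifNone: [ nil ] ].

found := nil.
resolve
	value: 'aLocalVar'
	value: #( #local #global )
	value: [ :variable | found := variable ].

found "=> 'aLocalVar in main'"
```

The found action only runs when a matching entity exists, which mirrors how the access is linked to its variable inside the foundAction: block.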

Custom resolver

The SRIdentifierResolvable is a generic resolver that can be used to resolve identifiers. However, in some cases, we may need to create a custom resolver to handle specific cases. In that case, we can create a class that inherits from SRResolvable and override the resolveInScope:currentEntity: method to implement our custom resolution logic.

For more information about the symbol resolver, you can check the documentation.

Step 4: Parse comments

The TreeSitterFamixIntegration package provides a utility to parse comments and attach them to the corresponding Famix entities: the FamixTSAbstractCommentVisitor class, which we will subclass as FamixCCommentVisitor.

Let’s add some comments to our test.c file:

#include <stdio.h>

int aGlobalVar = 1;

/*
entry point of our program
*/
int main() {
    int aLocalVar; // a local variable
    aLocalVar = aGlobalVar + 2;
}

Implementation

To parse comments, we create the FamixCCommentVisitor class, which inherits from FamixTSAbstractCommentVisitor. We only need to override the visitNode: method.

FamixCCommentVisitor >> visitNode: aNode

  aNode type = #comment ifTrue: [
      (aNode sourceText beginsWith: '/*')
        ifTrue: [ self addMultilineCommentNode: aNode ]
        ifFalse: [ self addSingleLineCommentNode: aNode ] ].

  super visitNode: aNode

We use the addMultilineCommentNode: and addSingleLineCommentNode: methods provided by the FamixTSAbstractCommentVisitor class to add the comment to the model.

For a detailed explanation of how to use the comment visitor, you can check the documentation.

The last thing to do is to run the comment visitor during the import. We can do that in visitTranslationUnit: of the FamixCVisitor class, once all the children of the translation unit node have been visited.

FamixCVisitor >> visitTranslationUnit: aNode

  self setSourceAnchor: self currentEntity from: aNode.
  self visitChildren: aNode.

  FamixCCommentVisitor visitor: self importCommentsOf: aNode

We use the visitor: aFamixVisitor importCommentsOf: aNode method to import the comments of the translation unit node.

Summary

In this blog post, we have seen how to build a Famix importer for C code using the TreeSitterFamixIntegration framework. We have covered the following topics:

Setting up the environment and creating the importer and visitor classes.
Creating Famix entities for compilation units, functions, and variables.
Implementing symbol resolution for local and global variables.
Parsing comments and attaching them to the corresponding Famix entities.

This is just a starting point for building an importer with this stack. You will need to implement more visit methods and tests to handle other entities. The TreeSitterFamixIntegration framework provides many other utilities that we did not cover to help you with that.
