Modern static code analysis tools for C and C++ provide a multitude of checkers out of the box, capable of detecting many different types of defect and violation. In addition, there are likely to be many configurable parameters that the adventurous (or instructed!) team can tinker with to bend some aspect of the tool’s operation to their demands. Inevitably though, situations will arise where the tool simply cannot be tuned to detect a team’s particular requirements. These could include specialist situations that the tool authors could not have predicted, or the team may simply have requirements that are unique to their environment. In many of these cases, CodeSonar, our static analysis tool for C and C++, does provide a solution: custom checkers.

CodeSonar provides a rich API for the creation of custom checkers. This API is offered in a number of different languages, including C, C++, C#, Java, and Python. The API allows your custom checker to piggyback its analysis requirements on the existing analysis framework. In more detail, as the analysis automatically traverses the statically valid paths of execution through your codebase (known as symbolic execution), your checker gets to delve into the details of the currently visited source code location, whereupon you can extract the values and states of variables or pointers, and other important characteristics, which can then be used to detect your particular proprietary issues. Once such an issue has been detected, the API also provides methods for annotating the source with English commentary, as well as registering the issue so that it’s reported in the same way as any built-in checker.

However, some defects in code are more stylistic in nature, more along the lines of standards compliance checks, such as many within the well-known MISRA C suite of coding guidelines. These types of check are more concerned with forbidding or mandating certain language constructs, such as the self-explanatory rule 16.4, “Every switch statement shall have a default label”. In these cases, the deep capabilities of CodeSonar’s symbolic execution analysis are not required to detect the violation. Instead, all that is needed is to pick out the presence (or absence) of keywords or other source code tokens, and to issue a warning highlighting the deviation from the expected rule. For these cases, the checker author can rely on the Abstract Syntax Tree (AST). The AST is the source code rearranged into a grammatically equivalent hierarchical format, where your code is organised into tree elements, along with various properties of the code, which are stored as attributes. All compilers carry out this transformation but, unlike the compilers most developers are used to, CodeSonar’s compiler outputs the result. The purpose of translating into an AST is to provide the source code in a form that the subsequent static analysis can traverse and analyse more easily than the untokenised and unstructured file-based source code.
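
To make the idea concrete, here is a minimal sketch of a syntax-only check in the spirit of rule 16.4. The Node struct and both functions are hypothetical and far simpler than CodeSonar’s real AST; they only illustrate that this kind of check needs the tree structure and nothing about run-time values.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy AST node; this is NOT CodeSonar's representation.
struct Node {
    std::string kind;            // e.g. "function", "switch", "case", "default"
    std::vector<Node> children;  // nested constructs
};

// Count the switch nodes in the tree that have no direct "default" child.
int count_switches_without_default(const Node& n) {
    int count = 0;
    if (n.kind == "switch") {
        bool has_default = false;
        for (const Node& c : n.children) {
            if (c.kind == "default") has_default = true;
        }
        if (!has_default) ++count;
    }
    // Recurse so that nested switch statements are checked as well.
    for (const Node& c : n.children) {
        count += count_switches_without_default(c);
    }
    return count;
}

// Build a small tree: one function holding a compliant and a
// non-compliant switch statement.
Node make_example_tree() {
    Node good{"switch", {Node{"case", {}}, Node{"default", {}}}};
    Node bad{"switch", {Node{"case", {}}, Node{"case", {}}}};
    return Node{"function", {good, bad}};
}
```

Calling count_switches_without_default(make_example_tree()) counts only the second switch, since the first one carries a default child.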

As an example of such a checker, let’s consider the case where declaring C++ class member variables as public is disallowed. We will be using the C++ version of the API, as it offers several advantages over the C-based API (easier debugging and simpler memory management, to name two). Our custom checker simply needs to look in each class definition in the codebase and issue a warning each time a member variable is declared with public access. So, the first thing we need to do is understand the AST hierarchy well enough to locate the offending variables. Fortunately, CodeSonar provides a routine to dump out the AST for the currently visited location. Before we show that, here is a simple example program that contains two violations of this custom rule:

class classA
{
public:
  int intA; // issue a warning here
private:
  int intB;
};

class classB
{
public:
  int intC; // and another one here
private:
  int intD;
};

int main(void)
{
  return 0;
}

Yes, this is a simple example, but as with many simple bugs, as soon as the constituent parts are distributed through longer and more densely packed code, or even across multiple files, they become much harder to detect manually.
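
For contrast, a version of classA that would satisfy the rule might look something like the sketch below. The class name and the accessor names are hypothetical and only show one common way to keep the data private.

```cpp
#include <cassert>

// Hypothetical compliant rewrite of classA: both data members are private
// and are reached only through public member functions.
class classA_compliant {
public:
    int getA() const { return intA; }
    void setA(int value) { intA = value; }
    int getB() const { return intB; }
    void setB(int value) { intB = value; }
private:
    int intA = 0;
    int intB = 0;
};
```

Because no member variable is declared with public access, a checker for this rule would have nothing to report here.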

With the above in mind, here is an abbreviated snapshot of the corresponding AST (a download of this AST and the code is available at the end of this post):

Snapshot of AST

The actual dump of the AST contains much more detail; there are many more nodes, children and attributes (it’s very revealing to see how much the compiler augments the original source code), but this is enough to see clearly how the AST matches up to our example source code. At the root of the AST for the source file we show the first two children, source-file and file-scope. Source-file holds file-related generalities such as name and directory. File-scope contains what we really care about. As the name implies, this particular tree contains children declared at, wait for it… file scope! I only show the types child here; there are several additional child nodes beneath file-scope. The types child (called “types:(cc:ast-list)”) follows the same AST logic and presents the detailed set of types declared at file scope. Again, the AST contains more types children than you might expect, but I’m showing only the relevant parts. Following the next “+” after “types:(cc:ast-list)” is the first type, a class called type-info, which is another example of something that the compiler injects into the source on our behalf. The next two children of types are our two classes, “classA” and “classB”. In each case, we can see a further set of children detailing the class members (intA, intB, intC and intD). It is at this level that we also get the all-important attributes (again, reduced for clarity), including the access modifier applied to the parent member variable.
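
The walk just described (file-scope, then types, then each class, then its fields, then each member’s attributes) can be sketched on a toy tree. Everything below is hypothetical: AstNode, find_child and the other names are stand-ins invented for illustration, not CodeSonar types, and the tree is simplified so that the access attribute sits directly on the member node.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an AST node: a label, attributes and children.
struct AstNode {
    std::string label;
    std::vector<std::string> attributes;
    std::vector<AstNode> children;
};

// Return the first direct child with the given label, or nullptr if none.
const AstNode* find_child(const AstNode& parent, const std::string& label) {
    for (const AstNode& c : parent.children)
        if (c.label == label) return &c;
    return nullptr;
}

// Walk root -> file-scope -> types -> class -> fields -> member and collect
// the names of members carrying a "public" attribute.
std::vector<std::string> public_members(const AstNode& root) {
    std::vector<std::string> found;
    const AstNode* scope = find_child(root, "file-scope");
    const AstNode* types = scope ? find_child(*scope, "types") : nullptr;
    if (!types) return found;
    for (const AstNode& cls : types->children) {
        const AstNode* fields = find_child(cls, "fields");
        if (!fields) continue;  // a type without a member list
        for (const AstNode& member : fields->children)
            for (const std::string& attr : member.attributes)
                if (attr == "public") found.push_back(member.label);
    }
    return found;
}

AstNode make_member(const std::string& name, const std::string& access) {
    return AstNode{name, {access}, {}};
}

AstNode make_class(const std::string& name, std::vector<AstNode> members) {
    AstNode fields{"fields", {}, std::move(members)};
    return AstNode{name, {}, {fields}};
}

// Toy tree mirroring the example program: classA and classB at file scope.
AstNode make_example_root() {
    AstNode types{"types", {}, {
        make_class("classA", {make_member("intA", "public"),
                              make_member("intB", "private")}),
        make_class("classB", {make_member("intC", "public"),
                              make_member("intD", "private")})}};
    AstNode scope{"file-scope", {}, {types}};
    AstNode source{"source-file", {}, {}};
    return AstNode{"root", {}, {source, scope}};
}
```

On this toy tree, public_members(make_example_root()) collects intA and intC, the two members flagged in the example program.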

With this appreciation of the AST in place, we can now write our custom checker. As mentioned in a previous blog post (“writing custom code checkers in codesonar”: https://www.scl.com/writing-custom-code-checkers-in-codesonar/ ), there is a degree of necessary boilerplate code, which I’ll skip here. I’ll just step into the interesting bit. For ease of explanation, I have annotated the code below with explanatory comments:

// (This excerpt assumes the skipped boilerplate provides the CodeSonar API
// headers, <vector>, <string>, <iostream>, <cstdlib>, "using namespace std;"
// and the incorrect_signature warning class used for reporting.)

// Remember I mentioned "visiting" earlier? Well, here it is, right at the
// top. A visitor function is basically a callback that the analysis will
// invoke at the level of granularity your checker requires. My visitor
// callback (the () operator overload) only needs to be invoked for each
// compilation unit (essentially your source file(s)) that the analysis
// encounters. There are finer grained visitors, such as procedure and
// program point.
class my_compunit_visitor : public cs::visitor<cs::compunit>
{
  void operator()(cs::compunit cu)
  {
    // The C++ API does away with function return codes, instead relying on
    // exceptions
    try
    {
      // check the analysis is stepping into a source file authored by the
      // user; there are other files injected by CodeSonar
      if (!cu.is_user())
        return;

      // Get the AST for the current compilation unit
      vector<cs::ast_field> f = cu.get_ast(cs::ast_family::C_UNNORMALIZED).fields();

      // check there is a field of type PRIMARY_SCOPE. This equates to the
      // file-scope child discussed above
      if (cu.get_ast(cs::ast_family::C_UNNORMALIZED).has_field(cs::ast_ordinal::UC_PRIMARY_SCOPE))
      {
        // get hold of the file-scope field ......
        cs::ast_field field = cu.get_ast(cs::ast_family::C_UNNORMALIZED)[cs::ast_ordinal::UC_PRIMARY_SCOPE];

        // ......and retrieve all of its children
        vector<cs::ast_field> fields = field.as_ast().fields();

        // let's loop through them ......
        for (vector<cs::ast_field>::iterator it = fields.begin(); it != fields.end(); ++it)
        {
          cs::ast_field a = (*it);

          // ..... looking for the types field. -426 is the identifier matching "types"
          if (a.ordinal().unwrap() == -426) // types
          {
            // Now step through the children of the types field, which will
            // include the declared classes
            vector<cs::ast_field> types = a.as_ast().fields();
            for (vector<cs::ast_field>::iterator it1 = types.begin(); it1 != types.end(); ++it1)
            {
              // get the children of the current type
              cs::ast_field a1 = (*it1);
              vector<cs::ast_field> classes = a1.as_ast().fields();

              // real classes will have a fields child, containing class member detail
              for (vector<cs::ast_field>::iterator it2 = classes.begin(); it2 != classes.end(); ++it2)
              {
                cs::ast_field a2 = (*it2);

                // the "class" node in the AST contains several types of children
                // ("source-correspondence" and "fields" in the above AST); we're
                // interested specifically in the class members (called "fields"
                // in the AST)
                if (a2.ordinal().unwrap() == -209) // found the members field
                {
                  // More iterating required. Loop through the children of the
                  // members field (called "fields:(cc:ast-list)" in the AST)
                  vector<cs::ast_field> members = a2.as_ast().fields();
                  for (vector<cs::ast_field>::iterator it3 = members.begin(); it3 != members.end(); ++it3)
                  {
                    // one of the attributes of each member is called position
                    // (not shown in the AST). This corresponds to the line number
                    // of this member variable in the unpreprocessed source code.
                    // Store the current line in case it's required later.
                    string posAttr = (*it3).as_ast()[cs::ast_ordinal::UC_POSITION].as_string();
                    int posStart = posAttr.find_first_of("0123456789");
                    string str = posAttr.substr(posStart, string::npos);
                    int linePos = atoi(str.c_str());

                    // loop through the children of the current member variable,
                    // getting hold of each child called "(cc:field)"
                    vector<cs::ast_field> field2 = (*it3).as_ast().fields();
                    for (vector<cs::ast_field>::iterator it4 = field2.begin(); it4 != field2.end(); ++it4)
                    {
                      cs::ast_field a3 = (*it4);

                      // again, there can be several types of children discovered;
                      // we want to loop through them all and look for the one
                      // called "source-correspondence"
                      if (a3.ordinal().unwrap() == -398) // found the source-correspondence field
                      {
                        // Having found the "source-correspondence" field, loop
                        // through its children, which are the leaf attributes of
                        // the AST, some of which correspond to source level
                        // attributes of the member variable.
                        vector<cs::ast_field> attribs = (*it4).as_ast().attributes();
                        for (vector<cs::ast_field>::iterator it5 = attribs.begin(); it5 != attribs.end(); ++it5)
                        {
                          cs::ast_field a4 = (*it5);
                          string attrib = (*it5).as_string();

                          // look for the existence of the public attribute
                          if (attrib.find("public") != string::npos)
                          {
                            // must have found an occurrence of the string "public"
                            // so report a CodeSonar warning, using the earlier
                            // member variable line position.
                            incorrect_signature.report(cu.get_sfileinst(), linePos,
                              "Coding rule violation: member declared with public access");
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    catch (const cs::result &r)
    {
      cout << "*** my_compunit_visitor exception: " << r << endl;
    }
  }
};
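
One detail worth isolating is how the checker pulls the line number out of the position attribute by searching for the first digit. The exact format of that string is not described in this post, so the sketch below assumes only that the line number is the first run of digits in it. The function name and the sample strings are hypothetical. It also adds a guard for the case where no digit is present, a case in which the substr call in the checker would typically throw std::out_of_range.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Return the first run of decimal digits found in text as an int,
// or -1 when text contains no digit at all.
int first_number_in(const std::string& text) {
    const std::string::size_type start = text.find_first_of("0123456789");
    if (start == std::string::npos) {
        return -1;  // nothing to parse; the caller decides how to handle it
    }
    // atoi stops at the first non-digit, so trailing text is ignored.
    return std::atoi(text.c_str() + start);
}
```

With this helper, a hypothetical position string such as "line 12, col 3" yields 12, and a string with no digits yields -1 rather than an exception.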

After compiling the checker and configuring CodeSonar to use it, CodeSonar will report occurrences of this new warning exactly as any out-of-the-box checker would.

[Image: the new warning reported by CodeSonar]

And in terms of the actual warning detail, it would appear as follows:

[Image: detail of the reported warning]

If you would like to download a txt file of the code for the example programs and the AST in this coding rule violations blog, please click here