How to Set Up Semgrep Rules for Optimal SAST Scanning

By Aviram Shmueli
Edited by Jit Team

Updated June 18, 2024.


DevOps teams are all too familiar with the frustration of finding a bug in their code that could have been caught earlier. Or worse, they have had to deal with the consequences of a security vulnerability that slipped through the cracks. It's no surprise, then, that tools like Semgrep are a developer's best friend.

Semgrep is often described as the future of static analysis. With a growing community of users and over 2,500 rules in the Semgrep Registry, it's accessible to everyone. In this article, we'll explore the basics of Semgrep, how to run rules and set up optimal SAST scanning, and even how to write your own rules to catch those pesky bugs and security vulnerabilities.

An introduction to Semgrep

Semgrep is a popular open-source static analysis tool that helps identify and prevent security vulnerabilities in source code. It grew out of sgrep, a code-search tool developed at Facebook around 2009 for internal use, and is now maintained by Semgrep, Inc. (formerly r2c). It has become a widely used tool among software developers and security professionals. Semgrep's unique selling point is its ease of use and flexibility in writing custom rules to detect specific security issues.

This tool is handy for software developers performing static analysis in their development workflow. It can quickly identify potential security issues and prevent security breaches and related problems. As an open-source tool, Semgrep has a growing community of contributors who help maintain and improve the tool, ensuring it stays up-to-date with the latest developments in the field.

Running rules with Semgrep

Semgrep rules are written in a simple, declarative language that specifies what code patterns to look for and what actions to take when a pattern is found. They can detect security vulnerabilities, code smells, and style violations.

Semgrep rules can be stored in various ways, including in YAML files, your code repository, or Semgrep's rule registry. They are categorized by the type of issue they detect, and you can filter them by language, file type, and other attributes.

For example, the following Semgrep rule detects the use of insecure cryptographic functions:

rules:
  - id: insecure-cryptography
    message: Detects use of insecure cryptographic functions
    languages: [generic]
    severity: WARNING
    pattern-either:
      - pattern-regex: MD5\(
      - pattern-regex: SHA1\(
      - pattern-regex: DES\(
      - pattern-regex: RC4\(
      - pattern-regex: PBKDF1\(
      - pattern-regex: cbc

Types of Semgrep rules

Semgrep Existing Rules: These are the default rules that are included in the Semgrep rule registry. The Semgrep development team creates and maintains these rules covering various potential security vulnerabilities and coding errors. Developers can run these rules as-is or customize them to fit their specific codebase better.

Local Rules (Ephemeral and YAML-defined): Local rules are custom rules that developers can create to scan their codebase for specific issues. There are two types of local rules: ephemeral and YAML-defined.

  • Ephemeral rules are passed directly on the command line with the -e (--pattern) and -l (--lang) flags, which makes them useful for one-off searches that don't need to be saved.

  • YAML-defined rules are defined in a YAML file and can be reused across multiple scans. You can customize them to scan for specific issues in a codebase, making them a powerful tool for catching potential problems early in development.

Setting up Semgrep Rules for Optimal SAST Scanning

Semgrep rules are designed to identify specific patterns of code that create a potential security vulnerability. Patterns look like the source code they target and are matched against the code's syntax tree, with regular expressions available for cases where plain text matching is enough.

Here's an example of setting up a Semgrep rule to detect potential path traversal vulnerabilities in Python code. Path traversal attacks, or directory traversal attacks, aim to access sensitive files and directories stored outside the web root folder.

[Image: Semgrep rule for detecting path traversal]

This rule is a security check to identify potential path traversal vulnerabilities in Python code. 

How should we read this?

  • The rule has an ID of "path-traversal".
  • The rule applies to the Python language.
  • The rule will generate a warning message if a potential path traversal vulnerability is detected.
  • The severity of the warning is defined as "WARNING".
  • The pattern being checked for uses the os.path.join function in a way that includes unsanitized user input ($INPUT). The code constructs a file path by concatenating user input with other path elements without first validating or sanitizing the user input. This allows attackers to manipulate the path and access files or directories outside the intended scope.

Overall, this rule helps developers identify and prevent potential path traversal by catching unsafe use of the os.path.join function.
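To make the pattern concrete, here is a minimal sketch of the kind of Python code such a rule is meant to flag, next to a safer variant. The base directory and function names are hypothetical and chosen only for illustration; they do not come from the rule itself.

```python
import os

# Hypothetical directory that user-supplied file names should stay inside.
BASE_DIR = "/var/www/uploads"


def unsafe_path(user_input: str) -> str:
    """Join user input straight into a path: the shape the rule looks for."""
    return os.path.join(BASE_DIR, user_input)


def safe_path(user_input: str) -> str:
    """Resolve the joined path and reject it if it leaves BASE_DIR."""
    base = os.path.realpath(BASE_DIR)
    candidate = os.path.realpath(os.path.join(base, user_input))
    # commonpath equals base only when candidate is base or lies beneath it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base directory")
    return candidate
```

Note that a purely syntactic rule would likely flag both functions, since each passes user input to os.path.join. That is one reason findings still need human review.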

It's essential to acknowledge that there's no one-size-fits-all tool for code security. Each solution has its strengths and weaknesses, and the effectiveness of a security tool largely depends on the specific use case, programming language, tool customization, and development environment. Semgrep is a powerful and versatile choice for Static Application Security Testing (SAST), offering customizable rules and support for multiple programming languages. However, it's crucial to understand that there may still be some missed detections (false negatives) or false alerts (false positives), as with any other SAST tool.

Semgrep's Rule Board

Semgrep's Rule Board is a powerful tool that allows developers to access a vast library of pre-existing rules to scan their code for potential vulnerabilities. To use Semgrep's Rule Board, developers can simply add the desired ruleset to their configuration file, and the tool will automatically download and run those rules during the scanning process.

For example, to add a ruleset for scanning Django code for potential security issues, developers can add the following line to their configuration file:

rules:
  -

This will download the django ruleset from Semgrep's Rule Board and apply it to the scanning process.

The Rule Board also allows developers to create and share their rulesets, making it a collaborative platform for improving code security across the development community. Once a custom ruleset is created, it can be added to the configuration file using the same syntax as pre-existing rulesets.

Writing your own Semgrep rules

Writing your own Semgrep rules can be a powerful way to customize your SAST scanning process and target specific issues in your code. To get started, you'll need to have some familiarity with the Semgrep syntax and be able to identify the types of problems you want to scan for.

To write your own Semgrep rules, start by creating a new rule file. Rule files are plain YAML with a top-level rules key, so you can create one in any editor and save it as, for example, rules.yaml. From there, you fill in the rule's ID, target languages, message, severity, and pattern to suit your needs.

Once you've made your rule file, you can start writing rules that target specific issues in your code. For example, you might create a rule that scans for SQL injection vulnerabilities by looking for instances where user input is concatenated directly into a SQL query.

To write this type of rule, you would use the Semgrep syntax to define a pattern that matches the vulnerable code. For example, a rule named sql_concat could flag instances where user input is concatenated directly into a SQL query.

[Image: custom Semgrep rule]
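To show what a rule like this is looking for, here is a small sketch of the vulnerable shape and its parameterized fix, using Python's built-in sqlite3 module. The table, data, and function names are invented for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    """Vulnerable: user input is concatenated directly into the SQL text."""
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    """Safer: the driver binds the value, so it is never parsed as SQL."""
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(len(find_user_unsafe(db, payload)))  # every row leaks
    print(len(find_user_safe(db, payload)))    # no row matches the literal string
```

A rule that targets string concatenation inside an execute call would catch the first function while leaving the second alone.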

For more information about Semgrep configuration files, refer to the official Semgrep documentation on writing rules and creating configuration files. To set up your custom rules for optimal SAST scanning, you should consider organizing them by category and reviewing them regularly to ensure they are up-to-date and effective. You should also consider integrating Semgrep with your custom rules into your CI/CD pipeline to ensure they run consistently and thoroughly.

Running a SAST scan with Semgrep

Running a SAST scan with Semgrep is a simple process that requires just a few commands in the terminal. In this tutorial, we will walk through the steps to run a scan using Semgrep.

Step 1: Install Semgrep. The first step is to install Semgrep using the following command:

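$ curl -L | bash

This will download and install the latest version of Semgrep. Semgrep is also distributed as a Python package and as a Homebrew formula, so python3 -m pip install semgrep or brew install semgrep work as well.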

Step 2: Create a Semgrep configuration file. The next step is to create a configuration file for Semgrep. This file specifies which rules should be run during the scan and which files to scan. Here is an example configuration file:

[Image: Semgrep configuration file]

Step 3: Run the Semgrep scan. Once the configuration file has been created, the Semgrep scan can be run using the following command:

$ semgrep --config=<your_config> <your_code_directory>

This command runs Semgrep with the given configuration file and scans all files in the specified directory.
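If you want to trigger the same scan from a script, for instance as a CI step, a thin wrapper like the sketch below is one option. It assumes the semgrep executable is installed and on your PATH; the function names are ours, not part of Semgrep.

```python
import subprocess
from typing import List, Tuple


def build_scan_command(config: str, target: str) -> List[str]:
    """Return the argument list for a Semgrep scan that emits JSON."""
    return ["semgrep", "--config", config, "--json", target]


def run_scan(config: str, target: str) -> Tuple[int, str]:
    """Run Semgrep and return (exit status, raw JSON text from stdout).

    Assumes the semgrep executable is installed and on PATH.
    """
    completed = subprocess.run(
        build_scan_command(config, target),
        capture_output=True,
        text=True,
        check=False,  # leave interpretation of the exit status to the caller
    )
    return completed.returncode, completed.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.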

Step 4: View the results. After the scan, Semgrep will output a list of any issues found. These issues have details about their location and the rule that triggered them. Please note that the output format may vary depending on the version of Semgrep you are using and your specific configuration.

Here is an example output:

[Image: example output 1]

Another possible output format:

[Image: example output 2]

In addition to the previously mentioned output formats, Semgrep also supports JSON. Here's an example JSON output:

[Image: example output 3]
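Because JSON is meant for machines, a short script can turn it into a one-line-per-finding summary. The sketch below assumes the commonly seen layout of a top-level results array whose entries carry check_id, path, start.line, and an extra object with message and severity. The sample data is invented, so check the field names against the output of your own Semgrep version.

```python
import json

# Invented sample shaped like Semgrep's JSON output (an assumption for illustration).
SAMPLE_OUTPUT = """
{
  "results": [
    {
      "check_id": "path-traversal",
      "path": "app/files.py",
      "start": {"line": 12, "col": 16},
      "end": {"line": 12, "col": 49},
      "extra": {
        "message": "Potential path traversal via os.path.join",
        "severity": "WARNING"
      }
    }
  ],
  "errors": []
}
"""


def summarize(raw_json: str) -> list:
    """Return one human-readable line per finding in the JSON text."""
    data = json.loads(raw_json)
    lines = []
    for finding in data.get("results", []):
        extra = finding.get("extra", {})
        lines.append(
            f"{finding['path']}:{finding['start']['line']} "
            f"[{extra.get('severity', 'UNKNOWN')}] {finding['check_id']}: "
            f"{extra.get('message', '')}"
        )
    return lines


if __name__ == "__main__":
    for line in summarize(SAMPLE_OUTPUT):
        print(line)
```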

To learn more about Semgrep’s output formats and how to customize them, you can refer to the official documentation on output formats.

Streamline your SAST scanning with Jit

There you have it - Semgrep is the future of static analysis, and when integrated with Jit, it's faster and more efficient than ever. With Jit, you can seamlessly incorporate Semgrep with Jit’s custom rules into your DevSecOps toolchain in the IDE and as part of the CI, increasing development velocity with continuous security. Start for free.