How to use Semgrep to Uncover Log4j Vulnerabilities

By Aviram Shmueli
Edited by Jit Team

Updated March 5, 2024.


The 9th of December 2021 brought one of the most widespread vulnerabilities of the modern technological era. If your business was affected by the Log4j vulnerability, you likely remember this day for all the wrong reasons. This vulnerability allows arbitrary code execution through the Apache Log4j library, which is used by some of the biggest names in tech. Think Apple, Microsoft, Cisco, IBM - the list goes on.

Check Point recorded over 800,000 exploitation attempts within 72 hours of the Log4j vulnerability becoming public. The risk was so severe that the vulnerability scored 10 out of 10 on the Common Vulnerability Scoring System (CVSS). That’s the first time a 10/10 is NOT what you want to hear.

But over a year has passed, and developers have been working overtime to develop solutions. Tools such as Semgrep play a vital role in today's operations, making it fast and efficient to identify bugs and dependency vulnerabilities while enforcing code standards. In this article, we'll talk about Log4j and how to use Semgrep to identify Log4j and similar vulnerabilities proactively.



What are Log4j vulnerabilities?

Apache Log4j is an open-source logging library developed by Apache that is widely used in Java-based applications. The library enables developers to seamlessly generate log files for debugging and monitoring in web applications.

The vulnerability named Log4Shell exists within this library in versions 2.0-beta9 through 2.15.0, meaning that any application running a vulnerable version of this library potentially allows attackers to execute arbitrary code remotely.
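Because exposure depends on which library version an application ships, a quick inventory of Log4j JARs can complement a source scan. The sketch below is a hypothetical helper, not part of Semgrep. It assumes the conventional log4j-core-<version>.jar file naming and will not see copies bundled inside fat JARs or WAR files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list log4j-core JARs under a directory, with the
# version taken from the file name (assumes log4j-core-<version>.jar naming).
list_log4j_core_versions() {
  local root="${1:-.}"
  find "$root" -type f -name 'log4j-core-*.jar' | while IFS= read -r jar; do
    local name version
    name="$(basename "$jar" .jar)"   # e.g. log4j-core-2.14.1
    version="${name#log4j-core-}"    # e.g. 2.14.1
    printf '%s\t%s\n' "$version" "$jar"
  done
}
```

Calling list_log4j_core_versions with the path to a deployment directory prints one tab-separated line per JAR, which you can compare against the affected range given above.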



The danger of this vulnerability lies in the ease and nature of its exploitation. Attackers can exploit it remotely without gaining prior access to the web application by sending specially crafted log messages that trigger the code execution. Plus, numerous proof-of-concepts became available right after the first public release of this vulnerability, making it easy for anyone to exploit it. 

Attackers can then perform various malicious actions, such as data theft, system hijacking, and denial-of-service attacks. Given its potential for extensive harm, the severity of this vulnerability cannot be overstated, so identifying and addressing it is essential to prevent disastrous outcomes.

Using Semgrep to uncover Log4j vulnerabilities

Semgrep is an open-source static analysis tool that has existed since 2020. It's a static application security testing (SAST) tool that scans source code for security vulnerabilities. Unlike some other SAST tools, Semgrep analyzes code locally, making it ideal for developers who want to perform quick security checks during the development process.

Semgrep supports popular languages such as Java, Python, and JavaScript and contains pre-built rules that users may select from. Users can also create customized rules for specific use cases to evaluate the code against, as shown in the custom rule configuration below. 

The example below demonstrates the interface for creating custom rules for various languages in the Semgrep platform.



With Jit, you can easily integrate Semgrep into your CI/CD pipeline, automate your SAST scans, and detect vulnerabilities early in development, avoiding the disastrous consequences that come with the exploitation of Log4j and other vulnerabilities.

After the vulnerability’s public release, Semgrep’s community quickly added a new rule to its registry to detect Log4j-related vulnerabilities. In this tutorial, we guide you through using Semgrep to detect Log4j vulnerabilities and explain how Semgrep and Semgrep rules work. 

Setting up Semgrep

1. Install Semgrep

You can install the Semgrep CLI locally on macOS or Linux, or run it through Docker. If installing Semgrep locally is not an option, you can integrate it directly into supported CI platforms such as GitHub Actions, GitLab CI/CD, Jenkins, Azure Pipelines, and more.

2. Set up Semgrep rules

Once the installation or integration is complete, you can choose the rules from the rule registry or create custom rules in YAML for customized detection capabilities.

3. Run Semgrep

Once you've set up the rules, you can run Semgrep on your code base to detect vulnerabilities.

4. Review and fix issues

Once Semgrep completes its scan, you can review the results and fix any detected issues. Semgrep provides detailed reports highlighting the lines of code with detected vulnerabilities.

There are multiple ways to run Semgrep, depending on your requirements. This tutorial covers the steps to install and use Semgrep locally.
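If you would rather not install anything on the host, the Docker route mentioned in step 1 can be sketched roughly as follows. This assumes Docker is installed and that the semgrep/semgrep image can be pulled. The function name is illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: run Semgrep from its container image instead of a local install.
# Assumes Docker is available and the semgrep/semgrep image can be pulled.
semgrep_in_docker() {
  local code_dir="${1:-$PWD}"
  # Mount the code at /src inside the container, scan that path, and
  # remove the container once the scan finishes.
  docker run --rm -v "${code_dir}:/src" semgrep/semgrep semgrep --config=auto /src
}
```

Running semgrep_in_docker with no argument scans the current directory. Pass a path to scan a different one.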

Deploying Semgrep locally 

Install Semgrep by using the following command on a Linux terminal.

python3 -m pip install semgrep

Check the version of Semgrep you installed, which also verifies if there were any issues with the installation. 

semgrep --version

You can use the following command to run a quick scan using Semgrep on the source code stored locally (replace the path with the path to where your source code is held). 

semgrep --config=auto /home/lahiru/code/

This command assesses the code specified and provides the results locally.
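For scripted use, it often helps to keep the findings in a file and let the exit status signal whether anything was found. The sketch below wraps the same scan with Semgrep's --json, --output, and --error options. The function name and the default report file name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: scan a directory, save findings as JSON, and fail when findings exist.
scan_to_json() {
  local target="${1:-.}"
  local report="${2:-semgrep-results.json}"
  # --json and --output write machine-readable results to a file;
  # --error makes Semgrep exit with a non-zero status when it reports findings.
  semgrep --config=auto --json --output "$report" --error "$target"
}
```

For example, scan_to_json /home/lahiru/code/ report.json writes the findings to report.json, and the return status can be used to stop a build step.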



How Semgrep rules work

Semgrep has a pre-built set of rules covering the most commonly found vulnerabilities. However, you can also create custom rules using YAML to customize the detection capabilities. 

The following rule uses a simple pattern that matches any call to System.out.println() with a single argument. The $MSG metavariable matches whatever expression is passed to the method, whether it is a string literal, a variable, or something more complex (Semgrep metavariable names must be uppercase). Every rule also needs a message, a severity, and a list of languages at the top level. This rule generates a warning for each System.out.println() call it finds in Java code.

rules:
  - id: system-out-println
    # message, severity, and languages are required on every rule
    message: Avoid using System.out.println() in production code
    severity: WARNING
    languages:
      - java
    # $MSG is a metavariable that matches any single argument
    pattern: System.out.println($MSG)

Once you save this rule as a YAML file (for example, rule.yaml), you can use the following command to run a scan with your custom rule:

semgrep --config rule.yaml
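To try the rule on something small, you can generate a throwaway Java file. The sketch below writes a hypothetical Demo.java with two System.out.println() calls, which are the calls the pattern is written to match, and one System.err.println() call, which uses a different receiver. It assumes a valid rule file is saved as rule.yaml.

```shell
#!/usr/bin/env bash
# Sketch: write a small Java file to try the System.out.println rule against.
write_println_demo() {
  local dir="${1:-.}"
  cat > "$dir/Demo.java" <<'EOF'
public class Demo {
    public static void main(String[] args) {
        int count = args.length;
        System.out.println("starting");  // println with a string literal
        System.out.println(count);       // println with a variable
        System.err.println("done");      // different receiver, not System.out
    }
}
EOF
}

# Usage (assumes the rule is saved as rule.yaml in the current directory):
#   write_println_demo .
#   semgrep --config rule.yaml Demo.java
```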

Uncovering Log4j vulnerabilities with Semgrep

You can take the Log4j detection rule from the Semgrep rule registry and either save it locally as a custom rule or run it directly from the registry with the Semgrep CLI.



rules:
 - id: log4j2_tainted_argument
   patterns:
     - pattern: $LOGGER.$METHOD(...);
     - pattern-inside: |
         import org.apache.log4j.$PKG;
         ...
         Logger $LOGGER;
         ...
     - pattern-not: $LOGGER.$METHOD("...");
   message: log4j $LOGGER.$METHOD tainted argument
   languages:
     - java
   severity: WARNING

The custom rule definition above lets developers look for a specific pattern that may expose the code to the Log4j vulnerability: logger calls whose argument is not a constant string. The breakdown of this custom rule definition is as follows:

  1. “rules”: This is the start of the YAML block defining the rules to be executed by Semgrep.
  2. “id”: This is the unique identifier for this particular rule.
  3. “patterns”: This is a list of patterns that Semgrep combines. Code is reported only when it satisfies all of them.
  4. “pattern: $LOGGER.$METHOD(...);”: This pattern matches any method call (e.g., info, debug, error) on any object. $LOGGER stands for the object and $METHOD for the method name. The entries that follow narrow the match to Log4j loggers.
  5. “pattern-inside”: This field specifies that the main pattern (the previous one) must appear inside code that matches this pattern. Let’s analyze it:
     1. “import org.apache.log4j.$PKG;”: This line looks for an import from the org.apache.log4j package, with $PKG standing for the imported name. Note that the Log4j 2 API itself lives under org.apache.logging.log4j, so code that imports from that package would need a matching import pattern.
     2. “Logger $LOGGER;”: This line looks for the declaration of a Logger object, bound to the same $LOGGER used in the main pattern.
  6. “pattern-not: $LOGGER.$METHOD("...");”: This field specifies the code pattern that you want to exclude from your match. In this case, a call that passes only a constant string (which is what "..." stands for) is not flagged by the rule.
  7. “message”: This message is displayed when the rule matches. It indicates that the logger object's method is used with potentially tainted arguments.
  8. “languages”: This rule applies only to Java code.
  9. “severity”: This sets the severity level of the rule to WARNING, which means that Semgrep will report it as a potential issue but not necessarily a critical security vulnerability.

If you create a custom YAML rule file locally, you can use the following command to run the assessment using the custom rule:

semgrep --config /home/lahiru/SemgrepTest/rule.yaml
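To see the kind of code the rule is aimed at, you can point it at a small sample file. The sketch below writes a hypothetical LoginHandler.java containing one logger call with a constant string and one with a non-constant argument. The class and variable names are invented for illustration, the file is not meant to be compiled, and whether Semgrep reports the second call depends on how pattern-inside matches your file layout, so treat the comments as the rule's intent rather than guaranteed output.

```shell
#!/usr/bin/env bash
# Sketch: write a Java file with the two kinds of logger call the rule distinguishes.
write_log4j_demo() {
  local dir="${1:-.}"
  cat > "$dir/LoginHandler.java" <<'EOF'
import org.apache.log4j.Logger;

public class LoginHandler {
    void handle(String userAgent) {
        Logger logger = Logger.getLogger(LoginHandler.class);
        logger.info("login attempt");        // constant string: excluded by pattern-not
        logger.info("agent: " + userAgent);  // non-constant argument: the case the rule targets
    }
}
EOF
}

# Usage (assumes the Log4j rule is saved as rule.yaml):
#   write_log4j_demo .
#   semgrep --config rule.yaml LoginHandler.java
```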

No more Log4j chaos 

Log4j was a wake-up call for organizations to adopt a security-first mindset. Thankfully, despite the chaos and concern that Log4j caused, the developer and security communities quickly developed adaptive solutions such as Semgrep’s.

By orchestrating Semgrep with Jit and using Jit’s custom rules, you can easily incorporate SAST scans into your CI/CD pipeline, automate them, and get real-time notifications of vulnerabilities, so you can fix issues before they become a problem.

Start with Jit for free.