License check by scancode-toolkit


In this post, we introduce a way to automate license scanning of Python code by using:

  • Scancode-toolkit: performs the license scan of the code.
  • Jenkins: runs the scan automatically whenever the code on the target branch changes.


Prepare environment

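Before we start, make sure the system satisfies the following requirements: Jenkins is up and running, and scancode-toolkit and jq are installed on the Jenkins server (the build script in this post uses both).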
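
A quick way to confirm that the command-line tools are in place is a small shell check. This is only a sketch: missing_tools is a hypothetical helper name, and the scancode path is simply the one used by the build script in this post, so adjust it to your own installation.

```shell
# Hypothetical helper: print every given command that cannot be found.
# Returns 0 when all commands are available, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Example call: jq by name, scancode by the path assumed from this post.
missing_tools jq /home/jenkins/toolkits/scancode-toolkit-3.1.1/scancode \
    || echo "Install the tools listed above before creating the job."
```

If the call prints nothing, both tools were found and the job can use them.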

Create Jenkins job

From the Jenkins home page:

  • Click "New Item"
  • Enter a name for the new job, for example: "Check-License-Scan-Code"
  • Choose "Freestyle project", then click "OK" to go to the next step


Configure Jenkins job

After clicking the "OK" button in the previous step, we see a page with the sections used to configure the Jenkins job. We will go through them one by one.

  • In the "General" step, add a description for this Jenkins job and configure log rotation so that old build logs do not pile up.


  • In the "Source Code Management" step, do the following:
    • Enter your repository URL (GitHub, GitLab, Bitbucket, Backlog, etc.)
    • Add credentials that can pull code from the remote repository and select them.
    • Provide the branch that you want to check (in this example, I scan the stage branch).

There are also many additional behaviors that you can add. For example, if your project has submodules that need to be scanned as well, add "Advanced sub-modules behaviors" and configure it. Our project has submodules, so I added this behavior and set it to update submodules recursively.


  • In the "Build Triggers" step, there are several ways to trigger this job; I choose "Poll SCM". It takes a schedule in a crontab-like syntax, as on a Linux system. For example, I poll once an hour (a schedule such as "H * * * *"); with Poll SCM, the job runs only if there is a new change in the code.


  • In the "Build Environment" step, you can choose build options such as "Delete workspace before build starts", "Add timestamps to the Console Output", etc. It's up to you. I suggest enabling "Add timestamps to the Console Output" so that you can see when each step ran when you check the output log later.


  • "Build" step: Because I installed scancode-toolkit on the Jenkins server, I choose "Execute shell" for this step. I use scancode to scan the whole source folder from the git repository and check whether any detected license is not allowed (in the script, a match is rejected when every one of its license keys contains the pattern stored in $gpl). If there is any such item, I use "exit 1" to mark the build as failed. Otherwise, the build succeeds. Then we can use "Post-build Actions" to define what happens for each build status.

Here is the content of the script that I run:

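#!/bin/bash
# Jenkins runs "Execute shell" with /bin/sh by default; [[ ]] below needs bash.
# result_dir, target_dir, parsed_file and gpl must be set before this point
# (their values are not shown in this post).
invalid_license_count=0
mkdir -p "$result_dir"
export LD_LIBRARY_PATH=/usr/lib64
/home/jenkins/toolkits/scancode-toolkit-3.1.1/scancode --license --copyright --summary-with-details --processes 2 --json-pp "$result_dir/result.json" --html "$result_dir/result.html" "$target_dir"
# One line per license match: the license keys of its rule, joined by commas.
jq '.files[].licenses[].matched_rule.licenses | join(",")' "$result_dir/result.json" > "$parsed_file"

while IFS= read -r line; do
    licenses=$(echo "$line" | sed "s/\"//g" | tr "," "\n")
    sub_license_valid_count=0
    for license in $licenses; do
        if [[ "$license" != *"$gpl"* ]]; then
            # Loop body reconstructed: a key that does not match $gpl counts as valid.
            sub_license_valid_count=$((sub_license_valid_count + 1))
        fi
    done
    if [[ $sub_license_valid_count == 0 ]]; then
        # Reconstructed: no valid key in this match, so it is counted as invalid.
        invalid_license_count=$((invalid_license_count + 1))
    fi
done <"$parsed_file"

if [ "$invalid_license_count" -gt 0 ]; then
    exit 1
else
    exit 0
fi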
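
To make the decision rule of the loop easier to follow, here is a small standalone sketch of it. It is my reading of the script, not a replacement for it: count_invalid_lines is a hypothetical helper name, the pattern "gpl" is an assumed value for $gpl, and the input lines are made-up license keys. Note that a plain substring match on "gpl" would also reject keys such as "lgpl-2.1".

```shell
# Hypothetical helper: count input lines in which every license key
# contains the disallowed pattern.
# $1: disallowed substring; stdin: one comma-separated key list per line.
count_invalid_lines() {
    pattern=$1
    invalid=0
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        valid=0
        old_ifs=$IFS
        IFS=,                        # split the line on commas
        for license in $line; do
            case $license in
                *"$pattern"*) ;;                 # disallowed key, not counted
                *) valid=$((valid + 1)) ;;       # allowed alternative found
            esac
        done
        IFS=$old_ifs
        if [ "$valid" -eq 0 ]; then
            echo "rejected: $line" >&2           # visible in the build log
            invalid=$((invalid + 1))
        fi
    done
    echo "$invalid"
}

# "mit" and "gpl-2.0,mit" each have an allowed key; "gpl-3.0" does not.
printf 'mit\ngpl-2.0,mit\ngpl-3.0\n' | count_invalid_lines gpl   # prints 1
```

The rejected lines go to standard error, so in a Jenkins build they would show up in "Console Output" next to the final count.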


"Post-build Actions" let us define what to do when the build fails. There are many actions to choose from. I mostly use "E-mail Notification", so that I know when this job fails and can check it and fix it.

All done! Now you can click "Save" and then click "Build Now" to run this job. If it fails, open the failed build and check "Console Output" to find the reason and fix it. Otherwise, you can sleep well because no disallowed license was found in your code.

Here is an example of the output that you can see in "Console Output" for a build (while the job is running, the output is loaded in real time).


Thanks for reading to the end of this post!