Python and Jenkins

..
NOTE from the future:
Whoops, this was written and then forgotten in the depths of my drafts. I just reviewed it and thought, hey, it is acceptable to post 😀
..

The third (and probably last) post about this lovely Jenkins guy. It seems that people are right: lately I'm the Jenkins man.

Most of what I do in Jenkins can be done with the Groovy Scripting language itself, usually via the Scriptler plugin to keep things organized.

I am a command line guy, and sometimes I just want a plain text file with the results of something, instead of firing up Jenkins, going to a build, and checking the artifact or output.

In this post I'll show how to combine a basic Groovy script with a more in-depth analysis done by a Python script:

Objective

I have to manage 400+ jobs every day. Some days are all sunshine and ice cream and no job fails. Other days the planets align and I have a stupid number of failed jobs. Going through 200+ failing jobs one by one to check what is wrong with them is too slow, all the more so because most of those errors will be the same, and most of them will be known errors.

What I want to get every morning is a small report with:

  • Top errors
  • Failed jobs with the found errors
  • Jobs that need manual check

Plan

Rome wasn’t built in a day, and similarly, the process that I follow took some time to form. I will have two main scripts:

  • A Groovy script that classifies all the jobs into: failed, succeeded, not run.
  • A Python script to check the errors.

The Groovy script runs in Jenkins itself. The Python script takes the result of that Groovy script and checks all the failed jobs.

Groovy!

For the groovy script I use the Scriptler plugin. It is an easy way to manage your Jenkins scripts.

My Groovy script has to loop over all the jobs and start filtering from there. The build URLs are stored in a different list according to the build status. To loop over all the jobs:

import hudson.model.*

//List storage
def failed  = []
def succeed = []
def errors  = []
def notrun  = []

for(item in Hudson.instance.items) {}

We’ll discard the disabled jobs by simply continuing the loop:

  //Inside the loop
  if (item.disabled){
    //disabled << item.getAbsoluteUrl()
    continue
  }

What I am interested in are the nightly jobs. In my company we only trigger jobs in three different ways: manually, on a git change, or by a time trigger. The time triggers are configured to run around midnight, and I am interested in those builds because they are configured to run everything. So I will loop through each job's builds until I find the first one that was time triggered:

 for (build in item.builds){
    try{
      def cause = build.getEnvironment()['BUILD_CAUSE']
      if (cause == "TIMERTRIGGER"){
          //we'll do stuff here
      }
    }
    catch(Exception e){
        //something bad happened that we have to handle
    }
 }

Catching every Exception is not very good practice; this could clearly be narrowed to the type of error that I want to catch.

I also have to be careful. Some jobs are not disabled but don’t run nightly anymore (this is dirty, but it can happen). So I will actually check when this build happened, and if it’s too old, I’ll classify it as not run.

  //if it's time trigger
  //Get the time difference
  def aday  = new TimeDuration( 1, 0, 0, 0, 0 )
  def cdate = new Date()
  TimeDuration timeDiff = TimeCategory.minus( cdate, build.getTimestamp().time )
  if (timeDiff > aday){
      //save it as not run
      notrun << build.getAbsoluteUrl()
  }

At this point I have already filtered everything that I wanted to filter. So, if the job is enabled, the cause is TIMERTRIGGER, and the build is not older than one day, I can classify the result:

  //Classify the build
  else{
      if (build.result == hudson.model.Result.SUCCESS){
        succeed << build.getAbsoluteUrl()
      }
      else{
        failed << build.getAbsoluteUrl()
      }
  }

When the loop is over, what I will do is simply print the output lists line
by line. So the result will be something like:

**********
OK(350)
**********
jenkins-build-ok-url-1
jenkins-build-ok-url-2
..
jenkins-build-ok-url-350

**********
FAIL(50)
**********
jenkins-build-fail-url-1
jenkins-build-fail-url-2
..
jenkins-build-fail-url-50

**********
NOTRUN(10)
**********
jenkins-build-nrun-url-1
jenkins-build-nrun-url-2
..
jenkins-build-nrun-url-10

**********
ERROR(5)
**********
jenkins-build-err-url-1
jenkins-build-err-url-2
..
jenkins-build-err-url-5

This is a clear and clean output. It is more or less parse-friendly and easy to read. It could be more parseable if there was simply one job per line with its status, but then it would be harder to read. With this script I can just have a quick look at what failed and what succeeded.

The full script follows, with basic comments and Scriptler metadata:

/*** BEGIN META {
  "name" : "List status of Nightly jobs",
  "comment" : "For all our nightly runs (the only ones triggered by a TIMERTRIGGER) classify according to their last result",
  "parameters" : [],
  "core": "1.549",
  "authors" : [
    { name : "Jordi Castells" }
  ]
} END META**/

/*
*
* List Nightly jobs in Jenkins separated by status as follows:
* - OK       Enabled jobs with last nightly build successful
* - FAIL     Enabled jobs with last nightly build failed
* - NOTRUN   Enabled jobs whose last nightly build is older than one day
* - ERROR    Jobs that generated an error in the script
*
* @author Jordi Castells
*
*/
import jenkins.model.*
import hudson.model.*
import hudson.triggers.*
import groovy.time.* 

def failed  = []
def succeed = []
def errors  = []
def notrun  = []
  
ji = jenkins.model.Jenkins.instance
def aday  = new TimeDuration( 1, 0, 0, 0, 0 )
def cdate = new Date()

for(item in Hudson.instance.items) {

  //DO not care about disabled items
  if (item.disabled){
    //disabled << item.getAbsoluteUrl()
    continue
  }
 
  for (build in item.builds){
    try{
      def cause = build.getEnvironment()['BUILD_CAUSE']
      if (cause == "TIMERTRIGGER"){
        TimeDuration timeDiff = TimeCategory.minus( cdate, build.getTimestamp().time )
        if (timeDiff > aday){
            notrun << build.getAbsoluteUrl()
        }
        else{
            if (build.result == hudson.model.Result.SUCCESS){
              succeed << build.getAbsoluteUrl()
            }
            else{
              failed << build.getAbsoluteUrl()
            }
        }

        break;
      }
    }
    catch(Exception e){
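      //Store the URL so the ERROR section prints one URL per line, like the other lists
      errors << item.getAbsoluteUrl()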
      break;
    }
  }
}


def print_list(title, list){
  println ""
  println "*"*10
  println title + "(" + list.size() + ")"
  println "*"*10
  println ""
  for (item in list) println item
}

print_list("OK", succeed)
print_list("FAIL", failed)
print_list("NOTRUN", notrun)
print_list("ERROR", errors)


From this it is quite easy to get the failed list, and even the number of failed/succeeded builds in case I want to create a nice graph.
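
As a minimal sketch of that last point, the counts can be read straight from the section headers that print_list writes. The function name count_sections, the single combined regular expression and the sample text (with a made-up jenkins.example.net host) are assumptions for illustration, not part of the original scripts.

```python
import re

# Matches the section headers printed by print_list, e.g. "FAIL(50)".
HEADER_RE = re.compile(r"^(OK|FAIL|NOTRUN|ERROR)\((\d+)\)$")

def count_sections(console_text):
    """Return a dict such as {"OK": 350, "FAIL": 50} from the script output."""
    counts = {}
    for line in console_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

# Hypothetical sample in the same shape as the output shown earlier.
sample = """
**********
OK(2)
**********

https://jenkins.example.net/job/a/10/
https://jenkins.example.net/job/b/7/

**********
FAIL(1)
**********

https://jenkins.example.net/job/c/3/
"""

counts = count_sections(sample)
```

The resulting dictionary could be appended to a CSV file every morning and plotted later with whatever tool you like.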

Jenkins Job

This groovy script can be added as a Jenkins job that runs every morning after the nightly runs have finished. In my case I set it up to run at 8am every day, just before I get to the office.

Python time

Now we have Jenkins ready, with a list of failing jobs. For the next task, I create a local script that parses the output of all these failing jobs looking for errors.

For this I use Python 2.7 and the jenkinsapi library.

>> Parse the Jenkins job output

To start, we’ll create a jenkinsapi Jenkins instance and call a nice function that returns a list of builds to work with.

from jenkinsapi.jenkins import Jenkins

#J is the Jenkins instance
J = Jenkins("https://jenkins.whatever.net/")
#Get a list with Jenkins build instances
failed_builds = get_nightly_jobs_builds("failed")

What this get_nightly_jobs_builds does is:

  • Connect to Jenkins
  • Retrieve last output of the nightlyJobs groovy job
  • Parse that output, separating failed, succeeded, etc.
  • Return a list with build URLs

*Connect to Jenkins*
This is fairly easy with jenkinsapi:

job = J[args.jenkins_job] #the name of the Jenkins job that runs the groovy script

*Retrieve last output*
Again very easy: jenkinsapi gives us the console output of the last build.

lb = job.get_last_build().get_console()

*Parse the output*
We know that the output is ordered and organized. When I find a line like FAIL(number), I know that the URLs that follow point to the failed builds.

My approach here is a simple state machine. We change the state when we find any of the section header lines, and add the URLs to a different list depending on the state. Much the same as what we did in the Groovy script.

First, I define some regular expressions, each matching a name (FAIL, OK, ERROR, NOTRUN) followed by a number in parentheses ([0-9]*). For more information on regular expressions, check the Python documentation.

# Raw strings keep the backslashes intact for the regex engine
ko_re   = re.compile(r"FAIL\(([0-9]*)\)")
ok_re   = re.compile(r"OK\(([0-9]*)\)")
err_re  = re.compile(r"ERROR\(([0-9]*)\)")
nrun_re = re.compile(r"NOTRUN\(([0-9]*)\)")

We’ll use the regular expressions to change the state of the function:

    STATE = "STATELESS"

    for l in lb.split("\n"):
        ko_match   = ko_re.match(l.strip())
        ok_match   = ok_re.match(l.strip())
        err_match  = err_re.match(l.strip())
        nrun_match = nrun_re.match(l.strip())

        if ko_match:
            STATE = "KO"
        if ok_match:
            STATE = "OK"
        if err_match:
            STATE = "ERR"
        if nrun_match:
            STATE = "NRUN"

Notice that I strip the line. Even if it’s not necessary in this case it’s a good practice to sanitize our input.

When the state is set we know what to do when finding a url. Add it to the list of failed/succeeded/whatever urls:

    STATE = "STATELESS"

    for l in lb.split("\n"):
        if l.startswith("https"):
            if STATE == "OK":
                succeed_jobs.append(l)
            if STATE == "KO":
                failed_jobs.append(l)
            if STATE == "NRUN":
                notrun_jobs.append(l)

        #state setting here
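
To see the two fragments above working together, here is a self-contained sketch of the same state machine. It is a variation under stated assumptions rather than the script I actually run: it uses one combined regular expression and a dictionary of lists instead of four patterns and separate lists, it accepts any line starting with "http", and the function name split_by_status and the jenkins.example.net URLs are made up for the example.

```python
import re

# One pattern for all section headers; group 1 is the section name.
HEADER_RE = re.compile(r"^(OK|FAIL|NOTRUN|ERROR)\(([0-9]*)\)$")

def split_by_status(console_text):
    """Group build URLs by the section header that precedes them."""
    urls = {"OK": [], "FAIL": [], "NOTRUN": [], "ERROR": []}
    state = None  # no header seen yet
    for raw in console_text.split("\n"):
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            state = header.group(1)   # a header line switches the state
        elif state is not None and line.startswith("http"):
            urls[state].append(line)  # a URL belongs to the current state
    return urls

# Hypothetical console output in the format printed by the Groovy script.
sample = "\n".join([
    "**********", "OK(1)", "**********", "",
    "https://jenkins.example.net/job/alpha/12/",
    "", "**********", "FAIL(2)", "**********", "",
    "https://jenkins.example.net/job/beta/4/",
    "https://jenkins.example.net/job/gamma/9/",
])

result = split_by_status(sample)
```

Lines made only of asterisks match neither the header pattern nor the URL check, so they are skipped without any special handling.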

There is not much left to do apart from the other function, which creates Jenkins build instances from the URLs. For that we have to parse each URL to obtain the job name and the build number. With these two pieces of information it’s quite easy to get the build instance using jenkinsapi:

    try:
        job = J[jobname]
    except jenkinsapi.custom_exceptions.UnknownJob as UJ :
        print "[SCRIPT ERROR] UNKNOWN JOB %s" % jobname
        continue
    
    try:
        build = job.get_build(buildnum)
    except Exception:
        print "[SCRIPT ERROR] UNKNOWN BUILD %s:%s" % (jobname, buildnum)
        continue
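
The URL parsing step mentioned above is not shown in the snippet, so here is a minimal sketch of one way to do it. It assumes build URLs shaped like https://host/job/<name>/<number>/ with no folders in between; the helper name parse_build_url is hypothetical.

```python
import re

# Assumes URLs shaped like https://host/job/<name>/<number>/ (no folders).
BUILD_URL_RE = re.compile(r"/job/([^/]+)/([0-9]+)/?$")

def parse_build_url(url):
    """Return (jobname, buildnum) or None when the URL has another shape."""
    match = BUILD_URL_RE.search(url.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))

parsed = parse_build_url("https://jenkins.whatever.net/job/my-job/42/")
```

Returning None for an unexpected URL lets the caller log it and move on, in the same spirit as the two try blocks.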

The full code for what is explained in this section follows. You will notice that it is not written exactly as in the previous examples, since I add caches to avoid reconnecting or regenerating the build instances. For a single run this won’t be a problem, but if I want to get the list multiple times, connecting to Jenkins each time makes this script very slow. Note also that this is not the cleanest code ever 😀 it just did its job.

# Global variables for cache of this function
__get_nightly_jobs_builds_cache = {}
def get_nightly_jobs_builds(switch = "all"):
    """ Returns a list with job builds objects. Can be filtered
        with the switch parameter: failed, succeed, notrun or all

        It uses the jenkins-X17-NightlyJobsSummary last build console
        to retrieve the list of failed jobs
    """
    global __get_nightly_jobs_builds_cache

    # RETRIEVE FROM CACHE
    if (switch in __get_nightly_jobs_builds_cache.keys()):
        return __get_nightly_jobs_builds_cache[switch]


    # FILL the cache
    for cache_switch in ("succeed", "failed", "notrun"):
        job_urls  = get_nightly_jobs_builds_urls(cache_switch)
        buildlist = []

        for job in job_urls:
            jobname = job.split("/")[4]
            buildnum = int(job.split("/")[5])

            try:
                job = J[jobname]
            except jenkinsapi.custom_exceptions.UnknownJob as UJ :
                print "[SCRIPT ERROR] UNKNOWN JOB %s" % jobname
                continue

            try:
                build = job.get_build(buildnum)
            except Exception:
                print "[SCRIPT ERROR] UNKNOWN BUILD %s:%s" % (jobname, buildnum)
                # Skip this URL; otherwise a stale or undefined build would be appended
                continue
            buildlist.append(build)


        # SAVE IN CACHE
        __get_nightly_jobs_builds_cache[cache_switch] = buildlist

    succeed = __get_nightly_jobs_builds_cache["succeed"]
    failed  = __get_nightly_jobs_builds_cache["failed"]
    notrun  = __get_nightly_jobs_builds_cache["notrun"]
    __get_nightly_jobs_builds_cache["all"] = failed + succeed + notrun

    # RETURN FROM CACHE
    return __get_nightly_jobs_builds_cache[switch]


# Global variables for cache of this function
__get_nightly_jobs_builds_urls_cache = {}
def get_nightly_jobs_builds_urls(switch = "all"):
    """ Returns a list with job builds url. By default all the jobs, but
        can be filtered with switch: failed, succeed, notrun, or all

        It uses the jenkins-X17-NightlyJobsSummary last build console
        to retrieve the list of failed jobs
    """

    global __get_nightly_jobs_builds_urls_cache

    # RETRIEVE FROM CACHE
    if (switch in __get_nightly_jobs_builds_urls_cache.keys()):
        return __get_nightly_jobs_builds_urls_cache[switch]

    job = J[args.jenkins_job]
    lb = job.get_last_build().get_console()

    # Raw strings keep the backslashes intact for the regex engine
    ko_re = re.compile(r"FAIL\(([0-9]*)\)")
    ok_re = re.compile(r"OK\(([0-9]*)\)")
    err_re = re.compile(r"ERROR\(([0-9]*)\)")
    nrun_re = re.compile(r"NOTRUN\(([0-9]*)\)")


    # READ THE LAST NIGHTLY failed_jobs_urls
    STATE = "STATELESS"

    failed_jobs  = []
    succeed_jobs = []
    notrun_jobs  = []
    for l in lb.split("\n"):
        if l.startswith("https"):
            if STATE == "OK":
                succeed_jobs.append(l)
            if STATE == "KO":
                failed_jobs.append(l)
            if STATE == "NRUN":
                notrun_jobs.append(l)


        ko_match  = ko_re.match(l.strip())
        ok_match  = ok_re.match(l.strip())
        err_match = err_re.match(l.strip())
        nrun_match = nrun_re.match(l.strip())

        if ko_match:
            STATE = "KO"
        if ok_match:
            STATE = "OK"
        if err_match:
            STATE = "ERR"
        if nrun_match:
            STATE = "NRUN"

    # elif chain: with separate ifs the final else overwrote "failed" and "succeed"
    if switch == "failed":
        urllist = failed_jobs
    elif switch == "succeed":
        urllist = succeed_jobs
    elif switch == "notrun":
        urllist = notrun_jobs
    else:
        urllist = succeed_jobs + failed_jobs + notrun_jobs


    # SAVE IN CACHE
    __get_nightly_jobs_builds_urls_cache["failed"]  = failed_jobs
    __get_nightly_jobs_builds_urls_cache["succeed"] = succeed_jobs
    __get_nightly_jobs_builds_urls_cache["notrun"]  = notrun_jobs
    __get_nightly_jobs_builds_urls_cache["all"]     = failed_jobs + succeed_jobs + notrun_jobs

    return urllist
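
The caching idea in the listing can be isolated from the Jenkins calls. The sketch below is a closure-based variant, not the module-level dictionaries used above; make_cached and slow_fetch are hypothetical names, and slow_fetch only stands in for the slow call to Jenkins.

```python
def make_cached(fetch):
    """Wrap fetch(switch) so each switch value is fetched only once."""
    cache = {}

    def cached(switch="all"):
        if switch not in cache:
            cache[switch] = fetch(switch)
        return cache[switch]

    return cached

# Stand-in for the slow Jenkins call; records how often it really runs.
calls = []

def slow_fetch(switch):
    calls.append(switch)
    return ["url-for-" + switch]

get_urls = make_cached(slow_fetch)
first = get_urls("failed")
second = get_urls("failed")   # served from the cache, slow_fetch is not called again
```

Keeping the cache inside the closure avoids the global statements, at the cost of not being able to clear it from outside.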

>> Check for errors

Now we have a list of failed builds. We have to parse the output of each build searching for known errors, so I can focus on the unknown problems.

For that, we need to define a set of known errors. This set will grow over time, and it is defined in an external file, errcodes.txt. This file defines a code for each error and a basic regular expression to match it.

For example, to check for common Sonar errors, I would use something like:

#
#
# This file contains the definitions of encountered Jenkins jobs errors and a
# regex to match
#
#
SONAR_LIBDOWNLOAD_FAIL - "Can not execute SonarQube analysis: Fail to download libraries from server"
SONAR_MISSING_DEP - "Can not execute SonarQube analysis: Plugin .*? or one of its dependencies could not be resolved"
#More examples

The general idea here is to use these regular expressions to match lines in the output. Will it take some time? Yes. Does it work? Yes :-D. It is quite a brute-force approach, but I’ll let Jenkins handle it while I’m still brushing my teeth at home.

The errors file has to be read into memory to have all the regular expressions ready. What I’ll do is store everything under a python dictionary with: ID -> regex.

def read_errors_regex_dictionary(filename):
    errdict = {}
    with open(filename) as errfile:
        for line in errfile:
            # Skip comments and blank lines (a blank line would break the split below)
            if line.startswith("#") or not line.strip():
                continue
            name, data = line.split("-",1)
            name = name.strip()
            regex_str = data[data.find("\"")+1:data.rfind("\"")]
            regex = re.compile(regex_str)
            errdict[name] = regex

    return errdict
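
To make the file format concrete, here is a small self-contained sketch that parses definition lines from a list instead of a file and tries them against one console line. The helper parse_error_line is a hypothetical per-line version of the function above, and the console line is invented for illustration.

```python
import re

def parse_error_line(line):
    """Parse 'NAME - "regex"' into (name, compiled regex); None for comments/blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    name, data = line.split("-", 1)
    # The regex is whatever sits between the first and the last double quote.
    pattern = data[data.find('"') + 1:data.rfind('"')]
    return name.strip(), re.compile(pattern)

lines = [
    "# comment line",
    'SONAR_MISSING_DEP - "Can not execute SonarQube analysis: Plugin .*? or one of its dependencies could not be resolved"',
]
errors_regex = dict(p for p in map(parse_error_line, lines) if p)

# Hypothetical console line, invented for this example.
console_line = ("ERROR: Can not execute SonarQube analysis: Plugin foo "
                "or one of its dependencies could not be resolved")
matched = [name for name, rx in errors_regex.items() if rx.search(console_line)]
```

Because the name is split at the first hyphen, error codes in this format should stick to underscores, as the examples in errcodes.txt do.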

With this ready, it’s just a matter of looping over each line of each build looking for the defined errors and organizing the results as you see fit. My script does various things:

  • Keep track of the most frequent errors
  • Keep track of which errors appear where
  • Keep track of failed jobs with no matching error

    # INIT COUNTERS
    # Dictionary with the count of how many jobs have this error
    ERROR_COUNTER = {}
    for errname in errors_regex.keys():
        ERROR_COUNTER[errname] = 0

    # Dictionary with a list of the job errors
    JOB_ERRORS    = {}
    for build in failed_builds:
        jobname = build.job.name
        JOB_ERRORS[jobname] = []

    # CHECK THE JOB CONSOLE FOR COMMON ERRORS
    for build in failed_builds:
        jobname  = build.job.name
        bconsole = build.get_console()
    
        for line in bconsole.split("\n"):
            for errid,regex in errors_regex.items():
                if regex.search(line):
                    # If the regex matches, add this as an error for that job
                    # And increase the total count
                    if errid not in JOB_ERRORS[jobname]:
                        JOB_ERRORS[jobname].append(errid)
                        ERROR_COUNTER[errid] += 1

This is quite easy and you can add different behaviours now that we have the error matchers and the build logs for all the failed jobs.
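
As one example of such a behaviour, the three items of the morning report can be derived from the two dictionaries. This is a minimal sketch with made-up data: build_report is a hypothetical helper, and the job names are invented.

```python
def build_report(error_counter, job_errors, top=3):
    """Return (top_errors, jobs_with_errors, jobs_to_check)."""
    # Most frequent errors first; ties broken by name for a stable order.
    ranked = sorted(error_counter.items(), key=lambda kv: (-kv[1], kv[0]))
    top_errors = [(name, n) for name, n in ranked if n > 0][:top]
    # Jobs where at least one known error matched.
    jobs_with_errors = {j: e for j, e in job_errors.items() if e}
    # Jobs that failed without any known error need a manual check.
    jobs_to_check = sorted(j for j, e in job_errors.items() if not e)
    return top_errors, jobs_with_errors, jobs_to_check

# Hypothetical data in the shape of ERROR_COUNTER and JOB_ERRORS.
ERROR_COUNTER = {"SONAR_LIBDOWNLOAD_FAIL": 2, "SONAR_MISSING_DEP": 1}
JOB_ERRORS = {
    "job-a": ["SONAR_LIBDOWNLOAD_FAIL"],
    "job-b": ["SONAR_LIBDOWNLOAD_FAIL", "SONAR_MISSING_DEP"],
    "job-c": [],
}

top_errors, with_errors, to_check = build_report(ERROR_COUNTER, JOB_ERRORS)
```

Here job-c ends up in the manual check list because none of the known regular expressions matched its console output.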

What next?

We have the two scripts ready and willing to run. There is a little bit of complexity but not a lot.

What you can do next is organize the Python script however suits you best. In my case I wanted a directory with failure reports for each day, each report in its own file. For that I’ve written a small bash script that runs my Python script every morning.
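
The same layout could also be produced from Python directly. The sketch below is not the bash script mentioned above; it only illustrates one possible layout, and the <base_dir>/<YYYY-MM-DD>/<name>.txt structure and the function names are assumptions.

```python
import datetime
import os

def report_path(base_dir, name, day=None):
    """Build <base_dir>/<YYYY-MM-DD>/<name>.txt for the given day."""
    day = day or datetime.date.today()
    return os.path.join(base_dir, day.isoformat(), name + ".txt")

def write_report(base_dir, name, lines, day=None):
    """Write one report file, creating the day directory when needed."""
    path = report_path(base_dir, name, day)
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path
```

Calling write_report("reports", "top_errors", [...]) once per report type would give one directory per day with one file per report.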

You can embed this script into Jenkins itself, or even write it all together in a single Groovy script for Scriptler.

Now I spend 5 minutes every morning finding out what is wrong and which projects need attention, instead of 5 hours chasing Jenkins builds. That makes my life easier.

Conclusion

It seems like a lot of text, but it gives an example of how Jenkins can easily be managed from very different environments, in this case Groovy and Python. I still think that Jenkins gives you a lot of power with Groovy, but I prefer the simpler external APIs, which are quite clear and mostly used for querying results or firing builds.

Filed under code, tools

4 responses to “Python and Jenkins”

  1. raju

    Hi,

    this is really nice post

    and i’m looking to find the cause of the jenkins job failures from console using groovy script.

  2. Great information, and I am trying to do something very similar, without the groovy portion. Unfortunately, I am new to both python and Jenkins. It looks like the puzzle is missing some pieces. Would appreciate some hints offline or via LinkedIN http://www.linkedin.com/in/michaelresnickitpro. Thanks !
