"A little progress everyday adds up to big results"

Sunday 11 November 2018

Fixing a Java bug at runtime without restarting the JVM process

    1. Did you miss something while developing a Java application, and did that result in a bug?
    2. Is that buggy application already running in production?
    3. Are you concerned about the impact of the scheduled (or unscheduled) downtime needed to deploy your fix?
    4. Do you want to fix the bug in production without redeploying or restarting the application?
    If your answer to all of the above questions is "yes", then this post is for you!
    Let us start with a sample (buggy) Java application:

    package com.jrf.sampleapp;
    
    import java.io.InputStreamReader;
    import java.lang.String;
    import java.util.Scanner;
    
    public class SimpleCalculator {
     private static String add(long addend, long augend) {
      return "Sum is " + (addend+augend);
     }
    
     private static String subtract(long minuend, long subtrahend) {
      return "Difference is " + (minuend-subtrahend);
     }
    
     private static String multiply(long multiplier, long multiplicand) {
      return "Product is " + (multiplier*multiplicand);
     }
    
     private static String divide(long dividend, long divisor) {
      return "Ratio is " + (dividend/divisor);
     }
    
     public static void main(String[] args) {
      long input1, input2;
      String command;
    
      Scanner scanner = new Scanner(new InputStreamReader(System.in));
      command = scanner.next();
    
      while(!command.equalsIgnoreCase("exit")) {
       input1 = scanner.nextLong();
       input2 = scanner.nextLong();
    
       switch(command.toLowerCase()) {
        case "add":
         System.out.println(add(input1, input2));
         break;
    
        case "subtract":
         System.out.println(subtract(input1, input2));
         break;
    
        case "multiply":
         System.out.println(multiply(input1, input2));
         break;
    
        case "divide":
         System.out.println(divide(input1, input2));
         break;
    
        default:
         System.out.println("Use \"add x y | subtract x y | multiply x y | divide x y | exit\"");
         break;
       }
    
       command = scanner.next();
      }
     }
    }
    
    
    This is a simple console Java application to perform the basic binary mathematical operations repeatedly until the "exit" command is issued. When we run this, we get the following output:

    add 5 6
    Sum is 11
    subtract 3 4
    Difference is -1
    multiply 8 7
    Product is 56
    divide 12 3
    Ratio is 4
    divide 100 0
    Exception in thread "main" java.lang.ArithmeticException: / by zero
     at com.jrf.sampleapp.SimpleCalculator.divide(SimpleCalculator.java:21)
     at com.jrf.sampleapp.SimpleCalculator.main(SimpleCalculator.java:49)
    
    
    Oops! The zero-divisor check is missing in the "divide" method, so the application terminates unexpectedly with an ArithmeticException. Now, assume that you had realized your mistake after deploying the application, but before anyone entered a zero divisor. Could you have avoided the mishap without bringing your application down? Perhaps you could have! Here is how.

    Basically, we want to patch the Java class dynamically at runtime, with zero downtime. We all know that a JVM executes the bytecode produced by compilation, so what we are really trying to do is modify that bytecode at runtime. Here comes Java's superhero, Instrumentation, to the rescue!

    Instrumentation is a way to inject/modify bytecode at runtime. Instrumentation can be done either at the time of loading a class or after it is loaded. Here we are going to use the latter since our application's main class "SimpleCalculator" has already been loaded. Since a lot of resources on Java Instrumentation are available online, let us directly jump into writing an agent to fix the bug.

    package com.jrf.agent;
    
    import java.io.ByteArrayInputStream;
    import java.lang.instrument.ClassFileTransformer;
    import java.security.ProtectionDomain;
    
    import javassist.ClassPool;
    import javassist.CtClass;
    import javassist.CtMethod;
    
    public class JavaRuntimeFixApplier implements ClassFileTransformer {
     @Override
     public byte[] transform(ClassLoader loader, String className,
       Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
       byte[] classfileBuffer) {
      byte[] byteCode = classfileBuffer;
    
      try {
       byteCode = retransform(className, byteCode);
      } catch (Throwable e) {
       e.printStackTrace();
      }
    
      return byteCode;
     }
    
     private byte[] retransform(String className, byte[] original) throws Exception {
      byte[] modifiedByteCode = original;
    
      switch(className) {
       case "com/jrf/sampleapp/SimpleCalculator":
        ClassPool classPool = ClassPool.getDefault();
        CtClass ctClass = classPool.makeClass(new ByteArrayInputStream(modifiedByteCode));
        CtMethod method = ctClass.getDeclaredMethod("divide", 
         new CtClass[] {CtClass.longType, CtClass.longType});
     
        method.insertBefore("if ($2 == 0) {" +
           "return \"Cannot divide by zero\";" +
              "}");
     
        modifiedByteCode = ctClass.toBytecode();
        ctClass.detach();
        break;
      }
    
      return modifiedByteCode;
     }
    }
    
    
    This is the main transformer class, which performs the bytecode modification. To compile the new source code

    if ($2 == 0) {
     return "Cannot divide by zero";
    }
    
    
    and generate the corresponding bytecode, we have used the Javassist library. The transformer code is fairly self-explanatory. If the internal class name (the fully qualified name with "/" as the separator) matches "com/jrf/sampleapp/SimpleCalculator", we insert the above code at the beginning of the "divide" method that takes two long arguments. The "$2" in the above code represents the second argument ("divisor") of the "divide" method. The main agent class, which applies this transformer, looks like this:

    package com.jrf.agent;
    
    import java.lang.instrument.Instrumentation;
    import java.lang.instrument.UnmodifiableClassException;
    
    public class JavaRuntimeFixerAgent {
     public static void agentmain(String args, Instrumentation instrumentation) {
      try {
       addTransformers(args, instrumentation);
      } catch (Exception e) {
       e.printStackTrace();
      }
     }
    
     private static void addTransformers(String args, Instrumentation instrumentation) 
      throws ClassNotFoundException, UnmodifiableClassException {
    
      // Check first: addTransformer(..., true) throws UnsupportedOperationException
      // when retransformation is not available to this agent
      if (!instrumentation.isRetransformClassesSupported()) {
       System.out.println("Class retransformation is not supported in this version of JVM");
       return;
      }
    
      instrumentation.addTransformer(new JavaRuntimeFixApplier(), true);
    
      Class<?> clazz = Class.forName("com.jrf.sampleapp.SimpleCalculator");
      instrumentation.retransformClasses(clazz);
     }
    }
    
    
    The important thing to note here is that some JVMs do not support retransforming a class once it has been loaded. In such cases, this approach will not work.
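
    The check for this can be made a little stricter than a single JVM-wide flag. The sketch below is not part of the original project (the class name RetransformSupportCheck is made up for illustration); it combines two methods of the standard java.lang.instrument.Instrumentation interface to ask both whether retransformation is available at all and whether the particular target class can be modified.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical helper, not part of the original agent.
final class RetransformSupportCheck {
    // Returns true only if the JVM allows retransformation for this agent
    // and the given class itself is modifiable.
    static boolean canRetransform(Instrumentation inst, Class<?> target) {
        // False if the JVM lacks the capability or the agent's manifest
        // does not set Can-Retransform-Classes to true.
        if (!inst.isRetransformClassesSupported()) {
            return false;
        }
        // Some classes (primitive and array classes, for example) can never be modified.
        return inst.isModifiableClass(target);
    }
}
```

    An agent could call such a helper before registering its transformer and print a clear message instead of failing later inside retransformClasses.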

    The transformer and the agent class need to be packaged into a JAR before they can be loaded into the target application's JVM. The JAR's manifest file looks like this:

    Agent-Class: com.jrf.agent.JavaRuntimeFixerAgent
    Can-Redefine-Classes: true
    Can-Retransform-Classes: true
    Class-Path: javassist.jar
    
    
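    Retransforming already loaded classes requires the "Can-Retransform-Classes" property to be true; "Can-Redefine-Classes" is set as well, so that the agent is also allowed to redefine classes. The first line names the agent class, whose "agentmain" method is the entry point when the agent is loaded into a running JVM. The "Class-Path" entry makes the Javassist library available to the agent.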
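
    If the JAR is assembled from code rather than from a hand-written manifest file, the same attributes could be produced with the standard java.util.jar API. The following is only a minimal sketch under that assumption; the class name AgentManifestSketch is hypothetical, and the original project may well create its manifest differently.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper that builds the agent manifest shown above in memory.
final class AgentManifestSketch {
    static Manifest build() {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Set Manifest-Version explicitly; the JAR format expects it in every manifest.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.putValue("Agent-Class", "com.jrf.agent.JavaRuntimeFixerAgent");
        attrs.putValue("Can-Redefine-Classes", "true");
        attrs.putValue("Can-Retransform-Classes", "true");
        attrs.put(Attributes.Name.CLASS_PATH, "javassist.jar");
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        // Print the manifest so that it can be compared with the hand-written one.
        build().write(System.out);
    }
}
```

    A Manifest built this way can be passed to the java.util.jar.JarOutputStream constructor when writing the agent JAR.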

    Now we have the application running and the agent that fixes it at hand. What next? We apply the fix by loading the agent into the application's JVM. To do so, let us define a loader class as follows:

    package com.jrf.loader;
    
    import com.sun.tools.attach.AgentInitializationException;
    import com.sun.tools.attach.AgentLoadException;
    import com.sun.tools.attach.AttachNotSupportedException;
    import com.sun.tools.attach.VirtualMachine;
    import java.io.*;
    
    public class JavaRuntimeFixLoader {
     public static void main(String[] args) throws AgentInitializationException, 
      AgentLoadException, AttachNotSupportedException, IOException {
    
      if (args.length != 2) {
       System.out.println("Usage: java JavaRuntimeFixLoader <agent-jar-path> <target-pid>");
       System.exit(-1);
      }
      
      VirtualMachine jvm = VirtualMachine.attach(args[1]);
      try {
       jvm.loadAgent(args[0]);
      } finally {
       // Always detach, even if loading the agent fails
       jvm.detach();
      }
     }
    }
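
    The loader above expects the target's process id as its second argument. The JDK's "jps -l" command is the usual way to look it up; purely as an illustration, the sketch below does something similar with the ProcessHandle API, assuming Java 9 or later. The class name TargetPidFinder and the command-line matching are assumptions made for this example, and on some platforms the command line of other processes may not be visible at all.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for locating the target JVM's process id (Java 9+).
final class TargetPidFinder {
    // Process id of the JVM running this code.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    // Ids of all visible processes whose command line contains the given fragment.
    static List<Long> findByCommandLine(String fragment) {
        return ProcessHandle.allProcesses()
                .filter(p -> p.info().commandLine()
                        .map(cmd -> cmd.contains(fragment))
                        .orElse(false))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("This JVM's pid: " + currentPid());
        System.out.println("Candidate targets: " + findByCommandLine("SimpleCalculator"));
    }
}
```

    Any id found this way is passed to the loader as a string, since VirtualMachine.attach takes the process id as a String.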
    
    
    Just run this loader with two arguments: the path to the agent JAR (either absolute or relative to the target application) and the target application's process id. If you see no errors in either the loader or the target application, congratulations! Your fix has been successfully (and magically) applied at runtime. Let us test it.

    add 5 6
    Sum is 11
    subtract 3 4
    Difference is -1
    multiply 8 7
    Product is 56
    divide 12 3
    Ratio is 4
    divide 100 0
    Cannot divide by zero
    divide 100 10
    Ratio is 10
    exit
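
    Seen from the source side, the patched method now behaves as if it had been written like the sketch below. This is only an illustration of the effect: the class name PatchedDivideSketch is made up here, and the real change exists only as bytecode inside the running JVM, with Javassist's "$2" corresponding to the "divisor" parameter.

```java
// Illustrative source-level equivalent of the patched "divide" method.
final class PatchedDivideSketch {
    static String divide(long dividend, long divisor) {
        // Guard inserted by the transformer; "$2" in the Javassist snippet is "divisor" here.
        if (divisor == 0) {
            return "Cannot divide by zero";
        }
        // Original method body, unchanged.
        return "Ratio is " + (dividend / divisor);
    }

    public static void main(String[] args) {
        System.out.println(divide(100, 0)); // prints "Cannot divide by zero"
        System.out.println(divide(12, 3));  // prints "Ratio is 4"
    }
}
```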
    

    Using this approach, at runtime, one can 
    • Overwrite methods 
    • Perform argument checks at the beginning of methods 
    • Wrap a method body in a try-catch block 
    • Overwrite complete classes 
    • and do many more... 
    all without recompiling, redeploying or restarting the original application. 
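
    As an example of the try-catch variant from the list above, the same bug could also have been handled by catching the exception instead of checking the argument up front. The sketch below shows, in plain Java, the behaviour such a patch would aim for; the class and method names are hypothetical. With Javassist, this kind of wrapping is usually done through the addCatch method of CtMethod rather than insertBefore.

```java
// Plain-Java illustration of what wrapping "divide" in a try-catch would achieve.
final class TryCatchWrapSketch {
    // Stands in for the original, unguarded method body.
    static String divideUnchecked(long dividend, long divisor) {
        return "Ratio is " + (dividend / divisor);
    }

    // Stands in for the method after the catch block has been added.
    static String divide(long dividend, long divisor) {
        try {
            return divideUnchecked(dividend, divisor);
        } catch (ArithmeticException e) {
            // Report the problem instead of letting it terminate the application.
            return "Cannot divide: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(12, 3));
        System.out.println(divide(100, 0));
    }
}
```

    The argument check has the advantage of never raising the exception in the first place, while the catch block also covers failures that are hard to predict from the arguments alone.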

    The complete source code of the example explained here is available at https://github.com/sureshprakash/JavaRuntimeFixer. In one terminal run the "make sampleapp" command to compile and run the sample calculator application. After this, in another terminal run the "make instrument" command that will compile the agent and apply the necessary fix at runtime.

    Let us appreciate the power of Java's Instrumentation!
