Obfuscating Source Code from AI (in Java)
With the advent of AI, open source code is at risk of becoming a plaything for Copilot and other systems to use however they choose (at least until the existing court cases play out).
Even restrictive copyleft licenses leave many of us with few resources to protect ourselves from thieving corporate entities.
Until today.
Mixing Bytecode with Source Code
On a whim, I was looking into Java bytecode decompiling and wondered whether I could paste the decompiled output into my source code and still have it compile.
I mean, in THEORY it should be possible, but I had never tried it. If done properly, I could decompile the compiled source and then put the bytecode back in as the source on a different branch, giving me two branches:
- one with actual source code
- one with bytecode as source code
Then I could release the bytecode as source to confuse the AI.
First, How do I get the Bytecode from the ‘.class’ file?
There are a lot of good tools out there, but I’m a fan of the CLI (the ";" classpath separator below is the Windows form; use ":" on Linux and macOS):
java -classpath "asm.jar;asm-util.jar;yourjar.jar" org.objectweb.asm.util.Textifier org.domain.package.YourClass
or
java -classpath "asm.jar;asm-util.jar" org.objectweb.asm.util.ASMifier org/domain/package/YourClass.class
That should give you the bytecode for any class file in a compiled jar.
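If pulling in the ASM jars is inconvenient, the disassembler that ships with the JDK can produce a comparable listing. The sketch below is an alternative to the ASM commands above, not the method used in this post: it assumes a full JDK 9 or later (so that the "javap" tool is available in-process), and the class name BytecodeDump and the method disassemble are made up for illustration. The output format differs from ASM's Textifier.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

public class BytecodeDump {

    /** Runs "javap -c -p <className>" in-process and returns the listing as text. */
    static String disassemble(String className) {
        // ToolProvider looks up JDK tools by name; javap is only present in a full JDK.
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not found; a full JDK is required"));

        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);

        // -c prints the bytecode of each method, -p includes private members.
        int exit = javap.run(outWriter, errWriter, "-c", "-p", className);
        outWriter.flush();
        errWriter.flush();

        if (exit != 0) {
            throw new IllegalStateException("javap failed: " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Defaults to a JDK class so the sketch runs without any extra jars.
        String target = args.length > 0 ? args[0] : "java.lang.Object";
        System.out.println(disassemble(target));
    }
}
```

To point it at your own classes, pass "-classpath" and the jar path as additional arguments to run, the same way you would on the javap command line.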
But how do I use it as Source Code?
Below is an example of some source I use for reading IP addresses (note that it is written in Groovy syntax, which compiles to the same JVM class format):
protected String getClientIpAddress() {
    HttpServletRequest request = getRequest();
    String[] IP_HEADER_CANDIDATES = [
        "X-Forwarded-For",
        "Proxy-Client-IP",
        "WL-Proxy-Client-IP",
        "HTTP_X_FORWARDED_FOR",
        "HTTP_X_FORWARDED",
        "HTTP_X_CLUSTER_CLIENT_IP",
        "HTTP_CLIENT_IP",
        "HTTP_FORWARDED_FOR",
        "HTTP_FORWARDED",
        "HTTP_VIA",
        "REMOTE_ADDR"
    ];
    for (String header : IP_HEADER_CANDIDATES) {
        String ip = request.getHeader(header);
        if (ip != null && ip.length() != 0 && !"unknown".equalsIgnoreCase(ip)) {
            return ip;
        }
    }
    return request.getRemoteAddr();
}
And here is the same method after replacing its body with the compiled bytecode:
protected String getClientIpAddress() {
// Byte code:
// 0: aload_0
// 1: <illegal opcode> invoke : (Lio/beapi/api/service/SessionService;)Ljava/lang/Object;
// 6: <illegal opcode> cast : (Ljava/lang/Object;)Ljavax/servlet/http/HttpServletRequest;
// 11: astore_1
// 12: aload_1
// 13: pop
// 14: bipush #11
// 16: anewarray java/lang/String
// 19: dup
// 20: iconst_0
// 21: ldc 'X-Forwarded-For'
// 23: aastore
// 24: dup
// 25: iconst_1
// 26: ldc 'Proxy-Client-IP'
// 28: aastore
// 29: dup
// 30: iconst_2
// 31: ldc 'WL-Proxy-Client-IP'
// 33: aastore
// 34: dup
// 35: iconst_3
// 36: ldc 'HTTP_X_FORWARDED_FOR'
// 38: aastore
// 39: dup
// 40: iconst_4
// 41: ldc 'HTTP_X_FORWARDED'
// 43: aastore
// 44: dup
// 45: iconst_5
// 46: ldc 'HTTP_X_CLUSTER_CLIENT_IP'
// 48: aastore
// 49: dup
// 50: bipush #6
// 52: ldc 'HTTP_CLIENT_IP'
// 54: aastore
// 55: dup
// 56: bipush #7
// 58: ldc 'HTTP_FORWARDED_FOR'
// 60: aastore
// 61: dup
// 62: bipush #8
// 64: ldc 'HTTP_FORWARDED'
// 66: aastore
// 67: dup
// 68: bipush #9
// 70: ldc 'HTTP_VIA'
// 72: aastore
// 73: dup
// 74: bipush #10
// 76: ldc 'REMOTE_ADDR'
// 78: aastore
// 79: astore_2
// 80: aload_2
// 81: pop
// 82: aload_2
// 83: <illegal opcode> invoke : ([Ljava/lang/String;)Ljava/lang/Object;
// 88: <illegal opcode> cast : (Ljava/lang/Object;)Ljava/util/Iterator;
// 93: aconst_null
// 94: astore_3
// 95: astore #4
// 97: aload #4
// 99: ifnull -> 215
// 102: aload #4
// 104: invokeinterface hasNext : ()Z
// 109: ifeq -> 215
// 112: aload #4
// 114: invokeinterface next : ()Ljava/lang/Object;
// 119: <illegal opcode> cast : (Ljava/lang/Object;)Ljava/lang/String;
// 124: astore_3
// 125: aload_1
// 126: aload_3
// 127: <illegal opcode> invoke : (Ljavax/servlet/http/HttpServletRequest;Ljava/lang/String;)Ljava/lang/Object;
// 132: <illegal opcode> cast : (Ljava/lang/Object;)Ljava/lang/String;
// 137: astore #5
// 139: aload #5
// 141: pop
// 142: aload #5
// 144: aconst_null
// 145: invokestatic compareNotEqual : (Ljava/lang/Object;Ljava/lang/Object;)Z
// 148: ifeq -> 172
// 151: aload #5
// 153: <illegal opcode> invoke : (Ljava/lang/String;)Ljava/lang/Object;
// 158: iconst_0
// 159: invokestatic valueOf : (I)Ljava/lang/Integer;
// 162: invokestatic compareNotEqual : (Ljava/lang/Object;Ljava/lang/Object;)Z
// 165: ifeq -> 172
// 168: iconst_1
// 169: goto -> 173
// 172: iconst_0
// 173: ifeq -> 205
// 176: ldc 'unknown'
// 178: aload #5
// 180: <illegal opcode> invoke : (Ljava/lang/String;Ljava/lang/String;)Ljava/lang/Object;
// 185: <illegal opcode> cast : (Ljava/lang/Object;)Z
// 190: ifne -> 197
// 193: iconst_1
// 194: goto -> 198
// 197: iconst_0
// 198: ifeq -> 205
// 201: iconst_1
// 202: goto -> 206
// 205: iconst_0
// 206: ifeq -> 212
// 209: aload #5
// 211: areturn
// 212: goto -> 102
// 215: aload_1
// 216: <illegal opcode> invoke : (Ljavax/servlet/http/HttpServletRequest;)Ljava/lang/Object;
// 221: <illegal opcode> cast : (Ljava/lang/Object;)Ljava/lang/String;
// 226: areturn
// Line number table:
// Java source line number -> byte code offset
// #84 -> 0
// #85 -> 14
// #86 -> 82
// #87 -> 125
// #88 -> 142
// #89 -> 209
// #92 -> 215
// Local variable table:
// start length slot name descriptor
// 0 227 0 this Lio/beapi/api/service/SessionService;
// 12 215 1 request Ljavax/servlet/http/HttpServletRequest;
// 80 147 2 IP_HEADER_CANDIDATES [Ljava/lang/String;
// 95 120 3 header Ljava/lang/String;
// 139 73 5 ip Ljava/lang/String;
}
Believe it or not, this WILL COMPILE, pass all tests and give you the exact same output. Pretty cool huh? :)
Some Exceptions…
Now, some code does not convert as well as others; below is a list of what you should know when converting:
- generated closures: depending on the compiler you are using, closures are dynamically generated on a second pass (not the initial compilation), AFAIK. This means that closures don’t convert well.
- static(): this is a dynamically generated class for your variables. Don’t convert this; it will be created for you.
- @Generated & @Internal: methods with these annotations are also dynamically generated. Leave them alone.
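The exceptions above could be turned into a small filter that decides which methods are candidates for conversion. This is only a sketch under stated assumptions: the Generated and Internal annotations declared here are hypothetical stand-ins (the real ones come from whatever framework you use and must have runtime retention to be visible this way), and the names ConversionFilter, isConvertible and Sample are invented for the example. Compiler-generated (synthetic) methods are skipped too, and static initializers never show up as reflective methods in the first place.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class ConversionFilter {

    // Hypothetical stand-ins for the @Generated and @Internal annotations.
    @Retention(RetentionPolicy.RUNTIME) @interface Generated {}
    @Retention(RetentionPolicy.RUNTIME) @interface Internal {}

    // Annotation simple names that mark a method as "leave it alone".
    private static final Set<String> SKIPPED_ANNOTATIONS = Set.of("Generated", "Internal");

    /** Returns true if the method is hand-written and safe to consider for conversion. */
    static boolean isConvertible(Method method) {
        if (method.isSynthetic()) {
            return false; // produced by the compiler, not by the author
        }
        for (Annotation annotation : method.getDeclaredAnnotations()) {
            if (SKIPPED_ANNOTATIONS.contains(annotation.annotationType().getSimpleName())) {
                return false;
            }
        }
        return true;
    }

    /** Lists the names of all convertible methods declared directly on the class, sorted. */
    static List<String> convertibleMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (isConvertible(method)) {
                names.add(method.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Small sample class used only to demonstrate the filter.
    static class Sample {
        String getClientIpAddress() { return "127.0.0.1"; }
        @Generated String generatedHelper() { return ""; }
        @Internal void internalHook() {}
    }

    public static void main(String[] args) {
        System.out.println(convertibleMethodNames(Sample.class)); // prints [getClientIpAddress]
    }
}
```

Generated closure classes are not handled here; detecting them depends on how your compiler names them, so that check is left out rather than guessed.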
In Closing
This is what I have so far. With it, most of us could write a script to automate the conversion to a partial source code/bytecode amalgam.
Aside from obfuscation from AI, an added benefit is that this also seems to speed up builds (though not by much); once I figure out how to deal with generated closures, I think that will speed things up more.
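The text-rewriting half of such a script could look something like the sketch below. It takes a method signature and the lines of a bytecode listing and emits a method whose body is the listing as comment lines, mirroring the layout of the converted example earlier in this post. The class name BytecodeCommenter and the method toCommentedMethod are hypothetical, and the sketch only handles the text transformation; whether the resulting file compiles and behaves the same under your compiler is something to confirm with your own test suite.

```java
import java.util.List;

public class BytecodeCommenter {

    /** Wraps a bytecode listing as comment lines inside the given method signature. */
    static String toCommentedMethod(String signature, List<String> listing) {
        StringBuilder source = new StringBuilder();
        source.append(signature).append(" {\n");
        source.append("// Byte code:\n");
        for (String line : listing) {
            String trimmed = line.strip();
            if (!trimmed.isEmpty()) {
                // Every instruction becomes a single-line comment.
                source.append("// ").append(trimmed).append('\n');
            }
        }
        source.append("}\n");
        return source.toString();
    }

    public static void main(String[] args) {
        String result = toCommentedMethod(
                "protected String getClientIpAddress()",
                List.of("0: aload_0", "  11: astore_1", "", "226: areturn"));
        System.out.print(result);
    }
}
```

A fuller script would read the listing from the disassembler output per method and write the result to the second branch, skipping the exceptions listed above.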