Похоже, что String.intern () не очень хорошо масштабируется, когда вы добавляете больше строк. Похоже, O (n) с количеством строк в пуле.
Random rand = new Random();
for(int i=0;i<100;i++) {
long start = System.nanoTime();
for(int j=0;j<100000;j++)
Long.toString(rand.nextLong()).toString().intern();
long time = System.nanoTime() - start;
System.out.printf("Took %,d ns on average to intern() a random string%n", time/100000);
}
печать
Took 1,586 ns on average to intern() a random string
Took 3,843 ns on average to intern() a random string
Took 7,551 ns on average to intern() a random string
Took 13,436 ns on average to intern() a random string
Took 20,226 ns on average to intern() a random string
Took 27,609 ns on average to intern() a random string
Took 35,098 ns on average to intern() a random string
Took 42,439 ns on average to intern() a random string
Took 50,801 ns on average to intern() a random string
Took 20,975 ns on average to intern() a random string
Took 4,634 ns on average to intern() a random string
Took 10,512 ns on average to intern() a random string
Took 16,914 ns on average to intern() a random string
Took 23,601 ns on average to intern() a random string
Took 30,230 ns on average to intern() a random string
Took 36,184 ns on average to intern() a random string
Took 43,266 ns on average to intern() a random string
Вместо этого я использую массив в качестве пула строк.
private static void testHashArray(String[] strings2, int size) {
String[] pool = new String[size];
int hit=0, miss=0;
long start2 = System.nanoTime();
for (String s : strings2) {
int hash = (s.hashCode() & 0x7fffffff) % pool.length;
String s2 = pool[hash];
if (s.equals(s2)) {
hit++;
} else {
miss++;
}
if (s2 != s)
pool[hash] = s;
}
long time2 = System.nanoTime() - start2;
System.out.printf("Hash size: %,d took %.3f second. Hit/miss %,d/%,d %n", size, time2 / 1e9, hit, miss);
}
public static void main(String... args) {
Random rand = new Random();
// a million unique strings.
String[] strings = new String[1000 * 1000];
for (int i = 0; i < strings.length; i++)
strings[i] = String.valueOf(rand.nextLong());
// random selection of Strings
String[] strings2 = new String[10 * 1000 * 1000];
int totalSize = 0;
for (int i = 0; i < strings2.length; i++) {
int idx = (int) Math.pow(strings.length, rand.nextFloat());
String s = strings[idx];
strings2[i] = s;
totalSize += s.length() + 16; // with overhead
}
System.out.printf("Original size %,d%n", totalSize);
Set<String> uniqueStrings = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
uniqueStrings.addAll(Arrays.asList(strings2));
System.out.printf("Unique strings %,d%n", uniqueStrings.size());
long start = System.nanoTime();
HashMap<String,String> map = new HashMap();
for(String s: strings2)
map.put(s,s);
long time = System.nanoTime() - start;
System.out.printf("Took %.3f second to map strings%n", time/1e9);
testHashArray(strings2, 10192);
testHashArray(strings2, 101929);
testHashArray(strings2, 1019291);
}
печать
Original size 353,293,201
Unique strings 766,222
Took 0.979 second to map strings
Hash size: 10,192 took 0.357 second. Hit/miss 5,213,210/4,786,790
Hash size: 101,929 took 0.309 second. Hit/miss 7,202,094/2,797,906
Hash size: 1,019,291 took 0.254 second. Hit/miss 8,789,382/1,210,618
Если выполнение интерна происходит медленно, как насчет выполнения его после загрузки в фоновом потоке. После загрузки сервера вы можете интернировать () строки при обнаружении дубликата.
Вам действительно нужно сэкономить 130 МБ? Я знаю, это звучит великолепно, но будет ли память использоваться для чего-то еще?
Если вам нужна более быстрая форма для intern (), вы можете использовать массив фиксированного размера.