The first iterations of the test probably use huge pages (2 MB pages) because of THP: Transparent Hugepage - https://www.kernel.org/doc/Documentation/vm/transhuge.txt -
check /sys/kernel/mm/transparent_hugepage/enabled and run grep AnonHugePages /proc/meminfo
while the test is executing.
The kernel documentation linked above explains:
The reason applications are running faster is because of two
factors. The first factor is almost completely irrelevant and it's not
of significant interest because it'll also have the downside of
requiring larger clear-page copy-page in page faults which is a
potentially negative effect. The first factor consists in taking a
single page fault for each 2M virtual region touched by userland (so
reducing the enter/exit kernel frequency by a 512 times factor). This
only matters the first time the memory is accessed for the lifetime of
a memory mapping.
Allocating huge amounts of memory with new or malloc is served by a single mmap system call, which usually does not "populate" the virtual memory with physical pages; see man mmap around MAP_POPULATE:
MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. ... This will help
to reduce blocking on page faults later.
This memory is merely registered by mmap (without MAP_POPULATE) as virtual, and write access to it is disabled in the page table. When your test attempts the first write to any page of that memory, a page fault exception is generated and handled by the OS kernel. The Linux kernel will allocate some physical memory and map the virtual page to the physical one (populating the page). With THP enabled (it often is), the kernel may allocate a single huge page of 2 MB if it has some free huge physical pages available. If there are no free huge pages, the kernel will allocate a 4 KB page. So, without huge pages you will have 512 times more page faults (this can be checked by running vmstat 1 180 in another console while the test is executing, or with perf stat -I 1000).
Subsequent accesses to already populated pages will not cause page faults, so you can extend the test with a second (and third) loop for i in (0..N-1): a[i] = 1; and measure the time of each loop separately.
Your results still sound strange. Is your system real or virtualized? Hypervisors may support 2 MB pages too, and virtualized systems may have a much higher cost for memory allocation and exception handling.
On my PC, which has less memory, I see about a 10% slowdown when page faults switch from huge page allocation to 4 KB page allocation (check the page-faults lines in the perf stat output - there were about 2 thousand page faults per second with 2 MB pages and more than 200 thousand page faults per second with 4 KB pages):
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ perf stat -I1000 ./a.out
Iteration 0
Time to malloc: 8.10623e-06
Time to fill with data: 0.364378
Fill rate with data: 274.44 Mints/sec, 1097.76Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 400Mbytes
Iteration 1
Time to malloc: 1.90735e-05
Time to fill with data: 0.357983
Fill rate with data: 279.343 Mints/sec, 1117.37Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 800Mbytes
Iteration 2
Time to malloc: 1.69277e-05
# time counts unit events
1.000414902 999.893040 task-clock (msec)
1.000414902 1 context-switches # 0.001 K/sec
1.000414902 0 cpu-migrations # 0.000 K/sec
1.000414902 2,024 page-faults # 0.002 M/sec
1.000414902 2,664,963,857 cycles # 2.665 GHz
1.000414902 3,072,781,834 instructions # 1.15 insn per cycle
1.000414902 559,551,437 branches # 559.611 M/sec
1.000414902 25,176 branch-misses # 0.00% of all branches
Time to fill with data: 0.357014
Fill rate with data: 280.101 Mints/sec, 1120.4Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 1200Mbytes
Iteration 3
Time to malloc: 1.71661e-05
Time to fill with data: 0.358964
Fill rate with data: 278.579 Mints/sec, 1114.32Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 1600Mbytes
Iteration 4
Time to malloc: 1.69277e-05
Time to fill with data: 0.356918
Fill rate with data: 280.177 Mints/sec, 1120.71Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2000Mbytes
Iteration 5
Time to malloc: 1.50204e-05
2.000779126 1000.703872 task-clock (msec)
2.000779126 1 context-switches # 0.001 K/sec
2.000779126 0 cpu-migrations # 0.000 K/sec
2.000779126 2,280 page-faults # 0.002 M/sec
2.000779126 2,686,072,244 cycles # 2.685 GHz
2.000779126 3,094,777,285 instructions # 1.16 insn per cycle
2.000779126 563,593,105 branches # 563.425 M/sec
2.000779126 9,661 branch-misses # 0.00% of all branches
Time to fill with data: 0.371785
Fill rate with data: 268.973 Mints/sec, 1075.89Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2400Mbytes
Iteration 6
Time to malloc: 1.90735e-05
Time to fill with data: 0.418562
Fill rate with data: 238.913 Mints/sec, 955.653Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2800Mbytes
Iteration 7
Time to malloc: 2.09808e-05
3.001146481 1000.436128 task-clock (msec)
3.001146481 1 context-switches # 0.001 K/sec
3.001146481 0 cpu-migrations # 0.000 K/sec
3.001146481 217,415 page-faults # 0.217 M/sec
3.001146481 2,687,783,783 cycles # 2.687 GHz
3.001146481 3,100,713,038 instructions # 1.16 insn per cycle
3.001146481 560,207,049 branches # 560.014 M/sec
3.001146481 83,230 branch-misses # 0.01% of all branches
Time to fill with data: 0.416297
Fill rate with data: 240.213 Mints/sec, 960.853Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 3200Mbytes
Iteration 8
Time to malloc: 1.38283e-05
Time to fill with data: 0.41672
Fill rate with data: 239.969 Mints/sec, 959.877Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 3600Mbytes
Iteration 9
Time to malloc: 1.40667e-05
Time to fill with data: 0.424997
Fill rate with data: 235.296 Mints/sec, 941.183Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4000Mbytes
Iteration 10
Time to malloc: 1.28746e-05
4.001467773 1000.378604 task-clock (msec)
4.001467773 2 context-switches # 0.002 K/sec
4.001467773 0 cpu-migrations # 0.000 K/sec
4.001467773 232,690 page-faults # 0.233 M/sec
4.001467773 2,655,313,682 cycles # 2.654 GHz
4.001467773 3,087,157,016 instructions # 1.15 insn per cycle
4.001467773 557,266,313 branches # 557.070 M/sec
4.001467773 95,433 branch-misses # 0.02% of all branches
Time to fill with data: 0.413271
Fill rate with data: 241.972 Mints/sec, 967.888Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4400Mbytes
Iteration 11
Time to malloc: 1.21593e-05
Time to fill with data: 0.414624
Fill rate with data: 241.182 Mints/sec, 964.73Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4800Mbytes
Iteration 12
Time to malloc: 1.5974e-05
5.001792272 1000.372602 task-clock (msec)
5.001792272 2 context-switches # 0.002 K/sec
5.001792272 0 cpu-migrations # 0.000 K/sec
5.001792272 236,260 page-faults # 0.236 M/sec
5.001792272 2,687,340,230 cycles # 2.686 GHz
5.001792272 3,134,864,968 instructions # 1.17 insn per cycle
5.001792272 565,846,287 branches # 565.644 M/sec
5.001792272 104,634 branch-misses # 0.02% of all branches
Time to fill with data: 0.412331
Fill rate with data: 242.524 Mints/sec, 970.094Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 5200Mbytes
Iteration 13
Time to malloc: 1.3113e-05
Time to fill with data: 0.414433
Fill rate with data: 241.294 Mints/sec, 965.174Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 5600Mbytes
Iteration 14
Time to malloc: 1.88351e-05
Time to fill with data: 0.417277
Fill rate with data: 239.649 Mints/sec, 958.596Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 6000Mbytes
6.002129544 1000.404270 task-clock (msec)
6.002129544 1 context-switches # 0.001 K/sec
6.002129544 0 cpu-migrations # 0.000 K/sec
6.002129544 215,269 page-faults # 0.215 M/sec
6.002129544 2,676,269,667 cycles # 2.675 GHz
6.002129544 3,286,469,282 instructions # 1.23 insn per cycle
6.002129544 578,367,266 branches # 578.156 M/sec
6.002129544 345,470 branch-misses # 0.06% of all branches
....
After disabling THP with the root command from https://access.redhat.com/solutions/46111, I always get ~200 thousand page faults per second and about 950 MB/s:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ perf stat -I1000 ./a.out
Iteration 0
Time to malloc: 1.50204e-05
Time to fill with data: 0.422322
Fill rate with data: 236.786 Mints/sec, 947.145Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 400Mbytes
Iteration 1
Time to malloc: 1.50204e-05
Time to fill with data: 0.415068
Fill rate with data: 240.924 Mints/sec, 963.698Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 800Mbytes
Iteration 2
Time to malloc: 2.19345e-05
# time counts unit events
1.000162191 999.429856 task-clock (msec)
1.000162191 14 context-switches # 0.014 K/sec
1.000162191 0 cpu-migrations # 0.000 K/sec
1.000162191 232,727 page-faults # 0.233 M/sec
1.000162191 2,664,896,604 cycles # 2.666 GHz
1.000162191 3,080,713,267 instructions # 1.16 insn per cycle
1.000162191 555,116,838 branches # 555.434 M/sec
1.000162191 102,262 branch-misses # 0.02% of all branches
Time to fill with data: 0.440695
Fill rate with data: 226.914 Mints/sec, 907.658Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 1200Mbytes
Iteration 3
Time to malloc: 2.09808e-05
Time to fill with data: 0.414463
Fill rate with data: 241.276 Mints/sec, 965.104Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 1600Mbytes
Iteration 4
Time to malloc: 1.81198e-05
2.000544564 1000.142465 task-clock (msec)
2.000544564 16 context-switches # 0.016 K/sec
2.000544564 0 cpu-migrations # 0.000 K/sec
2.000544564 229,697 page-faults # 0.230 M/sec
2.000544564 2,621,180,984 cycles # 2.622 GHz
2.000544564 3,041,358,811 instructions # 1.15 insn per cycle
2.000544564 547,910,242 branches # 548.027 M/sec
2.000544564 93,682 branch-misses # 0.02% of all branches
Time to fill with data: 0.428383
Fill rate with data: 233.436 Mints/sec, 933.744Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2000Mbytes
Iteration 5
Time to malloc: 1.5974e-05
Time to fill with data: 0.421986
Fill rate with data: 236.975 Mints/sec, 947.899Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2400Mbytes
Iteration 6
Time to malloc: 1.5974e-05
Time to fill with data: 0.413477
Fill rate with data: 241.851 Mints/sec, 967.406Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2800Mbytes
Iteration 7
Time to malloc: 1.88351e-05
3.000866438 999.980461 task-clock (msec)
3.000866438 20 context-switches # 0.020 K/sec
3.000866438 0 cpu-migrations # 0.000 K/sec
3.000866438 231,194 page-faults # 0.231 M/sec
3.000866438 2,622,484,960 cycles # 2.623 GHz
3.000866438 3,061,610,229 instructions # 1.16 insn per cycle
3.000866438 551,533,361 branches # 551.616 M/sec
3.000866438 104,561 branch-misses # 0.02% of all branches
Time to fill with data: 0.448333
Fill rate with data: 223.048 Mints/sec, 892.194Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 3200Mbytes
Iteration 8
Time to malloc: 1.50204e-05
Time to fill with data: 0.410566
Fill rate with data: 243.566 Mints/sec, 974.265Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 3600Mbytes
Iteration 9
Time to malloc: 1.3113e-05
4.001231042 1000.098860 task-clock (msec)
4.001231042 17 context-switches # 0.017 K/sec
4.001231042 0 cpu-migrations # 0.000 K/sec
4.001231042 228,532 page-faults # 0.229 M/sec
4.001231042 2,586,146,024 cycles # 2.586 GHz
4.001231042 3,026,679,955 instructions # 1.15 insn per cycle
4.001231042 545,236,541 branches # 545.284 M/sec
4.001231042 115,251 branch-misses # 0.02% of all branches
Time to fill with data: 0.441442
Fill rate with data: 226.53 Mints/sec, 906.121Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4000Mbytes
Iteration 10
Time to malloc: 1.5974e-05
Time to fill with data: 0.42898
Fill rate with data: 233.111 Mints/sec, 932.445Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4400Mbytes
Iteration 11
Time to malloc: 2.00272e-05
5.001547227 999.982415 task-clock (msec)
5.001547227 19 context-switches # 0.019 K/sec
5.001547227 0 cpu-migrations # 0.000 K/sec
5.001547227 225,796 page-faults # 0.226 M/sec
5.001547227 2,560,990,918 cycles # 2.561 GHz
5.001547227 3,005,384,743 instructions # 1.15 insn per cycle
5.001547227 542,275,580 branches # 542.315 M/sec
5.001547227 116,537 branch-misses # 0.02% of all branches
Time to fill with data: 0.414212
Fill rate with data: 241.422 Mints/sec, 965.689Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4800Mbytes
Iteration 12
Time to malloc: 1.69277e-05
Time to fill with data: 0.411084
Fill rate with data: 243.259 Mints/sec, 973.037Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 5200Mbytes
Iteration 13
Time to malloc: 1.40667e-05
Time to fill with data: 0.413644
Fill rate with data: 241.754 Mints/sec, 967.015Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 5600Mbytes
Iteration 14
Time to malloc: 1.28746e-05
6.001849796 999.913923 task-clock (msec)
6.001849796 18 context-switches # 0.018 K/sec
6.001849796 0 cpu-migrations # 0.000 K/sec
6.001849796 236,912 page-faults # 0.237 M/sec
6.001849796 2,685,445,660 cycles # 2.686 GHz
6.001849796 3,153,464,551 instructions # 1.20 insn per cycle
6.001849796 568,989,467 branches # 569.032 M/sec
6.001849796 125,943 branch-misses # 0.02% of all branches
Time to fill with data: 0.444891
Fill rate with data: 224.774 Mints/sec, 899.097Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 6000Mbytes
The test, modified for perf stat with rate printing and a limited number of iterations:
$ cat test.c; g++ test.c
#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include <string.h>
#include <iostream>
#include <vector>
using namespace std;

// Wall-clock time in seconds (microsecond resolution); returns 0 on error.
double getWallTime()
{
    struct timeval time;
    if (gettimeofday(&time, NULL))
    {
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}

#define M 1000000

int main()
{
    int *a;
    int n = 100000000;
    int j;
    double total = 0;
    for(j=0; j<15; j++)
    {
        cout << "Iteration " << j << endl;
        double start = getWallTime();
        // The array is never freed, so total allocated memory grows every iteration.
        a = new int[n];
        cout << "Time to malloc: " << getWallTime() - start << endl;
        // First write to every page: this is where the page faults happen.
        for (int i = 0; i < n; i++)
        {
            a[i] = 1;
        }
        double elapsed = getWallTime()-start;
        cout << "Time to fill with data: " << elapsed << endl;
        cout << "Fill rate with data: " << n/elapsed/M << " Mints/sec, " << n*sizeof(int)/elapsed/M << "Mbytes/sec" << endl;
        total += n*sizeof(int)*1./M;
        cout << "Allocated " << n*sizeof(int)*1./M << " Mbytes, with total memory allocated " << total << "Mbytes" << endl;
    }
    return 0;
}
The test, modified to add a second and a third write access:
$ g++ second.c -o second
$ cat second.c
#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include <string.h>
#include <iostream>
#include <vector>
using namespace std;

// Wall-clock time in seconds (microsecond resolution); returns 0 on error.
double getWallTime()
{
    struct timeval time;
    if (gettimeofday(&time, NULL))
    {
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}

#define M 1000000

int main()
{
    int *a;
    int n = 100000000;
    int j;
    double total = 0;
    for(j=0; j<15; j++)
    {
        cout << "Iteration " << j << endl;
        double start = getWallTime();
        // The array is never freed, so total allocated memory grows every iteration.
        a = new int[n];
        cout << "Time to malloc: " << getWallTime() - start << endl;
        // First write to every page: this is where the page faults happen.
        for (int i = 0; i < n; i++)
        {
            a[i] = 1;
        }
        double elapsed = getWallTime()-start;
        cout << "Time to fill with data: " << elapsed << endl;
        cout << "Fill rate with data: " << n/elapsed/M << " Mints/sec, " << n*sizeof(int)/elapsed/M << "Mbytes/sec" << endl;

        // Second pass: the pages are already populated, so no page faults are expected.
        start = getWallTime();
        for (int i = 0; i < n; i++)
        {
            a[i] = 2;
        }
        elapsed = getWallTime()-start;
        cout << "Time to second write access of data: " << elapsed << endl;
        cout << "Access rate of data: " << n/elapsed/M << " Mints/sec, " << n*sizeof(int)/elapsed/M << "Mbytes/sec" << endl;

        // Third pass, same as the second.
        start = getWallTime();
        for (int i = 0; i < n; i++)
        {
            a[i] = 3;
        }
        elapsed = getWallTime()-start;
        cout << "Time to third write access of data: " << elapsed << endl;
        cout << "Access rate of data: " << n/elapsed/M << " Mints/sec, " << n*sizeof(int)/elapsed/M << "Mbytes/sec" << endl;

        total += n*sizeof(int)*1./M;
        cout << "Allocated " << n*sizeof(int)*1./M << " Mbytes, with total memory allocated " << total << "Mbytes" << endl;
    }
    return 0;
}
Without THP - about 1.25 GB/s for the second and third access:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ ./second
Iteration 0
Time to malloc: 9.05991e-06
Time to fill with data: 0.426387
Fill rate with data: 234.529 Mints/sec, 938.115Mbytes/sec
Time to second write access of data: 0.318292
Access rate of data: 314.177 Mints/sec, 1256.71Mbytes/sec
Time to third write access of data: 0.321722
Access rate of data: 310.827 Mints/sec, 1243.31Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 400Mbytes
Iteration 1
Time to malloc: 3.50475e-05
Time to fill with data: 0.411859
Fill rate with data: 242.802 Mints/sec, 971.206Mbytes/sec
Time to second write access of data: 0.317989
Access rate of data: 314.476 Mints/sec, 1257.91Mbytes/sec
Time to third write access of data: 0.321637
Access rate of data: 310.91 Mints/sec, 1243.64Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 800Mbytes
Iteration 2
Time to malloc: 2.81334e-05
Time to fill with data: 0.411918
Fill rate with data: 242.767 Mints/sec, 971.067Mbytes/sec
Time to second write access of data: 0.318647
Access rate of data: 313.827 Mints/sec, 1255.31Mbytes/sec
Time to third write access of data: 0.321041
Access rate of data: 311.487 Mints/sec, 1245.95Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 1200Mbytes
Iteration 3
Time to malloc: 2.5034e-05
Time to fill with data: 0.411138
Fill rate with data: 243.227 Mints/sec, 972.909Mbytes/sec
Time to second write access of data: 0.318429
Access rate of data: 314.042 Mints/sec, 1256.17Mbytes/sec
Time to third write access of data: 0.321332
Access rate of data: 311.205 Mints/sec, 1244.82Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 1600Mbytes
Iteration 4
Time to malloc: 3.71933e-05
Time to fill with data: 0.410922
Fill rate with data: 243.355 Mints/sec, 973.421Mbytes/sec
Time to second write access of data: 0.320262
Access rate of data: 312.244 Mints/sec, 1248.98Mbytes/sec
Time to third write access of data: 0.319223
Access rate of data: 313.261 Mints/sec, 1253.04Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2000Mbytes
Iteration 5
Time to malloc: 2.19345e-05
Time to fill with data: 0.418508
Fill rate with data: 238.944 Mints/sec, 955.777Mbytes/sec
Time to second write access of data: 0.320419
Access rate of data: 312.092 Mints/sec, 1248.37Mbytes/sec
Time to third write access of data: 0.319752
Access rate of data: 312.742 Mints/sec, 1250.97Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2400Mbytes
Iteration 6
Time to malloc: 3.19481e-05
Time to fill with data: 0.410054
Fill rate with data: 243.87 Mints/sec, 975.481Mbytes/sec
Time to second write access of data: 0.320244
Access rate of data: 312.262 Mints/sec, 1249.05Mbytes/sec
Time to third write access of data: 0.319546
Access rate of data: 312.944 Mints/sec, 1251.78Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 2800Mbytes
Iteration 7
Time to malloc: 3.19481e-05
Time to fill with data: 0.409491
Fill rate with data: 244.206 Mints/sec, 976.822Mbytes/sec
Time to second write access of data: 0.318501
Access rate of data: 313.971 Mints/sec, 1255.88Mbytes/sec
Time to third write access of data: 0.320052
Access rate of data: 312.449 Mints/sec, 1249.8Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 3200Mbytes
Iteration 8
Time to malloc: 2.5034e-05
Time to fill with data: 0.409922
Fill rate with data: 243.949 Mints/sec, 975.795Mbytes/sec
Time to second write access of data: 0.320583
Access rate of data: 311.932 Mints/sec, 1247.73Mbytes/sec
Time to third write access of data: 0.319478
Access rate of data: 313.011 Mints/sec, 1252.04Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 3600Mbytes
Iteration 9
Time to malloc: 2.69413e-05
Time to fill with data: 0.41104
Fill rate with data: 243.285 Mints/sec, 973.141Mbytes/sec
Time to second write access of data: 0.320389
Access rate of data: 312.121 Mints/sec, 1248.48Mbytes/sec
Time to third write access of data: 0.319762
Access rate of data: 312.733 Mints/sec, 1250.93Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4000Mbytes
Iteration 10
Time to malloc: 2.59876e-05
Time to fill with data: 0.412612
Fill rate with data: 242.358 Mints/sec, 969.434Mbytes/sec
Time to second write access of data: 0.318304
Access rate of data: 314.165 Mints/sec, 1256.66Mbytes/sec
Time to third write access of data: 0.319453
Access rate of data: 313.035 Mints/sec, 1252.14Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4400Mbytes
Iteration 11
Time to malloc: 2.98023e-05
Time to fill with data: 0.412428
Fill rate with data: 242.467 Mints/sec, 969.866Mbytes/sec
Time to second write access of data: 0.318467
Access rate of data: 314.004 Mints/sec, 1256.02Mbytes/sec
Time to third write access of data: 0.319716
Access rate of data: 312.778 Mints/sec, 1251.11Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 4800Mbytes
Iteration 12
Time to malloc: 2.69413e-05
Time to fill with data: 0.410515
Fill rate with data: 243.597 Mints/sec, 974.386Mbytes/sec
Time to second write access of data: 0.31832
Access rate of data: 314.149 Mints/sec, 1256.6Mbytes/sec
Time to third write access of data: 0.319569
Access rate of data: 312.921 Mints/sec, 1251.69Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 5200Mbytes
Iteration 13
Time to malloc: 2.28882e-05
Time to fill with data: 0.412385
Fill rate with data: 242.492 Mints/sec, 969.967Mbytes/sec
Time to second write access of data: 0.318929
Access rate of data: 313.549 Mints/sec, 1254.2Mbytes/sec
Time to third write access of data: 0.31949
Access rate of data: 312.999 Mints/sec, 1252Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 5600Mbytes
Iteration 14
Time to malloc: 2.90871e-05
Time to fill with data: 0.41235
Fill rate with data: 242.512 Mints/sec, 970.05Mbytes/sec
Time to second write access of data: 0.340456
Access rate of data: 293.724 Mints/sec, 1174.89Mbytes/sec
Time to third write access of data: 0.319716
Access rate of data: 312.778 Mints/sec, 1251.11Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 6000Mbytes
With THP - slightly faster allocation (first fill), but the same speed for the second and third access:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ ./second
Iteration 0
Time to malloc: 1.50204e-05
Time to fill with data: 0.365043
Fill rate with data: 273.94 Mints/sec, 1095.76Mbytes/sec
Time to second write access of data: 0.320503
Access rate of data: 312.01 Mints/sec, 1248.04Mbytes/sec
Time to third write access of data: 0.319442
Access rate of data: 313.046 Mints/sec, 1252.18Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 400Mbytes
...
Iteration 14
Time to malloc: 2.7895e-05
Time to fill with data: 0.409294
Fill rate with data: 244.323 Mints/sec, 977.293Mbytes/sec
Time to second write access of data: 0.318422
Access rate of data: 314.049 Mints/sec, 1256.19Mbytes/sec
Time to third write access of data: 0.322098
Access rate of data: 310.465 Mints/sec, 1241.86Mbytes/sec
Allocated 400 Mbytes, with total memory allocated 6000Mbytes