I use ExtraTreesClassifier for training and prediction. I run the same source code on the same dataset on Windows 10 and on Linux Ubuntu 16.04 and, surprisingly, the execution times differ enormously.
Results:
+--------------+---------------+--------------+------------------+-----------------+
| Dataset (MB) | Win train (s) | Win pred (s) | Ubuntu train (s) | Ubuntu pred (s) |
+--------------+---------------+--------------+------------------+-----------------+
| 430          | 104           | 11           | 2420             | 2019            |
+--------------+---------------+--------------+------------------+-----------------+
| 530          | 122           | 14           | 2948             | 2162            |
+--------------+---------------+--------------+------------------+-----------------+
| 699          | 140           | 18           | 3672             | 2500            |
+--------------+---------------+--------------+------------------+-----------------+
Note: the time spent loading the CSV file and building the data frame is negligible.
Source code:
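```python
import time

import datatable as dt
import numpy as np
import pandas as pd  # imported in the original; not used in this excerpt
from sklearn.ensemble import ExtraTreesClassifier


class Model:  # hypothetical name: the enclosing class was not shown in the original
    def __init__(self):
        self.ExTrCl = ExtraTreesClassifier()

    def train_with_dt(self, csv_file_path):
        # Read the CSV file into a datatable Frame
        start_0_time = time.time()
        data_arn = dt.fread(csv_file_path)
        end_time = time.time()
        print(" time Read_csv file : ", end_time - start_0_time, " s")

        # Extract the target column "familyId", then drop it from the features
        data_classe = np.ravel(data_arn[:, "familyId"])
        del data_arn[:, "familyId"]

        # Time the fit on its own
        start_time_train = time.time()
        self.ExTrCl.fit(data_arn, data_classe)
        end_time = time.time()
        print(" train only time : ", end_time - start_time_train, " s")

    def test_groupe_score_dt(self, test_matrix, list_classes):
        # Build a datatable Frame from the test matrix
        # (self.list_motifs is assigned elsewhere in the full program)
        start_0_time = time.time()
        dt_dftest = dt.Frame(np.array(test_matrix), names=self.list_motifs)
        end_time = time.time()
        print(" time creating Frame dt = ", end_time - start_0_time, " s")

        # Predict; this timing starts at start_0_time, so it includes Frame creation
        result = self.ExTrCl.predict(dt_dftest)
        end_time = time.time()
        print(" Time pred = ", end_time - start_0_time, " s")
        return result
```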
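In this code, `fit` and `predict` receive a datatable Frame directly, so the "train only" and "pred" timings also include whatever conversion scikit-learn performs on that input. The cProfile output further down attributes most of the run time to `numpy.core.multiarray.array`, which hints (without proving) that this conversion is a large share of the cost. A minimal sketch for timing each step under its own label follows; `timed` and `timings` are hypothetical names, and converting explicitly with datatable's `Frame.to_numpy()` before `fit` is an assumption about how one might isolate the conversion, not part of the original code.

```python
import time
from contextlib import contextmanager

# Hypothetical helper storage: wall-clock durations in seconds, keyed by label.
timings = {}


@contextmanager
def timed(label):
    """Record how long the enclosed block takes, using time.perf_counter."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed steps are timed too.
        timings[label] = time.perf_counter() - start


# Assumed usage inside train_with_dt (not part of the original code):
#     with timed("to_numpy"):
#         X = data_arn.to_numpy()          # explicit datatable -> NumPy conversion
#     with timed("fit"):
#         self.ExTrCl.fit(X, data_classe)  # fit on data that is already a NumPy array

if __name__ == "__main__":
    with timed("sum"):
        total = sum(range(1000000))
    print("sum took {:.4f} s".format(timings["sum"]))
```

The helper avoids f-strings so it also runs on the Python 3.5 interpreter listed for the Ubuntu machine.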
The OS details and library versions used are listed in the table below. I have updated all the libraries in use.
+-------------------------------------------+-------------------------------------------+
| Windows 10                                | Ubuntu 16.04                              |
| Intel i7-8550U CPU @ 1.80GHz 1.99GHz      | Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz |
| cpu cores: 4                              | cpu cores: 1                              |
| 64-bit OS                                 | 64-bit OS                                 |
| RAM 16 GB                                 | RAM 1007 GB                               |
+-------------------------------------------+-------------------------------------------+
| Python 3.7.7                              | Python 3.5.2                              |
+-------------------------------------------+-------------------------------------------+
| biopython==1.77                           | biopython==1.73                           |
| datatable==0.11.0a0+pr2536.12             | datatable==0.10.1                         |
| numpy==1.19.0                             | numpy==1.18.5                             |
| pandas==1.0.5                             | pandas==0.24.2                            |
| pyahocorasick==1.4.0                      | pyahocorasick==1.4.0                      |
| scikit-learn==0.23.1                      | scikit-learn==0.22.2.post1                |
| scipy==1.5.0                              | scipy==1.4.1                              |
| suffix-trees==0.3.0                       | suffix-trees==0.3.0                       |
+-------------------------------------------+-------------------------------------------+
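To collect the OS, CPU and library information in the same form on both machines, a small standard-library sketch follows. `environment_report` is a hypothetical helper; it reads each module's `__version__` attribute, which numpy, scipy, scikit-learn (`sklearn`), pandas and datatable expose. Note that `os.cpu_count()` reports the logical CPUs visible to the OS, whereas the "cpu cores" field in Linux's `/proc/cpuinfo` counts cores per physical package, so the two numbers can differ.

```python
import importlib
import os
import platform

# Modules whose versions matter for this comparison (import names, not pip names).
DEFAULT_MODULES = ("numpy", "scipy", "sklearn", "pandas", "datatable")


def environment_report(modules=DEFAULT_MODULES):
    """Return interpreter, OS, CPU count and module versions as a dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "processor": platform.processor(),
        "logical_cpus": os.cpu_count(),  # may be None if it cannot be determined
    }
    for name in modules:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Keys are unique, so sorting the items only ever compares keys.
    for key, value in sorted(environment_report().items()):
        print("{:<14} {}".format(key, value))
```

Running the same script on Windows and on Ubuntu gives two directly comparable listings; like the timing helper, it avoids f-strings so it works on Python 3.5.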
Profiling with cProfile:
```
1619734 function calls (1589052 primitive calls) in 6495.451 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4828 6248.349    1.294 6248.349    1.294 {built-in method numpy.core.multiarray.array}
      100  130.458    1.305  130.458    1.305 {method 'build' of 'sklearn.tree._tree.DepthFirstTreeBuilder' objects}
        1   48.288   48.288   48.288   48.288 {built-in method datatable.lib._datatable.gread}
        2   21.834   10.917   25.749   12.874 Main.py:40(get_matrix_nbOcrrs_listStr_AhoCorasick)
        2   20.747   10.374 2570.626 1285.313 model.py:233(test_groupe_score_dt)
     4365    6.476    0.001    6.476    0.001 {method 'reduce' of 'numpy.ufunc' objects}
        1    5.851    5.851 6492.121 6492.121 Main.py:309(main)
     6710    3.705    0.001    3.705    0.001 {method 'copy' of 'list' objects}
      400    2.548    0.006    2.548    0.006 {method 'predict' of 'sklearn.tree._tree.Tree' objects}
        1    2.288    2.288 6495.453 6495.453 Main.py:1(<module>)
        1    1.334    1.334 3889.596 3889.596 model.py:189(train_with_dt)
      400    0.827    0.002    3.628    0.009 _classes.py:880(predict_proba)
        4    0.522    0.131 4936.793 1234.198 _forest.py:591(predict)
      400    0.354    0.001    3.982    0.010 _forest.py:442(_accumulate_prediction)
   376662    0.150    0.000    0.150    0.000 {method 'add_word' of 'ahocorasick.Automaton' objects}
      803    0.120    0.000    0.120    0.000 {built-in method marshal.loads}
2272/2260    0.070    0.000    0.144    0.000 {built-in method builtins.__build_class__}
   1081/1    0.069    0.000 6495.453 6495.453 {built-in method builtins.exec}
  143/119    0.064    0.000    0.116    0.001 {built-in method _imp.create_dynamic}
        2    0.046    0.023    0.046    0.023 {method 'make_automaton' of 'ahocorasick.Automaton' objects}
...
```
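This profile is ordered by internal time (`tottime`), but it does not show which call site invokes `numpy.core.multiarray.array` 4828 times. pstats can also print the callers of a given entry. A minimal sketch follows; `profile_call` and its `callers_of` parameter are hypothetical names, and `callers_of` is a regular expression matched against the function names that appear in the report.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=20, callers_of=None, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report): the report lists the `limit` most expensive
    functions by internal time (tottime) and, if `callers_of` is given,
    the callers of every profiled function whose name matches that regex.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats(limit)  # "Ordered by: internal time"
    if callers_of is not None:
        stats.print_callers(callers_of)  # who called the matching functions
    return result, stream.getvalue()


if __name__ == "__main__":
    def build_lists(n):
        # Toy workload: many calls to the built-in sorted().
        return [sorted(range(10)) for _ in range(n)]

    _, report = profile_call(build_lists, 10000, callers_of="sorted")
    print(report)
```

Applied to the question's code, a call such as `profile_call(self.ExTrCl.fit, data_arn, data_classe, callers_of="multiarray")` would, under these assumptions, list the functions that invoked the array constructor during the fit. The exact name string of that built-in depends on the NumPy version, so the pattern may need adjusting.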
Thanks for your help.