Why does my classifier give low accuracy? - PullRequest
03 August 2020

I have a time-series dataset of sensor readings. The goal is to predict CoordinateID.

Here is what the dataset looks like:

Columns: TimeStamp, X, Y, Z, Magnitude and CoordinateID.


There are 33937 samples in my dataset. I dropped the TimeStamp column and used X, Y, Z and Magnitude as features and CoordinateID as the label.

```python
# Import train_test_split function
from sklearn.model_selection import train_test_split

# df is the pandas DataFrame holding the sensor readings
X = df[['X', 'Y', 'Z', 'Magnitude']]  # Features
y = df['CoordinateID']  # Labels

# Split dataset into training set (70%) and test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```
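If CoordinateID has many distinct values, a plain random split like this can put some classes only in the test set, and no classifier can predict a label it never saw during training. A minimal stdlib-only sketch to measure how often that happens, assuming `y_train` and `y_test` are iterables of labels such as the pandas Series above; `unseen_label_fraction` is a hypothetical helper, not a library function:

```python
from collections.abc import Hashable, Iterable


def unseen_label_fraction(y_train: Iterable[Hashable], y_test: Iterable[Hashable]) -> float:
    """Fraction of test samples whose label never occurs among the training labels."""
    seen = set(y_train)          # every class the model can possibly learn
    test = list(y_test)
    if not test:
        return 0.0
    missing = sum(1 for label in test if label not in seen)
    return missing / len(test)


# Usage with the split above (pandas Series are iterable):
# print(unseen_label_fraction(y_train, y_test))
```

Passing `stratify=y` to `train_test_split` keeps class proportions similar in both splits, although it raises an error for classes that have only a single sample.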

When I tried RandomForestClassifier, purely as an experiment, I got an accuracy of 0.012865841681398546.
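To judge whether an accuracy like this beats chance, it can be compared with trivial reference points computed from the label counts: always predicting the most frequent class, guessing uniformly among the k classes, and guessing at random according to the class frequencies (expected accuracy is the sum of p_i squared, assuming the test set has the same class frequencies). A minimal stdlib sketch; `baseline_accuracies` is a hypothetical helper, while scikit-learn's `DummyClassifier` (strategies `"most_frequent"`, `"uniform"`, `"stratified"`) fits comparable baselines directly:

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def baseline_accuracies(labels: Iterable[Hashable]) -> dict[str, float]:
    """Rough chance-level accuracies implied by the class frequencies in `labels`."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("labels must not be empty")
    probs = [c / n for c in counts.values()]
    return {
        "most_frequent": max(probs),             # always predict the largest class
        "uniform": 1 / len(counts),              # guess one of k classes uniformly
        "stratified": sum(p * p for p in probs), # guess according to class frequencies
    }


# Usage (hypothetical): compare baseline_accuracies(y_test) with
# metrics.accuracy_score(y_test, y_pred)
```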

Another approach I tried was building a neural network from scratch with TensorFlow and Keras (a Sequential model), but the accuracy gets stuck and hardly ever exceeds 14%.

EDIT:

```python
# Import train_test_split function
from sklearn.model_selection import train_test_split

X = df[['X', 'Y', 'Z', 'Magnitude']]  # Features
y = df['CoordinateID']  # Labels

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Output:

```
(23756, 4) (10182, 4) (23756,) (10182,)
```
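These shapes follow from how `train_test_split` sizes a float `test_size`: the test set gets ceil(test_size × n) rows and the training set gets the rest, so the printed shapes correspond to 23756 + 10182 = 33938 rows in total. A plain-Python sketch of that rounding rule; `split_sizes` is a hypothetical helper, not part of scikit-learn:

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a float test_size, rounding the test size up."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be in (0, 1)")
    n_test = math.ceil(test_size * n_samples)  # test set is rounded up
    n_train = n_samples - n_test               # training set takes the remainder
    return n_train, n_test


print(split_sizes(33938, 0.3))  # (23756, 10182)
```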


```python
# Import the random forest model
from sklearn.ensemble import RandomForestClassifier

# Create a random forest classifier with 200 trees
clf = RandomForestClassifier(n_estimators=200)

# Train the model on the training set
clf.fit(X_train, y_train)

# Predict labels for the test set
y_pred = clf.predict(X_test)

# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model accuracy: how often is the classifier correct?
print("Accuracy1:", metrics.accuracy_score(y_test, y_pred))
```

Output:

```
Accuracy1: 0.012865841681398546
```

What am I missing here? What other approaches or improvements should I try to get good model performance?

EDIT: Here is what the distribution of CoordinateID looks like:

[Image: distribution of CoordinateID]
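When the classes are unevenly represented, one common way to quantify the imbalance is the "balanced" class weight that scikit-learn documents for `class_weight="balanced"` (an option `RandomForestClassifier` accepts): n_samples / (n_classes × count of that class). A minimal stdlib sketch computing it from a label sequence; `balanced_class_weights` is a hypothetical helper, and with pandas `y.value_counts()` gives the underlying counts:

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    # Rare classes get weights above 1, frequent classes get weights below 1.
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Usage (hypothetical): weights = balanced_class_weights(y_train)
# A large spread between min(weights.values()) and max(weights.values())
# indicates strong imbalance.
```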

...