I have a time-series dataset of sensor readings. The goal is to predict CoordinateID.
Here's what the dataset looks like:
Columns: TimeStamp, X, Y, Z, Magnitude and CoordinateID.
There are 33937 samples in my dataset.
I dropped the TimeStamp column and used X, Y, Z and Magnitude as features and CoordinateID as the label.
```python
# Import train_test_split function
from sklearn.model_selection import train_test_split

# df is assumed to be a pandas DataFrame already loaded with the sensor data
X = df[['X', 'Y', 'Z', 'Magnitude']]  # Features
y = df['CoordinateID']                # Labels

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```
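A hedged variant of this split, assuming CoordinateID is a discrete class label: passing `stratify=y` to `train_test_split` keeps each class's share roughly the same in the training and test sets, and `random_state` makes the split repeatable. The DataFrame below is a hypothetical synthetic stand-in (random readings, five labels), not the real sensor data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the sensor DataFrame: random readings and 5 labels.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "X": rng.normal(size=n),
    "Y": rng.normal(size=n),
    "Z": rng.normal(size=n),
})
# Assumption: Magnitude is the Euclidean norm of X, Y, Z.
df["Magnitude"] = np.sqrt(df["X"] ** 2 + df["Y"] ** 2 + df["Z"] ** 2)
df["CoordinateID"] = rng.integers(0, 5, size=n)

X = df[["X", "Y", "Z", "Magnitude"]]  # Features
y = df["CoordinateID"]                # Labels

# stratify=y keeps each class's share roughly equal in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Compare per-class shares between the two splits.
train_share = y_train.value_counts(normalize=True).sort_index()
test_share = y_test.value_counts(normalize=True).sort_index()
print((train_share - test_share).abs().max())
```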
When I tried a RandomForestClassifier as an experiment, I got an accuracy of 0.012865841681398546.
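To judge whether 0.0129 is meaningful, it helps to compare it with trivial baselines. With k equally frequent classes, random guessing scores about 1/k, and 0.0129 is roughly 1/78, so an accuracy this low would be consistent with chance level if CoordinateID had around 78 balanced classes. The sketch below uses scikit-learn's `DummyClassifier` on hypothetical labels (80 balanced classes, an assumed number, not taken from the real data).

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels: 80 equally likely classes (the real class count is unknown here).
rng = np.random.default_rng(0)
n_classes = 80
y_train = rng.integers(0, n_classes, size=7000)
y_test = rng.integers(0, n_classes, size=3000)

# DummyClassifier ignores the features, so a placeholder column is enough.
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

# Uniform guessing over the classes seen in training: expected accuracy ~ 1 / n_classes.
uniform = DummyClassifier(strategy="uniform", random_state=0).fit(X_train, y_train)
# Always predicting the most frequent training class.
majority = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

uniform_acc = accuracy_score(y_test, uniform.predict(X_test))
majority_acc = accuracy_score(y_test, majority.predict(X_test))
print(f"chance 1/k: {1 / n_classes:.4f}")
print(f"uniform dummy: {uniform_acc:.4f}, majority dummy: {majority_acc:.4f}")
```

If the real model does not clearly beat both dummy scores, the features as used are carrying little information about the label.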
I also tried building a neural network from scratch with TensorFlow and Keras (a Sequential model), but the accuracy plateaus and rarely exceeds 14%.
EDIT:
```python
# Import train_test_split function
from sklearn.model_selection import train_test_split

X = df[['X', 'Y', 'Z', 'Magnitude']]  # Features
y = df['CoordinateID']                # Labels

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```
Output:
```
(23756, 4) (10182, 4) (23756,) (10182,)
```
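These shapes follow from how `train_test_split` handles a float `test_size`: the test count is `test_size * n_samples` rounded up, and the remaining rows go to training. The printed shapes add up to 33,938 rows, one more than the 33,937 samples stated above. A small arithmetic check:

```python
import math

n_samples = 23756 + 10182  # rows implied by the printed shapes
test_size = 0.3

# train_test_split rounds the test count up and gives the rest to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_samples, n_train, n_test)  # 33938 23756 10182
```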
```python
# Import the Random Forest model
from sklearn.ensemble import RandomForestClassifier
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Create a Random Forest classifier with 200 trees
clf = RandomForestClassifier(n_estimators=200)

# Train the model on the training set
clf.fit(X_train, y_train)

# Predict labels for the test set
y_pred = clf.predict(X_test)

# Model accuracy: how often is the classifier correct?
print("Accuracy1:", metrics.accuracy_score(y_test, y_pred))
```
Output:
```
Accuracy1: 0.012865841681398546
```
What am I missing here? What other approaches or improvements should I try to get good model performance?
EDIT:
Here's what my distribution of CoordinateID looks like:
[Image: distribution of CoordinateID]
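A minimal way to inspect the same distribution numerically, alongside the plot: `value_counts` gives the number of rows per class, and `normalize=True` turns those counts into shares. The largest share is the accuracy of always predicting the majority class, and `1 / nunique()` is the accuracy of uniform guessing. The labels here are hypothetical placeholders; with the real data, `y` would be `df["CoordinateID"]`.

```python
import pandas as pd

# Hypothetical labels standing in for df["CoordinateID"].
y = pd.Series([3, 3, 3, 7, 7, 12, 12, 12, 12, 5], name="CoordinateID")

counts = y.value_counts()                # rows per class, most frequent first
shares = y.value_counts(normalize=True)  # same ordering, as fractions of all rows
n_classes = y.nunique()

print(counts)
print(f"{n_classes} classes; largest class share {shares.iloc[0]:.2f}; "
      f"uniform chance {1 / n_classes:.2f}")
```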