How can I use inverse document frequencies in code for Naive Bayes text classification?
Here are my update rules for the conditional probabilities:
```python
# Step 7: compute conditional probabilities
import numpy as np

smoothingParameter = 1.0  # Laplace smoothing
# create a table to store the conditional probabilities of each word given the label
tableOfConditionals = np.zeros((len(priors), len(feature_words)))
# iterate over the labels
for i in range(len(priors)):
    # find all examples with the given label
    tempDatas = np.array(vectorized_training)[np.array(y_train) == list(priors.keys())[i], :]
    # sum the data matrix so that we get the count of each word over those documents
    tempDatas = np.sum(tempDatas, axis=0)
    # iterate over the feature words
    for j in range(len(feature_words)):
        tableOfConditionals[i, j] = (tempDatas[j] + smoothingParameter) / (
            np.sum(tempDatas) + smoothingParameter * len(feature_words)
        )
# P(w_j | c_i) is the probability of feature word w_j given class c_i; I computed it as
# (1 + numberOfOccurrencesOfWjInCi) / (totalNumberOfWordsInCi + numberOfFeatureWords).
```
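One common way to bring inverse document frequencies into this setup is to weight the raw term counts by IDF before summing them per class, and then apply the same Laplace-smoothed normalisation. Below is a minimal self-contained sketch of that idea; the toy matrix `X`, labels `y`, and the `log(N / df)` IDF definition are illustrative assumptions, not part of the code above:

```python
import numpy as np

# Toy data (hypothetical): 4 documents x 3 feature words (raw term counts), 2 classes
X = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 3, 1],
              [0, 2, 2]], dtype=float)
y = np.array([0, 0, 1, 1])

# Inverse document frequency: log(N / df), where df = number of documents containing the word
N = X.shape[0]
df = np.count_nonzero(X > 0, axis=0)
idf = np.log(N / df)

# Weight the raw counts by IDF before accumulating per-class totals
X_weighted = X * idf

smoothing = 1.0
n_classes, n_words = 2, X.shape[1]
conditionals = np.zeros((n_classes, n_words))
for c in range(n_classes):
    # IDF-weighted "count" of each word in class c
    class_counts = X_weighted[y == c].sum(axis=0)
    # same Laplace-smoothed normalisation as in the original code
    conditionals[c] = (class_counts + smoothing) / (class_counts.sum() + smoothing * n_words)
```

Each row of `conditionals` still sums to 1, so the rest of the classifier (taking logs and adding the log prior) is unchanged; only the effective counts are rescaled so that words appearing in every document contribute less.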
Help!!