I have written the following code, which reads a CSV file containing a list of words and their sentiment values. A word like "abandon" may have a value of -1, while words like "progress" and "freedom" have a value of +1. The CSV file acts as a database, and the user is asked for a text file containing a speech or an essay to compare against it.
After looking up the sentiment values (ranging from -1 to +1) of the words found in the text, the code builds a histogram. There are five categories: "Negative", "Weakly Negative", "Neutral", "Weakly Positive", and "Positive". We count how often values fall into each range and place them in the corresponding category.
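To make the binning concrete, here is a minimal sketch that uses the same bin edges as the script below. The sample values are made up purely for illustration, and `np.histogram` stands in for the plotting call so the counts can be inspected directly.

```python
import numpy as np

# Hypothetical sentiment values, invented for this illustration only.
values = np.array([-1.0, -0.6, -0.2, 0.0, 0.3, 0.7, 1.0])

# Same bin edges and category labels as in the script.
edges = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
labels = ["Negative", "Weakly Negative", "Neutral", "Weakly Positive", "Positive"]

# Count how many values fall into each half-open bin [left, right);
# only the last bin also includes its right edge.
counts, _ = np.histogram(values, bins=edges)

# Convert raw counts into the fraction of words per category.
fractions = counts / counts.sum()

for label, fraction in zip(labels, fractions):
    print(f"{label}: {fraction:.2f}")
```

Note that with `normed=True` (called `density=True` in newer Matplotlib) the bar heights are densities, i.e. these fractions divided by the bin width of 0.5, rather than the fractions themselves.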
My questions are:
- Can I make this code run any faster?
- Can I make this code any shorter?
import numpy as np
import matplotlib.pyplot as plot
# Load the lexicon: column f0 is the word, column f1 its sentiment value.
# 'U24' (unicode) so the words compare equal to the str tokens read below.
lexiconSentiment = np.genfromtxt("list_of_lexicon_value.csv", delimiter=',', dtype=[('f0', 'U24'), ('f1', '<f8')])
userInput = input("Enter the file-name: ")
textFileIntoArray = np.genfromtxt(userInput, delimiter=' ', dtype='str')
# True for every lexicon word that also appears in the text
booleanValues = np.in1d(lexiconSentiment['f0'], textFileIntoArray)
listOfSentimentNumberValue = []
# Loop over the size of booleanValues
for x in range(0, booleanValues.size):
    if booleanValues[x]:
        listOfSentimentNumberValue.append(lexiconSentiment['f1'][x])
        print(lexiconSentiment['f0'][x])
xLabelDescription = ["Negative", "Weakly Negative", "Neutral", "Weakly Positive", "Positive"]
plot.xlabel("Sentiment")
plot.ylabel("Percent of Words")
# Explicit bin edges define the five categories; density=True replaces the removed normed=True
plot.hist(listOfSentimentNumberValue, bins=(-1.0, -0.5, 0.0, 0.5, 1.0, 1.5), color='blue', density=True)
# Place each label at the centre of its bin
label_pos = [-0.75, -0.25, 0.25, 0.75, 1.25]
plot.xticks(label_pos, xLabelDescription)
plot.show()
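
For comparison, the explicit loop over `booleanValues` could probably be replaced by boolean-mask indexing. This is only a sketch under assumptions: the tiny `lexicon` and `text_words` arrays are hypothetical stand-ins for the real files, and `np.isin` is used as the newer spelling of `np.in1d`.

```python
import numpy as np

# Hypothetical stand-ins for the lexicon CSV and the tokenized text file.
lexicon = np.array(
    [("abandon", -1.0), ("progress", 1.0), ("freedom", 1.0)],
    dtype=[("f0", "U24"), ("f1", "<f8")],
)
text_words = np.array(["freedom", "and", "progress"])

# True for every lexicon word that also occurs in the text.
mask = np.isin(lexicon["f0"], text_words)

# Indexing with the mask selects all matches at once, without a Python loop.
matched_words = lexicon["f0"][mask]
matched_values = lexicon["f1"][mask]

print(matched_words)
print(matched_values)
```

The resulting `matched_values` array can be passed to `plot.hist` in place of the list built by the loop.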