python - Issues while encoding and decoding Arabic in the terminal

- January 15, 2015

In my cosine similarity script, I first need to convert Arabic strings to vectors before computing the cosine similarity. I run it in a terminal under Linux. The problem is that after the conversion, the Arabic strings are printed as:

[u'\u0627\u0644\u0634\u0645\u0633 \u0645\u0634\u0631\u0642\u0647 \u0646\u0647\u0627\u0631\u0627', u'\u0627\u0644\u0633\u0645\u0627\u0621 \u0632\u0631\u0642\u0627\u0621']

My script:

    # -*- coding: utf-8 -*-
    import numpy as np
    import numpy.linalg as la
    from nltk.corpus import stopwords
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

    train_set = ["السماء زرقاء", "الشمس مشرقه نهارا"]  # documents
    test_set = ["الشمس التى فى السماء مشرقه", "السماء زرقاء"]  # query
    stop_words = stopwords.words('english')

    vectorizer = CountVectorizer(stop_words=stop_words)
    transformer = TfidfTransformer()

    trainvectorizerarray = vectorizer.fit_transform(train_set).toarray()
    testvectorizerarray = vectorizer.transform(test_set).toarray()
    print 'Fit vectorizer to train set', trainvectorizerarray
    print 'Transform vectorizer to test set', testvectorizerarray

    # cosine similarity of two vectors, rounded to 3 decimals
    cx = lambda a, b: round(np.inner(a, b) / (la.norm(a) * la.norm(b)), 3)

    for vector in trainvectorizerarray:
        print vector
        for testv in testvectorizerarray:
            print testv
            cosine = cx(vector, testv)
            print cosine

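Your result is a list of strings. Printing a list in Python 2 shows the repr of each element, which escapes non-ASCII characters as \uXXXX. Join the strings and print the result, and you get clear sentences: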

    >>> print "\n".join(a)
    الشمس مشرقه نهارا
    السماء زرقاء
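
The same idea as a small standalone script. This is a minimal sketch that assumes `a` holds the list shown in the question; the name `joined` is only illustrative. It calls `print(...)` with a single argument so it runs on both Python 2 and Python 3. On Python 3 the list itself already prints as readable text, so the escaped form appears only on Python 2.

```python
# -*- coding: utf-8 -*-
# The list from the question: Unicode strings holding Arabic text.
a = [u'\u0627\u0644\u0634\u0645\u0633 \u0645\u0634\u0631\u0642\u0647 \u0646\u0647\u0627\u0631\u0627',
     u'\u0627\u0644\u0633\u0645\u0627\u0621 \u0632\u0631\u0642\u0627\u0621']

# Printing the list shows the repr of each element; on Python 2 that
# repr escapes every non-ASCII character as \uXXXX.
print(a)

# Joining the elements gives one string, which is printed as text.
joined = u"\n".join(a)
print(joined)

# Equivalent: print each element on its own line.
for sentence in a:
    print(sentence)
```

The data is not damaged in either case: the escapes and the Arabic text are two displays of the same strings, so no encoding or decoding step is needed before the cosine similarity.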
