- Corpus and NLP for Text Analytics
- Crash course for R: Regular Expression
Shu-Kai Hsieh (謝舒凱), Graduate Institute of Linguistics, NTU
R itself can be slow, BUT it is possible to use libraries written in lower-level and hence faster languages (such as C and Java), while writing your code in R and taking advantage of its functional programming style and its many other libraries for data analysis.
```r
# works from CRAN!
install.packages("coreNLP")

# wget http://nlp.stanford.edu/software/stanford-corenlp-full-2015-04-20.zip
download.file("http://nlp.stanford.edu/software/stanford-corenlp-full-2015-04-20.zip",
              destfile = "stanford-corenlp-full-2015-04-20.zip")
unzip("stanford-corenlp-full-2015-04-20.zip")

library(coreNLP)
initCoreNLP("stanford-corenlp-full-2015-04-20/")

FB <- c("Facebook is looking for new ways to get users to share more, rather than just consume content, in a push that seemingly puts it in more direct rivalry with Twitter.")

output <- annotateString(FB)
getToken(output)[, c(1:3, 6:7)]  # token-level annotations (word, lemma, POS, ...)
getParse(output)                 # constituency parse tree
getDependency(output)            # dependency relations
getSentiment(output)             # sentence-level sentiment
getCoreference(output)           # coreference chains
```
qdap (Quantitative Discourse Analysis Package) is an R package designed to assist in quantitative discourse analysis.
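As a minimal sketch of what qdap offers (assuming the CRAN `qdap` package, which depends on Java via rJava; the sample sentences here are made up for illustration):

```r
library(qdap)

# two toy text units to analyze
sents <- c("I love this perfume, it is wonderful.",
           "The scent is far too strong and unpleasant.")

# polarity() scores each text unit on a negative-to-positive sentiment scale
polarity(sents)

# freq_terms() lists the most frequent terms across the texts
freq_terms(sents, top = 5)
```

Both functions return objects with print and plot methods, so the results can be inspected or visualized directly.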
```r
comments <- read.table("perfumes_comments.csv", header = TRUE,
                       sep = "\t", dec = ".", quote = "\"")
summary(comments)

# inspect 10 random rows of the data set
x <- sample(nrow(comments), 10, replace = FALSE)
comments[x, ]

# proportion of comments mentioning "strong"
strong <- grepl("strong", comments$Comment, ignore.case = TRUE)
sum(strong) / nrow(comments)

# proportion of comments mentioning "sweet" or "soft"
sweet <- grepl("sweet|soft", comments$Comment, ignore.case = TRUE)
sum(sweet) / nrow(comments)
```
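The grepl() calls above only test whether a pattern matches; base R also provides regex functions for replacing and extracting matches. A small self-contained sketch (the comment strings here are invented, not from the perfume data set):

```r
x <- c("This perfume is too strong!",
       "A sweet, soft scent.",
       "Lasts 8 hours on my skin.")

# grepl(): one logical value per element
grepl("sweet|soft", x, ignore.case = TRUE)  # FALSE TRUE FALSE

# gsub(): replace all matches -- here, strip punctuation
gsub("[[:punct:]]", "", x)

# regmatches() + regexpr(): extract the first match per element
regmatches(x, regexpr("[0-9]+", x))  # "8"
```

Note that regmatches() with regexpr() silently drops elements with no match, so the last call returns a single string.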