web - How to extract information on the review period of a journal using R
I want to survey the review period of the journal Scientific Reports, http://www.nature.com/srep/articles. I want to extract the submission time and the acceptance time of each article within a time window (or for the most recent 100 articles). Is there any suggestion of how to do this in R? The solution can be simple, as I have never used R for web scraping before. Any hints would be quite helpful.
Here's what you can try.
Compile the links in a CSV file. The only thing that changes between the links is the srep ID.
At the end, the data should look as shown below:
    > head(links)
                                         links
    1 http://www.nature.com/articles/srep20000
    2 http://www.nature.com/articles/srep20001
    3 http://www.nature.com/articles/srep20002
    4 http://www.nature.com/articles/srep20003
    5 http://www.nature.com/articles/srep20004
    6 http://www.nature.com/articles/srep20005
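If you would rather not assemble the CSV by hand, something like the sketch below could generate it. It assumes the article IDs are consecutive integers appended to "srep"; the range 20000:20005 is only taken from the example above and is not guaranteed to match the articles you actually want, so adjust `ids` as needed.

```r
# Hypothetical ID range, copied from the example links; change as needed.
ids <- 20000:20005

# sprintf("%d") avoids scientific notation for round numbers such as 100000.
links <- data.frame(
  links = sprintf("http://www.nature.com/articles/srep%d", ids),
  stringsAsFactors = FALSE
)

# Write a one-column CSV with a header row named "links".
write.csv(links, "link.csv", row.names = FALSE)
```

The resulting file has a single column, so it can still be read back with "~" as the separator, since that character does not occur in the URLs.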
Then run the following code:
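    library(rvest)

    # Read the one-column file of URLs; "~" never occurs in a URL,
    # so each line is kept as a single field.
    links <- read.csv("link.csv", header = TRUE, sep = "~")

    for (i in seq_len(nrow(links))) {
      # Download and parse the article page
      page <- read_html(as.character(links[i, 1]))
      # Received (submission) date
      links[i, 2] <- page %>% html_node("dd:nth-child(2) time") %>% html_text() %>% as.character()
      # Accepted date
      links[i, 3] <- page %>% html_node("dd:nth-child(4) time") %>% html_text() %>% as.character()
    }

    colnames(links)[2] <- "received"
    colnames(links)[3] <- "accepted"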
You'll get results like these:
    > head(links)
                                         links         received         accepted
    1 http://www.nature.com/articles/srep20000  15 October 2015 22 December 2015
    2 http://www.nature.com/articles/srep20001  21 October 2015 22 December 2015
    3 http://www.nature.com/articles/srep20002  20 October 2015 22 December 2015
    4 http://www.nature.com/articles/srep20003 10 November 2015 22 December 2015
    5 http://www.nature.com/articles/srep20004 15 November 2015 22 December 2015
    6 http://www.nature.com/articles/srep20005 09 November 2015 22 December 2015
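To turn the two date columns into an actual review period, one option is to parse them and subtract. The sketch below is not part of the original answer: `parse_dmy` is a hypothetical helper that assumes dates of the form "15 October 2015" with English month names, and it matches against R's built-in `month.name` so the result does not depend on the system locale. The two sample rows are copied from the output above.

```r
# Hypothetical helper: parse "day Month year" strings into Date objects.
parse_dmy <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  day   <- as.integer(vapply(parts, `[`, character(1), 1))
  # Case-insensitive lookup of the month name in the built-in month.name
  month <- match(tolower(vapply(parts, `[`, character(1), 2)),
                 tolower(month.name))
  year  <- as.integer(vapply(parts, `[`, character(1), 3))
  as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
}

# Two rows taken from the scraped results
reviews <- data.frame(
  received = c("15 October 2015", "10 November 2015"),
  accepted = c("22 December 2015", "22 December 2015"),
  stringsAsFactors = FALSE
)

# Review period in whole days: 68 and 42 for these two rows
reviews$review_days <- as.integer(
  parse_dmy(reviews$accepted) - parse_dmy(reviews$received)
)
```

From there, `summary(reviews$review_days)` or a histogram gives a quick picture of how long the review typically takes.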
Note: the more URLs you have, the longer the code takes to complete. Also, if the site doesn't allow bot activity on its pages, this approach won't be able to get the info and you'll have to use an alternate way.
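Since the loop requests every page in sequence, it may help to pause between requests and to skip pages that fail instead of aborting the whole run. Here is a minimal sketch of that idea; `fetch_politely`, its `delay` argument, and the stand-in `fake_fetch` are all hypothetical names introduced for illustration. In real use you would pass `rvest::read_html` as `fetch` and pick a delay the site is likely to tolerate.

```r
# Hypothetical wrapper: call `fetch` on each URL, pausing `delay` seconds
# before each request and returning NULL for any URL that raises an error.
fetch_politely <- function(urls, fetch, delay = 1) {
  lapply(urls, function(u) {
    Sys.sleep(delay)
    tryCatch(
      fetch(u),
      error = function(e) {
        message("Skipping ", u, ": ", conditionMessage(e))
        NULL
      }
    )
  })
}

# Stand-in fetcher so the example runs without network access:
# it fails for any URL containing "bad" and upper-cases the rest.
fake_fetch <- function(u) {
  if (grepl("bad", u)) stop("request blocked")
  toupper(u)
}

out <- fetch_politely(c("a", "bad", "c"), fake_fetch, delay = 0)
```

The result is a list with one entry per URL, where failed pages show up as NULL and can be retried or dropped later.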