web - How to extract information of review period from a journal using R -


I want to survey the review period of the journal Scientific Reports, http://www.nature.com/srep/articles. I want to extract the submission date and acceptance date of each article within a time window (or for the most recent 100 articles). Are there any suggestions for how to do this in R? The solution can be simple, since I have never used R for web scraping before. Any hints would be quite helpful.

Here's what you can try.

Compile the links in a CSV file. The only change you will see in the links is the srep ID at the end, as shown below:

    > head(links)
                                         links
    1 http://www.nature.com/articles/srep20000
    2 http://www.nature.com/articles/srep20001
    3 http://www.nature.com/articles/srep20002
    4 http://www.nature.com/articles/srep20003
    5 http://www.nature.com/articles/srep20004
    6 http://www.nature.com/articles/srep20005
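If you would rather not type the links by hand, the file can be generated. The following is only a sketch: the ID range `20000:20005` is taken from the sample above, and it assumes every ID in the range corresponds to a real article, which may not hold. The `~` separator is chosen to match the one passed to `read.csv` in the code further down.

```r
# Hypothetical ID range; adjust it to the articles you want to survey
ids <- 20000:20005

# Build one URL per srep ID
links <- data.frame(
  links = paste0("http://www.nature.com/articles/srep", ids),
  stringsAsFactors = FALSE
)

# Write a "links" header followed by one URL per line
write.table(links, "link.csv", sep = "~", row.names = FALSE, quote = FALSE)
```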

Then run the following code:

    library(rvest)

    # Read the list of article URLs (header row present, "~" as separator)
    links <- read.csv("link.csv", header = TRUE, sep = "~")

    for (i in 1:nrow(links)) {
      # Download and parse the article page
      page <- read_html(as.character(links[i, 1]))

      # Received (submission) date
      links[i, 2] <- page %>%
        html_node("dd:nth-child(2) time") %>%
        html_text() %>%
        as.character()

      # Accepted date
      links[i, 3] <- page %>%
        html_node("dd:nth-child(4) time") %>%
        html_text() %>%
        as.character()
    }

    colnames(links)[2] <- "received"
    colnames(links)[3] <- "accepted"

You'll get results like these:

    > head(links)
                                         links         received         accepted
    1 http://www.nature.com/articles/srep20000  15 October 2015 22 December 2015
    2 http://www.nature.com/articles/srep20001  21 October 2015 22 December 2015
    3 http://www.nature.com/articles/srep20002  20 October 2015 22 December 2015
    4 http://www.nature.com/articles/srep20003 10 November 2015 22 December 2015
    5 http://www.nature.com/articles/srep20004 15 November 2015 22 December 2015
    6 http://www.nature.com/articles/srep20005 09 November 2015 22 December 2015
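Since the question is about the review period, the two date columns still need to be turned into a number of days. Here is a minimal sketch, assuming the scraped strings look like "15 October 2015" (day, English month name, year). `parse_dmy` is a hypothetical helper, not part of rvest; it matches month names against R's built-in `month.name` so the result does not depend on the locale.

```r
# Hypothetical helper: parse "15 October 2015" style strings into Date objects
parse_dmy <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  day <- vapply(parts, function(p) as.integer(p[1]), integer(1))
  mon <- vapply(parts, function(p) match(tolower(p[2]), tolower(month.name)), integer(1))
  yr  <- vapply(parts, function(p) as.integer(p[3]), integer(1))
  # Strings that cannot be parsed end up as NA rather than raising an error
  as.Date(sprintf("%04d-%02d-%02d", yr, mon, day), format = "%Y-%m-%d")
}

# Two rows copied from the sample output, standing in for the scraped data frame
links <- data.frame(
  received = c("15 October 2015", "10 November 2015"),
  accepted = c("22 December 2015", "22 December 2015"),
  stringsAsFactors = FALSE
)

# Review period in days for each article
links$review_days <- as.numeric(parse_dmy(links$accepted) - parse_dmy(links$received))
```

From there, `summary(links$review_days)` gives the median and range over all scraped articles.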

Note: The more URLs you have, the longer the code takes to complete. If the site doesn't allow bot activity on its pages, you won't be able to get the information without using an alternate way.
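For long runs, it can help to pause between requests and to keep going when a single page fails instead of aborting the whole loop. The following is a rough sketch, not a way around a site's bot protection: `with_retry` is a hypothetical helper, and the `tries` and `delay` defaults are arbitrary assumptions.

```r
# Hypothetical helper: try to fetch a page a few times, pausing between attempts.
# Returns NULL if every attempt fails, so the caller can skip that URL.
with_retry <- function(url,
                       fetch = function(u) rvest::read_html(u),
                       tries = 3,
                       delay = 1) {
  for (attempt in seq_len(tries)) {
    page <- tryCatch(fetch(url), error = function(e) NULL)
    if (!is.null(page)) {
      return(page)
    }
    Sys.sleep(delay)  # wait before the next attempt
  }
  NULL
}
```

Inside the loop, `page <- with_retry(as.character(links[i, 1]))` followed by `if (is.null(page)) next` would skip articles that could not be downloaded.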

