
[R] Fetch PubMed abstracts by article ID (PMID) and render them as a word cloud

Posted on 2017-7-10 09:11:38
Quite interesting, and worth playing with.
Code (R):
##--------------------------------------------------------
## Given a list of PMIDs, get their annotation and build a tag cloud
## Aedin, Dec 2011 (in response to a student question on the CCCB BioC course)
## To run, pass pmids2tagcloud a vector of PMIDs, e.g.
## pmids=c(10521349, 10582678, 11004666, 11108479, 11108479, 11114790, 11156382, 11156382, 11156382, 11165872)
## pmids2tagcloud(pmids)
## ---------------------------------------------------------
getPMIDAnnot <- function(pmidlist) {
    library(annotate)   # pubmed(), buildPubMedAbst() and the pubMedAbst accessors
    library(XML)        # xmlRoot(), xmlChildren()
    print("Using annotate and XML to get info on each PMID")
    # Download the PubMed records; the root has one child node per article returned
    pubmedRes <- xmlRoot(pubmed(pmidlist))
    numAbst <- length(xmlChildren(pubmedRes))
    absts <- lapply(seq_len(numAbst), function(i) buildPubMedAbst(pubmedRes[[i]]))

    # One row per record actually returned (may be fewer than length(pmidlist));
    # [1] keeps every field length 1 (NA when the field is empty)
    getField <- function(accessor) unlist(lapply(absts, function(x) accessor(x)[1]))
    PMIDInfo <- data.frame(FirstAuthor  = getField(authors),
                           Journal      = getField(journal),
                           pubDate      = getField(pubDate),
                           articleTitle = getField(articleTitle),
                           abstText     = getField(abstText),
                           PubMedID     = getField(pmid),
                           stringsAsFactors = FALSE)
    rownames(PMIDInfo) <- PMIDInfo$PubMedID
    return(PMIDInfo)
}



pmids2tagcloud <- function(pmids, pdfFilename=NULL, pngFilename=NULL, addTitle=TRUE) {
    library(tm)          # stopwords()
    library(wordcloud)   # wordcloud()
    if (!is.null(pdfFilename) && !is.null(pngFilename))
        stop("Please provide only one filename, either a pdf OR a png file")
    pmids <- unique(as.character(pmids))   # count each article once
    print(paste("Using tm and wordcloud to create tag cloud from", length(pmids), "abstracts"))
    pubmedAbsts <- getPMIDAnnot(pmids)
    words <- tolower(unlist(strsplit(as.character(pubmedAbsts$abstText), "\\s+")))
    # strip parentheses, commas, [semi-]colons, periods and quotation marks from each word
    words <- gsub("[(),;:.'\"]", "", words)
    # drop missing/empty tokens and pure numbers; logical indexing is used because
    # x[-grep(...)] returns an empty vector when nothing matches
    words <- words[!is.na(words) & words != "" & !grepl("^[0-9]+$", words)]
    words <- words[!words %in% stopwords("english")]
    wt <- table(words)
    # open a file device only if asked, and close only that device on exit;
    # with no filename the cloud stays on the current screen device
    if (!is.null(pdfFilename)) pdf(file=pdfFilename)
    if (!is.null(pngFilename)) png(filename=pngFilename)
    if (!is.null(pdfFilename) || !is.null(pngFilename)) on.exit(dev.off())
    wordcloud(names(wt), as.vector(wt))
    if (addTitle) title(main=paste("TagCloud generated from", length(pmids), "PubMed Abstracts"),
                        sub=paste("PMIDs:", paste(pmids, collapse=" ")),
                        cex.sub=0.5, col.main="green", col.sub="gray")
    invisible(wt)
}
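Because the script needs network access to PubMed, the text-cleaning step is easier to experiment with on its own. Below is a minimal offline sketch of that step, assuming the abstracts are already in a character vector. The function name `count_abstract_words` and its short stopword list (a stand-in for `tm::stopwords("english")`) are hypothetical, for illustration only, and the two input sentences are made up rather than real abstracts.

```r
# Hypothetical helper: an offline sketch of the word-counting step in
# pmids2tagcloud(). Assumes the abstracts are already a character vector and
# uses a small hard-coded stopword list instead of tm::stopwords("english").
count_abstract_words <- function(abstracts,
                                 stop_words = c("the", "of", "and", "in", "a", "to", "is")) {
    # split on runs of whitespace and lower-case every token
    words <- tolower(unlist(strsplit(abstracts, "\\s+")))
    # strip parentheses, commas, [semi-]colons, periods and quotes,
    # keeping the rest of the word
    words <- gsub("[(),;:.'\"]", "", words)
    # logical filter: drop missing/empty tokens, pure numbers and stopwords
    keep <- !is.na(words) & words != "" &
        !grepl("^[0-9]+$", words) & !(words %in% stop_words)
    # word frequencies, most frequent first
    sort(table(words[keep]), decreasing = TRUE)
}

# Usage with two made-up sentences (not real abstracts)
abstracts <- c("The p53 gene is mutated in many tumours.",
               "Mutated p53 (TP53) drives tumours, in 2011 data.")
wt <- count_abstract_words(abstracts)
print(wt)
```

The result is a named frequency table, so it can be handed to `wordcloud::wordcloud(names(wt), as.vector(wt))` just like `wt` inside `pmids2tagcloud()`. Keep in mind that `wordcloud()` only draws words occurring at least `min.freq = 3` times by default, so for a handful of abstracts a smaller `min.freq` may be needed. The logical filter also avoids a classic R pitfall: `x[-grep("^[0-9]+$", x)]` returns `character(0)` when no element of `x` matches, silently emptying the word list.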



生信技能树 (粤ICP备15016384号)
