[Beginner Python Exercise 12] [ROSALIND-GRPH] Overlap Graphs

Posted on 2016-10-31 07:15:49
Last edited by xuehzh95 on 2016-10-31 07:17

[12]Overlap Graphs http://rosalind.info/problems/grph/

A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.

A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail v and head w is represented by (v,w) (but not by (w,v)). A directed loop is a directed edge of the form (v,v).
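
As a small illustration of these definitions, a directed graph can be held as a list of (tail, head) pairs, one row per edge. This is only a sketch: the node labels "v", "w", "x" and the helper name `directed_loops` are made up for the example and are not part of the problem.

```python
# Adjacency list of a directed graph: one (tail, head) pair per edge.
# The labels "v", "w", "x" are hypothetical.
edges = [("v", "w"), ("w", "x"), ("x", "x")]

def directed_loops(edge_list):
    """Return the edges of the form (v, v)."""
    return [(tail, head) for tail, head in edge_list if tail == head]

# Orientation matters: (v, w) is an edge here, (w, v) is not.
has_vw = ("v", "w") in edges
has_wv = ("w", "v") in edges

# Print the adjacency list, one edge per row.
for tail, head in edges:
    print(tail, head)
```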

For a collection of strings and a positive integer k, the overlap graph for the strings is a directed graph O_k in which each string is represented by a node, and string s is connected to string t with a directed edge when there is a length k suffix of s that matches a length k prefix of t, as long as s≠t; we demand s≠t to prevent directed loops in the overlap graph (although directed cycles may be present).
Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.
Return: The adjacency list corresponding to O_3.  You may return edges in any order.
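
The definition of O_k translates fairly directly into a double loop over the strings. The sketch below is one possible reading, not a reference solution: the function name `overlap_edges`, the `toy` dictionary and its labels are assumptions made for illustration, and every string is assumed to be at least k characters long.

```python
def overlap_edges(strings, k):
    """Return (s_name, t_name) pairs where the length-k suffix of s
    equals the length-k prefix of t and s != t."""
    edges = []
    for s_name, s in strings.items():
        for t_name, t in strings.items():
            # s != t rules out directed loops, as the problem demands.
            if s != t and s[-k:] == t[:k]:
                edges.append((s_name, t_name))
    return edges

# Hypothetical toy input: label -> DNA string.
toy = {"a": "AAATAAA", "b": "AAATTTT", "c": "TTTTCCC"}

for tail, head in overlap_edges(toy, 3):
    print(tail, head)
```

Comparing every ordered pair costs time quadratic in the number of strings, which is usually acceptable for an input of at most 10 kbp.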


Sample Dataset
>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG

Sample Output
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
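
To check the sample by hand, the FASTA text can be parsed into a label-to-sequence mapping and the O_3 condition applied to every ordered pair. This is a minimal sketch under stated assumptions: `parse_fasta` and `SAMPLE` are names invented here, and the parser only handles plain `>label` headers followed by sequence lines.

```python
SAMPLE = """\
>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
"""

def parse_fasta(text):
    """Parse FASTA text into a dict mapping each label to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line  # a sequence may span several lines
    return records

records = parse_fasta(SAMPLE)

# Edges of O_3: the length-3 suffix of s equals the length-3 prefix of t, s != t.
edges = [(s, t) for s in records for t in records
         if records[s] != records[t] and records[s][-3:] == records[t][:3]]

for tail, head in edges:
    print(tail, head)
```

Rosalind_5013 (GGGTGGG) has a suffix matching its own prefix, but it produces no edge because s≠t is required.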








Posted on 2018-3-28 23:51:12
## overlap ##
from collections import OrderedDict
import re

def overlap_graph(dna, n):
    # dna is an iterable of (name, sequence) pairs
    edges = []
    for ke1, val1 in dna:
        for ke2, val2 in dna:     # take care with names when defining and calling the function
            if ke1 != ke2 and val1[-n:] == val2[0:n]:
                edges.append(ke1 + '\t' + ke2)
    return edges

seq = OrderedDict()

with open('/Users/qjy543/data/test/testdata/sampledatabase12.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('>'):
            seqName = re.sub('>', '', line)
            seq[seqName] = ''
            continue
        seq[seqName] += line.upper()  # note how the if is used

dna = seq.items()

# write one edge per line; the with block closes the file
with open('rosalind_grph_output.txt', 'wt') as fh:
    for x in overlap_graph(dna, 3):
        fh.write(x + '\n')


Posted on 2016-10-31 14:47:06
Last edited by bioinfo.dong on 2016-10-31 14:54

Here is my code. It looks fine to me, but submitting the answer always gives an error, and I can't figure out what is going on.

### 12. Overlap Graphs ###
from collections import OrderedDict
import re

def overlap_graph(dna,n):
    edges = []
    for ke1, val1 in dna.items():
        for ke2, val2 in dna.items():
            if ke1 != ke2 and val1[-n:] == val2[:n]:
                edges.append(ke1+'\t'+ke2)
    return edges

seq = OrderedDict()
with open('/home/dong/Documents/rosalind_grph.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('>'):
            seqName = re.sub('>','',line)
            seq[seqName] = ''
            continue
        seq[seqName] += line.upper()
        
dna = seq.items()
fh = open('rosalind_grph_output.txt', 'wt')
for x in overlap_graph(dna,3):
    fh.write(x+'\n')
    
fh.close()
Original poster | Posted on 2016-10-31 15:35:42
seq_list = []
stseq = ''
with open('rosalind_grph.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('>'):
            # new header: store the previous record, if there is one
            if stseq != '':
                seq_list.append([stname, stseq])
                stseq = ''
            stname = line[1:]
        else:
            stseq = stseq + line
seq_list.append([stname, stseq])  # store the last record
l = len(seq_list)

# visit each unordered pair once and test both edge directions
for i in range(0, l):
    for j in range(0, i):
        if seq_list[i][1] == seq_list[j][1]:
            continue  # identical strings: s != t is required
        if seq_list[i][1][0:3] == seq_list[j][1][-3:]:
            print(seq_list[j][0], seq_list[i][0])
        if seq_list[i][1][-3:] == seq_list[j][1][0:3]:
            print(seq_list[i][0], seq_list[j][0])
Posted on 2017-8-3 10:05:09
bioinfo.dong, posted on 2016-10-31 14:47:
Here is my code. It looks fine to me, but submitting the answer always gives an error, and I can't figure out what is going on.

### 12. O ...

Line 23: dna=seq.items()
Line 07: dna.items()?
Could this be where the problem is?
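
The double `.items()` call can be reproduced in isolation. The snippet below is a small sketch with made-up in-memory data standing in for the parsed file; it only shows that the result of `seq.items()` has no `.items()` method of its own.

```python
from collections import OrderedDict

# Hypothetical stand-in for the parsed FASTA records.
seq = OrderedDict([("Rosalind_0498", "AAATAAA"), ("Rosalind_2391", "AAATTTT")])

dna = seq.items()      # what line 23 of the quoted code does

try:
    dna.items()        # what line 07 then attempts inside overlap_graph
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__, error)
```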
Posted on 2018-4-27 22:49:40
I got the code written. Could some expert explain to me what this problem means and what it is used for?
Posted on 2018-12-2 13:49:04
bioinfo.dong, posted on 2016-10-31 14:47:
Here is my code. It looks fine to me, but submitting the answer always gives an error, and I can't figure out what is going on.

### 12. O ...

Try removing dna=seq.items(); it has already been declared earlier.
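
One way to apply this suggestion is sketched below. It assumes that, besides dropping the extra `.items()`, the call is changed to pass the `OrderedDict` itself (`seq`) to `overlap_graph`, which the reply does not spell out. The in-memory records are hypothetical stand-ins for the parsed file.

```python
from collections import OrderedDict

def overlap_graph(dna, n):
    # dna is a mapping name -> sequence; .items() is called only here
    edges = []
    for ke1, val1 in dna.items():
        for ke2, val2 in dna.items():
            if ke1 != ke2 and val1[-n:] == val2[:n]:
                edges.append(ke1 + '\t' + ke2)
    return edges

# Hypothetical stand-in for the records read from rosalind_grph.txt.
seq = OrderedDict([
    ("Rosalind_0498", "AAATAAA"),
    ("Rosalind_2391", "AAATTTT"),
    ("Rosalind_2323", "TTTTCCC"),
])

# Pass the mapping directly instead of seq.items().
result = overlap_graph(seq, 3)
for edge in result:
    print(edge)
```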