mika04 / Optimized Nussinov Algorithm
Commit 7d62705b authored 2 months ago by Mika Cankosyan
README.txt
parent 15b2425c
Showing 2 changed files with 36 additions and 8 deletions
README.txt: 33 additions, 0 deletions
nussinov.py: 3 additions, 8 deletions
README.txt (+33 −0)
Table of contents
Introduction
Program description
How to run
Introduction
The Nussinov algorithm is one of the earliest well-known algorithms for RNA secondary structure prediction. Given an RNA primary sequence, it finds
a secondary structure with the maximum number of complementary base-pairs and no pseudoknots. It is a simple algorithm based on dynamic
programming. It works best for short sequences whose structures contain no pseudoknots; it is less reliable on longer sequences and cannot predict
pseudoknots. It is deterministic, meaning it outputs the same secondary structure every time. This is good for consistency and ease of debugging,
but it means it cannot assign probabilities to alternative structures, even though some RNA molecules can take on multiple different
secondary structures. Since it only "cares" about the number of complementary base-pairs, it doesn't account for other
biological considerations such as thermodynamic stability.
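As a rough illustration of the recurrence described above, here is a minimal sketch of the base-pair maximization step. It is not the code in nussinov.py: the function name `nussinov_max_pairs` is hypothetical, it returns only the score (no backtrace), and it assumes no minimum loop length between paired bases.

```python
def nussinov_max_pairs(seq):
    """Return the maximum number of nested A-U / G-C pairs in seq (sketch)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n < 2:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j];
    # cells with i > j are never written and stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):              # fill by increasing subsequence length
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j],      # base i left unpaired
                       dp[i][j - 1])      # base j left unpaired
            if (seq[i], seq[j]) in pairs: # bases i and j pair with each other
                best = max(best, dp[i + 1][j - 1] + 1)
            for k in range(i + 1, j):     # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]
```

The triple loop makes the running time cubic in the sequence length, which is one reason the approach suits short sequences best.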
Program description
This program implements the original Nussinov algorithm as well as optimized and probabilistic variants of it. The original version uses
dynamic programming. In a nutshell, given an RNA primary sequence, it constructs a dynamic programming table and a backtrace table, and then
follows the backtrace table to recover the optimal secondary structure (i.e. the one with the maximum number of base-pairs). The optimized version,
instead of giving every complementary base-pair (i.e. A-U and G-C) a score of 1, assigns custom scores to those as well as to one more possible
base-pair, G-U. By default, A-U is given a score of 1, G-C 1.5, and G-U 0.5, but these can be adjusted to what works best. The probabilistic
version, instead of storing a single choice in each backtrace entry to generate only the one optimal path, stores multiple paths in each entry and
backtraces probabilistically, so the output is a secondary structure chosen at random, with higher-scoring paths more likely to be picked.
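The custom scoring of the optimized version could look roughly like the sketch below. The default values (A-U 1, G-C 1.5, G-U 0.5) come from the description above; the names `DEFAULT_PAIR_SCORES`, `pair_score`, and `best_score` are hypothetical and are not taken from nussinov.py.

```python
# Default pair scores from the README; keys are unordered pairs of bases.
DEFAULT_PAIR_SCORES = {
    frozenset("AU"): 1.0,
    frozenset("GC"): 1.5,
    frozenset("GU"): 0.5,
}

def pair_score(a, b, scores=DEFAULT_PAIR_SCORES):
    """Score for pairing bases a and b; 0.0 if they cannot pair."""
    return scores.get(frozenset((a, b)), 0.0)

def best_score(seq, scores=DEFAULT_PAIR_SCORES):
    """Maximum total pair score over nested structures of seq (sketch)."""
    n = len(seq)
    if n < 2:
        return 0.0
    dp = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            s = pair_score(seq[i], seq[j], scores)
            if s > 0:                                # pair i with j
                best = max(best, dp[i + 1][j - 1] + s)
            for k in range(i + 1, j):                # bifurcation
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]
```

Passing a different `scores` dictionary is one way the defaults could be adjusted without touching the recurrence itself.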
How to run
Start the program with a Python interpreter, e.g. "python3 nussinov.py". You will be asked to enter the RNA primary sequence and whether you want
to use the optimized and/or probabilistic version. The program then displays the dynamic programming table and the optimal or
probabilistically chosen secondary structure, shown in two ways: in hyphen-parentheses notation and as a list of base-pairs. If you enter
something invalid, it will ask you to run the program again.
See the comments in nussinov.py for more details.
\ No newline at end of file
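The two output forms mentioned under "How to run" are related in a simple way. The sketch below converts a dictionary of base-pairs into hyphen-parentheses notation. The function name `base_pairs_to_notation` is hypothetical, and it assumes base-pairs are stored as 0-based index pairs, which may differ from how nussinov.py represents them.

```python
def base_pairs_to_notation(base_pairs, length):
    """Render {i: j} base-pairs as a string of '(', ')' and '-' (sketch)."""
    coded = ["-"] * length            # unpaired positions stay as hyphens
    for i, j in base_pairs.items():
        left, right = min(i, j), max(i, j)
        coded[left] = "("             # the 5' partner opens the pair
        coded[right] = ")"            # the 3' partner closes it
    return "".join(coded)

# Example: three nested pairs in a 9-base sequence.
print(base_pairs_to_notation({0: 8, 1: 7, 5: 6}, 9))  # prints ((---()))
```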
nussinov.py (+3 −8)
@@ -24,8 +24,8 @@ def exit_with_error(error):
 # just give u all the tied best scores, whereas running with "optimal" always chooses the same one of the tied-best scores/structures)
 def scores_to_weights(scores):
-    return [(score + 1) ** 3 for score in scores] # cubed to make good scores especially likely, + 1 bc we can't have all weights = 0, and the + 1 before cubing so that
-    # values 0 < v < 1 are still appropriately scaled up
+    return [(score + 1) ** 3 for score in scores] # cubed to make good scores especially likely, + 1 bc we can't have all weights = 0 and so that
+    # scores 0 < v < 1 are still appropriately scaled up
 def is_valid_rna_sequence(rna_sequence):
     valid_chars = {'A', 'U', 'G', 'C'}
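To make the effect of the `(score + 1) ** 3` weighting concrete, here is a small sketch of a weighted choice among candidate paths. `scores_to_weights` mirrors the function in the hunk above; `choose_path` and its use of `random.choices` are assumptions for illustration and may not match how nussinov.py actually draws a path.

```python
import random

def scores_to_weights(scores):
    # Same formula as in the diff: shift by 1 so no weight is 0, then cube
    # so that higher scores become disproportionately likely.
    return [(score + 1) ** 3 for score in scores]

def choose_path(candidates, rng=random):
    """Pick one path from a list of (path, score) tuples, weighted by score."""
    paths = [path for path, _ in candidates]
    scores = [score for _, score in candidates]
    weights = scores_to_weights(scores)
    return rng.choices(paths, weights=weights, k=1)[0]

# Scores 0, 0.5 and 1 map to weights 1, 3.375 and 8.
print(scores_to_weights([0, 0.5, 1]))
```

With these weights a path scoring 1 is eight times as likely to be drawn as a path scoring 0, rather than merely twice as likely under a plain `score + 1` weighting.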
@@ -56,7 +56,6 @@ def probabilistic_bt_to_chosen_bt(bt, i, j):
     length = j - i + 1
     while (bt[i][j] != []):
-        # print(i, j, bt[i][j]) # debugging
         paths = [path for path, _ in bt[i][j]]
         scores = [score for _, score in bt[i][j]]
         weights1 = scores_to_weights(scores)
@@ -71,7 +70,6 @@ def probabilistic_bt_to_chosen_bt(bt, i, j):
             i += 1
             j -= 1
         else: # bifurcation
-            # print(bt[i][j]) # debugging
             probabilistic_bt_to_chosen_bt(bt, i, bt[i][j])
             probabilistic_bt_to_chosen_bt(bt, bt[i][j] + 1, j)
     return bt
@@ -80,7 +78,6 @@ def probabilistic_bt_to_chosen_bt(bt, i, j):
 # returns a list consisting of '(', ')', and '-' characters
 def bt_to_coded_list(bt, i, j):
-    # print(i, j) # debugging
     length = j - i + 1
     og_i = i
     coded_list = list('-' * length)
@@ -251,15 +248,13 @@ dp_table, bt = nussinov(rna_sequence, optimized, probabilistic)
 print("\n\nthe dynamic programming table:\n")
 print_2d_array(dp_table)
-# print(bt) # debugging
 if (probabilistic):
     bt = probabilistic_bt_to_chosen_bt(bt, 0, len(rna_sequence) - 1)
-    # print(bt) # debugging
     print("\n\nthe probabilistically chosen secondary structure, in hyphen-parentheses notation, and as a list of base-pairs", \
     "each key-value pair is a base-pair)\n")
 else:
-    print("\n\nthe optimal secondary structure, in hyphen-parentheses notation, and as a list of base-pairs (each key-value pair is a base-pair)\n")
+    print("\n\nthe optimal secondary structure, in hyphen-parentheses notation, and as a list of base-pairs (each key-value pair is a base-pair):\n")
 coded_list = bt_to_coded_list(bt, 0, len(rna_sequence) - 1)
 base_pairs = bt_to_base_pairs(bt, 0, len(rna_sequence) - 1, {})