NaProBase

Sequence
Description
The file sequence.csv contains information about the proteins, RNAs, and DNAs involved in complexes. The sequences have been unified, and duplicate structures have been eliminated.
Fields
| Field Name | Data Type | Description | Non-null Count | Example |
| --- | --- | --- | --- | --- |
| uid | text | Unique ID | 19465 | RA002292 |
| sequence | text | Primary structure of the macromolecule | 19465 | UGGAUGGAACGAAAAA |
| auth_ids | text | PDB IDs assigned by authors | 19465 | 7RR5_m |
| auth_desc | text | Descriptions provided by authors | 19465 | mRNA |
| pdb_ids | text | PDB IDs assigned by RCSB | 17824 | 7RR5_EC |
| pdb_descs | text | Descriptions corresponding to pdb_ids | 17824 | mRNA |
| organisms | text | Source organisms | 16200 | Saccharomyces cerevisiae |
| sequence_type | text | Type of macromolecule (RNA, DNA, or Protein) | 19465 | RNA |
Within a field, multiple values are separated by the ; character (for example, 7LBX_F;7LBX_D). The file itself is comma-delimited.
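Fields such as auth_ids and pdb_ids can hold several values joined by ;, as in the preview value 7LBX_F;7LBX_D. The sketch below shows one way to split such a field into one row per value. It uses a small in-memory sample copied from the preview instead of the real file, and it assumes that the text before the underscore is the PDB entry and the text after it is the chain; the column names auth_id, entry, and chain are introduced here for illustration only.

```python
import io
import pandas as pd

# Two rows copied from the preview; the real file would be loaded
# with pd.read_csv('sequence.csv') instead.
sample_csv = io.StringIO(
    "uid,auth_ids,pdb_ids,sequence_type\n"
    "DA004106,7LBX_F;7LBX_D,7LBX_F;7LBX_D,DNA\n"
    "RA002292,7RR5_m,7RR5_EC,RNA\n"
)
df = pd.read_csv(sample_csv)

# Turn the ';'-joined string into a list, then give each value its own row.
chains = (
    df.assign(auth_id=df["auth_ids"].str.split(";"))
      .explode("auth_id")
      .reset_index(drop=True)
)

# Assumption: '<entry>_<chain>' layout, split on the first underscore only.
chains[["entry", "chain"]] = chains["auth_id"].str.split("_", n=1, expand=True)
print(chains[["uid", "entry", "chain"]])
```

Keeping uid on every exploded row makes it possible to group the values back to their original sequence later.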
Statistics
| Title | Count |
| --- | --- |
| RNA | 2605 |
| DNA | 4415 |
| Protein | 12445 |
| Total | 19465 |
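The per-type counts above can be recomputed from the sequence_type column. The following is a minimal sketch on a hypothetical four-row table standing in for sequence.csv; run against the full file, the same calls should reproduce the figures listed above, although that has not been verified here.

```python
import pandas as pd

# Hypothetical miniature table; uids and types are taken from preview rows.
df = pd.DataFrame({
    "uid": ["RA002292", "DA004106", "PA001727", "DA003392"],
    "sequence_type": ["RNA", "DNA", "Protein", "DNA"],
})

# Count rows per macromolecule type, then append the overall total.
counts = df["sequence_type"].value_counts()
counts["Total"] = counts.sum()
print(counts)
```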
Preview
| uid | sequence | auth_ids | auth_desc | pdb_ids | pdb_descs | organisms | sequence_type |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RA002292 | UGGAUGGAACGAAAAA | 7RR5_m | mRNA | 7RR5_EC | mRNA | Saccharomyces cerevisiae | RNA |
| DA004106 | TTAG...TTAA | 7LBX_F;7LBX_D | DNA (5'-D(P*TP...AP*A)-3') | 7LBX_F;7LBX_D | DNA | synthetic construct | DNA |
| PA001727 | GPMS...SGNV | 3ICQ_U;3ICQ_T | Exportin-T | 3ICQ_D;3ICQ_A | Exportin-T | Schizosaccharomyces pombe | Protein |
| DA000301 | AATG...CTAT | 1PAR_F;1BDT_F;... | DNA (5'-D(*AP*...AP*T)-3') | 1PAR_B;1BDT_B;... | DNA | NaN | DNA |
| PA000295 | AKRT...TKEQ | 6XU6_Cp;6XU7_Cp;... | 60S ribosomal protein L37a | NaN | NaN | NaN | Protein |
| PA000234 | AKGE...KILE | 4V5P_CZ;4V5P_AZ | ELONGATION FACTOR TU | NaN | NaN | NaN | Protein |
| PA003630 | MAKG...AKIK | 7S1K_j;5MDZ_c;... | 50S RIBOSOMAL PROTEIN L33 | 7S1K_MA;5MDZ_GA;... | 50S ribosomal protein L33 | Escherichia coli BL21(DE3) | Protein |
| DA003392 | TACGTCATC | 4J9L_P | DNA | 4J9L_C | DNA | NaN | DNA |
| PA001268 | GEIQ...HHHHH | 4K4W_A;4K4W_E | RNA-directed RNA polymerase 3D-POL | NaN | NaN | NaN | Protein |
| RA002085 | UACUGUGCCAUAC | 3V6Y_B | RNA (5'-R(*UP...*C)-3') | 3V6Y_B | RNA | Caenorhabditis elegans | RNA |
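Several preview rows show NaN in pdb_ids, pdb_descs, or organisms, which matches the lower non-null counts reported for those fields. The sketch below shows one way to count non-null values per column and to select the rows that lack an RCSB-assigned ID. It uses a hypothetical three-row sample and assumes that missing values appear in the file as empty cells or as the text NaN, both of which pandas reads as missing by default.

```python
import io
import pandas as pd

# Hypothetical sample: the second row lacks an organism, the third lacks both.
sample_csv = io.StringIO(
    "uid,pdb_ids,organisms,sequence_type\n"
    "RA002292,7RR5_EC,Saccharomyces cerevisiae,RNA\n"
    "DA003392,4J9L_C,,DNA\n"
    "PA000234,,,Protein\n"
)
df = pd.read_csv(sample_csv)

# Non-null count per column, comparable to the "Non-null Count" figures.
non_null = df.count()
print(non_null)

# Rows with no RCSB-assigned ID.
no_rcsb = df[df["pdb_ids"].isna()]
print(no_rcsb["uid"].tolist())
```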
Python code example
Loading dataset
import pandas as pd
# Load the comma-delimited file into a DataFrame
dfSeq = pd.read_csv('sequence.csv', index_col=None)
# Print the (rows, columns) shape of the table
print(dfSeq.shape)
 
Getting only RNA sequences
dfRNA = dfSeq[dfSeq.sequence_type == 'RNA']
              
Query
Getting all DNA sequences with PDB IDs containing '1A'
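# na=False skips rows whose pdb_ids value is missing
dfDNA = dfSeq[(dfSeq.sequence_type == 'DNA') & dfSeq.pdb_ids.str.contains('1A', na=False)]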
              
Download
File size: 8.25 MB
File format: Comma-Separated Values (CSV)
Version: 1.0
Release date: March 30, 2023
Delimiter: ,
Download sequence.csv
nematzadeh.sajjad@gmail.com