NaProLink

Description

The file sequence.csv contains the sequences of the proteins, RNAs, and DNAs involved in complexes. The sequences have been unified, and duplicate structures have been removed.

Fields

Field Name	Data Type	Description	Non-null Count	Example
uid	text	unique id	19465	RA002292
sequence	text	Primary structure of macromolecules	19465	UGGAUGGAACGAAAAA
auth_ids	text	PDB IDs assigned by the authors	19465	7RR5_m
auth_desc	text	Descriptions provided by the authors	19465	mRNA
pdb_ids	text	PDB IDs assigned by RCSB	17824	7RR5_EC
pdb_descs	text	Descriptions assigned by RCSB	17824	mRNA
organisms	text	Source organism(s)	16200	Saccharomyces cerevisiae
sequence_type	text	Type of macromolecule (RNA, DNA, or Protein)	19465	RNA
Within a field, multiple values are separated by the ; character (e.g., 7LBX_F;7LBX_D).
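Multi-valued fields such as auth_ids and pdb_ids pack several IDs into one ;-separated string, so it is often convenient to give each ID its own row. The sketch below does this with pandas str.split and explode. The helper name explode_multi is hypothetical, and it assumes values are separated by a bare ; with no surrounding spaces, as in the Preview rows.

```python
import pandas as pd

def explode_multi(df: pd.DataFrame, column: str, sep: str = ";") -> pd.DataFrame:
    """Return one row per value of a ;-separated field (missing values stay as NaN)."""
    out = df.copy()
    # str.split turns "A;B" into ["A", "B"] and leaves missing values missing
    out[column] = out[column].str.split(sep)
    # explode emits one row per list element; a missing value stays a single row
    return out.explode(column, ignore_index=True)

# Two rows whose auth_ids are shown in full in the Preview section
sample = pd.DataFrame({
    "uid": ["DA004106", "PA000234"],
    "auth_ids": ["7LBX_F;7LBX_D", "4V5P_CZ;4V5P_AZ"],
})
long = explode_multi(sample, "auth_ids")
print(long)
```

On the loaded dataset the same call would be explode_multi(dfSeq, "auth_ids"); the other columns are repeated for every ID.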

Statistics

Sequence type	Count
RNA	2605
DNA	4415
Protein	12445
Total	19465
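The counts in the Statistics table can be recomputed from the sequence_type column. In this sketch, type_counts is a hypothetical helper, and the demo input is only the sequence_type values of the ten Preview rows; on the full file it should reproduce the table (2605 + 4415 + 12445 = 19465).

```python
import pandas as pd

def type_counts(df: pd.DataFrame) -> pd.Series:
    """Count rows per sequence_type and append a Total entry, like the Statistics table."""
    counts = df["sequence_type"].value_counts()
    counts["Total"] = counts.sum()
    return counts

# sequence_type values of the ten rows shown in the Preview section
preview_types = pd.DataFrame({"sequence_type": [
    "RNA", "DNA", "Protein", "DNA", "Protein",
    "Protein", "Protein", "DNA", "Protein", "RNA",
]})
print(type_counts(preview_types))
# With the full file loaded as dfSeq: print(type_counts(dfSeq))
```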

Preview

uid	sequence	auth_ids	auth_desc	pdb_ids	pdb_descs	organisms	sequence_type
RA002292	UGGAUGGAACGAAAAA	7RR5_m	mRNA	7RR5_EC	mRNA	Saccharomyces cerevisiae	RNA
DA004106	TTAG...TTAA	7LBX_F;7LBX_D	DNA (5'-D(PTP...APA)-3')	7LBX_F;7LBX_D	DNA	synthetic construct	DNA
PA001727	GPMS...SGNV	3ICQ_U;3ICQ_T	Exportin-T	3ICQ_D;3ICQ_A	Exportin-T	Schizosaccharomyces pombe	Protein
DA000301	AATG...CTAT	1PAR_F;1BDT_F;...	DNA (5'-D(AP...AP*T)-3')	1PAR_B;1BDT_B;...	DNA	NaN	DNA
PA000295	AKRT...TKEQ	6XU6_Cp;6XU7_Cp;...	60S ribosomal protein L37a	NaN	NaN	NaN	Protein
PA000234	AKGE...KILE	4V5P_CZ;4V5P_AZ	ELONGATION FACTOR TU	NaN	NaN	NaN	Protein
PA003630	MAKG...AKIK	7S1K_j;5MDZ_c;...	50S RIBOSOMAL PROTEIN L33	7S1K_MA;5MDZ_GA;...	50S ribosomal protein L33	Escherichia coli BL21(DE3)	Protein
DA003392	TACGTCATC	4J9L_P	DNA	4J9L_C	DNA	NaN	DNA
PA001268	GEIQ...HHHHH	4K4W_A;4K4W_E	RNA-directed RNA polymerase 3D-POL	NaN	NaN	NaN	Protein
RA002085	UACUGUGCCAUAC	3V6Y_B	RNA (5'-R(UP...C)-3')	3V6Y_B	RNA	Caenorhabditis elegans	RNA

Python code example

Loading the dataset

import pandas as pd

# sequence.csv is comma-delimited; multi-valued fields keep their ; separators as plain text
dfSeq = pd.read_csv('sequence.csv')
print(dfSeq.shape)
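After loading, the Non-null Count column of the Fields table offers a quick sanity check on the file. This is a minimal sketch: non_null_mismatches is a hypothetical helper, the expected numbers are copied from the Fields table, and an empty result means the loaded data matches them.

```python
import pandas as pd

# Non-null counts as listed in the Fields table
EXPECTED_NON_NULL = {
    "uid": 19465, "sequence": 19465, "auth_ids": 19465, "auth_desc": 19465,
    "pdb_ids": 17824, "pdb_descs": 17824, "organisms": 16200, "sequence_type": 19465,
}

def non_null_mismatches(df: pd.DataFrame, expected: dict = EXPECTED_NON_NULL) -> dict:
    """Return {column: (expected, actual)} for every column whose non-null count differs."""
    actual = df.notna().sum()
    # A column absent from df counts as 0 non-null values
    return {
        col: (n, int(actual.get(col, 0)))
        for col, n in expected.items()
        if int(actual.get(col, 0)) != n
    }

# Usage after the loading step:
# print(non_null_mismatches(dfSeq))
```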

Getting only RNA sequences

dfRNA=dfSeq[dfSeq.sequence_type == 'RNA']

Query

Getting all DNA sequences with PDB IDs containing '1A'

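# pdb_ids is empty for some rows, so na=False treats missing values as non-matches
dfDNA = dfSeq[(dfSeq.sequence_type == 'DNA') & dfSeq.pdb_ids.str.contains('1A', regex=False, na=False)]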
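A plain substring query finds '1A' anywhere in the pdb_ids string, including in the middle of an entry code. If the goal is instead IDs whose code starts with '1A', the match can be anchored to the start of each ;-separated ID. This is a sketch under that assumption; the helper name dna_with_id_prefix and the IDs in the demo are hypothetical, for illustration only.

```python
import re

import pandas as pd

def dna_with_id_prefix(df: pd.DataFrame, prefix: str, column: str = "pdb_ids") -> pd.DataFrame:
    """Return DNA rows where at least one ;-separated ID in `column` starts with `prefix`."""
    # (?:^|;) anchors the match at the start of the string or right after a ; separator
    pattern = r"(?:^|;)" + re.escape(prefix)
    # na=False treats rows with a missing value as non-matches
    has_prefix = df[column].str.contains(pattern, regex=True, na=False)
    return df[(df["sequence_type"] == "DNA") & has_prefix]

# Hypothetical rows, for illustration only (not taken from sequence.csv)
sample = pd.DataFrame({
    "uid": ["X1", "X2", "X3"],
    "pdb_ids": ["9XYZ_A;1A2B_C", "71A2_B", None],
    "sequence_type": ["DNA", "DNA", "DNA"],
})
print(dna_with_id_prefix(sample, "1A")["uid"].tolist())
```

With this demo data, a substring match on '1A' returns both X1 and X2, while the anchored version returns only X1.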

Download

File size: 8.25 MB

File format: Comma-Separated Values (CSV)

Version: 1.0

Release date: March 30, 2023

Delimiter: ,

Download sequence.csv

NaProBase