Thursday, July 16, 2015

Python: string (1)


Abstract: memory usage of a string.

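In Python, an int or a float is a fixed-size object that takes 24 bytes of memory on a 64-bit system. The memory usage of a string grows with its length: in the results below it is 37 bytes of overhead plus one byte per character. Very large integers are stored as long, whose size also grows with the number of digits, but much more slowly than a string's. Here, a DNA sequence is converted into a numeric DNA sequence to show how this saves memory.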
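
To make the "fixed overhead plus one byte per character" pattern easier to see, here is a minimal sketch. It assumes CPython 3 and ASCII-only strings; the helper name string_overhead is made up for this example, and the absolute numbers will differ from the Python 2 figures in this post.

```python
import sys


def string_overhead(text):
    """Bytes reported for the str object beyond one byte per character."""
    return sys.getsizeof(text) - len(text)


# Strings of different lengths built from a single ASCII character.
for length in (0, 10, 101):
    sample = "A" * length
    print(length, sys.getsizeof(sample), string_overhead(sample))
```

If the overhead column is the same on every row, the size of the string is that constant plus its length, which is the relationship described above.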

The output of the script below (sizes in bytes):
A blank string 37
A string with 10 characters 47
A string with 10 characters 47
A integer with 1 characters 24
A integer with 10 characters 24
A float number with 10 characters 24
A integer with more than 10 characters 24
A DNA sequence of 101 : 138
42114221141232211334342223244412211434412341444234413434231324423433432433211111111111114142141214131
The numeric DNA sequence require bit: 72
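
The 72 bytes reported for the numeric sequence can be roughly accounted for. The sketch below assumes CPython, where a large integer is stored as an array of fixed-width "digits" (30 bits in 4 bytes on typical 64-bit builds) after a fixed header. The function estimate_int_size is a made-up helper for this example, intended for positive integers only, and the code is written for Python 3.

```python
import sys

# Digit string copied from the output above (A=1, G=2, T=3, C=4).
NUMERIC_DNA = int(
    "42114221141232211334342223244412211434412341444234413434231324423433432433211111111111114142141214131"
)


def estimate_int_size(number):
    """Estimate the size in bytes of a positive int from its bit length."""
    bits_per_digit = sys.int_info.bits_per_digit  # 30 on typical 64-bit builds
    # Ceiling division: how many internal digits are needed to hold the value.
    n_digits = -(-number.bit_length() // bits_per_digit)
    # Fixed header plus the storage for each internal digit.
    return int.__basicsize__ + n_digits * int.__itemsize__


print("bits needed:", NUMERIC_DNA.bit_length())
print("estimated bytes:", estimate_int_size(NUMERIC_DNA))
print("reported bytes:", sys.getsizeof(NUMERIC_DNA))
```

Under these assumptions, a 101-digit decimal number needs a little over 330 bits, which is twelve 30-bit digits of 4 bytes each; adding a 24-byte header gives the 72 bytes shown above.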


The script:
# -*- coding: utf-8 -*-
"""
Created on Thu Jul 16 10:41:41 2015

@author: yuan
"""

import sys
import re


# memory usage of a string, measured in bytes
print 'A blank string', sys.getsizeof('')
print 'A string with 10 characters', sys.getsizeof('abcdefghij')
print 'A string with 10 characters', sys.getsizeof('1234567890')

# an integer requires less memory than a string
print 'A integer with 1 characters', sys.getsizeof(int('1') )
print 'A integer with 10 characters', sys.getsizeof(int('1234567890') )
print 'A float number with 10 characters', sys.getsizeof(float('1234567890') )
print 'A integer with more than 10 characters', sys.getsizeof(int('1234567890')*10 )


# memory usage of a DNA sequence stored as a string
DNA='CGAACGGAACAGTGGAATTCTCGGGTGCCCAGGAACTCCAGTCACCCGTCCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAACACGACAGACATA'
len_dna=len(DNA)
print 'A DNA sequence of', len_dna, ':', sys.getsizeof(DNA)

# convert the DNA string into a numeric DNA sequence to save memory
# A=1, G=2, T=3, C=4, N=5
int_dna=DNA.upper()
int_dna=re.sub('A', '1', int_dna)
int_dna=re.sub('T', '3', int_dna)
int_dna=re.sub('G', '2', int_dna)
int_dna=re.sub('C', '4', int_dna)
int_dna=re.sub('N', '5', int_dna)
int_dna=int(int_dna)
print int_dna
print 'The numeric DNA sequence require bit:', sys.getsizeof(int_dna)
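
The same conversion can also be sketched with a translation table instead of repeated re.sub calls, together with the reverse step that turns the number back into a DNA string. This is only an illustration written for Python 3: the names dna_to_int and int_to_dna are made up here, and the mapping A=1, G=2, T=3, C=4, N=5 is taken from the script above.

```python
# Mapping taken from the script above: A=1, G=2, T=3, C=4, N=5.
BASE_TO_DIGIT = {"A": "1", "G": "2", "T": "3", "C": "4", "N": "5"}
DIGIT_TO_BASE = {digit: base for base, digit in BASE_TO_DIGIT.items()}

ENCODE_TABLE = str.maketrans(BASE_TO_DIGIT)
DECODE_TABLE = str.maketrans(DIGIT_TO_BASE)


def dna_to_int(dna):
    """Convert a non-empty DNA string into one integer, one decimal digit per base."""
    cleaned = dna.upper()
    unknown = set(cleaned) - set(BASE_TO_DIGIT)
    if unknown:
        raise ValueError("unexpected characters: %s" % sorted(unknown))
    return int(cleaned.translate(ENCODE_TABLE))


def int_to_dna(number):
    """Convert an integer produced by dna_to_int back into a DNA string."""
    return str(number).translate(DECODE_TABLE)


encoded = dna_to_int("CGAACGGAAC")
print(encoded)
print(int_to_dna(encoded))
```

Because no base maps to 0, the number never has a leading zero, so the round trip keeps the original length. Note that recent Python 3 releases limit conversions between int and str to 4300 digits by default, so much longer sequences would need that limit raised or a different encoding.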
