Tiezheng Yuan Ph.D.: Python: data frame(4) math of data frame

Abstract: math on a data frame and the apply() method

A function such as a square root or logarithm, or a scalar operation such as multiplication, can be applied to all values in a data frame at once.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"), index=list("abc"))
>>> print 'Create a data frame:\n', df
Create a data frame:
          A         B         C         D
a  0.763952  0.251649  0.665287  0.666049
b  0.149687  0.292732  0.052708  0.166835
c  0.747266  0.954362  0.376998  0.326654

[3 rows x 4 columns]

>>>

>>> new_df=df*10
>>> print 'New data frame multiplied by 10:\n', new_df
New data frame multiplied by 10:
          A         B         C         D
a  7.639523  2.516488  6.652872  6.660487
b  1.496867  2.927325  0.527075  1.668353
c  7.472662  9.543621  3.769978  3.266537

[3 rows x 4 columns]

>>> new_df=np.sqrt(df)
>>> print 'Sqrt of data frame:\n', new_df
Sqrt of data frame:
          A         B         C         D
a  0.874044  0.501646  0.815651  0.816118
b  0.386894  0.541048  0.229581  0.408455
c  0.864446  0.976915  0.614001  0.571536

[3 rows x 4 columns]
>>> # new_df=np.log(df) computes the natural logarithm of every value in the same way
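The element-wise operations above can be checked by hand on a small frame. The following is only a minimal sketch: it is written for Python 3 (so print is a function), and the 2 x 2 frame with its values is an assumption chosen to give round results, not data from this post.

```python
import numpy as np
import pandas as pd

# Illustrative values only, chosen so the results are easy to verify by hand.
df = pd.DataFrame(
    [[1.0, 4.0], [9.0, 16.0]],
    columns=list("AB"),
    index=list("ab"),
)

scaled = df * 10      # multiply every value by a scalar
roots = np.sqrt(df)   # NumPy functions are applied value by value
logs = np.log(df)     # natural logarithm, same mechanism

print(roots)
# The result is again a data frame with the same row and column names.
print(type(roots).__name__, list(roots.index), list(roots.columns))
```

Because the result keeps the original labels, it can be used directly in further calculations with df.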

Data frames can also be added to, subtracted from, or multiplied by each other, value by value:

>>> new_df=df+df
>>> print 'Sum of two data frames:\n', new_df
Sum of two data frames:
          A         B         C         D
a  1.527905  0.503298  1.330574  1.332097
b  0.299373  0.585465  0.105415  0.333671
c  1.494532  1.908724  0.753996  0.653307

[3 rows x 4 columns]

>>> new_df=df*df+1
>>> print 'Product of data frames plus 1:\n', new_df
Product of data frames plus 1:
          A         B         C         D
a  1.583623  1.063327  1.442607  1.443621
b  1.022406  1.085692  1.002778  1.027834
c  1.558407  1.910807  1.142127  1.106703

[3 rows x 4 columns]

>>>
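The examples above combine a data frame with itself, so every label matches. When two different frames are combined, pandas pairs the values by row and column name rather than by position. The sketch below illustrates this under assumptions: the frames left and right and their values are hypothetical, and the code is written for Python 3.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames that share only the row name "b".
left = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}, index=["a", "b"])
right = pd.DataFrame({"A": [10.0, 20.0], "B": [30.0, 40.0]}, index=["b", "c"])

# Values are paired by label; rows present in only one frame become NaN.
total = left + right

# The add() method does the same, but a missing value can be treated as 0.
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```

Using add() with fill_value is one way to avoid the NaN rows; whether 0 is a sensible stand-in depends on the data.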

As with vector operations, apply() is more practical than a for loop. In the more common cases, apply() calculates by rows (axis=1) or by columns (axis=0): with axis=1 the function receives each row in turn, and with axis=0 it receives each column.

>>> r=df.apply(np.sum, axis=1)
>>> print 'Sums by rows:', r
Sums by rows: a    2.346937
b    0.661962
c    2.405280
dtype: float64

>>> c=df.apply(np.sum, axis=0)
>>> print 'Sums by columns:', c
Sums by columns: A    1.660905
B    1.498743
C    1.094992
D    1.159538
dtype: float64

>>>
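The meaning of the axis argument is easiest to see on a frame whose sums can be worked out by hand. This is a minimal sketch with assumed values, written for Python 3. It also compares the result with the data frame's own sum() method, which returns the same numbers without apply().

```python
import numpy as np
import pandas as pd

# Illustrative 2 x 3 frame; the values are assumptions for this sketch.
df = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    columns=list("ABC"),
    index=list("ab"),
)

row_sums = df.apply(np.sum, axis=1)  # one value per row, named a and b
col_sums = df.apply(np.sum, axis=0)  # one value per column, named A, B and C

print(row_sums)
print(col_sums)

# The built-in reduction gives the same values.
print(row_sums.tolist() == df.sum(axis=1).tolist())
print(col_sums.tolist() == df.sum(axis=0).tolist())
```

For simple reductions such as sums or means, the built-in methods are usually the shorter choice; apply() becomes useful once the function is not available as a method.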

We can also use our own functions instead of standard functions:

>>> y=10
>>> new_df=df.apply(lambda x,y=y: np.log(x*y), axis=0)
>>> print 'Complicated calculation by columns:\n', new_df
Complicated calculation by columns:
          A         B         C         D
a  2.033335  0.922864  1.895049  1.896193
b  0.403374  1.074089 -0.640412  0.511837
c  2.011251  2.255873  1.327069  1.183730

[3 rows x 4 columns]

>>>
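The lambda above can also be written as a named function, with the extra value passed through the args parameter of apply(). The following is a sketch under assumptions: the function names log_scaled and value_range and the sample values are hypothetical, and the code is written for Python 3.

```python
import numpy as np
import pandas as pd

# Illustrative values only.
df = pd.DataFrame(
    [[0.1, 1.0], [10.0, 100.0]],
    columns=list("AB"),
    index=list("ab"),
)


def log_scaled(column, factor):
    """Return log(column * factor) for one column, value by value."""
    return np.log(column * factor)


def value_range(column):
    """Return the difference between the largest and smallest value."""
    return column.max() - column.min()


# Extra positional arguments for the function go into args.
new_df = df.apply(log_scaled, axis=0, args=(10,))

# A function that returns one number per column gives one value per column.
ranges = df.apply(value_range, axis=0)

print(new_df)
print(ranges)
```

A function that returns a value for every element keeps the shape of the data frame, while a function that returns a single number collapses each column (or each row, with axis=1) to one value.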

Another way to loop over a data frame is iterrows(), which returns each row's name together with the values of that row.

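# for loop over the rows
# iterrows() returns pairs (i, row): i is the row name, row holds the values of that row
for i, row in df.iterrows():
    print row
    # change values in the row; row may be a copy, so df is not guaranteed to change
    row[2:] = [3, 4]
    # change a value in df itself, addressed by row name and column name
    # (df.at is used because set_value() was removed in pandas 1.0)
    df.at[i, 'C'] = 2
print df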
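Since the row handed out by iterrows() may be a copy, a safer pattern is to read from the row and write through the data frame. The sketch below is one possible way to do this, not the only one. The frame, the column names, and the updates dictionary are assumptions for illustration, and the code is written for Python 3.

```python
import pandas as pd

# Illustrative frame; the values are assumptions for this sketch.
df = pd.DataFrame({"A": [1.0, 2.0], "C": [5.0, 6.0]}, index=["a", "b"])

# First pass: only read from each row and remember the new values.
updates = {}
for label, row in df.iterrows():
    # label is the row name; row holds the values, addressed by column name
    updates[label] = row["A"] * 2

# Second pass: write into the data frame by row name and column name.
for label, value in updates.items():
    df.at[label, "C"] = value

print(df)
```

Collecting the new values first avoids changing the data frame while it is being looped over. For a calculation this simple, the vectorized form df["C"] = df["A"] * 2 gives the same result without any loop.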

Tiezheng Yuan Ph.D.

Sunday, November 8, 2015
