Sunday, November 8, 2015

Python: data frame (4): math on data frames


Abstract: arithmetic on data frames and the apply() method


A scalar operation or a function such as a square root or logarithm can be applied to every value in a data frame at once.
>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"), index=list("abc"))
>>> print 'Create a data frame:\n', df
Create a data frame:
A B C D
a 0.763952 0.251649 0.665287 0.666049
b 0.149687 0.292732 0.052708 0.166835
c 0.747266 0.954362 0.376998 0.326654
[3 rows x 4 columns]
>>>
>>> new_df=df*10
>>> print 'New data frame multiplied with 10:\n', new_df
New data frame multiplied with 10:
A B C D
a 7.639523 2.516488 6.652872 6.660487
b 1.496867 2.927325 0.527075 1.668353
c 7.472662 9.543621 3.769978 3.266537
[3 rows x 4 columns]

>>> new_df=np.sqrt(df)
>>> print 'Sqrt of data frame:\n', new_df
Sqrt of data frame:
A B C D
a 0.874044 0.501646 0.815651 0.816118
b 0.386894 0.541048 0.229581 0.408455
c 0.864446 0.976915 0.614001 0.571536
[3 rows x 4 columns]
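The element-wise operations above can be reproduced with a fixed seed so the numbers are repeatable. This is a minimal sketch written for Python 3 (the transcripts in this post use Python 2 print statements); the seed value and the variable names scaled, roots and logs are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Fixed seed (an arbitrary choice) so the random values are repeatable.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(3, 4), columns=list("ABCD"), index=list("abc"))

scaled = df * 10      # every value multiplied by a scalar
roots = np.sqrt(df)   # NumPy functions are applied element by element
logs = np.log(df)     # the result keeps the row and column labels of df

print(scaled)
print(roots)
print(logs)
```

Each result is a new data frame with the same shape and labels as df; df itself is left unchanged.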

Data frames can also be added to, subtracted from or multiplied by each other, element by element:
>>> new_df=df+df
>>> print 'Add data frames:\n', new_df
Add data frames:
A B C D
a 1.527905 0.503298 1.330574 1.332097
b 0.299373 0.585465 0.105415 0.333671
c 1.494532 1.908724 0.753996 0.653307
[3 rows x 4 columns]
>>> new_df=df*df+1
>>> print 'Product of data frames plus 1:\n', new_df
Product of data frames plus 1:
A B C D
a 1.583623 1.063327 1.442607 1.443621
b 1.022406 1.085692 1.002778 1.027834
c 1.558407 1.910807 1.142127 1.106703
[3 rows x 4 columns]
>>>
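Arithmetic between two different data frames works the same way as df+df above. The sketch below uses small hand-written values so the results are easy to check by eye; it is written for Python 3, and the frame named other and its values are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                  columns=["A", "B"], index=["a", "b"])
other = pd.DataFrame([[10.0, 20.0], [30.0, 40.0]],
                     columns=["A", "B"], index=["a", "b"])

total = df + other   # values with the same row and column labels are added
diff = other - df    # subtraction follows the same rule
prod = df * other    # element-wise product, not a matrix product

print(total)
print(diff)
print(prod)
```

pandas pairs the values by row and column labels rather than by position, so the two frames here line up because they share the same labels.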
As with vector operations, apply() is more practical than a for loop. In the most common cases, apply() runs a function over each row (axis=1) or each column (axis=0).
>>> r=df.apply(np.sum, axis=1)
>>> print 'Sums by rows:', r
Sums by rows: a 2.346937
b 0.661962
c 2.405280
dtype: float64
>>> c=df.apply(np.sum, axis=0)
>>> print 'Sums by columns:', c
Sums by columns: A 1.660905
B 1.498743
C 1.094992
D 1.159538
dtype: float64
>>>
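To make the direction of axis easier to see, here is a minimal sketch with small hand-written values, written for Python 3. The data and the names row_sums and col_sums are assumptions made for illustration; the built-in df.sum() is included only as a cross-check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                  columns=list("ABC"), index=list("ab"))

# axis=1: the function receives one row at a time, so the result
# has one value per row name.
row_sums = df.apply(np.sum, axis=1)

# axis=0: the function receives one column at a time, so the result
# has one value per column name.
col_sums = df.apply(np.sum, axis=0)

print(row_sums)
print(col_sums)

# The built-in reduction gives the same numbers without apply().
print(np.allclose(row_sums, df.sum(axis=1)))
print(np.allclose(col_sums, df.sum(axis=0)))
```

For common reductions such as sums and means, the built-in methods are usually the simpler choice; apply() is more useful when the function is not already available as a data frame method.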

We can also pass our own functions to apply() instead of standard ones:
>>> y=10
>>> new_df=df.apply(lambda x,y=y: np.log(x*y), axis=0)
>>> print 'Complicated calculation by columns:\n', new_df
Complicated calculation by columns:
A B C D
a 2.033335 0.922864 1.895049 1.896193
b 0.403374 1.074089 -0.640412 0.511837
c 2.011251 2.255873 1.327069 1.183730
[3 rows x 4 columns]
>>>
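The lambda with a default argument can also be written as a named function, with the extra value passed through the args parameter of apply(). This is a minimal sketch for Python 3; the function name log_scaled, its factor parameter and the sample values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def log_scaled(column, factor):
    """Return log(column * factor) for one column of the data frame."""
    return np.log(column * factor)

df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                  columns=["A", "B"], index=["a", "b"])

# axis=0 hands the function one column (a Series) at a time;
# args supplies the extra positional argument after the column.
result = df.apply(log_scaled, axis=0, args=(10,))

print(result)
```

Because the function works element by element, the result has the same shape and labels as df, whichever axis is used.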

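Another way to loop over a data frame is iterrows(), which yields one row at a time.
# for loop over the rows
# iterrows() yields (i, row): i is the row name, row is that row as a Series
for i, row in df.iterrows():
    print row
    # row may be a copy, so assigning to it is not a reliable way to change df
    row[2:] = [3, 4]
    # change a value in df itself, addressed by row name and column name
    df.at[i, 'C'] = 2
print df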
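The row loop can be sketched in Python 3 as follows. The sample values and the name row_total are assumptions made for illustration. The sketch writes back through df.at, because the row that iterrows() hands out may be a copy, in which case changes made to it never reach the data frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]},
                  index=["a", "b"])

for label, row in df.iterrows():
    # label is the row name, row is that row as a Series
    row_total = row.sum()
    # write the result into df itself, addressed by row and column name
    df.at[label, "C"] = row_total

print(df)
```

For large data frames a row loop like this is slow, so the vectorised operations and apply() shown earlier are usually the better choice when they fit the problem.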


