Abstract: math on data frames and the apply() method
Apply a function, such as a logarithm, to all values in a data frame.
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"), index=list("abc"))
>>> print('Create a data frame:', df, sep='\n')
Create a data frame:
          A         B         C         D
a  0.763952  0.251649  0.665287  0.666049
b  0.149687  0.292732  0.052708  0.166835
c  0.747266  0.954362  0.376998  0.326654

[3 rows x 4 columns]
>>> new_df = df*10
>>> print('New data frame multiplied by 10:', new_df, sep='\n')
New data frame multiplied by 10:
          A         B         C         D
a  7.639523  2.516488  6.652872  6.660487
b  1.496867  2.927325  0.527075  1.668353
c  7.472662  9.543621  3.769978  3.266537

[3 rows x 4 columns]
>>> new_df = np.sqrt(df)   # np.log(df) works the same way for the logarithm
>>> print('Sqrt of data frame:', new_df, sep='\n')
Sqrt of data frame:
          A         B         C         D
a  0.874044  0.501646  0.815651  0.816118
b  0.386894  0.541048  0.229581  0.408455
c  0.864446  0.976915  0.614001  0.571536

[3 rows x 4 columns]
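To make the element-wise behaviour easy to check by hand, here is a minimal sketch that uses fixed values instead of random ones. The frame `demo` and its numbers are assumptions chosen for illustration; they are not part of the session above.

```python
import numpy as np
import pandas as pd

# Fixed values (an assumption, chosen so the results are easy to verify).
demo = pd.DataFrame([[1.0, 4.0], [9.0, 16.0]],
                    columns=list("AB"), index=list("ab"))

scaled = demo * 10      # every value is multiplied by 10
roots = np.sqrt(demo)   # element-wise square root; the result is still a DataFrame
logs = np.log(demo)     # element-wise natural logarithm

print(scaled)
print(roots)
print(logs)
```

NumPy functions such as np.sqrt and np.log keep the row and column labels, so the result can be used like any other data frame.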
Data frames can also be added, subtracted, or multiplied with each other:
>>> new_df = df+df
>>> print('Add data frames:', new_df, sep='\n')
Add data frames:
          A         B         C         D
a  1.527905  0.503298  1.330574  1.332097
b  0.299373  0.585465  0.105415  0.333671
c  1.494532  1.908724  0.753996  0.653307

[3 rows x 4 columns]
>>> new_df = df*df+1
>>> print('Multiply data frames, then add 1:', new_df, sep='\n')
Multiply data frames, then add 1:
          A         B         C         D
a  1.583623  1.063327  1.442607  1.443621
b  1.022406  1.085692  1.002778  1.027834
c  1.558407  1.910807  1.142127  1.106703

[3 rows x 4 columns]
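The same operations work between two different data frames. The following is a small sketch with made-up frames `left` and `right` (hypothetical names and values) that share the same row and column labels.

```python
import pandas as pd

# Two small frames with identical labels (values are assumptions for illustration).
left = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                    columns=list("AB"), index=list("ab"))
right = pd.DataFrame([[10.0, 20.0], [30.0, 40.0]],
                     columns=list("AB"), index=list("ab"))

total = left + right    # element-wise addition
diff = right - left     # element-wise subtraction
prod = left * right     # element-wise product, not a matrix product

print(total)
print(diff)
print(prod)
```

Note that `*` multiplies matching cells; it does not compute a matrix product.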
As with vector operations, apply() is more practical than a for loop.
In more common cases, apply() is used to calculate by rows
(axis=1) or by columns (axis=0).
>>> r = df.apply(np.sum, axis=1)
>>> print('Sums by rows:', r, sep='\n')
Sums by rows:
a    2.346937
b    0.661962
c    2.405280
dtype: float64
>>> c = df.apply(np.sum, axis=0)
>>> print('Sums by columns:', c, sep='\n')
Sums by columns:
A    1.660905
B    1.498743
C    1.094992
D    1.159538
dtype: float64
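The axis argument is easy to mix up, so here is a short sketch on a tiny frame with integer values (the frame `demo` is an assumption for illustration). With axis=1 the function receives one row at a time, and with axis=0 it receives one column at a time.

```python
import numpy as np
import pandas as pd

# Small integer frame so the sums can be checked by hand (values are assumptions).
demo = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                    columns=list("ABC"), index=list("ab"))

row_sums = demo.apply(np.sum, axis=1)   # one value per row label: a, b
col_sums = demo.apply(np.sum, axis=0)   # one value per column label: A, B, C

print(row_sums)
print(col_sums)

# For a plain sum, the built-in reduction gives the same values.
print((row_sums == demo.sum(axis=1)).all())
```

The result of apply() is indexed by row names for axis=1 and by column names for axis=0.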
We can use our own functions instead of standard functions:
>>> y = 10
>>> new_df = df.apply(lambda x, y=y: np.log(x*y), axis=0)
>>> print('Complicated calculation by columns:', new_df, sep='\n')
Complicated calculation by columns:
          A         B          C         D
a  2.033335  0.922864   1.895049  1.896193
b  0.403374  1.074089  -0.640412  0.511837
c  2.011251  2.255873   1.327069  1.183730

[3 rows x 4 columns]
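Instead of binding the extra value inside a lambda, a named function can receive it through the args parameter of apply(). This is a sketch only: the helper `scaled_log` and the frame `demo` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def scaled_log(column, factor):
    """Hypothetical helper: natural log of each value after scaling by factor."""
    return np.log(column * factor)


# Fixed positive values (an assumption, so the logarithm is defined everywhere).
demo = pd.DataFrame([[1.0, 10.0], [100.0, 0.1]],
                    columns=list("AB"), index=list("ab"))

# Extra positional arguments for the function go into args.
by_args = demo.apply(scaled_log, axis=0, args=(10,))

# The equivalent lambda form, as used in the post.
by_lambda = demo.apply(lambda x: np.log(x * 10), axis=0)

print(by_args)
print(np.allclose(by_args, by_lambda))
```

A named function is easier to reuse and to test than a lambda with a default argument.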
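Another way to loop over a data frame is iterrows().
# for loop with the iterator iterrows(): i is the row name, row is the row as a Series
for i, row in df.iterrows():
    print(row)
    # change values by rows; row may be a copy, so this is not guaranteed to change df
    row.iloc[2:] = [3, 4]
    # write into df itself (df.set_value() was removed from newer pandas; use df.at)
    df.at[i, 'C'] = 2
print(df)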
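Because the row returned by iterrows() may be a copy, a dependable way to change the data frame inside the loop is to write through the frame itself. The sketch below assumes a small frame `demo` with made-up values and uses df.at for the write.

```python
import pandas as pd

# Small frame with fixed values (an assumption for illustration).
demo = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                    columns=list("AB"), index=list("ab"))

for label, row in demo.iterrows():
    # Read from the row, but write through the frame with .at
    demo.at[label, "B"] = row["A"] * 2

print(demo)

# The same update without a loop, using a column-wise operation.
vectorized = demo.copy()
vectorized["B"] = vectorized["A"] * 2
print(vectorized)
```

For simple updates like this one, the column-wise form is usually shorter and faster than iterating over rows.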