Wednesday, January 20, 2016

Python-pandas (7): Sort A Data Frame



Abstract: Sort the rows or columns of a data frame with the pandas DataFrame methods sort() and sort_index().


First, create a data frame with named columns and named rows.



>>>import pandas as pd
>>>df = pd.DataFrame([{'c1':3,'c2':10}, {'c1':2, 'c2':30},
{'c1':1,'c2':20},{'c1':2,'c2':15},
{'c1':2,'c2':100}, {'c1':-2,'c2':10}])
>>>df.index=df['c1']
>>>print df
    c1   c2
c1
 3   3   10
 2   2   30
 1   1   20
 2   2   15
 2   2  100
-2  -2   10

[6 rows x 2 columns]

1. Sort rows
The method sort() sorts rows by the values in the given columns, and sort_index() sorts rows by row names:

#sort rows by columns 'c1' and 'c2' in descending order
>>>sorted_df=df.sort(['c1','c2'], ascending=False)
>>>print sorted_df
    c1   c2
c1
 3   3   10
 2   2  100
 2   2   30
 2   2   15
 1   1   20
-2  -2   10

[6 rows x 2 columns]

#sort rows by 'c1' in descending order and by 'c2' in ascending order
>>>sorted_df=df.sort(['c1','c2'], ascending=[False, True])
>>>print sorted_df
    c1   c2
c1
 3   3   10
 2   2   15
 2   2   30
 2   2  100
 1   1   20
-2  -2   10

[6 rows x 2 columns]
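
On pandas releases where DataFrame.sort() is no longer available (it was deprecated in 0.17 and later removed), the same two sorts can be written with sort_values(). The following is only a sketch, assuming Python 3 and a recent pandas. The index is built from a plain list so that it carries no name, since recent versions are expected to reject 'c1' as ambiguous when it names both the index and a column.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 3, 'c2': 10}, {'c1': 2, 'c2': 30},
                   {'c1': 1, 'c2': 20}, {'c1': 2, 'c2': 15},
                   {'c1': 2, 'c2': 100}, {'c1': -2, 'c2': 10}])
# Assumption: use an unnamed index so that 'c1' refers only to the column.
df.index = df['c1'].tolist()

# Both columns in descending order.
both_desc = df.sort_values(['c1', 'c2'], ascending=False)

# 'c1' in descending order, ties broken by 'c2' in ascending order.
mixed = df.sort_values(['c1', 'c2'], ascending=[False, True])

print(both_desc)
print(mixed)
```

The list passed to ascending must have one entry per sort column, in the same order as the column list.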

#sort rows by row names using sort_index()
>>>sorted_df=df.sort_index(axis=0, ascending=True)
>>>print sorted_df
    c1   c2
c1
-2  -2   10
 1   1   20
 2   2   30
 2   2   15
 2   2  100
 3   3   10

[6 rows x 2 columns]
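
The row names here contain duplicates (three rows are named 2), and whether tied rows keep their original relative order depends on the sorting algorithm. A minimal sketch, assuming Python 3 and a recent pandas, that asks for a stable sort through the kind parameter of sort_index():

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 2, 1, 2, 2, -2],
                   'c2': [10, 30, 20, 15, 100, 10]})
# Assumption: an unnamed index holding the same values as column 'c1'.
df.index = df['c1'].tolist()

# kind='mergesort' is a stable sort: rows with equal names keep their input order.
sorted_df = df.sort_index(axis=0, ascending=True, kind='mergesort')
print(sorted_df)
```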

#sort rows by selecting them in sorted row-name order; row names must be unique for this method
>>>df.index=list('cdabfg')
>>>sorted_df=df.ix[sorted(df.index)]
>>>print sorted_df
   c1   c2
a   1   20
b   2   15
c   3   10
d   2   30
f   2  100
g  -2   10

[6 rows x 2 columns]
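
The .ix indexer was removed in later pandas releases. A minimal sketch of the same selection-based sort using .loc, assuming Python 3 and a recent pandas; sort_index() is shown alongside because it gives the same result for this frame and does not require unique row names.

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 2, 1, 2, 2, -2],
                   'c2': [10, 30, 20, 15, 100, 10]},
                  index=list('cdabfg'))

# Select rows in sorted row-name order; names must be unique to avoid repeated rows.
by_loc = df.loc[sorted(df.index)]

# Gives the same result for this frame.
by_sort_index = df.sort_index()

print(by_loc)
```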

#order rows by a given list of row names; rows for names not in the data frame are filled with NaN
>>>sorted_df=df.ix[list('gxdacfa')]
>>>print sorted_df
    c1   c2
g   -2   10
x  NaN  NaN
d    2   30
a    1   20
c    3   10
f    2  100
a    1   20

[7 rows x 2 columns]
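
In recent pandas, .loc raises a KeyError when a requested row name is missing, so the NaN-filling behaviour shown above is usually obtained with reindex() instead. A minimal sketch under that assumption (Python 3, recent pandas):

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 2, 1, 2, 2, -2],
                   'c2': [10, 30, 20, 15, 100, 10]},
                  index=list('cdabfg'))

# 'x' is not a row name in df, so its row is filled with NaN;
# 'a' is requested twice, so its row appears twice.
ordered = df.reindex(list('gxdacfa'))
print(ordered)
```

Because NaN is a floating-point value, both columns of the result are stored as floats rather than integers.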

2. Sort columns
#sort columns by column names in descending order using sort_index()
>>>sorted_df=df.sort_index(axis=1, ascending=False)
>>>print sorted_df
    c2  c1
c   10   3
d   30   2
a   20   1
b   15   2
f  100   2
g   10  -2

[6 rows x 2 columns]

#sort columns by selecting them in sorted (descending) column-name order
>>>sorted_df=df[sorted(df.columns, reverse=True)]
>>>print sorted_df
    c2  c1
c   10   3
d   30   2
a   20   1
b   15   2
f  100   2
g   10  -2

[6 rows x 2 columns]
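
Both column-sorting approaches should carry over to recent pandas unchanged. The sketch below (Python 3 and a recent pandas assumed) checks that they agree and also shows reindex(columns=...) as one way to impose an arbitrary column order; the order list used there is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 2, 1, 2, 2, -2],
                   'c2': [10, 30, 20, 15, 100, 10]},
                  index=list('cdabfg'))

# Sort columns by name, descending, with sort_index().
by_sort_index = df.sort_index(axis=1, ascending=False)

# Same result by selecting the columns in sorted order.
by_selection = df[sorted(df.columns, reverse=True)]

# Illustrative explicit column order (hypothetical choice).
explicit = df.reindex(columns=['c2', 'c1'])

print(by_sort_index)
```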

3. Other usages
#in-place sorting using sort()
>>>df.sort(['c1','c2'], ascending=False, inplace=True)
>>>print df
   c1   c2
c   3   10
f   2  100
d   2   30
b   2   15
a   1   20
g  -2   10

[6 rows x 2 columns]
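
With inplace=True the method modifies the data frame itself and returns None, so the result should not be assigned back to df. A minimal sketch using sort_values(), the replacement for sort() in later pandas releases (Python 3 and a recent pandas are assumed):

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 2, 1, 2, 2, -2],
                   'c2': [10, 30, 20, 15, 100, 10]},
                  index=list('cdabfg'))

# In-place sort: df is reordered and the call itself returns None.
result = df.sort_values(['c1', 'c2'], ascending=False, inplace=True)

print(result)
print(df)
```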

--end--
