Tiezheng Yuan Ph.D.: Python-pandas (7): Sort A Data Frame

Abstract: Sort the rows or columns of a pandas data frame with sort_values() (named sort() in pandas releases before 0.17) or sort_index().

First, create a data frame and give it column names and row names.

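>>>import pandas as pd

>>>df = pd.DataFrame([{'c1':3,'c2':10}, {'c1':2,'c2':30},
...                   {'c1':1,'c2':20}, {'c1':2,'c2':15},
...                   {'c1':2,'c2':100}, {'c1':-2,'c2':10}])

#use the values of 'c1' as row names; a plain list leaves the index unnamed,
#so the label 'c1' refers only to the column

>>>df.index=df['c1'].tolist()

>>>print(df)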

    c1   c2
 3   3   10
 2   2   30
 1   1   20
 2   2   15
 2   2  100
-2  -2   10

[6 rows x 2 columns]

1. Sort rows

The function sort_values() sorts rows by the given columns, and sort_index() sorts them by row names:

#sort rows by columns 'c1' and 'c2' in descending order

>>>sorted_df=df.sort_values(['c1','c2'], ascending=False)

>>>print(sorted_df)

    c1   c2
 3   3   10
 2   2  100
 2   2   30
 2   2   15
 1   1   20
-2  -2   10

[6 rows x 2 columns]

#sort 'c1' in descending order and 'c2' in ascending order

>>>sorted_df=df.sort_values(['c1','c2'], ascending=[False, True])

>>>print(sorted_df)

    c1   c2
 3   3   10
 2   2   15
 2   2   30
 2   2  100
 1   1   20
-2  -2   10

[6 rows x 2 columns]
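The list passed to `ascending` is matched to the list of column names position by position. The following is a minimal, self-contained sketch of that pairing, assuming the `sort_values()` method of current pandas; the names `rows` and `mixed` are illustrative and do not come from the examples above.

```python
import pandas as pd

# Same six value pairs as the example data frame (default integer index).
rows = pd.DataFrame({'c1': [3, 2, 1, 2, 2, -2],
                     'c2': [10, 30, 20, 15, 100, 10]})

# 'c1' sorts descending and 'c2' ascending: each entry of `ascending`
# applies to the column at the same position in the first argument.
mixed = rows.sort_values(['c1', 'c2'], ascending=[False, True])

print(mixed)
```

Within the three rows where 'c1' equals 2, the 'c2' values come out smallest first, which is the effect of the second `True`.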

#sort rows by row names using sort_index()

>>>sorted_df=df.sort_index(axis=0, ascending=True)

>>>print(sorted_df)

    c1   c2
-2  -2   10
 1   1   20
 2   2   30
 2   2   15
 2   2  100
 3   3   10

[6 rows x 2 columns]
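When several rows share a row name, as the three rows labelled 2 do here, their relative order after sorting depends on the sorting algorithm. The sketch below assumes the `kind` parameter of `sort_index()` in current pandas and requests a stable algorithm so that tied rows keep their original order; `ties` and `stable` are illustrative names.

```python
import pandas as pd

# Row names repeat: the label 2 occurs three times.
ties = pd.DataFrame({'c2': [10, 30, 20, 15, 100, 10]},
                    index=[3, 2, 1, 2, 2, -2])

# A stable algorithm keeps rows with equal labels in their original order.
stable = ties.sort_index(axis=0, ascending=True, kind='mergesort')

print(stable)
```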

#sort rows by row names using sorted(). Row names should be unique for this method

>>>df.index=list('cdabfg')

>>>sorted_df=df.loc[sorted(df.index)]

>>>print(sorted_df)

   c1   c2
a   1   20
b   2   15
c   3   10
d   2   30
f   2  100
g  -2   10

[6 rows x 2 columns]
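The uniqueness requirement can be seen with a small experiment. This is a sketch under the assumption that label lookup with `.loc` in current pandas returns every matching row for each requested label; `dup`, `by_lookup`, and `by_sort` are illustrative names.

```python
import pandas as pd

# The row name 2 is used twice.
dup = pd.DataFrame({'c2': [30, 15, 20]}, index=[2, 2, 1])

# Label 2 appears twice in the sorted list, and each lookup returns
# both matching rows, so the result has more rows than the input.
by_lookup = dup.loc[sorted(dup.index)]

# sort_index() handles repeated labels without duplicating rows.
by_sort = dup.sort_index()

print(len(dup), len(by_lookup), len(by_sort))
```

For data with repeated row names, sort_index() is therefore the safer choice.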

#sort rows by a given order. Row names missing from the data frame are filled with NaN

>>>sorted_df=df.reindex(list('gxdacfa'))

>>>print(sorted_df)

    c1     c2
g -2.0   10.0
x  NaN    NaN
d  2.0   30.0
a  1.0   20.0
c  3.0   10.0
f  2.0  100.0
a  1.0   20.0

[7 rows x 2 columns]
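Two related details may be useful when ordering rows by an explicit list: finding which requested row names are absent, and choosing a fill value other than NaN. The sketch below assumes the `reindex()` method of current pandas and its `fill_value` parameter; `letters`, `wanted`, `missing`, and `filled` are illustrative names.

```python
import pandas as pd

letters = pd.DataFrame({'c1': [3, 2, 1, 2, 2, -2],
                        'c2': [10, 30, 20, 15, 100, 10]},
                       index=list('cdabfg'))
wanted = list('gxdacfa')

# Row names that are requested but absent from the data frame.
missing = [label for label in wanted if label not in letters.index]

# fill_value replaces the NaN placeholders, so the columns stay integers.
filled = letters.reindex(wanted, fill_value=0)

print(missing)
print(filled)
```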

2. Sort columns

#sort columns by column names using sort_index()

>>>sorted_df=df.sort_index(axis=1, ascending=False)

>>>print(sorted_df)

    c2  c1
c   10   3
d   30   2
a   20   1
b   15   2
f  100   2
g   10  -2

[6 rows x 2 columns]

#sort columns by column names using sorted()

>>>sorted_df=df[sorted(df.columns, reverse=True)]

>>>print(sorted_df)

    c2  c1
c   10   3
d   30   2
a   20   1
b   15   2
f  100   2
g   10  -2

[6 rows x 2 columns]
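Columns can also be arranged in an explicit order, in the same way as rows. This is a minimal sketch assuming the `columns` argument of `reindex()` in current pandas; `frame` and `ordered` are illustrative names, and 'c3' is a hypothetical column name that does not exist in the data.

```python
import pandas as pd

frame = pd.DataFrame({'c1': [3, 2, 1], 'c2': [10, 30, 20]},
                     index=list('cda'))

# Arrange columns in an explicit order; 'c3' is a hypothetical name that
# does not exist in the frame, so it comes back as a column of NaN.
ordered = frame.reindex(columns=['c2', 'c3', 'c1'])

print(ordered)
```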

3. Other usages

#in-place sorting using sort_values()

>>>df.sort_values(['c1','c2'], ascending=False, inplace=True)

>>>print(df)

   c1   c2
c   3   10
f   2  100
d   2   30
b   2   15
a   1   20
g  -2   10

[6 rows x 2 columns]
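The difference between the two calling styles is easy to check. The sketch below assumes `sort_values()` from current pandas; `data`, `copy_sorted`, and `returned` are illustrative names.

```python
import pandas as pd

data = pd.DataFrame({'c1': [3, 2, 1, 2, 2, -2],
                     'c2': [10, 30, 20, 15, 100, 10]},
                    index=list('cdabfg'))

# Without inplace, the original is untouched and a sorted copy is returned.
copy_sorted = data.sort_values(['c1', 'c2'], ascending=False)

# With inplace=True, the frame itself is reordered and the call returns None.
returned = data.sort_values(['c1', 'c2'], ascending=False, inplace=True)

print(returned is None, data.index.tolist())
```

Because the in-place call returns None, assigning its result back to the variable would discard the data frame.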

--end--

Tiezheng Yuan Ph.D.

Wednesday, January 20, 2016
