How to drop columns from a Pandas DataFrame – with examples.

I use Pandas – and Python in general – for any type of scripting. Having grown to loathe redundant menial tasks, especially with CSV’s, I lean more and more on this powerful library. Since I manipulate and use them (CSV’s) daily at work, if I perform a routine 3 times, it finds its way into a Python script with pandas leading the charge. More often than not, when analyzing CSV data, they tend to be messy. Likely they have columns you don’t need or care about. Pandas DataFrames have a drop() function that allows you to get rid of those columns and keep only the ones you need. As always, I learn best by example so keep reading and learn with me…

view-of-hallway-columns
Photo by Ali Lokhandwala on Unsplash
OS, Database, and software used:
  • OpenSuse Leap 15.0
  • Python 3.7.2/li>


Self-Promotion:

If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!


Far from in shape, I still walk a lot. Typically I walk or hike at least an hour a day, 6 days a week. I track certain stats from my walks for a month in – you guessed it – a CSV file. The example CSV below contains some columns I want to keep and others I don’t. How can I get rid of them with pandas? Pretty simple.

To start, I’ll import pandas and load up the CSV in a DataFrame with the read_csv() function:

1
2
3
4
5
6
7
8
9
>>> import pandas as pd
>>> stats = pd.read_csv('/home/joshua/Practice Data/Fitness_DB_Data/aug_stats.csv', delimiter=',')
>>> stats.head()
   day_walked   cal_burned   miles_walked   duration   mph  additional_weight   weight_amount  trekking_poles   shoe_id   trail_id
0  2018-08-01        336.1           3.37   01:01:48   3.3               true             1.5            true         4          7
1  2018-08-02        355.3           3.70   01:15:14   3.0              false             0.0           false         4          4
2  2018-08-03        259.9           2.57   00:47:47   3.2               true             1.5            true         4          7
3  2018-08-05        341.2           3.37   01:02:44   3.2               true             1.5            true         4          7
4  2018-08-06        357.7           3.64   01:05:46   3.3               true             1.5            true         4          7

(I’ve written many blog posts about pandas and CSV’s. I’ll have a list towards the end of the post so be sure and check those that interest you.)

Although the columns are clearly visible via the output from the head() function, pandas DataFrames do have a columns attribute available you can access to see a list of them as well:

1
2
3
4
5
>>> stats.columns
Index(['day_walked', 'cal_burned', 'miles_walked', 'duration', 'mph',
       'additional_weight', 'weight_amount', 'trekking_poles', 'shoe_id',
       'trail_id'],
      dtype='object')

Dropping a column is as simple as just specifying the column(s) to remove as a parameter to drop():

1
2
3
4
5
6
7
8
9
10
>>> stats.drop(['additional_weight'], axis=1)
   day_walked  cal_burned  miles_walked  duration  mph  weight_amount  trekking_poles  shoe_id  trail_id
0  2018-08-01       336.1          3.37  01:01:48  3.3            1.5            True        4         7
1  2018-08-02       355.3          3.70  01:15:14  3.0            0.0           False        4         4
2  2018-08-03       259.9          2.57  00:47:47  3.2            1.5            True        4         7
3  2018-08-05       341.2          3.37  01:02:44  3.2            1.5            True        4         7
4  2018-08-06       357.7          3.64  01:05:46  3.3            1.5            True        4         7
5  2018-08-17       184.2          1.89  00:39:00  2.9            0.0           False        4         2
6  2018-08-18       242.9          2.53  00:51:25  3.0            0.0           False        4         2
7  2018-08-30       204.4          1.95  00:37:35  3.1            0.0           False        4         5

However, do not ignore the axis parameter. Check out what the DataFrame.drop() documentation says about it, along with acceptable values for it:

1
2
    axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

So the ‘additional_weight’ column is gone now…

1
2
3
4
5
6
7
>>> stats.head()
   day_walked  cal_burned  miles_walked  duration  mph  additional_weight  weight_amount  trekking_poles  shoe_id  trail_id
0  2018-08-01       336.1          3.37  01:01:48  3.3               True            1.5            True        4         7
1  2018-08-02       355.3          3.70  01:15:14  3.0              False            0.0           False        4         4
2  2018-08-03       259.9          2.57  00:47:47  3.2               True            1.5            True        4         7
3  2018-08-05       341.2          3.37  01:02:44  3.2               True            1.5            True        4         7
4  2018-08-06       357.7          3.64  01:05:46  3.3               True            1.5            True        4         7

Huh? What gives?

Unless we reassign the current DataFrame object to another one, to see these changes, we must use the inplace=True parameter:

1
2
3
4
5
6
7
8
>>> stats.drop(['weight_amount'], axis=1, inplace=True)
>>> stats.head()
   day_walked  cal_burned  miles_walked  duration  mph  weight_amount  trekking_poles  shoe_id  trail_id
0  2018-08-01       336.1          3.37  01:01:48  3.3            1.5            True        4         7
1  2018-08-02       355.3          3.70  01:15:14  3.0            0.0           False        4         4
2  2018-08-03       259.9          2.57  00:47:47  3.2            1.5            True        4         7
3  2018-08-05       341.2          3.37  01:02:44  3.2            1.5            True        4         7
4  2018-08-06       357.7          3.64  01:05:46  3.3            1.5            True        4         7

Need to drop multiple columns? No problem. Just supply them in a list and it’s a done deal:

1
2
3
4
5
6
7
8
>>> stats.drop(['weight_amount', 'trekking_poles'], axis=1, inplace=True)
>>> stats.head()
   day_walked  cal_burned  miles_walked  duration  mph  shoe_id  trail_id
0  2018-08-01       336.1          3.37  01:01:48  3.3        4         7
1  2018-08-02       355.3          3.70  01:15:14  3.0        4         4
2  2018-08-03       259.9          2.57  00:47:47  3.2        4         7
3  2018-08-05       341.2          3.37  01:02:44  3.2        4         7
4  2018-08-06       357.7          3.64  01:05:46  3.3        4         7

Other posts you may be interested in: Bulk CSV Uploads with Pandas and PostgreSQL

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!

A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

Have I mentioned how much I love a cup of coffee?!?!

To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)

Be sure and visit the “Best Of” page for a collection of my best blog posts.


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, is performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.

Advertisements

One thought on “How to drop columns from a Pandas DataFrame – with examples.

Hey thanks for commenting! Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.