Easy datetime manipulation with the pandas to_datetime() function.
It seems like no matter what (programming) language you use, there are nuances when it comes to working with date and/or datetime values. Nevertheless, dates are super important. A lot hinges on date and time values. For instance, it is paramount that my employer and I are both conscious of the days and hours I work. Be it business-related or centered around life, we always tend to keep the time, so to speak. Pandas has a to_datetime() function that provides quite a bit of useful functionality when working with datetime values. Inspired by a great video where I learned about this function, this post will cover several basic examples of its use. As always, read and learn with me…

- OpenSuse Leap 15.0
- Python 3.7.2
Self-Promotion:
If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!
To start, I must give credit where credit is due. I learned so much from this fantastic pandas video. Several of the examples that follow are inspired by or derived from it, applying the concepts to my own data sources. Be sure to check it out!
To use pandas, we first need to import it. I'll also be working with CSV data, which the powerful read_csv() function takes care of loading. Do check out the links provided in the closing section for other blog posts I've written on the read_csv() function.
>>> import pandas as pd
>>> from datetime import date
>>> data = pd.read_csv(r'/home/joshua/Practice Data/Fitness_DB_Data/aug_2019_hiking_stats.csv', delimiter=',')
>>> data.head()
   day_walked  cal_burned  miles_walked  duration  mph  shoe_id
0  2019-08-01       358.1          3.53  01:05:51  3.2        6
1  2019-08-02       354.7          3.46  01:05:12  3.2        6
2  2019-08-04       354.2          3.55  01:05:07  3.3        6
3  2019-08-05       351.3          3.52  01:04:36  3.3        6
4  2019-08-06       292.2          2.83  00:53:44  3.2        4
Let's look at the dtypes attribute, paying particular attention to the 'day_walked' column's type.
>>> data.dtypes
day_walked       object
cal_burned      float64
miles_walked    float64
duration         object
mph             float64
shoe_id           int64
dtype: object
Using to_datetime(), I'll convert the 'day_walked' column to a datetime type, matching the format the column's strings currently follow: 4-digit year, 2-digit month, and 2-digit day:
data['day_walked'] = pd.to_datetime(data['day_walked'], format='%Y-%m-%d')
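Worth knowing: to_datetime() also accepts an errors parameter that controls what happens when a value cannot be parsed. Here is a minimal sketch using made-up strings (not from my dataset) to show errors='coerce' in action:

import pandas as pd

# A sketch with made-up strings: errors='coerce' turns values that do not
# match the format into NaT (Not a Time) instead of raising an exception.
mixed = pd.Series(['2019-08-01', 'not a date', '2019-08-04'])
parsed = pd.to_datetime(mixed, format='%Y-%m-%d', errors='coerce')
print(parsed)
# 0   2019-08-01
# 1          NaT
# 2   2019-08-04
# dtype: datetime64[ns]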
Now have a look at the ‘day_walked’ column type:
>>> data.dtypes
day_walked      datetime64[ns]
cal_burned             float64
miles_walked           float64
duration                object
mph                    float64
shoe_id                  int64
dtype: object
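As an aside, if you already know up front which column holds dates, this conversion can happen at load time instead. A minimal sketch, assuming the same CSV file (the shortened path here is hypothetical):

import pandas as pd

# Sketch: parse the date column while reading the CSV, skipping the
# separate to_datetime() step (file path shortened for illustration).
data = pd.read_csv('aug_2019_hiking_stats.csv',
                   delimiter=',',
                   parse_dates=['day_walked'])  # loads as datetime64[ns]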
Since it is a datetime type, you can now access several properties relating to date and time functionality. It is important to note here that my dataset does not contain any timestamp values in the 'day_walked' column, so the time-specific properties won't return anything meaningful for it.
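A quick illustration of that point, assuming the same DataFrame: a time-oriented property such as hour still evaluates, but only reports the midnight default that the parse filled in:

# The date-only strings parsed to midnight, so a time property like
# hour works but only returns zeros for this dataset.
print(data['day_walked'].dt.hour.head())
# 0    0
# 1    0
# 2    0
# 3    0
# 4    0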
You can retrieve the actual weekday name via that property:
>>> data['day_walked'].dt.weekday_name
0      Thursday
1        Friday
2        Sunday
3        Monday
4       Tuesday
5     Wednesday
6      Thursday
7        Friday
8        Sunday
9        Monday
10      Tuesday
11    Wednesday
12     Thursday
13       Friday
14       Sunday
15       Monday
16      Tuesday
17     Thursday
18       Friday
19       Sunday
20      Tuesday
21    Wednesday
22     Thursday
23       Friday
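One version note: the weekday_name attribute was deprecated in later pandas releases and removed in pandas 1.0, replaced by the day_name() method. If the attribute above raises an AttributeError for you, try:

# In pandas 1.0 and later, weekday_name is gone; the day_name() method
# is its replacement and returns the same weekday strings.
data['day_walked'].dt.day_name()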
Although we can obviously see it in the dataset, you can also extract the year value through that particular property:
>>> data['day_walked'].dt.year
0     2019
1     2019
2     2019
3     2019
4     2019
5     2019
6     2019
7     2019
8     2019
9     2019
10    2019
11    2019
12    2019
13    2019
14    2019
15    2019
16    2019
17    2019
18    2019
19    2019
20    2019
21    2019
22    2019
23    2019
Same with the month number:
>>> data['day_walked'].dt.month
0     8
1     8
2     8
3     8
4     8
5     8
6     8
7     8
8     8
9     8
10    8
11    8
12    8
13    8
14    8
15    8
16    8
17    8
18    8
19    8
20    8
21    8
22    8
23    8
Need just the day of the month? There is an attribute for it:
>>> data['day_walked'].dt.day
0      1
1      2
2      4
3      5
4      6
5      7
6      8
7      9
8     11
9     12
10    13
11    14
12    15
13    16
14    18
15    19
16    20
17    22
18    23
19    25
20    27
21    28
22    29
23    30
The weekday attribute returns a number indicating the day of the week, where Monday is 0 (zero) and Sunday is 6:
>>> data['day_walked'].dt.weekday
0     3
1     4
2     6
3     0
4     1
5     2
6     3
7     4
8     6
9     0
10    1
11    2
12    3
13    4
14    6
15    0
16    1
17    3
18    4
19    6
20    1
21    2
22    3
23    4
Need the ordinal day of the year? The dayofyear attribute provides it:
>>> data['day_walked'].dt.dayofyear
0     213
1     214
2     216
3     217
4     218
5     219
6     220
7     221
8     223
9     224
10    225
11    226
12    227
13    228
14    230
15    231
16    232
17    234
18    235
19    237
20    239
21    240
22    241
23    242
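These attributes also make quick work of building derived columns. A small sketch, assuming the same DataFrame (the new column names are my own invention), that stores a few date parts for later grouping or reporting:

# Store individual date parts as their own columns (names are arbitrary):
data['walk_year'] = data['day_walked'].dt.year
data['walk_month'] = data['day_walked'].dt.month
data['walk_weekday'] = data['day_walked'].dt.weekday

From there, something like data.groupby('walk_weekday')['cal_burned'].mean() would summarize calories burned by day of the week.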
You can easily determine the first day of the month with is_month_start. For those dates that are not, False is returned:
>>> data['day_walked'].dt.is_month_start
0      True
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
According to the results above, I did walk on the first day of the month. However, calling is_month_end returns False for every row, indicating I did not walk on the last day of the month, as shown below:
>>> data['day_walked'].dt.is_month_end
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
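Because is_month_start and is_month_end return boolean Series, they double as row filters. A sketch, assuming the same DataFrame, that keeps only the rows falling on a month's first day:

# Boolean date attributes can be used directly as a row filter;
# this keeps only the rows that fall on the first day of a month.
month_start_walks = data.loc[data['day_walked'].dt.is_month_start]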
Since 'day_walked' is now a datetime data type, the column can be used in comparisons with other datetime values. Let's assign the date '2019-08-19' to a variable:
>>> curious_day = pd.to_datetime('2019-08-19')
To retrieve those rows where the date is greater than or equal to '2019-08-19', the comparison could look like this:
>>> data.loc[data['day_walked'] >= curious_day]
   day_walked  cal_burned  miles_walked  duration  mph  shoe_id
15 2019-08-19       370.1          3.68  01:08:03  3.2        5
16 2019-08-20       360.1          3.60  01:06:20  3.3        5
17 2019-08-22       376.1          3.75  01:09:09  3.3        5
18 2019-08-23       351.7          3.46  01:04:39  3.2        5
19 2019-08-25       355.1          3.51  01:05:17  3.2        5
20 2019-08-27       378.1          3.71  01:09:31  3.2        5
21 2019-08-28       345.1          3.41  01:03:27  3.2        5
22 2019-08-29       365.8          3.58  01:07:15  3.2        5
23 2019-08-30       205.5          2.02  00:37:47  3.2        5
Likewise, the less-than comparison operator retrieves those rows with dates prior to '2019-08-19':
>>> data.loc[data['day_walked'] < curious_day]
   day_walked  cal_burned  miles_walked  duration  mph  shoe_id
0  2019-08-01       358.1          3.53  01:05:51  3.2        6
1  2019-08-02       354.7          3.46  01:05:12  3.2        6
2  2019-08-04       354.2          3.55  01:05:07  3.3        6
3  2019-08-05       351.3          3.52  01:04:36  3.3        6
4  2019-08-06       292.2          2.83  00:53:44  3.2        4
5  2019-08-07       339.0          3.31  01:02:20  3.2        4
6  2019-08-08       344.1          3.36  01:03:16  3.2        4
7  2019-08-09       316.7          3.12  00:58:13  3.2        5
8  2019-08-11       363.2          3.64  01:06:47  3.3        5
9  2019-08-12       346.1          3.42  01:03:38  3.2        5
10 2019-08-13       370.0          3.66  01:08:01  3.2        5
11 2019-08-14       347.0          3.45  01:03:55  3.2        5
12 2019-08-15       327.0          3.25  01:00:09  3.2        5
13 2019-08-16       351.0          3.45  01:04:32  3.2        5
14 2019-08-18       347.4          3.47  01:03:53  3.3        5
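Building on that, two comparisons combined with & bound a date range. A sketch, assuming the same DataFrame (the boundary dates here are my own picks):

# Rows within an inclusive date range, combining two boolean masks with &
# (the boundary dates are chosen for illustration only).
start = pd.to_datetime('2019-08-10')
end = pd.to_datetime('2019-08-20')
mid_month = data.loc[(data['day_walked'] >= start) & (data['day_walked'] <= end)]

The equivalent mask data['day_walked'].between(start, end) reads a bit more naturally to some eyes.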
Explore the to_datetime() function and try out its handy functionality on your own datasets' datetime values. Thanks for reading!
See the official pandas documentation for more on the topics covered in this post.
Other posts you may be interested in:
- Bulk CSV Uploads with Pandas and PostgreSQL
- Pandas concat() then to_sql() - CSV upload to PostgreSQL
- Pandas to SQL - importing CSV data files into PostgreSQL
- Basic CSV file import and exploration with Pandas - first steps.
- Postgres, Python, and Psycopg2 - executemany() method CSV upload example.
- Python and psycopg2 for CSV bulk upload in PostgreSQL - with examples…
- COPY and CAST() - Bulk uploads in PostgreSQL
Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!
A Call To Action!
Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.
Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.
Have I mentioned how much I love a cup of coffee?!?!
To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)
Be sure and visit the “Best Of” page for a collection of my best blog posts.
Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.
Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed in a personal development/learning workstation environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.