When we deal with time differences or comparisons, we have to take care of time zones and timestamps. For example, the main database may store times as Unix time, while another transaction system records only local time. Such differences in time expressions occur often, especially when the systems are located in different countries.
In this entry we discuss the conversion between Unix time and timestamps with or without a time zone, as well as libraries for the conversion. The aim of this entry is to understand how to convert one time expression into another.
A timestamp is basically any expression of a time. But in some contexts "timestamp" means specifically Unix time, which we will see later.
UTC is basically the same as the British local time without DST.
Unix time is the number of seconds elapsed since 1970-01-01 00:00:00 UTC.
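As a quick check of this definition, the following sketch (Python standard library only) counts the seconds between the epoch and 2018-08-01 09:43:21 UTC, a moment that appears again later in this entry, and compares the result with `datetime.timestamp()`.

```python
from datetime import datetime, timezone

# The epoch: 1970-01-01 00:00:00 UTC
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# An arbitrary moment, written in UTC
moment = datetime(2018, 8, 1, 9, 43, 21, tzinfo=timezone.utc)

# Unix time = number of seconds elapsed since the epoch
unix_time = int((moment - epoch).total_seconds())
print(unix_time)           # 1533116601
print(moment.timestamp())  # 1533116601.0
```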
To be rigorous, a time zone is an area where the same local time expression is used. We often use a text such as Asia/Tokyo to describe a time zone. You can find the list of these time zone names in the tz database (also known as the IANA or Olson database). (Or if you have installed R, then the function OlsonNames() gives a vector of time zone names.)
But a time zone can also be an offset from UTC. For example, JST (Japan Standard Time) is 9 hours ahead of UTC. We describe it as UTC+09:00. Asia/Tokyo is equivalent to UTC+09:00, because Japan does not use DST.
A time zone can have two UTC offsets because of DST. The time zone Europe/Berlin is equivalent to UTC+01:00 (CET) in winter or UTC+02:00 (CEST) in summer.
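A minimal sketch with the standard-library `zoneinfo` module shows this. It assumes Python 3.9 or later and that the tz database is available on the system (or via the `tzdata` package); this entry otherwise uses pytz, not `zoneinfo`. Europe/Berlin has a different UTC offset in winter and in summer, while Asia/Tokyo has +09:00 all year.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
tokyo = ZoneInfo("Asia/Tokyo")

winter = datetime(2018, 1, 23, 12, 0, tzinfo=berlin)
summer = datetime(2018, 7, 23, 12, 0, tzinfo=berlin)

print(winter.utcoffset(), winter.tzname())  # 1:00:00 CET
print(summer.utcoffset(), summer.tzname())  # 2:00:00 CEST

# Japan does not observe DST, so the offset is the same in both seasons
print(datetime(2018, 1, 23, tzinfo=tokyo).utcoffset())  # 9:00:00
print(datetime(2018, 7, 23, tzinfo=tokyo).utcoffset())  # 9:00:00
```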
If a time expression has no time zone, the time zone is assumed to be the one where you are. The time zone is often omitted because it seems "obvious", but this makes the expression ambiguous.
Daylight Saving Time
Daylight Saving Time (DST) is a change of the UTC offset. In Germany, on the last Sunday of March the clocks jump from 02:00 to 03:00, switching the UTC offset from UTC+1 to UTC+2.
Therefore "2 o'clock" seems to disappear. (If you follow the gray part of the diagram, you see what happens at the end of DST: on the last Sunday of October the clocks go back from 03:00 to 02:00, so the hour from 2 to 3 o'clock occurs twice.)
Note that we need a geographic time zone (such as "Europe/Berlin") to deal with DST properly.
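Both effects can be reproduced with a geographic time zone. This is a sketch assuming Python 3.9+ with `zoneinfo` and an installed tz database; the dates are the German DST transitions of 2018 (25 March and 28 October). The `fold` attribute comes from PEP 495 and selects which of the two repeated wall-clock times is meant.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")

# Start of DST: 02:30 local time does not exist on 2018-03-25.
# zoneinfo accepts it (using the offset before the switch, +01:00),
# and a round trip through UTC moves it to 03:30 CEST.
gap = datetime(2018, 3, 25, 2, 30, tzinfo=berlin)
round_trip = gap.astimezone(timezone.utc).astimezone(berlin)
print(round_trip)  # 2018-03-25 03:30:00+02:00

# End of DST: 02:30 local time occurs twice on 2018-10-28.
# fold=0 is the first occurrence (CEST), fold=1 the second one (CET).
first = datetime(2018, 10, 28, 2, 30, fold=0, tzinfo=berlin)
second = datetime(2018, 10, 28, 2, 30, fold=1, tzinfo=berlin)
print(first.utcoffset(), second.utcoffset())   # 2:00:00 1:00:00
print(second.timestamp() - first.timestamp())  # 3600.0
```

A fixed offset such as UTC+01:00 cannot express either case, which is why the geographic name is needed.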
Unix time and a timezone
As we said above, a time expression without a time zone is ambiguous. But if we specify a time zone (more precisely, a UTC offset), the expression of time is unique. For example, 2018-01-23 12:34:56 UTC+1 denotes exactly one moment. Because the expression is unique, we can convert it into a different time zone: the above timestamp is equivalent to 2018-01-23 08:34:56 UTC-3.
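The conversion above can be checked with fixed UTC offsets from the standard library (`datetime.timezone`); no time zone database is needed for this sketch.

```python
from datetime import datetime, timedelta, timezone

utc_plus_1 = timezone(timedelta(hours=1))
utc_minus_3 = timezone(timedelta(hours=-3))

t = datetime(2018, 1, 23, 12, 34, 56, tzinfo=utc_plus_1)
t_converted = t.astimezone(utc_minus_3)

print(t.isoformat(sep=" "))            # 2018-01-23 12:34:56+01:00
print(t_converted.isoformat(sep=" "))  # 2018-01-23 08:34:56-03:00
print(t == t_converted)                # True: the same moment
```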
Once you understand this, the following fact is easy to understand as well: Unix time has no time zone.
Even though UTC is used in the definition of Unix time, the value of Unix time is completely independent of the time zone. The time zone is used just to make the time expression unique. You may also define Unix time as the number of seconds elapsed since 1970-01-01 09:00:00 UTC+9.
This definition is completely equivalent to the usual definition with UTC. So you do not need to convert a time expression into UTC to obtain the corresponding Unix time.
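A short sketch with fixed offsets illustrates both statements: the epoch written in UTC+9 still has Unix time 0, and a time written in UTC-4 yields its Unix time directly, without first converting it into UTC.

```python
from datetime import datetime, timedelta, timezone

jst = timezone(timedelta(hours=9))
edt = timezone(timedelta(hours=-4))

# The epoch written with UTC+9 instead of UTC: the Unix time is still 0
epoch_jst = datetime(1970, 1, 1, 9, 0, 0, tzinfo=jst)
print(epoch_jst.timestamp())  # 0.0

# No conversion into UTC is needed before asking for the Unix time
t_edt = datetime(2018, 8, 1, 5, 43, 21, tzinfo=edt)
print(t_edt.timestamp())  # 1533116601.0
```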
Unix time is independent of where you are. You can check the current timestamp at time.is. Assume that you have a friend in Japan. You give him a phone call and ask him to open the website and read out the number he sees. Then you get exactly the same number you are seeing.
Now we can safely understand the following triad of Unix time, YmdHMS and UTC offset: if 2 of the 3 vertices are given, the remaining one is uniquely determined. Here "YmdHMS" is a time expression of the form "%Y-%m-%d %H:%M:%S" (such as "2018-08-20 12:34:56").
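The triad can be sketched as three small functions, one per vertex. The function names `unix_from`, `ymdhms_from` and `offset_from` are hypothetical helpers for illustration only, and the UTC offset is assumed to be a fixed `timedelta` (not a geographic time zone).

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # the "YmdHMS" format

def unix_from(ymdhms: str, offset: timedelta) -> int:
    """YmdHMS + UTC offset -> Unix time (hypothetical helper)."""
    local = datetime.strptime(ymdhms, FMT).replace(tzinfo=timezone(offset))
    return int(local.timestamp())

def ymdhms_from(unix_time: int, offset: timedelta) -> str:
    """Unix time + UTC offset -> YmdHMS (hypothetical helper)."""
    return datetime.fromtimestamp(unix_time, tz=timezone(offset)).strftime(FMT)

def offset_from(unix_time: int, ymdhms: str) -> timedelta:
    """Unix time + YmdHMS -> UTC offset (hypothetical helper).

    The offset is the local clock time minus the UTC clock time.
    """
    local_naive = datetime.strptime(ymdhms, FMT)
    utc_naive = datetime(1970, 1, 1) + timedelta(seconds=unix_time)
    return local_naive - utc_naive

print(unix_from("2018-08-01 05:43:21", timedelta(hours=-4)))  # 1533116601
print(ymdhms_from(1533116601, timedelta(hours=9)))            # 2018-08-01 18:43:21
print(offset_from(1533116601, "2018-08-01 05:43:21"))         # -1 day, 20:00:00 (i.e. -4 hours)
```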
Convert time expressions
There are several possibilities to convert time expressions. In this article we focus only on Python and R.
What we have to consider are two properties:
- whether we can parse both a Unix time and an ordinary time expression with a timezone (such as the ISO 8601 format), and
- whether we can express a given time both as a Unix time and as an ordinary time expression with a timezone.
Here are sample questions which we consider:
- Find the Unix time corresponding to the time expression 2018-08-01T05:43:21-0400.
- Express the Unix time 1533116601 in a human-readable format with the timezone America/Toronto.

Note that the above two timestamps denote the same moment.
The time module is the simplest library for time in Python. It is a thin wrapper around the C library.
Its (only) main class is time.struct_time.
With this library it is awkward to deal with any timezone other than UTC and the local timezone.
```python
import time

iso_str = "2018-08-01T05:43:21-0400"
the_time = time.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
time.strftime("%Y-%m-%d %H:%M:%S %Z %z", the_time)  ## 2018-08-01 05:43:21
time.mktime(the_time)  ## 1533095001.0 ## wrong
```
The last value is wrong. The returned Unix time is equivalent to 2018-08-01 05:43:21+02:00. That is, the UTC offset in the string is just ignored and the local timezone (here CEST) is used.
To use the correct timezone (America/Toronto) we have to set the TZ environment variable and call time.tzset().
```python
import os
import time

os.environ["TZ"] = "America/Toronto"
time.tzset()  # available on Unix only
the_time = time.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
time.strftime("%Y-%m-%d %H:%M:%S %Z %z", the_time)  ## 2018-08-01 05:43:21
time.mktime(the_time)  ## 1533116601.0 ## correct
```
We get the correct Unix time, but the timezone in time.strftime is still ignored. Moreover, we have to change a process-wide setting (the TZ environment variable) to deal with the given timezone.
In my opinion this standard library is useless unless you only use UTC or local time.
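If you do stay in UTC, the time module works without touching TZ when combined with `calendar.timegm()`, which is the UTC counterpart of `time.mktime()`. A minimal sketch:

```python
import calendar
import time

unix_time = 1533116601

# Unix time -> struct_time in UTC (independent of the local timezone)
utc_struct = time.gmtime(unix_time)
print(time.strftime("%Y-%m-%d %H:%M:%S", utc_struct))  # 2018-08-01 09:43:21

# struct_time in UTC -> Unix time; unlike time.mktime(),
# calendar.timegm() interprets the struct_time as UTC, not local time
print(calendar.timegm(utc_struct))  # 1533116601
```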
The datetime module is one of the standard libraries and provides several classes. We use only datetime.datetime in this entry.
```python
from datetime import datetime

iso_str = "2018-08-01T05:43:21-0400"
the_time = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
the_time.isoformat(sep=" ")  ## 2018-08-01 05:43:21-04:00
the_time.timestamp()         ## 1533116601.0 ## correct
the_time.tzname()            ## UTC-04:00
```
We can parse a time expression in the ISO format without any problem: strptime recognises the UTC offset properly.
NB: Before Python 3.7, the UTC offset (%z) could not contain a colon, so -04:00 caused a ValueError and you needed another way to parse a timezone with a colon. Since Python 3.7, %z also accepts an offset with a colon.
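A quick check, assuming Python 3.7 or later: both `strptime()` with `%z` and `datetime.fromisoformat()` (also added in 3.7) parse an offset written with a colon.

```python
from datetime import datetime

iso_str = "2018-08-01T05:43:21-04:00"  # offset written with a colon

# %z accepts "-04:00" as well as "-0400" (Python 3.7+)
t1 = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")

# fromisoformat() parses the format produced by isoformat() (Python 3.7+)
t2 = datetime.fromisoformat(iso_str)

print(t1.timestamp(), t2.timestamp())  # 1533116601.0 1533116601.0
print(t1 == t2)                        # True
```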
But when we parse a Unix time, we have to be careful about how the timezone is handled.
```python
unix_time = 1533116601

the_time1 = datetime.utcfromtimestamp(unix_time)
the_time1.isoformat(sep=" ")  ## 2018-08-01 09:43:21

the_time2 = datetime.fromtimestamp(unix_time)
the_time2.isoformat(sep=" ")  ## 2018-08-01 11:43:21
```
utcfromtimestamp() gives the UTC datetime and fromtimestamp() gives the local datetime. This seems to be fine. However:
```python
the_time1.timestamp()  ## 1533109401.0 ???
the_time2.timestamp()  ## 1533116601.0
```
The documentation says that utcfromtimestamp() and fromtimestamp() return a UTC datetime and a local datetime respectively; nevertheless they add no timezone information to the instances (namely, tzinfo is None). A datetime object without tzinfo behaves as a local time. Therefore the_time1.timestamp() returns the Unix time of 2018-08-01 09:43:21 in the local timezone (CEST) instead of the original one.
To avoid this kind of ambiguity, we should always give a timezone or UTC offset (and not use utcfromtimestamp(), which is deprecated since Python 3.12):
```python
from datetime import timezone, timedelta

the_time1 = datetime.fromtimestamp(unix_time, tz=timezone.utc)
the_time1.isoformat(sep=" ")  ## 2018-08-01 09:43:21+00:00
the_time1.timestamp()         ## 1533116601.0

the_tz = timezone(timedelta(hours=-4))  ## UTC-4
the_time2 = datetime.fromtimestamp(unix_time, tz=the_tz)
the_time2.isoformat(sep=" ")  ## 2018-08-01 05:43:21-04:00
the_time2.timestamp()         ## 1533116601.0
```
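One benefit of timezone-aware datetimes is that comparison and arithmetic work across different offsets, while mixing an aware and a naive datetime fails loudly instead of silently assuming local time. A small sketch with the standard library:

```python
from datetime import datetime, timedelta, timezone

unix_time = 1533116601
t_utc = datetime.fromtimestamp(unix_time, tz=timezone.utc)
t_edt = datetime.fromtimestamp(unix_time, tz=timezone(timedelta(hours=-4)))

# Different wall-clock times, but the same moment
print(t_utc == t_edt)  # True
print(t_utc - t_edt)   # 0:00:00

# A naive datetime (no tzinfo) cannot be subtracted from an aware one
naive = t_utc.replace(tzinfo=None)
try:
    t_utc - naive
    mixing_allowed = True
except TypeError as err:
    mixing_allowed = False
    print("TypeError:", err)
```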
If we want to give an ordinary timezone instead of a UTC offset in order to
deal with DST properly, we should use pytz.
pytz does not provide its own time class: a pytz timezone is just a tzinfo object which we attach to a datetime instance.
```python
import pytz

tz_toronto = pytz.timezone("America/Toronto")
the_time2 = datetime.fromtimestamp(unix_time, tz=tz_toronto)
the_time2.isoformat(sep=" ")  ## 2018-08-01 05:43:21-04:00
the_time2.tzinfo              ## America/Toronto
the_time2.timestamp()         ## 1533116601.0
```
When you add timezone information to a datetime instance without tzinfo, use the localize() method and do not use astimezone():
```python
tz_japan = pytz.timezone("Asia/Tokyo")
iso_str_jst = "2018-08-01T18:43:21"
time_no_tz = datetime.strptime(iso_str_jst, "%Y-%m-%dT%H:%M:%S")

time_no_tz.astimezone(tz_japan)  ## 2018-08-02 01:43:21+09:00 ## wrong

time_jp = tz_japan.localize(time_no_tz)
time_jp                          ## 2018-08-01 18:43:21+09:00 ## correct
time_jp.astimezone(tz_toronto)   ## 2018-08-01 05:43:21-04:00 ## correct
```
2018-08-01T18:43:21+0900 is equivalent to the original time (2018-08-01 05:43:21-04:00). It should now be easy to see why time_no_tz.astimezone(tz_japan) gives the wrong answer.
That is because time_no_tz behaves as a local time (CEST here), so astimezone() converts 18:43:21 CEST into JST. After we add the correct timezone with localize(), we get the right time expression by astimezone().
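Since Python 3.9 the standard library also ships `zoneinfo`, which this entry does not otherwise use. As a sketch of the same conversion (assuming Python 3.9+ and an installed tz database): unlike pytz, a `ZoneInfo` object can be attached with `replace(tzinfo=...)`, because the offset is looked up for each datetime rather than fixed in the tzinfo object.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

tz_japan = ZoneInfo("Asia/Tokyo")
tz_toronto = ZoneInfo("America/Toronto")

time_no_tz = datetime.strptime("2018-08-01T18:43:21", "%Y-%m-%dT%H:%M:%S")

# Attach the timezone without changing the wall-clock time
time_jp = time_no_tz.replace(tzinfo=tz_japan)
print(time_jp)                         # 2018-08-01 18:43:21+09:00
print(time_jp.astimezone(tz_toronto))  # 2018-08-01 05:43:21-04:00
print(time_jp.timestamp())             # 1533116601.0
```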
pandas provides some useful methods for datetime.
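```python
import pandas as pd

iso_str = "2018-08-01T05:43:21-04:00"
dt = pd.to_datetime(iso_str)
dt  ## 2018-08-01 09:43:21 ## converted in UTC (pandas < 0.24)
    ## pandas >= 0.24 keeps the offset: 2018-08-01 05:43:21-04:00

unix_time = 1533116601
ds = pd.to_datetime(unix_time, unit="s")
ds  ## 2018-08-01 09:43:21 ## in UTC
```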
There are three advantages of the function to_datetime().
- We do not need to specify the format of the string. The function guesses the format automatically and applies it.
- We can use the same function for a Unix time by adding unit="s".
- The function accepts a list or a pandas.Series and converts the values element-wise.
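There are two disadvantages. One is that the function can be slow, because it guesses the format. The other is that the function may give a time expression in UTC without any timezone information. This always happens for a Unix time, and also for a string with a UTC offset in pandas versions before 0.24 (newer versions keep the offset). Therefore we have to add the proper timezone manually.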
```python
from datetime import timezone
import pytz

jst = pytz.timezone("Asia/Tokyo")

jst.localize(ds)  ## 2018-08-01 09:43:21+09:00 ## wrong
ds.replace(tzinfo=timezone.utc)\
  .astimezone(jst)  ## 2018-08-01 18:43:21+09:00 ## correct
```
The second one is the right way to add the timezone. The localize() method attaches the timezone information without changing any other values, so 09:43:21 UTC becomes 09:43:21 JST, which is a different moment. Since we already have the time expression in UTC without a timezone, we have to attach UTC first and then convert it into the timezone we need.
replace(tzinfo=timezone.utc) we may use
Many methods of datetime objects are also available for pandas.Timestamp. Thus it is easy to get the corresponding Unix time.
```python
dt.timestamp()  # 1533116601.0
```
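For a whole column of Unix times, pandas also offers its own `tz_localize()` and `tz_convert()` through the `.dt` accessor of a Series. The sketch below assumes a recent pandas; the second Unix time is simply one hour after the first and is made up for illustration. (Passing `utc=True` to `pd.to_datetime()` is another way to get UTC-aware values directly.)

```python
import pandas as pd

# A column of Unix times: the sample moment and one hour later
unix_times = pd.Series([1533116601, 1533120201])

# to_datetime(..., unit="s") gives naive timestamps expressed in UTC
naive_utc = pd.to_datetime(unix_times, unit="s")

# Declare them as UTC, then convert the whole column to another timezone
tokyo = naive_utc.dt.tz_localize("UTC").dt.tz_convert("Asia/Tokyo")
print(tokyo)
```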
In R, a POSIXct instance consists of a Unix time plus some attributes, such as the timezone (tzone).
```r
iso_str <- "2018-08-01T05:43:21-0400"
dt1 <- as.POSIXct(iso_str, format="%Y-%m-%dT%H:%M:%S%z")
dt1                 ## "2018-08-01 11:43:21 CEST"
attr(dt1, "tzone")  ## ""
```
dt1 is a POSIXct instance. Note that the instance displays the time in the local timezone. While as.POSIXct parses the UTC offset properly, it adds no timezone information to the variable. To add the timezone we have to give it in the tz argument:
```r
dt2 <- as.POSIXct(iso_str, format="%Y-%m-%dT%H:%M:%S%z", tz="America/Toronto")
dt2                 ## "2018-08-01 05:43:21 EDT"
attr(dt2, "tzone")  ## "America/Toronto"
```
NB: EDT = Eastern Daylight Time.
It is easy to get the corresponding Unix time.
```r
as.integer(dt2)  ## 1533116601
```
We can also convert a Unix time into an ordinary time expression. In R versions before 4.3.0 we have to give the origin of the Unix time explicitly.
```r
unix_time <- 1533116601
dt3 <- as.POSIXct(unix_time, origin="1970-01-01", tz="America/Toronto")
dt3  ## "2018-08-01 05:43:21 EDT"
```
To convert the time expression into a different timezone, it suffices to modify the tzone attribute:
```r
dt_jp <- dt3  ## POSIXct instance in EDT
attr(dt_jp, "tzone") <- "Asia/Tokyo"
dt_jp  ## "2018-08-01 18:43:21 JST"
```
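But to convert the timezone we should rather use the lubridate package. with_tz() converts the given time into another timezone (the moment stays the same), while force_tz() keeps the clock time and replaces the timezone (so the moment changes).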
```r
library(lubridate)

dt <- parse_date_time(iso_str, "YmdHMSz", tz="America/Toronto")
dt                                ## POSIXct object "2018-08-01 05:43:21 EDT"
with_tz(dt, tzone="Asia/Tokyo")   ## "2018-08-01 18:43:21 JST"
force_tz(dt, tzone="Asia/Tokyo")  ## "2018-08-01 05:43:21 JST"
```
Since with_tz() accepts a vector, we can convert many POSIXct objects at the same time.
There is another class which can express a time in R: POSIXlt. But we skip this class, for two reasons.
- I do not know how to deal with a timezone in the class, and I could not find any relevant documentation.
- dplyr does not support POSIXlt.
There is no reason to use the class. (Note that lubridate::fast_strptime returns a POSIXlt object by default, i.e. with lt = TRUE.)

Summary
- The Unix time has no timezone. It is independent of the place where you are.
- Of the Unix time, YmdHMS and the UTC offset, any 2 determine the third.
- In Python we should use datetime + pytz or pandas + pytz.
- In R we should stick with POSIXct (+ lubridate).
As a best practice we should always give a timezone explicitly and should not rely on the default behavior of the library.