When we compute time differences or compare times, we have to pay attention to the time zone and to the format of the timestamp. For example, the main database stores times as Unix time, while another transaction system records only local time. Such differences in time expressions arise often, especially when the systems are located in different countries.
In this entry we discuss the conversion between Unix time and timestamps with or without a time zone, and libraries for the conversion. The aim of this entry is to understand how to convert one time expression into another.
Terminology
Timestamp
A timestamp is basically any expression of a time. In some contexts, however, "timestamp" means specifically Unix time, which we will see later.
UTC
UTC (Coordinated Universal Time) is basically the same as British local time without DST (i.e. GMT).
Unix time
Unix time is the number of seconds elapsed since 1970-01-01 00:00:00 UTC (leap seconds are not counted).
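As a small illustration of this definition, the following sketch counts the seconds between the epoch and a given UTC time with Python's standard datetime module. The example instant (2018-08-01 09:43:21 UTC) is chosen only for illustration.

```python
from datetime import datetime, timezone

# The epoch and an example instant, both given explicitly in UTC.
epoch = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
instant = datetime(2018, 8, 1, 9, 43, 21, tzinfo=timezone.utc)

# Unix time = number of seconds elapsed since the epoch.
unix_time = int((instant - epoch).total_seconds())
print(unix_time)  # 1533116601
```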
Timezone
To be rigorous, a time zone is an area where the same local time is used. We often use a name such as Europe/Berlin or Asia/Tokyo to describe a time zone. You can find the list of timezones here. (Or, if you have installed R, the function OlsonNames() gives a vector of timezone names.)
But a timezone can also be given as an offset from UTC. For example, JST (Japan Standard Time) is 9 hours ahead of UTC. We describe it as UTC+09:00. Thus Asia/Tokyo is equivalent to UTC+09:00.
A timezone can have two UTC offsets because of DST. The timezone Europe/Berlin can be equivalent to UTC+01:00 (CET) or UTC+02:00 (CEST).
Local time
Local time is a time expression whose timezone is assumed to be the timezone where you are. The timezone is often omitted, because it is "obvious".
Daylight Saving Time
Daylight Saving Time (DST) is a seasonal change of the UTC offset. In Germany, on the last Sunday of March the UTC offset is switched from UTC+1 to UTC+2, so the clock jumps from 02:00 directly to 03:00.
Therefore "2 o'clock" seems to disappear. (If you follow the gray part of the diagram, that is what happens at the end of DST.)
Note that we need a geographic timezone (such as "Europe/Berlin") in order to deal with DST properly.
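The switch can be observed with a geographic timezone. The following is a minimal sketch, assuming Python 3.9 or later (for the standard zoneinfo module) and that time zone data is installed on the system. The date 2018-03-25 is the last Sunday of March 2018.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")

# One second before the switch, expressed in UTC to avoid any ambiguity.
before_utc = datetime(2018, 3, 25, 0, 59, 59, tzinfo=timezone.utc)
after_utc = before_utc + timedelta(seconds=1)

# Express both instants in the geographic timezone.
before = before_utc.astimezone(berlin)
after = after_utc.astimezone(berlin)

print(before.isoformat(sep=" "))  # 2018-03-25 01:59:59+01:00
print(after.isoformat(sep=" "))   # 2018-03-25 03:00:00+02:00
```

One second after 01:59:59 (UTC+1) the local clock shows 03:00:00 (UTC+2), so no local time between 02:00 and 03:00 exists on that day.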
Unix time and a timezone
As we said above, a time expression without a timezone is ambiguous. But if we specify a timezone, the expression determines a unique point in time. For example, 2018-01-23 12:34:56 UTC+1. Because the point in time is unique, we can express it in a different timezone. For example, the above timestamp is equivalent to 2018-01-23 08:34:56 UTC-3.
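This conversion can be checked with fixed UTC offsets. A minimal sketch using only the standard datetime module (the variable names are chosen for illustration):

```python
from datetime import datetime, timedelta, timezone

utc_plus_1 = timezone(timedelta(hours=1))
utc_minus_3 = timezone(timedelta(hours=-3))

# 2018-01-23 12:34:56 UTC+1
original = datetime(2018, 1, 23, 12, 34, 56, tzinfo=utc_plus_1)

# The same point in time, expressed with the offset UTC-3.
converted = original.astimezone(utc_minus_3)

print(converted.isoformat(sep=" "))  # 2018-01-23 08:34:56-03:00
print(original == converted)         # True: both describe the same instant
```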
If you understand this fact, then you can easily understand the following fact as well: Unix time has no timezone.
Even though UTC is used in the definition of Unix time, the value of Unix time is completely independent of the timezone. The timezone is used just to make the reference time unambiguous. You may also define Unix time as
the number of seconds from 1970-01-01 09:00:00 UTC+9.
This definition is completely equivalent to the usual definition of Unix time with UTC. So you do not need to convert a time expression to UTC first in order to obtain the corresponding Unix time.
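The equivalence of the two definitions can be verified directly. A small sketch with the standard datetime module, where the second pair of timestamps is an illustrative example of one instant written with two different offsets:

```python
from datetime import datetime, timedelta, timezone

jst = timezone(timedelta(hours=9))   # UTC+9
edt = timezone(timedelta(hours=-4))  # UTC-4

# The epoch written with two different UTC offsets.
epoch_utc = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
epoch_jst = datetime(1970, 1, 1, 9, 0, 0, tzinfo=jst)
print(epoch_utc.timestamp(), epoch_jst.timestamp())  # 0.0 0.0

# No conversion to UTC is needed before asking for the Unix time.
t_toronto = datetime(2018, 8, 1, 5, 43, 21, tzinfo=edt)
t_tokyo = datetime(2018, 8, 1, 18, 43, 21, tzinfo=jst)
print(t_toronto.timestamp(), t_tokyo.timestamp())  # 1533116601.0 1533116601.0
```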
Unix time is independent of the place where you are. You can check the current timestamp at time.is. Assume that you have a friend in Japan. You give him a phone call and ask him to open the website and to read out the number he sees. Then you get exactly the same number you are seeing.
Now we can safely understand the following triad: Unix time, YmdHMS and UTC offset.
If 2 of the 3 vertices are given, the remaining one is uniquely determined. Here "YmdHMS" is a time expression of the form "%Y-%m-%d %H:%M:%S" (such as "2018-08-20 12:34:56").
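The triad can be sketched as three small functions, each computing one vertex from the other two. This is only an illustration with the standard datetime module: the function names are hypothetical and the UTC offset is assumed to be given in hours.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def to_unix_time(ymdhms: str, offset_hours: float) -> int:
    """YmdHMS + UTC offset -> Unix time."""
    tz = timezone(timedelta(hours=offset_hours))
    return int(datetime.strptime(ymdhms, FMT).replace(tzinfo=tz).timestamp())

def to_ymdhms(unix_time: int, offset_hours: float) -> str:
    """Unix time + UTC offset -> YmdHMS."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(unix_time, tz=tz).strftime(FMT)

def to_offset_hours(unix_time: int, ymdhms: str) -> float:
    """Unix time + YmdHMS -> UTC offset in hours."""
    utc_clock = datetime.fromtimestamp(unix_time, tz=timezone.utc).replace(tzinfo=None)
    local_clock = datetime.strptime(ymdhms, FMT)
    return (local_clock - utc_clock) / timedelta(hours=1)

print(to_unix_time("2018-08-01 05:43:21", -4))          # 1533116601
print(to_ymdhms(1533116601, -4))                        # 2018-08-01 05:43:21
print(to_offset_hours(1533116601, "2018-08-01 05:43:21"))  # -4.0
```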
Convert time expressions
There are several possibilities to convert time expressions. In this article we focus only on Python and R.
What we have to consider are two properties:
- whether we can parse both a Unix time and an ordinary time expression with a timezone (such as the ISO-8601 format) and
- whether we can express the given time both as a Unix time and as an ordinary time expression with a timezone.
Here are sample questions which we consider:
- Find the Unix time corresponding to the time expression 2018-08-01T05:43:21-0400.
- Express the Unix time 1533116601 in a human readable format with timezone in America/Toronto.
Note that the above two timestamps are the same.
Python: time
Documentation.
This is the simplest library for time in Python. It is built on the platform C library. Its (only) main class is time.struct_time.
The library is awkward for dealing with any timezone other than UTC and the local timezone.
import time
iso_str = "2018-08-01T05:43:21-0400"
the_time = time.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
time.strftime("%Y-%m-%d %H:%M:%S %Z %z", the_time) ## 2018-08-01 05:43:21
time.mktime(the_time) ## 1533095001.0 ## wrong
The last value is wrong. The returned Unix time is equivalent to 2018-08-01 05:43:21+02:00. That is, the timezone is just ignored and the local timezone is used.
To use the correct timezone (America/Toronto) we have to change the TZ environment variable of the process.
import os
os.environ["TZ"] = "America/Toronto"
time.tzset() ## apply the new TZ value (available on Unix only)
the_time = time.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
time.strftime("%Y-%m-%d %H:%M:%S %Z %z", the_time) # 2018-08-01 05:43:21
time.mktime(the_time) ## 1533116601.0 ## correct
We get the correct Unix time, but the timezone in time.strftime is still ignored. Moreover, we have to change a process-wide environment variable to deal with the given timezone.
In my opinion this standard library is useless unless you use only UTC or local time.
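For the UTC-only case the module is workable. The following is a minimal sketch, assuming the input is already expressed in UTC. calendar.timegm() comes from the standard calendar module (it is not part of the discussion above) and is the UTC counterpart of time.mktime().

```python
import calendar
import time

# Parse a time expression that is known to be in UTC.
utc_struct = time.strptime("2018-08-01T09:43:21", "%Y-%m-%dT%H:%M:%S")

# calendar.timegm() interprets the struct_time as UTC, not as local time.
unix_time = calendar.timegm(utc_struct)
print(unix_time)  # 1533116601

# time.gmtime() converts a Unix time back into a UTC struct_time.
utc_text = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(unix_time))
print(utc_text)  # 2018-08-01 09:43:21
```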
Python: datetime
The datetime module is one of the standard libraries and provides several classes. (Documentation) We use only datetime.datetime in this entry.
from datetime import datetime
iso_str = "2018-08-01T05:43:21-0400"
the_time = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
the_time.isoformat(sep=" ") ## 2018-08-01 05:43:21-04:00
the_time.timestamp() ## 1533116601.0 ## correct
the_time.tzname() ## UTC-04:00
We can parse a time expression in the ISO format without any problem. The class method strptime recognises the timezone properly.
NB: Before Python 3.7 the timezone (%z) could not contain a colon. That is, -04:00 caused a ValueError, even though isoformat() returns the timezone with a colon. Since Python 3.7, %z also accepts an offset with a colon.
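For a string whose offset contains a colon, one option is datetime.fromisoformat(), which is available since Python 3.7 and reads the format that isoformat() writes. A minimal sketch (the variable name iso_str_colon is chosen for illustration):

```python
from datetime import datetime

iso_str_colon = "2018-08-01T05:43:21-04:00"

# fromisoformat() parses the offset with a colon and keeps it as tzinfo.
the_time = datetime.fromisoformat(iso_str_colon)

print(the_time.isoformat(sep=" "))  # 2018-08-01 05:43:21-04:00
print(the_time.timestamp())         # 1533116601.0
```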
But if we parse a Unix time, we have to be careful about the behaviour related to the timezone.
unix_time = 1533116601
the_time1 = datetime.utcfromtimestamp(unix_time)
the_time1.isoformat(sep=" ") ## 2018-08-01 09:43:21
the_time2 = datetime.fromtimestamp(unix_time)
the_time2.isoformat(sep=" ") ## 2018-08-01 11:43:21
utcfromtimestamp() gives the UTC datetime and fromtimestamp() gives the local datetime. That seems fine.
But
the_time1.timestamp() ## 1533109401.0 ???
the_time2.timestamp() ## 1533116601.0
The documentation says that utcfromtimestamp() and fromtimestamp() return a UTC datetime and a local datetime respectively, but they add no timezone information to the instances. (Namely, tzinfo is None.) A datetime object without tzinfo behaves as a local time. Therefore the_time1.timestamp() returns the Unix time of 2018-08-01 09:43:21+02:00 instead of the original one.
To avoid this kind of ambiguity, we should always give a timezone/UTC offset (and not use utcfromtimestamp(), which is deprecated since Python 3.12).
from datetime import timezone, timedelta
the_time1 = datetime.fromtimestamp(unix_time, tz=timezone.utc)
the_time1.isoformat(sep=" ") ## 2018-08-01 09:43:21+00:00
the_time1.timestamp() ## 1533116601.0
the_tz = timezone(timedelta(hours=-4)) ## UTC-4
the_time2 = datetime.fromtimestamp(unix_time, tz=the_tz)
the_time2.isoformat(sep=" ") ## 2018-08-01 05:43:21-04:00
the_time2.timestamp() ## 1533116601.0
If we want to give an ordinary timezone instead of a UTC offset in order to deal with DST properly, we should use pytz. A pytz timezone object is a tzinfo instance that we attach to a datetime instance.
import pytz
tz_toronto = pytz.timezone("America/Toronto")
the_time2 = datetime.fromtimestamp(unix_time, tz=tz_toronto)
the_time2.isoformat(sep=" ") ## 2018-08-01 05:43:21-04:00
the_time2.tzinfo ## America/Toronto
the_time2.timestamp() ## 1533116601.0
When you add timezone information to a datetime instance without tzinfo, use the localize() method and do not use astimezone().
tz_japan = pytz.timezone("Asia/Tokyo")
iso_str_jst = "2018-08-01T18:43:21"
time_no_tz = datetime.strptime(iso_str_jst, "%Y-%m-%dT%H:%M:%S")
time_no_tz.astimezone(tz_japan) ## 2018-08-02 01:43:21+09:00 ## wrong
time_jp = tz_japan.localize(time_no_tz)
time_jp ## 2018-08-01 18:43:21+09:00 ## correct
time_jp.astimezone(tz_toronto) ## 2018-08-01 05:43:21-04:00 ## correct
Here 2018-08-01T18:43:21+0900 is equivalent to iso_str (i.e. 2018-08-01 05:43:21-04:00). It is easy to see why time_no_tz.astimezone(tz_japan) gives the wrong answer: time_no_tz behaves as a local time. After we add the correct timezone, we get the right time expression by astimezone().
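Since Python 3.9 the standard library also ships the zoneinfo module, which may serve as an alternative to pytz. The sketch below assumes Python 3.9 or later with time zone data installed on the system. With a ZoneInfo object, replace(tzinfo=...) attaches the timezone without changing the clock time. (With pytz timezones, replace(tzinfo=...) is not a substitute for localize().)

```python
from datetime import datetime
from zoneinfo import ZoneInfo

tokyo = ZoneInfo("Asia/Tokyo")
toronto = ZoneInfo("America/Toronto")

# A time expression without timezone, known to be Japanese local time.
time_no_tz = datetime.strptime("2018-08-01T18:43:21", "%Y-%m-%dT%H:%M:%S")

# Attach the timezone (clock time unchanged), then convert to Toronto.
time_jp = time_no_tz.replace(tzinfo=tokyo)
time_ca = time_jp.astimezone(toronto)

print(time_jp.isoformat(sep=" "))  # 2018-08-01 18:43:21+09:00
print(time_ca.isoformat(sep=" "))  # 2018-08-01 05:43:21-04:00
print(time_ca.timestamp())         # 1533116601.0
```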
Python: pandas
pandas provides some useful methods for datetime.
import pandas as pd
iso_str = "2018-08-01T05:43:21-04:00"
dt = pd.to_datetime(iso_str)
dt ## 2018-08-01 09:43:21 ## converted in UTC (pandas < 0.24; newer versions keep the offset: 2018-08-01 05:43:21-04:00)
unix_time = 1533116601
ds = pd.to_datetime(unix_time, unit="s")
ds ## 2018-08-01 09:43:21 ## in UTC
There are three advantages of the function to_datetime().
- We do not need to specify the format of the string. The function guesses the format automatically and applies it.
- We can use the same function for a Unix time by adding the unit="s" option.
- The function accepts a list or a pandas.Series and converts the values element-wise.
There are two disadvantages. One is that the function can be slow, because it guesses the format. The other is that the function converts a Unix time to a time expression in UTC but does not add the timezone information (pandas before 0.24 did the same for a string with a UTC offset). Therefore we have to add the proper timezone manually.
jst = pytz.timezone("Asia/Tokyo")
jst.localize(ds) ## 2018-08-01 09:43:21+09:00 ## wrong
ds.replace(tzinfo=timezone.utc)\
.astimezone(jst) ## 2018-08-01 18:43:21+09:00 ## correct
The second one is the right way to add the timezone. This is because the localize() method adds the timezone information without changing any values except the timezone. Since we already have the time expression in UTC without the timezone, we have to add the correct timezone (UTC) to the time expression and then convert it to the timezone we need.
Instead of replace(tzinfo=timezone.utc) we may use tz_localize(timezone.utc).
Many methods of datetime objects are also available for pandas.Timestamp. Thus it is easy to get the corresponding Unix time.
dt.timestamp() # 1533116601.0
R: POSIXct
A POSIXct instance consists of a Unix time (+ alpha). (Manual).
iso_str <- "2018-08-01T05:43:21-0400"
dt1 <- as.POSIXct(iso_str, format="%Y-%m-%dT%H:%M:%S%z")
dt1 ## "2018-08-01 11:43:21 CEST"
attr(dt1, "tzone") ## ""
Here dt1 is a POSIXct instance. Note that the instance displays the time in the local timezone. While as.POSIXct can parse the timezone properly, it adds no timezone information to the variable. To add the timezone we have to give it in the tz option.
dt2 <- as.POSIXct(iso_str, format="%Y-%m-%dT%H:%M:%S%z", tz="America/Toronto")
dt2 ## "2018-08-01 05:43:21 EDT"
attr(dt2, "tzone") ## "America/Toronto"
NB: EDT = Eastern Daylight Time.
It is easy to get the corresponding Unix time.
as.integer(dt2) ## 1533116601
We can also convert a Unix time into an ordinary time expression. But we have to give the origin of the Unix time (R 4.3.0 and later no longer require it).
unix_time <- 1533116601
dt3 <- as.POSIXct(unix_time, origin="1970-01-01", tz="America/Toronto")
dt3 ## "2018-08-01 05:43:21 EDT"
To convert the time expression into a different timezone it suffices to modify the tzone attribute.
dt_jp <- dt3 ## POSIXct instance in EDT
attr(dt_jp, "tzone") <- "Asia/Tokyo"
dt_jp ## "2018-08-01 18:43:21 JST"
But to convert the timezone we should rather use lubridate::with_tz. lubridate is a library providing useful functions for POSIXct instances and currently belongs to tidyverse. with_tz() converts the timezone of the given time.
library(lubridate)
dt <- parse_date_time(iso_str, "YmdHMSz", tz="America/Toronto")
dt ## POSIXct object "2018-08-01 05:43:21 EDT"
with_tz(dt, tzone="Asia/Tokyo") ## "2018-08-01 18:43:21 JST" ## same instant, different timezone
force_tz(dt, tzone="Asia/Tokyo") ## "2018-08-01 05:43:21 JST" ## same clock time, different instant
Because with_tz() accepts a vector, we can convert many POSIXct objects at the same time.
R: POSIXlt
There is another class which can express a time in R: POSIXlt. But we skip this class. There are two reasons.
- I do not know how to deal with a timezone in the class. No relevant documents can be found.
- dplyr does not support POSIXlt.
There is no reason to use the class.
NB. lubridate::fast_strptime returns a POSIXlt object. Use parse_date_time() instead.
Summary
- The Unix time has no timezone. It is independent of the place where you are.
- Unix time, YmdHMS and UTC offset form a triad: any 2 of them determine the remaining one.
- In Python we should use datetime + pytz or pandas + pytz.
- In R we should stick with POSIXct (+ lubridate).
As a best practice we should always give a timezone explicitly and should not rely on the default behavior of the library.