Unix time and conversion of time expressions

When we compare times or compute time differences, we have to take time zones and timestamp formats into account. For example, the main database may store times as Unix time, while another transaction system records only local time. Such differences in time expressions are common, especially when the systems are located in different countries.

In this entry we discuss the conversion between Unix time and timestamps with or without a time zone, and the libraries for the conversion. The aim of this entry is to understand how to convert one time expression into another.

Terminology

Timestamp

A timestamp is basically any expression of a time. In some contexts, however, "timestamp" specifically means Unix time, which we define below.

UTC

UTC (Coordinated Universal Time) is basically the same as British local time without DST.

Unix time

Unix time is the number of seconds elapsed since 1970-01-01 00:00:00 UTC (leap seconds are not counted).
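The definition can be checked directly by subtracting the epoch from a moment. The following is a minimal sketch using only the standard library; the names EPOCH and moment are chosen for illustration.

```python
from datetime import datetime, timezone

# The Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# An arbitrary moment, also expressed in UTC.
moment = datetime(2018, 8, 1, 9, 43, 21, tzinfo=timezone.utc)

# Unix time = number of seconds elapsed since the epoch.
unix_time = int((moment - EPOCH).total_seconds())
print(unix_time)  # 1533116601
```

The same value is returned by moment.timestamp(), which is the method used in the rest of this entry.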

Timezone

Strictly speaking, a time zone is an area in which the same local time is used. We often use a name such as Europe/Berlin or Asia/Tokyo to describe a time zone. These names come from the IANA time zone database (also known as the Olson database). (If you have installed R, the function OlsonNames() gives a vector of timezone names.)

But a timezone can also be given as an offset from UTC. For example, JST (Japan Standard Time) is 9 hours ahead of UTC. We write it as UTC+09:00. Thus Asia/Tokyo is equivalent to UTC+09:00. A timezone can have two UTC offsets because of DST: the timezone Europe/Berlin is equivalent to UTC+01:00 (CET) in winter and to UTC+02:00 (CEST) in summer.

Local time

A local time is a time expressed in the timezone where you are. The timezone is often omitted, because it is "obvious".

Daylight Saving Time

Daylight Saving Time (DST) is a seasonal change of the UTC offset. In Germany, for example, the UTC offset switches from UTC+1 to UTC+2 on the last Sunday of March.

At that moment the clock jumps from 02:00 to 03:00, so "2 o'clock" seems to disappear. (If you follow the gray part of the diagram, that is what happens at the end of DST.)

Note we need a geographic timezone (such as "Europe/Berlin") in order to deal with DST in a proper way.
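The jump over "2 o'clock" can be illustrated with two fixed UTC offsets. This is only a sketch: the switch date (2018-03-25, the last Sunday of March 2018) is hard-coded, and the names CET and CEST are defined here by hand, whereas a geographic timezone would pick the right offset automatically.

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1), "CET")    # UTC+01:00, winter time
CEST = timezone(timedelta(hours=2), "CEST")  # UTC+02:00, summer time

# One second before the switch on the last Sunday of March 2018.
before = datetime(2018, 3, 25, 1, 59, 59, tzinfo=CET)

# One second later the clocks in Germany already show CEST.
after = (before + timedelta(seconds=1)).astimezone(CEST)

print(before.isoformat(sep=" "))  # 2018-03-25 01:59:59+01:00
print(after.isoformat(sep=" "))   # 2018-03-25 03:00:00+02:00
```

Only one second has passed between the two values, but the clock time moves from 01:59:59 to 03:00:00.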

Unix time and a timezone

As we said above, a time expression without a time zone is ambiguous. But once we specify a timezone, the expression denotes a unique moment. For example, 2018-01-23 12:34:56 UTC+1. Because the moment is unique, we can express it in a different timezone. For example, the above timestamp is equivalent to 2018-01-23 08:34:56 UTC-3.
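This equivalence can be verified with fixed UTC offsets from the standard library. A minimal sketch; the variable names are chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

utc_plus_1 = timezone(timedelta(hours=1))
utc_minus_3 = timezone(timedelta(hours=-3))

# 2018-01-23 12:34:56 UTC+1
t = datetime(2018, 1, 23, 12, 34, 56, tzinfo=utc_plus_1)

# The same moment expressed with the offset UTC-3.
converted = t.astimezone(utc_minus_3)

print(converted.isoformat(sep=" "))  # 2018-01-23 08:34:56-03:00
print(t == converted)                # True
```

Two timezone-aware datetime objects compare equal when they denote the same moment, regardless of the offset used to write them down.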

If you understand this fact, then you can easily understand the following fact as well: Unix time has no timezone.

Even though UTC is used in the definition of Unix time, the value of Unix time is completely independent of the timezone. The timezone is used just to make the time expression of the origin unique. You may also define Unix time as

the number of seconds from 1970-01-01 09:00:00 UTC+9.

This definition is completely equivalent to the usual definition of Unix time with UTC. So you do not need to convert a time expression to UTC first in order to obtain the corresponding Unix time.
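A short check of this claim, as a sketch with the standard library: the two epochs below are written with different offsets, yet the number of seconds from either of them to a given moment is the same.

```python
from datetime import datetime, timedelta, timezone

# The usual epoch and the same moment written with the offset UTC+9.
epoch_utc = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
epoch_jst = datetime(1970, 1, 1, 9, 0, 0, tzinfo=timezone(timedelta(hours=9)))

# A moment given with the offset UTC-4; no conversion to UTC is done by hand.
moment = datetime(2018, 8, 1, 5, 43, 21, tzinfo=timezone(timedelta(hours=-4)))

seconds_from_utc_epoch = (moment - epoch_utc).total_seconds()
seconds_from_jst_epoch = (moment - epoch_jst).total_seconds()

print(seconds_from_utc_epoch)  # 1533116601.0
print(seconds_from_jst_epoch)  # 1533116601.0
```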

Epoch time has no timezone.

Unix time is independent of the place where you are. You can check the current Unix time at time.is. Assume that you have a friend in Japan. You give him a phone call and ask him to open the website and to read out the number he sees. Then you get exactly the same number you are seeing.

Now we can safely understand the following triad.

[Figure: triad of Unix time, YmdHMS and UTC offset]

If two of the three vertices are given, the remaining one is uniquely determined. Here "YmdHMS" is a time expression of the form "%Y-%m-%d %H:%M:%S" (such as "2018-08-20 12:34:56").
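The triad can be sketched as three small functions, each computing one vertex from the other two. The function names to_unix, to_ymdhms and to_offset are hypothetical helpers written for this illustration; the UTC offset is represented as a timedelta.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def to_unix(ymdhms, offset):
    """YmdHMS + UTC offset -> Unix time."""
    aware = datetime.strptime(ymdhms, FMT).replace(tzinfo=timezone(offset))
    return int(aware.timestamp())

def to_ymdhms(unix_time, offset):
    """Unix time + UTC offset -> YmdHMS."""
    return datetime.fromtimestamp(unix_time, tz=timezone(offset)).strftime(FMT)

def to_offset(unix_time, ymdhms):
    """Unix time + YmdHMS -> UTC offset."""
    # Clock time in UTC at the given moment, with the tzinfo stripped.
    utc_clock = datetime.fromtimestamp(unix_time, tz=timezone.utc).replace(tzinfo=None)
    return datetime.strptime(ymdhms, FMT) - utc_clock

offset = timedelta(hours=-4)
print(to_unix("2018-08-01 05:43:21", offset))            # 1533116601
print(to_ymdhms(1533116601, offset))                     # 2018-08-01 05:43:21
print(to_offset(1533116601, "2018-08-01 05:43:21") == offset)  # True
```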

Convert time expressions

There are several ways to convert time expressions. In this article we focus only on Python and R.

We have to consider two properties:

  • whether we can parse both a Unix time and an ordinary time expression with a timezone (such as the ISO-8601 format) and
  • whether we can express the given time both as a Unix time and as an ordinary time expression with a timezone.

Here are sample questions which we consider:

  1. Find the Unix time corresponding to the time expression 2018-08-01T05:43:21-0400.
  2. Express the Unix time 1533116601 in a human readable format with timezone in America/Toronto.

Note that the above two timestamps denote the same moment.

Python: time

Documentation. This is the simplest library for time in Python. It is a thin wrapper around the C library. Its (only) main class is time.struct_time.

The library is awkward to use with any timezone other than UTC and the local timezone.

import time

iso_str = "2018-08-01T05:43:21-0400"
the_time = time.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
time.strftime("%Y-%m-%d %H:%M:%S %Z %z", the_time) ## 2018-08-01 05:43:21
time.mktime(the_time) ## 1533095001.0 ## wrong

The last value is wrong. The returned Unix time is equivalent to 2018-08-01 05:43:21+02:00. That is, the UTC offset in the string is simply ignored and the local timezone (here CEST, UTC+02:00) is used.

To use the correct timezone (America/Toronto) we have to change the TZ environment variable of the process.

import os

os.environ["TZ"] = "America/Toronto"
time.tzset() ## only available on Unix
the_time = time.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
time.strftime("%Y-%m-%d %H:%M:%S %Z %z", the_time) # 2018-08-01 05:43:21
time.mktime(the_time) ## 1533116601.0 ## correct

We get the correct Unix time, but the timezone in time.strftime is still ignored. Moreover, we have to change a process-wide setting just to deal with the given timezone.

In my opinion this standard library is useless unless you use only either UTC or local time.
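If the time module has to be used anyway, one possible workaround (not part of the approach above) avoids touching the TZ variable: calendar.timegm() interprets the fields as UTC, and the parsed offset is then subtracted. This sketch assumes that time.strptime() fills in tm_gmtoff when %z is parsed, which is the case on common Unix-like platforms but may not hold everywhere.

```python
import calendar
import time

iso_str = "2018-08-01T05:43:21-0400"
the_time = time.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")

# calendar.timegm() reads the date and time fields as if they were UTC.
as_if_utc = calendar.timegm(the_time)

# Assumption: tm_gmtoff holds the parsed offset in seconds (-14400 here).
# Subtracting it turns the "as if UTC" value into the real Unix time.
unix_time = as_if_utc - the_time.tm_gmtoff

print(unix_time)  # 1533116601 where tm_gmtoff is populated
```

The result does not depend on the local timezone of the machine, because neither time.mktime() nor the TZ variable is involved.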

Python: datetime

The datetime module is part of the standard library and provides several classes. (Documentation) We use only datetime.datetime in this entry.

from datetime import datetime

iso_str = "2018-08-01T05:43:21-0400"
the_time = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%S%z")
the_time.isoformat(sep=" ") ## 2018-08-01 05:43:21-04:00
the_time.timestamp() ## 1533116601.0 ## correct
the_time.tzname() ## UTC-04:00

We can parse a time expression in the ISO format without any problem. The class method strptime() recognises the UTC offset properly.

NB: In Python 3.6 and earlier the UTC offset matched by %z cannot contain a colon. That is, -04:00 causes a ValueError, even though isoformat() returns the offset with a colon. Since Python 3.7, strptime() also accepts the form with a colon.
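For strings produced by isoformat(), datetime.fromisoformat() (available since Python 3.7) is a convenient counterpart that accepts the offset with a colon. A minimal sketch:

```python
from datetime import datetime

# The offset is written with a colon, exactly as isoformat() produces it.
the_time = datetime.fromisoformat("2018-08-01T05:43:21-04:00")

print(the_time.isoformat(sep=" "))  # 2018-08-01 05:43:21-04:00
print(the_time.timestamp())         # 1533116601.0

# Round trip: parsing the output of isoformat() gives back the same moment.
round_trip = datetime.fromisoformat(the_time.isoformat())
print(round_trip == the_time)       # True
```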

But when we parse a Unix time, we have to be careful about the behaviour related to the timezone.

unix_time = 1533116601
the_time1 = datetime.utcfromtimestamp(unix_time)
the_time1.isoformat(sep=" ") ## 2018-08-01 09:43:21
the_time2 = datetime.fromtimestamp(unix_time)
the_time2.isoformat(sep=" ") ## 2018-08-01 11:43:21

utcfromtimestamp() gives the UTC datetime and fromtimestamp() gives the local datetime. So far this looks fine.

But

the_time1.timestamp() ## 1533109401.0 ???
the_time2.timestamp() ## 1533116601.0

The documentation says that utcfromtimestamp() and fromtimestamp() return a UTC datetime and a local datetime respectively, but neither adds timezone information to the instance. (Namely, tzinfo is None.) A datetime object without tzinfo behaves as a local time. Therefore the_time1.timestamp() returns the Unix time of 2018-08-01 09:43:21+02:00 instead of the original one.
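If such a datetime without tzinfo is already at hand and is known to hold a UTC clock time, the original Unix time can be recovered by attaching UTC explicitly. A minimal sketch; naive_utc is built here in a way that does not depend on the local timezone of the machine.

```python
from datetime import datetime, timezone

unix_time = 1533116601

# A datetime without tzinfo that holds the UTC clock time,
# i.e. the same kind of object utcfromtimestamp() returns.
naive_utc = datetime.fromtimestamp(unix_time, tz=timezone.utc).replace(tzinfo=None)
print(naive_utc.isoformat(sep=" "))  # 2018-08-01 09:43:21
print(naive_utc.tzinfo)              # None

# Attach UTC explicitly; the clock time is unchanged, only tzinfo is set.
aware_utc = naive_utc.replace(tzinfo=timezone.utc)
print(aware_utc.timestamp())         # 1533116601.0
```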

To avoid this kind of ambiguity, we should always give a timezone/UTC offset (and not use utcfromtimestamp(), which is also deprecated since Python 3.12).

from datetime import timezone, timedelta

the_time1 = datetime.fromtimestamp(unix_time, tz=timezone.utc)
the_time1.isoformat(sep=" ") ## 2018-08-01 09:43:21+00:00
the_time1.timestamp() ## 1533116601.0

the_tz = timezone(timedelta(hours=-4)) ## UTC-4
the_time2 = datetime.fromtimestamp(unix_time, tz=the_tz)
the_time2.isoformat(sep=" ") ## 2018-08-01 05:43:21-04:00
the_time2.timestamp() ## 1533116601.0

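If we want to give a geographic timezone instead of a fixed UTC offset in order to deal with DST properly, we can use pytz. A pytz timezone is simply passed to a datetime instance as its tzinfo. (Since Python 3.9 the standard library module zoneinfo provides geographic timezones as well.)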

import pytz

tz_toronto = pytz.timezone("America/Toronto")
the_time2 = datetime.fromtimestamp(unix_time, tz=tz_toronto)
the_time2.isoformat(sep=" ") ## 2018-08-01 05:43:21-04:00
the_time2.tzinfo ## America/Toronto
the_time2.timestamp() ## 1533116601.0

When you add timezone information to a datetime instance without tzinfo, use the localize() method of the pytz timezone and do not use astimezone().

tz_japan = pytz.timezone("Asia/Tokyo")

iso_str_jst = "2018-08-01T18:43:21"
time_no_tz = datetime.strptime(iso_str_jst, "%Y-%m-%dT%H:%M:%S")

time_no_tz.astimezone(tz_japan) ## 2018-08-02 01:43:21+09:00 ## wrong

time_jp = tz_japan.localize(time_no_tz)
time_jp ## 2018-08-01 18:43:21+09:00 ## correct
time_jp.astimezone(tz_toronto) ## 2018-08-01 05:43:21-04:00 ## correct

Here 2018-08-01T18:43:21+0900 denotes the same moment as iso_str (i.e. 2018-08-01 05:43:21-04:00). time_no_tz.astimezone(tz_japan) gives the wrong answer because time_no_tz, having no tzinfo, is treated as a local time (here CEST) and is then converted to JST. After we attach the correct timezone with localize(), astimezone() gives the right time expression.
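The difference between attaching a timezone and converting to a timezone can also be seen with fixed offsets from the standard library. This is a sketch under the assumption that fixed offsets are sufficient; the names jst and edt are defined here by hand. With datetime.timezone objects replace(tzinfo=...) plays the role of localize(). With pytz timezones localize() should be used instead, as described above.

```python
from datetime import datetime, timedelta, timezone

jst = timezone(timedelta(hours=9))   # fixed offset UTC+09:00
edt = timezone(timedelta(hours=-4))  # fixed offset UTC-04:00

time_no_tz = datetime.strptime("2018-08-01T18:43:21", "%Y-%m-%dT%H:%M:%S")

# Attach: the clock time stays the same, only tzinfo is set.
time_jp = time_no_tz.replace(tzinfo=jst)
print(time_jp.isoformat(sep=" "))  # 2018-08-01 18:43:21+09:00

# Convert: the moment stays the same, the clock time changes.
time_ca = time_jp.astimezone(edt)
print(time_ca.isoformat(sep=" "))  # 2018-08-01 05:43:21-04:00
print(time_ca.timestamp())         # 1533116601.0
```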

Python: pandas

pandas provides some useful methods for datetime.

import pandas as pd

iso_str = "2018-08-01T05:43:21-04:00"
dt = pd.to_datetime(iso_str)
dt ## 2018-08-01 09:43:21 ## converted in UTC

unix_time = 1533116601
ds = pd.to_datetime(unix_time, unit="s")
ds ## 2018-08-01 09:43:21 ## in UTC

There are three advantages of the function to_datetime().

  1. We do not need to specify the format of the string. The function guesses automatically the format and applies it.
  2. We can use the same function to a Unix time by adding unit="s" option.
  3. The function accepts a list or a pandas.Series and converts the values element-wisely.

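There are two disadvantages. One is that the function can be slow, because it guesses the format. The other is that the function (in the pandas version used for this entry) converts the time expression to UTC but does not add the timezone information. Therefore we have to add the proper timezone manually. Newer pandas versions (0.24 and later) keep the UTC offset of the string and return a timezone-aware Timestamp instead; passing utc=True gives a UTC-aware result.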

jst = pytz.timezone("Asia/Tokyo") ## pytz and datetime.timezone are imported as above
jst.localize(dt)   ## 2018-08-01 09:43:21+09:00 ## wrong
dt.replace(tzinfo=timezone.utc)\
  .astimezone(jst) ## 2018-08-01 18:43:21+09:00 ## correct

The second one is the right way to add the timezone. This is because the localize() method attaches the timezone information without changing any other value. Since we already have the time expression in UTC without a timezone, we first have to attach UTC to it and then convert it to the timezone we need.

Instead of replace(tzinfo=timezone.utc) we may use tz_localize(timezone.utc).

Many methods of datetime objects are also available for pandas.Timestamp. Thus it is easy to get the corresponding Unix time.

dt.timestamp() # 1533116601.0

R: POSIXct

A POSIXct object is essentially a Unix time plus a few attributes such as the timezone. (Manual)

iso_str <- "2018-08-01T05:43:21-0400"
dt1 <- as.POSIXct(iso_str, format="%Y-%m-%dT%H:%M:%S%z")
dt1 ## "2018-08-01 11:43:21 CEST"
attr(dt1, "tzone") ## ""

Here dt1 is a POSIXct object. Note that it is displayed in the local timezone. While as.POSIXct() parses the UTC offset properly, it adds no timezone information to the object. To add the timezone we have to pass it in the tz argument.

dt2 <- as.POSIXct(iso_str, format="%Y-%m-%dT%H:%M:%S%z", tz="America/Toronto")
dt2 ## "2018-08-01 05:43:21 EDT"
attr(dt2, "tzone") ## "America/Toronto"

NB: EDT = Eastern Daylight Time.

It is easy to get the corresponding Unix time.

as.integer(dt2) ## 1533116601

We can also convert a Unix time into an ordinary time expression. Here we give the origin of the Unix time explicitly.

unix_time <- 1533116601
dt3 <- as.POSIXct(unix_time, origin="1970-01-01", tz="America/Toronto")
dt3 ## "2018-08-01 05:43:21 EDT"

To express the time in a different timezone it suffices to modify the tzone attribute.

dt_jp <- dt3 ## POSIXct instance in EDT
attr(dt_jp, "tzone") <- "Asia/Tokyo"
dt_jp ## "2018-08-01 18:43:21 JST"

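However, to convert the timezone we should rather use lubridate::with_tz().

lubridate is a package providing useful functions for POSIXct objects and currently belongs to the tidyverse.

with_tz() converts the given time to another timezone without changing the moment. force_tz(), in contrast, keeps the clock time and replaces only the timezone, so the moment changes.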

library(lubridate)

dt <- parse_date_time(iso_str, "YmdHMSz", tz="America/Toronto")
dt                               ## POSIXct object "2018-08-01 05:43:21 EDT"
with_tz(dt, tzone="Asia/Tokyo")  ## "2018-08-01 18:43:21 JST"
force_tz(dt, tzone="Asia/Tokyo") ## "2018-08-01 05:43:21 JST"

Because with_tz() accepts a vector, we can modify many POSIXct objects at the same time.

R: POSIXlt

There is another class which can express a time in R: POSIXlt. But we skip this class. There are two reasons.

  • I do not know how to deal with a timezone in the class. No relevant documents can be found.
  • dplyr does not support POSIXlt.

There is no reason to use the class.

NB. lubridate::fast_strptime returns a POSIXlt object. Use parse_date_time() instead.

Summary

  • The Unix time has no timezone. It is independent of the place where you are.
  • Unix time, YmdHMS and UTC offset: any two of them determine the third.
  • In Python we should use datetime + pytz or pandas + pytz.
  • In R we should stick with POSIXct (+ lubridate).

As a best practice we should always give a timezone explicitly and should not rely on the default behavior of the library.

Categories: #data-mining