Storing Date/Times in Databases

After my talk during ConFoo on Advanced Date/Time Handling I received a question about whether the UTC-offset, together with the date/time in years, months, days, hours, minutes and seconds, was enough for storing a date/time in a database and still being able to do calculations with this. The answer to this question was no, but it lead to an even more interesting discussion about what would be enough to store an accurate date/time in a database.

Firstly let me explain why storing a UTC offset is not adequate at all for doing any sort of calculations. Right now (March 25th, 2010), Montreal is on the same time as Santiago (in Chile) with both a UTC offset of -4 hours. This means that for the same date/time (2010-03-25 19:03) the same timestamp is generated (1269558180). However if we want to add eight months (2010-11-25 19:03) to the current time for each of those locations then the timestamps should be different — 1290729780 for Montreal and 1290722580 for Santiago. It turns out that neither of them is exactly a full 24 hour difference from 2010-03-25 19:03 as is shown the following script:

<?php
$s = '2010-03-25 19:03';
$montrealNow = date_create( "{$s} America/Montreal" )->format( 'U' );
$santiagoNow = date_create( "{$s} America/Santiago" )->format( 'U' );
$s = '2010-11-25 19:03';
$montrealThen = date_create( "{$s} America/Montreal" )->format( 'U' );
$santiagoThen = date_create( "{$s} America/Santiago" )->format( 'U' );

$hour = 60 * 60;
$day = 24 * $hour;

echo "Montreal:\n",
    $montrealNow, " - ",
    $montrealThen, "\n",
    ( ( $montrealThen - $montrealNow ) % $day ) / $hour, "\n\n";

echo "Santiago:\n",
    $santiagoNow, " - ",
    $santiagoThen, "\n",
    ( ( $santiagoThen - $santiagoNow ) % $day ) / $hour, "\n\n";

?>

which gives the output:

Montreal:
1269558180 - 1290729780
1

Santiago:
1269558180 - 1290722580
23

From this we see that Montreal has an extra hour and Santiago has an hour less. This is because Montreal changed from Daylight Savings Time to normal time and Santiago, being on the southern hemisphere, moved from normal time to Daylight Savings time.

Now if we had only the timestamp (1269558180) and the UTC-offset, the only thing we could do would be to advance a specific number of days. In this case, there are 245 days, which makes (245 * 24 * 60 * 60 = ) 21168000 seconds. If we add this to the original timestamp, we end up at 1290726180 which corresponds to 2010-11-25 18:03 Montreal time, or 2010-11-25 20:03 Santiago time. Neither of them being the correct 2010-11-25 19:03. Because from the UTC offset of 4 hours we don't know whether we're in Montreal or Santiago, we can conclude that with just this UTC offset and the original timestamp we can't calculate the timestamp of something that's 8 months in the future. Storing the UTC-offset can not change this fact either.

In order to to proper calculations, you need to keep information about the timezone itself. In PHP timezones are identified with identifiers such as America/Montreal. If we store those alongside the timestamps, we can do the proper calculations. The following example demonstrates that:

<?php
$timestamp = 1269558180;
$tzid      = 'America/Montreal';

$d = new DateTime( "@$timestamp" );
$d->setTimeZone( new DateTimeZone( $tzid ) );
$d->modify( '+8 months' );

echo $d->format( 'Y-m-d H:i' ), "\n";
?>

which gives the output: 2010-11-25 19:03.

This seems to work, but unfortunately, even the approach of storing the timestamp and timezone identifier is not going to work correctly under certain circumstances. In order to find out why, we have to go back to January this year.

Imagine that in January this year—January 15th to be precise—we run the following script to determine the timestamp of a date/time two months in the future:

<?php
$tz = 'America/Santiago';
$ts = date_create( "2010-01-15 10:41 $tz" )->format( 'U' );

$d = new DateTime( "@$ts" );
$d->setTimeZone( new DateTimeZone( $tz ) );
$d->modify( '+2 months' );

echo $d->format( 'U Y-m-d H:i' ), "\n";
?>

This returns the timestamp 1268664060 for the date/time 2010-03-15 10:41:00.

Now skip forwards to the current date and time. If we used this calculated timestamp and the timezone identifier America/Santiago today to generate a date/time string, we would however get a different output. The following example shows this:

<?php
$tz = 'America/Santiago';
$d = date_create( "@1268664060" );
$d->setTimeZone( new DateTimeZone( $tz ) );

echo $d->format( 'Y-m-d H:i:s' ), "\n";
?>

which gives the output: 2010-03-15 11:41:00 (and not 2010-03-15 10:41:00).

The difference in output is not a bug, but is caused because some countries change Daylight Savings Time rules quite frequently. In this case Chile decided (on March 4th) that instead of going forwards to DST at midnight March 14th, they will delay that to April 4th. When we ran the code on January 15th, the rules still thought that March 15th would already be on normal time again, outside of DST. But running the code now, with the updated rule set, we find that DST is still in effect until April 4th. The following example shows the transitions from/to DST for Santiago in 2010:

<?php
$tzid = "America/Santiago";

$tz = new DateTimeZone( $tzid );
$transitions = $tz->getTransitions(
    strtotime( '2010-01-01 00:00 UTC' ),
    strtotime( '2010-12-31 00:00 UTC' )
);

foreach ( $transitions as $t )
{
    echo $t['time'], ' ', $t['offset'] / 3600, ' ', $t['abbr'], "\n";
}
?>

If we run this script with PHP 5.3.1 or 5.3.2—which still have the old incorrect rule set—the output is:

2010-01-01T00:00:00+0000 -3 CLST
2010-03-14T03:00:00+0000 -4 CLT
2010-10-10T04:00:00+0000 -3 CLST

In the SVN repository, the rule set has been updated, so snapshots of PHP 5.3-dev have the correct rules and the script will show:

2010-01-01T00:00:00+0000 -3 CLST
2010-04-04T03:00:00+0000 -4 CLT
2010-10-10T04:00:00+0000 -3 CLST

This leads to the conclusion that storing timestamps and timezone identifiers is not good enough either, unless you want an exact point in time, as opposed to the more expected date/time in a location. So how should you store the latter then? Basically, in the same way that DateTime objects are serialized in PHP 5.3. Let us imagine again, that the following code is run on January 15th again:

<?php
$tzid = 'America/Santiago';
$d = new DateTime( "2010-01-15 10:41 $tzid" );
$d->modify( '+2 months' );
$s = $d->format( 'Y-m-d H:i:s' );
$ts = $d->format( 'U' );
echo "$s (ts=$ts)\n";
?>

which gives the output 2010-03-15 10:41:00 (ts=1268664060). The next example is run with a recent rule set (such as in PHP SVN) today:

<?php
$tzid = 'America/Santiago';
$d = new DateTime( "$s $tzid" );
$s = $d->format( 'Y-m-d H:i:s' );
$ts = $d->format( 'U' );
echo "$s (ts=$ts)\n";
?>

which gives the output 2010-03-15 10:41:00 (ts=1268660460). As you can see, now the date/time itself is correct, although there is a different timestamp. PHP's DateTime serialisation does something similar:

<?php
$d = new DateTime();
$d->setTimeZone( new DateTimeZone( 'America/Santiago' ) );
var_dump( $d );
?>

which gives the output:

object(DateTime)#1 (3) {
  ["date"]=>
  string(19) "2010-03-29 11:32:08"
  ["timezone_type"]=>
  int(3)
  ["timezone"]=>
  string(16) "America/Santiago"
}

For things to work correctly, you need to have an up-to-date rule set. PHP versions are not released as often as the timezonedb (sometimes more than 20 times a year) and to address this issue you can install the pecl extension timezonedb with pecl install timezonedb.

To store information in a database, I would use a char column-type to store the whole America/Santiago timezone identifier and another char column-type to store the date/time (in the yyyy-mm-dd hh:ii:ss format). Alternatively, you can pick a 'datetime' column-type, as long as that type ignores timezones altogether. For MySQL that is the DATETIME column-type, and for PostgreSQL the TIMESTAMP or TIMESTAMP WITHOUT TIME ZONE column-types.

Databases rarely handle timezones, daylight savings time and rule changes correctly, so avoid the database specific functionality all together. Using either a char or a timezone-less 'datetime' column-type would still allow you to sort and by using a 'datetime' column-type you can even do calculations.

Happy summer time!

Shortlink

This article has a short URL available: http://drck.me/sdid-7tf

Comments

Will there be a way in the future to specify custom DateTimeZone objects?

I need a mapping between PHP's DateTime/DateTimeZone and iCalendar (see RFC 5545#3.6.5). iCalendar doesn't specify a set of strings matching specific timezones, but it does list all the rules which should allow me to map back to a PHP timezone.

@Evert: I have not really thought about that. I just had a quick look at the RFC that you pointed at and saw that most of their example have the timezone identifier:

BEGIN:VTIMEZONE
TZID:America/New_York   <--
LAST-MODIFIED:20050809T050000Z
BEGIN:STANDARD
DTSTART:20071104T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
END:STANDARD

The other rules seem to be just an expression of the Olson database, the same one that PHP uses. All the rules are exposed through the DateTimeZone::getTransitions method so it would be possible to write your own "guesser" algorithm.

Oh the joy of timezones, it's enough to throw them all away and make everyone use UTC.

One thing that I've seen done for webapps is to provide only the UTC datetime and then use JavaScript to transform it into the local time. The main assumption there is that the web browser is set to the correct local time, which I think is a reasonable thing.

If you intend to share the stored date-time across timezones, for example because you have users from different timezones, you should additionaly convert the date-time to UTC before storing it. This makes sure that your users are actually seeing the same date (as in "moment"), independent of their timezone.

The TZID value unfortunately seems to be non-standard for some implementations, so it can't be relied on. Guessing it with your suggestion sounds like a good starting point though.

Sorry for the off-topicness.

Building on what Joseph said. What about always storing one timezone (same as system) then in the views do the conversions for that user's timezone. That would save the storage for the timezone column in each row.

Does that sound like a viable approach?

Thanks, Jason

In MySQL, make sure to <a href="http://stackoverflow.com/questions/1646171/mysql-datetime-fields-and-daylight-savings-time-how-do-i-reference-the-extra/2541647#2541647">set the connection session to UTC</a> or MySQL will assume you're sending dates in the local timezone, resulting in an inaccessible hour per year.

We have tried to solve the problem several years ago, we ended up storing everything in UTC. Date and time are converted from users' timezone into UTC, stored, and converted back into users' timezone when rendered out. Reading above article makes me think whether this approach is good enough, I guess only tests will show. brb :)

It's not the timezones that are the problem. It's daylight savings (than you G. V. Hudson for that idiotic idea) that screws everything up. When the rest of the world will see the light (pun intended) I'll never know...

Where is the problem to save dates in postgres with column-type TIMESTAMP WITH TIME ZONE?

@Derick How portable is your solution? What if your app isn't the only consumer of the data?

On a semi-related note, it took me about 15 minutes to learn the meaning and usage of the "@" operator within the date_create() parameter. It probably would have been much, much longer if I hadn't stumbled across a comment (http://php.net/manual/en/function.date-create.php#76216) for date_create(), which references a PHP 'request' (http://bugs.php.net/bug.php?id=40171) which appears was handled you.

Is there any chance of getting this functionality officially documented? It seems like a fairly common use-case, and I can see myself pulling my hair out trying to debug a problem like this...

p.s. It appears that the link in my previous comment for bugs.php.net was munged; the text is correct, however the actual URL in the HTML is encoded (it has id%3D40171 instead of id=40171).

I always use MySQL 5 timestamp column type to store dates and on the other hand in my application every user could select a timezone in his/her profile. Timezones are in literal format instead of (-+)hour:minute. Beforehand I use the mysql_tzinfo_to_sql utility to load the time zone tables in mysql and do it every often (upon updated). So I'm able to use date/time functions in MysQL and set a session specific variable (SET time_zone = timezone; ) according to user's timezone and then all dates stored in "timestamp" columns are retrieved in user's local time.

I actually handle date/time calculations on the DB layer, do you think it might make problems later?

@Nima:I haven't had a lot of experience with it, but from the problems I've faced so far, the safest thing to do is handle all date/time calculations in one place (whether that be PHP or MySQL). Most of the problems I've had come from mixing the two. It's certainly easier to keep the timezone data up-to-date in MySQL though. Depending on your server admin, distro, etc. the timezone data might stay outdated for a couple of months or even years with PHP.

@David, as you mentioned the tzdata package in all linux distros updated regularly, so using that data leads to staying up-to-date. MySQL takes care DST if you use timezone in literal format. It may be DB dependent solution but as php used with MySQL quite often, I thought it's worth mentioning it here.

Daylight Saving* Time.

sigh - the wheel must be discovered by every generation.

http://en.wikipedia.org/wiki/Julian_day_number

@Mavenaut Julian days don't help (in fact Postgres uses Julian datetime's internally and still has all these problems.) It's no different than a UTC timestamp. It does nothing to help with any of these DST problems.

Add Comment

Name:
Email:

Will not be posted. Please leave empty instead of filling in garbage though!
Comment:

Please follow the reStructured Text format. Do not use the comment form to report issues in software, use the relevant issue tracker. I will not answer them here.


All comments are moderated

Life Line