msgpack: support tzindex in datetime
Support non-zero tzindex in the datetime extended type. If both tzoffset
and tzindex are specified, tzindex takes precedence (same as in
Tarantool [1]).
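
For reference, a minimal sketch of where tzoffset and tzindex sit in the
extension payload, assuming the little-endian layout from the Tarantool
MessagePack extensions documentation (8-byte seconds, then an optional
tail of 4-byte nsec, 2-byte tzoffset and 2-byte tzindex). The helpers
`pack_datetime` and `unpack_datetime` are hypothetical names and are not
part of the connector; the tzindex value used with them is a placeholder:

```python
import struct

# Assumed layout: int64 seconds, then an optional tail of
# int32 nsec, int16 tzoffset (minutes), int16 tzindex; little-endian.
FULL_FORMAT = '<qihh'
SHORT_FORMAT = '<q'


def pack_datetime(seconds, nsec=0, tzoffset=0, tzindex=0):
    """Pack datetime fields; the tail is omitted when it is all zeros."""
    if nsec == 0 and tzoffset == 0 and tzindex == 0:
        return struct.pack(SHORT_FORMAT, seconds)
    return struct.pack(FULL_FORMAT, seconds, nsec, tzoffset, tzindex)


def unpack_datetime(data):
    """Return (seconds, nsec, tzoffset, tzindex) from a packed payload."""
    if len(data) == struct.calcsize(SHORT_FORMAT):
        return struct.unpack(SHORT_FORMAT, data) + (0, 0, 0)
    return struct.unpack(FULL_FORMAT, data)
```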

Use the `tz` parameter to set the timezone name:

```
dt = tarantool.Datetime(year=2022, month=8, day=31,
                        hour=18, minute=7, sec=54,
                        nsec=308543321, tz='Europe/Moscow')
```

You may use the `tz` property to get the timezone name of a datetime
object.
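
The precedence between a timezone name and a plain offset can be
sketched as follows. `resolve_timezone` is a hypothetical helper that
only reports which argument takes effect; the connector itself builds
pytz objects at this point:

```python
def resolve_timezone(tz='', tzoffset=0):
    """Report which timezone argument takes effect.

    A non-empty timezone name wins over a non-zero offset; with neither
    set, the datetime stays timezone-naive.
    """
    if tz != '':
        return ('tz', tz)
    if tzoffset != 0:
        return ('tzoffset', tzoffset)
    return ('naive', None)
```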

pytz is used to build timezone info. The Tarantool index to Olson name
map and its inverse are built with the gen-timezones.sh script, based on
the tarantool/go-tarantool script [2]. All Tarantool unique and alias
timezones are present in the pytz.all_timezones list. Only the following
abbreviated timezones from Tarantool are present in pytz.all_timezones
(version 2022.2.1):
- CET
- EET
- EST
- GMT
- HST
- MST
- UTC
- WET

pytz does not natively support abbreviated timezones because they may
be ambiguous [3-5]. Tarantool itself does not support ambiguous
abbreviated timezones:

```
Tarantool 2.10.1-0-g482d91c66

tarantool> datetime.new({tz = 'BST'})
---
- error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
...
```

If an ambiguous timezone is specified, an exception is raised.

Tarantool header timezones.h [6] provides a map for all abbreviated
timezones with category info (all ambiguous timezones are marked with
TZ_AMBIGUOUS flag) and offset info. We parse this info to build a
pytz.FixedOffset() timezone for each Tarantool abbreviated timezone not
supported natively by pytz.
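
A minimal sketch of that lookup, under stated assumptions: the table
entries below are illustrative and not copied from timezones.h,
`resolve_abbrev` is a hypothetical name, and the standard library
`datetime.timezone` stands in for pytz.FixedOffset() so the example has
no third-party dependencies:

```python
from datetime import timedelta, timezone

TZ_AMBIGUOUS = 0x08  # flag value as emitted by gen-timezones.sh

# Hypothetical excerpt of the abbreviation map; offsets are in minutes.
ABBREV_INFO = {
    'MSK': {'offset': 180, 'category': 0x00},
    'BST': {'offset': 60, 'category': TZ_AMBIGUOUS},
}


def resolve_abbrev(tz, error_class=ValueError):
    """Build a fixed-offset tzinfo for an unambiguous abbreviation."""
    info = ABBREV_INFO[tz]
    if (info['category'] & TZ_AMBIGUOUS) != 0:
        raise error_class(f'ambiguous timezone "{tz}"')
    return timezone(timedelta(minutes=info['offset']), tz)
```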

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
2. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh
3. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz
4. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz
5. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
6. https://github.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h

Closes #204
DifferentialOrange committed Sep 26, 2022
commit aa7302a39c31b1fd69052c36af82eadf25aaa0f6
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -53,6 +53,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   You may use `tzoffset` property to get timezone offset of a datetime
   object.

+- Timezone in datetime type support (#204).
+
+  Use `tz` parameter to set up timezone name:
+
+  ```python
+  dt = tarantool.Datetime(year=2022, month=8, day=31,
+                          hour=18, minute=7, sec=54,
+                          nsec=308543321, tz='Europe/Moscow')
+  ```
+
+  If both `tz` and `tzoffset` are specified, `tz` is used.
+
+  You may use `tz` property to get timezone name of a datetime object.
+
 ### Changed
 - Bump msgpack requirement to 1.0.4 (PR #223).
   The only reason of this bump is various vulnerability fixes,
60 changes: 49 additions & 11 deletions tarantool/msgpack_ext/types/datetime.py
@@ -3,6 +3,9 @@
 import pandas
 import pytz

+import tarantool.msgpack_ext.types.timezones as tt_timezones
+from tarantool.error import MsgpackError
+
 # https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
 #
 # The datetime MessagePack representation looks like this:
@@ -63,6 +66,17 @@ def compute_offset(timestamp):
     # There is no precision loss since offset is in minutes
     return int(utc_offset.total_seconds()) // SEC_IN_MIN

+def get_python_tzinfo(tz, error_class):
+    if tz in pytz.all_timezones:
+        return pytz.timezone(tz)
+
+    # Checked with timezones/validate_timezones.py
+    tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tz]
+    if (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0:
+        raise error_class(f'Failed to create datetime with ambiguous timezone "{tz}"')
+
+    return pytz.FixedOffset(tt_tzinfo['offset'])
+
 def msgpack_decode(data):
     cursor = 0
     seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)
@@ -84,23 +98,29 @@ def msgpack_decode(data):
     datetime = pandas.to_datetime(total_nsec, unit='ns')

     if tzindex != 0:
-        raise NotImplementedError
+        if tzindex not in tt_timezones.indexToTimezone:
+            raise MsgpackError(f'Failed to decode datetime with unknown tzindex "{tzindex}"')
+        tz = tt_timezones.indexToTimezone[tzindex]
+        tzinfo = get_python_tzinfo(tz, MsgpackError)
+        return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo), tz
     elif tzoffset != 0:
         tzinfo = pytz.FixedOffset(tzoffset)
-        return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo)
+        return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo), ''
     else:
-        return datetime
+        return datetime, ''

 class Datetime():
     def __init__(self, data=None, *, timestamp=None, year=None, month=None,
                  day=None, hour=None, minute=None, sec=None, nsec=None,
-                 tzoffset=0):
+                 tzoffset=0, tz=''):
         if data is not None:
             if not isinstance(data, bytes):
                 raise ValueError('data argument (first positional argument) ' +
                                  'expected to be a "bytes" instance')

-            self._datetime = msgpack_decode(data)
+            datetime, tz = msgpack_decode(data)
+            self._datetime = datetime
+            self._tz = tz
             return

         # The logic is same as in Tarantool, refer to datetime API.
@@ -133,11 +153,20 @@ def __init__(self, data=None, *, timestamp=None, year=None, month=None,
                                         microsecond=microsecond,
                                         nanosecond=nanosecond)

-        if tzoffset != 0:
-            tzinfo = pytz.FixedOffset(tzoffset)
-            datetime = datetime.replace(tzinfo=tzinfo)
+        if tz != '':
+            if tz not in tt_timezones.timezoneToIndex:
+                raise ValueError(f'Unknown Tarantool timezone "{tz}"')

-        self._datetime = datetime
+            tzinfo = get_python_tzinfo(tz, ValueError)
+            self._datetime = datetime.replace(tzinfo=tzinfo)
+            self._tz = tz
+        elif tzoffset != 0:
+            tzinfo = pytz.FixedOffset(tzoffset)
+            self._datetime = datetime.replace(tzinfo=tzinfo)
+            self._tz = ''
+        else:
+            self._datetime = datetime
+            self._tz = ''

     def __eq__(self, other):
         if isinstance(other, Datetime):
@@ -151,7 +180,7 @@ def __str__(self):
         return self._datetime.__str__()

     def __repr__(self):
-        return f'datetime: {self._datetime.__repr__()}'
+        return f'datetime: {self._datetime.__repr__()}, tz: "{self.tz}"'

     def __copy__(self):
         cls = self.__class__
@@ -206,6 +235,10 @@ def tzoffset(self):
             return compute_offset(self._datetime)
         return 0

+    @property
+    def tz(self):
+        return self._tz
+
     @property
     def value(self):
         return self._datetime.value
@@ -214,7 +247,12 @@ def msgpack_encode(self):
         seconds = self.value // NSEC_IN_SEC
         nsec = self.nsec
         tzoffset = self.tzoffset
-        tzindex = 0
+
+        tz = self.tz
+        if tz != '':
+            tzindex = tt_timezones.timezoneToIndex[tz]
+        else:
+            tzindex = 0

         buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)

9 changes: 9 additions & 0 deletions tarantool/msgpack_ext/types/timezones/__init__.py
@@ -0,0 +1,9 @@
+from tarantool.msgpack_ext.types.timezones.timezones import (
+    TZ_AMBIGUOUS,
+    indexToTimezone,
+    timezoneToIndex,
+    timezoneAbbrevInfo,
+)
+
+__all__ = ['TZ_AMBIGUOUS', 'indexToTimezone', 'timezoneToIndex',
+           'timezoneAbbrevInfo']
69 changes: 69 additions & 0 deletions tarantool/msgpack_ext/types/timezones/gen-timezones.sh
@@ -0,0 +1,69 @@
+#!/usr/bin/env bash
+set -xeuo pipefail
+
+SRC_COMMIT="9ee45289e01232b8df1413efea11db170ae3b3b4"
+SRC_FILE=timezones.h
+DST_FILE=timezones.py
+
+[ -e ${SRC_FILE} ] && rm ${SRC_FILE}
+wget -O ${SRC_FILE} \
+    https://raw.githubusercontent.com/tarantool/tarantool/${SRC_COMMIT}/src/lib/tzcode/timezones.h
+
+# We don't need aliases in indexToTimezone because Tarantool always replaces them:
+#
+# tarantool> T = date.parse '2022-01-01T00:00 Pacific/Enderbury'
+# ---
+# ...
+# tarantool> T
+# ---
+# - 2022-01-01T00:00:00 Pacific/Kanton
+# ...
+#
+# So we can do the same and don't worry, be happy.
+
+cat <<EOF > ${DST_FILE}
+# Automatically generated by gen-timezones.sh
+
+TZ_UTC = 0x01
+TZ_RFC = 0x02
+TZ_MILITARY = 0x04
+TZ_AMBIGUOUS = 0x08
+TZ_NYI = 0x10
+TZ_OLSON = 0x20
+TZ_ALIAS = 0x40
+TZ_DST = 0x80
+
+indexToTimezone = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $3)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $2)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneToIndex = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $3, $1)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+grep ZONE_ALIAS ${SRC_FILE} | sed "s/ZONE_ALIAS( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneAbbrevInfo = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : {\"offset\" : %d, \"category\" : %s},\n", $3, $2, $4)}' >> ${DST_FILE}
+echo "}" >> ${DST_FILE}
+
+rm timezones.h
+
+python validate_timezones.py
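
For readers less familiar with the grep/sed/awk pipeline, here is a rough Python sketch of what the ZONE_ABBREV part of the script extracts. It assumes each macro line has the form `ZONE_ABBREV(index, offset, "name", flags)`, which is inferred from the awk field order. The sample entries and the `parse_abbrevs` helper are hypothetical; the real header may differ in spacing or flag syntax.

```python
import re

# Flag values copied from the generated timezones.py preamble.
FLAGS = {
    'TZ_UTC': 0x01, 'TZ_RFC': 0x02, 'TZ_MILITARY': 0x04,
    'TZ_AMBIGUOUS': 0x08, 'TZ_NYI': 0x10, 'TZ_OLSON': 0x20,
    'TZ_ALIAS': 0x40, 'TZ_DST': 0x80,
}

# Assumed macro shape: ZONE_ABBREV(index, offset, "name", FLAG|FLAG)
ZONE_ABBREV_RE = re.compile(
    r'ZONE_ABBREV\(\s*(\d+),\s*(-?\d+),\s*"([^"]+)",\s*([A-Z_|]+)\s*\)')

# Made-up entries for illustration only.
SAMPLE_HEADER = '''
ZONE_ABBREV(  10,  180, "XXA", TZ_RFC)
ZONE_ABBREV(  11,  -60, "XXB", TZ_AMBIGUOUS|TZ_DST)
'''


def parse_category(expr):
    """OR together the flags named in an expression like 'A|B'."""
    value = 0
    for name in expr.split('|'):
        value |= FLAGS[name]
    return value


def parse_abbrevs(header_text):
    """Build the three maps the script generates, for abbreviations only."""
    index_to_tz, tz_to_index, abbrev_info = {}, {}, {}
    for index, offset, name, category in ZONE_ABBREV_RE.findall(header_text):
        index_to_tz[int(index)] = name
        tz_to_index[name] = int(index)
        abbrev_info[name] = {'offset': int(offset),
                             'category': parse_category(category)}
    return index_to_tz, tz_to_index, abbrev_info
```

The shell script writes the flag expression into timezones.py verbatim and lets Python evaluate it on import; the sketch evaluates it eagerly instead, which is a design choice of the example only.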