Skip to content

Conversation

@dimitri-yatsenko
Copy link
Member

@dimitri-yatsenko dimitri-yatsenko commented Jul 23, 2018

@coveralls
Copy link

coveralls commented Jul 23, 2018

Coverage Status

Coverage decreased (-1.1%) to 88.22% when pulling 64c9e94 on dimitri-yatsenko:master into 49a93a7 on datajoint:master.

@ixcat ixcat self-requested a review July 24, 2018 18:53
Copy link

@ixcat ixcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good from last week -
Did a change req unti we confirm the connection default to utf8 was the desired change.

* Sped up queries (#458)
* Bugfix in restriction of the form (A & B) * B (#463)
* Improved error messages (#466)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably mention #468 - also:

Did we want to default the connection encoding to utf8 or latin1?

Utf8 means latin1 installs could potentially require adding a new setting with the release
(not sure if this is 100% compatible or 99.9% compatible and what the .01% would be if not)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point - could you tell us a bit more on what negative consequence might arise if one uses utf8 connection on a latin1 install of MySQL?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I think this is probably one of the safer mismatches, However, I'm not sure of the precise impacts and it would probably take some time to investigate.

From what I could tell casually, it shouldn't cause any issues for the 'ascii range' (e.g. first 128 characters), but could potentially result in issues in the higher range of latin1 (https://en.wikipedia.org/wiki/ISO/IEC_8859-1, see characters > 128 value.).

Best set of general answers I found were here:
https://stackoverflow.com/questions/7048745/what-is-the-difference-between-utf-8-and-iso-8859-1/39109074

Of note, it looks like UTF8 for the characters in 128-255 range of latin1 is treated as multibyte; That is, the same 'glyph' would be translated for storage as 2 byte in the utf-8 scheme, but as a single byte in the 8859-1 (latin1) scheme. Where this translation occurs when configuring mysql connection and how it would relate to what is stored in a table, I'd need to test. (the thinking behind my previous not sure if this is 100% compatible or 99.9% compatible and what the .01% would be if not statement.)

my thought in making the initial comment was to assume there to be at least some limited impact (based on casual research as outlined above this appears true), and then treat the discussion here more as one of 'how to manage the presumed impact'; of course we can do tests and be more definitive for the nominal case of utf8 vs latin1 for the ascii range, or perhaps a bit beyond; that said, I'm not confident given our current knowledge of these topics that such tests should be relied upon (aka some other corner case could be missed & give false confidence in the decision).

Conversely, if we treat anything non-ascii as 'previously unsupported', the change should be fine; that said, no need to force a configuration setting if it is not required. Another option would be to default to setting nothing unless a setting is specified.

here = path.abspath(path.dirname(__file__))

long_description = "An object-relational mapping and relational algebra to facilitate data definition and data manipulation in MySQL databases."
long_description = "A relational data framework for scientific data pipelines with MySQL backend."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good :)

Copy link
Contributor

@eywalker eywalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss what's an appropriate default for the encoding and make a release. I don't think make_module_code should be part of this release.

FROM information_schema.tables WHERE table_schema='{db}'
""".format(db=self.database)).fetchone()[0])

def make_module_code(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for? Addition of an entirely new feature like this should not be done as part of a patch increment. I'd much rather you add this to the next release.

:return: the python module
"""
mod = ModuleType(modulename)
s = schema(dbname, mod.__dict__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while at this, can we make a few changes?

  1. rename create_tables to create_schema in the Schema class - the name create_tables actually doesn't make sense as this really controls whether the schema is going to be created in the database or not
  2. get rid of the mod.__dict__ - that will make this schema instance behave the same way as any one defined like schema = dj.schema('schema_name').

Copy link
Member Author

@dimitri-yatsenko dimitri-yatsenko Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. create_tables is already widely used and renaming it would cause problems. Usually, the user means to prevent the creation of new tables when the database schema already exists. I think it makes senes to keep things as they are now.
  2. I do not believe that omitting mod.__dict__ will do what you describe. We are creating a module here, not a schema.

I believe the current code is correct as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After an off-line discussion, we agreed that create_if_missing will replace create_tables but in a future release.

mod = ModuleType(modulename)
s = schema(dbname, mod.__dict__)
s.spawn_missing_classes()
s = schema(dbname, create_tables=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be create_tables=False? Why not allow someone to use this to define new tables with this schema object? I prefer that we just leave it like how a typical schema object from other module would look like, which is create_tables=True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point and yes, I agree. However, let's make it configurable by adding a create_if_missing argument to create_virtual_module. Let's do it with the next feature release when we rename create_tables throughout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think create_if_missing should default to False for in create_virtual_module to avoid excessive schema creation due to simple mispellings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I agree that making it configurable would be best. But for now, shall we just remove the create_tables? I want the default to be create_tables=True (that is in future, create_if_missing=True to be the default).

Copy link
Contributor

@eywalker eywalker Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now thinking that there should be separate flags for create_schema and create_tables. I can see your point of wanting to avoid creating a schema if it doesn't already exist when using create_virtual_module, but I would still want to use it to create additional tables in the schema. So I think default should be create_schema=False, create_tables=True for create_virtual_module

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should add create_schema to Schema.__init__ too, defaulting to True.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we are going to keep create_tables as is, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with adding this change to this PR since it does not affect existing functionality.

Copy link
Contributor

@eywalker eywalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor question about defaults - but otherwise looks great.



def create_virtual_module(modulename, dbname):
def create_virtual_module(module_name, schema_name, create_schema=False, create_tables=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the reason to have create_tables default to False? I totally agree with making create_schema=False default but don't see strong reason to prevent usage of schema in defining additional tables in the schema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t have an opinion either way. This use case is rare and I think should be discouraged in favor of the more common approach using the dj.schema. Users can explicitly set create_tables=True if they do intend to add new tables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the idea is to treat the dj.create_virtual_module to be by default a "read-only" schema module? I guess that could make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright have a couple more minutes of thought I don't have a strong reason against this default "read-only" behavior for virtual modules, especially given that user does have an option to set create tables to true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge then and make a release.

@ixcat ixcat merged commit b13ffda into datajoint:master Aug 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants