Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dataclass function idempotent. Fix for issue #92052 #92406

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hsolbrig
Copy link

@hsolbrig hsolbrig commented May 6, 2022

While it is already possible to determine whether a class
or one of its ancestors has been processed by the dataclass
function, there is no way to determine whether the first
argument, _cls, has. Because of this, dataclass(dataclass(C)) is not valid. Third party libraries
(e.g. pydantic) need invoke the dataclass wrapper if it
has not already been invoked
before adding their on functions. This patch:
a) Provides a mechanism, is_direct_datacclass, to determine whether
dataclass has been applied to a given class
b) Uses this mechanism to make dataclass idempotent.

While it is already possible to determine whether a class
or one of its ancestors has been processed by the `dataclass`
function, there is no way to determine whether the first
argument, `_cls`, has.  Because of this, `dataclass(dataclass(C))` is not valid.  Third party libraries
(e.g. pydantic) need invoke the dataclass wrapper _if it
has not already been invoked_ before adding their on functions.  This patch:
a) Provides a mechanism, `is_direct_datacclass`, to determine whether
`dataclass` has been applied to a given class
b) Uses this mechanism to make dataclass idempotent.
@hsolbrig hsolbrig requested a review from ericvsmith as a code owner May 6, 2022
@cpython-cla-bot
Copy link

@cpython-cla-bot cpython-cla-bot bot commented May 6, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented May 6, 2022

Every change to Python requires a NEWS entry.

Please, add it using the blurb_it Web app or the blurb command-line tool.

@ericvsmith
Copy link
Member

@ericvsmith ericvsmith commented May 6, 2022

Can't you just check for if cls.__dict__.get(_FIELDS) is not None? That's what I've done in https://github.com/ericvsmith/cpython/tree/dataclass-only-once

@hsolbrig
Copy link
Author

@hsolbrig hsolbrig commented May 8, 2022

What we (pydantic in particular, but in general) need to be able to differentiate is the following:

# Base class 
@dataclass
class Foo:
      int i = 1
      int j = field(default_factory=list)
      int k = 2

# Wrapping an inherited data class.
class Bar(Foo):
      int m = 3

Foo1 = dataclass(Foo)       # re-wrapping Foo -- must either be disallowed -or- rendered idempotent
Bar1 = dataclass(Bar)       # New dataclass -- variable `m` needs to be processed

# We need to determine whether the passed class has already been processed.  is_dataclass (i.e. the presence of the
# _FIELDS property) can't do this
dataclass.is_dataclass(Foo)  == True
dataclass.is_dataclass(Bar) == True

# This patch gives us the ability to do this
dataclass.is_direct_dataclass(Foo) == True
dataclass.is_direct_dataclass(Bar) == False

The reason that this matters is because libraries such as pydantic are currently designed to handle:

from pydantic import mywrapper               # Note: name changed for clarity

# Base class 
@dataclass
class Foo:
      int i = 1
      int j = field(default_factory=list)
      int k = 2

# Pydantic wrapper will return:
#    mywrapper(dataclass(Bar1))
@mywrapper
class Bar1:
     int m = 3

# Pydantic wrapper will return
#     mywrapper(Foo)
Bar2 = mywrapper(Foo)

It had earlier been suggested that dataclass(dataclass(Foo)) might be considered illegal. IMO, you will still need
something similar to the suggested patch to do this, as how else can you differentiate the Foo (either disallowed or
idempotent) from Bar (legitmate dataclass inheritence)?

@hsolbrig
Copy link
Author

@hsolbrig hsolbrig commented May 8, 2022

As an aside -- I haven't been able to get the unit tests to run on my machine. Would be happy to supply examples and unit tests if someone can offer advice on how to get the test environment running. This is my current return from make:

Undefined symbols for architecture arm64:
  "_libintl_bindtextdomain", referenced from:
      __locale_bindtextdomain in _localemodule.o
  "_libintl_dcgettext", referenced from:
      __locale_dcgettext in _localemodule.o
  "_libintl_dgettext", referenced from:
      __locale_dgettext in _localemodule.o
  "_libintl_gettext", referenced from:
      __locale_gettext in _localemodule.o
  "_libintl_setlocale", referenced from:
      __locale_setlocale in _localemodule.o
      __locale_localeconv in _localemodule.o
  "_libintl_textdomain", referenced from:
      __locale_textdomain in _localemodule.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Programs/_freeze_module] Error 1

(I'm developing on a Mac Mini w/ an M1 chip.

@ericvsmith
Copy link
Member

@ericvsmith ericvsmith commented May 8, 2022

As I said, you can use cls.__dict__.get(_FIELDS) is not None:

>>> @dataclass
... class Foo:
...   x: int
...
>>> class Bar:
...   y: int
...
>>> Foo.__dict__.get("__dataclass_fields__") is not None
True
>>> Bar.__dict__.get("__dataclass_fields__") is not None
False

Admittedly this should be done somewhere inside the dataclasses module, not from user code.

What if dataclass() raised an AlreadyDataclassError exception if it were called on a class that was already a dataclass? AlreadyDataclassError would be a subclass of TypeError. That way you could at least tell what was going on.

My problem with calling dataclass(dataclass(Foo)) is that unless you check that all of the dataclass parameters are the same, it can lead to surprising results. dataclass(eq=False)(dataclass(eq=True)(Foo)), for example. I'd prefer for this to just be an error. Then you could write a wrapper like make_sure_this_is_a_dataclass(cls) that would try to call dataclass(cls), and if it raised AlreadyDataclassError, then just return the class.

I don't know if you'd want to pass in init, repr, etc. to make_sure_this_is_a_dataclass, but you could do whatever you want.

I'll probably change my patch for issue #92231 to raise AlreadyDataclassError.

@hsolbrig
Copy link
Author

@hsolbrig hsolbrig commented May 8, 2022

@ericvsmith --

Try:

class Bar(Foo):
    y: int

We still need to process y, but they both have dataclass_fields

@ericvsmith
Copy link
Member

@ericvsmith ericvsmith commented May 8, 2022

Oops, that's what the previous example was supposed to be:

>>> from dataclasses import *
>>> @dataclass
... class Foo:
...   x: int
...
>>> class Bar(Foo):
...   y: int
...
>>> Foo.__dict__.get("__dataclass_fields__") is not None
True
>>> Bar.__dict__.get("__dataclass_fields__") is not None
False

@hsolbrig
Copy link
Author

@hsolbrig hsolbrig commented May 9, 2022

@ericvsmith

Ah-ha! I learn something new about python every day. I hadn't grasped the difference between:

Bar.__dict__.get("__dataclass_fields__")    
# --and--
Bar.__dataclass_fields__

Because I was using an IDE, I just examined both Foo and Bar and concluded that the property is in Bar without
catching the __dict__ nuance. You are absolutely correct -- I don't all the clevernance of the set I proposed above, but I still need the is_direct_dataclass method, correct?

@ericvsmith
Copy link
Member

@ericvsmith ericvsmith commented May 9, 2022

but I still need the is_direct_dataclass method, correct?

No, not if my comment above about AlreadyDataclassError will solve your problem.

@hsolbrig
Copy link
Author

@hsolbrig hsolbrig commented May 9, 2022

No problem with the error -- your call on that one. I (speaking for pydantic) would still want to avoid triggering the error, which means I would want to invoke Foo.__dict__.get("__dataclass_fields__") is not None from my library. I was figuring that the method was probably safer than embedding dunder text, no?

@ericvsmith
Copy link
Member

@ericvsmith ericvsmith commented May 9, 2022

Can't you catch and ignore the exception, rather than the look-before-you-leap version? That would be the supported version, rather than poking around in the internals.

That said, the AlreadyDataclassError exception won't make it in to 3.11, so you'd probably have to use Foo.__dict__.get("__dataclass_fields__") for the time being, anyway.

@hsolbrig
Copy link
Author

@hsolbrig hsolbrig commented May 9, 2022

Will do (w/ the obvious substitution of _FIELDS). So, should I revise this PR to suggest the addition of the function or ...?

@hsolbrig
Copy link
Author

@hsolbrig hsolbrig commented May 9, 2022

Wrt the catcha-and-ignore exception -- I've always shyed away from that sort of solution as a wee bit opaque, but I'll leave it up to @samuelcolvin to decide. Thanks again!

Also - we wouldn't have that one available until 3.11 anyway, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants