BUG: Altair incompatible with Modin #5438

Closed
3 tasks done
labanyamukhopadhyay opened this issue Dec 14, 2022 · 2 comments
Labels
bug 🦗 Something isn't working Integration ➕➕ Issues with integrating Modin into other libraries P2 Minor bugs or low-priority feature requests

Comments

@labanyamukhopadhyay
Contributor

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas as pd
import altair as alt

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source).mark_bar().encode(
    x='a',
    y='b'
).interactive()

Issue Description

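alt.Chart() requires the data to be a pandas.DataFrame. When it is given a Modin DataFrame, Altair cannot infer the encoding types and raises the ValueError shown under Error Logs.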
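
The failure mode can be illustrated without either library. The sketch below is a toy model, not Altair's code: `PandasFrame`, `ModinLikeFrame`, and `infer_type` are hypothetical names, and the assumption (suggested by the error message in the traceback) is that the consumer only infers types when the data is an instance of one concrete class.

```python
class PandasFrame:
    """Stand-in for pandas.DataFrame (hypothetical, for illustration only)."""

    def __init__(self, columns):
        self.columns = columns  # mapping of column name -> list of values


class ModinLikeFrame:
    """Stand-in for a Modin DataFrame: same interface, unrelated class."""

    def __init__(self, columns):
        self.columns = columns


def infer_type(data, field):
    """Mimic a consumer that infers encoding types for one concrete class only."""
    if not isinstance(data, PandasFrame):
        raise ValueError(
            f"{field} encoding field is specified without a type; "
            "the data is not a PandasFrame."
        )
    values = data.columns[field]
    is_numeric = all(isinstance(v, (int, float)) for v in values)
    return "quantitative" if is_numeric else "nominal"


columns = {"a": ["A", "B", "C"], "b": [28, 55, 43]}

print(infer_type(PandasFrame(columns), "b"))  # quantitative
try:
    infer_type(ModinLikeFrame(columns), "a")
except ValueError as exc:
    print(f"ValueError: {exc}")
```

Both classes hold identical data, but only the one that passes the `isinstance` check gets type inference; the look-alike is rejected regardless of its interface.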

Expected Behavior

[Screenshot, 2022-12-14 11:34 AM: image not preserved]

Error Logs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/vegalite/v4/api.py:2020, in Chart.to_dict(self, *args, **kwargs)
   2018     copy.data = core.InlineData(values=[{}])
   2019     return super(Chart, copy).to_dict(*args, **kwargs)
-> 2020 return super().to_dict(*args, **kwargs)

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/vegalite/v4/api.py:384, in TopLevelMixin.to_dict(self, *args, **kwargs)
    381 kwargs["context"] = context
    383 try:
--> 384     dct = super(TopLevelMixin, copy).to_dict(*args, **kwargs)
    385 except jsonschema.ValidationError:
    386     dct = None

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/utils/schemapi.py:326, in SchemaBase.to_dict(self, validate, ignore, context)
    324     result = _todict(self._args[0], validate=sub_validate, context=context)
    325 elif not self._args:
--> 326     result = _todict(
    327         {k: v for k, v in self._kwds.items() if k not in ignore},
    328         validate=sub_validate,
    329         context=context,
    330     )
    331 else:
    332     raise ValueError(
    333         "{} instance has both a value and properties : "
    334         "cannot serialize to dict".format(self.__class__)
    335     )

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/utils/schemapi.py:60, in _todict(obj, validate, context)
     58     return [_todict(v, validate, context) for v in obj]
     59 elif isinstance(obj, dict):
---> 60     return {
     61         k: _todict(v, validate, context)
     62         for k, v in obj.items()
     63         if v is not Undefined
     64     }
     65 elif hasattr(obj, "to_dict"):
     66     return obj.to_dict()

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/utils/schemapi.py:61, in <dictcomp>(.0)
     58     return [_todict(v, validate, context) for v in obj]
     59 elif isinstance(obj, dict):
     60     return {
---> 61         k: _todict(v, validate, context)
     62         for k, v in obj.items()
     63         if v is not Undefined
     64     }
     65 elif hasattr(obj, "to_dict"):
     66     return obj.to_dict()

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/utils/schemapi.py:56, in _todict(obj, validate, context)
     54 """Convert an object to a dict representation."""
     55 if isinstance(obj, SchemaBase):
---> 56     return obj.to_dict(validate=validate, context=context)
     57 elif isinstance(obj, (list, tuple, np.ndarray)):
     58     return [_todict(v, validate, context) for v in obj]

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/utils/schemapi.py:326, in SchemaBase.to_dict(self, validate, ignore, context)
    324     result = _todict(self._args[0], validate=sub_validate, context=context)
    325 elif not self._args:
--> 326     result = _todict(
    327         {k: v for k, v in self._kwds.items() if k not in ignore},
    328         validate=sub_validate,
    329         context=context,
    330     )
    331 else:
    332     raise ValueError(
    333         "{} instance has both a value and properties : "
    334         "cannot serialize to dict".format(self.__class__)
    335     )

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/utils/schemapi.py:60, in _todict(obj, validate, context)
     58     return [_todict(v, validate, context) for v in obj]
     59 elif isinstance(obj, dict):
---> 60     return {
     61         k: _todict(v, validate, context)
     62         for k, v in obj.items()
     63         if v is not Undefined
     64     }
     65 elif hasattr(obj, "to_dict"):
     66     return obj.to_dict()

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/utils/schemapi.py:61, in <dictcomp>(.0)
     58     return [_todict(v, validate, context) for v in obj]
     59 elif isinstance(obj, dict):
     60     return {
---> 61         k: _todict(v, validate, context)
     62         for k, v in obj.items()
     63         if v is not Undefined
     64     }
     65 elif hasattr(obj, "to_dict"):
     66     return obj.to_dict()

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/utils/schemapi.py:56, in _todict(obj, validate, context)
     54 """Convert an object to a dict representation."""
     55 if isinstance(obj, SchemaBase):
---> 56     return obj.to_dict(validate=validate, context=context)
     57 elif isinstance(obj, (list, tuple, np.ndarray)):
     58     return [_todict(v, validate, context) for v in obj]

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/altair/vegalite/v4/schema/channels.py:44, in FieldChannelMixin.to_dict(self, validate, ignore, context)
     40             raise ValueError("{} encoding field is specified without a type; "
     41                              "the type cannot be inferred because it does not "
     42                              "match any column in the data.".format(shorthand))
     43         else:
---> 44             raise ValueError("{} encoding field is specified without a type; "
     45                              "the type cannot be automatically inferred because "
     46                              "the data is not specified as a pandas.DataFrame."
     47                              "".format(shorthand))
     48 else:
     49     # Shorthand is not a string; we pass the definition to field,
     50     # and do not do any parsing.
     51     parsed = {'field': shorthand}

ValueError: a encoding field is specified without a type; the type cannot be automatically inferred because the data is not specified as a pandas.DataFrame.

Installed Versions

INSTALLED VERSIONS

commit : 4114183
python : 3.9.7.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies

modin : 0.18.0+3.g4114183f
ray : 2.0.1
dask : 2022.11.1
distributed : 2022.11.1
hdk : None

pandas dependencies

pandas : 1.5.2
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 57.4.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : 2022.11.0
scipy : 1.9.3
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@labanyamukhopadhyay labanyamukhopadhyay added bug 🦗 Something isn't working Triage 🩹 Issues that need triage labels Dec 14, 2022
@anmyachev anmyachev added Integration ➕➕ Issues with integrating Modin into other libraries and removed Triage 🩹 Issues that need triage labels Dec 15, 2022
@mvashishtha mvashishtha added the P2 Minor bugs or low-priority feature requests label Apr 12, 2023
@sugizo

sugizo commented Jul 27, 2023

Steps on Google Colab:
pip install -U modin pyarrow sqlalchemy matplotlib seaborn altair plotly

The code works well with pandas:

# `df`, `result`, and `query` are defined earlier (not shown in this comment)
eco = 'eco'
count = 'count'
top = 10

# Keep only rows with a non-null, non-empty `eco` value
filtered = df[df[eco].notnull() & (df[eco] != '')]
if result != 'all':
    filtered = filtered.query(query)

# Count rows per `eco` value, then take the `top` most frequent
groupby = filtered.groupby([eco]).size().reset_index(name=count)
sort_groupby = groupby.sort_values([count, eco], ascending=[False, True]).head(top)

With Modin, however, an error occurs:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/altair/vegalite/v5/api.py in to_dict(self, *args, **kwargs)
   2518             copy.data = core.InlineData(values=[{}])
   2519             return super(Chart, copy).to_dict(*args, **kwargs)
-> 2520         return super().to_dict(*args, **kwargs)
   2521 
   2522     def add_params(self, *params) -> Self:

13 frames
/usr/local/lib/python3.10/dist-packages/pyarrow/interchange/from_dataframe.py in validity_buffer_from_mask(validity_buff, validity_dtype, describe_null, length, offset, allow_copy)
    436     null_kind, sentinel_val = describe_null
    437     validity_kind, _, _, _ = validity_dtype
--> 438     assert validity_kind == DtypeKind.BOOL
    439 
    440     if null_kind == ColumnNullType.NON_NULLABLE:

AssertionError:
alt.Chart(...)

best regards

@labanyamukhopadhyay
Contributor Author

This issue is resolved now that Altair has adopted the dataframe interchange protocol, and thanks to #6523.
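
For readers unfamiliar with the interchange protocol, here is a rough sketch of the idea, assuming only the protocol's `__dataframe__` entry point and its `column_names()` and `num_rows()` methods. `ToyInterchangeFrame`, `ModinLikeFrame`, and `describe` are hypothetical names, and the toy object covers a tiny subset of the real specification.

```python
class ToyInterchangeFrame:
    """Tiny subset of an interchange object (hypothetical, not the full spec)."""

    def __init__(self, columns):
        self._columns = columns  # mapping of column name -> list of values

    def column_names(self):
        return list(self._columns)

    def num_rows(self):
        return len(next(iter(self._columns.values()), []))


class ModinLikeFrame:
    """Stand-in for any dataframe library that implements the protocol."""

    def __init__(self, columns):
        self._columns = columns

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        return ToyInterchangeFrame(self._columns)


def describe(data):
    """Accept any object exposing __dataframe__, regardless of its class."""
    if not hasattr(data, "__dataframe__"):
        raise TypeError("data does not support the interchange protocol")
    frame = data.__dataframe__()
    return frame.column_names(), frame.num_rows()


print(describe(ModinLikeFrame({"a": ["A", "B", "C"], "b": [28, 55, 43]})))
# (['a', 'b'], 3)
```

The consumer checks for a capability rather than a concrete class, so a library such as Altair can read the data without importing or special-casing the producer.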
