@@ -11,8 +11,8 @@ with R's ``factor``.
`Categoricals` are a pandas data type corresponding to categorical variables in
statistics. A categorical variable takes on a limited, and usually fixed,
- number of possible values (`categories`; `levels` in R). Examples are gender,
- social class, blood type, country affiliation, observation time or rating via
+ number of possible values (`categories`; `levels` in R). Examples are gender,
+ social class, blood type, country affiliation, observation time or rating via
Likert scales.

In contrast to statistical categorical variables, categorical data might have an order (e.g.
@@ -133,7 +133,7 @@ This conversion is likewise done column by column:
Controlling Behavior
~~~~~~~~~~~~~~~~~~~~

- In the examples above where we passed ``dtype='category'``, we used the default
+ In the examples above where we passed ``dtype='category'``, we used the default
behavior:

1. Categories are inferred from the data.
@@ -170,8 +170,8 @@ are consistent among all columns.
categories for each column, the ``categories`` parameter can be determined programmatically by
``categories = pd.unique(df.to_numpy().ravel())``.

- If you already have ``codes`` and ``categories``, you can use the
- :func:`~pandas.Categorical.from_codes` constructor to save the factorize step
+ If you already have ``codes`` and ``categories``, you can use the
+ :func:`~pandas.Categorical.from_codes` constructor to save the factorize step
during normal constructor mode:

.. ipython:: python
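As a minimal sketch of the ``from_codes`` path described in this hunk: the codes and category labels below are made up for illustration and do not come from the original document.

```python
import pandas as pd

# Hypothetical integer codes; each code is a position in `categories`.
codes = [0, 1, 0, 2]
categories = ["low", "mid", "high"]

# Build the Categorical directly from codes, skipping the factorize step.
cat = pd.Categorical.from_codes(codes, categories=categories)
print(cat)
```

Each code selects the category at that position, so no inference over the values is needed.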
@@ -184,7 +184,7 @@ during normal constructor mode:
Regaining Original Data
~~~~~~~~~~~~~~~~~~~~~~~

- To get back to the original ``Series`` or NumPy array, use
+ To get back to the original ``Series`` or NumPy array, use
``Series.astype(original_dtype)`` or ``np.asarray(categorical)``:

.. ipython:: python
@@ -222,7 +222,7 @@ This information can be stored in a :class:`~pandas.api.types.CategoricalDtype`.
The ``categories`` argument is optional, which implies that the actual categories
should be inferred from whatever is present in the data when the
:class:`pandas.Categorical` is created. The categories are assumed to be unordered
- by default.
+ by default.

.. ipython:: python
@@ -277,7 +277,7 @@ All instances of ``CategoricalDtype`` compare equal to the string ``'category'``
Description
-----------

- Using :meth:`~DataFrame.describe` on categorical data will produce similar
+ Using :meth:`~DataFrame.describe` on categorical data will produce similar
output to a ``Series`` or ``DataFrame`` of type ``string``.

.. ipython:: python
@@ -292,9 +292,9 @@ output to a ``Series`` or ``DataFrame`` of type ``string``.
Working with categories
-----------------------

- Categorical data has a `categories` and a `ordered` property, which list their
- possible values and whether the ordering matters or not. These properties are
- exposed as ``s.cat.categories`` and ``s.cat.ordered``. If you don't manually
+ Categorical data has a `categories` and a `ordered` property, which list their
+ possible values and whether the ordering matters or not. These properties are
+ exposed as ``s.cat.categories`` and ``s.cat.ordered``. If you don't manually
specify categories and ordering, they are inferred from the passed arguments.

.. ipython:: python
@@ -314,7 +314,7 @@ It's also possible to pass in the categories in a specific order:
.. note::

-     New categorical data are **not** automatically ordered. You must explicitly
+     New categorical data are **not** automatically ordered. You must explicitly
    pass ``ordered=True`` to indicate an ordered ``Categorical``.
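The note above can be illustrated with a short sketch. The sample values and the chosen category order are assumptions for illustration only.

```python
import pandas as pd

values = ["a", "b", "c", "a"]

# Without ordered=True the categorical carries no ordering.
unordered = pd.Categorical(values)

# Passing categories in a specific order plus ordered=True makes "c" < "b" < "a".
ordered = pd.Categorical(values, categories=["c", "b", "a"], ordered=True)

print(unordered.ordered)
print(ordered.ordered)
print(ordered.min(), ordered.max())
```

With the ordering declared explicitly, ``min`` and ``max`` follow the category order rather than the lexical order of the labels.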
@@ -338,8 +338,8 @@ It's also possible to pass in the categories in a specific order:
Renaming categories
~~~~~~~~~~~~~~~~~~~

- Renaming categories is done by assigning new values to the
- ``Series.cat.categories`` property or by using the
+ Renaming categories is done by assigning new values to the
+ ``Series.cat.categories`` property or by using the
:meth:`~pandas.Categorical.rename_categories` method:
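A minimal sketch of renaming through the method, with made-up labels. Direct assignment to the ``categories`` property may not be supported in newer pandas releases, so only ``rename_categories`` is shown here.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# rename_categories returns a new Series; the original is left unchanged.
renamed = s.cat.rename_categories(["x", "y", "z"])

# A dict renames only the listed categories.
partly_renamed = s.cat.rename_categories({"a": "alpha"})

print(renamed)
print(partly_renamed)
```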
@@ -385,7 +385,7 @@ Categories must also not be ``NaN`` or a `ValueError` is raised:
Appending new categories
~~~~~~~~~~~~~~~~~~~~~~~~

- Appending categories can be done by using the
+ Appending categories can be done by using the
:meth:`~pandas.Categorical.add_categories` method:

.. ipython:: python
@@ -397,8 +397,8 @@ Appending categories can be done by using the
Removing categories
~~~~~~~~~~~~~~~~~~~

- Removing categories can be done by using the
- :meth:`~pandas.Categorical.remove_categories` method. Values which are removed
+ Removing categories can be done by using the
+ :meth:`~pandas.Categorical.remove_categories` method. Values which are removed
are replaced by ``np.nan``.:

.. ipython:: python
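The replacement by ``np.nan`` described in this hunk can be sketched as follows, using illustrative data that is not taken from the original document.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Dropping category "a" turns every "a" value into NaN.
trimmed = s.cat.remove_categories(["a"])

print(trimmed)
print(trimmed.cat.categories)
```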
@@ -421,8 +421,8 @@ Removing unused categories can also be done:
Setting categories
~~~~~~~~~~~~~~~~~~

- If you want to do remove and add new categories in one step (which has some
- speed advantage), or simply set the categories to a predefined scale,
+ If you want to do remove and add new categories in one step (which has some
+ speed advantage), or simply set the categories to a predefined scale,
use :meth:`~pandas.Categorical.set_categories`.
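A short sketch of the one-step remove-and-add behavior, assuming a hypothetical rating scale chosen only for illustration:

```python
import pandas as pd

s = pd.Series(["good", "bad", "good", "ok"], dtype="category")

# "ok" is not in the new scale, so it becomes NaN;
# "great" is added as an (unused) category in the same call.
rescaled = s.cat.set_categories(["bad", "good", "great"])

print(rescaled)
print(rescaled.cat.categories)
```

Values whose category is absent from the new list are treated like removed categories, so they end up missing.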
@@ -618,10 +618,10 @@ When you compare two unordered categoricals with the same categories, the order
Operations
----------

- Apart from :meth:`Series.min`, :meth:`Series.max` and :meth:`Series.mode`, the
+ Apart from :meth:`Series.min`, :meth:`Series.max` and :meth:`Series.mode`, the
following operations are possible with categorical data:

- ``Series`` methods like :meth:`Series.value_counts` will use all categories,
+ ``Series`` methods like :meth:`Series.value_counts` will use all categories,
even if some categories are not present in the data:

.. ipython:: python
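As a hedged illustration of ``value_counts`` reporting unused categories, the sketch below declares a category ``"d"`` that never occurs in the (made-up) data.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))

# Every declared category appears in the result, including the unused "d".
counts = s.value_counts()
print(counts)
```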
@@ -666,7 +666,7 @@ that only values already in `categories` can be assigned.
Getting
~~~~~~~

- If the slicing operation returns either a ``DataFrame`` or a column of type
+ If the slicing operation returns either a ``DataFrame`` or a column of type
``Series``, the ``category`` dtype is preserved.

.. ipython:: python
@@ -681,7 +681,7 @@ If the slicing operation returns either a ``DataFrame`` or a column of type
    df.loc["h":"j", "cats"]
    df[df["cats"] == "b"]

- An example where the category type is not preserved is if you take one single
+ An example where the category type is not preserved is if you take one single
row: the resulting ``Series`` is of dtype ``object``:

.. ipython:: python
@@ -702,7 +702,7 @@ of length "1".
The is in contrast to R's `factor` function, where ``factor(c(1,2,3))[1]``
returns a single value `factor`.

- To get a single value ``Series`` of type ``category``, you pass in a list with
+ To get a single value ``Series`` of type ``category``, you pass in a list with
a single value:

.. ipython:: python
@@ -756,7 +756,7 @@ That means, that the returned values from methods and properties on the accessor
Setting
~~~~~~~

- Setting values in a categorical column (or ``Series``) works as long as the
+ Setting values in a categorical column (or ``Series``) works as long as the
value is included in the `categories`:

.. ipython:: python
@@ -836,9 +836,9 @@ Unioning
.. versionadded:: 0.19.0

- If you want to combine categoricals that do not necessarily have the same
+ If you want to combine categoricals that do not necessarily have the same
categories, the :func:`~pandas.api.types.union_categoricals` function will
- combine a list-like of categoricals. The new categories will be the union of
+ combine a list-like of categoricals. The new categories will be the union of
the categories being combined.

.. ipython:: python
@@ -887,8 +887,8 @@ using the ``ignore_ordered=True`` argument.
    b = pd.Categorical(["c", "b", "a"], ordered=True)
    union_categoricals([a, b], ignore_order=True)

- :func:`~pandas.api.types.union_categoricals` also works with a
- ``CategoricalIndex``, or ``Series`` containing categorical data, but note that
+ :func:`~pandas.api.types.union_categoricals` also works with a
+ ``CategoricalIndex``, or ``Series`` containing categorical data, but note that
the resulting array will always be a plain ``Categorical``:

.. ipython:: python
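A minimal sketch of the ``Series`` case mentioned in this hunk, with illustrative inputs that are not part of the original document:

```python
import pandas as pd
from pandas.api.types import union_categoricals

a = pd.Series(["b", "c"], dtype="category")
b = pd.Series(["a", "b"], dtype="category")

# The inputs are Series, but the result is a plain Categorical
# whose categories are the union of both inputs.
combined = union_categoricals([a, b])

print(type(combined))
print(combined)
```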
@@ -1179,8 +1179,8 @@ Setting the index will create a ``CategoricalIndex``:
Side Effects
~~~~~~~~~~~~

- Constructing a ``Series`` from a ``Categorical`` will not copy the input
- ``Categorical``. This means that changes to the ``Series`` will in most cases
+ Constructing a ``Series`` from a ``Categorical`` will not copy the input
+ ``Categorical``. This means that changes to the ``Series`` will in most cases
change the original ``Categorical``:

.. ipython:: python