Rename CodePointSet to CodePointInversionList #2230

Merged
merged 9 commits into from
Jul 26, 2022
8 changes: 8 additions & 0 deletions CONTRIBUTING.md
@@ -49,6 +49,13 @@ Handy commands (run from the root directory):

See the [Testing](#testing) section below for more information on the various testsuites available.

There are various files that are auto-generated across the ICU4X repository. Here are some of the commands that you may
need to run in order to recreate them. These commands may also be run as part of more comprehensive tests such as those included in `cargo make ci-job-test` or `cargo make ci-all`.

- `cargo make testdata` - regenerates all test data in the `provider/testdata` directory.
- `cargo make generate-readmes` - generates README files according to Rust docs. Output files must be committed in git for the check to pass.
- `cargo make diplomat-gen` - recreates the Diplomat generated files in the `ffi/diplomat` directory.

### Testing

It's recommended to run `cargo test --all-features` in crates you're modifying to ensure that nothing is breaking, and `cargo quick` to get a reasonable check that everything still builds and lint checks pass.
@@ -59,6 +66,7 @@ Our wider testsuite is organized as `ci-job-foo` make tasks corresponding to eac
- `cargo make tidy`: A quick test that ensures that `cargo fmt` has been run, that code has the appropriate license headers and files and that READMEs are in sync. This is run as two separate tasks on CI (`ci-job-fmt` and `ci-job-tidy`) to ensure early results.
- `cargo make ci-job-test`: Runs `cargo test` on all the crates. This takes a while but is the main way of ensuring that nothing has been broken.
- `cargo make ci-job-clippy`: Runs `cargo clippy` on all the crates.
- `cargo doc --no-deps --all-features`: Recreates API docs locally; any warning should be fixed since it will be treated as an error in CI.
- `cargo make ci-job-ffi`: Runs all of the FFI tests; mostly important if you're changing the FFI interface. This has several additional dependencies:
+ Rust toolchain `nightly-2022-04-05`: `rustup install nightly-2022-04-05`
* `rust-src` for that toolchain: `rustup component add --toolchain nightly-2022-04-05 rust-src`
2 changes: 1 addition & 1 deletion components/collator/src/elements.rs
@@ -1711,7 +1711,7 @@ where
// Let's just set this flag here instead of trying to make
// it more granular and, therefore, more error-prone.
// After all, this flag is just about optimizing away one
// `CodePointSet` check in the common case.
// `CodePointInversionList` check in the common case.
may_have_contracted_starter = true;
debug_assert!(pending_removals.is_empty());
loop {
4 changes: 2 additions & 2 deletions components/icu/examples/tui.rs
@@ -14,7 +14,7 @@ use icu::datetime::{
};
use icu::locid::{locale, Locale};
use icu::plurals::{PluralCategory, PluralRules};
use icu_uniset::CodePointSetBuilder;
use icu_uniset::CodePointInversionListBuilder;
use std::env;

fn print<T: AsRef<str>>(_input: T) {
@@ -65,7 +65,7 @@ fn main(_argc: isize, _argv: *const *const u8) -> isize {
}

{
let mut builder = CodePointSetBuilder::new();
let mut builder = CodePointInversionListBuilder::new();
// See http://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
builder.add_range(&('\u{0000}'..='\u{00FF}'));
let latin1_set = builder.build();
16 changes: 8 additions & 8 deletions components/properties/README.md
@@ -6,30 +6,30 @@ retrieving property data in an appropriate data structure.
This module is published as its own crate ([`icu_properties`](https://docs.rs/icu_properties/latest/icu_properties/))
and as part of the [`icu`](https://docs.rs/icu/latest/icu/) crate. See the latter for more details on the ICU4X project.

APIs that return a [`CodePointSet`] exist for binary properties and certain enumerated
APIs that return a [`CodePointSetData`] exist for binary properties and certain enumerated
properties. See the [`sets`] module for more details.

APIs that return a [`CodePointTrie`] exist for certain enumerated properties. See the
APIs that return a [`CodePointMapData`] exist for certain enumerated properties. See the
[`maps`] module for more details.

## Examples

### Property data as `CodePointSet`s
### Property data as `CodePointSetData`s

```rust
use icu::properties::{maps, sets, GeneralCategory};

let provider = icu_testdata::get_provider();

// A binary property as a `CodePointSet`
// A binary property as a `CodePointSetData`

let data = sets::get_emoji(&provider).expect("The data should be valid");
let emoji = data.as_borrowed();

assert!(emoji.contains('🎃')); // U+1F383 JACK-O-LANTERN
assert!(!emoji.contains('木')); // U+6728

// An individual enumerated property value as a `CodePointSet`
// An individual enumerated property value as a `CodePointSetData`

let data = maps::get_general_category(&provider).expect("The data should be valid");
let gc = data.as_borrowed();
@@ -39,7 +39,7 @@ assert!(line_sep.contains_u32(0x2028));
assert!(!line_sep.contains_u32(0x2029));
```

### Property data as `CodePointTrie`s
### Property data as `CodePointMapData`s

```rust
use icu::properties::{maps, Script};
@@ -55,8 +55,8 @@ assert_eq!(script.get('木'), Script::Han); // U+6728

[`ICU4X`]: ../icu/index.html
[Unicode Properties]: https://unicode-org.github.io/icu/userguide/strings/properties.html
[`CodePointSet`]: icu_uniset::CodePointSet
[`CodePointTrie`]: icu_codepointtrie::CodePointTrie
[`CodePointSetData`]: crate::sets::CodePointSetData
[`CodePointMapData`]: crate::maps::CodePointMapData
[`sets`]: crate::sets

## More Information
16 changes: 8 additions & 8 deletions components/properties/src/lib.rs
@@ -8,30 +8,30 @@
//! This module is published as its own crate ([`icu_properties`](https://docs.rs/icu_properties/latest/icu_properties/))
//! and as part of the [`icu`](https://docs.rs/icu/latest/icu/) crate. See the latter for more details on the ICU4X project.
//!
//! APIs that return a [`CodePointSet`] exist for binary properties and certain enumerated
//! APIs that return a [`CodePointSetData`] exist for binary properties and certain enumerated
//! properties. See the [`sets`] module for more details.
//!
//! APIs that return a [`CodePointTrie`] exist for certain enumerated properties. See the
//! APIs that return a [`CodePointMapData`] exist for certain enumerated properties. See the
//! [`maps`] module for more details.
//!
//! # Examples
//!
//! ## Property data as `CodePointSet`s
//! ## Property data as `CodePointSetData`s
//!
//! ```
//! use icu::properties::{maps, sets, GeneralCategory};
//!
//! let provider = icu_testdata::get_provider();
//!
//! // A binary property as a `CodePointSet`
//! // A binary property as a `CodePointSetData`
//!
//! let data = sets::get_emoji(&provider).expect("The data should be valid");
//! let emoji = data.as_borrowed();
//!
//! assert!(emoji.contains('🎃')); // U+1F383 JACK-O-LANTERN
//! assert!(!emoji.contains('木')); // U+6728
//!
//! // An individual enumerated property value as a `CodePointSet`
//! // An individual enumerated property value as a `CodePointSetData`
//!
//! let data = maps::get_general_category(&provider).expect("The data should be valid");
//! let gc = data.as_borrowed();
@@ -41,7 +41,7 @@
//! assert!(!line_sep.contains_u32(0x2029));
//! ```
//!
//! ## Property data as `CodePointTrie`s
//! ## Property data as `CodePointMapData`s
//!
//! ```
//! use icu::properties::{maps, Script};
@@ -57,8 +57,8 @@
//!
//! [`ICU4X`]: ../icu/index.html
//! [Unicode Properties]: https://unicode-org.github.io/icu/userguide/strings/properties.html
//! [`CodePointSet`]: icu_uniset::CodePointSet
//! [`CodePointTrie`]: icu_codepointtrie::CodePointTrie
//! [`CodePointSetData`]: crate::sets::CodePointSetData
//! [`CodePointMapData`]: crate::maps::CodePointMapData
//! [`sets`]: crate::sets

// https://github.com/unicode-org/icu4x/blob/main/docs/process/boilerplate.md#library-annotations
4 changes: 2 additions & 2 deletions components/properties/src/maps.rs
@@ -126,7 +126,7 @@ impl<T: TrieValue> CodePointMapData<T> {
/// ```
pub fn get_set_for_value(&self, value: T) -> CodePointSetData {
let set = self.data.get().get_set_for_value(value);
CodePointSetData::from_code_point_set(set)
CodePointSetData::from_code_point_inversion_list(set)
}

/// Construct a new one from loaded data
@@ -233,7 +233,7 @@ impl<'a, T: TrieValue> CodePointMapDataBorrowed<'a, T> {
/// ```
pub fn get_set_for_value(&self, value: T) -> CodePointSetData {
let set = self.map.get_set_for_value(value);
CodePointSetData::from_code_point_set(set)
CodePointSetData::from_code_point_inversion_list(set)
}
}

10 changes: 5 additions & 5 deletions components/properties/src/provider.rs
@@ -12,7 +12,7 @@
use crate::script::ScriptWithExtensions;
use icu_codepointtrie::{CodePointTrie, TrieValue};
use icu_provider::prelude::*;
use icu_uniset::CodePointSet;
use icu_uniset::CodePointInversionList;
use zerofrom::ZeroFrom;

/// A set of characters with a particular property.
@@ -29,7 +29,7 @@ use zerofrom::ZeroFrom;
#[non_exhaustive]
pub enum PropertyCodePointSetV1<'data> {
/// The set of characters, represented as an inversion list
InversionList(#[cfg_attr(feature = "serde", serde(borrow))] CodePointSet<'data>),
InversionList(#[cfg_attr(feature = "serde", serde(borrow))] CodePointInversionList<'data>),
// new variants should go BELOW existing ones
// Serde serializes based on variant name and index in the enum
// https://docs.rs/serde/latest/serde/trait.Serializer.html#tymethod.serialize_unit_variant
@@ -86,12 +86,12 @@ impl<'data> PropertyCodePointSetV1<'data> {
}

#[inline]
pub(crate) fn from_code_point_set(l: CodePointSet<'static>) -> Self {
pub(crate) fn from_code_point_inversion_list(l: CodePointInversionList<'static>) -> Self {
Self::InversionList(l)
}

#[inline]
pub(crate) fn to_code_point_set(&'_ self) -> CodePointSet<'_> {
pub(crate) fn to_code_point_inversion_list(&'_ self) -> CodePointInversionList<'_> {
match *self {
Self::InversionList(ref l) => ZeroFrom::zero_from(l),
}
@@ -108,7 +108,7 @@ impl<'data, T: TrieValue> PropertyCodePointMapV1<'data, T> {
}

#[inline]
pub(crate) fn get_set_for_value(&self, value: T) -> CodePointSet<'static> {
pub(crate) fn get_set_for_value(&self, value: T) -> CodePointInversionList<'static> {
match *self {
Self::CodePointTrie(ref t) => t.get_set_for_value(value),
}
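
For readers unfamiliar with the term behind the rename: an inversion list stores a set as a sorted list of boundaries at which membership flips, so a lookup is a binary search followed by a parity check. The following is a minimal std-only sketch of that idea. `SketchInversionList` and its methods are hypothetical names chosen for illustration and are not the `icu_uniset` API; the input ranges are assumed to be sorted, non-overlapping, and within the code point range.

```rust
/// Hypothetical std-only sketch of an inversion list; NOT the `icu_uniset` type.
/// `bounds` holds sorted boundaries [start0, end0, start1, end1, ...] where each
/// start is inclusive and each end is exclusive.
pub struct SketchInversionList {
    bounds: Vec<u32>,
}

impl SketchInversionList {
    /// Builds a list from inclusive ranges that are assumed to be sorted and
    /// non-overlapping. Ranges that touch each other are merged.
    pub fn from_sorted_ranges(ranges: &[(u32, u32)]) -> Self {
        let mut bounds: Vec<u32> = Vec::with_capacity(ranges.len() * 2);
        for &(start, end_inclusive) in ranges {
            // Convert to an exclusive end (assumes end_inclusive <= 0x10FFFF).
            let end = end_inclusive + 1;
            if bounds.last() == Some(&start) {
                // Touches the previous range: extend it instead of adding a pair.
                *bounds.last_mut().unwrap() = end;
            } else {
                bounds.push(start);
                bounds.push(end);
            }
        }
        Self { bounds }
    }

    /// Returns true if `cp` lies inside one of the stored ranges.
    pub fn contains_u32(&self, cp: u32) -> bool {
        // Count the boundaries <= cp; an odd count means we are inside a range.
        let idx = self.bounds.partition_point(|&b| b <= cp);
        idx % 2 == 1
    }

    /// Exposes the raw boundary list.
    pub fn boundaries(&self) -> &[u32] {
        &self.bounds
    }
}
```

With this sketch, the Latin-1 range `(0x0000, 0x00FF)` from the `tui.rs` example is stored as just two boundaries, which is the property the new name makes explicit.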
10 changes: 5 additions & 5 deletions components/properties/src/script.rs
@@ -14,7 +14,7 @@ use core::iter::FromIterator;
use core::ops::RangeInclusive;
use icu_codepointtrie::{CodePointTrie, TrieValue};
use icu_provider::prelude::*;
use icu_uniset::CodePointSet;
use icu_uniset::CodePointInversionList;
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
use zerovec::{ule::AsULE, VarZeroVec, ZeroSlice};
@@ -538,7 +538,7 @@ impl<'data> ScriptWithExtensions<'data> {
.map(|cpm_range| RangeInclusive::new(*cpm_range.range.start(), *cpm_range.range.end()))
}

/// Returns a [`CodePointSet`] for the given [`Script`] which represents all
/// Returns a [`CodePointInversionList`] for the given [`Script`] which represents all
/// code points for which `has_script` will return true.
///
/// # Examples
@@ -569,8 +569,8 @@ impl<'data> ScriptWithExtensions<'data> {
/// assert!(syriac.contains_u32(0x1DFA)); // COMBINING DOT BELOW LEFT
/// assert!(!syriac.contains_u32(0x1DFB)); // COMBINING DELETION MARK
/// ```
pub fn get_script_extensions_set(&self, script: Script) -> CodePointSet {
CodePointSet::from_iter(self.get_script_extensions_ranges(script))
pub fn get_script_extensions_set(&self, script: Script) -> CodePointInversionList {
CodePointInversionList::from_iter(self.get_script_extensions_ranges(script))
}
}

@@ -631,7 +631,7 @@ pub type ScriptWithExtensionsResult =
/// assert!(swe.has_script(0x0650, Script::Syriac));
/// assert!(!swe.has_script(0x0650, Script::Thaana));
///
/// // get a `CodePointSet` for when `Script` value is contained in `Script_Extensions` value
/// // get a `CodePointInversionList` for when `Script` value is contained in `Script_Extensions` value
/// let syriac = swe.get_script_extensions_set(Script::Syriac);
/// assert!(syriac.contains_u32(0x0650)); // ARABIC KASRA
/// assert!(!syriac.contains_u32(0x0660)); // ARABIC-INDIC DIGIT ZERO
43 changes: 23 additions & 20 deletions components/properties/src/sets.rs
@@ -2,15 +2,15 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

//! The functions in this module return a [`CodePointSet`] containing
//! The functions in this module return a [`CodePointSetData`] containing
//! the set of characters with a particular Unicode property.
//!
//! The descriptions of most properties are taken from [`TR44`], the documentation for the
//! Unicode Character Database. Some properties are instead defined in [`TR18`], the
//! documentation for Unicode regular expressions. In particular, Annex C of this document
//! defines properties for POSIX compatibility.
//!
//! [`CodePointSet`]: icu_uniset::CodePointSet
//! [`CodePointSetData`]: crate::sets::CodePointSetData
//! [`TR44`]: https://www.unicode.org/reports/tr44
//! [`TR18`]: https://www.unicode.org/reports/tr18

@@ -19,7 +19,7 @@ use crate::provider::*;
use crate::*;
use core::iter::FromIterator;
use icu_provider::prelude::*;
use icu_uniset::CodePointSet;
use icu_uniset::CodePointInversionList;

/// A wrapper around code point set data, returned by property getters for
/// unicode sets.
@@ -116,24 +116,24 @@ impl CodePointSetData {
}
}

/// Construct a new one from an owned [`CodePointSet`]
pub fn from_code_point_set(set: CodePointSet<'static>) -> Self {
let set = PropertyCodePointSetV1::from_code_point_set(set);
/// Construct a new one from an owned [`CodePointInversionList`]
pub fn from_code_point_inversion_list(set: CodePointInversionList<'static>) -> Self {
let set = PropertyCodePointSetV1::from_code_point_inversion_list(set);
CodePointSetData::from_data(DataPayload::<ErasedSetlikeMarker>::from_owned(set))
}

/// Convert this type to a [`CodePointSet`], borrowing if possible,
/// otherwise allocating a new [`CodePointSet`].
/// Convert this type to a [`CodePointInversionList`], borrowing if possible,
/// otherwise allocating a new [`CodePointInversionList`].
///
/// The data backing this is extensible and supports multiple implementations.
/// Currently it is always [`CodePointSet`]; however in the future more backends may be
/// Currently it is always [`CodePointInversionList`]; however in the future more backends may be
/// added, and users may select which at data generation time.
///
/// If using this function it is preferable to stick to [`CodePointSet`] representations
/// If using this function it is preferable to stick to [`CodePointInversionList`] representations
/// in the data, however exceptions can be made if the performance hit is considered to
/// be okay.
pub fn to_code_point_set(&self) -> CodePointSet<'_> {
self.data.get().to_code_point_set()
pub fn to_code_point_inversion_list(&self) -> CodePointInversionList<'_> {
self.data.get().to_code_point_inversion_list()
}
}
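
The "borrowing if possible, otherwise allocating" contract described in the doc comment above can be pictured with `std::borrow::Cow`. This is only a sketch under stated assumptions: `SketchSetData` and its `InclusiveRanges` variant are hypothetical (the doc comment says the backing data is currently always an inversion list), and the real method returns a `CodePointInversionList<'_>` rather than a `Cow`.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a set payload with more than one backend.
/// NOT an ICU4X type; the second variant is invented for illustration.
pub enum SketchSetData {
    /// Already stored as inversion-list boundaries: can be borrowed directly.
    InversionList(Vec<u32>),
    /// Hypothetical alternative backend (inclusive ranges): needs conversion.
    InclusiveRanges(Vec<(u32, u32)>),
}

impl SketchSetData {
    /// Returns inversion-list boundaries, borrowing when the data is already
    /// in that form and allocating a new vector otherwise.
    pub fn to_inversion_list(&self) -> Cow<'_, [u32]> {
        match self {
            Self::InversionList(bounds) => Cow::Borrowed(bounds.as_slice()),
            Self::InclusiveRanges(ranges) => {
                let mut bounds = Vec::with_capacity(ranges.len() * 2);
                for &(start, end_inclusive) in ranges {
                    bounds.push(start);
                    // Inversion lists use exclusive ends.
                    bounds.push(end_inclusive + 1);
                }
                Cow::Owned(bounds)
            }
        }
    }
}
```

The design point the doc comment makes is visible here: callers pay for a conversion only when the data was generated with a backend other than the inversion list.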

@@ -1705,8 +1705,8 @@ pub fn get_for_general_category_group(
.iter_ranges()
.filter(|cpm_range| (1 << cpm_range.value as u32) & enum_val.0 != 0)
.map(|cpm_range| cpm_range.range);
let set = CodePointSet::from_iter(matching_gc_ranges);
Ok(CodePointSetData::from_code_point_set(set))
let set = CodePointInversionList::from_iter(matching_gc_ranges);
Ok(CodePointSetData::from_code_point_inversion_list(set))
}
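
The filter in `get_for_general_category_group` treats the group as a bitmask with one bit per category value and keeps the ranges whose bit is set. A standalone sketch of that technique follows. `SketchCategory`, `SketchGroup`, and `ranges_in_group` are hypothetical names, and the discriminants are illustrative rather than the ICU4X values.

```rust
use std::ops::RangeInclusive;

/// Hypothetical miniature of a general-category enum. The discriminants are
/// illustrative and are NOT the ICU4X values.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SketchCategory {
    UppercaseLetter = 1,
    LowercaseLetter = 2,
    DecimalNumber = 9,
}

/// A group is a bitmask with one bit per category discriminant.
#[derive(Clone, Copy)]
pub struct SketchGroup(pub u32);

impl SketchGroup {
    /// Builds a group by OR-ing together the bit of each listed category.
    pub fn of(categories: &[SketchCategory]) -> Self {
        SketchGroup(categories.iter().fold(0, |mask, &c| mask | (1 << c as u32)))
    }
}

/// Keeps the ranges whose category bit is set in `group`, mirroring the
/// `(1 << value) & mask != 0` test in the hunk above.
pub fn ranges_in_group(
    ranges: &[(RangeInclusive<u32>, SketchCategory)],
    group: SketchGroup,
) -> Vec<RangeInclusive<u32>> {
    ranges
        .iter()
        .filter(|(_, value)| (1u32 << *value as u32) & group.0 != 0)
        .map(|(range, _)| range.clone())
        .collect()
}
```

In the real function the surviving ranges are then collected into a `CodePointInversionList` via `from_iter`, as the hunk shows.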

#[cfg(test)]
@@ -1748,27 +1748,30 @@ mod tests {
fn test_gc_groupings() {
use icu::properties::{maps, sets};
use icu::properties::{GeneralCategory, GeneralCategoryGroup};
use icu_uniset::CodePointSetBuilder;
use icu_uniset::CodePointInversionListBuilder;

let provider = icu_testdata::get_provider();

let test_group = |category: GeneralCategoryGroup, subcategories: &[GeneralCategory]| {
let category_set = sets::get_for_general_category_group(&provider, category)
.expect("The data should be valid");
let category_set = category_set.to_code_point_set();
let category_set = category_set.to_code_point_inversion_list();

let data = maps::get_general_category(&provider).expect("The data should be valid");
let gc = data.as_borrowed();

let mut builder = CodePointSetBuilder::new();
let mut builder = CodePointInversionListBuilder::new();
for subcategory in subcategories {
builder.add_set(&gc.get_set_for_value(*subcategory).to_code_point_set());
builder.add_set(
&gc.get_set_for_value(*subcategory)
.to_code_point_inversion_list(),
);
}
let combined_set = builder.build();
println!("{:?} {:?}", category, subcategories);
assert_eq!(
category_set.get_inversion_list(),
combined_set.get_inversion_list()
category_set.get_inversion_list_vec(),
combined_set.get_inversion_list_vec()
);
};

6 changes: 3 additions & 3 deletions experimental/casemapping/src/internals.rs
@@ -9,7 +9,7 @@ use icu_codepointtrie::CodePointTrieHeader;
use icu_codepointtrie::{CodePointTrie, TrieValue};
use icu_locid::Locale;
use icu_provider::{yoke, zerofrom};
use icu_uniset::CodePointSetBuilder;
use icu_uniset::CodePointInversionListBuilder;
#[cfg(feature = "datagen")]
use std::collections::HashMap;
use zerovec::ule::{AsULE, RawBytesULE};
@@ -1036,12 +1036,12 @@ pub trait ClosureSet {
fn add_string(&mut self, string: &str);
}

impl ClosureSet for CodePointSetBuilder {
impl ClosureSet for CodePointInversionListBuilder {
fn add_char(&mut self, c: char) {
self.add_char(c)
}

// The current version of CodePointSet doesn't include strings.
// The current version of CodePointInversionList doesn't include strings.
// Trying to add a string is a no-op that will be optimized away.
#[inline]
fn add_string(&mut self, _string: &str) {}
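
The no-op `add_string` works because callers are generic over the trait, so a sink that cannot hold strings simply discards them. A std-only sketch of that pattern follows. The trait is redeclared here with the same shape as in the hunk, while `CharsOnly`, `CharsAndStrings`, and `add_sharp_s_closure` are hypothetical and exist only for illustration.

```rust
use std::collections::BTreeSet;

/// Same shape as the `ClosureSet` trait in the hunk above, redeclared so the
/// sketch is self-contained.
pub trait ClosureSet {
    fn add_char(&mut self, c: char);
    fn add_string(&mut self, string: &str);
}

/// Hypothetical sink that can store code points only.
#[derive(Default)]
pub struct CharsOnly(pub BTreeSet<char>);

impl ClosureSet for CharsOnly {
    fn add_char(&mut self, c: char) {
        self.0.insert(c);
    }

    // Strings are not representable here, so adding one is a no-op.
    #[inline]
    fn add_string(&mut self, _string: &str) {}
}

/// Hypothetical sink that keeps both code points and strings.
#[derive(Default)]
pub struct CharsAndStrings {
    pub chars: BTreeSet<char>,
    pub strings: BTreeSet<String>,
}

impl ClosureSet for CharsAndStrings {
    fn add_char(&mut self, c: char) {
        self.chars.insert(c);
    }

    fn add_string(&mut self, string: &str) {
        self.strings.insert(string.to_owned());
    }
}

/// Illustrative caller: feeds a few case-related entries for 'ß' into any sink.
pub fn add_sharp_s_closure<S: ClosureSet>(set: &mut S) {
    set.add_char('ß');
    set.add_char('ẞ');
    set.add_string("ss");
}
```

Calling `add_sharp_s_closure` on a `CharsOnly` records the two code points and silently drops `"ss"`, whereas `CharsAndStrings` keeps all three entries.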