Commit 06f4b49

Cleanup minor spelling in README docs
1 parent 0e13188 commit 06f4b49

9 files changed, +26 -26 lines changed

README.md

+10 -10
@@ -29,7 +29,7 @@ ivory_repository/
 │ └── stores
 │ ├── feature_store1
 │ └── feature_store2
-└── fact_sets
+└── factsets
 ├── fact_set1
 └── fact_set2
 ```
@@ -74,8 +74,8 @@ my_fact_set/
 
 ```
 
-In this fact set, facts are partioned across two namespaces: `widgets` and `demo`. The *widget* facts
-are spread accross three dates, while *demographic* facts are constrained to one. Note also that
+In this fact set, facts are partitioned across two namespaces: `widgets` and `demo`. The *widget* facts
+are spread across three dates, while *demographic* facts are constrained to one. Note also that
 a given namespace-partition can contain multiple EAVT files.
 
 EAVT files are simply pipe-delimited text files with one EAVT record per line. For example, a line in
@@ -112,10 +112,10 @@ The ordering is important as it allows facts to be overriden. When a feature sto
 with the same entity, attribute and time are identified, the value from the fact contained in the most recent fact
 set will be used, where most recent means listed higher in the feature store file.
 
-Because a feature store can be speified by just referencing fact sets, Ivory can support poor-man versioning giving
+Because a feature store can be specified by just referencing fact sets, Ivory can support poor-man versioning giving
 rise to use cases such as:
 
-* overrding buggy values with corrected ones;
+* overriding buggy values with corrected ones;
 * combining *production* features with *ad-hoc* features.
 
 
@@ -138,7 +138,7 @@ feature identifier the following metadata:
 
 * A human-readable *description*.
 
-In Ivory, feature metadata is seperated from the features store (facts) in its own set of text files known
+In Ivory, feature metadata is separated from the features store (facts) in its own set of text files known
 as *feature dictionaries*. Dictionary text files are also pipe-delimited and of the following form:
 
 ```
@@ -165,7 +165,7 @@ check that the encoding types specified for features in the dictionary are consi
 > ivory validate --feature-store feature_store.txt --dictionary feature_dictionary.txt
 ```
 
-We can also use Ivory to generate statistics for the values of specific features accross a feature store using the
+We can also use Ivory to generate statistics for the values of specific features across a feature store using the
 `inspect` command. This will compute statistics such as density, ranges (for numerical features), factors (for
 categorical features), historgrams, means, etc. Inspections can filter both the features of interest as well which
 facts to considered by time:
@@ -181,7 +181,7 @@ Querying
 Ivory supports two types of queries: *snapshots* and *chords*.
 
 
-A `snaphot` query is used to extract the feature values for entities at a certain point in time. Snapshoting can filter
+A `snapshot` query is used to extract the feature values for entities at a certain point in time. Snapshotting can filter
 the set of features and/or entities considered. By default the output is in *EAVT* format, but can be output in
 row-oriented form (i.e. column per feature) using the `--pivot` option. When a `snapshot` query is performed, the most
 recent feature value for a given entity-attribute, with respect to the snapshot time, will be returned in the output:
@@ -233,7 +233,7 @@ This outputs two files:
 The format of the feature flag file is:
 
 ```
-namespace|name|sparcity|fequency
+namespace|name|sparcity|frequency
 ```
 
 An example is:
@@ -258,4 +258,4 @@ Versioning
 The format of fact sets are versioned. This allows the format of fact sets to be modified in the future but still maintain feature stores that
 reference fact sets persisted in an older format.
 
-A fact set format version is specifed by a `.version` file that is stored at the root directory of a given fact set.
+A fact set format version is specified by a `.version` file that is stored at the root directory of a given fact set.

doc/dates.md

+4 -4
@@ -74,7 +74,7 @@ Ivory supports a sub-set of ISO 8601 timestamps.
 
 `yyyy-MM-dd` -
 
-Date with a day granularitry in the local time zone. Example: `2012-01-15`,
+Date with a day granularity in the local time zone. Example: `2012-01-15`,
 `2014-12-31`.
 
 #### Local Date And Time
@@ -267,7 +267,7 @@ E4|A1|3|2010-03-03T14:30:00+11:00
 
 ##### `Ingestion Solution 2`
 Perform individual ingestions for each timezone, using the
-"Local date / time" format, but specificy an overriding
+"Local date / time" format, but specify an overriding
 ingestion timezone for the whole dataset. The ingestion
 will then translate each row into the repository timezone.
 
@@ -308,7 +308,7 @@ To address this we could do one of two things:
 - annotate DST overlapped hours with an extra bit in the time field; or
 - offset time by an additional interval to handle the gained time.
 
-However, both of these things require non-standard treament of "second
+However, both of these things require non-standard treatment of "second
 of day" and will require code changes to ivory to handle.
 
 To be clear, at this point ivory handles "second of day" based only on
@@ -326,7 +326,7 @@ There are number of key pieces of this which are not complete:
 there is no "standard library" for dealing with dates in a
 consistent way within ivory.
 
-- The ISO 8601 variants are not complete and not uniformally
+- The ISO 8601 variants are not complete and not uniformly
 supported.
 
 - Ingestion incorrectly forces a timezone to be specified for

doc/quality.md

+1 -1
@@ -6,4 +6,4 @@ Current Quality Hitlist
 * Remove duplication, there is too much conceptual duplication in storage
 * Remove "hole" in the middle anti-pattern, composition first.
 * Configuration goes in as arguments. Remove mix of "configuration" styles with implicits and readers.
-* Consist effect handling, unsafePerformIO's go at the top.
+* Consistent effect handling, unsafePerformIO's go at the top.

doc/remotes.md

+4 -4
@@ -34,7 +34,7 @@ repos for individuals to cook up their own features:
 
 * improve performance of snapshots as of "now"
 * steps:
-1. generate a snaphsot using the traditional approach
+1. generate a snapshot using the traditional approach
 2. store the snapshot as a fact set in another repo, the "snapshot" repo
 3. in the "snapshot repo" create a feature store that includes the snapshot fact set and all fact sets
 added since the first snapshot
@@ -61,7 +61,7 @@ Generalising the idea of versioning
 One of the core ideas of Ivory is that it is an immutable *database* of facts. Immutable views or *versions* of
 the database are constructed by combining a specific feature store and dictionary together. All queries, then,
 should be with respect to a particular *version*. Whilst the design of ivory allows for the notion of versions,
-it is currently not a first class citizen. Furthermore, it were to be made a first class citizen, the mechansim
+it is currently not a first class citizen. Furthermore, it were to be made a first class citizen, the mechanism
 for dealing with remote repos may fall out more naturally.
 
 There are a number of *objects* in our data model that should be versioned:
@@ -73,9 +73,9 @@ There are a number of *objects* in our data model that should be versioned:
 
 It may be worth borrowing ideas from Git on how this is designed. For example:
 
-* Version identifers are hashes of their content. For fact sets we could use CRCs associated with the data.
+* Version identifiers are hashes of their content. For fact sets we could use CRCs associated with the data.
 * Have human-readable references to identifiers, i.e. *branches* and *tags*.
 * The concept of branches is interesting in that it suggests a lineage between different versions. Given the
-changes to dictionaries and feature store are typcially incremental in nature, the idea of a version being
+changes to dictionaries and feature store are typically incremental in nature, the idea of a version being
 a delta applied to a *parent* version may be worth while.
 * This all, of course, plays in to the *remote* concept. That is, remote fact sets can be referenced by version.

ivory-ingest/src/main/scala/com/ambiata/ivory/ingest/mr.scala

+3 -3
@@ -48,7 +48,7 @@ object IngestJob {
 job.setMapOutputKeyClass(classOf[LongWritable]);
 job.setMapOutputValueClass(classOf[BytesWritable]);
 
-/* partiton & sort */
+/* partition & sort */
 job.setPartitionerClass(classOf[IngestPartitioner])
 job.setGroupingComparatorClass(classOf[LongWritable.Comparator])
 job.setSortComparatorClass(classOf[LongWritable.Comparator])
@@ -122,8 +122,8 @@ object IngestJob {
 /**
 * Partitioner for ivory-ingest.
 *
-* Keys are partitioned by the extrnalized feature id (held in the top 32 bits of the key)
-* into pre-determined buckets. We use the predtermined buckets as upfront knowledge of
+* Keys are partitioned by the externalized feature id (held in the top 32 bits of the key)
+* into predetermined buckets. We use the predetermined buckets as upfront knowledge of
 * the input size is used to reduce skew on input data.
 */
 class IngestPartitioner extends Partitioner[LongWritable, BytesWritable] with Configurable {

ivory-mr/src/main/scala/com/ambiata/ivory/mr/DistCache.scala

+1 -1
@@ -14,7 +14,7 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.mapreduce.Job
 
 /**
-* This is module for managing passing data-types via tha distributed cache. This is
+* This is module for managing passing data-types via the distributed cache. This is
 * _unsafe_ at best, and should be used with extreme caution. The only valid reason to
 * use it is when writing raw map reduce jobs.
 */

ivory-mr/src/main/scala/com/ambiata/ivory/mr/TextCache.scala

+1 -1
@@ -14,7 +14,7 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.mapreduce.Job
 
 /**
-* This is module for managing passing text data-types via tha distributed cache. This is
+* This is module for managing passing text data-types via the distributed cache. This is
 * _unsafe_ at best, and should be used with extreme caution. The only valid reason to
 * use it is when writing raw map reduce jobs.
 */

ivory-mr/src/main/scala/com/ambiata/ivory/mr/ThriftCache.scala

+1 -1
@@ -16,7 +16,7 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.mapreduce.Job
 
 /**
-* This is module for managing passing thrift data-types via tha distributed cache. This is
+* This is module for managing passing thrift data-types via the distributed cache. This is
 * _unsafe_ at best, and should be used with extreme caution. The only valid reason to
 * use it is when writing raw map reduce jobs.
 */

ivory-scoobi/src/main/scala/com/ambiata/ivory/scoobi/Groupings.scala

+1 -1
@@ -10,7 +10,7 @@ object Groupings {
 
 /**
 * This grouping will take a map of partitions to index and send each key to the reducer associated with the index.
-* If the key is not found it will use the String Grouping to determin which reducer to go to.
+* If the key is not found it will use the String Grouping to determine which reducer to go to.
 * The index is mod'd with the total number of reducers so it will wrap if its greater.
 */
 def partitionGrouping(partitions: Map[String, Int]) = new Grouping[String] {
