Clean up minor spelling in README docs

charleso · charleso · commit 06f4b49879f9 · 2014-07-02T16:16:39.000+10:00
diff --git a/README.md b/README.md
@@ -29,7 +29,7 @@ ivory_repository/
 │   └── stores
 │       ├── feature_store1
 │       └── feature_store2
-└── fact_sets
+└── factsets
     ├── fact_set1
     └── fact_set2
 ```
@@ -74,8 +74,8 @@ my_fact_set/
 
 ```
 
-In this fact set, facts are partioned across two namespaces: `widgets` and `demo`. The *widget* facts
-are spread accross three dates, while *demographic* facts are constrained to one. Note also that
+In this fact set, facts are partitioned across two namespaces: `widgets` and `demo`. The *widget* facts
+are spread across three dates, while *demographic* facts are constrained to one. Note also that
 a given namespace-partition can contain multiple EAVT files.
 
 EAVT files are simply pipe-delimited text files with one EAVT record per line. For example, a line in
@@ -112,10 +112,10 @@ The ordering is important as it allows facts to be overriden. When a feature sto
 with the same entity, attribute and time are identified, the value from the fact contained in the most recent fact
 set will be used, where most recent means listed higher in the feature store file.
 
-Because a feature store can be speified by just referencing fact sets, Ivory can support poor-man versioning giving
+Because a feature store can be specified by just referencing fact sets, Ivory can support poor-man versioning giving
 rise to use cases such as:
 
-* overrding buggy values with corrected ones;
+* overriding buggy values with corrected ones;
 * combining *production* features with *ad-hoc* features.
 
 
@@ -138,7 +138,7 @@ feature identifier the following metadata:
 
 * A human-readable *description*.
 
-In Ivory, feature metadata is seperated from the features store (facts) in its own set of text files known
+In Ivory, feature metadata is separated from the features store (facts) in its own set of text files known
 as *feature dictionaries*. Dictionary text files are also pipe-delimited and of the following form:
 
 ```
@@ -165,7 +165,7 @@ check that the encoding types specified for features in the dictionary are consi
 > ivory validate --feature-store feature_store.txt --dictionary feature_dictionary.txt
 ```
 
-We can also use Ivory to generate statistics for the values of specific features accross a feature store using the
+We can also use Ivory to generate statistics for the values of specific features across a feature store using the
 `inspect` command. This will compute statistics such as density, ranges (for numerical features), factors (for
 categorical features), historgrams, means, etc. Inspections can filter both the features of interest as well which
 facts to considered by time:
@@ -181,7 +181,7 @@ Querying
 Ivory supports two types of queries: *snapshots* and *chords*.
 
 
-A `snaphot` query is used to extract the feature values for entities at a certain point in time. Snapshoting can filter
+A `snapshot` query is used to extract the feature values for entities at a certain point in time. Snapshotting can filter
 the set of features and/or entities considered. By default the output is in *EAVT* format, but can be output in
 row-oriented form (i.e. column per feature) using the `--pivot` option. When a  `snapshot` query is performed, the most
 recent feature value for a given entity-attribute, with respect to the snapshot time, will be returned in the output:
@@ -233,7 +233,7 @@ This outputs two files:
 The format of the feature flag file is:
 
 ```
-namespace|name|sparcity|fequency
+namespace|name|sparcity|frequency
 ```
 
 An example is:
@@ -258,4 +258,4 @@ Versioning
 The format of fact sets are versioned. This allows the format of fact sets to be modified in the future but still maintain feature stores that
 reference fact sets persisted in an older format.
 
-A fact set format version is specifed by a `.version` file that is stored at the root directory of a given fact set.
+A fact set format version is specified by a `.version` file that is stored at the root directory of a given fact set.
diff --git a/doc/dates.md b/doc/dates.md
@@ -74,7 +74,7 @@ Ivory supports a sub-set of ISO 8601 timestamps.
 
  `yyyy-MM-dd` -
 
-    Date with a day granularitry in the local time zone. Example: `2012-01-15`,
+    Date with a day granularity in the local time zone. Example: `2012-01-15`,
     `2014-12-31`.
 
 #### Local Date And Time
@@ -267,7 +267,7 @@ E4|A1|3|2010-03-03T14:30:00+11:00
 
 ##### `Ingestion Solution 2`
   Perform individual ingestions for each timezone, using the
-  "Local date / time" format, but specificy an overriding
+  "Local date / time" format, but specify an overriding
   ingestion timezone for the whole dataset. The ingestion
   will then translate each row into the repository timezone.
 
@@ -308,7 +308,7 @@ To address this we could do one of two things:
    - annotate DST overlapped hours with an extra bit in the time field; or
    - offset time by an additional interval to handle the gained time.
 
-However, both of these things require non-standard treament of "second
+However, both of these things require non-standard treatment of "second
 of day" and will require code changes to ivory to handle.
 
 To be clear, at this point ivory handles "second of day" based only on
@@ -326,7 +326,7 @@ There are number of key pieces of this which are not complete:
     there is no "standard library" for dealing with dates in a
     consistent way within ivory.
 
-  - The ISO 8601 variants are not complete and not uniformally
+  - The ISO 8601 variants are not complete and not uniformly
     supported.
 
   - Ingestion incorrectly forces a timezone to be specified for
diff --git a/doc/quality.md b/doc/quality.md
@@ -6,4 +6,4 @@ Current Quality Hitlist
   * Remove duplication, there is too much conceptual duplication in storage
   * Remove "hole" in the middle anti-pattern, composition first.
   * Configuration goes in as arguments. Remove mix of "configuration" styles with implicits and readers.
-  * Consist effect handling, unsafePerformIO's go at the top.
+  * Consistent effect handling, unsafePerformIO's go at the top.
diff --git a/doc/remotes.md b/doc/remotes.md
@@ -34,7 +34,7 @@ repos for individuals to cook up their own features:
 
 * improve performance of snapshots as of "now"
 * steps:
-  1. generate a snaphsot using the traditional approach
+  1. generate a snapshot using the traditional approach
   2. store the snapshot as a fact set in another repo, the "snapshot" repo
   3. in the "snapshot repo" create a feature store that includes the snapshot fact set and all fact sets
   added since the first snapshot
@@ -61,7 +61,7 @@ Generalising the idea of versioning
 One of the core ideas of Ivory is that it is an immutable *database* of facts. Immutable views or *versions* of
 the database are constructed by combining a specific feature store and dictionary together. All queries, then,
 should be with respect to a particular *version*. Whilst the design of ivory allows for the notion of versions,
-it is currently not a first class citizen. Furthermore, it were to be made a first class citizen, the mechansim
+it is currently not a first class citizen. Furthermore, it were to be made a first class citizen, the mechanism
 for dealing with remote repos may fall out more naturally.
 
 There are a number of *objects* in our data model that should be versioned:
@@ -73,9 +73,9 @@ There are a number of *objects* in our data model that should be versioned:
 
 It may be worth borrowing ideas from Git on how this is designed. For example:
 
-* Version identifers are hashes of their content. For fact sets we could use CRCs associated with the data.
+* Version identifiers are hashes of their content. For fact sets we could use CRCs associated with the data.
 * Have human-readable references to identifiers, i.e. *branches* and *tags*.
 * The concept of branches is interesting in that it suggests a lineage between different versions. Given the
-changes to dictionaries and feature store are typcially incremental in nature, the idea of a version being
+changes to dictionaries and feature store are typically incremental in nature, the idea of a version being
 a delta applied to a *parent* version may be worth while.
 * This all, of course, plays in to the *remote* concept. That is, remote fact sets can be referenced by version.
diff --git a/ivory-ingest/src/main/scala/com/ambiata/ivory/ingest/mr.scala b/ivory-ingest/src/main/scala/com/ambiata/ivory/ingest/mr.scala
@@ -48,7 +48,7 @@ object IngestJob {
     job.setMapOutputKeyClass(classOf[LongWritable]);
     job.setMapOutputValueClass(classOf[BytesWritable]);
 
-    /* partiton & sort */
+    /* partition & sort */
     job.setPartitionerClass(classOf[IngestPartitioner])
     job.setGroupingComparatorClass(classOf[LongWritable.Comparator])
     job.setSortComparatorClass(classOf[LongWritable.Comparator])
@@ -122,8 +122,8 @@ object IngestJob {
 /**
  * Partitioner for ivory-ingest.
  *
- * Keys are partitioned by the extrnalized feature id (held in the top 32 bits of the key)
- * into pre-determined buckets. We use the predtermined buckets as upfront knowledge of
+ * Keys are partitioned by the externalized feature id (held in the top 32 bits of the key)
+ * into predetermined buckets. We use the predetermined buckets as upfront knowledge of
  * the input size is used to reduce skew on input data.
  */
 class IngestPartitioner extends Partitioner[LongWritable, BytesWritable] with Configurable {
diff --git a/ivory-mr/src/main/scala/com/ambiata/ivory/mr/DistCache.scala b/ivory-mr/src/main/scala/com/ambiata/ivory/mr/DistCache.scala
@@ -14,7 +14,7 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.mapreduce.Job
 
 /**
- * This is module for managing passing data-types via tha distributed cache. This is
+ * This is module for managing passing data-types via the distributed cache. This is
  * _unsafe_ at best, and should be used with extreme caution. The only valid reason to
  * use it is when writing raw map reduce jobs.
  */
diff --git a/ivory-mr/src/main/scala/com/ambiata/ivory/mr/TextCache.scala b/ivory-mr/src/main/scala/com/ambiata/ivory/mr/TextCache.scala
@@ -14,7 +14,7 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.mapreduce.Job
 
 /**
- * This is module for managing passing text data-types via tha distributed cache. This is
+ * This is module for managing passing text data-types via the distributed cache. This is
  * _unsafe_ at best, and should be used with extreme caution. The only valid reason to
  * use it is when writing raw map reduce jobs.
  */
diff --git a/ivory-mr/src/main/scala/com/ambiata/ivory/mr/ThriftCache.scala b/ivory-mr/src/main/scala/com/ambiata/ivory/mr/ThriftCache.scala
@@ -16,7 +16,7 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.mapreduce.Job
 
 /**
- * This is module for managing passing thrift data-types via tha distributed cache. This is
+ * This is module for managing passing thrift data-types via the distributed cache. This is
  * _unsafe_ at best, and should be used with extreme caution. The only valid reason to
  * use it is when writing raw map reduce jobs.
  */
diff --git a/ivory-scoobi/src/main/scala/com/ambiata/ivory/scoobi/Groupings.scala b/ivory-scoobi/src/main/scala/com/ambiata/ivory/scoobi/Groupings.scala
@@ -10,7 +10,7 @@ object Groupings {
 
   /**
    * This grouping will take a map of partitions to index and send each key to the reducer associated with the index.
-   * If the key is not found it will use the String Grouping to determin which reducer to go to.
+   * If the key is not found it will use the String Grouping to determine which reducer to go to.
    * The index is mod'd with the total number of reducers so it will wrap if its greater.
    */
   def partitionGrouping(partitions: Map[String, Int]) = new Grouping[String] {