As @WilliamHYZhang and I discussed, we need shuffling for data loaders. I would like to make a proposal to facilitate this.
We need to store an array of `indices` on a data loader. The `Element` type of this array will be set through a type alias by the implementor of `S5TFDataLoader`, which itself will need an `associatedtype Index`. An implementation of `S5TFDataLoader` needs to be able to load an element at this `Index`, so we add a `getElement(at:)` requirement to the protocol, where `DataType` is the associated data type discussed in s5tf-team/datasets#14 (comment).
`indices` would be, for example, an array of row numbers in `CSVDataLoader`, an array of IDs in MNIST and comparable datasets, and an array of paths in Imagenette.
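As a rough sketch of what these requirements could look like (the exact signature of `getElement(at:)` is an assumption, and `RowLoader` is a hypothetical toy type, not part of the library):

```swift
// Hypothetical sketch of the proposed protocol requirements.
// Only `Index`, `DataType`, `indices`, and `getElement(at:)` come from the
// proposal; the exact signatures are assumptions.
protocol S5TFDataLoader {
    associatedtype Index
    associatedtype DataType

    /// One entry per loadable element (row numbers, IDs, paths, ...).
    var indices: [Index] { get set }

    /// Loads the element stored at `index`.
    func getElement(at index: Index) -> DataType
}

// Toy in-memory loader (illustration only) whose indices are row numbers.
struct RowLoader: S5TFDataLoader {
    typealias Index = Int
    typealias DataType = String

    let rows: [String]
    var indices: [Int]

    init(rows: [String]) {
        self.rows = rows
        self.indices = Array(rows.indices)
    }

    func getElement(at index: Int) -> String {
        return rows[index]
    }
}

let loader = RowLoader(rows: ["a,1", "b,2", "c,3"])
let second = loader.getElement(at: loader.indices[1])
```

A path-based loader such as one for Imagenette would presumably set `Index` to a path type instead and read the file inside `getElement(at:)`.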
We should add `shuffled()` in a protocol extension. It relies on `copy()`, which should be another requirement of `S5TFDataLoader`. This will allow for more default implementations like the following:
    func batched(_ batchSize: Int) -> Self {
        guard batchSize >= 1 else {
            fatalError("Batch size must be greater than or equal to 1")
        }
        guard batchSize <= count else {
            fatalError("Batch size must be less than or equal to the number of items.")
        }
        // Work on a copy so the original loader keeps its batch size.
        var mutableSelf = self.copy()
        mutableSelf.batchSize = batchSize
        return mutableSelf
    }
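For the shuffling this proposal is about, a default built on `copy()` might look roughly like the following. This is only a sketch: the protocol is cut down to the two requirements the example needs, and `IDLoader` is a hypothetical stand-in for a real dataset.

```swift
// Reduced, hypothetical version of the protocol: just what shuffling needs.
protocol S5TFDataLoader {
    associatedtype Index
    var indices: [Index] { get set }
    func copy() -> Self
}

extension S5TFDataLoader {
    /// Returns a copy whose indices are in random order.
    /// The loader it is called on is left unchanged.
    func shuffled() -> Self {
        var mutableSelf = self.copy()
        mutableSelf.indices.shuffle()
        return mutableSelf
    }
}

// Toy loader (illustration only) whose indices are integer IDs.
struct IDLoader: S5TFDataLoader {
    var indices: [Int]

    // For a value type, returning `self` already yields an independent copy.
    func copy() -> IDLoader {
        return self
    }
}

let loader = IDLoader(indices: Array(0..<10))
let shuffledLoader = loader.shuffled()
```

Because only `indices` is reordered, the underlying data never has to be moved or reloaded to change the iteration order.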
`getElement(at:)` should be used by `next()`. If we also write a function `createBatch(from:)` that takes an array of indices, we can write `next()` in advance too. Implementing a data loader would then only involve writing a few basic functions, because everything else will be handled by `S5TFDataLoader`. The implementor should not have to duplicate the `next()` functionality for every dataset, because it will always be roughly the same code.
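A shared `next()` along these lines could be sketched as below. Several details are assumptions made for illustration: the `position` cursor, a batch being a plain array of `DataType`, and the `SquareLoader` toy type are not part of the proposal.

```swift
// Hypothetical sketch: `position` and the array-shaped batch are assumptions.
protocol S5TFDataLoader {
    associatedtype Index
    associatedtype DataType

    var indices: [Index] { get }
    var batchSize: Int { get }
    var position: Int { get set }   // cursor into `indices`

    func getElement(at index: Index) -> DataType
}

extension S5TFDataLoader {
    var count: Int { return indices.count }

    /// Loads every element named by `indices` into one batch.
    func createBatch(from indices: [Index]) -> [DataType] {
        return indices.map { getElement(at: $0) }
    }

    /// Returns the next batch, or nil once all indices have been consumed.
    /// The last batch may be smaller than `batchSize`.
    mutating func next() -> [DataType]? {
        guard position < count else { return nil }
        let end = min(position + batchSize, count)
        let batch = createBatch(from: Array(indices[position..<end]))
        position = end
        return batch
    }
}

// Toy loader (illustration only): the "data" for index i is i * i.
struct SquareLoader: S5TFDataLoader {
    var indices: [Int]
    var batchSize: Int
    var position: Int

    func getElement(at index: Int) -> Int {
        return index * index
    }
}

var loader = SquareLoader(indices: [1, 2, 3, 4, 5], batchSize: 2, position: 0)
let first = loader.next()
let second = loader.next()
let third = loader.next()
let done = loader.next()
```

With this split, a concrete dataset only supplies `getElement(at:)` and its stored properties; batching and iteration come from the extension.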
`indices` also allows for

    var count: Int {
        return indices.count
    }

This will save a lot of work in future implementations of datasets.
I have been experimenting with this on a local branch. We might need to make the current implementation of `S5TFDataLoader` a struct, rename it (I have been using `S5TFDataIterator`), and make its sole responsibility providing access to the data in data loaders.