[QUESTION] Unexpected Prediction with SGDClassifier #652
Comments
Thanks for your question. Yes, this can happen: ML models don't always get things right. Moreover, getting reproducible results is not easy, because you have to eliminate every source of randomness. The biggest source is the random number generator (RNG). Scikit-Learn uses NumPy's RNG, and you can make it produce the same results on every run by setting the random seed.

Lastly, libraries evolve and algorithms are sometimes tweaked, so results may not be exactly identical to what they were in the past. To get the same results as I did, you may have to run the exact same library versions on the same platform. That's tricky on Colab, since they keep updating the libraries: you would have to uninstall Scikit-Learn and reinstall an old version, which I don't recommend.

In short, ensuring perfect reproducibility can be very difficult, and in general I would argue that it's not worth the effort. It's more important to ensure that your model has approximately the same performance on the validation set. If my model had, say, 90.13% accuracy and yours has 90.12%, it's not a problem. The results may vary for specific instances, but overall the performance is roughly the same.

I hope this helps!
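As a rough illustration of the seeding point above, here is a minimal sketch. The exact call the answer referred to is not shown in this thread, so the sketch assumes the `random_state` parameter of `SGDClassifier` as one way to fix the seed. The data is a synthetic stand-in from `make_classification`, not the dataset from the book, and `fit_and_predict` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the book's dataset).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

def fit_and_predict(seed):
    """Train an SGDClassifier with a fixed seed and predict on the validation set."""
    # random_state seeds the shuffling of the training data between epochs.
    clf = SGDClassifier(random_state=seed)
    clf.fit(X_train, y_train)
    return clf.predict(X_val)

preds_a = fit_and_predict(42)
preds_b = fit_and_predict(42)  # same seed, same library version, same data
preds_c = fit_and_predict(7)   # different seed

print("same seed gives identical predictions:", np.array_equal(preds_a, preds_b))
print("instances where the two seeds disagree:", int(np.sum(preds_a != preds_c)))
print("validation accuracy, seed 42:", np.mean(preds_a == y_val))
print("validation accuracy, seed 7:", np.mean(preds_c == y_val))
```

With the same seed, library version, and platform, the two runs should match exactly. With a different seed, individual predictions may flip, so the number worth comparing is the overall validation accuracy rather than the prediction for a single instance such as `X[0]`.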
Thank you for the explanation! Now it makes a lot more sense. I was initially confused because, at the beginning of the code, the same SGDClassifier algorithm was used, and it was able to correctly predict that the value 5 in X[0] was True. I hadn't considered how much randomness could influence the results. Now I see that even with the same algorithm, factors like initialization and execution environment can lead to different outcomes. Your explanation really clarified things for me!
I'm glad I could help! 👍
I just ran the SGDClassifier from Chapter 3, but it predicted incorrectly on the first attempt (expected a 5, got a 3). Could this really happen, or is it due to a mistake in my code?