As you have probably already found, saving machine learning models is quite easy, but there is a good way and a bad way to do it.
One common way to save and load models is pickle (a binary protocol that transforms Python objects into a stream of bytes), but that isn’t a very good idea for several reasons.
Pickle has some flaws that make it a poor solution. The most important ones are that it is insecure and that it is tied to the Python language.
The Python documentation for the pickle module comes with a warning:
Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
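To make the warning concrete, here is a minimal and harmless sketch of why unpickling untrusted data is risky. The class name `Payload` is made up for this illustration. Its `__reduce__` method tells pickle which callable to run at load time, and here that callable is the built-in `eval` with a harmless arithmetic expression. A real attacker would pick something far less friendly.

```python
import pickle


class Payload:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object's state.
        # Any importable callable could be placed here.
        return (eval, ("6 * 7",))


# The "attacker" builds the byte stream...
data = pickle.dumps(Payload())

# ...and the "victim" only loads it. The expression is evaluated during loading.
result = pickle.loads(data)
print(result)  # 42, not a Payload instance
```

Nothing in the byte stream looks suspicious to the person who loads it, which is why the only safe rule is to unpickle data you produced or fully trust.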
But this isn’t the only problem. There are other flaws (old pickles look like old code, implicit, over-serializes, __init__ isn’t called, Python only, unreadable, appears to pickle code, and slow), detailed on this blog.
OK, I agree that if you save your model yourself and reuse it later, the danger is quite limited or even non-existent.
But think about what happens if your model is sent to a partner or to other teams who don’t know you or who use other programming languages. They will rightly ask whether they can load the model at all, and whether they might be loading a malicious file.
For me, this is the main reason to avoid saving your data or models with pickle in the future. But as I said, if you are the only one who will reuse the saved model, pickle may be a good solution.
If you use Joblib, keep in mind that it is based on Python’s pickle serialization model, so it inherits some of the flaws explained above.
II. Save your model using JSON
A simple way to save your model is to use JSON. A fitted model can’t be serialized to JSON directly, but we can write a function or class that saves all the parameters and the data used while fitting.
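As a rough sketch of the idea, here is a tiny one-feature linear model written in plain Python. The class `TinyLinearModel` and its `to_json` / `from_json` methods are hypothetical names chosen for this example, not part of any library. The point is only that the fitted parameters are plain numbers, so they can be written to a JSON string and used to rebuild an equivalent model.

```python
import json


class TinyLinearModel:
    """Hypothetical minimal model: y = slope * x + intercept."""

    def __init__(self, slope=0.0, intercept=0.0):
        self.slope = slope
        self.intercept = intercept

    def fit(self, xs, ys):
        # Ordinary least squares for a single feature.
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = cov_xy / var_x
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]

    def to_json(self):
        # Only plain numbers and strings go into the file: readable and language neutral.
        return json.dumps({
            "model": "TinyLinearModel",
            "slope": self.slope,
            "intercept": self.intercept,
        })

    @classmethod
    def from_json(cls, text):
        params = json.loads(text)
        return cls(slope=params["slope"], intercept=params["intercept"])


model = TinyLinearModel().fit([0, 1, 2, 3], [1, 3, 5, 7])
text = model.to_json()
restored = TinyLinearModel.from_json(text)
print(restored.predict([4]))  # [9.0]
```

The training data could be stored the same way by adding the lists to the dictionary before dumping it. Because the file contains data only and no code, a partner can inspect it in a text editor and load it from any language that reads JSON.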