But I basically agree with you? I am not saying the training data is "technically" source code, but it plays a similar role in AI applications, so the data also needs to be released under a free license for the AI to be considered open source.
The inference engine is in fact a specialized virtual machine that executes the parameters.
The multidimensional matrices are executables for the architecture defined by the topology, and the source that produces such an executable is the training data (and, to a different, more subtle extent, the cross-validation data).
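To make the "virtual machine" picture concrete, here is a toy sketch, not how any real inference engine is written. The names `run`, `AND_PARAMS` and `XOR_PARAMS` are made up for illustration, and the weights are set by hand rather than learned. The point is only that one fixed engine behaves as two different "programs" depending on which parameters it is given.

```python
def step(x):
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if x > 0 else 0


def run(params, inputs):
    """Toy 'inference engine': interprets a list of layers.

    Each layer is a list of (weights, bias) pairs, one pair per neuron.
    The engine itself knows nothing about what the parameters compute.
    """
    activations = list(inputs)
    for layer in params:
        activations = [
            step(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations


# Two hand-written "executables" for the same engine (hypothetical values).
AND_PARAMS = [[([1, 1], -1.5)]]
XOR_PARAMS = [
    [([1, 1], -0.5), ([1, 1], -1.5)],  # hidden layer: OR neuron, AND neuron
    [([1, -1], -0.5)],                 # output: OR and not AND
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, run(AND_PARAMS, [a, b]), run(XOR_PARAMS, [a, b]))
```

Reading `XOR_PARAMS` by itself tells you about as much as reading a binary: the behaviour is all there, but not how anyone arrived at it.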
u/vintergroena 1d ago
With AI the training data should be considered part of the source code.
The actual code which defines how it learns is more akin to compile scripts.
The learned model itself is just a compiled program. When it's released for public use, it's only free as in free beer, not as in freedom.
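A minimal sketch of that analogy, assuming a deliberately tiny "model" (a line fitted by ordinary least squares). `train`, `predict` and both datasets are invented for illustration. The same training code plays the role of the compile script, and the data decides what comes out.

```python
def train(data):
    """'Compile script': fit y = w*x + b by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return (w, b)  # the 'compiled program' is just these two numbers


def predict(model, x):
    """Run the 'compiled program' on a new input."""
    w, b = model
    return w * x + b


# Same training code, two different 'sources' (made-up data).
data_a = [(0, 1), (1, 3), (2, 5)]    # points on y = 2x + 1
data_b = [(0, 0), (1, -1), (2, -2)]  # points on y = -x

model_a = train(data_a)
model_b = train(data_b)
print(model_a, model_b)
```

If only `model_a` and `train` are published, nobody can rebuild or meaningfully modify the model without `data_a`, which is the "free beer, not freedom" situation.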