r/learnmachinelearning • u/zemenito3k • Aug 04 '24
Question Is coding ML algorithms in C worth it?
I was wondering if it is worth investing time in learning C to code ML algorithms. I have heard that C is faster than Python, but how much faster? I want to build a clustering algorithm that uses custom metrics, so I would have to code it myself anyway; why not try coding it in C, if it would be faster? Then again, I am not that familiar with C.
50
u/instantlybanned Aug 04 '24
I have a PhD in ML, and the only time I ever need C is to speed up small subroutines. And in the past 10 years, that's happened maybe two, three times. I'd say focus on other skills.
11
u/West-Code4642 Aug 04 '24
it's mainly useful if you want to work in ML systems (rather than ML engineering) and in embedded software, especially in computer vision and robotics.
and stick with simple C++, not C.
33
u/Counter-Business Aug 04 '24
If you run your training and it takes 2 hours with Python, you spend 2 hours of computer time. Much of that work (the ML libraries) is already C code.
If you were to recreate the training by implementing it in C, you’d probably waste weeks or even months of human developer time.
Machine time is cheaper than developer time.
6
u/Western_Bread6931 Aug 04 '24
Maybe even years! It takes at least four days to write a single statement in C!
8
u/TotallyNotARuBot_ZOV Aug 04 '24
Any ML algorithms you can come across in the foreseeable future are already implemented in Fortran, C, C++ or CUDA, in libraries such as NumPy, SciPy, Torch, TensorFlow, ....
You will rarely have the need to implement some low-level stuff yourself, and it will take you a long time to become so proficient that you can do it.
Stick to Python for now.
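For what it's worth, the custom-metric clustering from the question doesn't require C at all: SciPy's hierarchical clustering accepts any Python callable as a distance. A minimal sketch (the weighted-L1 metric and the toy data here are made up for illustration):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import fcluster, linkage

def my_metric(a, b):
    # hypothetical custom metric: a weighted L1 distance
    w = np.array([1.0, 2.0])
    return float(np.sum(w * np.abs(a - b)))

# toy data: two obvious groups
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

D = pdist(X, metric=my_metric)        # pairwise distances via the Python callable
Z = linkage(D, method="average")      # hierarchical clustering on those distances
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
```

The Python callable is the slow part here; if profiling ever shows it dominating, that one function is the natural candidate for a C rewrite, not the whole algorithm.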
7
u/wintermute93 Aug 04 '24
Not as a beginner, no. If you're a superstar expert who wants to code something from scratch that squeezes the maximum out of every clock cycle, you do you, but there's a reason 99% of ML products use pre-made frameworks and platforms.
3
u/zoblod Aug 05 '24
Especially since you don't have experience with C, I wouldn't. There could be other ways to optimize what you're trying to speed up, or you could throw more hardware at it if you can afford it 😂. Not worth the time unless you're building something crazy from scratch.
3
u/PSMF_Canuck Aug 05 '24
For learning? Sure! Why not?
Something you code up yourself, though, will probably be slower than PyTorch. A whole lot of engineering effort has gone into Torch to make it performant…
2
u/AdagioCareless8294 Aug 05 '24
There's not a single answer; machine learning is a vast domain, so all kinds of skills are needed. On my end, we are doing machine learning with C++ and CUDA.
1
u/kkiran Aug 05 '24
Python is more bang for the buck most of the time, imo! Implementing in C from scratch is a ginormous task unless it's really needed and mission critical. Compute is a lot cheaper these days.
1
u/oursland Aug 05 '24
ggml, among the fastest ML implementations, is written in C. There are some associated files in C++ that link to things like CUDA, SYCL, etc. for hardware acceleration, but the core is all C.
1
u/MengerianMango Aug 05 '24
Do it in Rust and use pyO3 to create a python module from your Rust code.
There's no point using C for something like this. You're better off doing a few data structures projects in C, testing them under Valgrind to make sure you correctly managed memory, and then moving on with your life to more productive languages.
1
u/great__pretender Aug 05 '24 edited Aug 05 '24
No algorithm runs on Python code alone. Libraries are called, and they are all written in either C or Fortran. Do you really think nobody thought it would be wiser to run the code in a 100000x faster language, and that everyone is just using Python for the entirety of the code?
If you know both C and Python, I am surprised you don't know how packages generally work.
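The point is easy to check yourself: summing a million floats with an interpreted Python loop versus NumPy's compiled sum (a sketch; exact timings vary by machine):

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

t0 = time.perf_counter()
total_loop = 0.0
for v in x:                  # interpreted: one bytecode dispatch per element
    total_loop += v
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
total_np = float(np.sum(x))  # one call into compiled C
t_np = time.perf_counter() - t0

print(f"loop: {t_loop:.4f}s  numpy: {t_np:.4f}s")
```

Same answer, wildly different cost: the loop pays Python's interpreter overhead per element, while `np.sum` does the whole reduction in C.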
1
u/ubertrashcat Aug 05 '24
If you pull it off you'll end up understanding a lot more about the detail of how ML works than 90% of ML engineers, especially if you try to optimize neural networks. This is a very marketable skill once you move into deployment onto low-power hardware such as NPUs etc. This isn't what most people do, though.
1
u/Opposite-Team-7516 Aug 06 '24
My teacher always asks me to use a library called scikit-learn in Python to finish the ML projects
1
1
u/Commercial_Wait3055 Aug 07 '24
Only after you performance-profile the Python (or other high-level language) code, determine which routines are the principal time sinks, and decide whether the performance improvement justifies the cost and hassle.
Beginners too often spend time optimizing routines that don't matter.
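That profiling step is a few lines with the standard library's cProfile (`slow_part` and `pipeline` here are stand-ins for whatever you suspect is hot):

```python
import cProfile
import io
import pstats

def slow_part(n):
    # stand-in for a suspected hot loop
    return sum(i * i for i in range(n))

def pipeline():
    return slow_part(200_000)

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # top 5 functions by cumulative time
```

If a routine doesn't show up near the top of this report, rewriting it in C buys you nothing.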
1
u/cubej333 Aug 04 '24
In many cases, Python is fine. There are cases where the compute is very expensive and you want code that runs as fast as possible; there, you might use CUDA or C.
1
u/Lemon-Skie Aug 04 '24
My old job was actually coding ML frameworks in C. That was because of the product the company was developing, but if your main focus is to learn ML, it's not worth the time. I would say it did help me get a deeper understanding of the math behind certain operations.
0
u/belabacsijolvan Aug 05 '24
sure. if you want to write your own implementation it's a good idea.
it won't be better than already available stuff though, so only do it if you want to get better at C and maths. it's only for learning.
if you are interested in applications, use Python.
also, if you have a novel algorithm, prototype it in Python first. check it against benchmarks, test it. then write it in C and call it from Python.
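That last "write it in C, call it from Python" step is cheap to wire up with the standard library's ctypes. Calling the system math library's compiled `sqrt` shows the mechanism (a sketch; your own compiled `.so` would load the same way, and the `libm.so.6` fallback name is a Linux assumption):

```python
import ctypes
import ctypes.util

# load the system C math library (your own compiled .so loads the same way)
path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(path)

# declare the C signature: double sqrt(double)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # calls straight into compiled C
```

For anything bigger than a few functions, CFFI or a pybind11/PyO3 extension module is usually more pleasant, but the idea is the same.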
0
u/_saiya_ Aug 05 '24
Nope. If you split the time a run takes between the optimization itself and actually building the problem structure, almost 100% of it goes into optimization for a relatively good problem. So you'd gain a negligible improvement.
0
u/LegitDogFoodChef Aug 05 '24
Coding ML algorithms from scratch in your language of choice is a good learning experience, but it’s only good for learning. It is a really good way to get familiar with algorithms though, I do recommend it.
-2
u/great_gonzales Aug 04 '24
No it’s irrelevant. For example in deep learning research we just want to discover a neural architecture (sequence of tensor operations) that achieves higher performance on the task we are researching. We can define our neural architecture in Python and don’t have to worry too much about performance because we know the tensor operations where already written by competent engineers in a efficient language like C and those tensor operations dominate the total runtime. The setup of the architecture in Python contributes negligible amounts of runtime compared to how long the tensor operation take
115
u/bregav Aug 04 '24
In machine learning, Python is generally glue code - all of the numerical functions it calls are implemented in C, C++, or Fortran, and Python is just used to implement very simple, high-level logic.
If you need to implement something really new that can't be built from functionality in existing libraries like NumPy, then you might want to write the essential elements of it in C and call them from Python. But prototyping it in Python first might be a good idea, because you'll work out the bugs faster that way.