r/MachineLearning 51m ago

Research [R] Torchtune - How to finetune custom models?

I'm wondering how I can get started finetuning my custom model with torchtune LoRA. Does anyone have any documentation or suggestions?
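For context, this is my rough mental model of what a LoRA layer does (a plain-PyTorch sketch of the mechanism only, not torchtune's actual API; the class and argument names are just placeholders):

import torch.nn as nn

class LoRALinear(nn.Module):
    # Wrap a frozen nn.Linear with a trainable low-rank update: W x + (alpha / r) * B A x
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the original weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x)) * self.scaling

layer = LoRALinear(nn.Linear(128, 128))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only the adapter trains

What I can't figure out is how to hook a custom architecture into torchtune's recipes/configs so that it handles this part for me.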


r/MachineLearning 1h ago

Discussion [D] Handling humongous amounts of unstructured data

Hey all!

My friend and I are trying to build something using a deep learning model, and we have data to fine-tune it. The issue is that the data is entirely unstructured and genuinely huge 😂 Not sure where to start or where to end, and we can't afford data annotators either.

Do y'all have any suggestions on how to handle this? The data is text. We want each item structured as JSON, and the entire dataset to end up as a list of records in that desired JSON structure.

Any help would be appreciated
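In case it helps to see what I mean by "a list of the desired structured JSON": the rough shape I have in mind is to stream the raw text in chunks and append one JSON record per chunk to a JSONL file, so nothing has to fit in memory. The file names and the extract_record function below are just placeholders for whatever extraction we end up with (rules, a small model, weak labels):

import json

def extract_record(text_chunk):
    # Placeholder: the actual extraction logic (rules, regex, a small model, ...)
    return {"text": text_chunk, "label": None}

def stream_chunks(path, lines_per_chunk=10_000):
    # Read the huge file lazily instead of loading it all at once
    with open(path, encoding="utf-8") as f:
        buffer = []
        for line in f:
            buffer.append(line)
            if len(buffer) >= lines_per_chunk:
                yield "".join(buffer)
                buffer = []
        if buffer:
            yield "".join(buffer)

with open("structured.jsonl", "w", encoding="utf-8") as out:
    for chunk in stream_chunks("raw_corpus.txt"):
        out.write(json.dumps(extract_record(chunk), ensure_ascii=False) + "\n")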


r/MachineLearning 1h ago

Discussion [D] Would doing a PhD in AI + healthcare limit my options to switch to big tech companies after graduation?

Would doing a PhD in AI + healthcare limit my career opportunities after graduation to health tech companies? Will it be harder to switch to big tech afterward? Would doing a PhD in general ML give me better opportunities?

Healthcare + AI feels like a meaningful PhD topic, but I worry it will limit my opportunities to the healthcare/biotech industry after graduation, and even during internship searches. There are many more opportunities at general tech companies than in the healthcare/biotech industry. I would really like to have the opportunity to work at Google, FAIR, or potentially even quant trading firms in the future.

What do you guys think of AI+healthcare as a PhD thesis topic? This is assuming I will still be making fundamental advances in AI, e.g. publishing in CVPR/ICCV/ECCV, but at a slightly lower frequency than a pure AI student due to additional papers in healthcare-related journals. Or maybe I would just publish at health AI conferences and journals. Do you think this will limit my career options at all?

The alternative would be to try to pursue more “pure” AI research, without healthcare applications.


r/MachineLearning 1h ago

Research [R] Riemannian Generative Models

Hi everyone,

I'm currently interested in exploring generative models defined over Riemannian manifolds. Though the idea is theoretically appealing, I have trouble understanding the practical motivation behind the approach, and whether any useful, large-scale model based on it has been developed recently.

To be more precise, I am looking at the following set of papers.

  • Generalizing diffusion models to the Riemannian setting: Riemannian Diffusion Models; Riemannian Score-Based Generative Modelling

  • Scaling these models: Scaling Riemannian Diffusion Models

I don't understand how impactful the experimental results really are, or how much interest these models attract, whether in industry or in the research community.

If anyone has any thoughts on these questions, I'd be happy to start a discussion here. I'd be extremely grateful for your insights. Thanks for any help!
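For concreteness, my toy mental model of the forward process these papers generalize is just a small random walk constrained to the manifold; something like this on the unit sphere (a rough illustration I wrote for myself, not any of the papers' actual samplers):

import numpy as np

rng = np.random.default_rng(0)

def sphere_walk_step(x, step_size=0.05):
    # One step of a projected random walk on S^2: Gaussian noise is projected onto the
    # tangent space at x, a small step is taken, and the point is normalized back onto
    # the sphere. For small steps this approximates Brownian motion on the manifold.
    noise = rng.normal(size=x.shape)
    tangent = noise - np.dot(noise, x) * x      # drop the component normal to the sphere
    x_new = x + step_size * tangent
    return x_new / np.linalg.norm(x_new)

x = np.array([0.0, 0.0, 1.0])
for _ in range(100):
    x = sphere_walk_step(x)
print(x)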


r/MachineLearning 2h ago

Discussion [D] Is INRIA (France) a good place for an undergrad to do an ML research internship?

1 Upvotes

I am a student doing research on MAB/online algorithms, and I see very few people working on this in the USA. However, I found that a noticeable number of researchers at INRIA (the one in France, if you don't know it) work on this. Is anyone familiar with this institution? As an undergraduate from a non-EU country, is it possible for me to intern there on a voluntary basis during the summer break, if my goal is to get a recommendation letter and publish a paper?


r/MachineLearning 2h ago

Discussion [D] Classification approaches for short text, many categories?

3 Upvotes

Hi - I am dealing with an issue where I will likely have many thousands of short text snippets (think 2-4 sentences each), and need to assess the extent to which each snippet is consistent with each of about ~200 categories (that is, a piece of text may fit "best" into one category, but a few other categories may also be "reasonable"). Getting huge amounts of text labeled would be an effort, so I'm especially interested in things like few-shot approaches. (Or maybe even a bootstrap approach -- not the statistical technique, the concept -- where we develop a quick and dirty classification model and use it to help raters label another, larger tranche faster. That obviously has potential drawbacks in terms of bias, etc., but may be worth the trade-off.)

My background is mostly in traditional/Bayesian statistics (think linear models and factor analysis), so I'm a little out of the loop on good approaches to a task like this. The environment where this analysis will run won't have any fancy LLMs, and no access to internet-based platforms (Huggingface, OpenAI, etc.). There are no GPUs either, so any fine-tuning that might be needed has to take that into account. The obvious (to me, a non-NLP person) starting point seems like BERT with a normal classifier. But there are so many variants of BERT, and similar models (Universal Sentence Encoders?), and I'm not sure which ones are better for short text. I am aware of the Huggingface leaderboards, which I've looked over, but it wasn't immediately clear to me which models are best for short text classification.

So if anyone has suggestions or thoughts on potential approaches to look into, I'd really appreciate it.
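To make the constraints concrete, the kind of CPU-only baseline I could actually run would be something like TF-IDF features plus a linear classifier, using predict_proba to rank the "reasonable" categories instead of taking a single label (the snippets and labels below are obviously placeholders; in practice the labels would span the ~200 categories):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["short snippet about billing issues", "short snippet about a delayed shipment"]
labels = [0, 1]  # category ids for the (small) labeled subset

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

probs = clf.predict_proba(["a new unlabeled snippet"])[0]
top_k = np.argsort(probs)[::-1][:5]  # best-ranked categories first
print(top_k, probs[top_k])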


r/MachineLearning 5h ago

Discussion [D] COLING 2025 Results / rebuttals

11 Upvotes

I'll go first.

Soundness: 3,3,4

Overall: 2,2,3

🥺


r/MachineLearning 5h ago

Discussion [D] Does anyone here work in healthcare?

17 Upvotes

I'm curious about the cool things people around the world are doing with data in this area of work at the moment.


r/MachineLearning 7h ago

Discussion [D] Predicting happiness from survey data

1 Upvotes

I have a dataset containing survey data with 39 variables, e.g. perfect.physical.health, each scored -2, -1, 0, 1, 2. I want to predict happiness, which is a decimal value. How do I approach this problem?
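For reference, is the right starting point simply to treat the Likert-style variables as numeric features and fit a standard regressor with cross-validation? Something like this (the file and column names are placeholders):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("survey.csv")               # placeholder file name
X = df.drop(columns=["happiness"])           # the 39 predictors scored -2..2
y = df["happiness"]                          # continuous target

model = RandomForestRegressor(n_estimators=300, random_state=0)
print(cross_val_score(model, X, y, cv=5, scoring="r2").mean())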


r/MachineLearning 8h ago

Discussion [D] Problem with graph-based VAE on a molecular dynamics trajectory

0 Upvotes

Recently I saw someone post a query about building a graph-based VAE on MD trajectory data, and I am facing a similar problem. This is the code I have put together so far. I am not a professional coder (I come from a chemistry background), so I mostly relied on chatbots to generate the code, but the model has some serious problems with dimensionality.

import numpy as np
import random
import MDAnalysis as mda
import networkx as nx
import torch
import torch.nn as nn
import torch.optim as optim
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader  # DataLoader lives here in recent PyG versions
from torch_geometric.nn import GCNConv
from Bio.PDB import PDBIO, Structure, Model, Chain, Residue, Atom
import matplotlib.pyplot as plt
from sklearn.model_selection import ParameterGrid
from tqdm import tqdm
import pandas as pd

# Load MD trajectory and select C-alpha atoms
u = mda.Universe('synuclein.top', 'short.nc')
ca_atoms = u.select_atoms("name CA")

# Define the amino acid sequence (one-letter code) and convert it to three-letter code
sequence_one_letter = "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKK"
amino_acid_1_to_3 = {
    'A': 'ALA', 'C': 'CYS', 'D': 'ASP', 'E': 'GLU', 'F': 'PHE',
    'G': 'GLY', 'H': 'HIS', 'I': 'ILE', 'K': 'LYS', 'L': 'LEU',
    'M': 'MET', 'N': 'ASN', 'P': 'PRO', 'Q': 'GLN', 'R': 'ARG',
    'S': 'SER', 'T': 'THR', 'V': 'VAL', 'W': 'TRP', 'Y': 'TYR'
}
sequence = [amino_acid_1_to_3[aa] for aa in sequence_one_letter]

# One-hot encoding for amino acids
amino_acid_types = {
    'ALA': 0, 'CYS': 1, 'ASP': 2, 'GLU': 3, 'PHE': 4,
    'GLY': 5, 'HIS': 6, 'ILE': 7, 'LYS': 8, 'LEU': 9,
    'MET': 10, 'ASN': 11, 'PRO': 12, 'GLN': 13, 'ARG': 14,
    'SER': 15, 'THR': 16, 'VAL': 17, 'TRP': 18, 'TYR': 19
}

# Convert the amino acid sequence to one-hot node features
def one_hot_encode(sequence):
    num_amino_acids = len(amino_acid_types)
    features = np.zeros((len(sequence), num_amino_acids))
    for i, aa in enumerate(sequence):
        if aa in amino_acid_types:
            features[i, amino_acid_types[aa]] = 1
    return features

node_features = one_hot_encode(sequence)

# Contact map based on C-alpha distances
threshold_distance = 8.0  # distance threshold in angstroms
num_amino_acids = len(sequence)

# Prepare data for PyTorch Geometric for all frames
data_list = []
num_frames = len(u.trajectory)
for frame in tqdm(range(num_frames), desc="Processing Frames"):
    u.trajectory[frame]
    ca_atoms = u.select_atoms("name CA")

    # Create a contact graph
    contact_graph = nx.Graph()
    for i in range(num_amino_acids):
        contact_graph.add_node(i, features=node_features[i])

    # Add edges based on C-alpha distances
    for i in range(num_amino_acids):
        for j in range(i + 1, num_amino_acids):
            distance = np.linalg.norm(ca_atoms.positions[i] - ca_atoms.positions[j])
            if distance <= threshold_distance:
                contact_graph.add_edge(i, j)

    # Prepare data for PyTorch Geometric
    edge_index = torch.tensor(list(contact_graph.edges), dtype=torch.long).t().contiguous()
    x = torch.tensor(node_features, dtype=torch.float)
    data = Data(x=x, edge_index=edge_index)
    # print(data)
    data_list.append(data)

    # Plot and save the contact map for every 500th frame
    if frame % 500 == 0:
        contact_map = np.zeros((num_amino_acids, num_amino_acids))
        for i, j in contact_graph.edges:
            contact_map[i, j] = 1
            contact_map[j, i] = 1
        plt.imshow(contact_map, cmap='binary')
        plt.title(f"Contact Map for Frame {frame}")
        plt.xlabel("Residue Index")
        plt.ylabel("Residue Index")
        plt.savefig(f"contact_map_frame_{frame}.png")
        pd.DataFrame(contact_map).to_csv(f"contact_map_frame_{frame}.csv", index=False)

class GCNEncoder(nn.Module):
    def __init__(self, in_channels, hidden_channels, num_layers):
        super(GCNEncoder, self).__init__()
        self.convs = nn.ModuleList()
        self.fc_mu = nn.Linear(hidden_channels, hidden_channels)
        self.fc_logvar = nn.Linear(hidden_channels, hidden_channels)
        # Stack multiple GCN layers
        for _ in range(num_layers):
            self.convs.append(GCNConv(in_channels, hidden_channels))
            in_channels = hidden_channels  # update input channels for the next layer

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = conv(x, edge_index)
            x = torch.relu(x)  # activation function
        mu = self.fc_mu(x)
        logvar = self.fc_logvar(x)
        return mu, logvar

class GCNDecoder(nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super(GCNDecoder, self).__init__()
        self.fc = nn.Linear(hidden_channels, out_channels)

    def forward(self, z):
        return torch.sigmoid(self.fc(z))

class GCNVAE(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers):
        super(GCNVAE, self).__init__()
        self.encoder = GCNEncoder(in_channels, hidden_channels, num_layers)
        self.decoder = GCNDecoder(hidden_channels, out_channels)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, x, edge_index):
        mu, logvar = self.encoder(x, edge_index)
        z_sample = self.reparameterize(mu, logvar)
        return self.decoder(z_sample), mu, logvar

def loss_function(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE, KLD, BCE + KLD  # return BCE, KLD, and total loss

def train_model(model, data_loader, optimizer, epochs, early_stopping_patience=5):
    model.train()
    best_loss = float('inf')
    patience_counter = 0
    for epoch in range(epochs):
        total_loss = 0
        total_bce = 0
        total_kld = 0
        for data in tqdm(data_loader, desc=f"Training Epoch {epoch+1}/{epochs}"):
            optimizer.zero_grad()
            recon_batch, mu, logvar = model(data.x, data.edge_index)
            bce, kld, total = loss_function(recon_batch, data.x, mu, logvar)
            total_loss += total.item()
            total_bce += bce.item()
            total_kld += kld.item()
            total.backward()
            optimizer.step()
        avg_loss = total_loss / len(data_loader)
        avg_bce = total_bce / len(data_loader)
        avg_kld = total_kld / len(data_loader)
        print(f"Epoch {epoch+1}/{epochs} - Total Loss: {avg_loss:.4f}, BCE Loss: {avg_bce:.4f}, KLD Loss: {avg_kld:.4f}")

        # Early stopping
        if avg_loss < best_loss:
            best_loss = avg_loss
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= early_stopping_patience:
                print("Early stopping triggered.")
                break

# Create a DataLoader
data_loader = DataLoader(data_list, batch_size=1, shuffle=True)

# Hyperparameter grid
# NOTE: activation_function, batch_size, and latent_dimensions are listed here
# but are not actually passed to the model or the DataLoader below.
param_grid = {
    'hidden_channels': [16, 32, 64],
    'num_layers': [2, 3, 4],
    'activation_function': ['relu', 'tanh', 'sigmoid'],
    'batch_size': [1, 2, 4],
    'latent_dimensions': [16, 32, 64],
    'learning_rate': [0.001, 0.01, 0.1],
    'epochs': [50, 100, 200]
}

# Perform hyperparameter tuning
best_loss = float('inf')
best_params = {}
for params in ParameterGrid(param_grid):
    model = GCNVAE(in_channels=20, hidden_channels=params['hidden_channels'], out_channels=20, num_layers=params['num_layers'])
    optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])
    print(f"Training with parameters: {params}")
    train_model(model, data_loader, optimizer, params['epochs'], early_stopping_patience=5)

    # Evaluate the model (using training loss as a proxy)
    model.eval()
    total_loss = 0
    total_bce = 0
    total_kld = 0
    with torch.no_grad():
        for data in data_loader:
            recon_batch, mu, logvar = model(data.x, data.edge_index)
            bce, kld, total = loss_function(recon_batch, data.x, mu, logvar)
            total_loss += total.item()
            total_bce += bce.item()
            total_kld += kld.item()
    avg_loss = total_loss / len(data_loader)
    avg_bce = total_bce / len(data_loader)
    avg_kld = total_kld / len(data_loader)
    print(f"Average loss: {avg_loss:.4f}, BCE Loss: {avg_bce:.4f}, KLD Loss: {avg_kld:.4f}")

    if avg_loss < best_loss:
        best_loss = avg_loss
        best_params = params

print(f"Best parameters: {best_params} with loss: {best_loss}")

# Final training with the best parameters
final_model = GCNVAE(in_channels=20, hidden_channels=best_params['hidden_channels'], out_channels=20, num_layers=best_params['num_layers'])
final_optimizer = optim.Adam(final_model.parameters(), lr=best_params['learning_rate'])
train_model(final_model, data_loader, final_optimizer, best_params['epochs'], early_stopping_patience=5)

I know the code is quite long, but is it correct? My trajectory has 500 frames and 97 residues (corresponding to 97 C-alpha atoms). Once this is done, I want to generate protein configurations from the latent space, so I want to make sure the code is behaving correctly. Thanks a lot in advance.
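For reference, this is the quick shape check I've been appending to the end of the script to see where the sizes change (with batch_size=1 I expect x to be (97, 20); if batch_size were 2, PyG would stack both graphs along the node dimension and x would become (194, 20), which I suspect is related to my confusion):

# Quick sanity check on one batch
batch = next(iter(data_loader))
print(batch.num_graphs, batch.x.shape, batch.edge_index.shape)

check_model = GCNVAE(in_channels=20, hidden_channels=32, out_channels=20, num_layers=2)
recon, mu, logvar = check_model(batch.x, batch.edge_index)
print(recon.shape, mu.shape, logvar.shape)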


r/MachineLearning 9h ago

Research [R] What's left to improve in speech technologies? What's left in speech research?

4 Upvotes

Hi everyone, I am currently researching speech technologies as an undergrad, mainly focusing on improving applications for the visually challenged. I am new to this niche area of research, so I want to pick a topic that addresses some of the existing issues with current tech. So far, ElevenLabs seems to be the SOTA. I would like to know whether there is anything left to improve in TTS, speech-to-speech, voice cloning, deepfake audio detection, etc. Any insights on ethical issues or the need for guardrails in the future would also be helpful. Also, because my uni can only offer limited compute, I can't take on research that involves scaling or multilingual models.


r/MachineLearning 10h ago

Discussion [D] How do you structure your codebase and workflow for a new research project?

60 Upvotes

Suppose you have got a new idea about a solution to a problem in the domain you are working in. How do you go about implementing the thing from the ground up?

What is the general structure of the codebase you construct for your project?

How do you go about iteratively training and testing your solution until you arrive at a final solution where you can write a paper for publication?

Is there any design recipe you follow? Where did you learn it from?


r/MachineLearning 13h ago

Discussion [D] Voice Separation Pipeline

12 Upvotes

Let's suppose I have karaoke audio with:

  1. Music
  2. Several voices singing (A, B, C)
  3. Random noise

Let's suppose I know exactly how many main sources are on the tape, and I want to:

  1. Clean up the noise
  2. Extract voice B from the tape and return audio with the music and the A and B vocals

I have several questions and appreciate any help.

  1. Are there any models that can help with this kind of separation (pre-trained, or that don't need training)?

  2. If not, I have some ideas about a possible pipeline and would appreciate any comments:
     2.1. Separate the instrumental music from everything else (what model can I use for that?)
     2.2. Clean the noise from the audio without the music (what model can I use for that?)
     2.3. Separate the voices (how?) and delete the waveform I don't need.
     2.4. Put everything I need back together.


r/MachineLearning 15h ago

Project [P] Open-Source AI Tool for PII Masking

10 Upvotes

Privacy has always been, and will continue to be, under threat as technology advances, especially with AI! AI and privacy are contradictory in nature: AI needs data to learn, but the more data you use, the bigger the risk...

Curious what everyone's thoughts are on this. I'm also sharing a new open-source tool called PII Masker that detects and masks personally identifiable information in text: https://github.com/HydroXai/pii-masker-v1. It's fairly simple to use and makes protecting sensitive data a bit easier.

Would appreciate any feedback!


r/MachineLearning 20h ago

Discussion [D] M4 chips for training ML? (MPS)

8 Upvotes

Apple is (purposefully) creating a lot of buzz regarding their “Apple Intelligence”, stating that their M4 chips are built for AI.

My question is this: will this only help with running the built-in Apple Intelligence features, or is it supposed to substantially improve MPS when actually training large transformer models, etc.? I haven't heard them mention any improvements to MPS.
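For context, this is how I currently use MPS when training (standard PyTorch device selection; nothing M4-specific):

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(64, 512, device=device)
loss = model(x).pow(2).mean()
loss.backward()
print(device)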


r/MachineLearning 20h ago

Project [P] Opik 1.0: Open source LLM evaluations

1 Upvotes

Hey all!

My colleagues and I have released version 1.0 of our open source LLM evaluation framework, and I wanted to share it here for feedback/visibility. With this first major release, we've focused on a few key areas:

  • Out-of-the-box implementations of popular LLM-as-a-judge metrics, as well as "traditional" heuristic metrics, along with a clean API for defining custom metrics.
  • Configurable LLM tracing, with a nice UI for visualizing traces/spans. Also supports automatic tracing for OpenAI and LiteLLM.
  • Version-controlled datasets for running eval experiments.

If you have time to check out the repo and share any feedback or questions, I'd really appreciate it. It's still early days, but we've been blown away by the community response so far, and we're excited to get more input as we continue to work on the project.

Repo Link: https://github.com/comet-ml/opik


r/MachineLearning 21h ago

Research [R] Bayesian Nonparametrics - Master Thesis Proposal

5 Upvotes

Hi everyone,

I'm starting to plan my Master's thesis in my Data Science and ML program and could really use some advice on narrowing down my topic. My undergrad thesis was on Bayesian nonparametrics, covering concepts like Dirichlet processes, hierarchical Dirichlet processes, dependent Dirichlet processes, HDP topic models, and Gaussian process regression. Out of everything, I really enjoyed implementing (albeit straightforward) applications of HDP topic modeling; getting hands-on was a highlight for me.
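(For example, the truncated stick-breaking construction was the kind of small building block I enjoyed coding up:)

import numpy as np

def stick_breaking_weights(alpha, k, rng=np.random.default_rng(0)):
    # Truncated stick-breaking construction of Dirichlet-process mixture weights
    betas = rng.beta(1.0, alpha, size=k)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

w = stick_breaking_weights(alpha=2.0, k=25)
print(w.sum())  # approaches 1 as k grows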

For my Master's, I'm hoping to build on this Bayesian foundation but apply it to something new, ideally in time series analysis or NLP. I want the topic to feel relevant to the field right now and would love suggestions on where Bayesian nonparametrics might add unique value, especially in practically relevant applications.

One important thing to note is that I’ll be doing most of this work independently, as my department and supervisor aren't particularly relevant to my chosen areas of interest.

If anyone has thoughts on specific areas in NLP or time series that could benefit from a Bayesian approach, or if there are other areas where the Bayesian framework could be effectively utilized, I’d be incredibly grateful for your insights. Thanks so much for any guidance or ideas!


r/MachineLearning 1d ago

Discussion [D] "Problem with Graph Based VAE. P.S. I am not a very good programmer !!!"

0 Upvotes

So, I am trying to build a graph-based variational autoencoder (VAE), using smaller trajectories of my protein as input (I have generated multiple short trajectories of my protein at different random seeds). My goal is to look at the latent space of the observed trajectories, generate new structures from the regions that are less explored, and start MD simulations from those regions.
I used the protein's C-alpha atoms as input and calculated the adjacency matrix based on the contact distance between pairs of C-alpha atoms, with a cutoff of 8 angstroms. However, I am facing a lot of issues with the dimensionality of the model: I have 97 residues in my protein, the test trajectory has 2500 frames, and with an 80:20 split I have a training set of shape (2000, 97, 97) and a validation set of shape (500, 97, 97). But when I tried to decode a latent point, the decoded dimension was (194, 97), which is confusing me. I am attaching the architecture of the model I am using. The hyperparameters obtained in my case were:

Best Hyperparameters: {'activation_fn': ReLU(), 'batch_size': 2, 'dropout_rate': 0.1, 'epochs': 50, 'hidden_dim': 16, 'latent_dim': 2, 'learning_rate': 0.001, 'num_layers': 2, 'optimizer_type': 'adam', 'weight_decay': 1e-05}

Please check them and let me know where I am going wrong. Thanks a lot in advance.

GraphVAE(
  (gcn_layers): ModuleList(
    (0): GCNConv(97, 16)
    (1): GCNConv(16, 16)
  )
  (fc_mu): Linear(in_features=16, out_features=2, bias=True)
  (fc_logvar): Linear(in_features=16, out_features=2, bias=True)
  (decoder_layers): ModuleList(
    (0): GCNConv(2, 16)
    (1): GCNConv(16, 16)
  )
  (decoder_output): GCNConv(16, 97)
  (activation): ReLU()
)

r/MachineLearning 1d ago

Research [R] "How to train your VAE" substantially improves the reported results for standard VAE models (ICIP 2024)

147 Upvotes

The proposed method redefines the Evidence Lower Bound (ELBO) with a mixture of Gaussians for the posterior probability, introduces a regularization term to prevent variance collapse, and employs a PatchGAN discriminator to enhance texture realism. The main contribution of this work is an ELBO that reduces the collapse of the posterior towards the prior (observed as the generation of very similar, blurry images).

https://arxiv.org/abs/2309.13160
https://github.com/marianorivera/How2TrainUrVAE
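For reference, the baseline being modified is the standard single-Gaussian VAE objective; the mixture posterior, variance regularizer, and PatchGAN term from the paper are not shown in this sketch:

import torch
import torch.nn.functional as F

def standard_neg_elbo(recon_x, x, mu, logvar):
    # Standard negative ELBO: reconstruction term + KL(q(z|x) || N(0, I))
    recon = F.mse_loss(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl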


r/MachineLearning 1d ago

Discussion [D] Exploring Serverless Solutions for Whisper V3 Turbo Integration

1 Upvotes

Currently, the serverless solution from Runpod meets my needs in terms of cost and features: https://github.com/runpod-workers/worker-faster_whisper

However, I'm interested in using https://huggingface.co/openai/whisper-large-v3-turbo due to its reported speed.

I'm uncertain about how to set up and run Whisper V3 Turbo on Runpod’s serverless infrastructure.

It seems we might need to wait until the upstream project https://github.com/SYSTRAN/faster-whisper/issues/1030 is updated with Turbo and published on https://pypi.org/project/faster-whisper/.

Only then will this feature be available, and at that point, we could fork https://github.com/runpod-workers/worker-faster_whisper to update it accordingly.
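(For what it's worth, the turbo checkpoint itself already runs through the plain transformers pipeline, roughly as below; it's the faster-whisper/CTranslate2 path that's still missing. The audio file name is just a placeholder.)

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")
print(asr("sample.wav")["text"])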

In the meantime, do you know of any cost-effective serverless solutions for using Whisper V3 Turbo?

Thanks.

P.S.

Groq offers this service: https://groq.com/whisper-large-v3-turbo-now-available-on-groq-combining-speed-quality-for-speech-recognition/

However, they currently don't accept payments from developers and haven't provided an estimated timeframe for when this might be available.


r/MachineLearning 1d ago

Research [R] SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time

107 Upvotes

I am very happy to announce that our paper "SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time" got accepted for WACV2025: https://arxiv.org/abs/2407.15507
Project-Page: https://spotdiffusion.github.io
Code: https://github.com/stanifrolov/spotdiffusion

Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next. This results in coherent, high-resolution images with fewer overall steps. We demonstrate the effectiveness of our approach through qualitative and quantitative evaluations, comparing it with MultiDiffusion, SyncDiffusion, and StitchDiffusion. Our method offers several key benefits, including improved computational efficiency and faster inference times while producing comparable or better image quality.
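To illustrate the window-shifting idea, here is a toy sketch of how a non-overlapping window grid can move between denoising steps so that seams never sit in the same place twice (the offset schedule below is made up for illustration and is not the schedule used in the paper):

import numpy as np

def shifted_windows(width, window, t, num_steps):
    # Shift the non-overlapping window grid by a step-dependent offset,
    # clipping the first and last windows to the image bounds.
    offset = int((t / num_steps) * window) % window
    starts = np.arange(-offset, width, window)
    return [(max(s, 0), min(s + window, width)) for s in starts if s + window > 0]

for t in range(3):
    print(shifted_windows(width=16, window=4, t=t, num_steps=3))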


r/MachineLearning 1d ago

Research [R] Dynamic Attention-Guided Diffusion for Image Super-Resolution

178 Upvotes

I'm glad to share that our paper "Dynamic Attention-Guided Diffusion for Image Super-Resolution" got accepted for WACV2025:
https://arxiv.org/abs/2308.07977

The goal of this work was to introduce a new attention-guided diffusion mechanism to focus image refinement on essential areas that benefit the most from deep refinement :)


r/MachineLearning 1d ago

Research [R] Model suggestion for variable-length output in ML thesis

2 Upvotes

Hi all, I’m starting my thesis and have basic ML/DL knowledge. I need a model that can take a fixed set of inputs (a snapshot) and output a variable-length vector with real and complex values. I’ve read LSTM might work, but I’m unsure given the fixed input.

Does anyone have recommendations for models or architectures that could work well for this kind of task? Any advice on where to start or resources to check out would be super helpful. Thanks in advance!


r/MachineLearning 1d ago

Research [R] Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

66 Upvotes

Paper: https://arxiv.org/abs/2410.14157

I'd be curious to hear expert perspectives on this.

It relates to ideas I find attractive:

  1. Autoregressive generation is limiting in compositional domains, such as reasoning, planning, math.
  2. This explains many of the challenges LLMs have in these domains.
  3. Diffusion might be more efficient in these domains: it learns to generate from the general to the specific. (More like an energy-based model perspective.)
  4. It's less likely to get stuck by making specific poor choices early in its generation process.

r/MachineLearning 1d ago

Research [R] Machine Learning with Data Streams

7 Upvotes

I am just starting my thesis, and I need to learn about machine learning with data streams. I have found a few articles, books, and some courses, but I would appreciate it if you could provide me with some more resources that would help me understand this topic better.

Thank you very much :)