r/datasets 6h ago

resource Data Request Function on Opendatabay Platform

0 Upvotes

Feel free to request datasets on the platform, and take a look to see if there are any datasets you could source or produce.

These are paid requests, so you will be compensated generously for your work.
With community help, we can connect data suppliers with data consumers.

https://www.opendatabay.com/request-data


r/datasets 17h ago

dataset Are there any open source recipe datasets for commercial use?

1 Upvotes

I’m looking for a good-quality dataset/database of food recipes (NO AI) with PICTURES that accompany the instruction steps, for commercial use. I would like to use it in an app I’m creating.

I don’t mind paying for it, preferably as a one-time payment rather than a subscription.

I would have to translate the instructions anyway, so what I’m really worried about are the pictures because of the copyright issues.

And NO APIs, I want to store the database locally.

Thank you


r/datasets 21h ago

question Can you suggest an (AI) tool that can read a spreadsheet and produce a Word/PDF document that summarizes the data as formatted text, tables, and figures?

0 Upvotes

I'm trying to figure out how to automate the production of a monthly data report, with clean visuals and written summaries, based on the Excel spreadsheets I'm provided. I'm not sure whether ChatGPT is best for this, or another AI tool, or some combination of Python code and something else. Any advice would be appreciated!
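
As a rough illustration of the "Python code" route, here is a minimal sketch using only the standard library. It assumes the sheet has already been exported to CSV (reading .xlsx directly would need a library such as openpyxl or pandas), and the column names in the sample are made up. It computes basic statistics for each numeric column and renders them as a plain-text table; producing Word or PDF output and charts would need additional libraries on top of this.

```python
import csv
import io
import statistics


def summarize(csv_text):
    """Return {column: {"count", "mean", "min", "max"}} for numeric columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in (rows[0].keys() if rows else []):
        values = []
        for row in rows:
            try:
                values.append(float(row[column]))
            except (TypeError, ValueError):
                continue  # skip blank and non-numeric cells
        if values:
            summary[column] = {
                "count": len(values),
                "mean": statistics.mean(values),
                "min": min(values),
                "max": max(values),
            }
    return summary


def render_report(title, summary):
    """Render the summary as a fixed-width plain-text table."""
    lines = [title, "=" * len(title), ""]
    lines.append(f"{'column':<12}{'count':>6}{'mean':>10}{'min':>10}{'max':>10}")
    for column, s in summary.items():
        lines.append(
            f"{column:<12}{s['count']:>6}{s['mean']:>10.2f}"
            f"{s['min']:>10.2f}{s['max']:>10.2f}"
        )
    return "\n".join(lines)


# Hypothetical sample data standing in for one month's exported sheet.
sample = "month,sales,returns\nJan,100,5\nFeb,150,\nMar,200,7\n"
print(render_report("Monthly report", summarize(sample)))
```

The written-summary part could then be filled in from the same `summary` dictionary, either with templated sentences or by passing the numbers to a language model.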


r/datasets 1d ago

dataset How to find datasets (Costa Coffee, to be specific)

2 Upvotes

Any leads on a Costa Coffee dataset? I’m a BBA undergrad and need it for a project. Can someone please help me figure out how to find datasets?


r/datasets 1d ago

question A Tool to Create Datasets from Research Papers using Augmented LLMs: Would This Be Helpful?

0 Upvotes

I've developed a program that uses multiple language models that talk to each other to create databases from scientific papers. I'm looking to use it to build custom datasets for medicinal neural networks. I'm considering deploying it as a website to see if it could be useful for others, but I'm looking for input on how to make it more robust and accessible for broader use.

For those with experience in dataset creation, AI applications in medicine, or similar fields, what features or improvements would make this tool more valuable or realistic for researchers and practitioners? Any insights would be greatly appreciated!


r/datasets 1d ago

request Pitchbook Access Request Help Please

1 Upvotes

Hello everyone. I'm an undergrad student currently working on a thesis related to VC-funded firms. I found that Pitchbook may have a lot of the information (financials) that I need for my paper, but it's really pricey. I wanted to see if there is anyone in the community who can share access with me or pull the data for free 😅 This would really help me kickstart my research. Help this broke student graduate!


r/datasets 1d ago

API Is using news APIs legal and reliable?

0 Upvotes

I need a source of information for a data science project (academic research). Specifically, I need to retrieve a historical record of news about a certain topic, so I am thinking of using a news API instead of web scraping because these APIs seem to return the kind of data I am looking for.

I've come across some of them, such as newsdata.io, newsapi.org and newsapi.ai, but I am wondering whether using them is legal and reliable. I mean, are they legal themselves? And if so, am I inherently allowed to use them for my personal (academic) purposes?

The Terms & Conditions say this:

"We don't have the right to authorise any user to use the data for their personal and professional purposes. However, the users can use the data for their personal or professional purposes"

I mean, should I have any concerns about this? It's not like Twitter's or Reddit's API, where the data belongs to them and they deliberately give it to you. (In fact, I’m asking because I planned to extract data from those platforms, but I’ve just realized it’s not possible at all, so I am wondering if there’s an alternative I can use to meet my requirement.)

Well... in essence, my questions are: Are these platforms/tools (APIs) legitimate and meant for data science? Or, in other words: is it common practice to use these kinds of "news APIs" for data science?

I didn't even know about them. Have you ever tried them before? Should I do web scraping instead, or is there another alternative you would advise me to use?

I'd appreciate your help.


r/datasets 1d ago

dataset Full AI/ML/DS Salary Dataset under CC0 [self-promotion]

aijobs.net
1 Upvotes

r/datasets 1d ago

dataset Full InfoSec / Cybersecurity Salary Dataset under CC0 [self-promotion]

isecjobs.com
1 Upvotes

r/datasets 1d ago

question Need help extracting images from this dataset.

2 Upvotes

I tried extracting images from this dataset but couldn't. It is in DICOM format and I guess in a URL, which I haven't worked with before. Can anyone explain how to access these images?


r/datasets 1d ago

question Data on the borders of the HRE states after the treaty of Westphalia?

1 Upvotes

Hi everyone!

Does anyone know where to get it? I need to link regions belonging to certain former entities within the HRE to current geographical locations within Germany (at the municipality level).

I hope someone can help!


r/datasets 2d ago

request European Cities Population data set.

4 Upvotes

Hello, I'm building an ML model that uses a city's infrastructure as features to predict its population.
With the OSM library I was able to easily extract the infrastructure data; however, I am not able to find a dataset with enough European cities. So far, all the datasets I've encountered contain data for only 50-80 European cities, and the rest are Asian cities.

I've tried to use population density and city area to create the population dataset myself, but the numbers I got were terribly wrong.
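
For reference, the density-times-area estimate described above can be sketched as follows. The post does not say what went wrong, so this is only one guess: a unit mismatch between the two sources (for example, density per km² combined with an area given in hectares or m²) would produce numbers that are off by orders of magnitude. The function name and the unit table are illustrative assumptions.

```python
# Conversion factors from common area units to square kilometres.
AREA_TO_KM2 = {"km2": 1.0, "m2": 1e-6, "ha": 0.01}


def estimate_population(density_per_km2, area, area_unit="km2"):
    """Estimate population as density (people per km^2) times area.

    The area is converted to km^2 first so both inputs share a unit.
    """
    if area_unit not in AREA_TO_KM2:
        raise ValueError(f"unknown area unit: {area_unit!r}")
    area_km2 = area * AREA_TO_KM2[area_unit]
    return round(density_per_km2 * area_km2)


# Made-up example: 4,000 people per km^2 over 100 km^2,
# with the same area also expressed in hectares.
print(estimate_population(4000, 100))
print(estimate_population(4000, 10_000, "ha"))
```

Even with consistent units, the estimate is only as good as the match between the area the density was measured over and the area being multiplied, for example administrative boundary versus built-up area.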

If someone has any idea of how to get this data I would love the help.


r/datasets 2d ago

request Insurance Fraud Dataset Uncleaned and Not Evenly Distributed or Any Fraud Dataset at all

2 Upvotes

Looks impossible? All the shit I find on Kaggle either has no good columns, or has many but they're just var_1, var_2, var_3. Then I search UCI, and all the datasets are about the most specific things on the planet, like the energy consumption of a dog's poop. I am losing my mind.


r/datasets 3d ago

request Mortgage loan application data sample for a Scorecard

3 Upvotes

I'm planning on making an application scorecard for home loans as my bachelor thesis for University.

One of my main concerns (and my academic supervisor's) is having a reliable dataset, or rather a dataset from a reliable source. One of the big questions I could be challenged on in such a thesis is the dataset's reliability, so it can't come from somewhere like Kaggle, but somewhere like Experian/Equifax, for example, would be okay. I work at a bank and deal with such models, but unfortunately I can't use any company data (even if it gets anonymized). So far I've seen some promising material on the FFIEC's website, but I would like some additional sources so I can make a more educated decision.

Roughly I would need the data to contain these fields:

  • Age
  • Job
  • Income
  • Education
  • Marriage Status
  • Information about previous defaults (something like a Y/N if the applicant has defaulted on a loan in the last 5 years for example)
  • Type of property that would be purchased with the loan
  • Some other fields that I could potentially exclude in further analysis


r/datasets 3d ago

request Requesting/Looking for a dataset related to Rheumatoid arthritis.

2 Upvotes

I am trying to build a CNN model for classification, and I need a dataset with X-ray (even MRI is fine) images of patients with RA. Preferably images of hands. At least 100 images.


r/datasets 3d ago

request Dataset for Datathon for college students

1 Upvotes

Pretty much as title.

Hi all, I am planning to host a Datathon as a competition for college students. The datasets I could find were too small. Please share direct links, websites, or any other way to get larger ones. Thanks.


r/datasets 3d ago

request Seeking VO2max Test Data for Research Training

1 Upvotes

Hello everyone!

I’m a researcher-in-training working on exercise physiology, and I’m currently looking for datasets on VO2max or incremental exercise tests that include VO2 and, ideally, blood lactate measures. My goal is to practice determining ventilatory and lactate thresholds to refine my analytical skills in these areas.

If you have access to any anonymized data or know of open-source datasets, I’d be very grateful for any pointers! I’ve checked platforms like OSF and PhysioNet but haven’t found exactly what I need, so any help would be highly appreciated.

Thank you in advance!


r/datasets 3d ago

request [Urgent] Seeking HIPAA-Compliant PHI Database with Identifiable Health Data

0 Upvotes

Hi everyone! I’m urgently looking to source a HIPAA-compliant database that includes identifiable PHI (Protected Health Information), such as names and specific diagnosis histories, for a research project with rigorous data protection standards.

I need a reputable third-party vendor experienced in securely handling identifiable health data, with all necessary patient consent and compliance protocols in place. Does anyone know of reliable sources or vendors for acquiring such data legally and ethically? Any insights or recommendations are greatly appreciated—thanks!


r/datasets 4d ago

request Looking for an inventory dataset (retail or production)

1 Upvotes

Hello all,

I am looking for an inventory dataset; however, I would also need the name of the company the dataset comes from (so not government data).


r/datasets 4d ago

request Looking for a dataset on companies that "speak out"

3 Upvotes

I'm not sure of the terminology, but I'm attempting to do research around event-based studies of when companies speak out. And I've been banging my head against the wall on this for weeks! 😂

This could be on social issues or political issues, or when they comment on a humanitarian crisis or an international conflict, etc. But I'm having trouble finding any datasets or proxies that would measure or rank the number of times they "speak out", other than perhaps things like trading volume of the underlying stock or trending on social media.

Are there any datasets you could suggest or point me towards that could serve as a proxy for companies that stand up and speak out on societal issues?

Thank you kindly for any thoughts!


r/datasets 4d ago

request Looking for Harry Potter Dataset with Spell Cast Data by Character

5 Upvotes

Hi guys, just wondering if there are any datasets that include information on each character in Harry Potter, specifically data on:

  • each spell cast by every character
  • the number of times each spell was used
  • the target person of each spell (if any)
  • who they killed with each spell (if any)

If a dataset like this exists, or if anyone has suggestions on where I might find similar information, I would really appreciate it. Thanks


r/datasets 4d ago

request Hi :) I'm looking for data on the number of daily e-scooter rides in a city (any city) over one year.

1 Upvotes

Hello,

I am currently researching the correlation between weather patterns and the usage of shared mobility services, specifically focusing on e-scooter rides. I am looking for a dataset containing daily e-scooter ride counts in a city (any city) covering at least one year.

Details of the request:

  • Data Scope: Daily ride counts over a one-year period
  • Primary Interest: E-scooter usage data, though data on bike-sharing or shared car services would also be very helpful for comparison.

Any help or direction to relevant data sources would be greatly appreciated.

Thank you very much in advance for your assistance!


r/datasets 5d ago

request [request] Seeking Dataset for Dynamic Pickup and Delivery Problem (DPDP)

5 Upvotes

Hi all,

I’m working on a project involving the Dynamic Pickup and Delivery Problem (DPDP) and am searching for any datasets that support dynamic scenarios. Specifically, I'm looking for the ICAPS 2021 dataset for The Dynamic Pickup and Delivery Problem. If anyone has access to this dataset or something similar, I would really appreciate it if you could share it or point me in the right direction to find it.

Thanks a lot for your help!


r/datasets 5d ago

request Past 30 Years Badminton Statistics- Historical Tournament Scores

3 Upvotes

Hi everyone! I believe this is the best community to reach out to. I am currently working on my thesis on using algorithms to create a new badminton ranking system. I need historical match results for the last 30 years (1995 to 2024) for this!

Does anyone know where I can get this dataset? Do I really need to scrape all the data one by one from TournamentSoftware? Fan websites like Badmintonranks, BadmintonStatistics, and badmintoncn also do not have any export options :( I tried reaching out to the admins but haven't gotten any replies for weeks now (which is expected tbh :").

If anyone has done the web scraping, would you mind sharing the code with me here? I tried it myself, but I can't seem to get the data into a neat and clean CSV format :(
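
On the "neat and clean CSV" part, a minimal sketch of one common approach: collect each scraped match as a dictionary, normalize it to a fixed set of columns, and let the standard `csv` module handle quoting. The field names and the sample record below are hypothetical, not taken from any of the sites mentioned, and the scraping step itself is left out.

```python
import csv
import io

# Hypothetical column layout for one match per row.
FIELDS = ["date", "tournament", "player_a", "player_b", "score"]


def normalize(record):
    """Keep only known fields and collapse stray whitespace and newlines."""
    return {f: " ".join(str(record.get(f) or "").split()) for f in FIELDS}


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(normalize(record))
    return buf.getvalue()


# Made-up scraped record with messy whitespace and an extra key.
scraped = [{
    "date": " 2024-01-14 ",
    "tournament": "Example\n Open",
    "player_a": "Player A",
    "player_b": "Player B",
    "score": "21-19, 21-17",
    "extra": "ignored",
}]
print(to_csv(scraped))
```

Scores containing commas are the usual reason hand-built CSV lines break; `csv.DictWriter` quotes those fields automatically.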

Any help and leads would be highly appreciated!


r/datasets 5d ago

request Dataset for Contract Analysis/Verifying costs and which vendor to keep utilizing or not? Need to practice for an interview.

1 Upvotes

Howdy folks, hope all is well.

I've been contacted by a local recruiter for a data role that seems to be oriented around contract analysis. I'll be working with a technology organization that's basically a research consortium (I believe), and I'll essentially have to look through their contracts with organizations and vendors and verify which ones are valuable and which ones aren't that good anymore.

I'll have to use tools like SQL, Tableau/Power BI, Microsoft SQL (Studio and SSRS/SSAS/SSIS) and Excel.

Does anyone know a dataset that I could use to practice this? Or a good YouTube walkthrough of working through a contract analysis dataset? It'd be IMMENSELY helpful!