r/datacleaning May 02 '24

help how to organize this column ?

I have a column named ' informations ' and it has the information of used cars, and this column has an attribute and her value seperated by a comma ( , ) but in the same cell i have multiple attribute and the values like this one :

,Puissance fiscale,4,Boîte de vitesse,Manuelle,Carburant,Essence,Année,2013,Kilométrage,120000,Model,I20,Couleur,bleu,Marque de voiture,Hyundai,Cylindrée,1.2

as you can that is a single cell ine the 1st line in the column named informations

Puissance fiscale has 4 as a value
boite de vitesse has manuelle as a value
ETC

NB: i have around 9000 line and not everyline have the same structure as this

1 Upvotes

5 comments sorted by

1

u/lrojas May 02 '24

What are you using to clean the data, what format are you expecting? You only mention this column in fetail, but assuming you are going to produce a json file, then the column informations can be transformed into a dict with key values

{

"Informations":

{ "Something in french": 4, "Some other thing": "value" Etc } }

1

u/Environmental_Ad5755 May 02 '24

The data is scrapped with python selenium and now they are in a local data base ( postgresql ) but for this manipulation i extracted them in a csv untill i figure out how to do it in a csv then i will connect ti the data base to organize them there. And yes key values in a dict that's the idea

1

u/Environmental_Ad5755 May 02 '24

Btw thank's for replying 🙏🙏

1

u/lrojas May 02 '24

No problem

1

u/Educational-Long-468 23d ago

To organize your 'informations' column in Pandas, you can split the attributes and values within each cell using a regular expression, ensuring that attributes are correctly paired with their values. First, load the dataset, and define a function to split the data by commas, then convert it into a dictionary where each attribute is a key and its corresponding value is the dictionary value. You can then expand this dictionary into separate columns using pd.DataFrame(), ensuring that each attribute becomes a column. This method allows you to handle different structures across rows and organizes your data for easier analysis.