import pandas as pd

# Each row holds a person's name and a '|'-delimited string of fruits
df = pd.read_csv('data/fruits.csv')
df



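# Index by name and keep only the fruit strings as a Series
s = df.set_index('name')['fruits']
# '|' is a regex metacharacter, so it is escaped; n fruits are joined by n - 1 pipes
s.str.count(r'\|') + 1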

name
Ana        5
Bill       2
Calvin     2
Dean       2
Elias      3
Felicia    2
George     5
Henry      2
Name: fruits, dtype: int64
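The `+ 1` works because n fruits are joined by n - 1 separators. Below is a minimal sketch of an alternative that splits on the literal `|` and measures each resulting list. The inline Series simply re-creates the rows of `data/fruits.csv` shown in this post, and `counts_by_pipe` / `counts_by_split` are illustrative names.

```python
import pandas as pd

# Re-create the example rows inline (same data as the fruits table in this post)
s = pd.Series(
    ['mango|orange|pear|nectarine|banana', 'orange|peach', 'pear|mango',
     'mango|apple', 'nectarine|pear|mango', 'mango|pear',
     'mango|pear|orange|nectarine|banana', 'orange|banana'],
    index=pd.Index(['Ana', 'Bill', 'Calvin', 'Dean', 'Elias', 'Felicia',
                    'George', 'Henry'], name='name'),
    name='fruits',
)

# Delimiter count: n fruits are separated by n - 1 pipes
counts_by_pipe = s.str.count(r'\|') + 1

# Alternative: a single-character pattern is split on literally, then take list lengths
counts_by_split = s.str.split('|').str.len()

print((counts_by_split == counts_by_pipe).all())  # True
```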



# Number of people whose fruit string contains 'banana'
s.str.contains('banana').sum()

3
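`str.contains` searches for a substring, so this count is only reliable because no fruit name in this data is part of another. A short sketch with hypothetical data shows how a substring search can over-count, while splitting into one column per item does not. The `pineapple` value and the `A`/`B`/`C` labels are invented for illustration and are not in the post's file.

```python
import pandas as pd

# Hypothetical data: 'pineapple' contains the substring 'apple'
s = pd.Series(['apple|pear', 'pineapple|mango', 'mango'], index=['A', 'B', 'C'])

# Substring search also matches B, because 'pineapple' contains 'apple'
substring_hits = s.str.contains('apple').sum()

# get_dummies splits on '|' first, so only the exact item 'apple' becomes that column
exact_hits = s.str.get_dummies('|')['apple'].sum()

print(substring_hits, exact_hits)  # 2 1
```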



# Two lookaheads: True only when both 'orange' and 'banana' appear, in any order
s.str.contains('(?=.*orange)(?=.*banana)')

name
Ana         True
Bill       False
Calvin     False
Dean       False
Elias      False
Felicia    False
George      True
Henry       True
Name: fruits, dtype: bool
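Each `(?=.*fruit)` is a zero-width lookahead. It succeeds if the fruit occurs somewhere after the current position without consuming characters, so both conditions can be checked from the same starting point. Below is a sketch using Python's standard `re` module on a few strings taken from the data. For comparison, it also shows a lookahead-free pandas version that combines two boolean masks.

```python
import re

import pandas as pd

# Each (?=.*fruit) asserts that the fruit occurs somewhere ahead of the
# current position without consuming any characters
both = re.compile(r'(?=.*orange)(?=.*banana)')

print(bool(both.search('orange|banana')))  # True
print(bool(both.search('banana|orange')))  # True: order does not matter
print(bool(both.search('orange|peach')))   # False: no banana

# Equivalent test in pandas without lookaheads: AND two substring masks
s = pd.Series(['orange|banana', 'banana|orange', 'orange|peach'])
two_masks = s.str.contains('orange') & s.str.contains('banana')
print(two_masks.tolist())  # [True, True, False]
```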



# Split on '|' and create one 0/1 indicator column per distinct fruit
df1 = s.str.get_dummies('|')
df1
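Conceptually, `str.get_dummies('|')` splits each string on the separator and marks a 1 for every item that is present. The sketch below reproduces that table from the same rows using `str.split`, `explode` and `pd.crosstab`. It illustrates the reshaping only and is not a claim about how pandas implements the method internally.

```python
import pandas as pd

s = pd.Series(
    ['mango|orange|pear|nectarine|banana', 'orange|peach', 'pear|mango',
     'mango|apple', 'nectarine|pear|mango', 'mango|pear',
     'mango|pear|orange|nectarine|banana', 'orange|banana'],
    index=pd.Index(['Ana', 'Bill', 'Calvin', 'Dean', 'Elias', 'Felicia',
                    'George', 'Henry'], name='name'),
    name='fruits',
)

dummies = s.str.get_dummies('|')

# Long format: one row per (name, fruit) pair, with a fresh RangeIndex
long = s.str.split('|').explode().reset_index()

# Count each (name, fruit) pair; combinations that never occur become 0
manual = pd.crosstab(long['name'], long['fruits'])

print((manual.to_numpy() == dummies.to_numpy()).all())  # True
print(list(manual.columns) == list(dummies.columns))    # True
```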



# Row sums give each person's number of fruits
df1.sum(axis=1)

name
Ana        5
Bill       2
Calvin     2
Dean       2
Elias      3
Felicia    2
George     5
Henry      2
dtype: int64



# Column sum: number of people who have a banana
df1['banana'].sum()

3



# People who have both an orange and a banana
df1.query('orange + banana == 2')
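Every indicator is 0 or 1, so `orange + banana == 2` holds only when both columns are 1. Below is a sketch of the same filter written as ordinary boolean indexing. Here `df1` is rebuilt inline from the rows in this post.

```python
import pandas as pd

s = pd.Series(
    ['mango|orange|pear|nectarine|banana', 'orange|peach', 'pear|mango',
     'mango|apple', 'nectarine|pear|mango', 'mango|pear',
     'mango|pear|orange|nectarine|banana', 'orange|banana'],
    index=pd.Index(['Ana', 'Bill', 'Calvin', 'Dean', 'Elias', 'Felicia',
                    'George', 'Henry'], name='name'),
    name='fruits',
)
df1 = s.str.get_dummies('|')

# Expression-string filter, as in the post
via_query = df1.query('orange + banana == 2')

# Same rows selected with an explicit boolean mask
via_mask = df1[(df1['orange'] == 1) & (df1['banana'] == 1)]

print(list(via_mask.index))  # ['Ana', 'George', 'Henry']
```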



# Entry (i, j): number of fruits that person i and person j have in common
df1 @ df1.T
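With 0/1 rows, the matrix product `df1 @ df1.T` multiplies two people's indicator vectors element by element and sums the result. Entry (i, j) is therefore the number of fruits person i and person j share, and the diagonal is each person's own fruit count. Below is a sketch that checks a few entries against the indicator table in this post.

```python
import pandas as pd

s = pd.Series(
    ['mango|orange|pear|nectarine|banana', 'orange|peach', 'pear|mango',
     'mango|apple', 'nectarine|pear|mango', 'mango|pear',
     'mango|pear|orange|nectarine|banana', 'orange|banana'],
    index=pd.Index(['Ana', 'Bill', 'Calvin', 'Dean', 'Elias', 'Felicia',
                    'George', 'Henry'], name='name'),
    name='fruits',
)
df1 = s.str.get_dummies('|')

# 8 x 8 matrix; rows and columns are both labelled by name
shared = df1 @ df1.T

# The diagonal equals each person's number of fruits (the row sums of df1)
diag = pd.Series(shared.to_numpy().diagonal(), index=shared.index)
print((diag == df1.sum(axis=1)).all())  # True

print(shared.loc['Ana', 'George'])  # 5: identical fruit sets
print(shared.loc['Ana', 'Henry'])   # 2: orange and banana
print(shared.loc['Bill', 'Dean'])   # 0: nothing in common
```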

Use the Pandas String-Only get_dummies Method to Instantly Restructure your Data

Attempt to answer questions in current form

Better formatting of the data


      name                              fruits
0      Ana  mango|orange|pear|nectarine|banana
1     Bill                        orange|peach
2   Calvin                          pear|mango
3     Dean                         mango|apple
4    Elias                nectarine|pear|mango
5  Felicia                          mango|pear
6   George  mango|pear|orange|nectarine|banana
7    Henry                       orange|banana

         apple  banana  mango  nectarine  orange  peach  pear
name
Ana          0       1      1          1       1      0     1
Bill         0       0      0          0       1      1     0
Calvin       0       0      1          0       0      0     1
Dean         1       0      1          0       0      0     0
Elias        0       0      1          1       0      0     1
Felicia      0       0      1          0       0      0     1
George       0       1      1          1       1      0     1
Henry        0       1      0          0       1      0     0