Author

Carlos Lesmes

Publication date

July 1, 2025

1 Load the tips dataset

Code
import plotly.express as px
tips = px.data.tips()
tips
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

244 rows × 7 columns

2 Use pandas to work with dataframes

Code
import pandas as pd

3 Explore a dataframe

Code
tips.dtypes
total_bill    float64
tip           float64
sex            object
smoker         object
day            object
time           object
size            int64
dtype: object
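
The `object` dtype reported above means the text columns are stored as generic Python strings. As a minimal sketch, assuming a hypothetical three-row frame named `sample` typed in from the first rows of `tips` (it is not part of the original notebook), those columns can be converted to the `category` dtype, which records the set of distinct labels:

```python
import pandas as pd

# Hypothetical sample: first three rows of tips, typed in by hand.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01],
        "tip": [1.01, 1.66, 3.50],
        "sex": ["Female", "Male", "Male"],
        "day": ["Sun", "Sun", "Sun"],
        "size": [2, 3, 3],
    }
)

# Convert only the text columns; numeric columns keep their dtype.
text_columns = ["sex", "day"]
converted = sample.astype({col: "category" for col in text_columns})

print(converted.dtypes)
print(converted["sex"].cat.categories.tolist())
```

The same call should work on the full `tips` frame with the four text columns, although whether the conversion is worthwhile depends on what you do with the data afterwards.
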
Code
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
Code
tips.tail()
     total_bill   tip     sex smoker   day    time  size
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2
Code
tips.head(3)
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
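
`head` and `tail` are methods, so they need parentheses; without them pandas returns the bound method object rather than rows. A minimal sketch of the optional row-count argument, assuming a hypothetical five-row frame `sample` typed in from the rows shown above:

```python
import pandas as pd

# Hypothetical sample: numeric columns of the first five rows of tips.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "size": [2, 3, 3, 2, 4],
    }
)

first_two = sample.head(2)      # first two rows
last_two = sample.tail(2)       # last two rows
all_but_last = sample.head(-1)  # negative n: every row except the last one

print(first_two)
print(last_two)
print(all_but_last)
```
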
Code
tips.index
RangeIndex(start=0, stop=244, step=1)
Code
tips.describe()
       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000
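
By default `describe()` summarizes only the numeric columns, which is why `sex`, `smoker`, `day` and `time` are missing from the table above. As a rough sketch, assuming a hypothetical five-row frame `sample` typed in from the first rows of `tips`, the text columns can be summarized by selecting them explicitly:

```python
import pandas as pd

# Hypothetical sample: a few columns of the first five rows of tips.
sample = pd.DataFrame(
    {
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "sex": ["Female", "Male", "Male", "Male", "Female"],
        "smoker": ["No", "No", "No", "No", "No"],
    }
)

# For text columns describe() reports count, unique, top and freq.
text_summary = sample[["sex", "smoker"]].describe()
print(text_summary)

# value_counts gives the full frequency table of a single column.
sex_counts = sample["sex"].value_counts()
print(sex_counts)
```
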
Code
tips.columns
Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')
Code
tips.sort_values(by="tip")
     total_bill    tip     sex smoker   day    time  size
67         3.07   1.00  Female    Yes   Sat  Dinner     1
236       12.60   1.00    Male    Yes   Sat  Dinner     2
92         5.75   1.00  Female    Yes   Fri  Dinner     2
111        7.25   1.00  Female     No   Sat  Dinner     1
0         16.99   1.01  Female     No   Sun  Dinner     2
..          ...    ...     ...    ...   ...     ...   ...
141       34.30   6.70    Male     No  Thur   Lunch     6
59        48.27   6.73    Male     No   Sat  Dinner     4
23        39.42   7.58    Male     No   Sat  Dinner     4
212       48.33   9.00    Male     No   Sat  Dinner     4
170       50.81  10.00    Male    Yes   Sat  Dinner     3

244 rows × 7 columns
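
`sort_values` sorts in ascending order unless told otherwise, and it returns a new frame without modifying `tips`. A short sketch of descending and multi-key sorts, assuming a hypothetical five-row frame `sample` typed in from the first rows of `tips`:

```python
import pandas as pd

# Hypothetical sample: numeric columns of the first five rows of tips.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "size": [2, 3, 3, 2, 4],
    }
)

# Largest tips first.
largest_tips = sample.sort_values(by="tip", ascending=False)

# Party size ascending, then tip descending within each size.
by_size_then_tip = sample.sort_values(by=["size", "tip"], ascending=[True, False])

print(largest_tips)
print(by_size_then_tip)
```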

Code
tips['smoker']
0       No
1       No
2       No
3       No
4       No
      ... 
239     No
240    Yes
241    Yes
242     No
243     No
Name: smoker, Length: 244, dtype: object
Code
tips[0:3]
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
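
Square brackets select a column when given a name and rows when given a slice. The more explicit `iloc` (by position) and `loc` (by label) accessors, plus a boolean mask, cover the same ground. A minimal sketch, assuming a hypothetical five-row frame `sample` typed in from the first rows of `tips`:

```python
import pandas as pd

# Hypothetical sample: a few columns of the first five rows of tips.
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "smoker": ["No", "No", "No", "No", "No"],
    }
)

by_position = sample.iloc[0:3]        # positions 0, 1, 2 (end excluded)
by_label = sample.loc[0:2, ["tip"]]   # labels 0 through 2 (end included)
big_tips = sample[sample["tip"] > 3]  # rows where the condition is True

print(by_position)
print(by_label)
print(big_tips)
```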

4 Import seaborn for plotting

For more information, see seaborn and Waskom (2021).

Code
import seaborn as sns
import matplotlib.pyplot as plt 
sns.set_theme()

4.1 Datasets in seaborn

Code
sns.get_dataset_names()
['anagrams',
 'anscombe',
 'attention',
 'brain_networks',
 'car_crashes',
 'diamonds',
 'dots',
 'dowjones',
 'exercise',
 'flights',
 'fmri',
 'geyser',
 'glue',
 'healthexp',
 'iris',
 'mpg',
 'penguins',
 'planets',
 'seaice',
 'taxis',
 'tips',
 'titanic']

4.1.1 Boxplot

Code
sns.boxplot(x=tips["total_bill"])

4.1.2 Density with a rug of data points along the axis

Code
sns.kdeplot(data=tips, x="total_bill")
sns.rugplot(data=tips, x="total_bill")

4.1.3 Scatter plot

Code
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)

4.1.4 Histogram with density

Code
sns.displot(data=tips, x="total_bill", col="time", kde=True)

4.1.5 Numeric and categorical variables

Code
sns.catplot(data=tips, kind="swarm", x="day", y="total_bill", hue="smoker")

4.2 The penguins dataset

Code
penguins = sns.load_dataset("penguins")
sns.jointplot(data=penguins, x="flipper_length_mm", y="bill_length_mm", hue="species")

4.2.1 Pairs of variables

Code
sns.pairplot(data=penguins, hue="species")

4.2.2 Scatter plot

Code
sns.relplot(
    data=penguins,
    x="bill_length_mm", y="bill_depth_mm", hue="body_mass_g"
)

4.2.3 Histogram

Code
sns.histplot(data=penguins, x="flipper_length_mm", hue="species", multiple="stack")

Code
sns.displot(data=penguins, x="flipper_length_mm", hue="species", col="species")

Code
sns.relplot(data=tips, x="total_bill", y="tip", hue="time", col="day", col_wrap=2)

Scatter plot by time of day and day of the week

References

Waskom, Michael L. 2021. "seaborn: statistical data visualization." Journal of Open Source Software 6 (60): 3021. https://doi.org/10.21105/joss.03021.