Advanced Practice Problems by Unit
In [1]:
!pip install seaborn==0.13.0
Defaulting to user installation because normal site-packages is not writeable
Collecting seaborn==0.13.0
Downloading seaborn-0.13.0-py3-none-any.whl (294 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 294.6/294.6 kB 2.1 MB/s eta 0:00:00
Requirement already satisfied: matplotlib!=3.6.1,>=3.3 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (3.6.0)
Requirement already satisfied: pandas>=1.2 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (1.4.2)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in ./.local/lib/python3.9/site-packages (from seaborn==0.13.0) (1.23.3)
Requirement already satisfied: cycler>=0.10 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (0.11.0)
Requirement already satisfied: python-dateutil>=2.7 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (9.3.0)
Requirement already satisfied: kiwisolver>=1.0.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.4.4)
Requirement already satisfied: fonttools>=4.22.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (4.38.0)
Requirement already satisfied: pyparsing>=2.2.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (3.0.9)
Requirement already satisfied: packaging>=20.0 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (21.3)
Requirement already satisfied: contourpy>=1.0.1 in ./.local/lib/python3.9/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.0.6)
Requirement already satisfied: pytz>=2020.1 in ./.local/lib/python3.9/site-packages (from pandas>=1.2->seaborn==0.13.0) (2022.5)
Requirement already satisfied: six>=1.5 in ./.local/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.3->seaborn==0.13.0) (1.16.0)
Installing collected packages: seaborn
Attempting uninstall: seaborn
Found existing installation: seaborn 0.12.0
Uninstalling seaborn-0.12.0:
Successfully uninstalled seaborn-0.12.0
Successfully installed seaborn-0.13.0
[notice] A new release of pip available: 22.2.2 -> 24.1.1
[notice] To update, run: pip install --upgrade pip
In [2]:
import glob
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Set floating-point display formatting
pd.options.display.float_format = '{:,.6f}'.format
Q3
Scope
- (Includes the material from earlier units)
- Filling missing values
In [3]:
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.head()
Out[3]:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.000000 | 1 | 0 | 7.250000 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.000000 | 1 | 0 | 71.283300 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.000000 | 0 | 0 | 7.925000 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.000000 | 1 | 0 | 53.100000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.000000 | 0 | 0 | 8.050000 | S | Third | man | True | NaN | Southampton | no | True |
Print the number of missing values in each column.
In [5]:
# Enter your code here
titanic.isnull().sum()
Out[5]:
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
[Expected output]
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
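The per-column counts can also be expressed as a fraction of rows, which makes columns such as deck easier to compare with the others. A minimal sketch on a small hand-made frame; the `toy` DataFrame and its values are made up for illustration and are not taken from the Titanic data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the Titanic dataset
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "deck": [np.nan, "C", np.nan, np.nan],
    "fare": [7.25, 71.28, 7.93, 53.10],
})

missing_count = toy.isnull().sum()   # number of NaN values per column
missing_ratio = toy.isnull().mean()  # fraction of NaN values per column

print(missing_count)
print(missing_ratio)
```

`isnull()` returns a boolean frame, so `sum()` counts the `True` entries and `mean()` gives their share of all rows.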
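Fill the missing values in the age column according to the following conditions.
- For rows where who is man and age is missing, fill age with the median age of men.
- For rows where who is woman and age is missing, fill age with the 25% quantile of women's ages.
- For rows where who is child and age is missing, fill age with the mean age of children.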
In [6]:
# Enter your code here
df = titanic  # alias, not a copy: the fills below also modify titanic
# Compute the statistic for each group
man_median_age = df[df['who'] == 'man']['age'].median()
woman_quantile_age = df[df['who'] == 'woman']['age'].quantile(0.25)
child_mean_age = df[df['who'] == 'child']['age'].mean()
# Fill the missing values
df.loc[(df['who'] == 'man') & (df['age'].isna()), 'age'] = man_median_age
df.loc[(df['who'] == 'woman') & (df['age'].isna()), 'age'] = woman_quantile_age
df.loc[(df['who'] == 'child') & (df['age'].isna()), 'age'] = child_mean_age
In [7]:
# Verification code
print(f"Missing values: {titanic['age'].isnull().sum()}")
print(f"age mean: {titanic['age'].mean():.4f}")
Missing values: 0
age mean: 29.3425
[Expected output]
Missing values: 0
age mean: 29.3425
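The three conditions can also be written as a single lookup from each who value to its fill value. The following is a minimal sketch under stated assumptions, not the notebook's own solution: the `toy` frame, its ages, and the names `fill_rules`, `fill_values`, and `filled` are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the Titanic dataset
toy = pd.DataFrame({
    "who": ["man", "man", "man", "woman", "woman", "woman",
            "child", "child", "child"],
    "age": [20.0, 30.0, np.nan, 10.0, 50.0, np.nan, 2.0, 6.0, np.nan],
})

# One statistic per group, matching the three conditions of the exercise
fill_rules = {
    "man": lambda s: s.median(),
    "woman": lambda s: s.quantile(0.25),
    "child": lambda s: s.mean(),
}

# Compute each group's fill value from its non-missing ages
fill_values = {
    who: rule(toy.loc[toy["who"] == who, "age"])
    for who, rule in fill_rules.items()
}

# Map every row's group to its fill value, then fill only the missing ages
filled = toy.copy()
filled["age"] = filled["age"].fillna(filled["who"].map(fill_values))

print(filled)
```

Because `fillna` only touches rows where age is missing, the existing ages are left as they are, and working on `toy.copy()` keeps the input frame unchanged.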
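Submission
For submission, reload the Titanic dataset, fill the missing values in the age column according to the conditions below, and store the result in result_df.
- For rows where who is man and age is missing, fill age with the median age of men.
- For rows where who is woman and age is missing, fill age with the 25% quantile of women's ages.
- For rows where who is child and age is missing, fill age with the mean age of children.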
In [10]:
titanic = sns.load_dataset('titanic')
df = titanic
# Compute the statistic for each group
man_median_age = df[df['who'] == 'man']['age'].median()
woman_quantile_age = df[df['who'] == 'woman']['age'].quantile(0.25)
child_mean_age = df[df['who'] == 'child']['age'].mean()
# Fill the missing values
df.loc[(df['who'] == 'man') & (df['age'].isna()), 'age'] = man_median_age
df.loc[(df['who'] == 'woman') & (df['age'].isna()), 'age'] = woman_quantile_age
df.loc[(df['who'] == 'child') & (df['age'].isna()), 'age'] = child_mean_age
result_df = df
df.head()
Out[10]:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.000000 | 1 | 0 | 7.250000 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.000000 | 1 | 0 | 71.283300 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.000000 | 0 | 0 | 7.925000 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.000000 | 1 | 0 | 53.100000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.000000 | 0 | 0 | 8.050000 | S | Third | man | True | NaN | Southampton | no | True |
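Note that `df = titanic` binds a second name to the same object, so filling through `df` also changes `titanic`. Reloading the dataset, as the submission cell does, gives a fresh frame with the missing values restored. If the original frame should stay untouched, `DataFrame.copy()` is one option. A minimal sketch on made-up data; the names `original`, `alias`, and `independent` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the Titanic dataset
original = pd.DataFrame({"age": [22.0, np.nan, 35.0]})

alias = original               # same object under a second name
independent = original.copy()  # separate object with its own data

# Fill the missing age through the alias
alias.loc[alias["age"].isna(), "age"] = 0.0

print(original["age"].isna().sum())     # the original was modified too
print(independent["age"].isna().sum())  # the copy still has its NaN
```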