Scribbling
Pandas Basics
1. Pandas Series
1.1. Series
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
1.2. with index
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
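Once a Series has custom labels, values can be read by label or by position. The snippet below is a minimal sketch of both; the variable names `values` and `labeled` are only illustrative.

```python
import pandas as pd

values = [1, 7, 2]
labeled = pd.Series(values, index=["x", "y", "z"])

# Label-based access uses the custom index.
print(labeled["y"])     # 7

# Position-based access goes through .iloc, whatever the labels are.
print(labeled.iloc[0])  # 1
```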
1.3. from a dict of (key, value) pairs
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)
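Passing `index` together with a dict selects keys rather than renaming them. As a small sketch of what happens when a requested label is not in the dict, the example below asks for `"day4"`, a hypothetical label that does not appear in `calories`.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a made-up label for illustration; it is not a key in the dict.
subset = pd.Series(calories, index=["day1", "day4"])

# Keys not listed in index are dropped; labels missing from the dict become NaN.
print(subset)
```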
2. DataFrame
2.1. DataFrame
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
# Load the data into a DataFrame object.
df = pd.DataFrame(data)
print(df)
2.2. locate row/rows
- a single label, df.loc[0], returns the row as a pandas Series
- a list of labels, df.loc[[0]], returns the rows as a pandas DataFrame
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
# Load the data into a DataFrame object.
df = pd.DataFrame(data)
print(df.loc[0])
print(type(df.loc[0]))
print(df.loc[[0]])
print(type(df.loc[[0]]))
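The Series versus DataFrame distinction can be checked directly, and the list form also selects several rows at once. This is a minimal sketch reusing the same data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

row_as_series = df.loc[0]    # one label -> Series indexed by column name
row_as_frame = df.loc[[0]]   # list with one label -> one-row DataFrame
rows = df.loc[[0, 2]]        # list with several labels -> several rows

print(isinstance(row_as_series, pd.Series))
print(isinstance(row_as_frame, pd.DataFrame))
print(rows)
```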
2.3. index naming
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
- now rows can be located by their index labels
print(df.loc[['day1']])
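With named rows it helps to keep label-based and position-based lookup apart. The sketch below, assuming the same data as above, contrasts `.loc` with `.iloc` and shows that a label slice includes its end label.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

by_label = df.loc["day2"]      # lookup by index label
by_position = df.iloc[1]       # lookup by integer position (second row)

# Label slices with .loc include both endpoints.
span = df.loc["day1":"day2"]
print(span)
```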
3. Cleaning Data
Dataset
index,name,age,occupation,hobby,etc
0,morgan,29,student,game,coffee
1,sujan,27,doctor,hiking,
2,mark,92,,sleep,householder
3,kim,13,student,math,science
4,ihn,3s3,assa$>sin,killing,gunner
5,ahri,21,magician,flirting,fox
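The snippets in this section use a `df` that has to be loaded first. The post does not say how the dataset is stored, so the sketch below assumes the text is read from an in-memory string with `io.StringIO`; with a real file, a path would be passed to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Assumption: the dataset is embedded as a string rather than read from a file.
csv_text = """index,name,age,occupation,hobby,etc
0,morgan,29,student,game,coffee
1,sujan,27,doctor,hiking,
2,mark,92,,sleep,householder
3,kim,13,student,math,science
4,ihn,3s3,assa$>sin,killing,gunner
5,ahri,21,magician,flirting,fox
"""

# Use the "index" column as the row index; empty fields are read as NaN.
df = pd.read_csv(io.StringIO(csv_text), index_col="index")

# Count the missing values in each column.
print(df.isna().sum())
```

Because of the malformed entry `3s3`, the `age` column is read as text rather than as numbers.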
3.1. dropna(inplace=True)
df.dropna(inplace=True)
print(df)
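`inplace=True` modifies `df` itself. Without it, `dropna` returns a new DataFrame and leaves the original untouched, and `subset` restricts the check to chosen columns. A minimal sketch on three rows taken from the dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["sujan", "mark", "kim"],
    "occupation": ["doctor", None, "student"],
    "etc": [None, "householder", "science"],
})

cleaned = df.dropna()                          # drop rows with any missing value
only_occ = df.dropna(subset=["occupation"])    # only check the occupation column

print(len(df), len(cleaned), len(only_occ))    # 3 1 2
```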
3.2. fillna(value, inplace=True)
df.fillna('NULL', inplace=True)
print(df)
3.3. fill only specific columns
df['occupation'] = df['occupation'].fillna('NULL')
print(df)
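When several columns need different fill values, `fillna` also accepts a dict that maps column names to values. The sketch below uses placeholder values `"unknown"` and `"none"`, which are assumptions for illustration and not taken from the post.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["sujan", "mark", "kim"],
    "occupation": ["doctor", None, "student"],
    "etc": [None, "householder", "science"],
})

# Hypothetical placeholders, one per column; returns a new DataFrame.
filled = df.fillna({"occupation": "unknown", "etc": "none"})
print(filled)
```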
3.4. filling with the mean, median, mode, ...
# assumes 'age' is already numeric (clean it first, as in 3.5)
df['age'] = df['age'].fillna(df['age'].mean())
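In the dataset above, `age` contains the entry `3s3`, so the column holds text and a mean cannot be computed on it directly. One possible approach, sketched here on the age values alone, is to coerce unparseable entries to NaN with `pd.to_numeric` and then fill them with the mean of the remaining values.

```python
import pandas as pd

ages = pd.Series(["29", "27", "92", "13", "3s3", "21"])

# errors="coerce" turns values that cannot be parsed ("3s3") into NaN.
numeric = pd.to_numeric(ages, errors="coerce")

# mean() skips NaN by default, so it averages the five valid ages.
filled = numeric.fillna(numeric.mean())
print(filled)
```

Whether an unparseable age should be replaced by the mean or repaired, as in 3.5, depends on the data; this sketch only shows the mechanics.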
3.5. cleaning malformed values
# keep only the numeric characters of each non-numeric age, e.g. '3s3' -> '33'
for i in df.index:
    if not df.loc[i, 'age'].isnumeric():
        ret = ''.join(c for c in df.loc[i, 'age'] if c.isnumeric())
        df.loc[i, 'age'] = ret
print(df)
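The same cleanup can be written without an explicit loop by using the vectorized string methods. This is a sketch of an alternative, not the post's method: it strips every non-digit character with a regular expression, which matches the loop for ASCII input like this dataset, and then converts the result to integers.

```python
import pandas as pd

ages = pd.Series(["29", "27", "92", "13", "3s3", "21"])

# \D matches any non-digit character; replace each one with an empty string.
digits_only = ages.str.replace(r"\D", "", regex=True)

# After cleaning, the strings can be converted to integers.
as_int = digits_only.astype(int)
print(as_int.tolist())  # [29, 27, 92, 13, 33, 21]
```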