Posts in the 'Computer Science/Data Science' category

List: Computer Science/Data Science (6)

Scribbling

Data Science 101

Data Science 101
1. Data Type and Null Check
    all_df.info()
1.1. Nullity
    # missing data
    total = all_df.isnull().sum().sort_values(ascending=False)
    percent = (all_df.isnull().sum()/all_df.isnull().count()).sort_values(ascending=False)
    missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    missing_data = missing_data.drop(['SalePrice'], axis=0)
    missing_data.head(20)
1.2. Featu..

Computer Science/Data Science 2022. 12. 5. 01:49
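
The nullity check in the Data Science 101 excerpt above can be wrapped in a small helper. The following is a minimal sketch, not the author's code: the function name `missing_summary` and the toy `all_df` are hypothetical stand-ins for the frame used in the post, and `isnull().mean()` is used as an equivalent of dividing the null count by the row count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the post's all_df.
all_df = pd.DataFrame({
    "SalePrice": [200000.0, np.nan, 150000.0, np.nan],
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "Alley": [np.nan, np.nan, np.nan, "Pave"],
    "OverallQual": [7, 6, 5, 8],
})


def missing_summary(df, exclude=()):
    """Return per-column null counts and null fractions, largest first."""
    total = df.isnull().sum()
    percent = df.isnull().mean()  # same as isnull().sum() / number of rows
    summary = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
    # Drop columns (e.g. the target) that should not be reported.
    summary = summary.drop(index=list(exclude), errors="ignore")
    return summary.sort_values("Total", ascending=False)


missing_data = missing_summary(all_df, exclude=["SalePrice"])
print(missing_data.head(20))
```

Excluding the target column mirrors the `drop(['SalePrice'])` step in the excerpt, presumably because the target is missing for every test row once train and test data are combined.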

Pandas Operations Repository

1. Appending Rows to empty DataFrame
    tt = pd.DataFrame()
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    row = {'Date': date, 'Location': country, 'Sadness': arr[0], 'Anger': arr[1], 'Joy': arr[2], 'Optimism': arr[3]}
    tt = pd.concat([tt, pd.DataFrame([row])], ignore_index=True)
2. Row Display Option
    pd.set_option('display.max_columns', 100)

Computer Science/Data Science 2022. 11. 20. 02:52
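
Building a DataFrame row by row, as in the Pandas Operations Repository entry above, can also be done by collecting the rows in a plain list first. This is a minimal sketch under assumptions: the `records` values are invented sample data, and only `pd.DataFrame` and `pd.concat` are used because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical sample input: (date, country, [sadness, anger, joy, optimism]).
records = [
    ("2022-11-01", "KR", [0.1, 0.2, 0.6, 0.1]),
    ("2022-11-02", "US", [0.3, 0.1, 0.4, 0.2]),
]

# Collect rows as dicts, then build the frame once (cheaper than growing it).
rows = []
for date, country, arr in records:
    rows.append({
        "Date": date,
        "Location": country,
        "Sadness": arr[0],
        "Anger": arr[1],
        "Joy": arr[2],
        "Optimism": arr[3],
    })
tt = pd.DataFrame(rows)

# Adding one more row later: wrap it in a one-row frame and concatenate.
new_row = pd.DataFrame([{
    "Date": "2022-11-03", "Location": "JP",
    "Sadness": 0.2, "Anger": 0.2, "Joy": 0.5, "Optimism": 0.1,
}])
tt = pd.concat([tt, new_row], ignore_index=True)

# Show up to 100 columns when printing wide frames.
pd.set_option("display.max_columns", 100)
print(tt)
```

Collecting dicts and constructing the frame once avoids copying the whole frame on every added row, which is what repeated concatenation does.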

(py)Spark Basics

In this post, I review the basics of Spark, focusing on PySpark. Spark is a framework for processing big data, and its main strength is distributed computation across multiple nodes. To install PySpark, run 'pip install pyspark'. For demonstration, I use the 'heart.csv' dataset from https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset. Now, let's get down to the code. In pyspark, we ca..

Computer Science/Data Science 2022. 10. 6. 09:44

Google Cloud Platform Certificate; Professional Machine Learning Engineer

Architecture for MLOps using TFX, Kubeflow Pipelines, and Cloud Build
Link: https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build
This document describes the overall architecture of a machine learning (ML) system that uses the TensorFlow Extended (TFX) library. Also, C..

Computer Science/Data Science 2022. 4. 12. 19:22

Basics of Data Analysis - Kaggle Titanic Example

The second post on the basics of data analysis covers the Kaggle Titanic example. https://www.kaggle.com/c/titanic
1. Exploring the data
The pandas display options can be set with the commands below.
    pd.set_option("display.max_columns", 50)
    pd.set_option("display.max_rows", 50)
Data types
    train_df.dtypes
Basic statistics
    train_df.describe()
Null check
    train_df.isnull().sum()
2. Analyzing individual features
Counting feature values
    train_df['Cabin'].value_counts()
    all_df.Pclass.value_counts().plot.bar()
Relationship between individual features and the target variable ..

Computer Science/Data Science 2021. 11. 23. 12:11
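
The first-look steps listed in the Titanic entry above can be tried on a small frame before loading the real data. This is a minimal sketch under assumptions: the five-row `train_df` is invented and only borrows column names from the Kaggle dataset, and the plotting call is left out so the snippet needs nothing beyond pandas and NumPy.

```python
import numpy as np
import pandas as pd

pd.set_option("display.max_columns", 50)
pd.set_option("display.max_rows", 50)

# Hypothetical miniature of the Titanic training data.
train_df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 3, 1],
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan, "C85"],
})

dtypes = train_df.dtypes                     # data type of each column
stats = train_df.describe()                  # basic statistics (numeric columns)
null_counts = train_df.isnull().sum()        # missing values per column
cabin_counts = train_df["Cabin"].value_counts()    # NaN is skipped by default
pclass_counts = train_df["Pclass"].value_counts()  # passengers per class

print(dtypes, stats, null_counts, cabin_counts, pclass_counts, sep="\n\n")
```

`value_counts()` ignores missing values unless `dropna=False` is passed, so the null check and the per-value counts are best read together for a sparse column such as `Cabin`.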

Basics of Data Analysis - Kaggle House Price Prediction Example

In this post, I work through the Kaggle house price prediction example to build a foundation in data analysis methods.
Kaggle house price prediction example link: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
1. Loading and exploring the data
I use Google Colab and load the data from Google Drive.
    from google.colab import drive
    drive.mount('/content/drive')
The data files and paths are as follows.
    data_dir = "/content/drive/MyDrive/Colab Notebooks/kaggle/house_prices/data/"
    submit_dir = "/content/drive/My..

Computer Science/Data Science 2021. 11. 23. 10:41
