[Krafton Jungle] Python Basics & Web Scraping & MongoDB


하루이2222 2024. 7. 11. 18:16

Python basics and applications: key points

Learning objectives

Understand basic Python syntax.
Be able to solve algorithm problems in Python.

1. Python basics

What is Python?

Developed by the Dutch programmer Guido van Rossum
Emphasizes readable code
Its simple syntax makes it a good first language for beginners

Installing Python

Python installation page

Running a Python file in VSCode

Create a project folder and open it in VSCode
Create a new file (e.g. hello.py)
Write the code in the file and save it
```
print('Hello, jungle')
```
Click the play (Run) button to execute it

Basic Python syntax

Declaring variables

In Python you declare a variable simply by assigning to it; there is no let or const as in JavaScript.
```
a = 3
print(a)
```
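Because a name is simply bound to whatever value is assigned, the same variable can later hold a value of a different type, and several names can be assigned in one line. A minimal sketch; the variable names are illustrative only:

```python
a = 3
a = 'three'      # rebinding to a str is allowed; there is no declared type
b, c = 1, 2      # assign two names at once
b, c = c, b      # swap without a temporary variable
print(a, b, c)   # three 2 1
```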

Data types and basic operations

Numbers

```
a = 7
b = 2
print(a + b)  # 9
print(a / b)  # 3.5
```
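Note that `/` always produces a float. For integer work, which comes up constantly in algorithm problems, Python also has floor division, remainder, and exponentiation. A short sketch with the same values:

```python
a = 7
b = 2
print(a // b)   # 3    floor division: drops the fractional part
print(a % b)    # 1    remainder
print(a ** b)   # 49   exponentiation (7 squared)
print(6 / 3)    # 2.0  true division returns a float even when exact
print(-7 // 2)  # -4   floor division rounds toward negative infinity
```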

Strings

Strings can be written with either double or single quotes, and `+` concatenates them.

```
word1 = "jungle"
word2 = 'coding'
print(word1 + " " + word2)  # jungle coding
```
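Besides `+`, f-strings, `len()`, indexing and slicing are the string operations used most often. A short sketch reusing word1 and word2:

```python
word1 = "jungle"
word2 = 'coding'

sentence = f"{word1} {word2}"  # f-string: expressions inside {} are inserted
print(sentence)                # jungle coding
print(len(word1))              # 6
print(word1[0], word1[-1])     # j e   (negative index counts from the end)
print(word1[1:4])              # ung   (end index is excluded)
print(word1.upper())           # JUNGLE
```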

Booleans

```
x = True
y = False
print(x and y)  # False
```
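The other logical operators are `or` and `not`, and comparison operators evaluate to booleans, which is what `if` conditions rely on. A small sketch:

```python
x = True
y = False
print(x or y)   # True
print(not x)    # False
print(3 > 2)    # True: comparisons evaluate to booleans
print(3 == 2)   # False
```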

Lists and dictionaries

Lists (indexes start at 0)

```
a_list = [1, 2, 3]
print(a_list[0])  # 1
```

Dictionaries

```
a_dict = {'name': 'bob', 'age': 21}
print(a_dict['name'])  # bob
```
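Lists and dictionaries are mutable, so elements can be added after they are created. A short sketch with the same variable names; `get()` returns a default instead of raising `KeyError` when a key is missing (the 'height' and 'email' keys are made up for illustration):

```python
a_list = [1, 2, 3]
a_list.append(4)             # add to the end
print(a_list, len(a_list))   # [1, 2, 3, 4] 4
print(a_list[-1])            # 4  negative index counts from the end

a_dict = {'name': 'bob', 'age': 21}
a_dict['height'] = 180                # add a new key
print(a_dict.get('age'))              # 21
print(a_dict.get('email', 'none'))    # none  default for a missing key
print('name' in a_dict)               # True  membership test checks keys
```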

Functions

Defining and calling a function

```
def f(x):
    return 2 * x + 3

print(f(2))  # 7
```
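Parameters can have default values, and a function can return several values at once (packed into a tuple). A sketch; `min_max` is a hypothetical helper, and this `f` generalizes the one above:

```python
def f(x, a=2, b=3):
    return a * x + b   # same as f above when the defaults are used

def min_max(numbers):
    return min(numbers), max(numbers)  # returns a tuple

print(f(2))          # 7
print(f(2, a=10))    # 23
low, high = min_max([4, 1, 9])
print(low, high)     # 1 9
```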

Conditionals

if / else

```
def is_adult(age):
    if age > 20:
        print('You are an adult')
    else:
        print('You are a teenager')

is_adult(30)  # You are an adult
```
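When there are more than two branches, `elif` checks further conditions in order and the first true branch wins. A sketch extending the example; the function name and age boundaries are illustrative only:

```python
def age_group(age):
    if age > 20:
        return 'adult'
    elif age >= 13:      # only checked when age > 20 was False
        return 'teenager'
    else:
        return 'child'

print(age_group(30))  # adult
print(age_group(15))  # teenager
print(age_group(7))   # child
```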

Loops

for loop

```
fruits = ['apple', 'pear', 'persimmon', 'tangerine']
for fruit in fruits:
    print(fruit)
```
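`range()` produces a sequence of numbers, and `enumerate()` yields the index together with each item; both come up often in algorithm problems. A short sketch:

```python
fruits = ['apple', 'pear', 'persimmon', 'tangerine']

for i in range(3):                  # 0, 1, 2
    print(i)

for index, fruit in enumerate(fruits):
    print(index, fruit)             # 0 apple, 1 pear, ...

total = 0
for n in range(1, 11):              # 1 through 10; the end value is excluded
    total += n
print(total)                        # 55
```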

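2. Python exercises

Q1. Write a function that counts fruits

Count how many times a given fruit appears in a list of fruits

```
def count_fruits(target, fruits):
    count = 0
    for fruit in fruits:
        if fruit == target:
            count += 1
    return count

fruits = ['apple', 'pear', 'pear', 'persimmon', 'watermelon', 'tangerine', 'strawberry', 'apple', 'pear', 'watermelon']
print(count_fruits('apple', fruits))  # 2
```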
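The same count can also be done with built-ins: `list.count()` counts one value, and `collections.Counter` counts every value in a single pass. A sketch using a copy of the Q1 fruit list:

```python
from collections import Counter

fruits = ['apple', 'pear', 'pear', 'persimmon', 'watermelon', 'tangerine',
          'strawberry', 'apple', 'pear', 'watermelon']

print(fruits.count('apple'))   # 2
counts = Counter(fruits)       # maps each fruit to how many times it appears
print(counts['pear'])          # 3
print(counts['banana'])        # 0  missing keys count as 0
print(counts.most_common(1))   # [('pear', 3)]
```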

Q2. Print a person's age

A function that takes a name and returns that person's age from a list of people

```
def get_age(name, people):
    for person in people:
        if person['name'] == name:
            return person['age']
    return 'No matching name found'

people = [{'name': 'bob', 'age': 20}, {'name': 'carry', 'age': 38}, {'name': 'john', 'age': 7}]
print(get_age('bob', people))  # 20
```
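An alternative sketch: returning `None` instead of a message string lets the caller tell "not found" apart from a real age, and `next()` with a default returns the first match or that default. `find_age` is a hypothetical name; when many lookups are needed, building a dictionary once is another option:

```python
def find_age(name, people):
    # generator over matching ages; next() takes the first one, or None if none match
    return next((person['age'] for person in people if person['name'] == name), None)

people = [{'name': 'bob', 'age': 20}, {'name': 'carry', 'age': 38}, {'name': 'john', 'age': 7}]
print(find_age('carry', people))  # 38
print(find_age('smith', people))  # None

ages = {person['name']: person['age'] for person in people}  # build once for many lookups
print(ages.get('john'))           # 7
```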

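3. Python packages

Setting up and checking a virtual environment

Create a virtual environment and activate it

```
# Create the virtual environment in the .venv folder
python3 -m venv .venv

# Activate it on Mac/Linux
source .venv/bin/activate

# Activate it on Windows
.venv\Scripts\activate
```

Once it is active, (.venv) appears at the start of the terminal prompt.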
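To check from inside Python whether a virtual environment is active, compare `sys.prefix` with `sys.base_prefix`; inside a venv they differ. A small sketch (`in_virtualenv` is a hypothetical helper name):

```python
import sys

def in_virtualenv():
    # In a venv, sys.prefix points at the venv folder while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
print(sys.executable)  # path of the interpreter that is currently running
```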

Using pip

Installing a package
```
  pip install requests
```

4. Web scraping

What is web scraping?

Web scraping: collecting the data you want from specific parts of a web page

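Trying web scraping

Goal: scrape the movie titles from the Daum movie ranking page
Link: https://movie.daum.net/ranking/boxoffice/yearly

1. Inspect the HTML structure

Open the developer tools (F12) in Chrome to inspect the HTML structure
- Each movie is an li tag inside the ol tag within the div with the box_boxoffice class
- The movie title is in the a tag inside the strong tag with the tit_item class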

2. Install the BeautifulSoup package

```
pip install beautifulsoup4
```

3. Write the Python code

```
import requests
from bs4 import BeautifulSoup

# Request the target URL and get its HTML
# (a browser-like User-Agent, since some sites reject requests' default one)
headers = {'User-Agent': 'Mozilla/5.0'}
data = requests.get('https://movie.daum.net/ranking/boxoffice/yearly', headers=headers)

# Parse the HTML with the BeautifulSoup library
soup = BeautifulSoup(data.text, 'html.parser')
print(soup)  # inspect the HTML
```

4. Extract the movie list

```
# Use select() to get all the li elements
movies = soup.select('.kakao_article > .section_ranking > .box_boxoffice > .list_movieranking > li')
print(len(movies))  # 50

for movie in movies:
    print(movie)
```

5. Extract the movie titles

```
for movie in movies:
    tag_element = movie.select_one('.tit_item > a')
    print(tag_element)
```

6. Print the text only when the element exists

```
for movie in movies:
    tag_element = movie.select_one('.tit_item > a')
    if not tag_element:
        continue  # select_one returned None: this li has no title link
    print(tag_element.text)
```

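BeautifulSoup usage summary

select(): returns a list of all elements that match the selector
select_one(): returns the first matching element, or None if nothing matches

Selector examples:

```
soup.select('tagname')
soup.select('.classname')
soup.select('#idname')
soup.select('parenttag > childtag')
soup.select('tagname[attribute="value"]')
```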
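To try these selectors without requesting the real site, they can be run against a small inline HTML string. The markup below is made up for illustration: it only mimics the structure described earlier (an ol of li items, each with a strong.tit_item > a title link), and the id, img tag, and example.com URLs are hypothetical. Attribute values such as a link's href or a poster's src are read with `tag['attr']`, or `tag.get('attr')` when the attribute may be missing:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the ranking list structure
html = """
<div class="box_boxoffice">
  <ol class="list_movieranking" id="ranking">
    <li>
      <img class="img_thumb" src="https://example.com/poster1.jpg">
      <strong class="tit_item"><a href="https://example.com/movie/1">Movie A</a></strong>
    </li>
    <li>
      <strong class="tit_item"><a href="https://example.com/movie/2">Movie B</a></strong>
    </li>
  </ol>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

items = soup.select('.box_boxoffice > ol > li')      # select(): always a list
print(len(items))                                     # 2
print(soup.select_one('#ranking')['class'])           # ['list_movieranking']

titles = [li.select_one('.tit_item > a').text for li in items]
print(titles)                                         # ['Movie A', 'Movie B']

links = [a['href'] for a in soup.select('a[href]')]   # attribute selector
print(links[0])                                       # https://example.com/movie/1

for li in items:
    img = li.select_one('img')                        # None when the li has no img
    print(img.get('src') if img else 'no poster')
```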

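More web scraping

Goal: scrape the title, release date, audience count, poster image URL, and movie detail page URL (the code below collects the title, release date, and audience count)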

```
import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

client = MongoClient('localhost', 27017)  # connect to MongoDB
db = client.dbjungle                      # 'dbjungle' database (created on the first insert)

def insert_all():
    # Request the page with a browser-like User-Agent and parse the HTML
    headers = {'User-Agent': 'Mozilla/5.0'}
    data = requests.get('https://movie.daum.net/ranking/boxoffice/yearly', headers=headers)
    soup = BeautifulSoup(data.text, 'html.parser')

    movies = soup.select('.kakao_article > .section_ranking > .box_boxoffice > .list_movieranking > li')
    print(len(movies))

    for movie in movies:
        # Title: text of the a tag inside strong.tit_item
        tag_element = movie.select_one('.tit_item > a')
        if not tag_element:
            continue
        title = tag_element.text

        # Release date: expected as "YY.MM.DD" with a two-digit year, hence += 2000
        tag_element = movie.select_one('.txt_info > .info_txt:nth-child(1) > span')
        if not tag_element:
            continue
        open_date = tag_element.text
        (open_year, open_month, open_day) = [int(e) for e in open_date.split('.')]
        open_year += 2000

        # Audience count: first text node directly inside the element, digits only
        tag_element = movie.select_one('.txt_info > .info_txt:nth-child(2)')
        if not tag_element:
            continue
        viewers = tag_element.findChild(string=True, recursive=False)
        viewers = int(''.join([c for c in viewers if c.isdigit()]))

        doc = {
            'title': title,
            'open_year': open_year,
            'open_month': open_month,
            'open_day': open_day,
            'viewers': viewers,
        }
        db.movies.insert_one(doc)
        print('Done: ', title, open_year, open_month, open_day, viewers)

if __name__ == '__main__':
    db.movies.drop()  # delete the existing collection
    insert_all()      # scrape and save the results to the DB
```
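The date and audience parsing inside the loop can be pulled out into small functions so it can be checked without requesting the site. A sketch: `parse_open_date` and `parse_viewers` are hypothetical names, and the input formats ("YY.MM.DD" with a two-digit year, audience text with thousands separators) are assumptions inferred from the code above, not confirmed against the live page:

```python
def parse_open_date(text):
    """Turn a 'YY.MM.DD' string such as '22.07.20' into (2022, 7, 20)."""
    year, month, day = (int(part) for part in text.strip().split('.'))
    return year + 2000, month, day


def parse_viewers(text):
    """Keep only the digits, e.g. '1,234,567 viewers' -> 1234567."""
    digits = ''.join(c for c in text if c.isdigit())
    return int(digits) if digits else None  # None when the text has no digits at all


print(parse_open_date('22.07.20'))         # (2022, 7, 20)
print(parse_viewers('1,234,567 viewers'))  # 1234567
```

Inside the loop, the two blocks would then shrink to calls such as `(open_year, open_month, open_day) = parse_open_date(tag_element.text)`. Returning `None` for a digit-free string avoids the `ValueError` that `int('')` would raise.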

5. MongoDB

Installing and setting up MongoDB

Mac

Install Homebrew from the terminal

```
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
```

Install MongoDB

```
brew tap mongodb/brew
brew install mongodb-community
```

Start MongoDB
```
brew services start mongodb-community
```

Ubuntu 22.04

See the MongoDB installation guide

Installing and setting up Studio 3T

Download Studio 3T
Launch it and configure a New Connection

Working with MongoDB using pymongo

Install pymongo

```
pip install pymongo
```

MongoDB connection settings in pymongo

If you have set up an admin user and password in MongoDB, use the settings below.

Create a .env file in the project root and put the MongoDB user and password in it (no spaces around =)
```
export MONGO_USER=user
export MONGO_PASS=password
```

Load the environment file and set up the connection for MongoClient

```
# Requires: pip install pymongo python-dotenv
import os
import urllib.parse

from dotenv import load_dotenv
from pymongo import MongoClient  # import the MongoDB client

# Load the .env file
load_dotenv()

# Read the user credentials from the environment variables
mongo_user = os.getenv('MONGO_USER')
mongo_pass = os.getenv('MONGO_PASS')

# Percent-encode the user name and password so they are safe inside the URL
encoded_user = urllib.parse.quote_plus(mongo_user)
encoded_pass = urllib.parse.quote_plus(mongo_pass)

# Open the connection (replace example.com with your MongoDB host)
client = MongoClient(f'mongodb://{encoded_user}:{encoded_pass}@example.com:27017/')
```
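Why the encoding step matters: in a `mongodb://user:password@host:port/` URI, characters such as `@`, `:` and `/` act as separators, so a password containing them must be percent-encoded, which is what `quote_plus` does. A standalone sketch with a made-up user and password; nothing here connects to a server:

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical credentials; the password contains '@', ':' and '/',
# which would otherwise be read as parts of the URI
user = 'admin'
raw_password = 'p@ss:w/rd'

encoded = quote_plus(raw_password)
print(encoded)                # p%40ss%3Aw%2Frd
print(unquote_plus(encoded))  # p@ss:w/rd  decoding gives the original back

uri = f'mongodb://{quote_plus(user)}:{encoded}@localhost:27017/'
print(uri)                    # mongodb://admin:p%40ss%3Aw%2Frd@localhost:27017/
```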

MongoDB connection setup

```
from pymongo import MongoClient
client = MongoClient('localhost', 27017)  # MongoDB listens on port 27017 by default
db = client.jungle                        # use a database named 'jungle' (created on the first insert)
```

Inserting data

```
db.users.insert_one({'name':'bobby','age':21})
db.users.insert_one({'name':'kay','age':27})
db.users.insert_one({'name':'john','age':30})
```

Reading all documents

```
all_users = list(db.users.find({}))
for user in all_users:
    print(user)
```

Finding and updating a specific document

Find a specific document

```
user = db.users.find_one({'name':'bobby'})
print(user)  # None if no document matches
```

Update

```
db.users.update_one({'name':'bobby'},{'$set':{'age':19}})
```

Saving the web scraping results to MongoDB

```
import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.dbjungle

def insert_all():
    headers = {'User-Agent': 'Mozilla/5.0'}
    data = requests.get('https://movie.daum.net/ranking/boxoffice/yearly', headers=headers)
    soup = BeautifulSoup(data.text, 'html.parser')

    movies = soup.select('.kakao_article > .section_ranking > .box_boxoffice > .list_movieranking > li')
    print(len(movies))

    for movie in movies:
        tag_element = movie.select_one('.tit_item > a')
        if not tag_element:
            continue
        title = tag_element.text

        tag_element = movie.select_one('.txt_info > .info_txt:nth-child(1) > span')
        if not tag_element:
            continue
        open_date = tag_element.text
        (open_year, open_month, open_day) = [int(e) for e in open_date.split('.')]
        open_year += 2000

        tag_element = movie.select_one('.txt_info > .info_txt:nth-child(2)')
        if not tag_element:
            continue
        viewers = tag_element.findChild(string=True, recursive=False)
        viewers = int(''.join([c for c in viewers if c.isdigit()]))

        doc = {
            'title': title,
            'open_year': open_year,
            'open_month': open_month,
            'open_day': open_day,
            'viewers': viewers,
        }
        db.movies.insert_one(doc)
        print('Done: ', title, open_year, open_month, open_day, viewers)

if __name__ == '__main__':
    db.movies.drop()
    insert_all()
```

Practicing find and update

Basic pymongo code

```
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client.dbjungle
```

Q1. Get the release month of the movie titled '미니언즈2' (Minions 2)

```
target_movie = db.movies.find_one({'title': '미니언즈2'})
print(target_movie['open_month'])
```

Q2. Get the titles of the movies released in the same month as '미니언즈2'

```
target_movie = db.movies.find_one({'title': '미니언즈2'})
target_month = target_movie['open_month']

movies = list(db.movies.find({'open_month': target_month}))

for movie in movies:
    print(movie['title'])
```

Q3. Change the release year of the movie '헌트' (Hunt) to 1999

```
db.movies.update_one({'title': '헌트'}, {'$set': {'open_year': 1999}})
```
