# INTA – Exercise 1
Projects on GitLab: https://gitlab.fjfi.cvut.cz/18inta-2024
__All course participants:__ log in to [gitlab.fjfi.cvut.cz](https://gitlab.fjfi.cvut.cz), visit the page of the [18inta-2024/students group](https://gitlab.fjfi.cvut.cz/18inta-2024/students), and click the _"Request access"_ button (it is hidden in the [kebab menu](https://i.stack.imgur.com/OsXnO.png) in the top right corner).
If you have trouble logging in to the FJFI GitLab, contact the instructor.
## Assignment
Write a program that will download 10 images (of cats :cat: meow) from a search engine.
- [cat](https://duckduckgo.com/?t=h_&q=cat&iax=images&ia=images)
- [python](https://duckduckgo.com/?q=python&t=h_&iar=images&iax=images&ia=images)
## Useful links
- [Programming basics in Python](https://gitlab.fjfi.cvut.cz/ksi/zpro-2023-public)
- [installing and using VSCode/VSCodium](https://gitlab.fjfi.cvut.cz/ksi/zpro-2023-public/-/blob/main/14%20VSCodium.ipynb?ref_type=heads)
- [Python documentation](https://docs.python.org/3/index.html)
- [requests](https://docs.python-requests.org/en/latest/index.html)
- ([beautifulsoup](https://beautiful-soup-4.readthedocs.io/en/latest/))
- [duckduckgo-search](https://pypi.org/project/duckduckgo-search/)
## Procedure
1. Create a project in VSCode
2. Create a virtual environment:
Mac/Linux:
```
python3 -m venv .venv
source .venv/bin/activate
```
Windows:
```
py -m venv .venv
.venv\Scripts\activate
```
3. Install the packages:
   - `pip install requests`
   - `pip install bs4`
   - `pip install duckduckgo-search`
4. Implementation: try to finish it yourself (we will go through the solution next time)
Starting point from the exercise:
```python
import requests
from duckduckgo_search import DDGS
from pprint import pprint
##url = "https://duckduckgo.com/?t=h_&q=cat&iax=images&ia=images"
#url = "https://www.google.com/search?tbm=isch&q=cat&tbs=imgo:1"
#r = requests.get(url)
##print(r.content) # raw content (bytes)
#print(r.text) # decoded text
#print("<img" in r.text.lower())
with DDGS() as ddgs:
    results = [r for r in ddgs.images("cat", max_results=5)]
pprint(results)
```
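Before filling in the implementation, it can help to confirm that steps 2 and 3 actually took effect. The following is a minimal sketch, not part of the exercise material: the helper names `in_virtualenv` and `missing_modules` are hypothetical, and the check assumes the environment was created with `venv` as shown above. Note that it tests the import names (`bs4`, `duckduckgo_search`), which differ from some of the pip package names.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages installed in step 3 (hypothetical check list).
required = ["requests", "bs4", "duckduckgo_search"]
print(f"virtual environment active: {in_virtualenv()}")
print(f"missing modules: {missing_modules(required)}")
```

If the first line reports `False`, the environment was probably not activated in the terminal that runs the script; a non-empty list in the second line suggests that `pip install` went into a different interpreter.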
## Sample solution
```python
from pathlib import Path
from pprint import pprint
import sys
import traceback
import typing
from urllib.parse import urlparse
import requests
from duckduckgo_search import DDGS


def search(keywords: str, **kwargs) -> typing.Iterator[dict]:
    """ Search images using DuckDuckGo search engine and download each hit.

    Args:
        keywords: the search terms (e.g. "cat")
        kwargs: additional arguments of the search library, see
            https://github.com/deedy5/duckduckgo_search?tab=readme-ov-file#3-images---image-search-by-duckduckgocom

    Yields:
        dict with the search result, extended with the downloaded bytes
        under the "image_data" key
    """
    with DDGS() as ddgs:
        for result in ddgs.images(keywords, **kwargs):
            url = result["image"]
            print(f"Downloading {url}")
            try:
                response = requests.get(url)
            except requests.RequestException:
                print(f"ERROR: failed to download {url}", file=sys.stderr)
                continue
            if response.status_code == 200:
                result["image_data"] = response.content
                yield result
            else:
                print(f"ERROR: got status {response.status_code}", file=sys.stderr)


def save(result: dict, directory: Path) -> None:
    """ Save image of the search result to a file in a directory.

    Args:
        result: dict with the search result data
        directory: where to save the file
    """
    # ensure that the directory exists
    directory.mkdir(parents=True, exist_ok=True)

    # extract the file name
    name = Path(urlparse(result["image"]).path)
    if name.suffix.lower() not in [".jpg", ".jpeg", ".png", ".gif", ".webp"]:
        print(f"Skipping result because an image suffix was not found in URL {result['image']}")
        return
    name = name.name

    try:
        with open(directory / name, "wb") as file:
            file.write(result["image_data"])
    except OSError as e:
        #traceback.print_exception(e, file=sys.stderr)
        print(f"OSError: {e}", file=sys.stderr)


# TODO: modify imgs_dir to separate different keywords, type_image, etc. into subdirectories
keywords = "cat"
imgs_dir = Path("images")
for result in search(keywords, max_results=100):
    save(result, imgs_dir)
```
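The TODO in the sample solution asks for separate subdirectories per search. One possible approach is sketched below; the helper `target_dir` and the `"unnamed"` fallback are hypothetical names introduced only for illustration, and the way keywords are turned into a directory name is just one reasonable choice.

```python
import re
from pathlib import Path
from typing import Optional


def target_dir(base: Path, keywords: str, type_image: Optional[str] = None) -> Path:
    """Build a per-search subdirectory such as images/black_cat/photo."""
    # Lowercase the keywords and collapse every run of characters that is not
    # a word character or a hyphen into a single underscore, so the result
    # is safe to use as a directory name.
    slug = re.sub(r"[^\w-]+", "_", keywords.lower()).strip("_")
    # Fall back to a fixed name when nothing usable is left (assumption).
    directory = base / (slug or "unnamed")
    if type_image:
        directory = directory / type_image
    return directory
```

With the sample solution above, calling `save(result, target_dir(imgs_dir, keywords))` inside the final loop would place cat images under `images/cat` instead of directly under `images`.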