About the project
Want to download every file from /b or another board? Without duplicates? Only files larger or smaller than X kilobytes? Only images or only videos? Only files from one specific thread? Or do you need a tracker that picks out threads by keyword? It's all here, and more!
This repository contains a ready-made set of scripts for 2ch (Dvach), and every script can be customized for your own tasks. With minimal Python knowledge you can easily write a script for your own needs. All the details are below.
Installation
Install Python
Download the zip archive, or run:
git clone https://github.com/diademoff/2ch
Install the dependencies:
cd 2ch
pip install -r requirements.txt
Run the script you need:
python {script name}.py
List of scripts
- Download all files and posts of a thread: thread_saver.py
- Notifications about new threads on boards: tracker.py
- Most popular threads on a board: popular.py
- Download all files from a board: board_media.py
Editing the scripts
All scripts can be edited to suit your tasks.
thread_saver.py
- FOLDER = 'saver' - Name of the folder files are saved to
- SAVE_MEDIA - Whether to save images and videos
- DELAY - Update interval in seconds
tracker.py
- text_limit = 155 - Line length
- board_names = 'b news sex v hw gg dev soc rf ma psy fet' - List of boards (separated by spaces)
- KEY_WORDS - Keywords to look for
popular.py
- text_limit = 164 - Line length
- max_lines = 55 - Maximum number of lines in the output
- board_name = 'b' - Board to parse
- KEY_WORDS - Show only threads that contain the keywords
board_media.py
- BOARD = 'b' - Name of the board to download files from
- FOLDER_NAME = 'media' - Name of the folder files are downloaded to
- KEY_WORDS = [] - Select threads by keyword; if no keywords are given, files from all threads are downloaded
- EXTENSIONS = [] - File extensions to download
- MAX_FILE_SIZE - Maximum file size in kilobytes
- MIN_FILE_SIZE - Minimum file size in kilobytes
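The extension and size settings of board_media.py can be pictured as a simple filter. The sketch below is not the script's actual code: file_is_ok is a hypothetical helper, the setting values are made up, and treating an empty EXTENSIONS list as "any extension" is an assumption.

```python
import os

# Hypothetical values; the names mirror the settings listed for board_media.py.
EXTENSIONS = ["jpg", "png"]   # assumption: an empty list would mean "any extension"
MAX_FILE_SIZE = 2048          # kilobytes
MIN_FILE_SIZE = 10            # kilobytes


def file_is_ok(name, size_kb):
    """Return True if the file passes the extension and size filters."""
    extension = os.path.splitext(name)[1].lstrip(".").lower()
    if EXTENSIONS and extension not in EXTENSIONS:
        return False
    return MIN_FILE_SIZE <= size_kb <= MAX_FILE_SIZE


print(file_is_ok("16200245064090.jpg", 350))  # allowed extension, size in range
print(file_is_ok("clip.webm", 350))           # extension not in the list
print(file_is_ok("tiny.png", 2))              # smaller than MIN_FILE_SIZE
```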
FAQ
- The script won't start.
Check that the dependencies are installed: pip install -r requirements.txt. Check the encoding of the files. Check that you have Python 3 or later installed.
- How are images compared?
Images are compared by content. Even if two images have different extensions (png and jpg) or different sizes, they are still recognized as identical.
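The README does not say which algorithm is used for the comparison. One common way to compare images by content is a perceptual "average hash"; the sketch below illustrates that idea on plain grayscale pixel grids and is an assumption, not the project's implementation. The function average_hash and the sample data are hypothetical.

```python
def average_hash(pixels, hash_size=4):
    """Reduce a grayscale pixel grid to hash_size x hash_size brightness bits."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        y0, y1 = row * height // hash_size, (row + 1) * height // hash_size
        for col in range(hash_size):
            x0, x1 = col * width // hash_size, (col + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # Each bit records whether a cell is brighter than the image average.
    return tuple(cell > mean for cell in cells)


# A 4x4 gradient, the same picture scaled up 2x, and an inverted picture.
small = [[x + 4 * y for x in range(4)] for y in range(4)]
big = [[small[y // 2][x // 2] for x in range(8)] for y in range(8)]
inverted = [[15 - value for value in row] for row in small]

print(average_hash(small) == average_hash(big))       # same content, different size
print(average_hash(small) == average_hash(inverted))  # different content
```

Because the hash depends only on the relative brightness of image regions, the file extension and the pixel dimensions do not affect the result in this sketch.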
- Do you use the 2ch API?
Yes. Specifically:
https://2ch.hk/makaba/mobile.fcgi?task=get_thread&board={board_name}&thread={num}&post=1
http://2ch.hk/{name}/threads.json
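The placeholders in the two URLs can be filled in with str.format. This is a minimal sketch: the helper names thread_json_url and board_json_url are hypothetical, and the thread number is just a sample value.

```python
# URL templates copied from the FAQ answer above.
THREAD_URL = ("https://2ch.hk/makaba/mobile.fcgi"
              "?task=get_thread&board={board_name}&thread={num}&post=1")
BOARD_URL = "http://2ch.hk/{name}/threads.json"


def thread_json_url(board_name, num):
    """Build the URL that returns the posts of one thread as JSON."""
    return THREAD_URL.format(board_name=board_name, num=num)


def board_json_url(name):
    """Build the URL that returns the thread list of a board as JSON."""
    return BOARD_URL.format(name=name)


print(board_json_url("b"))
print(thread_json_url("b", 245763818))
```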
- Why do you need Beautiful Soup?
Mainly to strip HTML tags from posts. If a post contains bold text, the text arrives wrapped in an HTML tag, something like <strong>text</strong>. The tag has to be removed so that only the text remains.
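The project uses Beautiful Soup for this. To keep the example free of extra dependencies, the sketch below shows the same idea with the standard library's html.parser instead; TextExtractor and strip_tags are hypothetical names and are not part of dvach.py.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text that is still buffered
    return "".join(parser.parts)


print(strip_tags("<strong>bold</strong> and plain"))
```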
- How do I set keywords?
Open the relevant script and edit it following the pattern below. Pay attention to the formatting, commas, and quotes.
KEY_WORDS = [
"tsui'",
"mp4"
]
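A keyword list like this could be applied roughly as follows. This is a sketch under assumptions: matches_keywords is a hypothetical helper, the keywords are placeholders, and the matching is assumed to be a case-insensitive substring search. The rule that an empty list matches every thread follows the description of KEY_WORDS in board_media.py.

```python
KEY_WORDS = [
    "webm",
    "mp4"
]


def matches_keywords(text, key_words):
    """Return True if the text contains at least one keyword."""
    if not key_words:
        return True  # no keywords configured: every thread qualifies
    lowered = text.lower()
    return any(word.lower() in lowered for word in key_words)


print(matches_keywords("WEBM thread, post your best", KEY_WORDS))
print(matches_keywords("Programming questions", KEY_WORDS))
print(matches_keywords("Programming questions", []))
```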
- Are the scripts cross-platform?
Yes. The scripts were tested on Linux and Windows.
For developers
The entire API lives in the file dvach.py. Import it:
import dvach
Structure
- Board
  - name: str - Board name
  - threads: dict - Threads on the board. The key is the thread number, the value is a variable of type Thread
  - json_link: str - Link to the JSON with the board's threads
  - from_json() - Build a Board object from JSON
  - json_download() - Download the board's JSON
  - thread_exists() - Check whether a thread with the given number exists on the board
  - update_threads() - Refresh the list of threads on the board
  - sort_threads_by_posts() - Sort the threads by post count; the closer a thread is to the start of the list, the more posts it has
  - get_new_threads() - Compare the current thread list with another one and return a dict of new threads
  - get_dead_threads() - Compare the current thread list with another one and return a dict of threads that have sunk
- Thread
  - comment: str - Text of the OP post
  - num: str - Thread number
  - posts_count: int - Number of posts
  - score: float - Thread score
  - subject: str - Shortened comment
  - views: int - Number of views
  - unique_posters: int - Number of unique posters (available after the posts are updated)
  - board_name: str - Board the thread belongs to
  - posts = [] - List of posts
  - get_link: str - Link to the thread
  - get_op_post: Post - The OP post
  - json_posts_link: str - Link to the thread's JSON
  - save(path) - Save the thread's posts as HTML into the given folder
  - IsOk() - Whether the thread matches the configured keywords
  - update_posts() - Download the JSON and refresh the post list; calls get_posts()
  - get_posts() - Parse the JSON and update unique_posters and posts
  - json_download() - Get the raw JSON of the posts
- Post
  - comment: str - Text
  - date: str - Post date
  - email: str
  - op: int
  - num: str - Number
  - files: [] - List of files
- Post_file
  - displayname: str - Display name
  - name: str - Name
  - download_link: str - Download link
  - width: int - Width
  - height: int - Height
  - size: int - File size
  - IsImage: bool - Whether the file is an image
  - IsVideo: bool - Whether the file is a video
  - save() - Save the file to the given path
  - IsOk() - Whether the file matches the configured extensions and the maximum and minimum size
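The comparison described for get_new_threads() and get_dead_threads() boils down to a difference between two dicts keyed by thread number. The sketch below shows that idea on plain dicts. It is not the code from dvach.py: the function names, the argument order, and the sample data are assumptions made for illustration.

```python
def new_threads(current, previous):
    """Threads present in the current snapshot but absent from the previous one."""
    return {num: thread for num, thread in current.items() if num not in previous}


def dead_threads(current, previous):
    """Threads present in the previous snapshot that are gone from the current one."""
    return {num: thread for num, thread in previous.items() if num not in current}


# Hypothetical snapshots: thread number -> OP text.
previous = {"100": "old thread", "101": "still alive"}
current = {"101": "still alive", "102": "fresh thread"}

print(new_threads(current, previous))
print(dead_threads(current, previous))
```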
Boards
The Board class lets you work with boards (b, news, po, soc, and so on).
Declaration:
board = dvach.Board('b')
The variable board now holds board b, but it contains no information other than the board's name. To fetch the list of threads on the board:
board.update_threads()
The threads field now holds a dict of threads. The key is the thread number, the value is the thread (Thread).
Get a list of thread numbers:
thread_nums = list(board.threads.keys())
Sort by popularity and get the list of thread numbers again:
board.sort_threads_by_posts()
thread_nums = list(board.threads.keys())
The first element is now the number of the most popular thread:
most_popular_num = thread_nums[0]
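Taking the first key after sorting works because Python dicts keep insertion order. The sketch below shows the idea on a plain dict that maps a thread number to a post count; it is an illustration with made-up data, not the implementation of sort_threads_by_posts().

```python
# Hypothetical data: thread number -> number of posts.
threads = {"101": 12, "102": 60, "103": 34}

# Rebuild the dict so that the thread with the most posts comes first.
sorted_threads = dict(
    sorted(threads.items(), key=lambda item: item[1], reverse=True)
)

thread_nums = list(sorted_threads.keys())
most_popular_num = thread_nums[0]
print(thread_nums)
print(most_popular_num)
```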
Threads
We have the number of the most popular thread; now get the thread itself from the threads dict:
thread = board.threads[most_popular_num]
Values in this dict are of type Thread. Check the type of the thread variable:
print(type(thread))
This prints:
<class 'dvach.Thread'>
Get the list of posts in the thread:
print(f"Post count (length of posts): {len(thread.posts)}")
print(f"Post count (posts_count): {thread.posts_count}")
thread.update_posts()
print(f"Post count (length of posts): {len(thread.posts)}")
print(f"Unique posters: {thread.unique_posters}")
Output:
Post count (length of posts): 0
Post count (posts_count): 60
Post count (length of posts): 64
Unique posters: 34
unique_posters is only available after calling update_posts() or get_posts().
Counting posts with len(thread.posts) is more accurate but requires loading every post, whereas thread.posts_count is already known when the board's threads are fetched.
Saving a thread as HTML
To save a thread, use the HtmlGenerator class and its get_thread_htmlpage method. The method returns HTML code that can be written to a file.
op_file = thread.posts[0].files[0] # Image in the OP post
img_path = os.path.normpath(f'./{op_file.name}') # Path where the image will be saved
op_file.save(img_path) # Save the image
# Generate the HTML
html = dvach.HtmlGenerator.get_thread_htmlpage(thread, img_path)
# Create the file and write the HTML page to it
with open(f'thread_{thread.num}.html', 'w', encoding='utf-8') as html_file:
    html_file.write(html)
Or use the method:
thread.save('.')
Posts
After the posts have been fetched with update_posts(), the posts field holds the list of posts, starting with the OP post.
Look at the second post in the thread:
post = thread.posts[1]
print(f"Number: {post.num}")
print(f"Text: {post.comment}")
print(f"Number of files: {len(post.files)}")
Output:
Number: 210762237
Text: Bamp
Number of files: 1
Files
Now get the first file in the post, if there is one:
if len(post.files) > 0:
    file = post.files[0]
    print(type(file))
Output:
<class 'dvach.Post_file'>
Look at more information about the file:
print(f"File name: {file.name}")
print(f"Width: {file.width}")
print(f"Height: {file.height}")
print(f"Display name: {file.displayname}")
print(f"Link: {file.download_link}")
Output:
File name: 16200245064090.jpg
Width: 3118
Height: 1754
Display name: 1620024504280.jpg
Link: https://2ch.hk/b/src/245763818/16200245064090.jpg
Saving the file is easy:
file.save(file.name)
The file is saved to the directory the script runs in, under the name 16200245064090.jpg.
You can also give a custom path:
file.save(f"/home/username/{file.name}")
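When saving to a custom path it can help to build the path with os.path.join and make sure the target folder exists first. Whether save() creates missing folders on its own is not stated here, so the sketch below prepares the folder explicitly. build_save_path is a hypothetical helper, and a temporary directory stands in for a real location such as a home folder.

```python
import os
import tempfile


def build_save_path(folder, file_name):
    """Create the folder if needed and return the full path for the file."""
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, file_name)


base = tempfile.mkdtemp()  # stand-in for a real base directory
path = build_save_path(os.path.join(base, "media", "b"), "16200245064090.jpg")
print(path)
```

The resulting path could then be passed to the save() method of a file.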
Summary
All the code used in the examples:
import os
import dvach

# Declare the board
board = dvach.Board('b')
# Download the threads
board.update_threads()
# Get the list of thread numbers
thread_nums = list(board.threads.keys())
# Sort by post count
board.sort_threads_by_posts()
# Refresh the list of thread numbers
thread_nums = list(board.threads.keys())
# Number of the most popular thread
most_popular_num = thread_nums[0]
# The most popular thread
thread = board.threads[most_popular_num]
# Check the type of the variable
print(type(thread))
print(f"Post count (length of posts): {len(thread.posts)}")
print(f"Post count (posts_count): {thread.posts_count}")
# Download the posts
thread.update_posts()
print(f"Post count (length of posts): {len(thread.posts)}")
print(f"Unique posters: {thread.unique_posters}")
op_file = thread.posts[0].files[0] # Image in the OP post
img_path = os.path.normpath(f'./{op_file.name}') # Path where the image will be saved
op_file.save(img_path) # Save the image
# Generate the HTML
html = dvach.HtmlGenerator.get_thread_htmlpage(thread, img_path)
# Create the file and write the HTML page to it
with open(f'thread_{thread.num}.html', 'w', encoding='utf-8') as html_file:
    html_file.write(html)
# Get the second post (the one right after the OP post)
post = thread.posts[1]
print(f"Number: {post.num}")
print(f"Text: {post.comment}")
print(f"Number of files: {len(post.files)}")
if len(post.files) > 0:
    # Get the first file
    file = post.files[0]
    print(type(file))
    print(f"File name: {file.name}")
    print(f"Width: {file.width}")
    print(f"Height: {file.height}")
    print(f"Display name: {file.displayname}")
    print(f"Link: {file.download_link}")
    # Save the file
    file.save(file.name)
    # file.save(f"/home/username/{file.name}")