Posts from 2026/03/11

[REVIEW] Streaming Long Video Understanding with Large Language Models

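1) What problem does the paper want to solve? This paper aims to solve the video token explosion problem, the core bottleneck in long video understanding. Feeding a long video directly into a Vision-Language Large Model increases the number of frames, so the token count grows rapidly, the computational cost rises, and context from the earlier part of the video is easily lost. Existing approaches to long video processing have mostly either sampled only a subset of frames via sparse temporal sampling, reduced the token count via spatio-temporal pooling / frame compression, or kept a separate memory bank. However, these approaches lose temporal information along the long time axis, or spa..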
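To make the token explosion and the two baseline reductions mentioned in the excerpt concrete, here is a minimal sketch in plain Python. It is not the paper's method. The function names, the 1 fps sampling rate, and the 256 tokens per frame are hypothetical values chosen only for illustration.

```python
from typing import List


def video_token_count(num_frames: int, tokens_per_frame: int) -> int:
    """Total visual tokens if every frame is encoded independently."""
    return num_frames * tokens_per_frame


def sparse_sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Sparse temporal sampling: pick the centre frame of each equal-length segment."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    return [int(step * (i + 0.5)) for i in range(num_samples)]


def temporal_mean_pool(frame_features: List[List[float]], window: int) -> List[List[float]]:
    """Temporal pooling: average feature vectors over non-overlapping windows of frames."""
    pooled = []
    for start in range(0, len(frame_features), window):
        chunk = frame_features[start:start + window]
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


# Assumed setting: a 10-minute video sampled at 1 fps, 256 tokens per frame.
num_frames = 10 * 60
full_tokens = video_token_count(num_frames, 256)

# Keeping only 8 frames shrinks the token budget but skips most of the timeline.
kept = sparse_sample_indices(num_frames, 8)
sparse_tokens = video_token_count(len(kept), 256)

# Pooling keeps every frame's contribution but blurs the order within each window.
pooled = temporal_mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], window=2)
```

Under these assumed numbers the full video costs 153,600 visual tokens, while 8 sampled frames cost 2,048. That gap is the trade-off the excerpt describes: the cheaper input no longer covers most of the temporal content.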

Deep Learning Papers/VLM (Vision-Language Models) Deep Learning Papers 2026.03.11


예비 대학원생의 논문 리뷰 뽀개기 (Cracking Paper Reviews, by a Prospective Grad Student)
