Large-scale Trademark Image Retrieval System based on Vocabulary Tree

September, 2010 Student : Wei-Ying Huang

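【Research Overview】
Chinese abstract:
Today, whether we need everyday information, academic material, or news, a powerful text search engine can give us the data and answers we need right away. Text-based retrieval, as in the Google search engine, already returns accurate results. For the vast amount of image data on web pages and other media, however, there is still no strong method or system that delivers accurate search results. If users could obtain related information through a query-by-image retrieval system, multimedia image resources would become more useful in daily life and could even replace some textual descriptions.

Government agencies, organizations, and companies all have their own trademark images, but retrieval over a large-scale image collection is time-consuming; the database in this thesis contains more than 540,000 images. Text alone often fails to find the best results for a trademark, because words cannot convey the concept and mood of some trademarks, and trademark image retrieval is difficult because it must find similar images rather than exact copies. This thesis uses trademarks registered in the Republic of China as the image source and describes each image with Scale-Invariant Feature Transform (SIFT) features, because such local features are robust. A vocabulary tree, a hierarchical quantization scheme, clusters the features of all database images, and its tree structure speeds up search. The system also uses the inverted index technique from information retrieval, which keeps search fast even with a very large number of images. Finally, the experimental results are evaluated for accuracy under different image similarity measures.

Abstract:
Nowadays, people can use text search engines to obtain a wide range of information, such as news, transportation schedules, and research data. Google is a typical example of a popular and powerful text search engine. Besides text, the World Wide Web (WWW) holds a huge amount of digital multimedia data such as photos, video frames, and trademarks. Existing methods and search engines still cannot help people search multimedia data conveniently and accurately. If a user could directly use an image as a query to search for similar images or related information, this could enable more practical applications than text search. This technique is called Content-based Image Retrieval (CBIR). In this thesis, we built a search system on large-scale trademark data (about 0.54 million images). A vocabulary tree built from the SIFT features of the database images is used to search the large-scale trademark database. The vocabulary tree is a hierarchical clustering method that quantizes SIFT features into visual words, and its tree data structure reduces search time. In addition, we apply an inverted index to further speed up the system. Finally, different image similarity measures are used in the experiments to evaluate system performance.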

【Workflow】
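The left figure shows how the image system database is built. The steps are:
1. A crawler fetches the required trademark images from the Intellectual Property Office's remote trademark search system and stores them.
2. SIFT local features are extracted from the downloaded trademarks for use in the later steps.
3. The features of all trademark images are processed with the vocabulary tree's hierarchical clustering structure, producing a searchable tree and a set of visual words.
4. A weight is computed for each visual word using TF-IDF weighting, and an inverted index recording which images contain which visual words is built for the system to query.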
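The clustering and indexing steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the descriptors are random 4-D toy vectors standing in for 128-D SIFT descriptors, the branching factor and depth are arbitrary, and every name (`build_tree`, `quantize`, `build_index`, `blob`, the `logo_*` image ids) is hypothetical. The IDF weight is assumed to be ln(N / N_i), where N is the number of images and N_i the number of images containing visual word i.

```python
import math
import random
from collections import defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the centre closest to point p."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))

def kmeans(points, k, rng, iters=10):
    """Plain Lloyd's k-means; returns k centroids."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centre
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

class Node:
    def __init__(self, center=None):
        self.center = center    # cluster centre (None for the root)
        self.children = []
        self.word_id = None     # visual word id, set on leaves only

def build_tree(points, branch, depth, rng, counter, center=None):
    """Hierarchical k-means: split into `branch` clusters, up to `depth` levels."""
    node = Node(center)
    if depth == 0 or len(points) < branch:
        node.word_id = counter[0]   # this leaf becomes one visual word
        counter[0] += 1
        return node
    centers = kmeans(points, branch, rng)
    groups = [[] for _ in centers]
    for p in points:
        groups[nearest(p, centers)].append(p)
    for c, g in zip(centers, groups):
        node.children.append(build_tree(g, branch, depth - 1, rng, counter, c))
    return node

def quantize(root, desc):
    """Walk from the root to a leaf, choosing the closest child at each level."""
    node = root
    while node.children:
        node = min(node.children, key=lambda ch: dist2(desc, ch.center))
    return node.word_id

def build_index(root, images):
    """images: {image_id: [descriptor, ...]} -> (inverted index, idf weights)."""
    inverted = defaultdict(dict)    # word_id -> {image_id: term frequency}
    for image_id, descs in images.items():
        for d in descs:
            postings = inverted[quantize(root, d)]
            postings[image_id] = postings.get(image_id, 0) + 1
    n = len(images)
    idf = {w: math.log(n / len(p)) for w, p in inverted.items()}
    return inverted, idf

# Toy 4-D descriptors standing in for 128-D SIFT vectors (assumption).
rng = random.Random(42)
def blob(center, n=20):
    return [[c + rng.gauss(0, 0.05) for c in center] for _ in range(n)]

images = {
    "logo_a": blob([0, 0, 0, 0]) + blob([1, 1, 0, 0]),
    "logo_b": blob([1, 1, 0, 0]) + blob([0, 0, 1, 1]),
    "logo_c": blob([0, 0, 1, 1]) + blob([1, 0, 1, 0]),
}
all_descs = [d for descs in images.values() for d in descs]
root = build_tree(all_descs, branch=3, depth=2, rng=random.Random(0), counter=[0])
inverted, idf = build_index(root, images)
```

Quantizing a descriptor costs one comparison per child per level, so with branching factor k and depth L it needs about k·L distance computations instead of one per visual word, which is where the tree's speed-up comes from.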
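The right figure shows the system's search flow. The steps are:
1. The user submits a query trademark image to the system.
2. The system extracts SIFT local features from the query image for matching.
3. Each feature is passed down the vocabulary tree to find the visual word it belongs to, and the counts are accumulated into a visual word vector. The visual words are then used to look up the database images that contain the same visual words, and a similarity score is computed for each of them from the visual word weights.
4. All relevant database images are sorted by similarity score, and the ranked list is returned to the user, which completes the search flow.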
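The lookup-and-rank part of the search flow can be illustrated with a small, self-contained sketch. The inverted index below is a hand-written toy, the image ids are made up, and the score is assumed to be the cosine similarity between TF-IDF vectors; the thesis may weight and normalize differently. The point is only that the inverted index lets the system touch just the images that share a visual word with the query.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy index: visual word id -> {image id: term frequency}.
inverted = {
    0: {"logo_a": 3, "logo_b": 1},
    1: {"logo_a": 1},
    2: {"logo_b": 2, "logo_c": 4},
    3: {"logo_c": 1},
}
num_images = 3
idf = {w: math.log(num_images / len(p)) for w, p in inverted.items()}

def image_norms(inverted, idf):
    """L2 norm of every database image's TF-IDF vector, computed once offline."""
    sq = defaultdict(float)
    for w, postings in inverted.items():
        for image_id, tf in postings.items():
            sq[image_id] += (tf * idf[w]) ** 2
    return {image_id: math.sqrt(s) for image_id, s in sq.items()}

def search(query_words, inverted, idf, norms, top_k=10):
    """Rank only the images that share at least one visual word with the query."""
    q_vec = {w: tf * idf[w] for w, tf in Counter(query_words).items() if w in idf}
    q_norm = math.sqrt(sum(v * v for v in q_vec.values()))
    if q_norm == 0:
        return []
    dots = defaultdict(float)
    for w, q_weight in q_vec.items():
        for image_id, tf in inverted[w].items():   # postings list for word w
            dots[image_id] += q_weight * tf * idf[w]
    scores = {i: dot / (q_norm * norms[i]) for i, dot in dots.items() if norms[i] > 0}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

norms = image_norms(inverted, idf)
# The query image is assumed to have been quantized into visual words 0, 0, 1.
results = search([0, 0, 1], inverted, idf, norms)
```

In this toy run `logo_a` ranks ahead of `logo_b`, and `logo_c` is never scored because it shares no visual word with the query. With L2-normalized vectors, ranking by cosine similarity gives the same order as ranking by L2 distance.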

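【Conclusion】
This thesis combines SIFT local features with a vocabulary tree search framework to implement an automated trademark retrieval system. Users can retrieve trademarks by submitting an image as the query, which reduces the difficulty and subjective bias of describing a trademark with keywords. At the large image scale of this study, exhaustive search would consume a great deal of computation and retrieval time and is not practical. The hierarchical vocabulary tree search was therefore used, and experiments showed that it greatly speeds up search, solving the problem of time-consuming retrieval over large data. Accuracy exceeds 70% with Binary Match and L1-norm, and exceeds 68% under the L2-norm distance measure.

Although accuracy is not yet high, the faster search makes retrieval over large-scale image data more feasible. One approach worth trying is to pool the image features of all categories and use cloud computing, which can process large amounts of data, to build a single complete vocabulary tree and evaluate its performance experimentally. During the research we also found that trademark images are highly similar in shape, so shape features could be added to strengthen the description of trademark images. How to improve trademark retrieval accuracy in different applications is another direction for future work.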
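This page names three similarity measures (Binary Match, L1-norm, L2-norm) without defining them, so the sketch below rests on assumptions: the L1 and L2 measures are taken to be distances between normalized visual word vectors, in the style of Nistér and Stewénius, and Binary Match is assumed to count the visual words two images have in common. The function names and the example vectors are hypothetical.

```python
import math

def normalize(vec, p):
    """Scale a sparse vector {word_id: weight} to unit Lp norm."""
    norm = sum(abs(v) ** p for v in vec.values()) ** (1.0 / p)
    return {w: v / norm for w, v in vec.items()} if norm else dict(vec)

def lp_distance(q, d, p):
    """Lp distance between the Lp-normalized vectors (smaller = more similar)."""
    q, d = normalize(q, p), normalize(d, p)
    words = set(q) | set(d)
    return sum(abs(q.get(w, 0.0) - d.get(w, 0.0)) ** p for w in words) ** (1.0 / p)

def binary_match(q, d):
    """Assumed definition: number of visual words present in both images."""
    return len(set(q) & set(d))

# Hypothetical weighted visual word vectors.
q = {0: 2.0, 1: 1.0}
same = {0: 4.0, 1: 2.0}       # same direction as q, different scale
disjoint = {5: 3.0}           # shares no visual word with q

l1_same = lp_distance(q, same, 1)
l1_far = lp_distance(q, disjoint, 1)
l2_far = lp_distance(q, disjoint, 2)
```

Because the vectors are normalized first, an image with the same word proportions as the query is at distance 0 regardless of how many features it has, while two images with no visual word in common reach the maximum distance (2 for L1, √2 for L2).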

【Full Thesis】
Large-scale Trademark Image Retrieval System based on Vocabulary Tree 【PDF】
Large-scale Trademark Image Retrieval System based on Vocabulary Tree 【word】

【Defense Slides】
Large-scale Trademark Image Retrieval System based on Vocabulary Tree 【pptx】

【Main Related Papers】
●　D. Nistér and H. Stewénius, “Scalable Recognition with a Vocabulary Tree,” Computer Vision and Pattern Recognition, Volume 2, pages 2161-2168, 2006. 【PDF】【website】
●　D.G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, Volume 60, Issue 2, pages 91-110, 2004. 【PDF】【website】

Author : sa092470@hotmail.com Wei-Ying Huang Advisor : jcliu@ncnu.edu.tw Jen-Chang Liu VIP Lab. @ CSIE NCNU