How it works – TopicZoom GmbH


TopicZoom uses a large semantic network to recognize topics in texts. The nodes of this network represent “concepts” such as topics, locations, time periods, persons, enterprises and organizations, and events. Each node has a preferred standard name, which is always used to display the concept. In addition, the language variants of a concept are stored with its node. During text analysis, these variants are used to recognize all mentions of the concept in an input document. Mapping the variants of a concept name to a single preferred name normalizes and standardizes language expressions. Many concept names in the TopicZoom network are multi-word expressions such as “New York State Opera”, “Angela Merkel”, or “French wine”. As a result, TopicZoom semantic text analysis relies not on single words but on real concept names.
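The variant-to-preferred-name mapping described above can be illustrated with a minimal sketch. This is not TopicZoom's implementation: the tiny lexicon, the `recognize_concepts` function, and the longest-match strategy are all assumptions chosen to show how multi-word variants might be normalized to one display name.

```python
import re

# Hypothetical mini-lexicon (an assumption for illustration):
# lowercased language variant -> preferred standard name of the concept.
VARIANTS = {
    "angela merkel": "Angela Merkel",
    "chancellor merkel": "Angela Merkel",
    "merkel": "Angela Merkel",
    "french wine": "French wine",
    "wine from france": "French wine",
}

# Longest variant, counted in words, bounds the lookup window.
MAX_WORDS = max(len(variant.split()) for variant in VARIANTS)


def recognize_concepts(text):
    """Return the preferred names of all concepts mentioned in text, in order."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word names
        # take precedence over their single-word parts.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in VARIANTS:
                hits.append(VARIANTS[candidate])
                i += n
                break
        else:
            i += 1  # no variant starts at this token
    return hits


print(recognize_concepts("Chancellor Merkel praised wine from France; Merkel smiled."))
```

All three mentions are mapped to preferred names, so the two different surface forms of “Angela Merkel” count as hits for the same node.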

The TopicZoom semantic net is organized as a hierarchical graph. The topmost nodes represent general fields, such as politics, sports, or economy. The children of a node represent its major subfields. In our mental representation of the world, spatial notions such as “fields” and “subfields” help to organize the relationship between general thematic areas and specific topics. TopicZoom applies the same ordering principle not only to thematic fields, but also to geographic areas and time periods. Following the links of the net downward leads from general fields to more and more specific fields, and from large regions or periods to smaller subregions or subperiods. Looking upward from a given node (for example “Economic policy”), we typically find several parent nodes (“Economy”, “Politics”).
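Because a node can have several parents, the net is a directed graph rather than a strict tree. The sketch below shows one way such a structure could be stored and walked upward. The `PARENTS` table and the `ancestors` helper are hypothetical, and the specific links are assumptions built from the node names used as examples on this page.

```python
# Hypothetical fragment of the net: node -> list of direct parent nodes.
# The exact links are assumptions for illustration only.
PARENTS = {
    "Economy": [],
    "Politics": [],
    "United States of America": [],
    "Economic policy": ["Economy", "Politics"],
    "U.S. politics": ["Politics", "United States of America"],
    "President Obama": ["U.S. politics"],
}


def ancestors(node, parents=PARENTS):
    """Return every more general node reachable by following parent links."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # a node may be reachable via several paths
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("Economic policy")))
print(sorted(ancestors("President Obama")))
```

Storing only the parent links keeps the upward walk simple. A downward index (node to children) could be derived from the same table if needed.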

During text analysis, when a hit is found for a concept (a node), all more general fields also receive a score. For example, if “President Obama” is found in the text, then “U.S. politics” and, further up, “United States of America” and “Politics” also receive a higher count. The thematic profile of an input document accumulates the scores from all hits found in the text. In this way, a topic is recognized when its subtopics are mentioned, even if the topic itself is never named. The final TopicZoom ranking mechanism adds a second scoring factor, which ensures that general fields are not preferred and that the “eye-catching” topics of the text rank best. For a given input text, the scores computed for the topics serve as semantic tags that represent subject metadata. These subject metadata can be used not only for simple text classification tasks; they also provide a basis for precise thematic and subject-oriented search over a complete collection, far beyond conventional keyword matching.
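The upward accumulation of scores can be sketched as follows. The page does not specify the second scoring factor, so the depth-based weight used in `rank` is purely an assumption, picked only to show how a correction term can keep general fields from dominating. The graph fragment and all function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical fragment of the net: node -> direct parents (assumed links).
PARENTS = {
    "Economy": [],
    "Politics": [],
    "United States of America": [],
    "Economic policy": ["Economy", "Politics"],
    "U.S. politics": ["Politics", "United States of America"],
    "President Obama": ["U.S. politics"],
}


def ancestors(node):
    """All more general nodes reachable from node via parent links."""
    seen, stack = set(), list(PARENTS.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def accumulate(hits):
    """Each hit scores its own node and every more general node once."""
    raw = Counter()
    for concept in hits:
        raw[concept] += 1
        for general in ancestors(concept):
            raw[general] += 1
    return raw


def depth(node):
    """Length of the longest path from node up to a topmost node."""
    parents = PARENTS.get(node, [])
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def rank(hits):
    """Order concepts by raw score times an ASSUMED specificity weight."""
    raw = accumulate(hits)
    weighted = {c: score * (1 + depth(c)) for c, score in raw.items()}
    return sorted(weighted, key=lambda c: (-weighted[c], c))


hits = ["President Obama", "President Obama", "Economic policy"]
print(accumulate(hits))
print(rank(hits))
```

With raw counts alone, “Politics” collects the most score because every hit propagates into it. The assumed weight moves the more specific “U.S. politics” ahead of it, which mirrors the stated goal of not preferring general fields.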
