Jobs at mapegy (Technology Intelligence from Berlin)
Contact: Dr.-Ing. Peter Walde
Named Entity Matching for companies and persons
Entity matching (also referred to as duplicate identification, record linkage, entity
resolution or reference reconciliation) is a crucial task for data integration and data
cleaning in the process of information refinement. It aims to identify records that refer
to the same real-world entity.
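As a rough illustration of the idea, the following minimal Python sketch compares two company names after a simple normalization step. It is only an assumption-laden toy, not Mapegy's method: the legal-form list, the function names and the 0.85 threshold are hypothetical choices made for this example, and the similarity measure is the generic sequence ratio from Python's standard library rather than one of the measures discussed in the literature below.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal-form suffixes to ignore.
LEGAL_FORMS = {"gmbh", "ug", "ag", "inc", "ltd", "llc", "corp"}


def normalize(name: str) -> str:
    """Lowercase, drop dots and other punctuation, remove legal-form tokens."""
    text = name.lower().replace(".", "")
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [t for t in text.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as the same entity if similarity reaches the threshold.

    The threshold is an illustrative assumption and would need tuning
    against labeled data in a real setting.
    """
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    print(is_match("Mapegy UG", "mapegy U.G."))
    print(is_match("Mapegy UG", "Siemens AG"))
```

A real matcher for patent assignees and inventors would additionally need blocking to avoid comparing all pairs, as well as handling of transliterations, abbreviations and name changes.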
Entities considered are companies and persons in the patent database of the startup
Mapegy UG. The tasks of the theses are:
- surveying the state of the art of named entity matching,
- developing the technical framework for a well-performing named entity matching of
patent assignees and inventors respectively, and
- examining it empirically in a real business case.
For a quick overview refer to:
- Köpcke, Rahm (2009): “Frameworks for entity matching: A comparison”, Data Knowl. Eng.
- Moreau, Yvon, Cappé (2008): “Robust Similarity Measures for Named Entities
Matching”, Proceedings of the 22nd International Conference on Computational
Linguistics (Coling 2008), pages 593–600, Manchester.
Quality Assurance in model-driven process development
Quality Assurance (QA) is an important, omnipresent and continuous part of software
development. Mapegy develops complex algorithms and processes so as to find the
crucial facts from a vast amount of text data. To obtain reliable information, QA
becomes more important by the day.
The tasks of the theses are:
- surveying the state of the art of quality assurance,
- developing a technical prototype of an efficient QA process or
a qualified specification sheet for it,
- examining it empirically in a real business case.
For a quick overview refer to:
- www.rapid-i.com
- Petrasch, Meimberg (2006): “Model Driven Architecture”
- Fowler (2004): “UML konzentriert”
- Ludewig, Lichter (2007): “Software Engineering”
Optimization of distributed model-driven business intelligence processes
In a world of increasing complexity, rapid change and data deluge it is crucial to take
business decisions quickly and efficiently based on facts and figures. In order to find the
crucial facts in today's digital world, huge data sets have to be scanned and processed.
The tasks of the theses are:
- surveying the state of the art of model-driven architectures and distributed processes,
- (re)developing and distributing business intelligence processes,
- managing QA for these developed processes,
- examining it empirically in a real business case.
For a quick overview refer to:
- Sharifat, Reif, Kofler, Breuel (2010): “Pattern Recognition Engineering”, Proceedings of
RCOMM 2010
- Petrasch, Meimberg (2006): “Model Driven Architecture”
- Ludewig, Lichter (2007): “Software Engineering”
- Agarwal, Rudolph, Abecker (2008): “Semantic Description of Distributed Business
Processes”, Proceedings of the Fourth IEEE International Conference on Semantic
Computing.