Using OpenRefine Review

[The publisher of this book provided me with a free e-copy for review]

Link to the book: http://www.packtpub.com/openrefine-guide-for-data-analysis-and-linking-dataset-to-the-web/book

I first became familiar with OpenRefine when it was still Google Refine. Since then the project has evolved and opened up to community contributions. This book is a nice cookbook, easy to follow and written in friendly language, that shows how to perform both simple and complex operations on semi-structured data such as HTML tables, spreadsheets, and CSV files, and how to exploit the linkability of your datasets with Linked Open Data (LOD). Even with no previous knowledge, any user can pick up this book and start using OpenRefine from scratch.

Organization:

The book is divided into four chapters plus an appendix.

Chapter 1: Presents the first set of easy-to-follow recipes to get you hands-on with loading and preparing your data.

Chapter 2: The main contribution is showing how to sort data and create facets to select (or isolate) rows based on regular expressions over the cell values, with the main goal of fixing the datasets.
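To give a feel for what faceting and regex-based selection do, here is a minimal Python sketch of the idea. It is not OpenRefine code and not taken from the book: the sample rows, the "city" column, and the helper names text_facet and regex_filter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample rows; the "city" column is an assumption for illustration.
rows = [
    {"city": "London"},
    {"city": "london "},
    {"city": "Paris"},
    {"city": "London"},
    {"city": "N/A"},
]

def text_facet(rows, column):
    """Count how often each distinct value appears in a column."""
    return Counter(row[column] for row in rows)

def regex_filter(rows, column, pattern):
    """Keep only the rows whose cell value matches the regular expression."""
    compiled = re.compile(pattern)
    return [row for row in rows if compiled.search(row[column])]

# The facet exposes near-duplicate spellings of the same value.
facet = text_facet(rows, "city")

# Isolate suspicious cells: leading/trailing whitespace or a lowercase first letter.
suspect = regex_filter(rows, "city", r"^\s|\s$|^[a-z]")
```

Once the suspicious rows are isolated like this, they can be fixed without touching the rest of the dataset, which is roughly the workflow the chapter walks through inside the tool.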

Chapter 3: Shows how to handle more advanced operations on your data and gives a brief introduction to GREL, the language defined to manipulate cell values. GREL is simple yet powerful enough to match and replace cell content for a variety of purposes, which is why the book includes an appendix with a deeper explanation.
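As a rough analogy for the kind of per-cell transformation GREL is used for, the following Python sketch trims, collapses whitespace in, and title-cases each value. It is only an illustration in Python, not GREL syntax and not an example from the book; the function name normalize_cell and the sample cells are assumptions.

```python
import re

def normalize_cell(value):
    """Trim the value, collapse runs of whitespace, and title-case the result."""
    collapsed = re.sub(r"\s+", " ", value.strip())
    return collapsed.title()

# Hypothetical messy cell values.
cells = ["  new   york ", "LONDON", "paris"]
cleaned = [normalize_cell(c) for c in cells]
```

In OpenRefine the equivalent expression is applied to every cell of a column at once, so a single short transformation can clean thousands of values.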

Chapter 4: Once all the data has been normalized, it can be reconciled with an external knowledge base or linked open dataset such as Freebase (or whatever knowledge base is appropriate for the data at hand) to enrich the semantics of the data.

What I like about this book:

The cookbook format and the easy reading are the most positive aspects of this book. The authors did a good job of explaining all the use cases with a real data example.

What I dislike about this book:

The things I dislike are minor and concern the book's format. Numbered figures and sections would make cross-referencing effortless. That said, the figures are self-explanatory and well matched to the accompanying explanations. Also, for easier referencing, it would be nice to number the recipes consecutively across the book rather than restarting the numbering in each chapter.

To wrap up, I strongly recommend this book to anyone interested in the Semantic Web and Linked Data, Linked Open Data publication, data integration, or named entity recognition, whether as a student, lecturer, practitioner, or hobbyist.


Emir Muñoz
