Virtuoso's Sponger Middleware layer generates Linked Data from a variety of disparate web content (e.g., HTML, iCal, and RDF documents and statements).
Objective
This article demonstrates how to use Virtuoso for extracting, transforming, and loading (ETL) recipe data from BBC Good Food.
The final product of this tutorial can also be viewed using the live links provided throughout.
Requirements
Virtuoso Universal Server (Commercial/Open Source)
Linked Data Cartridges VAD
Optional: OpenLink Structured Data Sniffer (Chrome, Firefox)
Observing Linked Data Embedded into Recipe Pages
1. Open a recipe web page from BBC Good Food (Example)
2. If you have the OpenLink Structured Data Sniffer installed (Chrome, Firefox), click on the plug-in logo to view the human-friendly version of embedded structured data within the web page. This data is what will be extracted from each recipe web page.
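For readers without the Sniffer installed, here is a rough sketch of the same idea in Python. It assumes the page embeds its metadata as JSON-LD inside `<script type="application/ld+json">` blocks, which is only one possibility (a page may use Microdata or RDFa instead). The sample HTML, the recipe values, and the names `JsonLdExtractor` and `extract_jsonld` are made up for illustration and are not taken from a real BBC Good Food page.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Start buffering only when the script is declared as JSON-LD.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return a list of JSON-LD documents found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical page content, standing in for a real recipe page.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Example soup"}
</script>
</head><body><h1>Example soup</h1></body></html>
"""

recipes = extract_jsonld(SAMPLE_HTML)
print(recipes[0]["@type"], recipes[0]["name"])
```

This only surfaces the raw embedded data; the Sponger goes further by transforming it into Linked Data and loading it into Virtuoso.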
The populated results for each category can be loaded into your Virtuoso instance using the following SPARQL query via the built-in SPARQL interface at http://{cname}:port/sparql:
DEFINE get:soft "replace" DEFINE input:grab-var "?recipePage" DEFINE input:grab-depth 1
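These pragmas sit in front of an ordinary SPARQL query body. If you would rather send the request from a script than paste it into the web form, the sketch below shows one way the request URL could be assembled. The `SELECT` body is a placeholder and not the query used in this article, the helper name `build_sponger_url` is hypothetical, and `http://localhost:8890/sparql` assumes a local instance on Virtuoso's default HTTP port.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Sponger pragmas from the article; they must precede the query body.
PRAGMAS = [
    'DEFINE get:soft "replace"',
    'DEFINE input:grab-var "?recipePage"',
    'DEFINE input:grab-depth 1',
]


def build_sponger_url(endpoint, query_body):
    """Prefix the query body with the pragmas and URL-encode it for a GET request."""
    query = "\n".join(PRAGMAS + [query_body])
    return endpoint + "?" + urlencode({"query": query})


# Placeholder body (assumption): substitute the real query that binds ?recipePage.
EXAMPLE_BODY = "SELECT ?recipePage WHERE { ?recipePage ?p ?o } LIMIT 10"

# Assumed endpoint: a local Virtuoso instance on its default HTTP port.
url = build_sponger_url("http://localhost:8890/sparql", EXAMPLE_BODY)

# Decode the URL again to confirm the query survived encoding intact.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded)
```

The resulting URL can be opened in a browser or fetched with any HTTP client. Keeping the pragmas in a list makes it easy to adjust one of them, such as the grab depth, without touching the query body.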
We can also view a visual representation of the extracted recipe metadata through Virtuoso's built-in Faceted Browsing interface, by clicking on the result hyperlinks in the recipe column.
Finally, you can also use the same query to generate a PivotViewer Report that adds image processing and animated drill-down to the experience, as illustrated in the screenshots that follow.
Next Steps
Now that the data has been loaded, you can begin querying the newly loaded recipes in depth [Article to be released later this week].
Bonus
I'll repeat this to extract the entire BBC Good Food recipe set, and share the script if this post receives 100 claps 👏🏾