About: https://medium.com/virtuoso-blog/what-is-small-data-and-why-is-it-important-fbf5f267884?source=rss----f99cc4bb2945---4

Not logged in : Login

(Sponging disallowed)

Facets (new session)
Description
Metadata
Settings
- Rule:
- Inverse Functional Properties:
- "Same As":

About: https://medium.com/virtuoso-blog/what-is-small-data-and-why-is-it-important-fbf5f267884?source=rss----f99cc4bb2945---4 Goto Sponge NotDistinct Permalink

An Entity of Type : schema:BlogPosting, within Data Space : www.openlinksw.com associated with source document(s)
QRcode icon

http://www.openlinksw.com/describe/?url=https%3A%2F%2Fmedium.com%2Fvirtuoso-blog%2Fwhat-is-small-data-and-why-is-it-important-fbf5f267884%3Fsource%3Drss----f99cc4bb2945---4

Small Data is an alternative to Big Data for addressing Volume, Velocity, and Variety challenges posed by Data in a Hyper-connected world.

Attributes	Values
description	Small Data is an alternative to Big Data for addressing Volume, Velocity, and Variety challenges posed by Data in a Hyper-connected world.
described by	https://www.openlinksw.com/about/id/entity/urn/webdev:www.news https://www.openlinksw.com/about/id/entity/http/www.openlinksw.com/DAV/oplweb3/data/urn-openlink-news-rss-posts.ttl proxy:entity/urn/openlink:community:forum:posts proxy:entity/urn/openlink:news:rss:posts proxy:entity/urn/webdev:www.news https://www.openlinksw.com/about/id/entity/urn/openlink:community:forum:posts https://www.openlinksw.com/about/id/entity/urn/openlink:news:rss:posts
articleBody	In recent times, a “Big Data” meme has emerged around three fundamental characteristics of data - Volume, Velocity, and Variety - all of which have been magnified by the hyper-connectivity unleashed by HTTP ubiquity as exemplified by the World Wide Web. In addition to the original three V's, it has become abundantly clear that Veracity and Vulnerability are equally important characteristics which take the V-count of data to five. These five V's have collectively triggered an important inflection regarding the nature of DBMS architecture and deployment topologies. Loose-coupling of Database Management Systems (DBMS) and Database Files - moving from the traditional one-to-one association between a DBMS and its propriety Database Document Type, to a one-to-many association between a single DBMS and a variety of proprietary and/or open-standard Database Document Types (e.g., CSV, ORC, Avro, and others) Extending Storage using Clusters of Database Files - where having a single Database Document File (sometimes striped across a number of disk controllers) as the focal point of storage has been typical, newer network-oriented filesystems (HDFS, Apache Spark, etc.) have emerged that enable the use of commodity hardware to construct vast clusters of Database Document Files that collectively increase data storage volume Mapping the Content of Database Document Files to Relations - which are then operated on declaratively by a given DBMS using query languages, e.g., the Structured Query Language (SQL) The net effect of all of this is that the most obvious Data Challenges posed by the five Vs are being addressed by more physical storage made available via the loose-coupling of DBMS application software and Database File storage. Unfortunately, this doesn't address the deeper issues at hand and doesn't offer real cost-savings (whether measured in energy consumption, environmental impact, or pure dollars) over the long haul. Small Data as a solution to the challenges posed by the 5Vs of Data “Small Data” introduces a new paradigm where puzzle-pieces are progressively discovered and applied to solve a puzzle. This approach leverages the power of Hyperdata (rather than Hypertext) - where hyperlinks function as “Super Keys” that offer a “deceptively simple” mechanism for modernizing the following: Data Definition - every entity is unambiguously identified using a hyperlink and described using a 3-tuple structure (as opposed to the N-Tuple structure of traditional tables) where each part has a clear role based on existing “parts of speech” principles, i.e., subject, predicate, and object Data Access and Flow - using identifiers that resolve to a description of whatever they identify ensures access and flow across traditional boundaries that underly all Data Silos (e.g., software applications, host operating systems, host machines, and networks) Data Manipulation - using declarative query languages (e.g., SQL, SPARQL, or a combination of these, among others) to operate on data that flows across traditional boundaries as part of the solution production pipeline, i.e., constants and variables in the query body are resolved on an as-needed basis Benefits? Cost Savings - thanks to ubiquitous support of HTTP, you can refer to data from a variety of applications and services, wherever it might have been stored, and avoid building out new clusters of database documents spread over commodity hardware as a response to data volume Performance & Scale -by leveraging innovations in cache-invalidation schemes and data compression, alongside the rapid increases in network bandwidth, to access and manipulate specific chunks of data Data Privacy and Security - by fusing data representation with the logic that underlies entity descriptions (courtesy of the “predicate role” in “parts of speech”) access to data is controlled declaratively via fine-grained attribute-based access controls Enhanced Agility - through a more flexible approach to data access and manipulation, avenues are opened up for addressing mission critical challenges such as skill discovery and management, knowledge base (or knowledge graph) construction, and cultural changes around data management that increase participation and allow easier and more privacy-aware monetization of data How does it work? Adherence to Linked Data Principles implies that Entities are Identified using hyperlinks (ideally, an HTTP URI due to the combined effects of resolvability and global ubiquity) and described using a 3-tuple structure comprising a Subject, Predicate, and Object. The above implies that I can identify myself using the following hyperlink (an HTTP URI): http://kidehen3.solid.openlinksw.com:8444/profile/card#me Any user agent can look up (or "dereference") what that hyperlink identifies. This is the effect experienced when you click on that link in your browser, or use it as the target of a cURL command as shown below: curl -i https://kidehen3.solid.openlinksw.com:8444/profile/card#me I can go much further, and construct actual subject→predicate→object sentences that collectively describe me, as the output of that cURL command demonstrates. @prefix solid: <http://www.w3.org/ns/solid/terms#> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix pim: <http://www.w3.org/ns/pim/space#> . @prefix schema: <http://schema.org/> . @prefix ldp: <http://www.w3.org/ns/ldp#> . @prefix acl: <http://www.w3.org/ns/auth/acl#> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix cert: <http://www.w3.org/ns/auth/cert#> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . @prefix : <#> . <> a foaf:PersonalProfileDocument ; foaf:maker :me ; foaf:primaryTopic :me . :me a foaf:Person ; a schema:Person ; foaf:name "Kingsley Idehen" ; solid:account </> ; # link to the account uri solid:oidcIssuer <https://solid.openlinksw.com:8444> ; pim:storage </> ; # root storage ldp:inbox </inbox/> ; pim:preferencesFile </settings/prefs.ttl> ; # private settings/preferences solid:publicTypeIndex </settings/publicTypeIndex.ttl> ; solid:privateTypeIndex </settings/privateTypeIndex.ttl> . This basic principle also applies to Folders and the Files they contain, meaning that you have a massive collection of puzzle-pieces that describe a variety of things while also being amenable to progressive retrieval and association. In other words, puzzle-pieces come together on request, in response to specific actions like producing a solution to a query expressed in SPARQL. Here's an example of a SPARQL Query (leveraging a Virtuoso pragma that conditionally invokes its built-in Crawler) that demonstrates this powerful feature based on the content of various combinations of Files and Folders. ## Invokes the Virtuoso Sponger Middleware DEFINE input:grab-all "yes" SELECT DISTINCT ?o WHERE { # SPARQL Query Body comprising URI-variables # and URI-constants { <https://kidehen3.solid.openlinksw.com:8444/public/> ldp:contains ?o . OPTIONAL { ?o owl:sameAs ?o2 } } UNION { <https://melvin.solid.live/public/> ldp:contains ?o . OPTIONAL { ?o owl:sameAs ?o2 } } UNION { <https://drive.verborgh.org/public/RWWCrewQA/> ldp:contains ?o . OPTIONAL { ?o owl:sameAs ?o2 } } UNION { <https://timbl.com/timbl/Public/Test/2018/> ldp:contains ?o . OPTIONAL {?o owl:sameAs ?o2} } } Query Results Page (click on the live link to see results) We can go much further as shown by the live demonstrations in the posts listed below: Bookmarks Collection Activity Stream comprising Posts that I've indicated I like Other tools and frameworks that already understand the Small Data pattern include - Tabulator - now the default Data Browser for Solid Semantic Web Client Library Conclusion Small Data is Big Data that's Accessible (via Hyperlinks as Entity Identifiers) , Understandable (via Subject, Predicate, Object sentences that produce Hyperdata), and Actionable (REST-ful interactions with Actions that manifest as Hyperdata). The net effect of this approach to data access, integration, and exploitation enables the construction of Knowledge Graphs that manifest progressively as a Semantic Web of Linked Data deployed across public and/or private networks (a/k/a Hybrid Clouds). Related Understanding Data What is Small Data - YouTube Video by Riccardo Osti 5-Pillars of Master Data - Presentation by Scott Taylor Simple Linked Data Deployment Tutorial What is the Virtuoso Sponger Middleware about, and why is it important? What is a Virtuoso SPARQL Endpoint, and why is it important? What is a SPARQL Endpoint, and why is it important? How Do You Know You Need a Multi-Model RDBMS? Virtuoso Home Page Free Virtuoso Evaluation and Download Page - for Windows, Linux, and macOS Free Virtuoso Evaluation License for Windows Free Virtuoso Evaluation License for Linux Free Virtuoso Evaluation License of macOS Current Entry-Level License Offers for Virtuoso across Linux, Windows, and macOS Pay-As-You-Go (PAGO) Virtuoso AMI in the Amazon Web Services (AWS) Cloud Virtuoso 8.x (Closed Source) Docker Container Virtuoso 7.x (Open Source) Docker Container <a href="https://medium.com/media/8b41ae27e74bc74c0b566370de6c4d30/href">https://medium.com/media/8b41ae27e74bc74c0b566370de6c4d30/href</a> What is Small Data, and why is it Important? was originally published in OpenLink Virtuoso Weblog on Medium, where people are continuing the conversation by highlighting and responding to this story.
title	What is Small Data, and why is it Important?
datePublished	2019-04-11T22:09:48Z
type	BlogPosting
is topic of	urn:webdev:www.news http://www.openlinksw.com/DAV/oplweb3/data/urn-openlink-news-rss-posts.ttl urn:openlink:community:forum:posts urn:openlink:news:rss:posts
is blogPost of	https://medium.com/feed/virtuoso-blog https://linkeddata.uriburner.com/about/id/entity/https/medium.com/feed/virtuoso-blog

Faceted Search & Find service v1.17_git122 as of Jan 03 2023

Alternative Linked Data Documents: iSPARQL | ODE Content Formats:

RDF

ODATA

Microdata

About

OpenLink Virtuoso version 08.03.3330 as of Apr 5 2024, on Linux (x86_64-generic-linux-glibc25), Single-Server Edition (30 GB total memory, 26 GB memory in use)
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2024 OpenLink Software

About: https://medium.com/virtuoso-blog/what-is-small-data-and-why-is-it-important-fbf5f267884?source=rss----f99cc4bb2945---4 Goto Sponge NotDistinct Permalink

Small Data as a solution to the challenges posed by the 5Vs of Data

Benefits?

How does it work?

Conclusion

Related