Difference between Dataset, Model, Graph, etc. in Jena
Note: I have taken this answer from this link. I hope to add some other relevant concepts to it.
- A Dataset is a collection of Models (one being the Default Model, any others being Named Models) that you expect will have new triples added to it over time. You can read from and write to Datasets.
- A Model is a collection of statements - this is what you typically aim your SPARQL queries at. If you run SPARQL against a Dataset without a GRAPH pattern (and without FROM / FROM NAMED clauses), you're querying the Default Model.
- A Graph is a collection of triples. Every Model can be turned into a Graph, to provide a somewhat closer representation of the RDF, OWL, and SPARQL standards.
- A DatasetGraph is a container for Graphs that provides the infrastructure for Default and Named Graphs.
- Some people prefer the DatasetGraph / Graph representation, which gives you a different suite of methods to call. It's really a matter of preference - both the Model and Graph approaches will get the job done, though it seems to me that the Dataset / Model approach is a bit higher-level and more user-friendly.
- A typical workflow for data analysis would have you identify a DataSource, like DBpedia. You'd define a bunch of different Datasets by querying DBpedia with CONSTRUCT statements. Now, you have static snapshots that you can use for your analysis work. In many Datasets, you'll just have one model, the default model. However, sometimes you want the added complexity of Named Models, in which case your Datasets will have a few Models in each.
- If you're doing analysis, you usually want to set up DataSources (as proxies) for each repository you want to build your Datasets from. You'll be in charge of persisting your Datasets, and can decide when to refresh them with new data by re-querying your sources (if you even want to refresh your data). The persistence of Models, then, comes naturally as a result of the Dataset persistence.
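To make the relationship between these types more concrete, here is a minimal sketch using Apache Jena's in-memory implementations. It is only an illustration under some assumptions: the `http://example.org/` namespace, the graph name, and the class and method names (`JenaConcepts`, `buildSource`, `snapshot`, `buildDataset`, `countRows`) are all made up for this example, and a small in-memory Model stands in for a remote source such as DBpedia so that the CONSTRUCT step runs locally.

```java
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.sparql.core.DatasetGraph;

public class JenaConcepts {
    // Hypothetical namespace and graph name, used only for illustration.
    static final String NS = "http://example.org/";
    static final String SNAPSHOT_GRAPH = NS + "graphs/names";

    /** Assumption: a small in-memory Model standing in for a remote source. */
    static Model buildSource() {
        Model source = ModelFactory.createDefaultModel();
        Resource alice = source.createResource(NS + "alice");
        Resource bob = source.createResource(NS + "bob");
        alice.addProperty(source.createProperty(NS + "knows"), bob);
        alice.addProperty(source.createProperty(NS + "name"), "Alice");
        bob.addProperty(source.createProperty(NS + "name"), "Bob");
        return source;
    }

    /** Takes a static snapshot of the source with a CONSTRUCT query. */
    static Model snapshot(Model source, String constructQuery) {
        try (QueryExecution qexec = QueryExecutionFactory.create(constructQuery, source)) {
            return qexec.execConstruct();
        }
    }

    /** Builds a Dataset with a Default Model and one Named Model. */
    static Dataset buildDataset(Model source) {
        Dataset dataset = DatasetFactory.create();
        // Default Model: the "knows" statements.
        dataset.getDefaultModel().add(snapshot(source,
            "CONSTRUCT { ?s <" + NS + "knows> ?o } WHERE { ?s <" + NS + "knows> ?o }"));
        // Named Model: the "name" statements.
        dataset.addNamedModel(SNAPSHOT_GRAPH, snapshot(source,
            "CONSTRUCT { ?s <" + NS + "name> ?o } WHERE { ?s <" + NS + "name> ?o }"));
        return dataset;
    }

    /** Runs a SELECT query against the Dataset and counts the result rows. */
    static int countRows(Dataset dataset, String selectQuery) {
        try (QueryExecution qexec = QueryExecutionFactory.create(selectQuery, dataset)) {
            ResultSet results = qexec.execSelect();
            int rows = 0;
            while (results.hasNext()) {
                results.next();
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        Dataset dataset = buildDataset(buildSource());

        // Model view: no GRAPH pattern, so only the Default Model is matched.
        System.out.println("default rows: "
            + countRows(dataset, "SELECT * WHERE { ?s ?p ?o }"));
        // A GRAPH pattern targets the Named Model instead.
        System.out.println("named rows: "
            + countRows(dataset, "SELECT * WHERE { GRAPH <" + SNAPSHOT_GRAPH + "> { ?s ?p ?o } }"));

        // Graph view of the same data: Model -> Graph, Dataset -> DatasetGraph.
        Graph defaultGraph = dataset.getDefaultModel().getGraph();
        DatasetGraph dsg = dataset.asDatasetGraph();
        Graph namedGraph = dsg.getGraph(NodeFactory.createURI(SNAPSHOT_GRAPH));
        System.out.println("default graph size: " + defaultGraph.size());
        System.out.println("named graph size: " + namedGraph.size());
    }
}
```

The Model and Graph views here wrap the same underlying triples, so which one you use is mostly a question of which set of methods you find more convenient. For a real remote source you would run the CONSTRUCT query against the endpoint rather than a local Model.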
Graphically, some of the above concepts can be depicted as in the diagram below:
Other relevant stuff can be found at the following links:
1. Craig Trim: Working with DataSets using Jena