Class: RDF::RDFa::Reader
- Inherits:
- RDF::Reader (inheritance chain: Object, RDF::Reader, RDF::RDFa::Reader)
- Includes:
- Expansion, Util::Logger
- Defined in:
- vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb,
vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader/rexml.rb,
vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader/nokogiri.rb
Overview
An RDFa parser in Ruby
This class uses Nokogiri or REXML for HTML and XML processing. It
automatically selects Nokogiri when it is available and falls back to
REXML otherwise. If need be, you can explicitly override the
implementation by passing a :library option (:nokogiri or :rexml) to
Reader.new or Reader.open.
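The constructor source further down this page falls back to REXML when Nokogiri is not loaded and rejects any other :library value. A minimal sketch of that rule follows; `select_library` and its `nokogiri_loaded` keyword are hypothetical names used only for illustration, not part of the library.

```ruby
# Hypothetical helper mirroring the selection rule in the constructor.
# `nokogiri_loaded` stands in for the reader's `defined?(::Nokogiri)` check.
def select_library(requested, nokogiri_loaded: !!defined?(::Nokogiri))
  case requested
  when nil
    # No explicit choice: prefer Nokogiri when it is loaded, else REXML.
    nokogiri_loaded ? :nokogiri : :rexml
  when :nokogiri, :rexml
    # An explicit, supported choice is used as given.
    requested
  else
    raise ArgumentError, "expected :rexml or :nokogiri, but got #{requested.inspect}"
  end
end

select_library(nil, nokogiri_loaded: false)    # => :rexml
select_library(:nokogiri, nokogiri_loaded: false) # => :nokogiri
```

With the real reader, the equivalent choice would be made by passing `library: :rexml` or `library: :nokogiri` to Reader.new or Reader.open.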
Based on processing rules described here:
Direct Known Subclasses
Defined Under Namespace
Constant Summary collapse
- XHTML =
"http://www.w3.org/1999/xhtml"
- SafeCURIEorCURIEorIRI =
Content model for @about and @resource. In RDFa 1.0, this was URIorSafeCURIE
{ :"rdfa1.0" => [:safe_curie, :uri, :bnode], :"rdfa1.1" => [:safe_curie, :curie, :uri, :bnode], }
- TERMorCURIEorAbsIRI =
Content model for @datatype. In RDFa 1.0, this was CURIE. The plural form, TERMorCURIEorAbsIRIs, is the content model for @rel, @rev, @property and @typeof.
{ :"rdfa1.0" => [:term, :curie], :"rdfa1.1" => [:term, :curie, :absuri], }
- NC_REGEXP =
This expression matches an NCName as defined in XML-NAMES
Regexp.new( %{^ ( [a-zA-Z_] | \\\\u[0-9a-fA-F]{4} ) ( [0-9a-zA-Z_\.-/] | \\\\u([0-9a-fA-F]{4}) )* $}, Regexp::EXTENDED)
- TERM_REGEXP =
This expression matches a term as defined in RDFA-CORE
For the avoidance of doubt, this definition means a 'term' in RDFa is an XML NCName that also permits slash as a non-leading character.
Regexp.new( %{^ (?!\\\\u0301) # ́ is a non-spacing acute accent. # It is legal within an XML Name, but not as the first character. ( [a-zA-Z_] | \\\\u[0-9a-fA-F]{4} ) ( [-0-9a-zA-Z_\.\/] | \\\\u([0-9a-fA-F]{4}) )* $}, Regexp::EXTENDED)
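To make the "NCName that also permits slash as a non-leading character" rule concrete, here is a simplified sketch. `ASCII_TERM` is a hypothetical, ASCII-only approximation: it leaves out the `\uXXXX` escape alternatives of the real TERM_REGEXP, so it should not be read as the library's definition.

```ruby
# Simplified, ASCII-only approximation of TERM_REGEXP (hypothetical name).
# First character: letter or underscore. Later characters may also be
# digits, '-', '.', or '/'.
ASCII_TERM = /\A[a-zA-Z_][-0-9a-zA-Z_.\/]*\z/

%w(license next/page _private 1st /leading).each do |candidate|
  verdict = ASCII_TERM.match?(candidate) ? "term" : "not a term"
  puts "#{candidate}: #{verdict}"
end
```

Under this approximation, `license`, `next/page` and `_private` are accepted, while `1st` and `/leading` are rejected because of their first character.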
Constants included from Util::Logger
Instance Attribute Summary collapse
-
#host_language ⇒ :xml, ...
readonly
Host language.
-
#implementation ⇒ Module
Returns the XML implementation module for this reader instance.
-
#repository ⇒ RDF::Repository
readonly
Repository used for collecting triples.
-
#version ⇒ :"rdfa1.0", :"rdfa1.1"
readonly
Version.
Attributes inherited from RDF::Reader
Attributes included from Enumerable
Class Method Summary collapse
-
.options ⇒ Object
RDFa Reader options.
Instance Method Summary collapse
-
#each_statement {|statement| ... }
Iterates the given block for each RDF statement in the input.
-
#each_triple {|subject, predicate, object| ... }
Iterates the given block for each RDF triple in the input.
-
#extract_script(el, input, type, **options, &block) ⇒ Object
Extracts RDF from a script element, or embedded RDF/XML.
-
#initialize(input = $stdin, **options) {|reader| ... } ⇒ reader
constructor
Initializes the RDFa reader instance.
Methods included from Util::Logger
#log_debug, #log_depth, #log_error, #log_fatal, #log_info, #log_recover, #log_recovering?, #log_statistics, #log_warn, #logger
Methods included from Expansion
#copy_properties, #expand, #rule
Methods inherited from RDF::Reader
#base_uri, #canonicalize?, #close, each, #each_pg_statement, #encoding, #fail_object, #fail_predicate, #fail_subject, for, format, #intern?, #lineno, open, #prefix, #prefixes, #prefixes=, #read_statement, #read_triple, #rewind, #to_sym, to_sym, #valid?, #validate?
Methods included from Util::Aliasing::LateBound
Methods included from Enumerable
add_entailment, #canonicalize, #canonicalize!, #dump, #each_graph, #each_object, #each_predicate, #each_quad, #each_subject, #each_term, #entail, #enum_graph, #enum_object, #enum_predicate, #enum_quad, #enum_statement, #enum_subject, #enum_term, #enum_triple, #graph?, #graph_names, #invalid?, #method_missing, #object?, #objects, #predicate?, #predicates, #project_graph, #quad?, #quads, #respond_to_missing?, #statement?, #statements, #subject?, #subjects, #supports?, #term?, #terms, #to_a, #to_h, #to_set, #triple?, #triples, #valid?, #validate!
Methods included from Isomorphic
#bijection_to, #isomorphic_with?
Methods included from Countable
Methods included from RDF::Readable
Constructor Details
#initialize(input = $stdin, **options) {|reader| ... } ⇒ reader
Initializes the RDFa reader instance.
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 306

def initialize(input = $stdin, **options, &block)
  super do
    @options = {reference_folding: true}.merge(@options)
    @repository = RDF::Repository.new

    @options[:rdfagraph] = case @options[:rdfagraph]
    when 'all' then [:output, :processor]
    when String, Symbol then @options[:rdfagraph].to_s.split(',').map(&:strip).map(&:to_sym)
    when Array then @options[:rdfagraph].map {|o| o.to_s.to_sym}
    else []
    end.select {|o| [:output, :processor].include?(o)}
    @options[:rdfagraph] << :output if @options[:rdfagraph].empty?

    @library = case options[:library]
      when nil
        # Use Nokogiri when available, and REXML otherwise:
        defined?(::Nokogiri) ? :nokogiri : :rexml
      when :nokogiri, :rexml
        options[:library]
      else
        raise ArgumentError.new("expected :rexml or :nokogiri, but got #{options[:library].inspect}")
    end

    require "rdf/rdfa/reader/#{@library}"
    @implementation = case @library
      when :nokogiri then Nokogiri
      when :rexml then REXML
    end
    self.extend(@implementation)

    detect_host_language_version(input, **options)

    add_info(@doc, "version = #{@version}, host_language = #{@host_language}, library = #{@library}, rdfagraph = #{@options[:rdfagraph].inspect}, expand = #{@options[:vocab_expansion]}")

    begin
      initialize_xml(input, **options)
    rescue
      add_error(nil, "Malformed document: #{$!.message}")
    end
    add_error(nil, "Empty document") if root.nil?
    add_error(nil, doc_errors.map(&:message).uniq.join("\n")) if !doc_errors.empty?

    # Section 4.2 RDFa Host Language Conformance
    #
    # The Host Language may require the automatic inclusion of one or more Initial Contexts
    @host_defaults = {
      vocabulary: nil,
      uri_mappings: {},
      initial_contexts: [],
    }

    if @version == :"rdfa1.0"
      # Add default term mappings
      @host_defaults[:term_mappings] = %w(
        alternate appendix bookmark cite chapter contents copyright first glossary help icon index
        last license meta next p3pv1 prev role section stylesheet subsection start top up
      ).inject({}) { |hash, term| hash[term] = RDF::URI("http://www.w3.org/1999/xhtml/vocab#") + term; hash }
    end

    case @host_language
    when :xml, :svg
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT]
    when :xhtml1
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT, XHTML_RDFA_CONTEXT]
    when :xhtml5, :html4, :html5
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT, HTML_RDFA_CONTEXT]
    end

    block.call(self) if block_given?
  end
end
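The constructor accepts :rdfagraph in several shapes (the string 'all', a comma-separated string, a symbol, or an array) and reduces it to a list containing :output and/or :processor. The sketch below isolates that step; `normalize_rdfagraph` is a hypothetical standalone function written for illustration, not a method of the reader.

```ruby
# Hypothetical standalone version of the :rdfagraph normalization
# performed in the constructor.
def normalize_rdfagraph(value)
  graphs = case value
           when 'all' then [:output, :processor]
           when String, Symbol then value.to_s.split(',').map(&:strip).map(&:to_sym)
           when Array then value.map { |o| o.to_s.to_sym }
           else []
           end.select { |o| [:output, :processor].include?(o) }
  # Fall back to the output graph when nothing valid was requested.
  graphs << :output if graphs.empty?
  graphs
end

normalize_rdfagraph('all')                # => [:output, :processor]
normalize_rdfagraph('processor, output')  # => [:processor, :output]
normalize_rdfagraph(nil)                  # => [:output]
```

Read this way, any value outside :output and :processor is filtered out, so an unrecognized name ends up selecting only the output graph.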
Dynamic Method Handling
This class handles dynamic methods through the method_missing method in the class RDF::Enumerable
Instance Attribute Details
#host_language ⇒ :xml, ... (readonly)
Host language
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 85

def host_language
  @host_language
end
#implementation ⇒ Module
Returns the XML implementation module for this reader instance.
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 101

def implementation
  @implementation
end
#repository ⇒ RDF::Repository (readonly)
Repository used for collecting triples.
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 95

def repository
  @repository
end
#version ⇒ :"rdfa1.0", :"rdfa1.1" (readonly)
Version
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 90

def version
  @version
end
Class Method Details
.options ⇒ Object
RDFa Reader options
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 249

def self.options
  super + [
    RDF::CLI::Option.new(
      symbol: :vocab_expansion,
      datatype: TrueClass,
      default: false,
      control: :checkbox,
      on: ["--vocab-expansion"],
      description: "Perform OWL2 expansion on the resulting graph.") {true},
    RDF::CLI::Option.new(
      symbol: :host_language,
      datatype: %w(xml xhtml1 xhtml5 html4 html5 svg),
      default: :html5,
      control: :select,
      on: ["--host-language HOSTLANG", %w(xml xhtml1 xhtml5 html4 html5 svg)],
      description: "Host Language. One of xml, xhtml1, xhtml5, html4, or svg") do |arg|
        arg.to_sym
    end,
    RDF::CLI::Option.new(
      symbol: :rdfagraph,
      datatype: %w(output processor both),
      default: :output,
      control: :select,
      on: ["--rdfagraph RDFAGRAPH", %w(output processor both)],
      description: "Used to indicate if either or both of the :output or :processor graphs are output.") {|arg| arg.to_sym},
  ]
end
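The `on:` arrays in these option definitions have the shape of arguments to Ruby's standard OptionParser#on (a switch string followed by a list of permitted values). Assuming they are handed to OptionParser more or less unchanged, which is an assumption about RDF::CLI rather than something this page states, the three flags would behave roughly like this sketch. `parse_rdfa_flags`, `HOST_LANGUAGES` and `RDFA_GRAPHS` are hypothetical names.

```ruby
require 'optparse'

# Permitted values, copied from the option definitions on this page.
HOST_LANGUAGES = %w(xml xhtml1 xhtml5 html4 html5 svg).freeze
RDFA_GRAPHS = %w(output processor both).freeze

# Hypothetical helper: parse the three RDFa flags into a plain hash,
# starting from the defaults given in the option definitions.
def parse_rdfa_flags(argv)
  opts = { vocab_expansion: false, host_language: :html5, rdfagraph: :output }
  parser = OptionParser.new
  parser.on("--vocab-expansion") { opts[:vocab_expansion] = true }
  parser.on("--host-language HOSTLANG", HOST_LANGUAGES) { |arg| opts[:host_language] = arg.to_sym }
  parser.on("--rdfagraph RDFAGRAPH", RDFA_GRAPHS) { |arg| opts[:rdfagraph] = arg.to_sym }
  parser.parse(argv) # raises an OptionParser::ParseError subclass on a value outside the list
  opts
end

p parse_rdfa_flags(%w(--host-language svg --vocab-expansion))
```

Each block converts the matched string to a symbol, matching the `arg.to_sym` conversions in the definitions.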
Instance Method Details
#each_statement {|statement| ... }
This method returns an undefined value.
Iterates the given block for each RDF statement in the input.
Reads to graph and performs expansion if required.
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 418

def each_statement(&block)
  if block_given?
    unless @processed || @root.nil?
      # Add prefix definitions from host defaults
      @host_defaults[:uri_mappings].each_pair do |prefix, value|
        prefix(prefix, value)
      end

      # parse
      parse_whole_document(@doc, RDF::URI(base_uri))

      # Look for Embedded RDF/XML
      unless @root.xpath("//rdf:RDF", "rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#").empty?
        extract_script(@root, @doc, "application/rdf+xml", **@options.merge(base_uri: base_uri)) do |statement|
          @repository << statement
        end
      end

      # Look for Embedded microdata
      unless @root.xpath("//@itemscope").empty?
        begin
          require 'rdf/microdata'
          add_debug(@doc, "process microdata")
          @repository << RDF::Microdata::Reader.new(@doc, **options)
        rescue LoadError
          add_debug(@doc, "microdata detected, not processed")
        end
      end

      # Perform property copying
      copy_properties(@repository) if @options[:reference_folding]

      # Perform vocabulary expansion
      expand(@repository) if @options[:vocab_expansion]

      @processed = true
    end

    # Return statements in the default graph for
    # statements in the associated named or default graph from the
    # processed repository
    @repository.each do |statement|
      case statement.graph_name
      when nil
        yield statement if @options[:rdfagraph].include?(:output)
      when RDF::RDFA.ProcessorGraph
        yield RDF::Statement.new(*statement.to_triple) if @options[:rdfagraph].include?(:processor)
      end
    end

    if validate? && log_statistics[:error]
      raise RDF::ReaderError, "Errors found during processing"
    end
  end
  enum_for(:each_statement)
end
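The final loop of each_statement decides which statements a caller sees: default-graph statements are yielded when :output is requested, and processor-graph statements are yielded, with their graph name dropped, when :processor is requested. The sketch below reproduces only that filter. `Stmt`, `PROCESSOR_GRAPH` and `visible_statements` are hypothetical stand-ins so the example runs without the RDF gems.

```ruby
# Stand-ins for RDF::Statement and RDF::RDFA.ProcessorGraph; both are
# hypothetical and exist only to keep this sketch dependency-free.
Stmt = Struct.new(:subject, :predicate, :object, :graph_name)
PROCESSOR_GRAPH = :processor_graph

# Returns the statements a caller would receive for a given
# normalized :rdfagraph list.
def visible_statements(statements, rdfagraph)
  statements.each_with_object([]) do |st, out|
    case st.graph_name
    when nil
      out << st if rdfagraph.include?(:output)
    when PROCESSOR_GRAPH
      # Processor-graph statements are re-emitted without a graph name.
      out << Stmt.new(st.subject, st.predicate, st.object, nil) if rdfagraph.include?(:processor)
    end
  end
end

data = [
  Stmt.new(:doc, :title, "Example", nil),
  Stmt.new(:msg, :type, :warning, PROCESSOR_GRAPH),
  Stmt.new(:x, :y, :z, :some_other_graph),
]
visible_statements(data, [:output]).length             # => 1
visible_statements(data, [:output, :processor]).length # => 2
```

Statements in any other named graph fall through the case and are never yielded.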
#each_triple {|subject, predicate, object| ... }
This method returns an undefined value.
Iterates the given block for each RDF triple in the input.
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 483

def each_triple(&block)
  if block_given?
    each_statement do |statement|
      block.call(*statement.to_triple)
    end
  end
  enum_for(:each_triple)
end
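Both each_statement and each_triple follow the same shape: run the block if one is given, then return an enumerator for the method. A dependency-free sketch of that pattern follows; `TinyReader` and its nested `Statement` struct are hypothetical and only imitate the structure of the real methods.

```ruby
# Hypothetical miniature reader showing the block-or-enumerator pattern.
class TinyReader
  Statement = Struct.new(:subject, :predicate, :object) do
    def to_triple
      [subject, predicate, object]
    end
  end

  def initialize(statements)
    @statements = statements
  end

  def each_statement(&block)
    @statements.each(&block) if block_given?
    enum_for(:each_statement)
  end

  # Delegates to each_statement and splats each statement into three values.
  def each_triple(&block)
    if block_given?
      each_statement { |statement| block.call(*statement.to_triple) }
    end
    enum_for(:each_triple)
  end
end

reader = TinyReader.new([TinyReader::Statement.new(:s, :p, :o)])
reader.each_triple { |s, p, o| puts "#{s} #{p} #{o}" }
triples = reader.each_triple.to_a # array of [subject, predicate, object] arrays
```

Calling the method without a block is what makes chained Enumerable calls such as `reader.each_triple.to_a` possible.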
#extract_script(el, input, type, **options, &block) ⇒ Object
Extracts RDF from a script element, or embedded RDF/XML.
# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-ea6265716853/lib/rdf/rdfa/reader.rb', line 380

def extract_script(el, input, type, **options, &block)
  add_debug(el, "script element of type #{type}")
  begin
    # Formats don't exist unless they've been required
    case type.to_s
    when 'application/csvm+json' then require 'rdf/tabular'
    when 'application/ld+json' then require 'json/ld'
    when 'application/rdf+xml' then require 'rdf/rdfxml'
    when 'text/ntriples' then require 'rdf/ntriples'
    when 'text/turtle' then require 'rdf/turtle'
    end
  rescue LoadError
  end

  @readers ||= {}
  reader = @readers[type.to_s] = RDF::Reader.for(content_type: type.to_s) unless @readers.has_key?(type.to_s)
  if reader = @readers[type.to_s]
    add_debug(el, "=> reader #{reader.to_sym}")
    # Wrap input in a RemoteDocument with appropriate content-type and base
    doc = if input.is_a?(String)
      RDF::Util::File::RemoteDocument.new(input, content_type: type.to_s, **options)
    else
      input
    end
    reader.new(doc, **options).each(&block)
  else
    add_debug(el, "=> no reader found")
  end
end
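extract_script looks up a reader class once per content type and stores the result, including a nil result, because the cache is guarded with `has_key?` rather than `||=`. The sketch below illustrates that caching behavior in isolation. `ReaderCache` and the finder block are hypothetical; the block stands in for the `RDF::Reader.for(content_type: ...)` lookup.

```ruby
# Hypothetical cache illustrating the has_key?-guarded lookup.
class ReaderCache
  attr_reader :lookups

  def initialize(&finder)
    @finder = finder
    @readers = {}
    @lookups = 0
  end

  def fetch(type)
    key = type.to_s
    unless @readers.key?(key)
      @lookups += 1
      @readers[key] = @finder.call(key) # may be nil; the miss is cached too
    end
    @readers[key]
  end
end

# The finder block stands in for the real reader lookup.
cache = ReaderCache.new { |type| type == 'text/turtle' ? :turtle_reader : nil }
cache.fetch('text/turtle') # found, cached
cache.fetch('text/plain')  # not found, nil is cached
cache.fetch('text/plain')  # served from the cache, no second lookup
puts cache.lookups
```

With `||=` instead, a content type with no reader would trigger a fresh lookup on every script element of that type.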