Class: RDF::RDFa::Reader

Inherits:
RDF::Reader
Includes:
Expansion, Util::Logger
Defined in:
vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb,
vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader/rexml.rb,
vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader/nokogiri.rb

Overview

An RDFa parser in Ruby

This class supports Nokogiri for HTML processing and automatically selects the most performant implementation available (Nokogiri if it is loaded, REXML otherwise). To override that choice, pass a :library option to Reader.new or Reader.open.
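
The selection rule can be sketched in isolation. This is a minimal illustration that follows the constructor's handling of :library; the helper name choose_library and the nokogiri_available keyword are hypothetical stand-ins (the reader itself checks defined?(::Nokogiri)).

```ruby
# Hypothetical helper mirroring the :library fallback in the constructor.
# `nokogiri_available` stands in for the reader's `defined?(::Nokogiri)` check.
def choose_library(requested, nokogiri_available:)
  case requested
  when nil
    # No explicit choice: prefer Nokogiri, fall back to REXML.
    nokogiri_available ? :nokogiri : :rexml
  when :nokogiri, :rexml
    requested
  else
    raise ArgumentError, "expected :rexml or :nokogiri, but got #{requested.inspect}"
  end
end

p choose_library(nil, nokogiri_available: false)
p choose_library(:rexml, nokogiri_available: true)
```

Any value other than nil, :nokogiri or :rexml is rejected with an ArgumentError rather than silently falling back.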

Based on processing rules described here:

Direct Known Subclasses

Microdata::RdfaReader

Defined Under Namespace

Modules: Nokogiri, REXML

Constant Summary

XHTML =
"http://www.w3.org/1999/xhtml"
SafeCURIEorCURIEorIRI =

Content model for @about and @resource. In RDFa 1.0, this was URIorSafeCURIE

{
  :"rdfa1.0" => [:safe_curie, :uri, :bnode],
  :"rdfa1.1" => [:safe_curie, :curie, :uri, :bnode],
}
TERMorCURIEorAbsIRI =

Content model for @datatype. In RDFa 1.0, this was CURIE. The plural form, TERMorCURIEorAbsIRIs, is the content model for @rel, @rev, @property and @typeof.

{
  :"rdfa1.0" => [:term, :curie],
  :"rdfa1.1" => [:term, :curie, :absuri],
}
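
As a rough illustration of how these version-keyed content models might be consulted, the sketch below copies the two hashes and asks whether a given interpretation is permitted. The constant names in upper snake case and the helper allowed_form? are hypothetical and not part of the reader.

```ruby
# Local copies of the two content models documented above.
SAFE_CURIE_OR_CURIE_OR_IRI = {
  :"rdfa1.0" => [:safe_curie, :uri, :bnode],
  :"rdfa1.1" => [:safe_curie, :curie, :uri, :bnode],
}
TERM_OR_CURIE_OR_ABS_IRI = {
  :"rdfa1.0" => [:term, :curie],
  :"rdfa1.1" => [:term, :curie, :absuri],
}

# Hypothetical helper: may a value be interpreted as `form` under `version`?
def allowed_form?(model, version, form)
  model.fetch(version).include?(form)
end

p allowed_form?(SAFE_CURIE_OR_CURIE_OR_IRI, :"rdfa1.0", :curie)
p allowed_form?(SAFE_CURIE_OR_CURIE_OR_IRI, :"rdfa1.1", :curie)
```

The only difference between versions is that RDFa 1.1 adds :curie for @about and @resource, and :absuri for @datatype.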
NC_REGEXP =

This expression matches an NCName as defined in XML-NAMES

Regexp.new(
%{^
  (  [a-zA-Z_]
   | \\\\u[0-9a-fA-F]{4}
  )
  (  [0-9a-zA-Z_\.-/]
   | \\\\u([0-9a-fA-F]{4})
  )*
$},
Regexp::EXTENDED)
TERM_REGEXP =

This expression matches a term as defined in RDFA-CORE

For the avoidance of doubt, this definition means a 'term' in RDFa is an XML NCName that also permits slash as a non-leading character.

Regexp.new(
%{^
  (?!\\\\u0301)             # ́ is a non-spacing acute accent.
                            # It is legal within an XML Name, but not as the first character.
  (  [a-zA-Z_]
   | \\\\u[0-9a-fA-F]{4}
  )
  (  [-0-9a-zA-Z_\.\/]
   | \\\\u([0-9a-fA-F]{4})
  )*
$},
Regexp::EXTENDED)
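
To make the "NCName plus non-leading slash" rule concrete, here is a simplified, ASCII-only approximation. It is an assumption-laden sketch: the \uXXXX alternatives and the acute-accent lookahead of the original are left out, and SIMPLE_TERM is a hypothetical name.

```ruby
# Simplified, ASCII-only approximation of TERM_REGEXP.
# First character: a letter or underscore.
# Following characters: letters, digits, underscore, hyphen, dot or slash.
SIMPLE_TERM = /\A[a-zA-Z_][-0-9a-zA-Z_.\/]*\z/

%w[license foaf/name _x 1abc /leading].each do |candidate|
  puts "#{candidate}: #{SIMPLE_TERM.match?(candidate)}"
end
```

Under this approximation a slash is accepted anywhere except the first position, and a leading digit is rejected.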

Constants included from Util::Logger

Util::Logger::IOWrapper

Instance Attribute Summary

Attributes inherited from RDF::Reader

#options

Attributes included from Enumerable

#existentials, #universals

Class Method Summary

Instance Method Summary

Methods included from Util::Logger

#log_debug, #log_depth, #log_error, #log_fatal, #log_info, #log_recover, #log_recovering?, #log_statistics, #log_warn, #logger

Methods included from Expansion

#copy_properties, #expand, #rule

Methods inherited from RDF::Reader

#base_uri, #canonicalize?, #close, each, #each_pg_statement, #encoding, #fail_object, #fail_predicate, #fail_subject, for, format, #intern?, #lineno, open, #prefix, #prefixes, #prefixes=, #read_statement, #read_triple, #rewind, #to_sym, to_sym, #valid?, #validate?

Methods included from Util::Aliasing::LateBound

#alias_method

Methods included from Enumerable

add_entailment, #canonicalize, #canonicalize!, #dump, #each_graph, #each_object, #each_predicate, #each_quad, #each_subject, #each_term, #entail, #enum_graph, #enum_object, #enum_predicate, #enum_quad, #enum_statement, #enum_subject, #enum_term, #enum_triple, #graph?, #graph_names, #invalid?, #method_missing, #object?, #objects, #predicate?, #predicates, #project_graph, #quad?, #quads, #respond_to_missing?, #statement?, #statements, #subject?, #subjects, #supports?, #term?, #terms, #to_a, #to_h, #to_set, #triple?, #triples, #valid?, #validate!

Methods included from Isomorphic

#bijection_to, #isomorphic_with?

Methods included from Countable

#count, #empty?

Methods included from RDF::Readable

#readable?

Constructor Details

#initialize(input = $stdin, **options) {|reader| ... } ⇒ reader

Initializes the RDFa reader instance.

Parameters:

  • input (IO, File, String) (defaults to: $stdin)

    the input stream to read

  • options (Hash{Symbol => Object})

    any additional options (see RDF::Reader#initialize)

Options Hash (**options):

  • :library (Symbol)

    One of :nokogiri or :rexml. If nil/unspecified uses :nokogiri if available, :rexml otherwise.

  • :vocab_expansion (Boolean) — default: false

    whether to perform OWL2 expansion on the resulting graph

  • :reference_folding (Boolean) — default: true

    whether to perform RDFa property copying on the resulting graph

  • :host_language (:xml, :xhtml1, :xhtml5, :html4, :html5, :svg) — default: :html5

    Host Language

  • :version (:"rdfa1.0", :"rdfa1.1") — default: :"rdfa1.1"

    Parser version information

  • :processor_callback (Proc) — default: nil

    Callback used to provide processor graph triples.

  • :rdfagraph (Array<Symbol>) — default: [:output]

    Indicates which of the :output and :processor graphs are emitted. The value is an array containing one or both of :output and :processor.

  • :vocab_repository (Repository) — default: nil

    Repository to save loaded vocabularies.
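
The :rdfagraph option accepts several shapes before it is reduced to an array. The sketch below reproduces that normalization as a standalone function, following the constructor source; the function name normalize_rdfagraph is hypothetical.

```ruby
# Standalone sketch of the :rdfagraph normalization done in the constructor.
def normalize_rdfagraph(value)
  graphs = case value
           when 'all' then [:output, :processor]
           when String, Symbol then value.to_s.split(',').map(&:strip).map(&:to_sym)
           when Array then value.map { |o| o.to_s.to_sym }
           else []
           end
  # Drop anything that is not a known graph, then default to :output.
  graphs = graphs.select { |o| [:output, :processor].include?(o) }
  graphs << :output if graphs.empty?
  graphs
end

p normalize_rdfagraph(nil)
p normalize_rdfagraph("output, processor")
p normalize_rdfagraph([:processor, "bogus"])
```

Unrecognized names are discarded, so a value containing only unknown names ends up as [:output].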

Yields:

  • (reader)

    self

Yield Parameters:

Yield Returns:

  • (void)

    ignored

Raises:



# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 300

def initialize(input = $stdin, **options, &block)
  super do
    @options = {reference_folding: true}.merge(@options)
    @repository = RDF::Repository.new

    @options[:rdfagraph] = case @options[:rdfagraph]
    when 'all' then [:output, :processor]
    when String, Symbol then @options[:rdfagraph].to_s.split(',').map(&:strip).map(&:to_sym)
    when Array then @options[:rdfagraph].map {|o| o.to_s.to_sym}
    else  []
    end.select {|o| [:output, :processor].include?(o)}
    @options[:rdfagraph] << :output if @options[:rdfagraph].empty?

    @library = case options[:library]
      when nil
        # Use Nokogiri when available, and REXML otherwise:
        defined?(::Nokogiri) ? :nokogiri : :rexml
      when :nokogiri, :rexml
        options[:library]
      else
        raise ArgumentError.new("expected :rexml or :nokogiri, but got #{options[:library].inspect}")
    end

    require "rdf/rdfa/reader/#{@library}"
    @implementation = case @library
      when :nokogiri then Nokogiri
      when :rexml    then REXML
    end
    self.extend(@implementation)

    detect_host_language_version(input, **options)

    add_info(@doc, "version = #{@version},  host_language = #{@host_language}, library = #{@library}, rdfagraph = #{@options[:rdfagraph].inspect}, expand = #{@options[:vocab_expansion]}")

    begin
      initialize_xml(input, **options)
    rescue
      add_error(nil, "Malformed document: #{$!.message}")
    end
    add_error(nil, "Empty document") if root.nil?
    add_error(nil, doc_errors.map(&:message).uniq.join("\n")) if !doc_errors.empty?

    # Section 4.2 RDFa Host Language Conformance
    #
    # The Host Language may require the automatic inclusion of one or more Initial Contexts
    @host_defaults = {
      vocabulary:       nil,
      uri_mappings:     {},
      initial_contexts: [],
    }

    if @version == :"rdfa1.0"
      # Add default term mappings
      @host_defaults[:term_mappings] = %w(
        alternate appendix bookmark cite chapter contents copyright first glossary help icon index
        last license meta next p3pv1 prev role section stylesheet subsection start top up
        ).inject({}) { |hash, term| hash[term] = RDF::URI("http://www.w3.org/1999/xhtml/vocab#") + term; hash }
    end

    case @host_language
    when :xml, :svg
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT]
    when :xhtml1
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT, XHTML_RDFA_CONTEXT]
    when :xhtml5, :html4, :html5
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT, HTML_RDFA_CONTEXT]
    end

    block.call(self) if block_given?
  end
end
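
The host-default setup at the end of the constructor can be read as a pure function of version and host language. The following is a sketch under assumptions: the symbols :xml_context, :xhtml_context and :html_context are placeholders for the XML_RDFA_CONTEXT, XHTML_RDFA_CONTEXT and HTML_RDFA_CONTEXT constants, only three of the default terms are shown, and the function name host_defaults is hypothetical.

```ruby
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"

# Sketch: compute host defaults from the parser version and host language.
def host_defaults(version, host_language)
  defaults = { vocabulary: nil, uri_mappings: {}, initial_contexts: [] }

  # RDFa 1.0 predefines term mappings into the XHTML vocabulary.
  if version == :"rdfa1.0"
    defaults[:term_mappings] = %w(license next prev).each_with_object({}) do |term, hash|
      hash[term] = XHTML_VOCAB + term
    end
  end

  # Placeholder symbols stand in for the initial-context constants.
  defaults[:initial_contexts] =
    case host_language
    when :xml, :svg              then [:xml_context]
    when :xhtml1                 then [:xml_context, :xhtml_context]
    when :xhtml5, :html4, :html5 then [:xml_context, :html_context]
    else []
    end
  defaults
end

p host_defaults(:"rdfa1.0", :xhtml1)
```

Term mappings are added only for RDFa 1.0; in 1.1 the key is absent and terms come from the initial contexts.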

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class RDF::Enumerable

Instance Attribute Details

#host_language ⇒ :xml, ... (readonly)

Host language

Returns:

  • (:xml, :xhtml1, :xhtml5, :html4, :html5, :svg)


# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 85

def host_language
  @host_language
end

#implementation ⇒ Module

Returns the XML implementation module for this reader instance.

Returns:

  • (Module)


# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 101

def implementation
  @implementation
end

#repository ⇒ RDF::Repository (readonly)

Repository used for collecting triples.

Returns:

  • (RDF::Repository)


# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 95

def repository
  @repository
end

#version ⇒ :"rdfa1.0", :"rdfa1.1" (readonly)

Version

Returns:

  • (:"rdfa1.0", :"rdfa1.1")


# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 90

def version
  @version
end

Class Method Details

.options ⇒ Object

RDFa Reader options



# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 249

def self.options
  super + [
    RDF::CLI::Option.new(
      symbol: :vocab_expansion,
      datatype: TrueClass,
      on: ["--vocab-expansion"],
      description: "Perform OWL2 expansion on the resulting graph.") {true},
    RDF::CLI::Option.new(
      symbol: :host_language,
      datatype: %w(xml xhtml1 xhtml5 html4 svg),
      on: ["--host-language HOSTLANG", %w(xml xhtml1 xhtml5 html4 svg)],
      description: "Host Language. One of xml, xhtml1, xhtml5, html4, or svg") do |arg|
        arg.to_sym
    end,
    RDF::CLI::Option.new(
      symbol: :rdfagraph,
      datatype: %w(output processor both),
      on: ["--rdfagraph RDFAGRAPH", %w(output processor both)],
      description: "Used to indicate if either or both of the :output or :processor graphs are output.") {|arg| arg.to_sym},
  ]
end
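
The on: arrays above have the same shape as arguments to OptionParser#on from Ruby's standard library: a switch description followed by an optional list of permitted values. The sketch below shows how such switches behave when parsed directly with OptionParser. It does not use RDF::CLI; the function name parse_rdfa_options and the two list constants are hypothetical.

```ruby
require 'optparse'

HOST_LANGUAGES = %w(xml xhtml1 xhtml5 html4 svg)
RDFA_GRAPHS    = %w(output processor both)

# Sketch: parse the three RDFa switches with plain OptionParser.
def parse_rdfa_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on("--vocab-expansion") { options[:vocab_expansion] = true }
    # An array after the switch restricts the argument to the listed values.
    opts.on("--host-language HOSTLANG", HOST_LANGUAGES) { |arg| options[:host_language] = arg.to_sym }
    opts.on("--rdfagraph RDFAGRAPH", RDFA_GRAPHS) { |arg| options[:rdfagraph] = arg.to_sym }
  end
  parser.parse(argv)
  options
end

p parse_rdfa_options(%w[--host-language svg --rdfagraph processor --vocab-expansion])
```

A value outside the permitted list makes OptionParser raise OptionParser::InvalidArgument, so a host language missing from the list is rejected at parse time.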

Instance Method Details

#each_statement {|statement| ... }

This method returns an undefined value.

Iterates the given block for each RDF statement in the input.

Reads to graph and performs expansion if required.

Yields:

  • (statement)

Yield Parameters:



# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 412

def each_statement(&block)
  if block_given?
    unless @processed || @root.nil?
      # Add prefix definitions from host defaults
      @host_defaults[:uri_mappings].each_pair do |prefix, value|
        prefix(prefix, value)
      end

      # parse
      parse_whole_document(@doc, RDF::URI(base_uri))

      # Look for Embedded RDF/XML
      unless @root.xpath("//rdf:RDF", "rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#").empty?
        extract_script(@root, @doc, "application/rdf+xml", **@options.merge(base_uri: base_uri)) do |statement|
          @repository << statement
        end
      end

      # Look for Embedded microdata
      unless @root.xpath("//@itemscope").empty?
        begin
          require 'rdf/microdata'
          add_debug(@doc, "process microdata")
          @repository << RDF::Microdata::Reader.new(@doc, **options)
        rescue LoadError
          add_debug(@doc, "microdata detected, not processed")
        end
      end

      # Perform property copying
      copy_properties(@repository) if @options[:reference_folding]

      # Perform vocabulary expansion
      expand(@repository) if @options[:vocab_expansion]

      @processed = true
    end

    # Yield statements from the processed repository: default-graph
    # statements when :output is requested, and processor-graph statements
    # (re-emitted as plain triples) when :processor is requested
    @repository.each do |statement|
      case statement.graph_name
      when nil
        yield statement if @options[:rdfagraph].include?(:output)
      when RDF::RDFA.ProcessorGraph
        yield RDF::Statement.new(*statement.to_triple) if @options[:rdfagraph].include?(:processor)
      end
    end

    if validate? && log_statistics[:error]
      raise RDF::ReaderError, "Errors found during processing"
    end
  end
  enum_for(:each_statement)
end
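
The final loop of each_statement decides what to emit based on each statement's graph name. A minimal sketch of that filter, independent of RDF.rb, follows. Statement here is a plain Struct, PROCESSOR_GRAPH is a placeholder for RDF::RDFA.ProcessorGraph, and select_statements is a hypothetical name.

```ruby
Statement = Struct.new(:subject, :predicate, :object, :graph_name)
PROCESSOR_GRAPH = :processor_graph # placeholder for RDF::RDFA.ProcessorGraph

# Sketch: keep default-graph statements for :output, and re-emit
# processor-graph statements without a graph name for :processor.
def select_statements(statements, rdfagraph)
  statements.each_with_object([]) do |st, out|
    case st.graph_name
    when nil
      out << st if rdfagraph.include?(:output)
    when PROCESSOR_GRAPH
      if rdfagraph.include?(:processor)
        out << Statement.new(st.subject, st.predicate, st.object, nil)
      end
    end
  end
end

data = [
  Statement.new(:s1, :p1, :o1, nil),
  Statement.new(:s2, :p2, :o2, PROCESSOR_GRAPH),
  Statement.new(:s3, :p3, :o3, :some_other_graph),
]
p select_statements(data, [:processor]).map(&:to_a)
```

Statements in any other named graph are never emitted, whichever graphs are requested.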

#each_triple {|subject, predicate, object| ... }

This method returns an undefined value.

Iterates the given block for each RDF triple in the input.

Yields:

  • (subject, predicate, object)

Yield Parameters:



# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 477

def each_triple(&block)
  if block_given?
    each_statement do |statement|
      block.call(*statement.to_triple)
    end
  end
  enum_for(:each_triple)
end
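
each_triple is a thin wrapper that splats each statement into three block arguments and offers an enumerator when no block is given. The toy class below illustrates that pattern with plain arrays. TinyReader is hypothetical, and its early return of the enumerator is a common variant rather than the reader's exact control flow.

```ruby
# Toy reader whose "statements" are plain [subject, predicate, object] arrays.
class TinyReader
  def initialize(statements)
    @statements = statements
  end

  def each_statement
    return enum_for(:each_statement) unless block_given?
    @statements.each { |st| yield st }
  end

  # Delegate to each_statement and pass the three parts as separate arguments.
  def each_triple(&block)
    return enum_for(:each_triple) unless block_given?
    each_statement { |st| block.call(*st) }
  end
end

reader = TinyReader.new([[:s, :p, :o], [:s, :q, :r]])
reader.each_triple { |s, p, o| puts "#{s} #{p} #{o}" }
p reader.each_triple.to_a
```

Without a block the method can be chained with Enumerable methods such as map or first.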

#extract_script(el, input, type, **options, &block) ⇒ Object

Extracts RDF from a script element or from embedded RDF/XML



# File 'vendor/bundler/ruby/3.3.0/bundler/gems/rdf-rdfa-fb11c7e2d467/lib/rdf/rdfa/reader.rb', line 374

def extract_script(el, input, type, **options, &block)
  add_debug(el, "script element of type #{type}")
  begin
    # Formats don't exist unless they've been required
    case type.to_s
    when 'application/csvm+json' then require 'rdf/tabular'
    when 'application/ld+json'   then require 'json/ld'
    when 'application/rdf+xml'   then require 'rdf/rdfxml'
    when 'text/ntriples'         then require 'rdf/ntriples'
    when 'text/turtle'           then require 'rdf/turtle'
    end
  rescue LoadError
  end

  @readers ||= {}
  reader = @readers[type.to_s] = RDF::Reader.for(content_type: type.to_s) unless @readers.has_key?(type.to_s)
  if reader = @readers[type.to_s]
    add_debug(el, "=> reader #{reader.to_sym}")
    # Wrap input in a RemoteDocument with appropriate content-type and base
    doc = if input.is_a?(String)
      RDF::Util::File::RemoteDocument.new(input, content_type: type.to_s, **options)
    else
      input
    end
    reader.new(doc, **options).each(&block)
  else
    add_debug(el, "=> no reader found")
  end
end
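
extract_script looks up a reader class once per content type and remembers the result, including the case where no reader exists. The sketch below isolates that caching idea. ReaderCache, its registry hash and the lookups counter are hypothetical; the real method asks RDF::Reader.for instead.

```ruby
# Sketch of per-content-type memoization that also caches misses.
class ReaderCache
  attr_reader :lookups

  # `registry` is a hypothetical Hash from content type to reader class.
  def initialize(registry)
    @registry = registry
    @readers = {}
    @lookups = 0
  end

  def reader_for(type)
    key = type.to_s
    # key? (rather than ||=) ensures a nil result is not looked up again.
    unless @readers.key?(key)
      @lookups += 1
      @readers[key] = @registry[key]
    end
    @readers[key]
  end
end

cache = ReaderCache.new("text/turtle" => :turtle_reader)
p cache.reader_for("text/turtle")
p cache.reader_for("text/unknown")
```

Checking for the key rather than for a truthy value is what keeps an unsupported content type from triggering a fresh lookup on every script element.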