Class: RDF::RDFa::Reader

Inherits:
RDF::Reader
Includes:
Expansion, Util::Logger
Defined in:
vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb,
vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader/rexml.rb,
vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader/nokogiri.rb

Overview

An RDFa parser in Ruby

This class supports Nokogiri for HTML processing and automatically selects the most performant implementation available (Nokogiri or REXML). If need be, you can explicitly override the implementation by passing a :library option to Reader.new or Reader.open.
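The selection rule for :library can be sketched as a standalone method. `select_library` is a hypothetical name; its logic mirrors the `case options[:library]` expression in #initialize.

```ruby
# Hypothetical helper mirroring how #initialize resolves the :library option.
def select_library(requested)
  case requested
  when nil
    # Prefer Nokogiri if its constant is already defined, otherwise fall back to REXML
    defined?(::Nokogiri) ? :nokogiri : :rexml
  when :nokogiri, :rexml
    requested
  else
    raise ArgumentError, "expected :rexml or :nokogiri, but got #{requested.inspect}"
  end
end

select_library(:rexml)  # => :rexml
select_library(nil)     # :nokogiri if Nokogiri is loaded, else :rexml
```

Because `defined?` only sees a library that has already been required, "available" in this rule means loaded into the process, not merely installed as a gem.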

Based on processing rules described here:

Direct Known Subclasses

Microdata::RdfaReader

Defined Under Namespace

Modules: Nokogiri, REXML

Constant Summary

XHTML =
"http://www.w3.org/1999/xhtml"
SafeCURIEorCURIEorIRI =

Content model for @about and @resource. In RDFa 1.0, this was URIorSafeCURIE.

{
  :"rdfa1.0" => [:safe_curie, :uri, :bnode],
  :"rdfa1.1" => [:safe_curie, :curie, :uri, :bnode],
}
TERMorCURIEorAbsIRI =

Content model for @datatype. In RDFa 1.0, this was CURIE. The plural form, TERMorCURIEorAbsIRIs, is the content model for @rel, @rev, @property and @typeof.

{
  :"rdfa1.0" => [:term, :curie],
  :"rdfa1.1" => [:term, :curie, :absuri],
}
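Both content-model tables are keyed by RDFa version symbol, so finding what RDFa 1.1 added to a model is a plain array difference. This sketch copies the two constants; `added_in_1_1` is a hypothetical helper, not part of the reader.

```ruby
# Standalone copies of the two content-model tables documented above.
SafeCURIEorCURIEorIRI = {
  :"rdfa1.0" => [:safe_curie, :uri, :bnode],
  :"rdfa1.1" => [:safe_curie, :curie, :uri, :bnode],
}.freeze

TERMorCURIEorAbsIRI = {
  :"rdfa1.0" => [:term, :curie],
  :"rdfa1.1" => [:term, :curie, :absuri],
}.freeze

# Value kinds accepted by RDFa 1.1 but not by RDFa 1.0 for a given model.
def added_in_1_1(model)
  model[:"rdfa1.1"] - model[:"rdfa1.0"]
end

added_in_1_1(SafeCURIEorCURIEorIRI)  # => [:curie]
added_in_1_1(TERMorCURIEorAbsIRI)    # => [:absuri]
```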
NC_REGEXP =

This expression matches an NCName as defined in XML-NAMES

Regexp.new(
%{^
  (  [a-zA-Z_]
   | \\\\u[0-9a-fA-F]{4}
  )
  (  [0-9a-zA-Z_\.-/]
   | \\\\u([0-9a-fA-F]{4})
  )*
$},
Regexp::EXTENDED)
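One detail of the pattern as written: in the second character class, `\.-/` is parsed as a character range from "." to "/" (U+002E to U+002F). This sketch copies NC_REGEXP verbatim to show the effect: "/" is accepted in non-leading positions and a literal "-" is not, which differs from the XML Namespaces NCName production (that production admits "-" and "." but not "/").

```ruby
# Standalone copy of NC_REGEXP as documented above.
NC_REGEXP = Regexp.new(
  %{^
    (  [a-zA-Z_]
     | \\\\u[0-9a-fA-F]{4}
    )
    (  [0-9a-zA-Z_\.-/]
     | \\\\u([0-9a-fA-F]{4})
    )*
  $},
  Regexp::EXTENDED)

NC_REGEXP.match?("name")  # => true
NC_REGEXP.match?("a.b")   # => true
NC_REGEXP.match?("a/b")   # => true, although "/" is not an NCName character
NC_REGEXP.match?("a-b")   # => false, although "-" is an NCName character
```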
TERM_REGEXP =

This expression matches a term as defined in RDFA-CORE.

For the avoidance of doubt, this definition means a 'term' in RDFa is an XML NCName that also permits slash as a non-leading character.

Regexp.new(
%{^
  (?!\\\\u0301)             # ́ is a non-spacing acute accent.
                            # It is legal within an XML Name, but not as the first character.
  (  [a-zA-Z_]
   | \\\\u[0-9a-fA-F]{4}
  )
  (  [-0-9a-zA-Z_\.\/]
   | \\\\u([0-9a-fA-F]{4})
  )*
$},
Regexp::EXTENDED)
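To see what TERM_REGEXP accepts, the sketch below copies the pattern (with an ASCII-only comment in place of the original one) and tests a few candidates; `term?` is a hypothetical helper. A prefixed name such as `foaf:name` is not a term because ":" is outside the character class; CURIEs are a separate kind (:curie) in the content-model tables above.

```ruby
# Standalone copy of TERM_REGEXP as documented above.
TERM_REGEXP = Regexp.new(
  %{^
    (?!\\\\u0301)           # a combining acute accent may not be the first character
    (  [a-zA-Z_]
     | \\\\u[0-9a-fA-F]{4}
    )
    (  [-0-9a-zA-Z_\.\/]
     | \\\\u([0-9a-fA-F]{4})
    )*
  $},
  Regexp::EXTENDED)

def term?(str)
  TERM_REGEXP.match?(str)
end

term?("license")     # => true
term?("with/slash")  # => true
term?("with-dash")   # => true
term?("foaf:name")   # => false
term?("1st")         # => false
```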

Instance Attribute Summary

Attributes inherited from RDF::Reader

#options

Class Method Summary

Instance Method Summary

Methods included from Util::Logger

#log_debug, #log_depth, #log_error, #log_fatal, #log_info, #log_recover, #log_recovering?, #log_statistics, #log_warn, #logger

Methods included from Expansion

#copy_properties, #expand, #rule

Methods inherited from RDF::Reader

#base_uri, #canonicalize?, #close, each, #encoding, #fail_object, #fail_predicate, #fail_subject, for, format, #intern?, #lineno, open, #prefix, #prefixes, #prefixes=, #read_statement, #read_triple, #rewind, to_sym, #to_sym, #valid?, #validate?

Methods included from Util::Aliasing::LateBound

#alias_method

Methods included from Enumerable

add_entailment, #dump, #each_graph, #each_object, #each_predicate, #each_quad, #each_subject, #each_term, #entail, #enum_graph, #enum_object, #enum_predicate, #enum_quad, #enum_statement, #enum_subject, #enum_term, #enum_triple, #graph_names, #has_graph?, #has_object?, #has_predicate?, #has_quad?, #has_statement?, #has_subject?, #has_term?, #has_triple?, #invalid?, #method_missing, #objects, #predicates, #project_graph, #quads, #respond_to_missing?, #statements, #subjects, #supports?, #terms, #to_a, #to_h, #to_set, #triples, #valid?, #validate!

Methods included from Isomorphic

#bijection_to, #isomorphic_with?

Methods included from Countable

#count, #empty?

Methods included from RDF::Readable

#readable?

Constructor Details

#initialize(input = $stdin, options = {}) {|reader| ... } ⇒ reader

Initializes the RDFa reader instance.

Parameters:

  • input (IO, File, String) (defaults to: $stdin)

    the input stream to read

  • options (Hash{Symbol => Object}) (defaults to: {})

    any additional options (see RDF::Reader#initialize)

Options Hash (options):

  • :library (Symbol)

    One of :nokogiri or :rexml. If nil or unspecified, uses :nokogiri when available and :rexml otherwise.

  • :vocab_expansion (Boolean) — default: false

    whether to perform OWL2 expansion on the resulting graph

  • :reference_folding (Boolean) — default: true

    whether to perform RDFa property copying on the resulting graph

  • :host_language (:xml, :xhtml1, :xhtml5, :html4, :html5, :svg) — default: :html5

    Host Language

  • :version (:"rdfa1.0", :"rdfa1.1") — default: :"rdfa1.1"

    Parser version information

  • :processor_callback (Proc) — default: nil

    Callback used to provide processor graph triples.

  • :rdfagraph (Array<Symbol>) — default: [:output]

    Indicates whether the :output graph, the :processor graph, or both are output. The value is an array containing one or both of :output and :processor.

  • :vocab_repository (Repository) — default: nil

    Repository to save loaded vocabularies.
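The constructor normalizes :rdfagraph before use. This hypothetical standalone function copies that `case` expression: besides an Array, it accepts the string 'all' and comma-separated strings, drops unrecognized entries, and falls back to [:output]. Under this logic a value of :both, which `.options` lists as a CLI choice, is not recognized and ends up as [:output]; 'all' is the string that selects both graphs.

```ruby
# Hypothetical standalone version of the :rdfagraph normalization in #initialize.
def normalize_rdfagraph(value)
  graphs = case value
           when 'all' then [:output, :processor]
           when String, Symbol then value.to_s.split(',').map(&:strip).map(&:to_sym)
           when Array then value.map { |o| o.to_s.to_sym }
           else []
           end.select { |o| [:output, :processor].include?(o) }
  # Fall back to the output graph when nothing recognizable was given
  graphs << :output if graphs.empty?
  graphs
end

normalize_rdfagraph(nil)                  # => [:output]
normalize_rdfagraph('all')                # => [:output, :processor]
normalize_rdfagraph('processor, output')  # => [:processor, :output]
normalize_rdfagraph(:both)                # => [:output]
```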

Yields:

  • (reader)

    self

Yield Parameters:

Yield Returns:

  • (void)

    ignored

Raises:



# File 'vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb', line 300

def initialize(input = $stdin, options = {}, &block)
  super do
    @options = {reference_folding: true}.merge(@options)
    @repository = RDF::Repository.new

    @options[:rdfagraph] = case @options[:rdfagraph]
    when 'all' then [:output, :processor]
    when String, Symbol then @options[:rdfagraph].to_s.split(',').map(&:strip).map(&:to_sym)
    when Array then @options[:rdfagraph].map {|o| o.to_s.to_sym}
    else  []
    end.select {|o| [:output, :processor].include?(o)}
    @options[:rdfagraph] << :output if @options[:rdfagraph].empty?

    @library = case options[:library]
      when nil
        # Use Nokogiri when available, and REXML otherwise:
        defined?(::Nokogiri) ? :nokogiri : :rexml
      when :nokogiri, :rexml
        options[:library]
      else
        raise ArgumentError.new("expected :rexml or :nokogiri, but got #{options[:library].inspect}")
    end

    require "rdf/rdfa/reader/#{@library}"
    @implementation = case @library
      when :nokogiri then Nokogiri
      when :rexml    then REXML
    end
    self.extend(@implementation)

    detect_host_language_version(input, options)

    # Load Nokogumbo for HTML5 input parsed with Nokogiri, and record which parser library is in use
    parse_lib = if @library == :nokogiri && @host_language == :html5
      begin
        require 'nokogumbo' unless defined?(::Nokogumbo)
        :nokogumbo
      rescue LoadError
        :nokogiri
      end
    else
      @library
    end

    add_info(@doc, "version = #{@version},  host_language = #{@host_language}, library = #{parse_lib}, rdfagraph = #{@options[:rdfagraph].inspect}, expand = #{@options[:vocab_expansion]}")

    begin
      initialize_xml(input, options)
    rescue
      add_error(nil, "Malformed document: #{$!.message}")
    end
    add_error(nil, "Empty document") if root.nil?
    add_error(nil, doc_errors.map(&:message).uniq.join("\n")) if !doc_errors.empty?

    # Section 4.2 RDFa Host Language Conformance
    #
    # The Host Language may require the automatic inclusion of one or more Initial Contexts
    @host_defaults = {
      vocabulary:       nil,
      uri_mappings:     {},
      initial_contexts: [],
    }

    if @version == :"rdfa1.0"
      # Add default term mappings
      @host_defaults[:term_mappings] = %w(
        alternate appendix bookmark cite chapter contents copyright first glossary help icon index
        last license meta next p3pv1 prev role section stylesheet subsection start top up
        ).inject({}) { |hash, term| hash[term] = RDF::URI("http://www.w3.org/1999/xhtml/vocab#") + term; hash }
    end

    case @host_language
    when :xml, :svg
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT]
    when :xhtml1
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT, XHTML_RDFA_CONTEXT]
    when :xhtml5, :html4, :html5
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT, HTML_RDFA_CONTEXT]
    end

    block.call(self) if block_given?
  end
end
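The host-language defaults built near the end of #initialize can be sketched without the RDF gem. In this hypothetical helper, plain strings replace RDF::URI values, and the initial contexts are placeholder strings named after the reader's XML_RDFA_CONTEXT, XHTML_RDFA_CONTEXT and HTML_RDFA_CONTEXT constants.

```ruby
# Hypothetical sketch of the host-default setup in #initialize.
XHV = "http://www.w3.org/1999/xhtml/vocab#".freeze

RDFA10_TERMS = %w(
  alternate appendix bookmark cite chapter contents copyright first glossary help icon index
  last license meta next p3pv1 prev role section stylesheet subsection start top up
).freeze

def host_defaults(version, host_language)
  defaults = { vocabulary: nil, uri_mappings: {}, initial_contexts: [] }

  # RDFa 1.0 predefines the XHTML link-type terms in the XHTML vocabulary namespace
  if version == :"rdfa1.0"
    defaults[:term_mappings] = RDFA10_TERMS.each_with_object({}) { |t, h| h[t] = XHV + t }
  end

  # Placeholder strings stand in for the reader's initial-context constants
  defaults[:initial_contexts] =
    case host_language
    when :xml, :svg              then ["XML_RDFA_CONTEXT"]
    when :xhtml1                 then ["XML_RDFA_CONTEXT", "XHTML_RDFA_CONTEXT"]
    when :xhtml5, :html4, :html5 then ["XML_RDFA_CONTEXT", "HTML_RDFA_CONTEXT"]
    else []
    end
  defaults
end

d = host_defaults(:"rdfa1.0", :xhtml1)
d[:term_mappings]["license"]  # => "http://www.w3.org/1999/xhtml/vocab#license"
d[:initial_contexts]          # => ["XML_RDFA_CONTEXT", "XHTML_RDFA_CONTEXT"]
```

`each_with_object` is used here in place of the `inject` form so the block does not need to return the accumulator explicitly; the resulting hash is the same.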

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class RDF::Enumerable

Instance Attribute Details

#host_language ⇒ :xml, ... (readonly)

Host language

Returns:

  • (:xml, :xhtml1, :xhtml5, :html4, :html5, :svg)


# File 'vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb', line 85

def host_language
  @host_language
end

#implementation ⇒ Module

Returns the XML implementation module for this reader instance.

Returns:

  • (Module)


# File 'vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb', line 101

def implementation
  @implementation
end

#repository ⇒ RDF::Repository (readonly)

Repository used for collecting triples.

Returns:

  • (RDF::Repository)

# File 'vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb', line 95

def repository
  @repository
end

#version ⇒ :"rdfa1.0", :"rdfa1.1" (readonly)

Version

Returns:

  • (:"rdfa1.0", :"rdfa1.1")


# File 'vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb', line 90

def version
  @version
end

Class Method Details

.options ⇒ Object

RDFa Reader options



# File 'vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb', line 249

def self.options
  super + [
    RDF::CLI::Option.new(
      symbol: :vocab_expansion,
      datatype: TrueClass,
      on: ["--vocab-expansion"],
      description: "Perform OWL2 expansion on the resulting graph.") {true},
    RDF::CLI::Option.new(
      symbol: :host_language,
      datatype: %w(xml xhtml1 xhtml5 html4 svg),
      on: ["--host-language HOSTLANG", %w(xml xhtml1 xhtml5 html4 svg)],
      description: "Host Language. One of xml, xhtml1, xhtml5, html4, or svg") do |arg|
        arg.to_sym
    end,
    RDF::CLI::Option.new(
      symbol: :rdfagraph,
      datatype: %w(output processor both),
      on: ["--rdfagraph RDFAGRAPH", %w(output processor both)],
      description: "Used to indicate if either or both of the :output or :processor graphs are output.") {|arg| arg.to_sym},
  ]
end
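RDF::CLI::Option is not reproduced here; as a rough analogue, the sketch below declares the same --host-language choices with Ruby's standard OptionParser, which rejects values outside the list. `parse_host_language` and `HOST_LANGUAGES` are hypothetical names. Note that the list in `.options`, copied here, does not include html5 even though :html5 is the documented default host language.

```ruby
require 'optparse'

# Hypothetical stdlib-only analogue of the --host-language option in .options.
HOST_LANGUAGES = %w(xml xhtml1 xhtml5 html4 svg).freeze

def parse_host_language(argv)
  options = {}
  OptionParser.new do |opts|
    # Passing an Array restricts the argument to one of the listed values
    opts.on("--host-language HOSTLANG", HOST_LANGUAGES,
            "Host Language. One of #{HOST_LANGUAGES.join(', ')}") do |arg|
      options[:host_language] = arg.to_sym
    end
  end.parse!(argv.dup)
  options
end

parse_host_language(%w(--host-language svg))  # => {:host_language=>:svg}
```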

Instance Method Details

#each_statement {|statement| ... }

This method returns an undefined value.

Iterates the given block for each RDF statement in the input.

Reads the input into a graph and performs property copying and vocabulary expansion if required.

Yields:

  • (statement)

Yield Parameters:



# File 'vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb', line 392

def each_statement(&block)
  if block_given?
    unless @processed || @root.nil?
      # Add prefix definitions from host defaults
      @host_defaults[:uri_mappings].each_pair do |prefix, value|
        prefix(prefix, value)
      end

      # parse
      parse_whole_document(@doc, RDF::URI(base_uri))

      def extract_script(el, input, type, options, &block)
        add_debug(el, "script element of type #{type}")
        begin
          # Formats don't exist unless they've been required
          case type.to_s
          when 'application/csvm+json' then require 'rdf/tabular'
          when 'application/ld+json'   then require 'json/ld'
          when 'application/rdf+xml'   then require 'rdf/rdfxml'
          when 'text/ntriples'         then require 'rdf/ntriples'
          when 'text/turtle'           then require 'rdf/turtle'
          end
        rescue LoadError
        end

        if reader = RDF::Reader.for(content_type: type.to_s)
          add_debug(el, "=> reader #{reader.to_sym}")
          # Wrap input in a RemoteDocument with appropriate content-type and base
          doc = if input.is_a?(String)
            RDF::Util::File::RemoteDocument.new(input,
                                                options.merge(
                                                  content_type: type.to_s,
                                                  base_uri: base_uri
                                                ))
          else
            input
          end
          reader.new(doc, options).each(&block)
        else
          add_debug(el, "=> no reader found")
        end
      end

      # Look for Embedded RDF/XML
      unless @root.xpath("//rdf:RDF", "rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#").empty?
        extract_script(@root, @doc, "application/rdf+xml", @options) do |statement|
          @repository << statement
        end
      end

      # Look for Embedded scripts
      @root.css("script[type]").each do |el|
        type = el.attribute("type")

        text = el.inner_html.sub(%r(\A\s*\<!\[CDATA\[)m, '').sub(%r(\]\]>\s*\Z)m, '')

        extract_script(el, text, type, @options) do |statement|
          @repository << statement
        end
      end

      # Look for Embedded microdata
      unless @root.xpath("//@itemscope").empty?
        begin
          require 'rdf/microdata'
          add_debug(@doc, "process microdata")
          @repository << RDF::Microdata::Reader.new(@doc, options)
        rescue LoadError
          add_debug(@doc, "microdata detected, not processed")
        end
      end

      # Perform property copying
      copy_properties(@repository) if @options[:reference_folding]

      # Perform vocabulary expansion
      expand(@repository) if @options[:vocab_expansion]

      @processed = true
    end

    # Return statements in the default graph for
    # statements in the associated named or default graph from the
    # processed repository
    @repository.each do |statement|
      case statement.graph_name
      when nil
        yield statement if @options[:rdfagraph].include?(:output)
      when RDF::RDFA.ProcessorGraph
        yield RDF::Statement.new(*statement.to_triple) if @options[:rdfagraph].include?(:processor)
      end
    end

    if validate? && log_statistics[:error]
      raise RDF::ReaderError, "Errors found during processing"
    end
  end
  enum_for(:each_statement)
end
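The final loop of #each_statement decides which collected statements reach the caller: default-graph statements when :output is requested, and processor-graph statements, re-emitted without a graph name, when :processor is requested. This sketch reproduces that filter in plain Ruby; `Stmt` and `PROCESSOR_GRAPH` are hypothetical stand-ins for RDF::Statement and RDF::RDFA.ProcessorGraph.

```ruby
# Hypothetical stand-ins for RDF::Statement and the processor graph name.
Stmt = Struct.new(:subject, :predicate, :object, :graph_name)
PROCESSOR_GRAPH = :processor_graph

def select_statements(statements, rdfagraph)
  return enum_for(:select_statements, statements, rdfagraph) unless block_given?
  statements.each do |st|
    case st.graph_name
    when nil
      # Output graph: pass through unchanged
      yield st if rdfagraph.include?(:output)
    when PROCESSOR_GRAPH
      # Processor graph: move into the default graph before yielding
      yield Stmt.new(st.subject, st.predicate, st.object, nil) if rdfagraph.include?(:processor)
    end
  end
end

repo = [
  Stmt.new("ex:doc", "dc:title", "Hello", nil),
  Stmt.new("_:e1", "rdf:type", "rdfa:Warning", PROCESSOR_GRAPH),
]
select_statements(repo, [:output]).to_a.size                      # => 1
select_statements(repo, [:output, :processor]).map(&:graph_name)  # => [nil, nil]
```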

#each_triple {|subject, predicate, object| ... }

This method returns an undefined value.

Iterates the given block for each RDF triple in the input.

Yields:

  • (subject, predicate, object)

Yield Parameters:



# File 'vendor/bundler/ruby/2.5.0/bundler/gems/rdf-rdfa-2ba4cee3e285/lib/rdf/rdfa/reader.rb', line 500

def each_triple(&block)
  if block_given?
    each_statement do |statement|
      block.call(*statement.to_triple)
    end
  end
  enum_for(:each_triple)
end
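#each_triple is a thin layer over #each_statement that drops the graph name. The hypothetical `TinyReader` below shows the same relationship using the guard-clause form of the enumerator idiom: without a block the method returns an Enumerator, so callers can chain calls such as `.to_a`.

```ruby
# Hypothetical minimal reader illustrating each_statement / each_triple.
class TinyReader
  Statement = Struct.new(:subject, :predicate, :object, :graph_name) do
    def to_triple
      [subject, predicate, object]
    end
  end

  def initialize(statements)
    @statements = statements
  end

  def each_statement(&block)
    return enum_for(:each_statement) unless block_given?
    @statements.each(&block)
  end

  # Without a block, hand back an Enumerator; with one, yield subject, predicate, object
  def each_triple
    return enum_for(:each_triple) unless block_given?
    each_statement { |st| yield(*st.to_triple) }
  end
end

r = TinyReader.new([TinyReader::Statement.new("ex:s", "ex:p", "ex:o", nil)])
r.each_triple.to_a  # => [["ex:s", "ex:p", "ex:o"]]
```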