Commit

don't convert TMM HTML to Markdown
...i.e. HTML _is_ allowed in certain Fulcrum metadata fields :-)
conorom committed Aug 17, 2023
1 parent 07f49b4 commit 54ea71a
Showing 5 changed files with 0 additions and 62 deletions.
1 change: 0 additions & 1 deletion Gemfile
@@ -166,7 +166,6 @@ gem 'pragmatic_segmenter', '~> 0.3'
gem 'prawn', '~> 2.2'

gem 'redcarpet', '~> 3.5.1'
gem 'reverse_markdown'

# see HELIO-4484 and https://github.com/samvera/hyrax/pull/5961
gem 'redlock', '>= 0.1.2', '< 2.0'
3 changes: 0 additions & 3 deletions Gemfile.lock
@@ -901,8 +901,6 @@ GEM
sass-rails
twitter-bootstrap-rails
retriable (3.1.2)
reverse_markdown (1.1.0)
nokogiri
rexml (3.2.5)
riiif (1.4.1)
railties (>= 4.2, < 6)
@@ -1248,7 +1246,6 @@ DEPENDENCIES
resque (~> 2.4.0)
resque-pool
resque-web (~> 0.0.12)
reverse_markdown
riiif (= 1.4.1)
rsolr (~> 2.0.1)
rspec-context-private
8 changes: 0 additions & 8 deletions app/services/html_to_markdown_service.rb

This file was deleted.

26 changes: 0 additions & 26 deletions lib/tasks/tmm/tmm_csv_monograph_create_update.rake
@@ -103,10 +103,6 @@ namespace :heliotrope do
# in order to offer the ability to blank out metadata we need to merge in some nils
attrs = blank_metadata.merge(attrs)

# TMM has some fields with HTML tags in it. This functionality will have to be manually tested as...
# part of HELIO-2298
attrs = maybe_convert_to_markdown(attrs)

# sending new_monograph param here because of a weird FCREPO bug that affects Hyrax work *creation* only
# https://github.com/samvera/hyrax/issues/3527
attrs = cleanup_characters(attrs, new_monograph)
@@ -141,28 +137,6 @@ namespace :heliotrope do
puts "***TITLE ROW HAS UNEXPECTED VALUES!*** These columns will be skipped: #{unexpecteds.join(', ')}\n\n" if unexpecteds.present?
end

def maybe_convert_to_markdown(attrs)
attrs_out = {}
attrs.each do |key, value|
if value.present?
# TODO: maybe stop converting HTML to Markdown as HTML should work just fine in Fulcrum fields, theoretically
# otherwise a check like this might be better than listing fields:
# if ActionController::Base.helpers.strip_tags(value) != value
attrs_out[key] = if ['title', 'description'].include? key.downcase
# 1) HTMLEntities is cleaning up the many HTML entity and decimal codes in the TMM HTML data
# 2) the calls to gsub are getting rid of an inordinate number of non-breaking spaces,...
# which appear in large numbers in the TMM data for seemingly no reason.
Array(HtmlToMarkdownService.convert(HTMLEntities.new.decode(value.first.gsub('&#160;', ' ').gsub('&nbsp;', ' '))))
else
value
end
else
attrs_out[key] = nil
end
end
attrs_out
end

# this method expects HTMLEntities to have done its work in converting entities and decimal codes
# TODO: this should happen somewhere else -- in RowData, or even in a before_save callback on both Monographs and FileSets?
def cleanup_characters(attrs, new_monograph)
24 changes: 0 additions & 24 deletions spec/services/html_to_markdown_service_spec.rb

This file was deleted.
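
For reference, a minimal plain-Ruby sketch of two ideas from the removed `maybe_convert_to_markdown` method: replacing the `&#160;` and `&nbsp;` entities with ordinary spaces, and the TODO's suggestion of detecting HTML in a value rather than listing field names. The helper names `replace_nbsp_entities` and `contains_html_tag?` are hypothetical and do not exist in the repository. The regex is a crude stand-in for the `strip_tags` comparison mentioned in the TODO, not an equivalent of the Rails sanitizer.

```ruby
# Hypothetical helpers for illustration only; not part of the repository.

# The two non-breaking-space spellings the removed code replaced via gsub.
NBSP_ENTITIES = ['&#160;', '&nbsp;'].freeze

# Replace each non-breaking-space entity with an ordinary space.
def replace_nbsp_entities(text)
  NBSP_ENTITIES.reduce(text) { |cleaned, entity| cleaned.gsub(entity, ' ') }
end

# Assumption: a simple regex is enough to spot an opening or closing tag.
# It only looks for "<" followed directly by an optional "/" and a letter.
def contains_html_tag?(text)
  text.match?(%r{</?[a-zA-Z][^>]*>})
end
```

With the conversion gone, a value such as `<em>Title</em>` would be stored with its tags intact, which matches the commit message's point that HTML is allowed in certain Fulcrum metadata fields.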
