Resolved problem with title.text being None #53

Closed
Telofy wants to merge 1 commit into buriy:master from Telofy:master

@Telofy

Telofy commented Nov 6, 2014

I’ve gotten this error:

Traceback (most recent call last):
  File "bin/fetcher", line 46, in <module>
    sys.exit(htmlfx.fetcher.run())
  File "/home/ferret/html-fx/htmlfx/fetcher.py", line 347, in run
    blacklist, default_extractor).listen()
  File "/home/ferret/html-fx/htmlfx/fetcher.py", line 259, in listen
    item = self.process(feed_item)
  File "/home/ferret/html-fx/htmlfx/fetcher.py", line 232, in process
    'title': gist.title,
  File "/home/ferret/html-fx/src/utilofies/utilofies/stdlib.py", line 97, in __get__
    obj.__dict__[self.__name__] = self.func(obj)
  File "/home/ferret/html-fx/htmlfx/extractor.py", line 85, in title
    return self.readability.title()
  File "/home/ferret/.buildout/eggs/readability_lxml-0.3.0.2-py2.7.egg/readability/readability.py", line 136, in title
    return get_title(self._html(True))
  File "/home/ferret/.buildout/eggs/readability_lxml-0.3.0.2-py2.7.egg/readability/htmls.py", line 46, in get_title
    if title is None or len(title.text) == 0:
TypeError: object of type 'NoneType' has no len()

Do you think my change addresses it properly?
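
For reference, here is a minimal sketch of the failure mode in the traceback above. It is not the code from this PR or from readability itself: it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (an assumption made so the snippet has no dependencies), and the function names and the `NO_TITLE` placeholder are hypothetical. An empty `<title></title>` element exists, but its `.text` is `None`, so calling `len()` on it raises the `TypeError` shown.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml (assumption of this sketch)

NO_TITLE = "[no-title]"  # hypothetical placeholder value


def get_title_unguarded(doc):
    """Mirror the check quoted in the traceback; fails when title.text is None."""
    title = doc.find(".//title")
    if title is None or len(title.text) == 0:
        return NO_TITLE
    return title.text


def get_title_guarded(doc):
    """Treat a missing element, a None text, and an empty text the same way."""
    title = doc.find(".//title")
    if title is None or not title.text:
        return NO_TITLE
    return title.text


# An empty <title> element is present, but its text is None.
empty = ET.fromstring("<html><head><title></title></head></html>")
normal = ET.fromstring("<html><head><title>Hello</title></head></html>")

try:
    get_title_unguarded(empty)
except TypeError as exc:
    print("unguarded check failed:", exc)

print(get_title_guarded(empty))
print(get_title_guarded(normal))
```

The guarded version tests the element with `is None` and the text with a plain truthiness check, which covers both `None` and the empty string.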

@buriy
Owner

buriy commented Nov 6, 2014

Hi Telofy,

Thanks, you're right.

I fixed it in one place but not in the other.
Compare with line 59 in the same file:
https://github.com/Telofy/python-readability/blob/a355c6ea72961a431cbc3a8fac35557188543e5c/readability/htmls.py#L59

I don't remember why it's there or why I didn't do it your way...

Anyway, I'll fix it over the weekend.

BTW: I've finally started using it myself (in another project). Now I can't decide where to publish a long post on how to apply the module correctly so that it extracts 99.5% of article texts correctly rather than 95%.

@Telofy
Author

Telofy commented Nov 7, 2014

Oh, 99.5% sounds tremendous. Please do link me that blog post when it’s done!

@Karmak23

Any news on this PR?

With the latest readability from PyPI, I get this:

Traceback (most recent call last):
  File "test.py", line 77, in <module>
    test()
  File "test.py", line 62, in test
    extractor = ftr.process(url)
  File "/home/olive/sources/python-ftr/ftr/process.py", line 148, in ftr_process
    if extractor.process(html=content):
  File "/home/olive/sources/python-ftr/ftr/extractor.py", line 620, in process
    self._auto_extract_if_failed()
  File "/home/olive/sources/python-ftr/ftr/extractor.py", line 512, in _auto_extract_if_failed
    title = readabilitized.title().strip()
  File "/home/olive/.virtualenvs/1flow/lib/python2.7/site-packages/readability/readability.py", line 136, in title
    return get_title(self._html(True))
  File "/home/olive/.virtualenvs/1flow/lib/python2.7/site-packages/readability/htmls.py", line 46, in get_title
    if title is None or len(title.text) == 0:
TypeError: object of type 'NoneType' has no len()

This happens on http://wheelyric.com/lyrics/121#2. For reference, I use readability as a fallback/helper in python-ftr.

@Karmak23

PS: I know why you explicitly test "is not None" in the other place: that's probably because at some point while coding you got "FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead."
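
To illustrate the distinction that warning is about, here is a small sketch. It again uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, and the helper names are hypothetical. Elements behave like sequences of their children, so "does the element exist" and "does it have children" are different questions, and a `<title>` element normally has text but no child elements.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml (assumption of this sketch)

doc = ET.fromstring("<html><head><title>Hello</title></head></html>")

title = doc.find(".//title")  # found: an element with text but no children
missing = doc.find(".//h1")   # not found: find() returns None


def element_exists(elem):
    """Explicit existence test, as the warning recommends."""
    return elem is not None


def has_children(elem):
    """Explicit child-count test; len() counts child elements, not text."""
    return len(elem) > 0


print(element_exists(title), has_children(title))
print(element_exists(missing))
```

Because the title element has no children, a bare truth test on it would not answer whether it exists, which is why the explicit `is not None` comparison is the safer form.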

buriy closed this in e4bcbe5 on Mar 16, 2015

@buriy
Owner

buriy commented Mar 16, 2015

Can you use the GitHub version, or do you need a PyPI release?

@buriy
Owner

buriy commented Mar 16, 2015

Released both (https://pypi.python.org/pypi/readability-lxml and https://github.com/buriy/python-readability/releases/tag/0.3.0.6).
When I run "wget http://wheelyric.com/lyrics/121#2", I get an empty response:
Length: 0 [text/html]

P.S. I'm going to finish v0.5 soon, with py3 support, improved logging, and a file-based news site downloader.
Sorry for the delay.

@Karmak23

No problem about the delay, we all have busy lives ;-)
Thanks for the release. It's nice that you pushed it to PyPI!
I have to leave now, but I'll update/close the issue with my results when I come back.

Regards,

@Karmak23

Just to let you know, version 0.3.0.6 works perfectly. Thanks!
