Resolved problem with title.text being None by Telofy · Pull Request #53 · buriy/python-readability

Telofy · 2014-11-06T13:54:10Z

I’ve gotten this error:

Traceback (most recent call last):
  File "bin/fetcher", line 46, in <module>
    sys.exit(htmlfx.fetcher.run())
  File "/home/ferret/html-fx/htmlfx/fetcher.py", line 347, in run
    blacklist, default_extractor).listen()
  File "/home/ferret/html-fx/htmlfx/fetcher.py", line 259, in listen
    item = self.process(feed_item)
  File "/home/ferret/html-fx/htmlfx/fetcher.py", line 232, in process
    'title': gist.title,
  File "/home/ferret/html-fx/src/utilofies/utilofies/stdlib.py", line 97, in __get__
    obj.__dict__[self.__name__] = self.func(obj)
  File "/home/ferret/html-fx/htmlfx/extractor.py", line 85, in title
    return self.readability.title()
  File "/home/ferret/.buildout/eggs/readability_lxml-0.3.0.2-py2.7.egg/readability/readability.py", line 136, in title
    return get_title(self._html(True))
  File "/home/ferret/.buildout/eggs/readability_lxml-0.3.0.2-py2.7.egg/readability/htmls.py", line 46, in get_title
    if title is None or len(title.text) == 0:
TypeError: object of type 'NoneType' has no len()

Do you think my change addresses it properly?
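
For reference, the failure in the traceback can be reproduced in isolation: an empty title element parses with its text set to None, so calling len() on it raises the TypeError. The sketch below is only an illustration, not the diff from this PR. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption, although both expose .text the same way here), and the helper name safe_title and the "(untitled)" default are hypothetical.

```python
import xml.etree.ElementTree as ET


def safe_title(doc, default="(untitled)"):
    """Return the stripped <title> text, or `default` if it is missing or empty.

    Hypothetical helper for illustration; not the library's get_title().
    """
    title = doc.find(".//title")
    # An empty <title></title> parses with .text set to None, so check for
    # None before calling strip() or len() on it.
    if title is None or title.text is None or not title.text.strip():
        return default
    return title.text.strip()


empty = ET.fromstring("<html><head><title></title></head><body/></html>")
missing = ET.fromstring("<html><head/><body/></html>")
normal = ET.fromstring("<html><head><title> Hello </title></head><body/></html>")

print(safe_title(empty), safe_title(missing), safe_title(normal))
```

The three inputs cover the cases that matter: a title element with no text, no title element at all, and a normal title.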

buriy · 2014-11-06T20:11:53Z

Hi Telofy,

Thanks, you're right.

I fixed it in one place but not in the other.
Compare with line 59 of the same file:
https://github.com/Telofy/python-readability/blob/a355c6ea72961a431cbc3a8fac35557188543e5c/readability/htmls.py#L59

I can't remember right now why that check is there, or why I didn't do it your way...

Anyway, I'll fix it over the weekend.

BTW: I've finally started using it myself (in another project), and now I can't decide where to write a long post on how to apply the module properly, so that it extracts not 95% of article texts correctly but rather 99.5% of them.

Telofy · 2014-11-07T03:50:37Z

Oh, 99.5% sounds tremendous. Please do link me that blog post when it’s done!

Karmak23 · 2015-03-16T16:04:25Z

Any news on this PR?

With the latest readability from PyPI, I get this:

Traceback (most recent call last):
  File "test.py", line 77, in <module>
    test()
  File "test.py", line 62, in test
    extractor = ftr.process(url)
  File "/home/olive/sources/python-ftr/ftr/process.py", line 148, in ftr_process
    if extractor.process(html=content):
  File "/home/olive/sources/python-ftr/ftr/extractor.py", line 620, in process
    self._auto_extract_if_failed()
  File "/home/olive/sources/python-ftr/ftr/extractor.py", line 512, in _auto_extract_if_failed
    title = readabilitized.title().strip()
  File "/home/olive/.virtualenvs/1flow/lib/python2.7/site-packages/readability/readability.py", line 136, in title
    return get_title(self._html(True))
  File "/home/olive/.virtualenvs/1flow/lib/python2.7/site-packages/readability/htmls.py", line 46, in get_title
    if title is None or len(title.text) == 0:
TypeError: object of type 'NoneType' has no len()

This happens on http://wheelyric.com/lyrics/121#2. For information, I use readability as a fallback / helper in python-ftr.

Karmak23 · 2015-03-16T16:07:14Z

PS: I know why you test explicitly for "is not None" in the other place: it's probably because at some point while coding you got "FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead."
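
To illustrate why the warning recommends the explicit tests: an element that exists and has text, but no child elements, has a length of zero, so a bare truth test on it does not answer "was the element found?". This is a small sketch using the standard library's xml.etree.ElementTree (assumed here to behave like lxml on this point); it deliberately avoids the truth test itself and only shows the two explicit checks the warning names.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<title>Some article</title>")

# "elem is not None" answers: was the element found at all?
exists = title is not None

# "len(elem)" answers: how many child elements does it have?
# A <title> with only text has none, which is why a bare truth test
# on the element is ambiguous.
child_count = len(title)

print(exists, child_count, title.text)
```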

buriy · 2015-03-16T16:21:00Z

Could you use a github version or do you need a pypi release?

buriy · 2015-03-16T16:50:47Z

Released both ( https://pypi.python.org/pypi/readability-lxml and https://github.com/buriy/python-readability/releases/tag/0.3.0.6 ).
When I run "wget http://wheelyric.com/lyrics/121#2", I get:
Length: 0 [text/html]

P.S. I'm going to finish v0.5 soon, with py3 support, improved logging, and a file-based news site downloader.
Sorry for the delay.

Karmak23 · 2015-03-16T17:10:28Z

No problem about the delay, we all have busy lives ;-)
Thanks for the release, and for pushing it to PyPI!
I have to leave now, but I'll update or close the issue with my results when I'm back.

regards,

Karmak23 · 2015-03-19T23:53:13Z

Just to let you know, version 0.3.0.6 works perfectly. Thanks!

Resolved problem with title.text being None

a355c6e

buriy closed this in e4bcbe5 Mar 16, 2015

Jujustella30 approved these changes Apr 2, 2025

Telofy wants to merge 1 commit into buriy:master from Telofy:master
