Parsing HTML is easy. Libraries like Beautiful Soup give you an compact and straight forward interface to process websites in your preferred programming language. But this is only the first step. The interesting question is: How to extract the meaningful content of HTML? I tried to find a answer to this questions during the last […]