Commit 6e19053

Merge pull request #10 from tegansnyder/merge_temp
Merge changes with upstream
2 parents dd90805 + a173fad commit 6e19053

46 files changed (+5318 -4531 lines changed)

.gitattributes

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+/tests export-ignore
+/.scrutinizer.yml export-ignore
+/.travis.yml export-ignore
+/CHANGELOG.md export-ignore
+/CONTRIBUTING.md export-ignore
+/LICENSE.md export-ignore
+/README.md export-ignore
+/phpunit.php export-ignore
+/phpunit.xml export-ignore

.scrutinizer.yml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+filter:
+    paths: [src/*]
+    excluded_paths: [tests/*]
+checks:
+    php:
+        code_rating: true
+        remove_extra_empty_lines: true
+        remove_php_closing_tag: true
+        remove_trailing_whitespace: true
+        fix_use_statements:
+            remove_unused: true
+            preserve_multiple: false
+            preserve_blanklines: true
+            order_alphabetically: true
+        fix_php_opening_tag: true
+        fix_linefeed: true
+        fix_line_ending: true
+        fix_identation_4spaces: true
+        fix_doc_comments: true
+tools:
+    external_code_coverage:
+        timeout: 600
+        runs: 3
+    php_code_coverage: false
+    php_code_sniffer:
+        config:
+            standard: PSR2
+        filter:
+            paths: ['src']
+    php_loc:
+        enabled: true
+        excluded_dirs: [vendor, test]
+    php_cpd:
+        enabled: true
+        excluded_dirs: [vendor, test]

.travis.yml

Lines changed: 3 additions & 2 deletions
@@ -1,9 +1,8 @@
 language: php
 
 php:
-  - 5.4
-  - 5.5
   - 5.6
+  - 7.0
   - hhvm
 
 install:
@@ -16,3 +15,5 @@ script:
 
 after_script:
   - php vendor/bin/coveralls
+  - wget https://scrutinizer-ci.com/ocular.phar
+  - php ocular.phar code-coverage:upload --format=php-clover build/logs/clover.xml

CHANGELOG.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+### Development
+
+## 1.7.0
+
+- Added .scrutinizer.yml to repo
+- Reformatted code to PSR-1/2
+- Improved the test coverage and some small code changes
+- Added removeAttribute and removeAllAttributes tag methods fixes #57
+- Added replaceNode method implements #52
+- Added a delete method. fixes #43
+- Added semicolon after &#10 for linebreak preservation. fixes #62
+- Removed code that removed <code> tag fixes #60
+- Added new test related to #63
+- Refactored the nodes into inner and leaf nodes
+- Fixed Strings example in README
+- Close this header so the markdown will render properly
+- Added preserve line break option. Defaults to false.
+
+
+## 1.6.9
+
+- Added Changelog
+- Fixed issue with spaces before closing tag Fixes #45
+- Fixed some code quality issues found by scrutinizer
+- Added Scrutinizer to README
+- Reformatted code to comply with PSR-1/2
+- Added preserve line break option. Defaults to false. fixes #40
+- Updated the tests
+- Added options: cleanupInput, removeScripts and removeStyles
+
+## 1.6.8
+
+- Added comments and reformatted some code
+- Added support for non-escaped quotes in attribute value fixes #37
+- Cleaned up the comments and php docs
+- Removed version in composer json
+- Updated composer version
+- Refactoring out isChild method.
+- Updated in code documentation
+- Updated composer
+
+## 1.6.7
+
+- Added tests for the new array access
+- Added feature to allow array usage of html node. fixes #26
+- Update HtmlNode.php
+- Added test to cover issue #28
+- FIX: File name is longer than the maximum allowed path
+
+## 1.6.6
+
+- Replaced preg_replace with mb_ereg_replace
+- Added child selector fixes #24
+- Updated the dev version of phpunit
+
+## 1.6.5
+
+- Fixed bug when no attribute tags are last tag (without space). fixes #16
+- Fixed some documentation inconsistencies fixes #15
+- Made loadStr a public method Fixes #18
+- Update a problem with the README fixes #11
+- Added setAttribute to the node fixes #7
+- Check if open_basedir is enabled: Don't use CURLOPT_FOLLOWLOCATION
+
+## 1.6.4
+
+- Added tests and updated README
+- Updated the tests and moved some files
+- Added the option to enforce the encoding
+- Fixed a problem with handling the unknown child exception
+- Updated some tests
+- Added coverall badge and package
+
+## 1.6.3
+
+- Added initial support for 'strict' parsing option
+- Added an optional parameter to enable recursive text
+- Added appropriate Options tests
+- Changed all exception to specific objects
+- Added a whitespaceTextNode option and test
+- Added support for an options array
+
+## 1.6.2
+
+- Standardised indentation for easier reading on github
+- Update AbstractNode.php
+- Added a test for hhvm in my travis.yml
+- Added a LICENSE.md file for MIT
+- Added build status to README
+- Added travis.yml
+- Changed the file name of the abstract node
+- Fixed code in collection class where instance of arrayIterator is to be returned
+- Updated documentation
+- Added a curl interface and a simple curl class.
+- Removed the Guzzle dependency
+- Abstracted the Node class as it should have been done in the first place
+- Added integrity checks for the cached html
+- Added some basic caching of the dom html
+- Added a toArray() method to the collection and a test
+
+## 1.6.1
+
+- Moved back to using guzzle so exceptions are thrown when there was an error with loading a url
+- Added tests for the Static Facade. Fixed a few issues brought to light from the new tests
+- Added a static facade
+- Changed encoding to be a local attribute instead of a static attribute
+- Solved issue #2 When you attempt to load an html page from a URL using loadFromUrl the encoding is incorrect.
+- Added easier loading of files and urls. Still have a problem with encoding while loading from url.
+- Added guzzle and loadFromUrl option
+- Fixed an issue with no value attributes
+- Added magic and each methods to the collection. Plus some tests
+- Added a collection object
+- Added charset encoding
+- Fixed a bug with closing tags. If a closing tag did not have an opening tag it would cause the scan to end instead of ignoring the closing tag.

README.md

Lines changed: 52 additions & 4 deletions
@@ -13,24 +13,27 @@ Special thanks to the original author (@paquettg) and contributors. Thanks!
 PHP Html Parser
 ==========================
 
-Version 1.6.7
+Version 1.7.0
 
 [![Build Status](https://travis-ci.org/paquettg/php-html-parser.png)](https://travis-ci.org/paquettg/php-html-parser)
 [![Coverage Status](https://coveralls.io/repos/paquettg/php-html-parser/badge.png)](https://coveralls.io/r/paquettg/php-html-parser)
+[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/paquettg/php-html-parser/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/paquettg/php-html-parser/?branch=master)
 
 PHPHtmlParser is a simple, flexible, html parser which allows you to select tags using any css selector, like jQuery. The goal is to assist in the development of tools which require a quick, easy way to scrape html, whether it's valid or not! This project was originally supported by [sunra/php-simple-html-dom-parser](https://github.com/sunra/php-simple-html-dom-parser) but the support seems to have stopped so this project is my adaptation of his previous work.
 
 Install
 -------
 
-This package can be found on [packagist](https://packagist.org/packages/paquettg/php-html-parser) and is best loaded using [composer](http://getcomposer.org/). We support php 5.4, 5.5, and hhvm 2.3.
+This package can be found on [packagist](https://packagist.org/packages/paquettg/php-html-parser) and is best loaded using [composer](http://getcomposer.org/). We support php 5.6, 7.0, and hhvm 2.3.
 
 Usage
 -----
 
 You can find many examples of how to use the dom parser and any of its parts (which you will most likely never touch) in the tests directory. The tests are done using PHPUnit and are very small, a few lines each, and are a great place to start. Given that, I'll still be showing a few examples of how the package should be used. The following example is a very simplistic usage of the package.
 
 ```php
+// Assuming you installed from Composer:
+require "vendor/autoload.php";
 use PHPHtmlParser\Dom;
 
 $dom = new Dom;
@@ -47,6 +50,8 @@ Loading Files
 You may also seamlessly load a file into the dom instead of a string, which is much more convenient and is how I expect most developers will be loading the html. The following example is taken from our test and uses the "big.html" file found there.
 
 ```php
+// Assuming you installed from Composer:
+require "vendor/autoload.php";
 use PHPHtmlParser\Dom;
 
 $dom = new Dom;
@@ -78,6 +83,8 @@ Loading Url
 Loading a url is very similar to the way you would load the html from a file.
 
 ```php
+// Assuming you installed from Composer:
+require "vendor/autoload.php";
 use PHPHtmlParser\Dom;
 
 $dom = new Dom;
@@ -92,6 +99,8 @@ $html = $dom->outerHtml; // same result as the first example
 What makes the loadFromUrl method noteworthy is the `PHPHtmlParser\CurlInterface` parameter, an optional second parameter. By default, we use the `PHPHtmlParser\Curl` class to get the contents of the url. You can, however, inject your own implementation of CurlInterface, and we will attempt to load the url using whatever tool and settings you want.
 
 ```php
+// Assuming you installed from Composer:
+require "vendor/autoload.php";
 use PHPHtmlParser\Dom;
 use App\Services\Connector;
 
@@ -108,10 +117,12 @@ Loading Strings
 Loading a string directly, without the checks in `load()`, is also easily done.
 
 ```php
+// Assuming you installed from Composer:
+require "vendor/autoload.php";
 use PHPHtmlParser\Dom;
 
 $dom = new Dom;
-$dom->loadStr('<html>String</html>', [])
+$dom->loadStr('<html>String</html>', []);
 $html = $dom->outerHtml;
 ```
 
@@ -123,6 +134,8 @@ Options
 You can also set parsing options that will affect the behavior of the parsing engine. You can set a global option array using the `setOptions` method in the `Dom` object or an instance-specific option by adding it to the `load` method as an extra (optional) parameter.
 
 ```php
+// Assuming you installed from Composer:
+require "vendor/autoload.php";
 use PHPHtmlParser\Dom;
 
 $dom = new Dom;
@@ -137,12 +150,36 @@ $dom->load('http://google.com', [
 $dom->load('http://gmail.com'); // will not have whitespaceTextNode set to false.
 ```
 
-At the moment we support 3 options, strict, whitespaceTextNode and enforceEncoding. Strict, by default false, will throw a `StrictException` if it finds that the html is not strict compliant (all tags must have a closing tag, no attribute without a value, etc.).
+At the moment we support 7 options.
+
+**Strict**
+
+Strict, by default false, will throw a `StrictException` if it finds that the html is not strict compliant (all tags must have a closing tag, no attribute without a value, etc.).
+
+**whitespaceTextNode**
 
 The whitespaceTextNode, by default true, option tells the parser to save textnodes even if the content of the node is empty (only whitespace). Setting it to false will ignore all whitespace-only text nodes found in the document.
 
+**enforceEncoding**
+
 The enforceEncoding, by default null, option will enforce a character set to be used for reading the content and returning the content in that encoding. Setting it to null will trigger an attempt to figure out the encoding from within the content of the string given instead.
 
+**cleanupInput**
+
+Set this to `true` to skip the entire clean up phase of the parser. If this is set to true the next 3 options will be ignored. Defaults to `false`.
+
+**removeScripts**
+
+Set this to `false` to skip removing the script tags from the document body. This might have adverse effects. Defaults to `true`.
+
+**removeStyles**
+
+Set this to `false` to skip removing of style tags from the document body. This might have adverse effects. Defaults to `true`.
+
+**preserveLineBreaks**
+
+Preserves line breaks if set to `true`. If set to `false`, line breaks are cleaned up as part of the input clean up process. Defaults to `false`.
+
 Static Facade
 -------------
 
@@ -181,3 +218,14 @@ $tag = $a->getTag();
 $tag->setAttribute('class', 'foo');
 echo $a->getAttribute('class'); // "foo"
 ```
+
+It is also possible to remove a node from the tree. Simply call the `delete` method on any node to remove it from the tree. It is important to note that you should unset the node after removing it from the `DOM`; it will still take up memory as long as it is not unset.
+
+```php
+$dom = new Dom;
+$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
+$a = $dom->find('a')[0];
+$a->delete();
+unset($a);
+echo $dom; // '<div class="all"><p>Hey bro, <br /> :)</p></div>'
+```
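
Reviewer note: the Options text in the README diff distinguishes global options set through `setOptions` from per-call options passed to `load`, and says a later `load` without that extra array "will not have whitespaceTextNode set to false". The sketch below illustrates that precedence in plain PHP, without the library. The helper `resolveOptions` is hypothetical and is not the parser's implementation; the default values are simply the ones the README lists.

```php
<?php
// Defaults as stated in the README's Options section (copied, not verified
// against the library source).
$defaults = [
    'strict'             => false,
    'whitespaceTextNode' => true,
    'enforceEncoding'    => null,
    'cleanupInput'       => false,
    'removeScripts'      => true,
    'removeStyles'       => true,
    'preserveLineBreaks' => false,
];

// Hypothetical helper: per-load options override global options, which in
// turn override the defaults. array_merge keeps the last value per string key.
function resolveOptions(array $defaults, array $global, array $perLoad)
{
    return array_merge($defaults, $global, $perLoad);
}

// Comparable to $dom->setOptions(['strict' => true]).
$global = ['strict' => true];

// First load passes an instance-specific option; the second load does not.
$first  = resolveOptions($defaults, $global, ['whitespaceTextNode' => false]);
$second = resolveOptions($defaults, $global, []);

var_dump($first['whitespaceTextNode']);  // bool(false): applies to this load only
var_dump($second['whitespaceTextNode']); // bool(true): back to the default
var_dump($second['strict']);             // bool(true): the global option persists
```

Under this reading, the per-load array never leaks into later calls, which matches the comment on the `http://gmail.com` example in the README.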

composer.json

Lines changed: 3 additions & 7 deletions
@@ -1,8 +1,8 @@
 {
     "name": "tegansnyder/php-html-parser",
     "type": "library",
+    "version": "1.7.0",
     "description": "An HTML DOM parser. It allows you to manipulate HTML. Find tags on an HTML page with selectors just like jQuery.",
-    "version": "1.6.4",
     "keywords": ["html", "dom", "parser"],
     "homepage": "https://github.com/tegansnyder/php-html-parser",
     "license": "MIT",
@@ -18,13 +18,9 @@
             "homepage": "http://www.tegdesign.com"
         }
     ],
-    "require": {
-        "php": ">=5.4",
-        "paquettg/string-encode": "~0.1.0"
-    },
     "require-dev": {
-        "phpunit/phpunit": "~4.8.0",
-        "satooshi/php-coveralls": "~0.6.0",
+        "phpunit/phpunit": "~5.3.0",
+        "satooshi/php-coveralls": "~1.0.0",
         "mockery/mockery": "~0.9.0"
     },
     "autoload": {

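Reviewer note: both dev dependencies in composer.json are bumped using tilde constraints. As a rough illustration of how a three-part tilde constraint such as `~5.3.0` reads (at least 5.3.0, below 5.4.0), here is a minimal sketch. The helper `satisfiesTilde` is hypothetical, is not part of Composer, and only handles the `~X.Y.Z` form.

```php
<?php
// Hypothetical helper (not Composer's resolver): checks whether $version
// falls inside the range implied by a three-part tilde constraint.
function satisfiesTilde($constraint, $version)
{
    // "~5.3.0" gives the inclusive lower bound "5.3.0".
    $lower = ltrim($constraint, '~');
    $parts = explode('.', $lower);

    // The exclusive upper bound bumps the minor segment: "5.4.0".
    $upper = $parts[0] . '.' . ((int) $parts[1] + 1) . '.0';

    return version_compare($version, $lower, '>=')
        && version_compare($version, $upper, '<');
}

var_dump(satisfiesTilde('~5.3.0', '5.3.4')); // bool(true)
var_dump(satisfiesTilde('~5.3.0', '5.4.0')); // bool(false)
```

Read this way, the move from `~4.8.0` to `~5.3.0` for PHPUnit is a major-version jump rather than a patch-level update, which lines up with dropping PHP 5.4 and 5.5 from `.travis.yml`.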