PHPHtmlParser is a simple, flexible HTML parser that lets you select tags using any CSS selector, like jQuery. The goal is to assist in the development of tools that need a quick, easy way to scrape HTML, whether it's valid or not! This project was originally supported by [sunra/php-simple-html-dom-parser](https://github.com/sunra/php-simple-html-dom-parser), but that support seems to have stopped, so this project is my adaptation of his previous work.

Install
-------

This package can be found on [packagist](https://packagist.org/packages/paquettg/php-html-parser) and is best loaded using [composer](http://getcomposer.org/). We support php 5.6, 7.0, and hhvm 2.3.

Usage
-----

You can find many examples of how to use the dom parser and any of its parts (which you will most likely never touch) in the tests directory. The tests are written with PHPUnit, are very small (a few lines each), and are a great place to start. Even so, I'll show a few examples of how the package should be used. The following example is a very simple usage of the package.

```php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
// …
```
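
A slightly fuller, self-contained sketch of the same basic flow, assuming the Composer setup shown above. The markup, class names and URL are made up for illustration. It uses `find()` with CSS selectors, the `text` property and `getAttribute()`, and relies on `find()` returning a single node (rather than a collection) when you pass an index as the second argument.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
// The argument is neither a file path nor a url, so load() treats it as an HTML string.
$dom->load('<div class="all"><p>Hey, <a href="http://example.com/">click here</a></p><p class="note">Second</p></div>');

// find() takes a CSS selector and returns a collection of matching nodes.
$links = $dom->find('div.all a');
$link  = $links[0];
echo $link->text, "\n";                   // click here
echo $link->getAttribute('href'), "\n";   // http://example.com/

// With an index as the second argument, find() returns a single node.
$note = $dom->find('p.note', 0);
echo $note->text, "\n";                   // Second
```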

Loading Files
-------------

You may also seamlessly load a file into the dom instead of a string, which is much more convenient and is how I expect most developers will load their html. The following example is taken from our tests and uses the "big.html" file found there.

```php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
// …
```
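
A runnable sketch of loading from disk, assuming Composer autoloading. Instead of the "big.html" fixture from the tests, it writes a small made-up document to a temporary file first, so it does not depend on the repository layout. It then iterates over the collection that `find()` returns.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

// Write a small, made-up document to a temporary file so the example is self-contained.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, '<html><body><ul><li>One</li><li>Two</li></ul></body></html>');

$dom = new Dom;
$dom->loadFromFile($path);

// The collection returned by find() can be iterated with foreach.
$items = [];
foreach ($dom->find('ul li') as $li) {
    $items[] = $li->text;
}
echo implode(', ', $items), "\n"; // One, Two

// Clean up the temporary file.
unlink($path);
```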

Loading Url
-----------

Loading a url is very similar to the way you would load the html from a file.

```php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
// …
$html = $dom->outerHtml; // same result as the first example
```

What makes the `loadFromUrl` method noteworthy is the `PHPHtmlParser\CurlInterface` parameter, an optional second parameter. By default, we use the `PHPHtmlParser\Curl` class to get the contents of the url. However, you can inject your own implementation of `CurlInterface`, and we will attempt to load the url using whatever tool or settings you want.

```php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;
use App\Services\Connector;

// …
```
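
A sketch of a custom implementation that is handy in unit tests, where you may not want to hit the network. `FixedHtmlConnector` is a hypothetical class. The sketch assumes that `CurlInterface` declares a single `get($url)` method returning the page body as a string, and that `loadFromUrl()` takes the connector after an options array, as in the 1.x signature `loadFromUrl($url, $options = [], CurlInterface $curl = null)`. If your version takes the connector as the second argument, drop the `[]`.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\CurlInterface;
use PHPHtmlParser\Dom;

// Hypothetical connector that returns canned HTML instead of fetching the url.
// Assumption: CurlInterface requires only get($url), returning the body as a string.
class FixedHtmlConnector implements CurlInterface
{
    private $html;

    public function __construct($html)
    {
        $this->html = $html;
    }

    public function get($url)
    {
        // The url is ignored; every request gets the same canned markup.
        return $this->html;
    }
}

$dom = new Dom;
// Assumption: the connector is the third argument, after the options array.
$dom->loadFromUrl('http://example.com/', [], new FixedHtmlConnector('<p>Hello from the stub</p>'));
$greeting = $dom->find('p', 0)->text;
echo $greeting, "\n";
```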

Loading Strings
---------------

Loading a string directly, without the checks in `load()`, is also easily done.

```php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->loadStr('<html>String</html>', []);
$html = $dom->outerHtml;
```

Options
-------

You can also set parsing options that affect the behavior of the parsing engine. You can set a global option array using the `setOptions` method on the `Dom` object, or an instance-specific option by passing it to the `load` method as an extra (optional) parameter.

```php
// …
$dom->load('http://gmail.com'); // will not have whitespaceTextNode set to false.
```

At the moment we support 7 options.

**Strict**

Strict, by default false, will throw a `StrictException` if it finds that the html is not strict compliant (all tags must have a closing tag, no attribute without a value, etc.).
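
A small sketch of strict mode rejecting an attribute without a value, assuming Composer autoloading and that the exception class is `PHPHtmlParser\Exceptions\StrictException` (check the namespace in your installed version). The markup is made up for illustration.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;
use PHPHtmlParser\Exceptions\StrictException; // assumed namespace

$dom = new Dom;
$dom->setOptions(['strict' => true]);

$rejected = false;
try {
    // "checked" has no value, which strict mode does not allow.
    $dom->load('<div><input type="checkbox" checked /></div>');
} catch (StrictException $e) {
    $rejected = true;
    echo 'Not strict compliant: ', $e->getMessage(), "\n";
}
```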

**whitespaceTextNode**

The whitespaceTextNode option, true by default, tells the parser to keep text nodes even if their content is empty (only whitespace). Setting it to false will ignore all whitespace-only text nodes found in the document.
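
A sketch that makes this option visible, assuming Composer autoloading. The single space between the two `<p>` tags in the made-up markup is a whitespace-only text node, so the `<div>` should have one more direct child when the option is on. It also exercises the two ways of setting options described in the Options section: per `load()` call, which does not carry over to later calls, and globally with `setOptions`. `countChildren()` is used to count the div's direct children.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

// The single space between </p> and <p> is a whitespace-only text node.
$html = '<div><p>One</p> <p>Two</p></div>';
$dom  = new Dom;

// Per-call option: only affects this load() call.
$dom->load($html, ['whitespaceTextNode' => false]);
$dropped = $dom->find('div', 0)->countChildren();

// No option passed: back to the default (true), so the whitespace node is kept.
$dom->load($html);
$kept = $dom->find('div', 0)->countChildren();

// Global option: affects every later load() on this Dom instance.
$dom->setOptions(['whitespaceTextNode' => false]);
$dom->load($html);
$droppedGlobally = $dom->find('div', 0)->countChildren();

printf("per-call: %d, default: %d, global: %d\n", $dropped, $kept, $droppedGlobally);
```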

**enforceEncoding**

The enforceEncoding option, null by default, forces a character set to be used for reading the content and for returning the content in that encoding. Setting it to null will trigger an attempt to detect the encoding from the content of the given string instead.

**cleanupInput**

Set this to `true` to skip the entire clean up phase of the parser. If this is set to true the next 3 options will be ignored. Defaults to `false`.

**removeScripts**

Set this to `false` to skip removing the script tags from the document body. This might have adverse effects. Defaults to `true`.

**removeStyles**

Set this to `false` to skip removing the style tags from the document body. This might have adverse effects. Defaults to `true`.
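
A sketch of opting out of both removal steps, assuming Composer autoloading, the default value of `cleanupInput`, and that the collection returned by `find()` is countable. The made-up document has one `script` and one `style` tag: with the defaults they are stripped before parsing, so `find()` should match nothing, while with both options set to `false` they stay in the tree.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$html = '<html><head><style type="text/css">p { color: red; }</style></head>'
      . '<body><script type="text/javascript">var answer = 42;</script><p>Text</p></body></html>';

$dom = new Dom;

// Defaults: script and style tags are removed during the clean up phase.
$dom->load($html);
$scriptsByDefault = count($dom->find('script'));
$stylesByDefault  = count($dom->find('style'));

// Opting out keeps both tags in the parsed tree.
$dom->load($html, ['removeScripts' => false, 'removeStyles' => false]);
$scriptsKept = count($dom->find('script'));
$stylesKept  = count($dom->find('style'));

printf("scripts: %d -> %d, styles: %d -> %d\n",
    $scriptsByDefault, $scriptsKept, $stylesByDefault, $stylesKept);
```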

**preserveLineBreaks**

Preserves line breaks if set to `true`. If set to `false`, line breaks are cleaned up as part of the input clean up process. Defaults to `false`.

Static Facade
-------------

…

```php
// …
$tag = $a->getTag();
$tag->setAttribute('class', 'foo');
echo $a->getAttribute('class'); // "foo"
```

It is also possible to remove a node from the tree. Simply call the `delete` method on any node to remove it from the tree. It is important to note that you should unset the node after removing it from the DOM, since it will still take up memory as long as it is not unset.
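
A minimal sketch of removing a node and then releasing the reference, assuming Composer autoloading; the markup and class names are made up for illustration.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->load('<ul><li class="keep">Keep</li><li class="drop">Drop</li></ul>');

$node = $dom->find('li.drop', 0);
$node->delete(); // detach the node from the tree
unset($node);    // drop our own reference so its memory can be reclaimed

$remaining = count($dom->find('li'));
echo $remaining, "\n";
```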