Channel: The Unofficial Scrapebox FAQ
Viewing all 240 articles

Why does scrapebox freeze or hang – Fast Poster


There are many reasons why scrapebox can freeze or hang.

Before figuring out what is causing Scrapebox to freeze, you need to identify what type of freeze is happening.

Black Window -   If your Scrapebox window has gone completely black, then generally it means that it froze up because of a lack of resources.  You ran out of available processor power or you ran out of Ram.

You can also see a black window if you are using RDP to connect to a remote server such as a VPS or Dedicated server.  This happens because your pc and the server just aren't exchanging data fast enough.  Generally if you leave this alone it will fix itself, or you can log out of your RDP session and log back in.  If you are constantly getting black windows using RDP, then either your pc or the server just doesn't have enough resources, or the internet connection is too slow or too busy.

If you're constantly getting black windows on your local machine, then you probably need a better processor or more Ram, or you need to not run so many things on the pc at once.

Sockets/Ports waiting - As you run scrapebox, windows opens up connections and closes them.  This is true for almost every part of scrapebox; it's how windows works.  If you have too many connections going and windows doesn't correctly close down the open connections, they can stay open until the TCP timer shuts them down.  If this happens, windows can use up all of the available ports that it can allocate.  Meaning all ports are open but not being used, and you can't do anything.

When this happens the ScrapeBox window can be moved around and everything appears normal, but the numbers for posting/scraping etc... don't go up; they just stay where they are.  This behavior is especially true with free proxies, and more prevalent in machines running versions of windows prior to Vista Service Pack 2, including XP.  Although Vista and 7 are still susceptible to this.

The fix here is to turn your connections down and your timeouts up, or if it never kicks back in, restart your pc.

White Ghost Window/Not Responding - There is a white ghost window over top of the scrapebox window and it says (Not Responding) at the top.  This generally means that scrapebox is just busy and is not currently able to display the window.  Most of the time if you leave it alone and give it some time, it will come back and everything will be fine.

If leaving it alone isn't fixing the problem, then you can check out the following reasons that it's freezing.  Here are the most common ones:

  • Malformed data
  • Antivirus / Firewall Software
  • TCP stack
  • Resources

Malformed data

This includes any files you load into scrapebox.  If you have malformed urls, such as:

http:/ww.go/bad-data

or if you have poorly spun text.

{anchor1||anchor2|{

This could be in any of the fields.

Checking for Malformed data

Back to basics - Create a new names file, new emails file, new comments file, and new websites file.  Put only one line of text in each file.  Do not use any spun text.  If you are able to comment to the same list that was freezing before, then it's likely the problem was somewhere in your names/emails/websites/comments files.  If scrapebox still hangs, then it's likely that the issue is in the blog list you are trying to comment to and you need to check for bad urls.

URLs - Uncheck the "Randomize comment poster blogs list" under settings. Then post with scrapebox. When it freezes note the position where it froze. Like list status 500/4563. Then also note your connections settings, for instance if you had 50 connections going at once.

Then import your list into the blog commenter section and hit the E.  Then look for line 500 or wherever the list stopped at.  Then look at 50 urls before and after that line (or whatever your connections were set to).  If you find any malformed urls, fix or delete them.

Malformed urls can be anything that is not formatted right.  Like: http:/ww.go/bad-data
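If you'd rather not eyeball hundreds of lines, a quick script can flag suspect urls before you load the list. This is a minimal sketch (not part of Scrapebox) using a deliberately loose pattern; tighten it to match your own lists:

```python
import re

def find_malformed(urls):
    """Return (line_number, url) pairs that don't look like valid http(s) urls.

    The pattern just requires a scheme, '://', and a host containing a dot.
    """
    pattern = re.compile(r'^https?://[^\s/]+\.[^\s/]+', re.IGNORECASE)
    return [(i, u) for i, u in enumerate(urls, start=1)
            if not pattern.match(u.strip())]

urls = [
    "http://www.example.com/post",
    "http:/ww.go/bad-data",      # missing slash, broken host
    "https://blog.example.org/",
]
print(find_malformed(urls))      # -> [(2, 'http:/ww.go/bad-data')]
```

Running it over your blog list before posting gives you the exact line numbers to fix or delete.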

Spin Syntax - Load your files in the commenter section.  Hit test comments.  Hit the spin again button 20-30 times rapidly.  If scrapebox locks up you likely have bad spin syntax coding.  Note that just because it doesn't lock up in this step doesn't mean that your spin syntax isn't the cause of scrapebox freezing and you should manually double check it.
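A rough pre-check for the brace problems shown above can also be scripted. This is a hypothetical helper, not a full spin-syntax parser; it only catches unbalanced braces and empty alternatives like the {anchor1||anchor2|{ example:

```python
def check_spin_syntax(text):
    """Return a list of problems found in {a|b|c} spin syntax."""
    problems = []
    depth = 0
    for i, ch in enumerate(text):
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:
                # more closing braces than opening ones so far
                problems.append(f"unmatched '}}' at position {i}")
                depth = 0
    if depth > 0:
        problems.append(f"{depth} unclosed '{{'")
    # empty alternatives such as {a||b}, {|a} or {a|}
    if '||' in text or '{|' in text or '|}' in text:
        problems.append("empty alternative inside a spin group")
    return problems

print(check_spin_syntax("{anchor1||anchor2|{"))   # two problems reported
print(check_spin_syntax("{a|b|c}"))               # -> []
```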

AntiVirus / Firewall Software

Antivirus and Firewall software can shut down Scrapebox's access to the internet and cause all sorts of issues.  The simple solution here is to temporarily disable all Antivirus and Firewall software, including windows firewall, and see if that solves your problem.

TCP Stack

Half-open connections limit can cause an issue.  Generally this is only true if you are running Windows Vista SP1 or older, including XP.  The default half-open connections limit was removed by Microsoft in Windows Vista SP2 and newer, including Windows 7.  If it's a half-open connections problem, Windows Event Viewer will show an event saying "TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts".  If this is the case you just ran out of available connections.  The quickest fix is to restart your pc.  If you keep having this issue, then try running fewer simultaneous instances of scrapebox, or turning your connections down and your timeouts up.  Upgrading to Vista SP2 or newer will ultimately fix this problem.

Resources

Lack of resources can also cause this: not a fast enough processor, not enough RAM.  Quick fixes here are to not run as many instances of scrapebox at once, and to not run as many other things in the background while you are running scrapebox.  Turn your connections down and your timeouts up.  Don't work with as large of a list.  Most of what scrapebox does is stored in memory (RAM), and the larger the list you work with, the more space this is going to take.  If the memory fills up, things can go bad.

You can of course always upgrade your processor/RAM.

Back to Basics

In general it's important to remember the basics, especially when none of the above works.

When all else fails

You can either contact support: http://www.scrapebox.com/contact-us

or

Take a nap.

or both.  :)


Can scrapebox use Socks 4 and Socks 5 proxies?


Yes, scrapebox can use SOCKS proxies, but you need to prepend an S to the beginning of each proxy.  So

195.195.195.195:80

Would be

S195.195.195.195:80
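If you have a long proxy list, the prefixing can be scripted rather than done by hand. A minimal sketch (the helper name is made up):

```python
def mark_as_socks(proxies):
    """Prefix each ip:port entry with 'S' so Scrapebox treats it as SOCKS."""
    return ["S" + p.strip() for p in proxies if p.strip()]

print(mark_as_socks(["195.195.195.195:80", "10.0.0.1:1080"]))
# -> ['S195.195.195.195:80', 'S10.0.0.1:1080']
```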

How do I filter adult urls or content in scrapebox?


Load the list of urls you want to filter in the Urls harvested section.  Then go to remove/filter >> remove urls containing entries from.

Then it will ask you to load a text file.  Take the terms below, plus any others you want to add, put them in a text file and save it.  Then load that file when prompted by scrapebox, after clicking remove urls containing entries from.

Of course some of these terms could filter out good urls as well, such as "bride" for instance, but this is the best alternative for filtering adult urls.

List:

sex
porn
pron
szex
xxx
x-live
x-video
xvideo
hentai
erotic
chick
tit
boob
slut
anal
poker
babe
blonde
brunette
russian
bride
fuk
redhead
penis
dick
blowjob
oral
gay
lesbian
pussy
vagina
gangbang
bondage
adult
teen
girl
woman
dirty
fuck
ass
bitch
shit
butt
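The filtering Scrapebox does here is a plain substring match. A sketch of the same idea in Python, useful for understanding why a term like "bride" can knock out good urls too:

```python
def filter_adult(urls, bad_terms):
    """Drop any url whose text contains one of the filter terms.

    A plain substring match, mirroring 'remove urls containing entries
    from': short terms like 'ass' will also catch words such as 'class',
    so expect some good urls to be filtered as well.
    """
    bad = [t.lower() for t in bad_terms]
    return [u for u in urls if not any(t in u.lower() for t in bad)]

urls = [
    "http://cooking-blog.com/recipes",
    "http://example.com/xxx-videos",
    "http://travel.example.org/russian-bride-stories",
]
print(filter_adult(urls, ["xxx", "bride"]))
# -> ['http://cooking-blog.com/recipes']
```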

If you export failed entries from posting, then repost to the failed ones, why do more succeed the 2nd time around?


There are many reasons for this behavior.
- The site could have been off line or overloaded the first time around
- The connection could have timed out
- The proxy used the first time could have been bad

All this could mean that a 2nd run could post some additional comments.  There are too many variables to try and control every time.  Best practice to maximize a run is to lower connections and increase timeouts.  If you need maximum links from your runs, plan on reposting to the failed.

How can I scrape all the urls for a domain?


You can trim the url to root, so you have the domain homepage.  Then you can use the site operator to scrape the additional urls.

site:http://www.rootdomain.com

Tip:  When using search operators, it's important that you understand what operators do and how they work with the different engines.  Some Google searches on search operators will turn up a lot of useful information.

Fast and Slow Poster Error Codes


Login,Failed - This means that the blog requires you to login in order to post a comment.  Since Scrapebox does not support this, the post failed.

 

What tokens can be used in the Learning Mode Addon Poster?


The Learning Mode Poster Addon can use the below tokens.  Note: these tokens will not work with Fast and Slow poster in the main scrapebox window.  For the tokens that work with fast and slow poster, see here.

  • %USERNAME% will be replaced with the user's name from the file loaded in for usernames.
  • %USEREMAIL% will be replaced with the user's email from the file loaded in for emails.
  • %USERURL% will be replaced with the user's website from the file loaded in the user url slot.
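The substitution the addon performs amounts to a simple search and replace over the comment template. A sketch of the idea (this is an illustration, not Scrapebox's actual code):

```python
def fill_tokens(template, user):
    """Replace the Learning Mode tokens with one user's values."""
    return (template
            .replace("%USERNAME%", user["name"])
            .replace("%USEREMAIL%", user["email"])
            .replace("%USERURL%", user["url"]))

user = {"name": "Alice", "email": "alice@example.com",
        "url": "http://example.com"}
print(fill_tokens("Posted by %USERNAME% (%USERURL%)", user))
# -> Posted by Alice (http://example.com)
```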

How do I specify the Anchor I want to use for my links?


Whatever you place in the names section is what will be used as your anchor.


How do you lock a specific anchor/name to a specific url?


You can lock a specific anchor to a specific url when posting.  This allows you to post to many urls with different anchors in the same posting run.   This feature is called Link Lock.

You use link lock in the websites.txt file.  It should be formatted like this:

http://www.site.com {anchor1|anchor2|etc..}
http://www.site1.com {anchor1|anchor2|etc..}

You are still required to place something in the names.txt file, as this is the backup file that will be used if something goes wrong.  For instance, if your spin syntax was messed up, scrapebox may not be able to understand what anchor you wanted, so it would use a line from the names file.
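If you generate the websites.txt file from other data, the Link Lock format is easy to emit. A minimal sketch (the function name is made up):

```python
def build_websites_file(entries):
    """Emit websites.txt lines in Link Lock format: url {a|b|c}."""
    lines = []
    for url, anchors in entries.items():
        # triple braces in the f-string produce literal { and } around
        # the pipe-separated anchor list
        lines.append(f"{url} {{{'|'.join(anchors)}}}")
    return "\n".join(lines)

text = build_websites_file({
    "http://www.site.com": ["anchor1", "anchor2"],
    "http://www.site1.com": ["other anchor"],
})
print(text)
```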

Does scrapebox come with proxies to use?


Scrapebox has a proxy harvester, where it will go and scrape proxies from websites for you.  It comes included with some sources, but more importantly it gives you the option to add your own custom sources.

How do I add my own custom proxy sources?


In the Select Engines & Proxies section click:

Manage Proxies -> Harvest Proxies -> Add Source

You can add both HTTP and SOCKS proxy sources.

Proxy sources must be a regular web page that lists proxies and does not require a login.   For instance, if the list is contained in a flash popup or a downloadable list it won't work.  Only a regular HTML style web page will work.

Additionally the proxies have to be updated on the very url you specify.  So if you specify the home page of a site and the newest proxies are listed on that home page, it will work.  If the proxies are listed on new sub pages of the site, scrapebox will Not spider the site to find the pages with the proxies.  So you need to find a static url that is updated.

What are the minimum system requirements for running Scrapebox?


Scrapebox doesn't have officially stated minimum system requirements.  However, generally speaking it's not terribly taxing on a system.

It does require the Windows OS.  It has been tested on Windows XP, Windows Vista and Windows 7, as well as Windows 2003 and 2008 server.

Generally speaking I have seen it run on some pretty minimal systems and VPS, however when you want to run more than 1 instance it obviously begins to consume more resources for each instance you run.  Also working with larger files will use more resources.  So the more you want to use it, the more powerful of a system you will need.

Scrapebox says X number were successful when posting, but the links aren’t there


If you have ever commented on a blog manually then you might remember that sometimes you will get a notification from the blog saying that the comment was received.  However, many times you get no notification from the blog.  When scrapebox receives that notification from the blog that the comment was received, it reports it as successful.  When it receives no notification, it assumes that it failed.  This does not mean that it did in fact fail, just that scrapebox cannot determine whether it was successful or not.  Many of the blogs that show failed in scrapebox actually received your comment.

The blog commenter is just that, a commenter.  It places the comments.  Most comments will then go to moderation and be approved or denied.  Some comments will be placed on auto approve blogs, which is where the comment is automatically approved.  So a successful post just means the comment was submitted.  It does not mean that the link is live on the site; that will be up to the admin to approve or deny.  Unless it is auto approve, in which case it goes live immediately.  If you check links right after you comment, those are going to be the ones that are auto approve.


How do the different Mac and PC licenses Work?


Mac and PC licenses can be purchased from our website using the same “Add to Cart” button. After purchase you will be sent the download page link where you can choose to install either the PC or Mac version of ScrapeBox. If you choose to activate the product on Windows, then your license can only be used on Windows machines and if you activate the product on a Mac it can only be used on Mac operating systems.
Switching the license from one platform to another is not possible, for example if you have a PC license and change computers to a Mac you will need to purchase another license to use the Mac version. Likewise if you have a Mac license, you will need to purchase an additional license to also run ScrapeBox on a PC.

Why are my results not showing properly for my given google.tld in rank tracker?


Google has made changes that affect Rank Tracking

On October 27th 2017 google made changes that will affect rank tracking in Scrapebox Rank Tracker, if you are tracking google ranks.

The basic change is that Google used to let you input any google you wanted, like google.co.uk or google.fr, and it would give you results from that google. So if you used google.co.uk you got UK results, and google.fr would give you results from France, including local results.

That is no longer the case. Google now gives you results based on the geo location of your IP, REGARDLESS of which google you choose.

So if your IP is in Paris, France and you go to google.com and type in food, you will now get results from Paris, France. If you go to google.co.uk, you will still get results from Paris, France.

So the google you choose is irrelevant now; results are based only on the location of the IP you use. You can go to any google around the world and get the same results.

You can read more about it here:

https://www.theverge.com/2017/10/27/16561848/google-search-always-local-results-ignores-domain

and

https://productforums.google.com/forum/#!topic/websearch/AzcsFmuFPEg/discussion

What this means for Rank Tracking

While you can still manually adjust your location results in a browser, this uses javascript. Scrapebox uses raw sockets and threads, as these provide many advantages, however they do not support javascript.

So the only way to get localized results in the Scrapebox Rank Tracker is to use an IP from the locale you want.

So if you want French results you need an IP from France, if you want German results you need an IP from Germany, etc.

Yes I agree, google is just making it more and more difficult for all of us.

Proxies

I have had some people ask where they can get various ips from specific countries. I wrote up a review of a service that generally can give you fixed ips from a given country (as long as they have ips from that country) and they have a lot of different countries.

The ips stay fixed for the life that you pay for them. Meaning if you buy them and keep paying for 10 months you keep the same ip for 10 months. So once you set it up you don't have to keep messing around with it. For the private proxies you "can" request a new ip every 30 days, but by default the ips stay fixed.

I also asked them to make a discount for my readers and they did. Further note, this is a good discount, which is why this is NOT an affiliate link:

http://scrapeboxfaq.com/squid-scrapebox-proxies-review-and-information

Discount Code: loopline20

 

How to use

Unfortunately the below applies to Scrapebox for Windows Only, this method will Not work on the mac version of Scrapebox, due to how mac works.

If you are wondering how to make rank tracker use different ips for different projects, there is no way to do that specifically in Rank Tracker.

So the way you would do it is to simply run multiple instances of it.

So say you want to rank track from 3 different countries. You setup 3 different scrapebox folders, 1 for each country.

Install rank tracker in each of them and then set up a given country's project(s) in it.  So we could do

Google France

Google USA

Google Germany

Setup a folder for each and then in the rank tracker that is attached to the Google France one, you put in the French IP(s).

For the Google USA, you put in the USA ips and for the google Germany put in the German ips.

Then when you run each, it will be using its own ips. Here is a video on how to set up more than 1 instance:

https://www.youtube.com/watch?v=aZzdE6ybu38

Contact Form Poster Says Failed


Quite often the contact form poster will say failed when the post was actually successful.

Scrapebox looks at the response content when posting, and it's looking for specific markers to find out if a post was successful or not.

If you are working with a site that has modified the default "thanks for posting" message then scrapebox won't be able to match the success message with one in its list.  So if Scrapebox does not know 100% certain that the post succeeded it assumes it failed.

So you simply need to train scrapebox on what is a successful message for the sites/languages you are working with.

You first need to find the definition file.  So if you look when posting, the platforms column will tell you the name.  So for example assume it's WP Contact, which is a common one.

Then you go into your

Scrapebox folder -> Configuration -> Platforms folder

Find the WPContact.ini file and edit it in notepad or similar.

You're looking for the

Success=

line.

 

You will see several entries there, each separated by the pipe key, which looks like |

So just add on your responses and separate them by pipe keys like it is there already.

Below you can find some additional possible success responses that you may want to add to your definition file.  Please note that preceding and trailing html markers are not "required"; just the success text is enough.  However, if you do not include html markers then you may get false positives.

wpcf7-mail-sent-ok" role="alert">Thank you for your message. It has been sent.</div>
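Editing the definition file by hand works fine, but if you maintain many markers you can script the append. A sketch, assuming the definition file is plain text with a single Success= line of pipe-separated markers; the function name is made up:

```python
def add_success_marker(ini_text, marker):
    """Append a pipe-separated marker to the Success= line of a
    definition file, given the file contents as a string."""
    out = []
    for line in ini_text.splitlines():
        # only touch the Success= line, and skip duplicates
        if line.startswith("Success=") and marker not in line:
            line = line + "|" + marker
        out.append(line)
    return "\n".join(out)

ini = "Name=WP Contact\nSuccess=Thank you for your message|Message sent"
print(add_success_marker(ini, "Your message has been sent"))
```

Read the .ini with an encoding-aware open(), pass the text through, and write it back to apply the change on disk.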

 

 

 

 

 

Why when I click stop the stop and start buttons grey out but nothing stops and I have to force close with task manager – aka locked threads?


That means that something has locked 1 or more of the threads.  This can be security software such as anti-virus, malware checkers and firewalls.  So you should whitelist scrapebox in all security software, and then you can whitelist the entire scrapebox folder as well.

Further, any program that accesses the internet can lock threads, things like skype, utorrent etc…  So you can try closing down any unneeded programs.  Then if it's working, you can turn programs back on 1 by 1 to find the culprit.

Further computer optimization software can lock threads so you can shut any such software down.

Take note: disabling security software (such as anti-virus, malware checkers and firewalls) often only stops new rules from forming, but allows existing rules to still fire.  So you have to fully whitelist in the security software or uninstall the security software (as a test).

Further, some security software requires you to whitelist in more than one place before it takes effect.

Also note that disabling a router firewall does actually fully disable it.

Basically you have to sort out what is locking the threads, because scrapebox is forced to wait until all threads are released.  On occasion it can be your operating system that does it, so you can try restarting your machine and/or lowering total connections.

One other thing to note is that this can happen with proxies that keep returning small amounts of data; it won't trigger the timeout because the connection is still active.  So try a test using no proxies, or make sure you are using some quality private proxies.

Lastly, if you're running Mac, you can try lowering the connections.  Mac has terrible error handling when it comes to lots of errors stacking up quickly.  So if there are too many errors stacking up too quick, Mac can choke, so lowering the threads fixes this.  This is a non-issue on windows.

How to build the regex for phone number scraping


You're looking to put a \ before any characters like + and ( or ), but not dashes.

Add a real space when there is a space, and then do a

\d{x}

where x is the quantity of digits in that set.  So you can look at my example and the 3 examples built into scrapebox, and make up your masks for your other formats.

 

Ok, let's take one of the included USA examples

1-555-555-5555

With a regex of

\d{1}-\d{3}-\d{3}-\d{4}

 

Now let me break that down

First you backslash out the digits.

Then the d represents digits, hence the d.

Then the number in brackets is the number of digits.

So the first section is

\d{1}

So backslash, then d for digits, then {1} because there is 1 digit in the first part of the phone number.  So it matches on the

1

That you see in

1-555-555-5555

 

Then dashes are just there.  So the dash after the 1 is

-

So that so far gives us

\d{1}-

Then for the new group of numbers we backslash it out, then d for digits then {3} because there are 3 digits in the next part.

So

\d{3}

That matches on the

555

Which thus far gives us

\d{1}-\d{3}

Which matches

1-555

So hopefully that makes sense.  Then we continue on for the rest, and since there is then 1 more set of 3 digits, that next part of the regex is the same

\d{1}-\d{3}-\d{3}-

Then that last segment has 4 digits so it’s a 4 in there instead of a 3

\d{1}-\d{3}-\d{3}-\d{4}

So that matches

 

1-555-555-5555

 

~~~~~~~~~~~~~~~

 

Let's take another format

+444 333-222 111

The regex would be

\+\d{3} \d{3}-\d{3} \d{3}

 

So you put in the plus, then backslash then d and then the {3} which matches the 444 in that number and then a space.

Then follow thru.

 

~~~~~~~~~~~~~~~

 

Another Example

 

+43(0)7243-50724

\+\d{2}\(\d{1}\)\d{4}-\d{5}

 

~~~~~~~~~~~~~~~

 

Notes: Dashes and spaces  are just there, so you can just put them in, they don't need to be backslashed out.

Plus signs and parentheses DO need to be backslashed out.

 

Any regex should have a leading ^ and a trailing $ removed.  These mean match the start and end of the line; however, when scraping data from HTML, the data isn't going to sit perfectly at the start and end of a line of source code, as there is going to be other HTML and content before and after the data being scraped.

The regexes here should all work http://www.regexlib.co/Search.aspx?k=phone%20number but if one starts with ^ and ends with $, those simply need to be removed.
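You can sanity-check a mask before putting it into Scrapebox by running it against a sample number in any regex tool. A quick Python check of the three masks from this article, using re.search rather than anchored matching, since scraped numbers sit inside surrounding HTML:

```python
import re

# mask -> sample number it should match
samples = {
    r"\d{1}-\d{3}-\d{3}-\d{4}":     "1-555-555-5555",
    r"\+\d{3} \d{3}-\d{3} \d{3}":   "+444 333-222 111",
    r"\+\d{2}\(\d{1}\)\d{4}-\d{5}": "+43(0)7243-50724",
}

for mask, number in samples.items():
    # wrap the number in html to mimic scraped page source
    html = f"<p>Call us: {number} today</p>"
    print(mask, "->", bool(re.search(mask, html)))   # all print True
```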
