Do You Want To Know Pages

Monday, August 17, 2009

MACHINE LEARNING IN ROBOTICS

NEPAL COLLEGE OF
INFORMATION TECHNOLOGY
Balkumari, Lalitpur
Pokhara University

Submitted By: Suman Bhattarai, Santosh Parajuli, Prashanna Jha
Submitted To: Niranjan Khakurel
7129, 7130, 7117
Elx. & Comm. (NCIT)
4th Sem. (2009/2010)


ABSTRACT
For robots to be truly flexible, they need to learn to adapt to partially known or dynamic environments, to teach themselves new tasks, and to compensate for sensor and effector defects. For a robot to improve its performance based entirely on real-world feedback, the robot's specification and algorithms must be constructed so as to enable data-efficient learning. This talk presents three examples of machine learning on physical robots. The first concerns the ability to get from one place to another. The second concerns robots whose main sensor is a digital camera, which are often equipped with a color map that maps each pixel value to an associated color label. The third is a technique for autonomous sensor and actuator model induction on a mobile robot.


1. INTRODUCTION
The problem of robot learning is essentially one of getting robots to do tasks without the need for explicitly programming them. Programming robots is extremely challenging, for many reasons. Sensors on a robot, such as sonar, behave in a complex and unpredictable manner in typical unstructured environments, such as a crowded office building. Thus, understanding the sensors is not sufficient; one also has to model how they work in a particular task environment. Robot learning forces us to deal with the issues of embedded systems, where the learner is situated in an unknown dynamic environment. The issues that arise in robot learning are quite different from those that arise in, say, a knowledge acquisition task for an expert system, where, for example, real-time constraints may not be important.
In this paper, we view robot learning as a special case of the general problem of machine learning. Machine learning is a subfield of artificial intelligence (AI) whose ultimate goal is to replace explicit programming by teaching. We define teaching broadly in this paper to mean any form of instruction, ranging from examples of the desired behavior, to domain knowledge of the task, to weak performance feedback. Teaching is usually less arduous and more effective than explicit programming. Consider the problem of instructing a human to operate an automated teller machine. We can show the person how to operate the machine by actually going through the motions ourselves. Clearly, this mode of teaching would also be far more effective for robots than painstakingly programming the robot to operate the machine.
If we view the ultimate goal of AI as bringing to reality systems such as the R2D2 robot (from the movie Star Wars), then it is clear that we must study fundamental capabilities, such as learning, in the context of real robots (or sufficiently realistic simulated ones). Although we may not reach our ultimate destination for some time to come, there are many intermediate goals along the way that will offer ample reward in terms of useful practical systems and scientific insights.



2. What Things Should Robots Learn?

Before delving into the problem of robot learning, it might be instructive to briefly discuss the sorts of knowledge that would be useful for robots to learn. We can distinguish at least three types of such knowledge.

i. Sensor Noise: Most robot sensors, including inexpensive devices such as sonar as well as very expensive systems such as laser range scanners, are unreliable. Such transducers sometimes fail to see an object, or alternatively misjudge its distance. Thus, state descriptions computed from robot sensors are bound to be inaccurate to some degree.

ii. Stochastic Actions: Due to sensing errors, as well as the inherent complexity of the real world, robot actions rarely appear to be deterministic. For example, if the robot picks up an object, the object may sometimes slip and fall, while at other times the action may succeed.

iii. Real-time Response: Robots must be capable of reactive planning; that is, a robot must respond to unforeseen circumstances in real time. For example, a robot operating in an office environment must be ever alert to the possibility of colliding with some unanticipated obstacle in its path, ranging from people moving around randomly to junk left in corridors.

iv. Online Learning: The training data for teaching a robot may not be available offline. Consequently, a robot may be required to explore its environment to collect sufficient samples of the necessary experience. Incrementality is a desirable trait for any robot learning algorithm, since the training data will be acquired only over time.

v. Limited Training Time: A robot cannot have the luxury of training for months of real time, although such extended runs are common in simulated systems. For a learning algorithm to be effective on a real robot, it must be able to produce satisfactory results from training experience that can be collected in a few hours or less (although the data collected could be processed offline for a much longer time, if necessary).

vi. Situated Representations: A robot becomes aware of its environment primarily through its sensors. Thus, a learning algorithm must be able to work with the limitations of sensors used. For example, a navigation robot may not be able to sense its exact coordinate location on a map, and must deal with this localization problem.

All of the above factors conspire to make the robot learning problem extremely difficult.
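The online-learning and stochastic-action points above can be illustrated together with a minimal sketch. It assumes a hypothetical grasping action whose outcome is only observed one trial at a time; the class name `SuccessRateEstimator` and its methods are illustrative and do not come from the report. The estimate is updated incrementally, so no batch of past trials has to be stored.

```python
class SuccessRateEstimator:
    """Incrementally estimate how often a stochastic action succeeds."""

    def __init__(self):
        self.trials = 0      # number of outcomes seen so far
        self.rate = 0.0      # running estimate of the success probability

    def update(self, succeeded: bool) -> float:
        """Fold one new outcome into the running mean and return it."""
        self.trials += 1
        # Incremental mean: move the estimate toward the new outcome by 1/n.
        self.rate += (float(succeeded) - self.rate) / self.trials
        return self.rate


# Usage: the robot tries to pick up an object three times.
estimator = SuccessRateEstimator()
for outcome in (True, False, True):
    estimator.update(outcome)
print(estimator.trials, round(estimator.rate, 3))
```

Because each update needs only the previous estimate and the trial count, this kind of rule fits a robot that gathers its training data gradually while operating.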

3. APPLICATION:

3.1. HARD-TO-PROGRAM KNOWLEDGE:

In any given task, we can usually distinguish information that we can easily hardwire into the robot from information that would take a lot of human effort to program. Consider the problem of getting a robot to operate a VCR designed for a human. It would be convenient to just show the robot how to operate the VCR by actually going through the motions ourselves. Clearly, this mode of teaching is much simpler and more effective than writing a program for moving and coordinating the robot's joints, planning whole-arm trajectories, and specifying the end-point compliances.

3.2. UNKNOWN INFORMATION:

Sometimes the information necessary to program the robot is simply not readily available. For example, we might want to have a robot explore an unknown terrain, say on Mars or in the deep sea. Clearly, in such situations it is imperative that the robot be able to learn a map of the environment by exploring it. Or consider a generic household robot: the factory has programmed it to vacuum, but it must learn for itself the layout of its buyer's home.

3.3. CHANGING ENVIRONMENT:

The world is a dynamic place. Objects move around from one place to another, or appear and disappear. Even if we had a complete model of the environment to begin with, this knowledge could quickly become obsolete in a very dynamic environment.


Figure: Interaction between environment, performance, knowledge, and learning.

There are also slower changes, such as in the calibration of the robot's own sensors and effectors. Thus, it would be beneficial if a robot could constantly update its knowledge of both its internal and external environment.

3.4. SENSOR NOISE:

Most cheap-to-build robot sensors, such as sonar, are unreliable. Sonar transducers sometimes fail to see an object, or alternatively misjudge its distance. Thus, state descriptions computed from such sensors are bound to have inaccuracies in them, and some kind of probabilistic averaging is required.
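One possible form of such probabilistic averaging is sketched below. It is only an illustration under stated assumptions: the distance to the object is fixed, the sonar noise is Gaussian with a known variance `noise_var`, and a missed echo is reported as `None`. The function names and parameters are hypothetical; the report does not prescribe a particular filter. The update is the standard one-dimensional Gaussian (Kalman-style) fusion rule.

```python
def update_estimate(mean, var, reading, noise_var):
    """Fuse one noisy range reading into a Gaussian distance estimate."""
    gain = var / (var + noise_var)           # how much to trust the new reading
    new_mean = mean + gain * (reading - mean)
    new_var = (1.0 - gain) * var             # uncertainty shrinks with each reading
    return new_mean, new_var


def fuse_readings(readings, prior_mean, prior_var, noise_var):
    """Fold a sequence of readings into the estimate; None marks a missed echo."""
    mean, var = prior_mean, prior_var
    for reading in readings:
        if reading is None:                  # the sonar failed to see the object
            continue
        mean, var = update_estimate(mean, var, reading, noise_var)
    return mean, var


# Usage: a prior guess of 2.0 m, then one missed echo and one reading of 3.0 m.
print(fuse_readings([None, 3.0], prior_mean=2.0, prior_var=1.0, noise_var=1.0))
```

Each reading pulls the estimate toward itself in proportion to how uncertain the current estimate is, so a single misjudged distance does not dominate the state description.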

3.5. NON-DETERMINISTIC ACTIONS:

Since the robot has an incomplete model of its environment, actions will not always have the same effects. For example, if the robot picks up an object, the object may sometimes slip and fall, while at other times the action may succeed. Planning becomes difficult because one has to allow for situations in which a given action sequence fails to accomplish a goal.

3.6. REACTIVITY:

A robot must respond to unforeseen circumstances in real time. For example, a robot crossing a busy street intersection cannot afford the luxury of remaining motionless while it computes the consequences of its current plan. In terms of learning, any reasonable algorithm must be tractable, in that every step of the algorithm must terminate quickly.

3.7. LIMITED TRAINING TIME:

Extended trials of many thousands of steps are very common in simulations, but are impractical for real robots. For a learning algorithm to be effective on a real robot, it must converge in a few thousand steps or less. Thus, the training time available on robots is very limited.

3.8. GROUNDEDNESS:

All the information that is available to a robot must ultimately come from its sensors (or be hardwired from the start). Since the state information is computed from sensors, a learning algorithm must be able to work with the limitations of perceptual devices. For example, a navigation robot may not be able to sense its exact coordinate location on a map. Simulation results that assume such exact location information are of limited value for real robots.

4. A 3-D Credit Assignment Problem:

In this section we present a general characterization of the robot learning problem. Essentially, we view the robot learning problem as one of learning a policy function from some set of sensory states S to some set of actions A. The states and actions can be discrete or continuous. Examples of policy functions include control behaviors for mobile robots, such as avoiding obstacles, following walls, and moving a robot arm to pick up some object. A policy can be stationary, that is, the mapping is a time-invariant function, or it can be non-stationary. The problem of learning policy functions generally involves solving a three-dimensional credit assignment problem (temporal, structural, and task), as shown in the figure.
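To make the idea of a policy function concrete, here is a minimal hand-written sketch in Python. The state encoding (three sonar ranges in metres), the action names, and the thresholds are all assumptions chosen for illustration; a learning algorithm would be expected to produce such a mapping rather than have it coded by hand.

```python
from typing import Tuple

State = Tuple[float, float, float]   # hypothetical (left, front, right) sonar ranges in metres
Action = str                         # hypothetical action labels


def avoid_obstacles(state: State, safe_distance: float = 0.5) -> Action:
    """Stationary policy: the same state always maps to the same action."""
    left, front, right = state
    if front >= safe_distance:
        return "forward"
    # Blocked ahead: turn toward the side with more free space.
    return "turn_left" if left > right else "turn_right"


def scan_then_avoid(state: State, t: int, scan_steps: int = 10) -> Action:
    """Non-stationary policy: the chosen action also depends on the time step t."""
    if t < scan_steps:
        return "turn_left"           # spin in place first to look around
    return avoid_obstacles(state)


# Usage: the same state yields different actions at different times
# under the non-stationary policy.
state = (1.0, 2.0, 1.0)
print(avoid_obstacles(state), scan_then_avoid(state, t=0), scan_then_avoid(state, t=10))
```

Here S is the set of sonar triples and A is the set of three action labels; continuous actions such as wheel velocities would fit the same function signature.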


5. ADVANCED LEARNING IN ROBOTICS:

Typical robots today, such as industrial robots, have only the minimum number of degrees of freedom and the minimum variety of sensors sufficient for presupposed situations. Such robots can achieve restricted tasks properly in restricted environments. However, they cannot deal with unexpected situations appropriately, since they are equipped with only the necessary and sufficient number of actuators and sensors.
A robot, as a universal machine, ought to have adaptability: the ability to estimate appropriate control parameters and/or structure to achieve a given task in an environment. To have such adaptability against changes of task and environment, robots need a larger number of actuators and a greater variety of sensors. The general control architecture for a robot with few degrees of freedom is shown in the figure. Sensor data is compared with the given task and fed into the controller. The controller then calculates the commands for the actuators. The robot may have greater ability if it can achieve several tasks at a time. In addition, the robot will be more adaptive if it can dynamically assign its degrees of freedom to each task, since this means the robot can estimate not only the control parameters but also an appropriate control structure.

6. ADVANTAGES:
6.1. Simple in design.
6.2. More feasibility.
6.3. Versatile application.
6.4. Flexibility in application.
6.5. Convenience.
6.6. Effectiveness.

7. DISCUSSION:

We now summarize the four learning paradigms in Table 1 according to how they address the credit assignment problem.
In the inductive learning paradigm, the temporal credit assignment problem is solved by the teacher. The structural credit assignment problem is solved using some function approximation algorithm, such as a decision tree or a neural net. The inductive learning paradigm does not specifically address the task credit assignment problem, although extensions such as the lifelong learning approach of learning domain invariances appear very promising.
In explanation-based learning, both the temporal and the structural credit assignment problems are solved by the teacher, who provides both examples and a domain theory for safely generalizing the examples. One way to address the task credit assignment problem is through learning action models, as is done in the EBNN algorithm.
In the reinforcement learning paradigm, the temporal credit assignment problem is addressed by using an algorithm like Q-learning. The structural credit assignment problem is addressed by using some standard function approximator. The task credit assignment problem can be addressed by learning general action models.
Finally, the evolutionary learning paradigm addresses the temporal credit assignment problem by using the bucket brigade algorithm. The structural credit assignment problem is addressed through the genetic operators for transforming policies, whereas the classifier system framework deals with the problem of transferring learning across multiple tasks.
Table No: 1

Paradigm     Temporal         Structural              Task
Induction    Teacher          Function approximator   Invariance
EBL          Teacher          Domain theory           Action models
RL           Q-learning       Function approximator   Action models
Evolution    Bucket brigade   Genetic operators       Classifier system

Comparing different robot learning paradigms based on how they address the credit assignment problem.
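As an illustration of how the reinforcement learning paradigm handles temporal credit assignment, the sketch below shows a single tabular Q-learning update. It is a minimal example rather than a full robot learner: the table is a plain dictionary instead of a function approximator, and the state names, action names, learning rate `alpha`, and discount factor `gamma` are assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus the discounted future value.
    target = reward + gamma * best_next
    # Move the old estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: unseen state-action pairs start at 0.0.
q = defaultdict(float)
actions = ["go", "stop"]
print(q_learning_update(q, "s0", "go", 1.0, "s1", actions, alpha=0.5, gamma=0.9))
```

The discounted term `gamma * best_next` is what passes credit for a later reward back to the earlier actions that led to it.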
Robot learning is one of the most interesting and difficult machine learning problems. A good way to conclude this look at robot learning is to propose a challenge problem for it. An excellent choice for such a challenge problem comes from the COG humanoid robot being developed at MIT. This system raises a number of very fundamental learning issues in integrating different modalities such as vision, action, language, and cognition. It is quite possible that in attempting such an integrated system, we will uncover completely new machine learning paradigms.
